AVM Testing Archives

Using Appraised Values vs. Arm’s-Length Transactions for Testing AVMs

Testing AVMs is a complex challenge, and when faced with dificult problems, creative solutions often arise. One such solution that has gained attention is using appraised values as benchmarks for AVM testing. The reasoning seems straightforward: both an AVM and an appraisal aim to estimate the market value of a property. So, in cases where a true market value from an arm’s-length transaction isn’t available, appraised values may seem like the next best option.

Appraisals are frequently conducted for reﬁnances, home equity loans, and other non-sales transactions, meaning many properties that don’t transact on the open market still have appraised values available. This availability might seem advantageous, and one might reason that if an AVM—designed to be quicker and cheaper—can match these appraisal estimates, it would be sufficient for testing.

On the surface, this reasoning seems sound: why not test AVMs against appraisals if they’re both providing value estimates? However, this approach is based on ﬂawed assumptions, and the problems with using appraised values as benchmarks for AVM testing begin at the fundamental level.

Appraised Values as Benchmarks

Inconsistency: Appraisals are conducted using structured processes with standardized guidelines, but the signiﬁcant variability in the types of appraisals (e.g., desktop, drive-by, hybrid, interior inspections) and their purposes (e.g., non-mortgage lending, relocation, litigation, HELOC, reﬁnancing) undermines consistency. This inconsistency makes appraised values less reliable as they may not fully reﬂect “Market Value,” potentially violating the uniformity required in AVM benchmark testing.
Appraisal Bias Study (quantiﬁable documented, systemic biases): Evidence shows that appraised values do not exhibit a normal distribution around actual sales prices, suggesting inherent bias*. In particular, appraisers often adjust values upwards to align with expected sales prices to avoid friction with clients. This artiﬁcial inflation undermines the accuracy of appraised values as benchmarks, illustrating that they may lead to inaccurate AVM testing.
Appraisal-Derived Data: Appraisal data has inherent limitations due to its sourcing methods, geographic constraints, and the relatively small number of observations. Additionally, it often focuses on speciﬁc assignment types and relies on limited-scope appraisals—such as desktop, drive-by, or hybrid data collection—which may not accurately capture the true market value of a property. This variability, coupled with the subjective nature of appraisals, makes them unreliable and impractical for AVM testing. Their limited availability by source and geography and potential for imprecise estimates further reduce their efectiveness as a comprehensive benchmark.
Subjectivity (individual judgment and professional biases): Despite adherence to guidelines, appraisals are subject to the judgment of individual appraisers, introducing bias**. Research supports that appraisals, particularly in certain residential contexts, are performed not to independently assess market value, but to justify predetermined loan amounts. This subjectivity compromises their reliability as benchmarks for AVM testing.
Measuring Error: Using appraised values introduces compounded errors. Testing an AVM against an estimate (appraised values) rather than the true standard (actual market transactions) results in a conﬂation of two errors—those of the AVM and those inherent in the appraisal. This distorts the AVM’s accuracy, as any bias or error in the appraisal will be reﬂected in the overall AVM Testing results.
Lag in Market Response: Appraisals are often “rear-looking,” based on historical closed sales, and thus do not capture rapid market ﬂuctuations. This is particularly problematic in volatile markets where real estate prices can change quickly. The lag in appraisals reﬂecting current market conditions can further distort AVM testing when outdated data is used.

Arm’s-Length Transactions as Benchmarks

Reﬂects True Market Value: Arm’s-length transactions occur between unrelated, independent parties, ensuring that both buyer and seller act in their best interests. As a result, these transactions accurately reﬂect the true market value of the property, making them the most reliable benchmarks for AVM testing.
Up-to-Date Information: Arm’s-length sales represent current market conditions, providing real-time data that ensures AVMs are tested against accurate, relevant information. This minimizes the risk of outdated or inaccurate benchmarks, improving the reliability of the test results.
Data Availability: Arm’s-length transactions are abundant, with approximately 4-6 million sales recorded annually across the U.S. This wealth of comparable sales data enables AVM testing in a variety of markets, including less active markets. The breadth of this data makes it a more practical and reliable option for establishing benchmarks.

Conclusion:

The use of appraised values as benchmarks for AVM testing is inherently flawed due to inconsistency, subjectivity, and the lag in reflecting current market trends. These limitations introduce bias, compounded error, and misrepresentation of accuracy in the testing process. This approach can lead to skewed and unreliable results, potentially violating new AVM quality control standards that emphasize the need for market-reflective testing.

In contrast, arm’s-length transactions provide the most reliable benchmarks, as they reflect true market values at the time of sale, accounting for factors like property exposure and typical days on market. To ensure objectivity and compliance with quality standards, AVM testing must prioritize arm’s-length transactions and ensure that the AVMs are blind to the most recent sale and listing prices***. This safeguards the integrity of the results by eliminating any recent influence of pre-existing market values or pricing.

Ultimately, appraised values should be avoided as benchmarks wherever possible, recognizing the inherent methodological limitations and risks to accuracy in testing results. For rigorous AVM validation, the reliance on arm’s-length transactions ensures more reliable, market-aligned outcomes.

*Systemic Risks in Residential Property Valuations: Perceptions and Reality, CATC June 2005 Page 13, “Full Appraisal Bias- Purchase Transactions”

**Other research corroborates the notion that certain residential appraisals are performed NOT to establish an independent and objective market value estimate, but to justify a loan amount. (Ferguson, J.T., After-Sale Evaluations: Appraisals or Justiﬁcations? Journal of Real Estate Research, 1988, 3, 19-26).

*** Current AVM Testing Methodologies versus our new Predictive Testing Methodology (PTM™) OR How Listing Prices Made Current AVM Testing Obsolete and How to Fix It (AVMNews September 2024)

AVM Testing Schedule Announced for 2025

In our menu above, under “AVM Information” we always have the latest version of our testing schedule. 2025 AVM Validation Testing Dates have been published there as of today.

Study: AVMetrics’ Predictive Testing Methodology

This content is restricted to subscribers

Introducing PTM™ – Revolutionizing AVM Testing for Accurate Property Valuations

When it comes to residential property valuation, Automated Valuation Models (AVMs) have a lurking problem. AVM testing is broken and has been for some time, which means that we don’t really know how much we can or should rely on AVMs for accurate valuations.

Testing AVMs seems straightforward: take the AVM’s estimate and compare it to an arm’s length market transaction. The approach is theoretically sound and widely agreed upon but unfortunately no longer possible.

Once you see the problem, you cannot unsee it. The issue lies in the fact that most, if not all, AVMs have access to multiple listing data, including property listing prices. Studies have shown that many AVMs anchor their predictions to these listing prices. While this makes them more accurate when they have listing data, it casts serious doubt on their ability to accurately assess property values in the absence of that information.

Three months of data showing estimates by three AVMs for a single property in Austin, TX. — Three AVMs valuing a home before and after it was listed in the MLS from Realtor.com’s RealEstimate^SM.

All this opens up the question: what do we want to use AVMs for? If all we want is to get a good estimate of what price a sale will close at, once we know the listing price, then they are great. However, if the idea is to get an objective estimate of the property’s likely market value to refinance a mortgage or to calculate equity or to measure default risk, then they are… well, it’s hard to say. Current testing methodology can’t determine how accurate they are.

But there is promise on the horizon. After five years of meticulous development and collaboration with vendors/models, AVMetrics is proud to unveil our game-changing Predictive Testing Methodology (PTM™), designed specifically to circumvent the problem that is invalidating all current testing. AVMetrics’ new approach will replace the current methods cluttering the landscape and finally provide a realistic view of AVMs’ predictive capabilities.¹

At the heart of PTM™ lies our extensive Model Repository Database (MRD™), housing predictions from every participating AVM for every residential property in the United States – an astonishing 100 to 120 million properties per AVM. With monthly refreshes, this database houses more than a billion records per model and thereby offers unparalleled insights into AVM performance over time.

But tracking historical estimates at massive scale wasn’t enough. To address the influence of listing prices on AVM predictions, we’ve integrated a national MLS database into our methodology. By pinpointing the moment when AVMs gained visibility into listing prices, we can assess predictions for sold properties just before this information influenced the models, which is the key to isolating confirmation bias. While the concept may seem straightforward, the execution is anything but. PTM™ navigates a complex web of factors to ensure a level playing field for all models involved, setting a new standard for AVM testing.

So, how do we restore confidence in AVMs? With PTM™, we’re enabling accurate AVM testing, which in turn paves the way for more accurate property valuations. Those, in turn, empower stakeholders to make informed decisions with confidence. Join us in revolutionizing AVM testing and moving into the future of improved property valuation accuracy. Together, we can unlock new possibilities and drive meaningful change in the industry.

¹The majority of the commercially available AVMs support this testing methodology, and there is over two solid years of testing that has been conducted for over 25 models.

AVM Testing Schedule Announced for 2024

In our menu above, under “AVM Information” we always have the latest version of our testing schedule. 2024 AVM Validation Testing Dates have been published there as of today.

Tag: AVM Testing