Using Appraised Values vs. Arm’s-Length Transactions for Testing AVMs

Testing AVMs is a complex challenge, and difficult problems often invite creative solutions. One that has gained attention is using appraised values as benchmarks for AVM testing. The reasoning seems straightforward: both an AVM and an appraisal aim to estimate the market value of a property. So, where a true market value from an arm’s-length transaction isn’t available, appraised values may seem like the next best option.

Appraisals are frequently conducted for refinances, home equity loans, and other non-sales transactions, meaning many properties that don’t transact on the open market still have appraised values available. This availability might seem advantageous, and one might reason that if an AVM, which is designed to be quicker and cheaper, can match these appraisal estimates, that would be sufficient for testing.

On the surface, this reasoning seems sound: why not test AVMs against appraisals if they’re both providing value estimates? However, this approach rests on flawed assumptions, and the problems with using appraised values as benchmarks for AVM testing begin at the fundamental level.

Appraised Values as Benchmarks

Inconsistency: Appraisals follow structured processes and standardized guidelines, but the significant variability in appraisal types (e.g., desktop, drive-by, hybrid, interior inspections) and purposes (e.g., non-mortgage lending, relocation, litigation, HELOC, refinancing) undermines consistency. As a result, appraised values may not fully reflect “Market Value,” which makes them less reliable and potentially violates the uniformity required in AVM benchmark testing.
Appraisal Bias Study (quantifiable, documented systemic biases): Evidence shows that appraised values are not normally distributed around actual sales prices, suggesting inherent bias*. In particular, appraisers often adjust values upward to align with expected sales prices and avoid friction with clients. This artificial inflation undermines the accuracy of appraised values as benchmarks and can lead to inaccurate AVM testing.
Appraisal-Derived Data: Appraisal data has inherent limitations due to its sourcing methods, geographic constraints, and relatively small number of observations. It also often focuses on specific assignment types and relies on limited-scope appraisals (such as desktop, drive-by, or hybrid data collection), which may not accurately capture the true market value of a property. These limits on availability and precision, coupled with the subjective nature of appraisals, make appraisal-derived data unreliable and impractical as a comprehensive benchmark for AVM testing.
Subjectivity (individual judgment and professional biases): Despite adherence to guidelines, appraisals depend on the judgment of individual appraisers, which introduces bias**. Research suggests that certain residential appraisals are performed not to independently assess market value, but to justify predetermined loan amounts. This subjectivity compromises their reliability as benchmarks for AVM testing.
Measuring Error: Using appraised values introduces compounded errors. Testing an AVM against an estimate (an appraised value) rather than the true standard (an actual market transaction) conflates two errors: those of the AVM and those inherent in the appraisal. This distorts the measured accuracy of the AVM, because any bias or error in the appraisal is carried into the overall AVM testing results.
Lag in Market Response: Appraisals are often backward-looking, based on historical closed sales, and thus do not capture rapid market fluctuations. This is particularly problematic in volatile markets where real estate prices can change quickly. When appraisals lag current market conditions, testing against them further distorts AVM results.
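
The compounding described under Measuring Error can be illustrated with a small simulation. The Python sketch below is illustrative only: the error sizes (a 5% AVM standard deviation, a 4% appraisal standard deviation, and a 3% upward appraisal bias), the fixed $300,000 sale price, and the function name simulate are assumptions chosen for the example, not figures taken from the studies cited in this article.

```python
import random
import statistics


def simulate(n=20000, avm_sd=0.05, appraisal_sd=0.04, appraisal_bias=0.03, seed=0):
    """Score a hypothetical, unbiased AVM against sale prices and against appraisals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    vs_sale, vs_appraisal = [], []
    for _ in range(n):
        sale = 300_000.0  # assumed arm's-length sale price (the true benchmark)
        # Assumed AVM: no bias, random noise only.
        avm = sale * (1 + rng.gauss(0, avm_sd))
        # Assumed appraisal: inflated on average, plus its own random noise.
        appraisal = sale * (1 + appraisal_bias + rng.gauss(0, appraisal_sd))
        # Percentage error of the AVM relative to each candidate benchmark.
        vs_sale.append((avm - sale) / sale)
        vs_appraisal.append((avm - appraisal) / appraisal)
    return {
        "mean_vs_sale": statistics.fmean(vs_sale),
        "sd_vs_sale": statistics.pstdev(vs_sale),
        "mean_vs_appraisal": statistics.fmean(vs_appraisal),
        "sd_vs_appraisal": statistics.pstdev(vs_appraisal),
    }


results = simulate()
for name, value in results.items():
    print(f"{name}: {value:+.4f}")
```

Under these assumptions, an AVM that is unbiased against sale prices appears to undervalue properties when scored against inflated appraisals, and its measured error spread widens because the appraisal's noise is added to its own.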

Arm’s-Length Transactions as Benchmarks

Reflects True Market Value: Arm’s-length transactions occur between unrelated, independent parties, each acting in their own best interest. As a result, these transactions accurately reflect the true market value of the property, making them the most reliable benchmarks for AVM testing.
Up-to-Date Information: Arm’s-length sales represent current market conditions, providing real-time data that ensures AVMs are tested against accurate, relevant information. This minimizes the risk of outdated or inaccurate benchmarks, improving the reliability of the test results.
Data Availability: Arm’s-length transactions are abundant, with approximately 4-6 million sales recorded annually across the U.S. This wealth of comparable sales data enables AVM testing in a variety of markets, including less active markets. The breadth of this data makes it a more practical and reliable option for establishing benchmarks.

Conclusion

The use of appraised values as benchmarks for AVM testing is inherently flawed due to inconsistency, subjectivity, and the lag in reflecting current market trends. These limitations introduce bias, compounded error, and misrepresentation of accuracy in the testing process. This approach can lead to skewed and unreliable results, potentially violating new AVM quality control standards that emphasize the need for market-reflective testing.

In contrast, arm’s-length transactions provide the most reliable benchmarks, as they reflect true market values at the time of sale, accounting for factors like property exposure and typical days on market. To ensure objectivity and compliance with quality standards, AVM testing must prioritize arm’s-length transactions and ensure that the AVMs are blind to the most recent sale and listing prices***. This safeguards the integrity of the results by preventing recently observed sale or listing prices from influencing the estimates being tested.

Ultimately, appraised values should be avoided as benchmarks wherever possible, recognizing the inherent methodological limitations and risks to accuracy in testing results. For rigorous AVM validation, the reliance on arm’s-length transactions ensures more reliable, market-aligned outcomes.

*Systemic Risks in Residential Property Valuations: Perceptions and Reality, CATC June 2005 Page 13, “Full Appraisal Bias- Purchase Transactions”

**Other research corroborates the notion that certain residential appraisals are performed NOT to establish an independent and objective market value estimate, but to justify a loan amount (Ferguson, J.T., After-Sale Evaluations: Appraisals or Justifications? Journal of Real Estate Research, 1988, 3, 19-26).

*** Current AVM Testing Methodologies versus our new Predictive Testing Methodology (PTM™) OR How Listing Prices Made Current AVM Testing Obsolete and How to Fix It (AVMNews September 2024)