Tag: Interagency Appraisal and Evaluation Guidelines

How AVMetrics Tests AVMs

Testing an AVM’s accuracy is trickier than it looks. It is easy to get an AVM estimate of value, and a fair sale on the open market is the accepted benchmark against which to compare that estimate, but that is just the starting point.

There are four keys to fair and effective AVM testing, and applying all four can be challenging for many organizations.

  1. Your raw data must be cleaned up to ensure that there aren’t any “unusable” or “discrepant” characters in the data; variants such as “No.”, “#” and “Num” must be normalized.
  2. Once your test data is “scrubbed clean,” it must be assembled in a universal format, and it must be large enough to provide reliable test results even at the segment level (each property type within each price level within each county, and so on). This might require hundreds of thousands of records.
  3. Timing must be managed so that each model receives the same sample data at the same time with the same response deadline.
  4. Last, and most difficult, the benchmark sales data must not be available to the models being tested.  In other words, if the model has access to the very recent sales price, it will be able to provide a near-perfect estimate by simply estimating that the value hasn’t changed (or changed very little) in the days or weeks since the sale. 
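
The normalization described in the first key can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_address`, the regular expression, and the choice of “#” as the canonical form are assumptions made for the example, not a description of AVMetrics’ cleaning rules, and a real scrubbing pass would cover far more variants.

```python
import re

# Hypothetical pattern: "No.", "No", "Num", "Num." or "#" directly before a number.
_UNIT_DESIGNATOR = re.compile(r"(?:\bNo\.?|\bNum\.?|#)\s*(?=\d)", re.IGNORECASE)


def normalize_address(raw: str) -> str:
    """Collapse whitespace, rewrite unit designators as '#', and upper-case."""
    text = " ".join(raw.split())            # collapse runs of whitespace
    text = _UNIT_DESIGNATOR.sub("#", text)  # "No. 4" / "Num 4" / "# 4" -> "#4"
    return text.upper()


if __name__ == "__main__":
    for sample in ("12 Main St No. 4", "12 Main St Num 4", "12 main st  # 4"):
        print(normalize_address(sample))
```

All three sample strings reduce to the same normalized form, which is what lets records from different sources be matched and counted as one property.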

AVMetrics tests every commercially available AVM continuously and aggregates the results into a quarterly report; AVMetrics’ testing process meets these criteria and many more, providing a truly objective measure of AVM performance.

The process starts with the identification of an appropriate sample of properties for which benchmark values have very recently been established.  These are the actual sales prices for arm’s-length transactions between willing buyers and sellers—the best and most reliable indicator of market value.  To properly conduct a “blind” test, these benchmark values must be unavailable or “unknown” to the vendors testing their model(s).  AVMetrics provides in excess of a half million test records annually to AVM vendors (without information as to their benchmark values).  The AVM vendors receive the records simultaneously, run these properties through their model(s) and return the predicted value of each property within 48 hours, along with a number of other model-specific outputs.  These outputs are received by AVMetrics, where the results are evaluated against the benchmark values.  A number of controls are used to ensure fairness, including the following:

  • ensuring that each AVM vendor receives the exact same property list (so no model has any advantage)
  • ensuring that each AVM is given the exact same parameters (since many allow input parameters that can affect the final valuation)
  • ensuring through multiple checks that no model had access to the recent sale data, which would provide an unfair advantage
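
To make the “blind” part of the process concrete, here is a minimal Python sketch. It assumes a hypothetical `BenchmarkRecord` type, and it shows two things: the payload sent to vendors omits the sale price, and one possible leak heuristic (the share of estimates that land almost exactly on the sale price). The heuristic and its 0.1% tolerance are illustrative assumptions, not the actual checks AVMetrics runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    property_id: str
    address: str
    sale_price: float  # benchmark value, never sent to vendors


def blind_payload(records):
    """Build the vendor-facing list: identifiers and addresses only."""
    return [{"property_id": r.property_id, "address": r.address} for r in records]


def near_exact_share(records, estimates, tolerance=0.001):
    """Share of returned estimates within `tolerance` (as a fraction) of the sale price.

    `estimates` maps property_id -> estimated value; properties with no
    returned estimate are ignored. An unusually high share may warrant a
    closer look at whether the model saw the sale.
    """
    hits = [r for r in records if r.property_id in estimates]
    if not hits:
        return 0.0
    close = sum(
        abs(estimates[r.property_id] - r.sale_price) <= tolerance * r.sale_price
        for r in hits
    )
    return close / len(hits)
```

Because every vendor receives the output of the same `blind_payload` call, the “exact same property list” control holds by construction in this sketch.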

In addition to quantitative testing, AVMetrics circulates a comprehensive vendor questionnaire twice annually.  Vendors that wish to participate in the testing process complete, for each model being tested, roughly 100 parameter, data, methodology, staffing and internal testing questions.  These responses enable AVMetrics, and more importantly our clients, to understand model differences within both testing and production contexts, and they enable us and our clients to satisfy certain regulatory requirements describing the evaluation and selection of models (see OCC 2010-42).

AVMetrics next performs a variety of statistical analyses on the results, breaking down each individual market, each price range, and each property type, and develops results which characterize each model’s success in terms of precision, usability and accuracy.  AVMetrics analyzes trends at the global, market and individual model levels, identifying where there are strengths and weaknesses, and improvements or declines in performance.
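
A segment-level breakdown of this kind might look like the Python sketch below. The metric definitions are assumptions chosen for illustration, since the text does not define them: the hit rate stands in for usability, the median absolute percentage error for accuracy, and the standard deviation of percentage errors for precision. The field names are hypothetical as well.

```python
from collections import defaultdict
from statistics import median, pstdev


def segment_metrics(rows):
    """Summarize results per (county, property_type) segment.

    Each row is a dict with 'county', 'property_type', 'sale_price' and
    'estimate' (None when the model returned no value).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["county"], row["property_type"])].append(row)

    report = {}
    for segment, members in groups.items():
        # Signed percentage error for every property the model valued.
        errors = [
            (m["estimate"] - m["sale_price"]) / m["sale_price"]
            for m in members
            if m["estimate"] is not None
        ]
        report[segment] = {
            "n": len(members),
            "hit_rate": len(errors) / len(members),
            "median_abs_error": median(abs(e) for e in errors) if errors else None,
            "error_std": pstdev(errors) if errors else None,
        }
    return report
```

Adding a price band to the grouping key would extend the same breakdown to the price ranges mentioned above.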

The last step in the process is for AVMetrics to provide an anonymized comprehensive comparative analysis for each model vendor, showing how their models stack up against all of the models in the test; this invaluable information facilitates the continuous improvement of each vendor’s model offerings.

Cascade vs Model Preference Table – What’s the Difference?

In the AVM world, there is a bit of confusion about what exactly a “cascade” is. It’s time to clear that up.  Over the years, the terms “cascade” and “Model Preference Table”™ have been used interchangeably, but at AVMetrics, we draw an important distinction that the industry would do well to adopt as a standard.

In the beginning, as AVM users contemplated which of several available models to use, they hit on the idea of starting with the preferred model, and if it failed to return a result, trying a second model, and then a third, etc.  This rather obvious sequential logic required a ranking, which was available from testing, and was designed to avoid “value shopping.”[1]  More sophisticated users ranked AVMs across many different niches, starting with geographical regions, typically counties.  Using a table, models were ranked across all regions, providing the necessary tool to allow a progression from primary AVM to secondary AVM and so on.

We use the term “Model Preference Table” for this straightforward ranking of AVMs, which can actually be fairly sophisticated if the models are ranked within niches that include geography, property type and price range.
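
As a rough illustration, a Model Preference Table can be represented as a lookup from niche to an ordered list of models. In the Python sketch below, the county names, model names and the `first_available_value` helper are all hypothetical; the point is only that the table supplies a ranking, and the first model that returns a value wins.

```python
# Hypothetical table: (county, property type) -> models in preference order.
MODEL_PREFERENCE_TABLE = {
    ("Orange", "SFR"): ["model_a", "model_b", "model_c"],
    ("Orange", "Condo"): ["model_b", "model_a"],
}


def first_available_value(county, property_type, results):
    """Return (model, value) for the highest-ranked model that returned a value.

    `results` maps model name -> estimated value, or None if the model
    returned nothing. Returns None when no ranked model has a value.
    """
    for model in MODEL_PREFERENCE_TABLE.get((county, property_type), []):
        value = results.get(model)
        if value is not None:
            return model, value
    return None
```

Note that nothing here inspects the quality of the returned value; the ranking alone decides which estimate is used.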

More sophisticated users realized that just because a model returned a value does not mean that they should use it.  Models typically deliver some measure of confidence in the estimate, in the form of a confidence score, a reliability grade, a “forecasted standard deviation” (FSD) or a similar measure derived through testing processes.  Based on these self-measuring outputs from the model, an AVM result can be accepted, or it can be rejected (based on testing results) in favor of the next AVM in the Model Preference Table.  This application reflects the merger of MPT rankings with decision logic, which in our terminology makes it a “cascade.”
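
In code, the difference from a plain ranking is a single acceptance test. The Python sketch below assumes each model returns a value together with an FSD expressed as a fraction; the function name `simple_cascade` and the `max_fsd` parameter are illustrative, and the right threshold would come from testing.

```python
def simple_cascade(ranked_models, results, max_fsd):
    """Walk the ranking and accept the first value whose FSD is within the limit.

    `ranked_models` is the preference order for the property's niche.
    `results` maps model name -> (value, fsd), or None if no value was returned.
    Returns (model, value), or None if every model is missing or rejected.
    """
    for model in ranked_models:
        result = results.get(model)
        if result is None:
            continue  # no value returned: fall through to the next model
        value, fsd = result
        if fsd <= max_fsd:
            return model, value  # confident enough: accept and stop
        # otherwise reject this estimate and try the next-ranked model
    return None
```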

MPT vs Cascade vs Custom Cascade

The final nuance is between a simple cascade and a “custom” cascade.  The former simply sets across-the-board risk/confidence limits and rejects value estimates when they fail to meet the standard.  For example, the builder of a simple cascade could choose to reject any value estimate with an FSD > 25%.  A “custom cascade” integrates the risk tolerances of the organization into the decision logic.  That might include lower FSD limits in certain regions or above certain property values, or it might reflect changing appetites for risk based on the application, e.g., HELOC lending decisions vs portfolio marketing applications.
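
A custom cascade can be sketched in Python by replacing the single limit with a policy lookup. Apart from the 25% across-the-board example above, every number and name below (the use-case keys, the tighter HELOC limit, the high-value threshold) is a hypothetical placeholder for an organization’s own risk tolerances.

```python
DEFAULT_MAX_FSD = 0.25  # the across-the-board example limit from the text

# Hypothetical risk policy: tighter limits for riskier applications.
FSD_LIMITS = {
    "heloc": 0.13,
    "portfolio_marketing": 0.25,
}
HIGH_VALUE_THRESHOLD = 1_000_000  # hypothetical cutoff for stricter treatment
HIGH_VALUE_MAX_FSD = 0.10


def fsd_limit(use_case, value):
    """Pick the FSD limit for this application and estimated value."""
    limit = FSD_LIMITS.get(use_case, DEFAULT_MAX_FSD)
    if value >= HIGH_VALUE_THRESHOLD:
        limit = min(limit, HIGH_VALUE_MAX_FSD)
    return limit


def custom_cascade(ranked_models, results, use_case):
    """Accept the first ranked value whose FSD meets the policy limit.

    `results` maps model name -> (value, fsd), or None if no value was returned.
    """
    for model in ranked_models:
        result = results.get(model)
        if result is None:
            continue
        value, fsd = result
        if fsd <= fsd_limit(use_case, value):
            return model, value
    return None
```

With the same model outputs, a HELOC decision may fall through to a lower-ranked but more confident model, while a marketing application accepts the first estimate.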

We think that these terms represent significant differences that shouldn’t be ignored or conflated when discussing the application of AVMs.


Lee Kennedy, principal of AVMetrics, founded the company in 2005 and has specialized in collateral valuation, AVM testing and related regulation for over three decades.  Over the years, AVMetrics has guided companies through regulatory challenges, helped them meet their AVM validation requirements, and commented on pending regulations. Lee is an author, speaker and expert witness on the testing and use of AVMs. Lee’s conviction is that independent, rigorous validation is the healthiest way to ensure that models serve their business purposes.

[1] OCC 2005-22 (and the 2010 Interagency Appraisal and Evaluation Guidelines) warn against “value shopping” by advising, “If several different valuation tools or AVMs are used for the same property, the institution should adhere to a policy for selecting the most reliable method, rather than the highest value.”