Q4’s update is remarkable for the amount of change in the map. Every quarter we analyze all the top AVMs and compile the results. This GIF shows the top AVM in each county for each quarter, and as it spools through the quarters, you can see where the top honors change hands.
The main point is how frequently AVM performance changes. That should be no surprise, since market conditions change, and AVMs have different strengths and tendencies. Phoenix has more tract housing, and some AVMs are optimized for that. Cities in the northeast have more row housing, and some models are better there. But AVMs also change – a lot. Whole new models are introduced, and every existing model is constantly being improved as builders add new data feeds and adopt new techniques to get better results. (With respect to new techniques, over at the AVMNews, we curate articles about AVMs, and we highlight several dozen new research articles about AVMs every year.)
Q4 Change Highlights
As ever, if you watch a part of the map, you’ll see several changes. But, in Q4, with markets changing significantly as interest rates rose and then fell, we saw a real upending of the order. Here are some places to watch:
Most of the west coast changed from blue to the orange of Model B, except Orange County, ironically, which is tan for Model H.
Seattle and Portland changed from blue to the Model B orange.
Several upper Rocky Mountain states changed from pink to the green of Model K. (Visually it’s striking, but in terms of population, admittedly less important.)
Almost every county in Utah changed.
A lot of rural Texas changed from gray to the blue of Model A, so those guys took some territory back.
But, Model A also gave away leadership in Chicago and the surrounding counties, which went from blue to orange (Model B) or tan (Model H).
New York was completely shuffled. Surprisingly, the same pattern held in New York City and upstate: counties that were orange changed to blue (Model A got some more back), and those that were green or blue changed to orange or tan.
All the counties around Washington D.C. went from blue to orange (Model B wins again).
Just west of that, in West Virginia, everything changed from blue to the Kelly green of Model AA.
Things change – a lot. Don’t rely on the results from last year or earlier this year. Heck, you can’t even trust last quarter! We compile these results quarterly, but our testing is non-stop, and we can produce new optimizations monthly based on a rolling 3 months or any other time period. Often, 3 months of data are required to get a large enough sample in smaller regions, but we can slice it every way imaginable.
Use more than one AVM. It’s not obvious from a map showing just one AVM in each county, but if you think about what’s going on to produce these results, you’ll realize that AVMs have different strengths and there are a lot of them climbing all over each other to get to the top of the ranking. So, when you’re valuing a particular property, you just don’t know if it will be a good candidate for even the best AVM. When that AVM produces a result with low confidence, there’s a very good chance that another AVM will produce a reasonable estimate. Why not be able to take three bites at the apple?
We’ve got the update for Q3 2022. Our top AVM GIF shows the #1 AVM in each county going back 8 quarters. This graphic demonstrates why we never recommend using a single AVM. Again, there are 19 AVMs in the most recent quarter that are “tops” in at least one county!
The expert approach is to use a Model Preference Table® to identify the best AVM in each region. (Actually, our MPT® typically identifies the top 3 AVMs in each county.) Or, you could use a cascade to tap into the best AVM for whatever your application.
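For illustration, a cascade along these lines can be sketched in a few lines of Python. Everything here is hypothetical – the model names, the confidence cutoff, and the table structure are illustrative assumptions, not our MPT® implementation:

```python
# Hypothetical AVM cascade: try each county's ranked models in order and
# accept the first estimate whose confidence score clears a threshold.

def run_cascade(county, property_id, preference_table, avms, min_confidence=0.90):
    """Return (model_name, value) from the first acceptable AVM, or None.

    preference_table maps county -> ranked list of model names.
    avms maps model name -> callable returning (value, confidence) or None.
    """
    for model_name in preference_table.get(county, []):
        estimate = avms[model_name](property_id)
        if estimate is not None:
            value, confidence = estimate
            if confidence >= min_confidence:
                return model_name, value
    return None  # no model was confident enough; fall back to another valuation method
```

The point of the sketch: when the first-choice model returns a low-confidence value, the cascade automatically takes the next "bite at the apple" instead of forcing a bad estimate on the user.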
This time, the Seattle area and the Los Angeles region stayed light blue, just like the previous quarter. But, most of the populous counties in Northern California changed hands. Sacramento was the exception, but Santa Clara, Alameda, Contra Costa, San Mateo and some smaller counties like Calaveras (which means “skulls”) changed sweaters. Together they account for 6 million northern Californians who just got a new champion AVM.
A number of rural states changed hands almost completely… again. New Mexico, Wyoming, North Dakota, South Dakota, Montana and Nebraska as well as Arkansas, Mississippi, Alabama and rural Georgia crowned different champions for most counties. I could go on.
All that goes to show the importance of using multiple AVMs and getting intelligence on how accurate and precise each AVM is.
We’ve got the update for Q2 2022. Our top AVM GIF shows the #1 AVM in each county going back 8 quarters. This graphic demonstrates why we never recommend using a single AVM. There are 19 AVMs in the most recent quarter that are “tops” in at least one county (one more than in Q1)!
The expert approach is to use a Model Preference Table® to identify the best AVM in each region. (Actually, our MPT® typically identifies the top 3 AVMs in each county.)
One great example is the Seattle area. Over the last two years, you would need seven AVMs to cover the 5 most populous counties of the Seattle environs with the best AVM. What’s more, the King County top spot has been held by 3 different AVMs.
A number of rural states changed hands almost completely. New Mexico, Wyoming, North Dakota, South Dakota, Montana and Kansas crowned different champions for most counties.
All that goes to show the importance of using multiple AVMs and getting intelligence on how accurate and precise each AVM is.
The administration was encouraging more use of AVMs (e.g., via hybrids), and tempering that with calls for close monitoring of AVMs.
The de minimis threshold change foreshadowed an increase in reliance on AVMs in some lower value mortgages.
The Appraisal Subcommittee summit was focused on standardization across agencies and alternative valuation products, namely, AVMs. Conversation focused on quality and risk as well as speed.
We saw those trends pointing to increased AVM use balanced by a focus on risk, quality and efficiency.
Sure enough, the following events unfolded:
The de minimis threshold was indeed raised, right before the pandemic changed everything.
The appraisal business was turned upside down for a period during the pandemic.
Property Inspection Waivers (PIWs) took off in a big way as Fannie and Freddie skipped appraisals on a huge percentage of their originations (up to 40% at times).
Halt! About Face!
And then the new administration changed the focus entirely. No longer were the conversations about speed, efficiency, quality, risk and appraisers being focused on their highest and best use. Instead, conversations focused on bias.
Fannie produced a report on bias in appraisals. CFPB began moving on new AVM guidelines and proposed using the “fifth factor” to measure Fair Lending implications for AVMs. Congress held committee hearings on AVM bias.
The Task Force made specific recommendations, but first it helped educate regulators about the AVM industry.
One specific recommendation was to consider certification for AVMs. Another was to use the same USPAP framework for the oversight of AVMs as is used for the oversight of appraisals. It’s all laid out in the AVM Task Force Report.
Taking It All In
Our assessment three years ago was eerily accurate for the subsequent two years. Even the unexpected pandemic generally moved things in the direction that we were pointing to: increased use of AVMs through hybrids.
What we failed to anticipate back then was a complete change in direction with the new administration, and maybe that’s to be expected. It’s hard to see around the corner to a new administration, with new personnel, priorities and policy objectives.
The Task Force Report provides some very practical direction for regulations. But the recent emphasis on fair lending, which emerged after the Task Force began meeting and forming its recommendations, could influence the direction of things. The end result is a combination of more clarity and, at the same time, new uncertainty.
We’ve updated our Top AVM GIF showing the #1 AVM in each county going back 8 quarters. This graphic demonstrates why we never recommend using a single AVM. There are 18 AVMs in the most recent quarter that are “tops” in at least one county!
The expert approach is to use a Model Preference Table to identify the best AVM in each region. (Actually, our MPT® typically identifies the top 3 AVMs in each county.)
Take the Seattle area for example. Over the last two years, you would almost always need two or three AVMs to cover the 5 most populous counties of the Seattle environs with the best AVM. However, it’s not always the same two or three. There are four of them that cycle through the top spots.
Texas is dominated by either Model A, Model P or Model Q. But that domination is really just a reflection of the vast areas of sparsely inhabited counties. The densely populated counties in the triangle from Dallas south along I-35 to San Antonio and then east along I-10 to Houston cycle through different colors every quarter. The bottom line is that no single model stays best in Texas for more than a quarter, and typically, it would require four or five models to cover the populous counties effectively.
Testing an AVM’s accuracy can actually be quite tricky. You might think that you simply compare an AVM valuation to a corresponding actual sales price – technically a fair sale on the open market – but that’s just the beginning. Here’s why it’s hard:
You need to get those matching values and benchmark sales in large quantities – like hundreds of thousands – if you want to cover the whole nation and be able to test different price ranges and property types (AVMetrics compiled close to 4 million valid benchmarks in 2021).
You need to scrub out foreclosure sales and other bad benchmarks.
And perhaps most difficult, you need to test the AVMs’ valuations BEFORE the corresponding benchmark sale is made public. If you don’t, then the AVM builders, whose business is up-to-date data, will incorporate that price information into their models and essentially invalidate the test. (You can’t really have a test where the subject knows the answer ahead of time.)
Here’s a secret about that third part: some of the AVM builders are also the same companies that are the premier providers of real estate data, including MLS data. What if the models are using MLS listing-price feeds to “anchor” their estimates to the listing price of a home? If they are the source of the data, how can you test them before they get the data? We now know how.
We have spent years developing and implementing a solution because we wanted to level the playing field for every AVM builder and model. We ask each AVM to value every home in America each month. They each provide roughly 110 million AVM valuations each month. There are over 25 different commercially available AVMs that we test regularly. That adds up to a lot of data.
A few years ago, it wouldn’t have been feasible to accumulate data at that scale. But now that computing and storage costs make it feasible, the AVM builders themselves are enthusiastic about it. They like the idea of a fair and square competition. We now have valuations for every property BEFORE it’s sold, and in fact, before it’s listed.
As we have for well over a decade now, we gather actual sales to use as the benchmarks against which to measure the accuracy of the AVMs. We scrub these actual sales prices to ensure that they are for arm’s-length transactions between willing buyers and sellers — the best and most reliable indicator of market value. Then we use proprietary algorithms to match benchmark values to the most recent usable AVM estimated value. Using our massive database, we ensure that each model has the same opportunity to predict the sales price of each benchmark.
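The matching step can be illustrated with a simplified sketch. Our actual algorithms are proprietary; the field names and the "usable" rule below are illustrative assumptions. The key idea is that each scrubbed sale is paired with the most recent valuation recorded before the property was listed, so no model can have seen the eventual price:

```python
# Illustrative benchmark matching: pair each arm's-length sale with the
# most recent AVM valuation recorded BEFORE the property was listed.

def match_benchmark(sale, valuations):
    """sale: dict with 'property_id' and 'listing_date' (ISO date strings).
    valuations: list of dicts with 'property_id', 'date', 'value'.
    Returns the latest usable valuation for that property, or None."""
    usable = [
        v for v in valuations
        if v["property_id"] == sale["property_id"]
        and v["date"] < sale["listing_date"]       # ISO strings compare chronologically
    ]
    return max(usable, key=lambda v: v["date"]) if usable else None
```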
AVMetrics next performs a variety of statistical analyses on the results, breaking down each individual market, each price range, and each property type, and develops results which characterize each model’s success in terms of precision, usability, error and accuracy. AVMetrics analyzes trends at the global, market and individual model levels. We also identify where there are strengths and weaknesses and where performance improved or declined.
In the spirit of continuous improvement, AVMetrics provides each model builder an anonymized comprehensive comparative analysis showing where their models stack up against all of the models in the test; this invaluable information facilitates their ongoing efforts to improve their models.
Finally, in addition to quantitative testing, AVMetrics circulates a comprehensive vendor questionnaire semi-annually. Vendors that wish to participate in the testing process answer roughly 100 parameter, data, methodology, staffing and internal testing questions for each model being tested. These enable AVMetrics and our clients to understand model differences within both testing and production contexts. The questionnaire also enables us and our clients to satisfy certain regulatory requirements describing the evaluation and selection of models (see OCC 2010-42 and 2011-12).
Appraisals are the gold standard when it comes to valuing residential real estate, but they aren’t always necessary. They’re expensive and time-consuming, and in the era of COVID-19, they’re inconvenient. What’s the alternative?
Well, Fannie and Freddie implemented a “Property Inspection Waiver” (PIW) alternative more than a decade ago. However, it’s been slow to catch on.
But now, maybe the tipping point has arrived during the pandemic. Recently published data by Fannie and Freddie show approximately 33% of properties were valued without a traditional appraisal! (Most, if not all, would have used an AVM as part of the appraisal waiver process.) Ed Pinto at AEI’s Housing Center calls it a hockey stick.
So, what changed? Here are some thoughts and hypotheses:
Guidelines changed a little. We can see in the data that Freddie did almost zero PIWs on cash-out loans, but in May that changed, and at least for LTVs below 70%, they did almost 15,000 cash-out loans with no appraisal.
AVMs changed. Back when PIWs were introduced, AVMs operated in a +/- 10% paradigm. They were more concerned with hit rates than anything else, and they worked best on tract homes. But, today they are operating in a +/- 4% world, hit rates are great, and cascades allow lenders to pick the AVM that’s most accurate for the application.
Borrowers changed. These days, borrowers have grown up with online tools that give them answers. They are more likely to read about their symptoms on WebMD before going to the doctor, and they are more likely to look their home up on Zillow before calling their realtor. In the past, if a home was purchased with a low LTV, who was it that required an appraisal? Typically, it was the borrowers who wanted the appraisal – more as a security blanket than anything else. They wanted reassurance that they were not getting ripped off. Today, for some people, Zillow can provide that reassurance without the $500 expense.
Lenders changed. You would think that they are nimble and adaptable to new opportunities. But where the rubber meets the road, it’s still people talking to customers and underwriters signing off on loans. If loan officers aren’t aware of the guidelines, they’ll just order an appraisal. Because an appraisal can take so long, ordering one is often among the first things done in the process, regardless of whether it’s necessary. After all, it’s usually necessary, and it takes SO long (relatively speaking, of course). I have known lenders who required their loan officers to collect money for an appraisal to demonstrate customer commitment. But, lenders are starting to incorporate PIWs into their processes and take advantage of those opportunities to present a loan option with $500 less in costs.
Accurate AVMs are a necessary but not sufficient criterion for PIWs, and now that AVMs are much more accurate, PIWs are much more practical, and we’re seeing much higher adoption.
So now what should we expect going forward? The trend will likely continue. There’s a lot of room left in some of those categories for PIWs to grab a larger share.
If agencies are doing it, everyone else will. If there are lenders not using PIWs to the extent possible, they are going to be at a disadvantage.
AVMs are not only fairly accurate, they are also affordable and easy to use. Unfortunately, using them in a “compliant” fashion is not as easy. Regulatory Bulletins OCC 2010-42 and OCC 2011-12 describe a lot of requirements that can be challenging for a regional or community institution:
ongoing independent testing and validation and documentation of testing;
understanding each AVM model’s conceptual and methodological soundness;
documenting policies and procedures that define how to use AVMs and when not to use AVMs;
establishing targets for accuracy and tolerances for acceptable discrepancies.
The extent to which these requirements are applied by your regulator is most likely proportional to the extent to which AVMs are used within your organization; if AVMs are used extensively, regulatory oversight will likely demand much tighter adherence to the requirements as well as much more comprehensive policies and procedures.
Although compliance itself is not a function that can be outsourced (it is the sole responsibility of the institution), elements of the regulatory requirements can be effectively handled outside the organization through outsourcing. As an example, the first bullet point, “ongoing independent testing and validation and documentation of testing,” requires resources with the competence and influence to effectively challenge AVM models. In addition, the “independent” aspect is challenging to accomplish unless a separate department is established within the institution that does not report up through the product and/or procurement verticals (e.g., Audit or Model Risk Management). Whether your institution is a heavy AVM user or not, the good news is that finding the right third party to outsource to will facilitate all of the bullet points above:
documentation is included as part of an independent testing and validation process and it can be incorporated into your policies and procedures;
the results of the testing will help you shape your understanding of where and when AVMs can and cannot be used;
the results of the testing will inform your decisions regarding the accuracy and performance thresholds that fit within your institution’s risk appetite. In addition,
an outsourced specialist may also be able to provide various levels of consultation assistance in areas where you may not have the internal expertise.
Before deciding whether outsourcing makes sense for you, here are some potential considerations. If you can answer “no” to all of these questions, then outsourcing might be a good option, especially if you don’t have an independent Analytics unit in-house that has the resource bandwidth to accommodate the AVM testing and validation processes:
Is this process strategically critical? I.e., does your validation of AVMs benefit you competitively in a tangible way?
If your validation of AVMs is inadequate, can this substantially affect your reputation or your position within the marketplace?
Is outsourcing impractical for any reason? I.e., are there other business functions that preclude separating the validation process?
Does your institution have the same data availability and economies of scale as a specialist?
The Way Forward
Here are some suggestions on how to go about preparing yourself for selecting your outsource partner:
Specify what you need outsourced. If you already have Policies and Procedures documented and processes in place, there may be no need to look for that capability, but there will necessarily still be the need to incorporate any testing and validation results into your existing policies and procedures. If you have previously done extensive evaluations of the AVMs that you use, in terms of their models’ conceptual soundness and outcomes analysis, there’s no need to contract for that, either. See our article on Regulatory Oversight to get some ideas about those requirements.
Identify possible partners, such as AVMetrics, and evaluate their fit. Here’s what to look for:
Expertise. It’s a technical job, requiring a fair amount of analysis and a tremendous amount of knowledge about regulatory requirements in general, and specifically knowledge relative to AVMs; check the résumés of the experts with whom you plan to partner.
Independence. A vendor who also sells, builds, resells, uses or advocates for certain AVMs may be biased (or may appear to be biased) in auditing them; validation must be able to “effectively challenge” the models being tested.
Track record. Stable partners are better, and a long term relationship lowers the cost of outsourcing; so look for a partner with a successful track record in performing AVM validations.
Open up conversations with potential partners early because the process can take months, particularly if policies and procedures need to be developed; although validations can be successfully completed in a matter of days, that is not the norm.
Make sure your staff has enough familiarity with the regulatory requirements so as to be able to oversee the vendor’s work; remember that the responsibility for compliance is ultimately on you. Make sure the vendor’s process and results are clearly and comprehensively documented and then ensure that Internal Audit and Compliance are part of that oversight. “Outsource” doesn’t mean “forget about it;” thorough and complete understanding and documentation is part of the requirements.
Have a plan for ongoing compliance, whether it is to transition to internal resources or to retain vendors indefinitely. Set expectations for the frequency of the validation process, which regulations require at least annually, or more often commensurate with the extent of your AVM usage.
AVM testing and validation is only one component in your overall valuation and evaluation program. Unlike appraisals and some other forms of collateral valuation, AVMs, by their nature as quantitative predictive models, lend themselves to just the type of statistically based outcomes analysis the regulators set forth. Recognizing this, elements of the requirements can be outsourced, but outsourcing must be a complement to enterprise-wide policies and practices around the permissible, safe and prudent use of valuation tools and technologies.
The process of validating and documenting AVMs may seem daunting at first, but for the past 10 years AVMetrics has been providing peace of mind for our customers, whether as the sole source of an outsourced testing and validation process (one that tests every commercial AVM four times a year), or as a partner in transitioning the process in-house. Our experience, professional resources and depth of data have enabled us to standardize much of the processing while still providing the customization every institution needs. And probably one of the most critical boxes you can check off when outsourcing with AVMetrics is the very large one that requires independence. It also bears mentioning that, having been around as long as we have, our customers have generally all been through at least one round of regulatory scrutiny, and the AVMetrics process has always passed regulatory muster. Regulatory reviews already present enough of a challenge, so having a partner with established credentials is critical for a smooth process.
Hit Rate is a key metric that AVM users care about. After all, if the AVM doesn’t provide a valuation, what’s the point? But savvy users understand that not all hits are created equal. In fact, they might be better off without some of those “hits.”
Every AVM builder provides a “confidence score” along with each valuation. Users often don’t know how much confidence to put in the confidence score, so we did some analysis to clarify just how much confidence is warranted.
In the first quarter of 2020, we grouped hundreds of thousands of AVM valuations from five AVMs by their confidence score ranges. For convenience’s sake, we grouped them into “high,” “medium,” “low” and “fuhgeddaboutit” (aka, “not rated”). And, we analyzed the AVMs’ performance against benchmarks in the same time periods. What we found won’t surprise anyone at first glance:
Better confidence scores were highly correlated with better AVM performance.
The lower two tiers were not even worth using.
The majority of valuations are in the top one or two tiers.
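Using the score ranges described in the footnotes (under 70 "not rated," 70 to 80 "low," 80 to 90 "medium," 90 and up "high"), the grouping itself is a simple function; the exact handling of the boundary values is our assumption:

```python
def confidence_bucket(score):
    """Map a 0-100 confidence score to the buckets used in this analysis.
    Boundary scores are assigned to the higher bucket (an assumption)."""
    if score >= 90:
        return "high"
    if score >= 80:
        return "medium"
    if score >= 70:
        return "low"
    return "not rated"
```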
However, consider that unsophisticated users might simply use a valuation returned by an AVM regardless of the confidence score. One rationale is that any value estimate is better than nothing, and this is the valuation that is available. Other users may not know how seriously to take the “confidence score;” they may figure that the AVM supplier is simply hedging a bit more on this valuation.
Figure 1 shows the correlation for Model #4 in our test between the predicted price and the actual sales price for each group of model-supplied confidence scores. As you can see, as the confidence score goes up so does the correlation of the model and the accuracy of the prediction as evidenced by the drop in the Average Variance.
Table 1 lays out 4 key performance metrics for AVMs. They demonstrate markedly different performance for different confidence score buckets. For example, the “high” confidence score bucket for Model 1 performs significantly better in every metric than the other buckets, and what’s more, that confidence bucket makes up 80% of the AVM valuations returned by Model 1.
Avg Variance of 0.7% shows valuations that center very near the benchmarks, whereas lower confidence scores show a strong tendency to overvalue by 4-7%.
PPE10 of 90% means that 90% of “high” confidence score valuations are within +/- 10%. Other confidence buckets range from 67% to even below 50%.
PPE>20 measures excessive overvaluations (greater than 20%), which can create very high-risk situations for lenders. In the “high” confidence bucket, they are almost nonexistent at 1.8%, but in other buckets they are 13%, 28% or even 31.6%.
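The metrics above are easy to compute from matched pairs of AVM values and benchmark sale prices. This is a minimal sketch, assuming "variance" means the signed percentage error, (AVM value − sale price) / sale price:

```python
def avm_metrics(pairs):
    """pairs: list of (avm_value, sale_price) tuples.
    Returns average variance, PPE10, and PPE>20 as fractions."""
    errors = [(value - price) / price for value, price in pairs]
    n = len(errors)
    return {
        "avg_variance": sum(errors) / n,                    # signed mean error
        "ppe10": sum(abs(e) <= 0.10 for e in errors) / n,   # share within +/- 10%
        "ppe_over_20": sum(e > 0.20 for e in errors) / n,   # share overvalued by > 20%
    }
```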
This last metric mentioned is instructive. Model 1 is a very-high-performing AVM. However, in a certain small segment (about 3%), acknowledged by very low confidence scores, the model has a tendency to over-value properties by 20% or more almost one-third of the time.
The good news is that the model warns users of the diminished accuracy of certain estimates, but it’s up to the user to realize when to disregard those valuations. A close look at the table shows that with different models, there are different cut-offs that might be appropriate. Not every user’s risk appetite is the same, but we’ve highlighted certain buckets that might be deemed acceptable.
Model 2 and Model 5, for example, have very different profiles. Whereas Model 1 produced a majority of valuations with a “high” confidence level, Model 2 and Model 5 put very few valuations into that category. “Confidence scores” don’t have a fixed method of calculation that is standardized between Models. It’s possible that Model 2 and Model 5 use their labels more conservatively. That’s one more reason that users should test the models that they use and not simply expect them to perform similarly and use labels consistently.
That leads into a third conclusion that leaps out of this analysis. There’s a huge advantage to having access to multiple models and the ability to pick and choose between them. It’s not immediately apparent from this analysis, but these models are not all valuing the same properties with “high” confidence (this will be analyzed in two follow-up papers in this series). Model 4 is our top-ranked model overall. However, as shown in Table 2, there are tens of thousands of benchmarks that Model 4 valued with only “medium” or “low” or even “not rated” confidence but for which Model 1 had “high” confidence valuations.
Different models have strengths in different geographic areas, with different property types or even in different price ranges. The ideal situation is to have several layers of backups, so that if your #1 model struggles with a property and produces a “low” confidence valuation, you have the ability to turn to a second or third model to see if they have a better estimate. This last point is the purpose of Model Preference Tables®. They specify which model ranks first, second, and third across every geography, property type and price tranche. And, users may find that some models are only valuable as a second or third choice in some regions, but by adding them to the panel, the user can avoid that dismal dilemma: “Do I use this valuation that I expect is awful – what other choice do I have?”
 We grouped valuations as follows: <70% were considered “not rated,” 70-80% were considered “low,” 80-90% “medium,” and 90+ “high.”
 In fact, this isn’t wrong in some cases. For example, in the case of Model 2, the “medium” and “high” confidence valuations don’t differ significantly.
The correlation coefficient, which indicates the strength of the linear relationship between two variables, can be found using the following formula:

r_xy = Σ(x_i − x̄)(y_i − ȳ) / √( Σ(x_i − x̄)² × Σ(y_i − ȳ)² )

where r_xy is the correlation coefficient of the linear relationship between the variables x and y, x̄ and ȳ are the means of x and y, and the sums run over all observations.
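Computed directly from paired data (for example, AVM values and sale prices), the coefficient can be implemented in a few lines:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)
```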