Category: Articles

The Wild, Wild West of Automated Valuations

Recently the OCC, FDIC and the Federal Reserve proposed raising the de minimis threshold for residential properties below which appraisals are not required to complete a home loan. Currently, most homes transacting at $250K and above require an appraisal, but Federal regulators propose to raise that level to $400K. A November 30th Wall Street Journal article raises some interesting issues about the topic. It reported that the number of appraisers is down 21% since the housing crisis, but more homes require an appraiser, since more and more homes exceed the threshold each year. The article also states that these factors open the door for cheaper, faster and “largely untested” property valuations based on computer algorithms, also known as Automated Valuation Models (AVMs).

At AVMetrics, we have been continuously testing AVMs for over 15 years, so we’ve seen how they’ve performed over time. As an example, the accompanying chart shows model accuracy as measured by mean absolute error, a statistical metric of valuation error.  We use many statistical measures to evaluate model accuracy and precision, and they all show significant improvement in AVMs over time. As these automated tools get better and the workforce of appraisers continues to shrink, the FFIEC members’ proposed change seems warranted, but that doesn’t mean the proposal lacks critics.

Mean Absolute Error of all tested AVM models for the last 10 years
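For readers who want to see the metric concretely, here is a minimal sketch of a mean absolute error calculation. It assumes the error is expressed as a percentage of the benchmark sale price (the chart does not say whether dollars or percentages are used), and the function name and sample figures are hypothetical, not AVMetrics’ own code or data.

```python
def mean_absolute_error_pct(estimates, sale_prices):
    """Average of |estimate - sale price| / sale price over paired records."""
    if len(estimates) != len(sale_prices) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(est - price) / price
              for est, price in zip(estimates, sale_prices)]
    return sum(errors) / len(errors)


# Hypothetical AVM estimates and the matching benchmark sale prices.
avm_values = [310_000, 190_000, 505_000]
sale_prices = [300_000, 200_000, 500_000]

mae = mean_absolute_error_pct(avm_values, sale_prices)
print(f"Mean absolute error: {mae:.1%}")
```

Because the absolute value is taken before averaging, overvaluations and undervaluations cannot cancel each other out, which is why the measure is useful for tracking accuracy over time.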

Ratish Bansal of Appraisal Inc was quoted in The Journal describing the state of AVMs as “a wild, wild West,” inviting, “abuse of all kind.” Furthermore, he contrasts that with the voluminous regulatory standards covering the use of appraisals.

We note that many of those voluminous standards represent nearly the same quality control that was in place before the Credit Crisis.  In other words, appraisals are not a guarantee against collateral risk.  They are simply one tool in the toolbox – an effective, but comparatively time consuming and expensive tool. Also of note, far from being the “wild, wild west,” AVMs are also governed by regulators, most notably through Appendix B of the Appraisal and Evaluation Guidelines (OCC 2010-42) and Model Risk Management guidance (OCC 2011-12). These regulatory guidelines require that AVM developers be qualified, users of AVMs use robust controls, incentives be appropriate, and models be tested regularly and thoroughly with out-of-sample benchmarks. They require documentation of risk assessments and stipulate that a Board of Directors must oversee the use of all models. In other words, if AVMs were “the wild, wild west,” they would be rooted in a town under the oversight of the legendary Wyatt Earp.

My strong feeling is that appraisals should not be a sole and exclusive tool when evaluations can be effectively employed in appropriate, lower-risk scenarios. Appraisers are a valuable and limited resource, and they should be employed at (to use appraisal terminology) their highest and best use.  Trying to be a “manual AVM” is not the highest and best use of a highly qualified appraiser.  Their expertise should be focused on the qualitative aspects of property valuation such as the property condition and market and locational influences. They should also be focused on performing complex valuation assignments in non-homogeneous markets.  AVMs do not capture and analyze the qualitative aspects of a property very well, and they still stumble in markets with highly diverse house stock or houses with less quantifiable attributes such as view properties.

However, several companies are developing ways of merging the robust data processing capabilities of an AVM with the qualitative assessment skills of appraisers.  Today, these products typically use an AVM at their core and then satisfy additionally required evaluation criteria (physical property condition, market and location influences) with an additional service.  For example, a lender can wrap a Property Condition Report (PCR) around the AVM and reconcile that data in support of a lending decision.  This type of “Hybrid valuation” is on the track we’re headed down.  Many companies have already created these types of products for commercial and proprietary use.

We at AVMetrics believe in using the right tool for the job, and we believe there is a place for automated valuations in prudent lending practices. We think the smarter approach would be to marginally raise the de minimis threshold, but simultaneously to provide additional guidance for considering other aspects of a lending decision, specifically, collateral considerations and eligibility criteria for appraisal exemptions such as neighborhood homogeneity, property conformity, market conditions and more.

Cascade vs Model Preference Table® – What’s the Difference?

In the AVM world, there is a bit of confusion about what exactly is a “cascade.” It’s time to clear that up.  Over the years, the terms “cascade” and “Model Preference Table®” have been used interchangeably, but at AVMetrics, we draw an important distinction that the industry would do well to adopt as a standard.

In the beginning, as AVM users contemplated which of several available models to use, they hit on the idea of starting with the preferred model, and if it failed to return a result, trying a second model, and then a third, etc.  This rather obvious sequential logic required a ranking, which was available from testing, and was designed to avoid “value shopping.”[1]  More sophisticated users ranked AVMs across many different niches, starting with geographical regions, typically counties.  Using a table, models were ranked across all regions, providing the necessary tool to allow a progression from primary AVM to secondary AVM and so on.

We use the term “Model Preference Table” for this straightforward ranking of AVMs, which can actually be fairly sophisticated if they are ranked within niches that include geography, property type and price range.
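A Model Preference Table as described here can be sketched as a simple lookup from a niche to a ranked list of models. The niche keys, model names and values below are hypothetical placeholders, and the sketch assumes a model signals “no result” by returning nothing.

```python
# Hypothetical rankings keyed by (county, property type, price band).
MODEL_PREFERENCE_TABLE = {
    ("Orange", "SFR", "under_500k"): ["Model A", "Model C", "Model B"],
    ("Orange", "Condo", "under_500k"): ["Model B", "Model A"],
}


def first_available_value(niche, returned_values):
    """Walk the ranking for a niche; return the first model that gave a value."""
    for model in MODEL_PREFERENCE_TABLE.get(niche, []):
        value = returned_values.get(model)
        if value is not None:
            return model, value
    return None  # no ranked model returned a value


# Model A failed to return a result, so the table falls through to Model C.
returned = {"Model A": None, "Model C": 412_000, "Model B": 405_000}
result = first_available_value(("Orange", "SFR", "under_500k"), returned)
print(result)
```

Note that the selection depends only on the pre-established ranking, never on which value is highest, which is the point of avoiding “value shopping.”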

More sophisticated users realized that just because a model returned a value does not mean that they should use it.  Models typically deliver some form of confidence in the estimate, either in the form of a confidence score, reliability grade, a “forecasted standard deviation” (FSD) or similar measure derived through testing processes.  Based on these self-measuring outputs from the model, an AVM result can be accepted or rejected (based on testing results) in favor of the next AVM in the Model Preference Table.  This application reflects the merger of MPT rankings with decision logic, which in our terminology makes it a “cascade.”
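The difference from a plain Model Preference Table can be sketched in a few lines: a cascade adds an accept/reject test on the model’s own confidence output. This is an illustrative sketch only; the class, the field names and the 15% FSD limit are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AvmResult:
    model: str
    value: Optional[float]  # None when the model returns no estimate
    fsd: Optional[float]    # forecasted standard deviation, 0.12 means 12%


def run_cascade(ranked_results, max_fsd):
    """Return the first ranked result with a value and an FSD within the limit."""
    for result in ranked_results:
        if result.value is None or result.fsd is None:
            continue  # no usable output, try the next model in the ranking
        if result.fsd <= max_fsd:
            return result
    return None  # every model was missing or rejected


# Results listed in Model Preference Table order (hypothetical figures).
ranked = [
    AvmResult("Model A", 420_000, 0.31),
    AvmResult("Model C", None, None),
    AvmResult("Model B", 405_000, 0.09),
]
accepted = run_cascade(ranked, max_fsd=0.15)
print(accepted)
```

A bare Model Preference Table would have used Model A simply because it returned a value; the cascade rejects it on confidence and moves on to Model B.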

Criteria                         | AVM | MPT® | Cascade | “Custom” Cascade
Value Estimate                   |  X  |  X   |    X    |        X
AVM Ranking                      |     |  X   |    X    |        X
Logic + Ranking                  |     |      |    X    |        X
Risk Tolerance + Logic + Ranking |     |      |         |        X

 

The final nuance is between a simple cascade and a “custom” cascade.  The former simply sets across-the-board risk/confidence limits and rejects value estimates when they fail to meet the standard.  For example, the builder of a simple cascade could choose to reject any value estimate with an FSD > 25%.  A “custom cascade” integrates the risk tolerances of the organization into the decision logic.  That might include lower FSD limits in certain regions or above certain property values, or it might reflect changing appetites for risk based on the application, e.g., HELOC lending decisions vs portfolio marketing applications.
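To illustrate the distinction, a custom cascade can be sketched as a cascade whose FSD limit is computed from the organization’s risk policy instead of being fixed. The 25% default comes from the example above; the tighter limits, the use-case labels and the value cutoff are hypothetical.

```python
DEFAULT_MAX_FSD = 0.25  # across-the-board limit from the simple-cascade example


def max_fsd_for(use_case, estimated_value):
    """Hypothetical risk policy: tighter limits for lending and high values."""
    limit = DEFAULT_MAX_FSD
    if use_case == "heloc":
        limit = min(limit, 0.13)
    if estimated_value > 750_000:
        limit = min(limit, 0.10)
    return limit


def run_custom_cascade(ranked_results, use_case):
    """ranked_results: (model, value, fsd) tuples in Model Preference Table order."""
    for model, value, fsd in ranked_results:
        if value is None or fsd is None:
            continue
        if fsd <= max_fsd_for(use_case, value):
            return model, value
    return None


ranked = [("Model A", 420_000, 0.20), ("Model B", 405_000, 0.09)]
marketing_pick = run_custom_cascade(ranked, "portfolio_marketing")
heloc_pick = run_custom_cascade(ranked, "heloc")
print(marketing_pick, heloc_pick)
```

The same ranked results yield Model A for a marketing application and Model B for a HELOC decision, because the acceptable FSD changes with the organization’s appetite for risk.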

We think that these terms represent significant differences that shouldn’t be ignored or conflated when discussing the application of AVMs.

 

Lee Kennedy, principal and founder of AVMetrics in 2005, has specialized in collateral valuation, AVM testing and related regulation for over three decades.  Over the years, AVMetrics has guided companies through regulatory challenges, helped them meet their AVM validation requirements, and commented on pending regulations. Lee is an author, speaker and expert witness on the testing and use of AVMs. Lee’s conviction is that independent, rigorous validation is the healthiest way to ensure that models serve their business purposes.

[1] OCC 2005-22 (and the 2010 Interagency Appraisal and Evaluation Guidelines) warn against “value shopping” by advising, “If several different valuation tools or AVMs are used for the same property, the institution should adhere to a policy for selecting the most reliable method, rather than the highest value.”

How AVMetrics Tests AVMs

Testing an AVM’s accuracy can actually be quite tricky.  It is easy to get an AVM estimate of value, and you can certainly accept that a fair sale on the open market is the benchmark against which to compare the AVM estimate, but that is really just the starting point.

There are four keys to fair and effective AVM testing, and applying all four can be challenging for many organizations.

  1. Your raw data must be cleaned up to ensure that there aren’t any “unusable” or “discrepant” characters in the data; variants such as “No.”, “#” and “Num” must be normalized.
  2. Once your test data is “scrubbed clean,” it must be assembled in a universal format, and it must be large enough to provide reliable test results even down to the segment level (each property type within each price level within each county, etc.). This might require hundreds of thousands of records.
  3. Timing must be managed so that each model receives the same sample data at the same time with the same response deadline.
  4. Last, and most difficult, the benchmark sales data must not be available to the models being tested.  In other words, if the model has access to the very recent sales price, it will be able to provide a near-perfect estimate by simply estimating that the value hasn’t changed (or changed very little) in the days or weeks since the sale. 
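As an illustration of the first key, the sketch below normalizes the “No.”, “#” and “Num” variants into a single token. It assumes these variants appear as unit designators in address strings and that “UNIT” is the chosen canonical form; both are assumptions for the example, not a description of AVMetrics’ actual cleaning rules.

```python
import re

# Matches "No", "No.", "Num", "Num." or "#" when followed by a number.
_UNIT_PATTERN = re.compile(r"(?:\bNo\.?|\bNum\.?|#)\s*(?=\d)", re.IGNORECASE)


def normalize_address(raw):
    """Rewrite unit-designator variants as 'UNIT ', tidy spaces, upper-case."""
    text = _UNIT_PATTERN.sub("UNIT ", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.upper()


samples = ["123 Main St No. 4", "123 Main St #4", "123  main st Num 4"]
normalized = [normalize_address(s) for s in samples]
print(normalized)
```

All three spellings collapse to the same string, so the record matches itself across data sources. The lookahead for a digit keeps words such as “North” from being rewritten.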

AVMetrics tests every commercially available AVM continuously and aggregates this testing into a report quarterly; AVMetrics’ testing process meets these criteria and many more, providing a truly objective measure of AVM performance. 

The process starts with the identification of an appropriate sample of properties for which benchmark values have very recently been established.  These are the actual sales prices for arm’s-length transactions between willing buyers and sellers—the best and most reliable indicator of market value.  To properly conduct a “blind” test, these benchmark values must be unavailable or “unknown” to the vendors testing their model(s).  AVMetrics provides in excess of a half million test records annually to AVM vendors (without information as to their benchmark values).  The AVM vendors receive the records simultaneously, run these properties through their model(s) and return the predicted value of each property within 48 hours, along with a number of other model-specific outputs.  These outputs are received by AVMetrics, where the results are evaluated against the benchmark values.  A number of controls are used to ensure fairness, including the following:

  • ensuring that each AVM vendor receives the exact same property list (so no model has any advantage)
  • ensuring that each AVM is given the exact same parameters (since many allow input parameters that can affect the final valuation)
  • ensuring through multiple checks that no model had access to the recent sale data, which would provide an unfair advantage
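Two of these controls lend themselves to a short sketch: stripping benchmark fields from outbound records, and confirming that every vendor received the same property list. The field names and record layout are hypothetical. The sketch covers only what is sent to vendors; detecting whether a model obtained the sale through other channels requires separate checks that are not shown here.

```python
BENCHMARK_FIELDS = {"sale_price", "sale_date"}  # hypothetical field names


def build_blind_records(sample):
    """Copy each record without its benchmark fields before sending to vendors."""
    return [{k: v for k, v in rec.items() if k not in BENCHMARK_FIELDS}
            for rec in sample]


def same_property_list(deliveries):
    """True when every vendor was sent exactly the same set of property IDs."""
    id_sets = {frozenset(rec["property_id"] for rec in recs)
               for recs in deliveries.values()}
    return len(id_sets) <= 1


sample = [
    {"property_id": "P1", "address": "1 Elm St", "sale_price": 300_000, "sale_date": "2018-11-01"},
    {"property_id": "P2", "address": "2 Oak Ave", "sale_price": 450_000, "sale_date": "2018-11-03"},
]
blind = build_blind_records(sample)
deliveries = {"Vendor X": blind, "Vendor Y": blind}
print(blind[0], same_property_list(deliveries))
```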

In addition to quantitative testing, AVMetrics circulates a comprehensive vendor questionnaire twice annually.  Vendors that wish to participate in the testing process complete, for each model being tested, roughly 100 parameter, data, methodology, staffing and internal testing questions.  These answers enable AVMetrics, and more importantly our clients, to understand model differences within both testing and production contexts, and they enable us and our clients to satisfy certain regulatory requirements describing the evaluation and selection of models (see OCC 2010-42).

AVMetrics next performs a variety of statistical analyses on the results, breaking down each individual market, each price range, and each property type, and develops results which characterize each model’s success in terms of precision, usability and accuracy.  AVMetrics analyzes trends at the global, market and individual model levels, identifying where there are strengths and weaknesses, and improvements or declines in performance.
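A segment-level breakdown of this kind can be sketched as follows. The mapping of statistics to the three terms is an assumption made for illustration: hit rate stands in for usability, median signed error and mean absolute error for accuracy, and the standard deviation of errors for precision. The record fields are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def segment_report(records):
    """Group by (county, property type) and summarize percentage errors."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["county"], rec["property_type"])].append(rec)

    report = {}
    for segment, recs in groups.items():
        # Signed error relative to the benchmark; skip records with no estimate.
        errors = [(r["avm_value"] - r["sale_price"]) / r["sale_price"]
                  for r in recs if r["avm_value"] is not None]
        report[segment] = {
            "hit_rate": len(errors) / len(recs),
            "median_error": median(errors) if errors else None,
            "mean_abs_error": mean(abs(e) for e in errors) if errors else None,
            "error_std_dev": pstdev(errors) if errors else None,
        }
    return report


records = [
    {"county": "Orange", "property_type": "SFR", "avm_value": 330_000, "sale_price": 300_000},
    {"county": "Orange", "property_type": "SFR", "avm_value": None, "sale_price": 200_000},
    {"county": "Orange", "property_type": "Condo", "avm_value": 190_000, "sale_price": 200_000},
]
report = segment_report(records)
print(report[("Orange", "SFR")])
```

Adding a price band to the grouping key would extend the same breakdown to each price range within each market.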

The last step in the process is for AVMetrics to provide an anonymized comprehensive comparative analysis for each model vendor, showing where their models stack up against all of the models in the test; this invaluable information facilitates the continuous improvement of each vendor’s model offerings.

Raising the De Minimis Threshold – Fear Not!

Background

There is a lot of controversy about appraisals and Appraisers these days, and the FFIEC proposed rule change – increasing the de minimis threshold to $500,000 – allowing for an appraisal exemption and the use of an evaluation in lieu of an appraisal – has sparked anxiety in the world of collateral risk.  Our colleagues at the Collateral Risk Network (CRN) expressed their opposition to the proposal. Not surprisingly for a group of its size, there are diverse opinions at the individual membership level of the group.  Our opinion is that the change – far from being the catastrophe imagined – will in fact have some important benefits.

A Place for De Minimis

While the CRN and certain appraiser blogs expressed skepticism – to put it mildly – we believe that there is a place for an appropriate de minimis level, even the $500,000 level now being considered.  On low risk transactions, evaluations (as opposed to full appraisals) can be appropriate and even beneficial for risk management of the overall lending system.

Here’s why.  Lending volumes tend to scale up and down faster than the supply of appraisers.  As a result, boom cycles in the lending business can place extreme pressure on appraisers.  This scenario makes quality control extremely challenging.  The option to leverage efficient evaluations on low risk transactions can improve the risk management of the entire system by devoting limited appraisal resources to their highest and best use.  In other words, when you place strain on a system, something has to give, and raising the de minimis threshold enables lenders to focus scarce resources on the riskier transactions.

Evaluations and the Credit Crisis

The CRN expressed concern about allowing the mistakes of the recent Credit Crisis to be repeated, and we could not be in more agreement.  However, their letter insinuated that evaluations (specifically BPOs and AVMs) were to blame for inflated valuations.  Of the vast number and type of quality problems experienced during the credit crisis, evaluations were not a major contributing factor.  In fact, we are not aware of any reported cases of AVMs being blamed for the quality problems experienced during the credit crisis.

Appraisals as a Source of Market Analysis

Strangely, the CRN comments suggested that reviewing individual appraisals is an important source of market trend analysis for investors during overheated markets.  We find this highly improbable.  The typical single-family appraisal may contain microanalysis of neighborhoods or small markets that lenders may find informative, but most Investors already access market and economic trend data via other sources, including their own or 3rd party economic analyses and risk management tools.

Existing Quality Control Infrastructure for Appraisals

The CRN letter makes the case that appraisals benefit from an extensive regulatory framework and quality control infrastructure surrounding their use, making them inherently safer for the industry to rely upon.  We note that much of the same quality control infrastructure and practices were in place before the last crisis.  Much of that appraisal quality control depends on the same people and practices – e.g., “desk appraisals” performed by other appraisers – making them subject to similar risk factors.  In other words, appraisals are not a guarantee against risk.  They are simply one tool in the toolbox – an effective and comparatively expensive tool – but they should not be an exclusive tool when evaluations can be effectively employed in lower-risk scenarios.

Application of Evaluations

We believe in using the right tool for the job, and we believe that there is a place for evaluations in prudent lending practices. Relying on additional risk measurements, rather than just focusing on a one size fits all de minimis level can provide a formula for better risk management.  For example: A $350,000 transaction at a 40% LTV for a pay stub borrower has less need for an appraisal; an evaluation might be able to suffice.  Better to allocate that valuable appraisal resource to a $225,000 transaction at 90% LTV.  Raising the de minimis, while providing additional guidance for other measures, provides lenders and investors more flexibility to make smarter risk management decisions, and it releases valuable appraisal resources to be used where they can have the most benefit.
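The arithmetic behind that comparison can be worked through in a short sketch. It assumes the dollar figures are property values (the text does not say whether they are values or loan amounts), and the function is purely illustrative.

```python
def collateral_cushion(property_value, ltv_pct):
    """Return (loan amount, equity cushion) for a value and an LTV percentage."""
    loan = property_value * ltv_pct / 100
    return loan, property_value - loan


low_ltv_loan, low_ltv_cushion = collateral_cushion(350_000, 40)
high_ltv_loan, high_ltv_cushion = collateral_cushion(225_000, 90)

print(f"$350,000 at 40% LTV: loan ${low_ltv_loan:,.0f}, cushion ${low_ltv_cushion:,.0f}")
print(f"$225,000 at 90% LTV: loan ${high_ltv_loan:,.0f}, cushion ${high_ltv_cushion:,.0f}")
```

Under that assumption, the larger transaction carries a $140,000 loan protected by a $210,000 cushion, so the valuation could be off by as much as 60% before the loan exceeds the collateral. The smaller transaction carries a $202,500 loan with only a $22,500 cushion, so a 10% valuation error is enough to erase it. That is why the smaller, higher-LTV loan is the better use of an appraiser.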

Now that the FFIEC has closed its comment period on the proposed de minimis lending threshold of $500,000, we expect a final communication from the FFIEC during 2016. We anticipate that lenders will adapt to the new regulations incrementally, with quality controls redesigned for the new thresholds rather than thrown out with the bathwater.

Lee Kennedy & Mike Coyne,

AVMetrics, LLC.

Same Scandal, New Perpetrator

It seems like only yesterday we were lamenting the hubris of Volkswagen, loading software into their TDI models to fake out emissions tests on millions of vehicles.  Here we are again, this time with Mitsubishi.  The only real surprise is that these companies don’t learn.

Hyundai in 2012, Ford in 2014, Volkswagen in 2015, and now Mitsubishi, although this is not even their first scandal.  In the early 2000s, Mitsubishi was embarrassed by defects that were covered up.

It’s surprising that these companies cannot see that the root cause is a faulty business process.  Instead, they root out the responsible parties and do a mea culpa, or the CEOs resign in shame for their leadership failures (as in the case of Volkswagen last year).  Why doesn’t anyone realize that if your system is to self-test for emissions and mileage, eventually you are going to have a problem, because that is not a foolproof system?

The faulty business process is their lack of independent testing.  These emissions and mileage results are vital business inputs, and the integrity of those results is mission critical.  Where are their controls?

Our industry is financial services, where federal regulations have long required independent testing in many areas.  Our specific segment of the industry is the Automated Valuation Model (AVM) business, which has a regulatory mandate for independent validation.  Financial institutions use many different kinds of computer models to improve decision making, and AVMs are one kind of model.  They estimate property values, and for banks that make loans on property, that comes in handy in dozens of ways.

But, if there are systematic problems with AVMs, for example, if they over-valued everything by 20%, it could cause a huge problem for banks and credit unions.  This is where we come in.  We independently test and validate every commercially available residential AVM on a continuous basis, thoroughly, rigorously and impartially.  And, the beneficiaries are everyone.  Banks and credit unions benefit, borrowers benefit, and even the AVM developers benefit because of the feedback we provide to them as well as the broader consumer confidence in their products.

Certainly it is incumbent upon leaders to create a culture of integrity.  One way of doing that is to do more than admonish people to be honest.  Instead, create a system where there is independent testing, and make sure that everyone knows that their results will be tested.  Voila!  When people know they are being checked, integrity soars, and everyone wins.  Don’t just demand integrity; build it into the process!

Lee Kennedy, principal and founder of AVMetrics in 2005, has specialized in collateral valuation, AVM testing and related regulation for over three decades.  Over the years, AVMetrics has guided companies through regulatory challenges, helped them meet their AVM validation requirements, and commented on pending regulations to help bring clarity and sanity to the situation.  Lee is an author, speaker and expert witness on the testing and use of AVMs.  Lee’s conviction is that independent, rigorous validation is the healthiest way to ensure that models serve their business purposes.  Every commercially available AVM vendor trusts AVMetrics to provide feedback to them on their models, facilitating each model’s continuous improvement.


What Volkswagen Needs Most

Right now, what Volkswagen needs most is probably not what you think.  Yes, they need lawyers and public relations help, and probably more lawyers.  And, they need some internal investigations and a new mission statement.  But, what they need most is that magic thing that money simply cannot buy: Volkswagen needs to regain the consumer’s (and the world’s) trust, since the damage to their reputation will take years to repair.  This was not simply a faulty product to recall and repair, as staggering as those reparations can also be; this was a direct assault on our belief systems, and that gets personal.

So what shocked you about this deception?  Was it the tons of emissions pumped into the air by drivers who thought that they were driving “clean diesel?” Was it the brazen advertising of “clean diesel” by VW that (someone at) VW knew was not clean?  Was it the deliberate creation of code loaded into every single TDI 2.0 liter engine that was designed to foil emissions tests?  Not for me.  The most shocking detail was that the car companies self-certify their emissions and mileage stats.

The reality is that we should never expect that people (or industries or corporations) will always be ethical.  In almost all arenas of society we put systems of oversight in place to detect malicious acts in order to deter them.  The police are trusted to enforce the laws and use lethal force if necessary; yes, they have an Internal Affairs division, but their oversight is ultimately civilian.  The military has a civilian Commander in Chief.  The federal government’s three branches are designed to provide mutual oversight.

Federal regulations in banking have long recognized that banks and credit unions cannot self-regulate either. And because technology and social changes advance far more quickly than regulators can keep up, regulators put controls in place to allow carefully vetted entities to serve as their proxy for oversight, most commonly in the form of independent divisions within an organization or third party service providers.

As a case in point, the financial industry depends heavily on sophisticated models to facilitate their business decisions. Specifically within the property valuation segment of the industry, Automated Valuation Models (AVMs) are the most common models used.

Banks use AVMs to estimate property values quickly, cheaply and accurately.  AVMs aren’t as accurate as an appraisal, but they cost a few dollars instead of a few hundred dollars.  Banks use them for all sorts of things: portfolio valuation, sales, marketing, servicing, appraisal quality control, even equity lending decisions.  And if lenders base their decisions on systematically erroneous data, or improperly built models, they can do serious damage not only to their bottom line but also to their reputation. Understanding the risk that this creates, regulators have required all such models to be independently validated.  There are two critical aspects to this precaution.

First, validation must be independent.  The validation must be done by a person or team that is separate from the developers, users and buyers (most AVMs are built by independent companies that specialize in their development).  Very specifically, in the regulatory guidance, the model’s builders cannot be relied upon as an objective source of validation.

Second, the validation must be conducted by staff qualified in modeling or analytics, with the adequate authority to blow the whistle if they find issues.  The validation must be performed in real-world conditions, it should be ongoing, and it should be reported on at least annually.  When there are changes to the models, the business environment or the marketplace, the models need to be re-validated.

Banks and credit unions can do their own AVM validation, but most find it generally difficult to meet all of the requirements, including the authority to blow the whistle if they find issues. These validation caveats are the best justifications for looking to a company like AVMetrics to provide this service for you.

AVMetrics is in no way beholden to banks and credit unions, AVM developers or resellers; we draw no income from selling, developing or using AVM products.  AVMetrics conducts extensive ongoing AVM validation and has done so for over a decade.  Independence is essential, because self-certification can be an invitation to abuse, and as Volkswagen just found out, it can be devastating.  Our industry implemented independent testing years ago.  It is rigorous, exhaustive and frequent, but that is appropriate because there is a lot at stake.

If Volkswagen ever hopes to get the public’s trust back, they need to open the curtains and put on all the lights. In addition, every result and claim will need to be independently validated, since no one is likely to believe them otherwise.

These are common-sense controls, and while the application to the auto industry is not perfectly analogous to the testing of AVMs, the basic lesson is clear: we should not trust companies to test and certify themselves.

Lee Kennedy, principal and founder of AVMetrics in 2005, has specialized in collateral valuation, AVM testing and related regulation for over three decades.  Over the years, AVMetrics has guided companies through regulatory challenges, helped them meet their AVM validation requirements, and commented on pending regulations to help bring clarity and sanity to the situation.  Lee is an author, speaker and expert witness on the testing and use of AVMs.  Lee’s conviction is that independent, rigorous validation is the healthiest way to ensure that models serve their business purposes.  Every commercially available AVM vendor trusts AVMetrics to provide feedback to them on their models, facilitating each model’s continuous improvement.