When we talk about open mortgage data, we mean data about mortgage loan applications that is disclosed to the public. For example, the Consumer Financial Protection Bureau (CFPB), a regulatory agency, releases the most comprehensive data set of mortgage activity in the US: the Home Mortgage Disclosure Act (HMDA) data.
The amount of open mortgage data increased greatly starting in 2018, when the CFPB required lenders to submit more details about their loan transactions under Regulation C (HMDA). The industry tried to resist, but the relatively new regulatory agency prevailed. We can still hear voices from various corners of the mortgage industry warning about the danger of reporting and disclosing so much data. And we agree: when data is so large and so complex, there is always room for shortcuts, such as selectively omitting data, cherry-picking data, or even torturing the data to support a hypothesis. For example, one of the most frequent uses of the HMDA data has been to announce that a lender denied more applications submitted by minority applicants than by non-minority applicants. Another use of the HMDA data is to produce rankings of lenders by volume or units for specific product, geography, and borrower segments. Often these claims have been ill-informed or outright wrong.
At the same time, as a data science company, we are excited about the potential of this vast amount of detail about mortgage applications to improve products and processes, which we hope will lead to better lending outcomes for all communities.
As a society, we still have many issues that need to be addressed. One of the challenges often highlighted is the wide gap between the homeownership rates of minority and non-minority populations. Because mortgages are often the means of achieving homeownership in the U.S., mortgage lenders' activities are closely scrutinized. Below we highlight the major benefits of using HMDA data in mortgage lenders' decision making.
Geographic granularity. We all know the real estate cliché: Location! Location! Location! Mortgage originations are secured by real estate property, which has a specific location (address). Lenders can harness the power of HMDA data to understand the dynamics of their markets by census tract and, at the same time, aggregate the data for higher-level analysis. Using analytics allows lenders to generate multiple perspectives on their customers and to educate their staff about opportunities to serve them better. You can read about these use cases in our blog here.
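As a rough illustration of the tract-level view and the roll-up to a higher level, here is a minimal Python sketch. It assumes loan-level records with `census_tract` and `action_taken` fields, named as in the public HMDA loan-level files, where an `action_taken` value of 1 means the loan was originated. It also assumes that the first five digits of the 11-digit tract code are the state-and-county FIPS code. The sample rows are invented for illustration only.

```python
from collections import Counter

# Hypothetical sample rows; real HMDA records carry many more fields.
records = [
    {"census_tract": "06037101110", "action_taken": 1},
    {"census_tract": "06037101110", "action_taken": 3},
    {"census_tract": "06037101122", "action_taken": 1},
    {"census_tract": "06059001101", "action_taken": 1},
]

ORIGINATED = 1  # assumed action_taken code for "loan originated"


def originations_by_tract(rows):
    """Count originated loans per census tract."""
    return Counter(
        r["census_tract"] for r in rows if r["action_taken"] == ORIGINATED
    )


def roll_up_to_county(tract_counts):
    """Aggregate tract counts to counties via the first 5 digits of the tract code."""
    county_counts = Counter()
    for tract, n in tract_counts.items():
        county_counts[tract[:5]] += n
    return county_counts


by_tract = originations_by_tract(records)
by_county = roll_up_to_county(by_tract)
print(by_tract)
print(by_county)
```

The same pattern extends to any other roll-up key a lender cares about, such as state or a custom market definition.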
Loan Lifecycle. The HMDA data shows the full lifecycle of mortgage applications. The Action Taken field shows what the outcome of the application, if not a purchased loan, was:
Once loans are on the books, the HMDA data also shows which of them are then sold and to which investors. The importance of this data set is that lenders can calculate metrics that describe how efficient their mortgage processes are in comparison to their peers. For example, a bank may calculate its origination rate and compare it to that of a similar-size bank in its footprint. We offer more information in our blog.
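To make the origination-rate comparison concrete, here is a small Python sketch. The definition used is an assumption, since definitions vary: originated loans (`action_taken` code 1) divided by all applications with codes 1 through 5, which leaves out purchased loans and preapproval requests. The `lei` and `action_taken` field names follow the public HMDA loan-level files; the lender identifiers and rows are made up for illustration.

```python
from collections import defaultdict

ORIGINATED = 1
# Assumed denominator: originated, approved but not accepted, denied,
# withdrawn, and closed for incompleteness (action_taken codes 1-5).
APPLICATION_OUTCOMES = {1, 2, 3, 4, 5}

# Hypothetical sample rows with placeholder lender identifiers.
records = [
    {"lei": "BANK_A", "action_taken": 1},
    {"lei": "BANK_A", "action_taken": 1},
    {"lei": "BANK_A", "action_taken": 3},
    {"lei": "BANK_A", "action_taken": 6},  # purchased loan, excluded
    {"lei": "BANK_B", "action_taken": 1},
    {"lei": "BANK_B", "action_taken": 4},
]


def origination_rates(rows):
    """Return originated / applications per lender, under the assumptions above."""
    originated = defaultdict(int)
    applications = defaultdict(int)
    for r in rows:
        if r["action_taken"] in APPLICATION_OUTCOMES:
            applications[r["lei"]] += 1
            if r["action_taken"] == ORIGINATED:
                originated[r["lei"]] += 1
    return {lei: originated[lei] / applications[lei] for lei in applications}


rates = origination_rates(records)
for lei, rate in sorted(rates.items()):
    print(f"{lei}: {rate:.1%}")
```

A peer comparison is only meaningful when both lenders are measured with the same denominator, so whichever definition is chosen should be applied consistently.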
Know Your Customer. 2022 is the fourth year the CFPB is releasing an expanded set of HMDA data points on applicant demographics. In combination with loan features, geolocation, property type, and lender information, this data allows you to quickly understand, for example, the buying patterns of various age groups and income levels, as we discuss in detail here.
In HMDAVision, we expose the full, unsummarized set of ~90 million loan-level transactions for the last 4 years. The significance of this is that users can now explore the entire data set without limitation, at the speed of thought, and look for patterns. We encourage leaders to use the data to ask questions that will drive impactful business decisions. For example,
One lofty goal behind the CFPB's decision to require and disclose more data is that more data will create better transparency. But more data by itself is not enough. If the wrong analytics are applied to a bigger data set, we just get bigger misrepresentations. The risk with the expanded HMDA data is that some analysts use the data set to look for associations and correlations and then present these as causation. This kind of lazy approach leads to conclusions and arguments that can be harmful to the reputation of lenders. In addition, this kind of analysis can be used to influence housing policy decisions in ways that might have long-term detrimental effects on communities across the country. We always recommend that the data be used with the utmost caution and that analysts apply data science techniques that provide the right insight. If the goal is to establish causation, then analysts should run causal inference models to derive answers.