Polygon Research Blog

HMDAVision Now Live with 2018 National Loan-Level HMDA Data
September 3, 2019
On Friday, August 30, 2019, the CFPB made a big data release, meeting its target of publishing the 2018 HMDA national loan-level dataset before the end of the summer. HMDAVision is now live with this data.

Each year, stakeholders in the US mortgage industry anxiously await the release of the previous year's HMDA data set, "the most comprehensive publicly available information on mortgage market activity."* This year the CFPB effectively released it twice, which from a data modeler's perspective meant the hard way (thousands of individual Modified LAR files) and the easy way (a single national file). Because we had already released a version of HMDAVision based on the former, updating it with the latter over this past weekend was easy. Here are the key highlights we'd like to share:
Weekly Dynamic Updates
The CFPB has begun publishing its weekly updates and corrections to the 2018 data. HMDAVision has already ingested the first of these and will also be updated weekly.
New Derived Fields
The HMDA team at the CFPB deserves a lot of credit for creating 6 new derived fields, in particular for providing a way to combine up to 10 applicant and co-applicant demographic fields at a time. This has always been a challenge for us, so kudos to the CFPB for leading the way.

All 6 derived fields are now included in HMDAVision filters and charts. One of them, Conforming Loan Limit, is only meaningful when combined with the FHFA's 2018 county-level conforming loan limits. We have ingested these limits into HMDAVision, so our designation of whether a loan was conforming reflects the limit for the county reported on each application, for all US counties.
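As a rough illustration of the county-level designation described above, here is a minimal Python sketch. It assumes a lookup table keyed by 5-digit county FIPS code; the table, the class and function names, the placeholder county codes (state code 99 does not exist) and the dollar limits are all hypothetical, not FHFA's published 2018 figures. Real limits also vary by number of units, which this sketch ignores, and it is not HMDAVision's implementation.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical one-unit limits keyed by 5-digit county FIPS code.
# Placeholder values for illustration only; they are NOT FHFA's 2018 figures.
HYPOTHETICAL_LIMITS: Dict[str, int] = {
    "99001": 700_000,
    "99002": 450_000,
}

@dataclass
class Application:
    loan_amount: int   # loan amount in dollars
    county_code: str   # 5-digit state + county FIPS code

def conforming_status(app: Application, limits: Dict[str, int]) -> str:
    """Classify a loan against its county's limit (simplified: ignores unit count)."""
    limit = limits.get(app.county_code)
    if limit is None:
        # No limit on file for this county: the designation cannot be determined.
        return "undetermined"
    # A loan at or below the county limit is conforming.
    return "conforming" if app.loan_amount <= limit else "nonconforming"

if __name__ == "__main__":
    for app in [
        Application(500_000, "99001"),
        Application(500_000, "99002"),
        Application(500_000, "99999"),
    ]:
        print(app.county_code, conforming_status(app, HYPOTHETICAL_LIMITS))
```

The same $500,000 loan comes out conforming in one hypothetical county and nonconforming in another, which is why the derived field only becomes meaningful once county-level limits are joined in.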
Accuracy
Going live earlier in the summer with the individual Modified LAR files presented two challenges: making sure we ingested all 5,600+ individual files (before Friday's release of the same data as a single file), and creating our own linkages between LEI and Agency+RespondentID (before Friday's release of the CFPB's own linkages). So how did we do?

As it turns out, both we and the CFPB did great. Our latest Modified LAR pull, from August 22-23, differed from the CFPB's first monolithic dynamic national file by only 3 lenders, which suggests that file was created very close to those same dates. (The number is expected to change as more LARs trickle in, which is the reason for the weekly updates.)
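A divergence count like the one above boils down to a set comparison of lender identifiers between two snapshots. A minimal Python sketch, assuming each snapshot can be reduced to a list of lender IDs such as LEIs; the function name and the toy IDs are hypothetical:

```python
from typing import Dict, Iterable, List

def lender_divergence(our_pull: Iterable[str], national_file: Iterable[str]) -> Dict[str, List[str]]:
    """Return lender IDs present in one snapshot but not the other."""
    ours, theirs = set(our_pull), set(national_file)
    return {
        "only_in_our_pull": sorted(ours - theirs),
        "only_in_national_file": sorted(theirs - ours),
    }

if __name__ == "__main__":
    # Toy identifiers, not real LEIs.
    our_pull = ["LEI_A", "LEI_B", "LEI_C", "LEI_D"]
    national_file = ["LEI_A", "LEI_B", "LEI_C", "LEI_E"]
    diff = lender_divergence(our_pull, national_file)
    divergent = len(diff["only_in_our_pull"]) + len(diff["only_in_national_file"])
    print(diff, divergent)
```

Because the national file is dynamic, rerunning the comparison after each weekly update would show whether the gap changes as late LARs arrive.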

Our crosswalk linking 2018 lenders to 2017 lenders was also very good. Despite our field matching, fuzzy logic matching, and manual research, we did miss about 20 matches (now updated) where the lender information was too ambiguous to match. At the same time, we have many matches not included in the CFPB's crosswalk, because we also match back to earlier years, a proprietary advantage of HMDAVision.
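For readers curious what name-based matching of this kind can look like, here is a minimal Python sketch of a two-pass approach: an exact match on normalized lender names first, then a fuzzy match using the standard library's difflib.get_close_matches, with anything left over returned as None for manual research. The suffix list, the 0.85 cutoff, the "agency-respondent_id" key format, the placeholder identifiers and the lender names are all hypothetical; this is not HMDAVision's proprietary crosswalk method, which also relies on other fields and earlier years.

```python
import re
from difflib import get_close_matches
from typing import Dict, Optional

# Hypothetical list of tokens dropped before comparing names.
SUFFIXES = {"inc", "llc", "na", "corp", "corporation", "co", "the"}

def normalize(name: str) -> str:
    """Lowercase, drop periods, turn other punctuation into spaces, strip suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower().replace(".", ""))
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)

def match_lenders(
    lenders_2018: Dict[str, str],   # LEI -> lender name
    lenders_2017: Dict[str, str],   # hypothetical "agency-respondent_id" key -> lender name
    cutoff: float = 0.85,
) -> Dict[str, Optional[str]]:
    """Map each 2018 LEI to a 2017 key, or None if it needs manual research."""
    by_name_2017: Dict[str, str] = {}
    for key, name in lenders_2017.items():
        by_name_2017.setdefault(normalize(name), key)  # keep the first key per normalized name
    candidates = list(by_name_2017)

    matches: Dict[str, Optional[str]] = {}
    for lei, name in lenders_2018.items():
        norm = normalize(name)
        if norm in by_name_2017:
            matches[lei] = by_name_2017[norm]  # pass 1: exact normalized-name match
            continue
        # Pass 2: best fuzzy match scoring at least `cutoff` (difflib similarity ratio).
        close = get_close_matches(norm, candidates, n=1, cutoff=cutoff)
        matches[lei] = by_name_2017[close[0]] if close else None
    return matches

if __name__ == "__main__":
    lenders_2018 = {
        "LEI_1": "First Example Bank, N.A.",
        "LEI_2": "Sample Mortgage Corp.",
        "LEI_3": "Totally Different Lender",
    }
    lenders_2017 = {
        "1-0000000001": "FIRST EXAMPLE BANK NA",
        "7-0000000002": "Sampel Mortgage Corp",
    }
    print(match_lenders(lenders_2018, lenders_2017))
```

Raising the cutoff trades missed matches for fewer false ones; the None entries are the cases a person would still need to research by hand.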
*The data referred to in this quote from the CFPB is the basis for our HMDA research tool, HMDAVision. The data is large and unwieldy, so most of our competitors summarize it or provide only slices at a time. We go in the other direction, providing all HMDA transactions for the past 5 years (over 70 million), augmented by the most recent year's American Community Survey microdata as well as county-level FHFA conforming loan limit data.