Polygon Research Blog


Today's post is our response to the CFPB's Request for Information regarding the Home Mortgage Disclosure Act (HMDA). We have already submitted it to the CFPB and are reproducing a lightly edited version here.

Polygon Research, Inc.  A New Approach to Mortgage Market Intel
www.polygonresearch.com | info@polygonresearch.com

January 21, 2022

Consumer Financial Protection Bureau
Docket No. CFPB-2021-0018
1700 G Street, NW
Washington, DC 20552

Re: Comments on Request for Information Regarding the HMDA Rule Assessment (Docket No. CFPB-2021-0018)

Dear Madam or Sir:

Polygon Research, a mortgage data science company with a commercial HMDA analytics product on the market, is pleased to respond to this RFI (Docket No. CFPB-2021-0018). We have structured our response around the eight points on which the RFI solicits comment, with a focus on the stated purposes of HMDA, namely “to provide the public with loan data that can be used:

(i) To help determine whether financial institutions are serving the housing needs of their communities;
(ii) to assist public officials in distributing public-sector investment so as to attract private investment to areas where it is needed; and
(iii) to assist in identifying possible discriminatory lending patterns and enforcing antidiscrimination statutes.”

Since launching HMDAVision in 2018, we have modeled every HMDA transaction from 2013 through 2020: over 120M loan applications and purchased loans that our users can search, aggregate, and filter at the loan level. Interest in HMDA data, and its usefulness, grew sharply with the release of the expanded data points beginning with the 2018 LAR.

Our response centers on three themes:

  1. the CFPB is meeting the goals of HMDA well by focusing on data availability and data quality
  2. the CFPB can improve by collecting/releasing more data, enumerated below
  3. the CFPB can exert a greater multiplier effect on the HMDA goals through collaborative efforts rather than building solutions on its own

Responses to Enumerated Inquiries (1-8, not taken in order)

(1) Comments on the feasibility and effectiveness of the assessment plan, the objectives of the HMDA Rule that the Bureau intends to emphasize in the assessment, and the outcomes for assessing the effectiveness of the HMDA Rule as described in part IV above;

Polygon Research Comment: HMDA has been wildly successful, through reliable data aggregation and dissemination, in meeting goals i, ii, and iii listed above, especially since the advent of the expanded data points in the 2018 LAR. The curation and dissemination of HMDA data have catalyzed innovation in both the private and public sectors, driving deep analysis of HMDA data for all three use cases (goals). Improvements, however, are needed in two areas: expanding institutional coverage and expanding data points.

(2) Data and other factual information that the Bureau may find useful in executing its assessment plan and answering related research questions, particularly research questions that may be difficult to address with the data currently available to the Bureau, as described in part IV above;

Polygon Research Comment: Toward goals i, ii, and iii, HMDA operates with a network effect. We can analyze the coverage, effectiveness, and fairness of lending activities only where we have data. Ideally, that means every mortgage transaction that occurred, regardless of the lender's size or the size of its book of business. To this end, we advocate expanding institutional coverage to include all financial institutions.

(3) The specific data points reported under the 2015 HMDA Rule that help meet the objectives of the HMDA Rule, as described in part IV above, including the rationale, and provide any available detailed supporting information, evidence and data;

Polygon Research Comment: HMDA also allows us to look at questions of fairness. With the expanded data points introduced in 2018, we can construct analyses like the following pricing analysis by race (with labels removed):

(8) Recommendations for modifying, expanding, or eliminating any aspects of the HMDA Rule, including but not limited to the institutional coverage and loan-volume thresholds, transactional coverage, and data points.

Polygon Research Comment: We can expand this analysis to cover loan size, interest rate, discount points, lender credits, origination charges, and closing costs, broken out by race, ethnicity, age, sex, and income (or any combination of these), by lender, and by geography. This represents great strides in fair lending analysis. However, just as judging the fairness of a basketball team's roster cuts from data that omits player height invites premature conclusions, fair lending analysis is incomplete without credit score. For this reason, we advocate expanding the HMDA data points to include credit score.
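
To make the kind of breakdown described above concrete, here is a minimal sketch in Python of a median interest rate by group for originated loans. It is an illustration only, not our production method. The field names (`derived_race`, `action_taken`, `interest_rate`) follow our understanding of the public LAR and should be treated as assumptions, and the sample rows are entirely hypothetical, with group labels removed.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows. Field names are assumed to mirror the public LAR;
# the values are made up for illustration and carry no real-world meaning.
sample_lar = [
    {"derived_race": "Group A", "action_taken": "1", "interest_rate": "3.25"},
    {"derived_race": "Group A", "action_taken": "1", "interest_rate": "3.75"},
    {"derived_race": "Group B", "action_taken": "1", "interest_rate": "3.0"},
    {"derived_race": "Group B", "action_taken": "3", "interest_rate": "NA"},
    {"derived_race": "Group B", "action_taken": "1", "interest_rate": "Exempt"},
]


def median_rate_by_group(rows, group_field="derived_race"):
    """Return the median interest rate per group for originated loans."""
    rates = defaultdict(list)
    for row in rows:
        # Assumption: action_taken "1" marks an originated loan.
        if row["action_taken"] != "1":
            continue
        try:
            rate = float(row["interest_rate"])
        except ValueError:
            # Skip non-numeric placeholders such as "NA" or "Exempt".
            continue
        rates[row[group_field]].append(rate)
    return {group: median(values) for group, values in rates.items()}


print(median_rate_by_group(sample_lar))
```

Passing a different `group_field` (a lender identifier or a geography column, for instance) gives the same breakdown along another dimension. Without a credit score column, though, a gap between groups in output like this cannot be interpreted on its own.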

We also advocate disaggregating age to improve fair lending analysis, and including the month of the transaction to better capture seasonality.

The main objections to expanding data points like these in the past have been tied to privacy. If credit score were included, the argument goes, the identity of the applicant could be inferred from the pool of residents in a given Census tract. Our answer to this is twofold. First, acknowledging the point, we recommend that credit score initially be reported in bands.
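
As a rough illustration of what banding could look like, the sketch below maps a raw score to a fixed-width band label. The 40-point width and the 300 to 850 score range are our assumptions for the example; we are not proposing specific band boundaries here, and `credit_score_band` is a hypothetical helper name.

```python
def credit_score_band(score, width=40, floor=300, ceiling=850):
    """Map a raw credit score to a band label such as '700-739'.

    The width, floor, and ceiling defaults are illustrative assumptions,
    not proposed regulatory values.
    """
    if not floor <= score <= ceiling:
        raise ValueError(f"score {score} is outside {floor}-{ceiling}")
    # Snap the score down to the start of its band.
    low = floor + ((score - floor) // width) * width
    # The top band is truncated at the ceiling.
    high = min(low + width - 1, ceiling)
    return f"{low}-{high}"


print(credit_score_band(712))
```

Wider bands disclose less about any one applicant but also blunt the analysis, so the width is the knob that trades privacy against analytical value.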

Second, we all have to acknowledge that for originated loans, key elements of the HMDA data (and more, like property address and borrower name) are publicly available in county recorders’ offices. Beyond this, by 2022 data aggregators have long since mastered the ability to tailor a marketing list to innumerable specific personal characteristics. Weighed against data no more geographically specific than Census tract, it’s not a fair fight. Privacy violators don’t have to go through HMDA to get what they’re after. Beyond originated loans, banding credit scores should adequately obfuscate identity for other loan outcomes, specifically denied applications.

(7) Data and other factual information about the HMDA Rule's effectiveness in meeting the purposes and objectives of title X of the Dodd-Frank Act (section 1021), which are listed in part IV above;
a. Please describe the value that data on such transactions provides in serving HMDA's purposes;
b. Comments relating to the usability of the public HMDA data, potential challenges of the current format of the public HMDA data, and recommendations for additional reporting by the Bureau that would be helpful in informing the use of the public HMDA data by communities, public officials, or other stakeholders; and

Polygon Research Comment: Two limitations of the current HMDA data blunt its effectiveness for certain types of analysis. First, the secondary market plays a crucial role in providing liquidity to lenders, and sold/retained analysis is important for gauging how well private investment reaches the areas where it is needed. That analysis is incomplete because Purchaser Type information is missing for loans originated in one year but sold in the following year. We advocate correcting this limitation.

Pivoting to fair lending, we also advocate for the inclusion of a new data point: the origination year (and month – see above) of purchased loans (i.e. action type 6).

Pivoting to data quality, we see the opportunity for improvement through simplification. We recommend the following:
  • The CFPB should only report Census Tract – eliminate the redundant fields of County, MSA, and State, which in the past have at times been misaligned.
  • The CFPB should solve for the “lost year” of new census tract reporting that occurs every 10 years. For example, 2021 HMDA data will be reported with 2010 census tracts because (a) Reg C requires use of the tract numbering system in effect at the start of the year, and (b) the Census Bureau published the 2020 tracts after Jan 1, 2021. The fix is to make Census Tract geocoding a CFPB responsibility: take address information from the lenders and, through centralized geocoding, publish the correct, current census tract in the public LAR. In March 2022, this would deliver to the public 2020-numbered Census Tracts for the 2021 HMDA LAR. It would also improve data quality, since lenders would report in a more accessible, verifiable form (street address) in their filings to the CFPB. Further, nothing would then stop the CFPB from also publishing zip code (or ZCTA), which could easily be added to the centralized geocoding effort.
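
To illustrate why State and County are redundant once Census Tract is reported, here is a minimal sketch. It relies on the standard 11-digit tract identifier, whose first two digits are the state FIPS code and whose first five digits are the county FIPS code. The record field names (`census_tract`, `county_code`) and the sample values are assumptions for the example. MSA is not embedded in the identifier and would need a separate county-to-MSA lookup, which is not shown.

```python
def split_tract_geoid(tract):
    """Split an 11-digit census tract identifier into its component codes."""
    if len(tract) != 11 or not tract.isdigit():
        raise ValueError(f"expected an 11-digit tract identifier, got {tract!r}")
    return {
        "state_fips": tract[:2],   # 2-digit state code
        "county_fips": tract[:5],  # state code plus 3-digit county code
        "tract_code": tract[5:],   # 6-digit tract code within the county
    }


def county_matches_tract(record):
    """Check whether a separately reported county agrees with the tract."""
    derived = split_tract_geoid(record["census_tract"])["county_fips"]
    return record["county_code"] == derived


# Illustrative records (values chosen for the example only).
aligned = {"census_tract": "06037101110", "county_code": "06037"}
misaligned = {"census_tract": "06037101110", "county_code": "06059"}

print(county_matches_tract(aligned), county_matches_tract(misaligned))
```

A check like this only detects misalignment after the fact. Publishing the tract alone, and deriving the other fields from it, would remove the possibility altogether.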

Finally, one more area for expansion of data points would significantly enhance the usability of HMDA. Given the significant lender consolidation in the industry, if a lender is acquired during a reporting period, we recommend the inclusion of:
  • The parent company’s LEI
  • The type of acquisition
  • The date of the acquisition

This will improve clarity (and consistency) in analyzing lender performance (in any of the three HMDA goals listed above) over time.

(6) Data and other factual information about the accuracy of estimates of annual ongoing compliance and operational costs for HMDA reporters, or the analytical approach used to estimate these costs, as delineated in the Small Business Review Panel Report under the Small Business Regulatory Enforcement Fairness Act (SBREFA) that the Bureau convened and chaired in 2014;
a. Comments related to the nature and magnitude of any operational challenges in complying with the HMDA Rule. Are they significantly different from those delineated in the published Report of the Small Business Review Panel mentioned above? If so, how and how much?;
b. Comments delineating and describing the ongoing costs incurred in collecting and reporting information for the HMDA Rule. Are they significantly different from those delineated in the published Report of the Small Business Review Panel mentioned above? If so, how and how much?;

Polygon Research Comment: We’d like to invert this question and talk about the benefits all stakeholders can derive from a final recommendation. Polygon Research participated in the 2021 HMDA Tech Sprint sponsored by the CFPB. Our experience was very positive. The tech sprint embodied the truth that innovation comes through collaboration. However, future tech sprints could serve the core HMDA goals even better with one simple change: require that the technology entered into the competition be open sourced (e.g. placed under an open source license and posted to GitHub) or, if the results fall short of working code, described in detail in a public post. This would keep participants from bringing pre-baked solutions they’re not willing to share. The two biggest concerns voiced and pursued during the 2021 tech sprint were gaps in data quality and bias. These problems need a lot of innovation, and therefore a lot of collaboration. When the government (the CFPB) sponsors an event, the work produced should be shared publicly.


Lyubomira Buresch
Chief Executive Officer

Greg Oliven
Chief Technology Officer