## When we talk about mortgage data science, we’re talking about using modern analytics platforms to ingest the large volume of data generated within the mortgage industry in order to answer questions, solve problems, and guide decision-making.

We reflect on these questions and answers in our **MCGI Mondays** blog series, MCGI being our wrapper around leading industry use cases: Measure-Compete-Grow-Include. In our data science work at Polygon Research, we’re increasingly turning to supervised machine learning:

…prediction problems where we have a dataset for which we already know the outcome of interest (e.g. past house prices) and want to learn to predict the outcome for new data. …The goal of supervised learning is to learn a predictive model that maps features of the data (e.g. house size, location, floor type, …) to an output (e.g. house price). If the output is categorical, the task is called classification, and if it is numerical, it is called regression…A fully trained machine learning model can then be used to make predictions for new instances. (source)
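To make the quoted definition concrete, here is a minimal sketch of a regression task in plain Python. It learns a straight line from house sizes with known prices, then predicts the price of a new house. The numbers and the names `fit_line` and `predict` are hypothetical, chosen only for illustration; this is not one of our production models.

```python
# Illustrative only: the sizes and prices below are made-up numbers.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new feature value."""
    slope, intercept = model
    return slope * x + intercept


# Training data: the feature (size in square feet) and the known outcome (price).
sizes = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

model = fit_line(sizes, prices)      # learn the mapping from feature to output
new_price = predict(model, 1800)     # predict the outcome for a new instance
print(f"Predicted price for 1,800 sq ft: {new_price:,.0f}")
```

Because the output here is a number, this is regression. If the output were a category instead (say, approved or denied), the same fit-then-predict pattern would be a classification task.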

Much of our focus is on the prediction itself, but it’s also important to look at the drivers of the prediction – the “why” in the title of this post. In machine learning, this question is discussed in terms of feature importance, one important measure of which is the Shapley value (a measurement derived from coalitional game theory). A feature’s Shapley value is its average contribution to a prediction, taken over all the orders in which features could be added to the model. Understanding the relative importance of predictors achieves two important goals. First, it adds transparency to your model, avoiding black box ML. Second, it allows you to compare and contrast your predictions with your own domain expertise, research, and intuition – what some (e.g. here and here) call Augmented Intelligence (AI).
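For readers who want to see the game-theory idea in action, here is a small sketch that computes exact Shapley values for a toy price model. It rests on a few assumptions we should state plainly: the model, its coefficients, and the feature names are hypothetical; a feature that is "absent" from a coalition is represented by a baseline value; and every ordering of the features is enumerated, which is only practical for a handful of features. Production tools typically rely on approximations instead.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    For every ordering of the features, switch each feature from its
    baseline value to the instance's value and record how much the
    prediction changes. A feature's Shapley value is the average of
    those changes across all orderings.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start with every feature "absent"
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous   # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_price_model(features):
    """Hypothetical model: the coefficients are made up for illustration."""
    return 50_000 + 150 * features["size"] + 20_000 * features["garage"]


instance = {"size": 2000, "garage": 1}   # the house we want to explain
baseline = {"size": 1500, "garage": 0}   # an assumed reference house

contributions = shapley_values(toy_price_model, instance, baseline)
for name, value in contributions.items():
    print(f"{name}: {value:,.0f}")
```

A useful check on any such calculation is that the contributions add up to the gap between the prediction for the instance and the prediction for the baseline, so the whole difference is attributed to the features.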

So, with a new year, and this new blog post series, we look forward to sharing how we’re using Augmented Intelligence in our products and especially how it is answering questions and solving problems for our customers.