The “thin file” problem, as the name suggests, arises when a lender or insurer lacks sufficient information to make a decision about issuing credit or underwriting an insurance policy.

This lack of information disproportionately affects young people, new immigrants, and expatriates, for whom traditional measures of creditworthiness such as payment history are minimal or non-existent. The problem cuts across socio-economic boundaries: in the United States alone, 45 million consumers live without credit scores, and 50 percent of adults around the world do not have access to credit.

Even among ‘thick file’ customers, for whom credit scores are readily attainable, companies cannot distinguish well within the high-, medium-, and low-risk categories. This lack of granularity stems from data silos and from companies’ hesitation to use non-traditional data sources, a hesitation driven by regulatory risk, consumer privacy concerns, and limited confidence in new methods of determining creditworthiness.

However, considering the plethora of information that companies can leverage to determine creditworthiness, even traditional “thick files” increasingly look “thin.”

Non-traditional data sources are key

Ninety percent of U.S. lenders use FICO credit scores, and the scores are pervasive in our lives, with applications well beyond personal finance. Examples include determining which cell phone contracts are offered, screening for employment eligibility, and even checks by some hospitals before patient admission to gauge likely collections activity.

At its core, FICO relies on classical measures of creditworthiness including debt burden, payment history, types of loans, and the number of credit applications. More recently, companies have sought alternative measures of creditworthiness to capture new opportunities and better distinguish among customers.

Businesses also now leverage various new data sources, including social network structures, social connection attributes, personal attributes (e.g., music preferences, career track), and psychometric tests (e.g., how people respond to adversity).

Thus, a new type of credit score has emerged, one that is inherently social and provides a richer and more nuanced view into a person’s creditworthiness. To realize that potential, big data technologies for information management must be combined with new scoring methods from data science. In particular, companies must have the ability to traverse a social graph, either by leveraging partners or by building the capability internally.
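As a rough illustration of what traversing a social graph can mean in practice, the sketch below runs a breadth-first search to find everyone within a fixed number of hops of an applicant. The adjacency-list layout, the function name connections_within, and the sample names are all assumptions made for this example; they do not describe any particular vendor’s system or scoring method.

```python
from collections import deque


def connections_within(graph, start, max_hops):
    """Return {person: hop_distance} for everyone within max_hops of start.

    graph is a hypothetical adjacency list: {person: [connected people]}.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        # Do not expand beyond the hop limit.
        if distances[person] == max_hops:
            continue
        for neighbor in graph.get(person, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[person] + 1
                queue.append(neighbor)
    del distances[start]  # the applicant is not their own connection
    return distances


# Illustrative data only.
graph = {
    "applicant": ["ana", "ben"],
    "ana": ["applicant", "cal"],
    "ben": ["applicant"],
    "cal": ["ana", "dee"],
    "dee": ["cal"],
}

print(connections_within(graph, "applicant", 2))
```

Capping the number of hops keeps the traversal bounded, which matters when the graph is held by a partner and each expansion is a paid or rate-limited lookup.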

In addition to timely data access, models must be validated, because machine learning-based approaches depend heavily on data sampling for reliable long-term performance. The risks, however, go beyond getting new credit scoring models right: the very algorithms used to glean creditworthiness can also be used to infer private and protected information about individuals. This poses a risk of consumer backlash and regulatory ire that companies must address with increasing openness.
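One common way to probe the sampling dependency is an out-of-time holdout: fit on older accounts and check behavior on newer ones, so the validation sample is not drawn from the same period as the training sample. The sketch below shows only the split and a simple comparison of default rates; the record fields (opened, defaulted), the cutoff date, and the data are hypothetical.

```python
from datetime import date


def out_of_time_split(records, cutoff):
    """Split records into (train, holdout) by account opening date."""
    train = [r for r in records if r["opened"] < cutoff]
    holdout = [r for r in records if r["opened"] >= cutoff]
    return train, holdout


def default_rate(records):
    """Share of records that defaulted, or None for an empty sample."""
    if not records:
        return None
    return sum(r["defaulted"] for r in records) / len(records)


# Illustrative data only.
records = [
    {"opened": date(2014, 3, 1), "defaulted": False},
    {"opened": date(2014, 9, 15), "defaulted": True},
    {"opened": date(2015, 2, 10), "defaulted": False},
    {"opened": date(2015, 6, 30), "defaulted": False},
]

train, holdout = out_of_time_split(records, cutoff=date(2015, 1, 1))
print(len(train), default_rate(train))
print(len(holdout), default_rate(holdout))
```

A large gap between the two default rates is a hint that the population has shifted and that a model fit on the older sample may not hold up over time.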

Engage the consumer

As simple as it sounds, companies should ask customers whether they want non-traditional data used to make enhanced credit decisions. For millions of consumers, the alternative is no or expensive access to credit as well as higher insurance premiums. Importantly, there are responsible ways to capture value and present opportunity to customers.

As a precedent, several forward-looking auto insurance companies provide OBD-II (On-Board Diagnostics) port sensors for free and ask for consent to collect and analyze driving information.

The proposition to the consumer distills to this: allow me to use more data to make a determination; in the best case I will lower your premium, and in the worst case I will keep your premium the same. Although it can be argued that social network data may be more sensitive, the key is consent and a digital experience that makes it worthwhile for the customer to opt in. In this way, analytics acts as a sensor to detect patterns for social scoring, and the output is what fills “thin files.”
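The “never worse off” proposition can be written down as a small rule. The following is a minimal sketch, assuming a hypothetical function offered_premium and a hypothetical data-based quote; it is not how any specific insurer prices policies.

```python
def offered_premium(current_premium, consented, data_based_premium=None):
    """Return the premium to offer under a consent-gated, never-worse rule.

    Without consent (or without a data-based quote) the premium is unchanged.
    With consent, the customer gets the lower of the two figures.
    """
    if not consented or data_based_premium is None:
        return current_premium
    return min(current_premium, data_based_premium)


print(offered_premium(100.0, consented=True, data_based_premium=85.0))
print(offered_premium(100.0, consented=True, data_based_premium=120.0))
print(offered_premium(100.0, consented=False, data_based_premium=85.0))
```

Taking the minimum is what makes the offer safe to accept: extra data can lower the price but can never raise it, and data from a customer who has not consented is ignored entirely.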

As more companies adopt this approach to lead their industries and solve the “thin file” problem, the challenge will shift to prioritizing “thick files,” which will be all that remain.

(About the author: Dr. Armen Kherlopian is a vice president in the Chief Science Office of the Analytics & Research practice at Genpact, a global leader in digitally-powered business process management and services. He has provided high impact data science advisory services for Global Fortune 100 companies and government organizations such as NASA. He is a co-author of the Field Guide to Data Science, which has been downloaded more than 15,000 times. Dr. Kherlopian is a Lindau Nobel Laureate Fellow, and holds a B.S. and M.S. in biomedical engineering with a focus on algorithms from Columbia University, a Ph.D. in biophysics with a focus on machine learning from Cornell University, and completed a fellowship in high performance computing at Princeton University.)
