Got credit-card debt? Big data plots risk of default

March 18, 2015

About $3 trillion of consumer credit was outstanding as of August 2013, $840 billion of which was revolving credit. Nearly half of households carried positive credit-card balances as of December 2012, and the average credit-card debt as of October 2013 was $15,159.

Note: this is my final post about the online course “Tackling the Challenges of Big Data.” See links to my previous posts below, which cover applications ranging from commute optimization to medical diagnostics as well as enabling technologies ranging from GPUs to coresets.

“That’s an astonishing number, when you think about it,” said MIT Professor Andrew W. Lo, “because the typical consumer is paying on average 15% annual interest on that kind of credit card.” The high interest rates reflect the risk the credit card companies are taking. As of the second quarter of 2013, Lo said, 6.7% of all consumer credit outstanding was considered to be a loss. That percentage had reached as high as 10.2% in the first quarter of 2010.
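Lo's point is easier to see in dollars. Here is a back-of-the-envelope sketch. It assumes the average balance quoted above is carried for a full year at a flat 15% with simple interest. Real card interest typically compounds, so the actual cost would be somewhat higher.

```python
# Rough annual interest cost on the average balance quoted above.
# Assumptions: the balance stays constant for a full year and the 15% rate
# is applied once (simple interest, no compounding).
average_balance = 15_159.00   # USD, average credit-card debt (Oct. 2013)
annual_rate = 0.15            # typical annual interest rate cited by Lo

annual_interest = average_balance * annual_rate
print(f"Approximate interest per year: ${annual_interest:,.2f}")
```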

Lo cited the above figures in the online course “Tackling the Challenges of Big Data.” Banks, he said, would like to predict risk cycles and determine when defaults are likely to happen. Standard credit scores are poorly suited to this, he said, because they are insensitive to year-to-year variation.

Lo and colleagues applied big-data and machine-learning concepts to the problem, starting with anonymized consumer credit-card account data and banking transactions from a major U.S. commercial bank. Using a 1% sample spanning a few years, they ended up with 10 terabytes of data, from which they attempted to extract interesting features. Exploratory data analysis is necessary to identify features that are truly interesting rather than spurious, he said.
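The article doesn't say how the 1% sample was drawn. One common approach, shown here purely as an assumption, samples by account rather than by transaction. It hashes each anonymized account ID so the same customers are kept in every month of data. The account IDs and the helper name `in_sample` are hypothetical.

```python
import hashlib

SAMPLE_RATE = 0.01  # keep roughly 1% of accounts

def in_sample(account_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically decide whether an account belongs to the sample.

    Hashing the (anonymized) account ID means the same account is kept or
    dropped in every month of data, so each sampled customer's full
    history stays intact across years.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

# Hypothetical account IDs, for illustration only.
accounts = [f"acct-{i:06d}" for i in range(100_000)]
sampled = [a for a in accounts if in_sample(a)]
print(len(sampled), "of", len(accounts), "accounts sampled")
```

Sampling whole accounts, rather than individual transactions, preserves the per-customer histories that features such as income changes depend on.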

One feature they identified was that customers facing a significant income drop are, not surprisingly, more likely to default. With that as a starting point, they developed a model that took into account individual characteristics like income and ATM transaction history as well as macroeconomic features such as the unemployment rate and inflation levels. Their goal was to forecast who would default within 90 days, based on three months of training data. It’s important that your training and forecast periods not overlap, he said; unsurprisingly, a model can forecast the very data it was trained on quite accurately.
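The article doesn't specify how the income-drop feature was measured. Here is a minimal sketch. It assumes monthly income can be estimated (for example, from direct deposits) and compares the latest month with the average of the preceding months. The six-month lookback, the 30% cutoff, and the `income_drop` helper are all hypothetical.

```python
from statistics import mean

def income_drop(monthly_income: list[float], lookback: int = 6) -> float:
    """Fractional drop of the latest month's income versus the prior average.

    monthly_income: oldest-to-newest monthly income estimates.
    Returns 0.25 for a 25% drop; negative values mean income rose.
    """
    if len(monthly_income) < lookback + 1:
        raise ValueError("need at least lookback + 1 months of history")
    baseline = mean(monthly_income[-(lookback + 1):-1])
    if baseline <= 0:
        return 0.0
    return (baseline - monthly_income[-1]) / baseline

DROP_THRESHOLD = 0.30  # hypothetical cutoff for a "significant" drop

# Hypothetical customer: steady income, then a sharp fall in the latest month.
history = [4000, 4100, 3950, 4050, 4000, 3900, 2500]
drop = income_drop(history)
print(f"income drop: {drop:.1%}, flagged: {drop >= DROP_THRESHOLD}")
# prints "income drop: 37.5%, flagged: True"
```

In a full model this value would be one input among many, alongside other account-level features such as ATM activity and macroeconomic series such as unemployment and inflation.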

Lo and his colleagues used a three-month training period, followed by a one-month delay and a three-month evaluation period in which they measured prediction accuracy. They evaluated 575,000 credit-card accounts, of which only 2.4% went delinquent. “That seems pretty good,” he said, “until you realize that 2.4% happens to be 13,900 consumers that are going to be going delinquent.”
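The timing scheme Lo described can be sketched as three back-to-back, half-open windows. This reads the one-month delay as a gap between the end of training and the start of evaluation, which is an interpretation of the article's description. The start date and helper names are hypothetical.

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Return the first day of the month `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)

def make_windows(start: date, train_months: int = 3,
                 gap_months: int = 1, eval_months: int = 3):
    """Split time into consecutive half-open [start, end) windows:
    training, then a gap, then evaluation."""
    train = (start, add_months(start, train_months))
    gap = (train[1], add_months(train[1], gap_months))
    evaluation = (gap[1], add_months(gap[1], eval_months))
    return train, gap, evaluation

# Hypothetical start date; `start` should be the first day of a month.
train, gap, evaluation = make_windows(date(2008, 1, 1))
for name, (begin, end) in [("train", train), ("gap", gap), ("eval", evaluation)]:
    print(f"{name:5s} {begin.isoformat()} to {end.isoformat()} (exclusive)")

# The training window must end before the evaluation window begins.
assert train[1] <= evaluation[0]
```

Keeping the windows disjoint means the model is scored only on outcomes it never saw during training, which is the overlap problem Lo warned about.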

The analysis is subject to false positives (credit-worthy customers whom the model predicts will default) and false negatives (customers whom the model deemed credit-worthy who do default). A confusion matrix, which tabulates actual outcomes against model predictions, shows the extent of these errors.
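A confusion matrix is straightforward to compute. This sketch uses ten made-up accounts, not data from the study. "Positive" means "predicted to default," which matches the article's definitions of false positives and false negatives.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count outcomes, where True means 'defaulted' (actual) or
    'predicted to default' (predicted)."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(True, True)],    # defaulted, flagged
        "false_negative": counts[(True, False)],  # defaulted, missed
        "false_positive": counts[(False, True)],  # credit-worthy, wrongly flagged
        "true_negative": counts[(False, False)],  # credit-worthy, correctly cleared
    }

# Toy labels for ten hypothetical accounts.
actual    = [True, True, False, False, False, False, False, False, True, False]
predicted = [True, False, True, False, False, False, False, False, True, False]
cm = confusion_matrix(actual, predicted)
print(cm)
# {'true_positive': 2, 'false_negative': 1, 'false_positive': 1, 'true_negative': 6}
```

Looking at each cell separately matters when problem accounts are rare. If 2.4% of accounts go delinquent, a model that flags nobody is 97.6% accurate overall yet catches none of them.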

Adjusting a “classifier threshold” shifts the tradeoff between false positives and false negatives. The threshold can be optimized based on factors such as business objectives, risk appetite, capital requirements, and employment cycle. And, Lo said, the machine-learning model provides predictive information not available from traditional credit scores. Big data enhance consumer risk forecasts, he concluded, and machine learning can add value.
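The following sketch shows how moving the threshold trades one kind of error for the other. It uses a generic threshold sweep over made-up model scores, not the procedure from the paper. The per-error costs are hypothetical stand-ins for the business factors Lo listed.

```python
def errors_at(threshold, scores, actual):
    """Count false positives and false negatives when every account whose
    default score is >= threshold is flagged as a likely defaulter."""
    fp = sum(1 for s, y in zip(scores, actual) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, actual) if s < threshold and y)
    return fp, fn

def best_threshold(scores, actual, cost_fp, cost_fn, candidates):
    """Pick the candidate threshold with the lowest total error cost."""
    def cost(t):
        fp, fn = errors_at(t, scores, actual)
        return fp * cost_fp + fn * cost_fn
    return min(candidates, key=cost)

# Hypothetical model scores (estimated default probabilities) and outcomes.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95]
actual = [False, False, False, True, False, False, True, True, False, True]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

for t in candidates:
    fp, fn = errors_at(t, scores, actual)
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")

# Assumed costs: a missed default costs five times as much as a false alarm.
print("best:", best_threshold(scores, actual, cost_fp=1, cost_fn=5,
                              candidates=candidates))
# prints "best: 0.3"
```

Raising the threshold reduces false positives at the price of more false negatives. On the same toy data, reversing the assumed costs (`cost_fp=5, cost_fn=1`) moves the best threshold up to 0.7, which mirrors how a more conservative risk appetite changes the operating point.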

Lo and his colleagues’ research was published in “Consumer Credit Risk Models via Machine-Learning Algorithms” in the Journal of Banking & Finance.

See these related posts about the online course “Tackling the Challenges of Big Data”:
