How to predict credit risk

As you may recall from the previous chapter, our main objective involves customer data from a German bank. Let us quickly recap the problem scenario to refresh your memory. The bank's customers apply for credit loans on the condition that they repay the amount, with interest, through monthly payments. In a perfect world, credit loans would be handed out freely and people would pay them back without issues. Unfortunately, we do not live in a utopian world, so some customers will default on their loans and be unable to repay them, causing huge losses to the bank. Credit risk analysis is therefore one of the crucial areas that banks focus on, where they analyze detailed information about customers and their credit history.

Coming back to the main question: to predict credit risk, we need to analyze the customer dataset, build a predictive model around it using machine learning algorithms, and predict whether a customer is likely to default on the credit loan and should be labeled as a potential credit risk. The process we will follow is the one we discussed in the previous section. You already have an idea about the data and its features from the previous chapter. We will explore several predictive models, understand the concepts behind how they work, and then build them to predict credit risk. Once we start predicting outcomes, we will compare the performance of the different models and then talk about the business impact and how to derive insights from the prediction outcomes. Do note that predictions are not the final output of the predictive analytics life cycle; the valuable insights we derive from them are the end goal. Businesses such as financial institutions get value only when they use domain knowledge to translate prediction outcomes and raw numbers from machine learning algorithms into data-driven decisions, which, when executed at the right time, help grow the business.

For this scenario, if you remember the dataset well, the feature credit.rating is the response or class variable, which indicates the credit rating of a customer. We will predict this value for customers based on the remaining features, which are the independent variables. For modeling, we will use machine learning algorithms that belong to the supervised learning family. These algorithms are used for prediction and can be divided into two broad categories, classification and regression, which differ in the kind of value they predict.

In the case of regression, the variable to be predicted takes continuous values, as when predicting house prices from features such as the number of rooms, the area of the house, and so on. Regression mostly deals with estimating and predicting a response value based on input features. In the case of classification, the variable to be predicted takes discrete and distinct labels, as when predicting the credit rating of our bank's customers, where the rating can be either good, denoted by 1, or bad, denoted by 0. Classification mostly deals with categorizing and identifying group membership for each data tuple in the dataset.

Algorithms such as logistic regression are special cases of regression models that are used for classification: the algorithm estimates the odds that an observation belongs to one of the classes as a function of the other features. We will be building predictive models using the following machine learning algorithms in this chapter:

  • Logistic regression
  • Support vector machines
  • Decision trees
  • Random forests
  • Neural networks

We have chosen these algorithms to give you a good flavor of the diverse set of supervised machine learning algorithms available, so that you not only learn the concepts behind these models but also learn to build models with them and compare their performance using various techniques. Before we begin our analysis, we will glance over some basic concepts in predictive modeling that are mentioned in this book and discuss some of them in detail, so you get a good idea of what goes on behind the scenes.
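
To make the link between regression and classification a little more concrete, here is a minimal Python sketch of how a logistic regression style model turns a linear score into odds, a probability, and finally a class label of 1 (good) or 0 (bad). The function names, the two numeric features, the coefficients, and the 0.5 threshold are all made up for illustration; none of them are fitted to, or taken from, the German credit dataset.

```python
import math


def logistic(z):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_credit_rating(features, weights, intercept, threshold=0.5):
    """Return (probability of a good rating, odds of a good rating, label).

    The label is 1 (good) when the probability reaches the threshold,
    otherwise 0 (bad).
    """
    # Linear combination of the input features: this is the "regression" part.
    z = intercept + sum(w * x for w, x in zip(weights, features))
    p_good = logistic(z)
    # Odds of a good rating; algebraically this equals exp(z).
    odds = p_good / (1.0 - p_good)
    # Thresholding the probability is the "classification" part.
    label = 1 if p_good >= threshold else 0
    return p_good, odds, label


# Hypothetical, hand-picked values (assumptions, not learned from data).
weights = [-0.03, 0.8]
intercept = 0.5
customer = [24, 1]  # two illustrative numeric feature values

p_good, odds, label = predict_credit_rating(customer, weights, intercept)
print(f"P(good) = {p_good:.3f}, odds = {odds:.3f}, predicted rating = {label}")
```

In a real model, the weights and intercept would be estimated from the training data rather than chosen by hand, and the threshold could be tuned to reflect the relative cost of each kind of misclassification.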
