Model comparison and selection

We have explored several machine learning techniques and built a number of models to predict the credit ratings of customers. The question now is which model we should select and how the models compare with one another. Our test data contains 130 customers with a bad credit rating (0) and 270 customers with a good credit rating (1).

Earlier, we talked about using domain knowledge and business requirements after modeling to interpret results and make decisions. Our decision now is to choose the model that maximizes profits and minimizes losses for the German bank. Let us consider the following conditions:

  • If we incorrectly predict a customer with a bad credit rating as good, the bank loses the whole credit amount lent, because the customer will default on the payment. The loss is therefore 100%, which we denote as -1 for ease of calculation.
  • If we correctly predict a customer with a bad credit rating as bad, we correctly deny the credit loan, so there is neither a loss nor a profit.
  • If we correctly predict a customer with a good credit rating as good, we correctly grant the credit loan. The bank charges interest on the money lent, so let us assume a profit of 30% from the interest the customer pays back monthly. The profit is therefore denoted as 30%, or +0.3, for ease of calculation.
  • If we incorrectly predict a customer with a good credit rating as bad, we incorrectly deny the credit loan, but there is neither a profit nor a loss in this case.
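The four conditions above reduce to a simple payoff rule: only approved loans (customers predicted as good) affect the bank's result. As a rough sketch, assuming every loan is for the same unit amount, the total gain for a set of predictions could be computed as follows. The function name `total_gain` and the two baseline prediction vectors are illustrative assumptions, not part of the original analysis.

```python
# Payoff per unit of credit, taken from the four conditions above.
# Class labels follow the text: 1 = good credit rating, 0 = bad.
PROFIT_GOOD_APPROVED = 0.3  # good customer correctly predicted as good
LOSS_BAD_APPROVED = 1.0     # bad customer incorrectly predicted as good
# Customers predicted as bad are denied the loan and contribute 0.


def total_gain(actual, predicted):
    """Total gain in loan units, assuming one unit of credit per customer."""
    pairs = list(zip(actual, predicted))
    good_approved = sum(1 for a, p in pairs if a == 1 and p == 1)
    bad_approved = sum(1 for a, p in pairs if a == 0 and p == 1)
    return PROFIT_GOOD_APPROVED * good_approved - LOSS_BAD_APPROVED * bad_approved


# Test-set composition from the text: 130 bad (0) and 270 good (1) customers.
actual = [0] * 130 + [1] * 270

# Two hypothetical reference points, not models from this chapter:
approve_all = [1] * len(actual)  # lend to everyone
perfect = list(actual)           # every prediction correct

print(round(total_gain(actual, approve_all), 2))
print(round(total_gain(actual, perfect), 2))
```

Under these assumptions, lending to everyone yields 270 × 0.3 − 130 = −49 units, while a perfect classifier yields 270 × 0.3 = 81 units, so any real model's total gain on this test set falls at or below 81.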

Keeping these conditions in mind, we will build a comparison table for the various models, including some of the metrics we calculated earlier for the best model from each machine learning algorithm. Remember that when all the performance metrics and business requirements are taken into account, no single model is best on every count. Each model has its own strengths, as the following analysis shows:

[Table: Model comparison and selection]

The cells highlighted in the preceding table show the best performance for each metric. As we mentioned earlier, there is no single best model, so we have listed the models that performed best against each metric. Considering the total overall gain, the decision tree seems to be the model of choice. However, this assumes that the credit loan amount requested is the same for every customer.

If customers request loans of different amounts, this notion of total gain no longer allows a fair comparison, because both the profit earned and the loss incurred vary from one loan to another. That analysis is somewhat complex and beyond the scope of this chapter, but we will briefly mention how it can be computed. Recall that there is a credit.amount feature, which specifies the credit amount requested by the customer. Since we already have the customer numbers in the training data, we can join the rated customers with their requested amounts, sum the amounts for which profits and losses are incurred, and so obtain the bank's total gain for each method.
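The amount-weighted calculation described above might look like the following minimal sketch. It applies the same payoff rule as before (+30% of the amount for a good customer who is approved, a loss of the full amount for a bad customer who is approved, and zero for denied loans). The function name `weighted_gain`, the tuple layout, and the four sample customers are hypothetical and invented purely for illustration. In practice the amounts would come from the credit.amount feature.

```python
def weighted_gain(records, profit_rate=0.3):
    """Total gain when each customer requests a different credit amount.

    records: iterable of (actual, predicted, credit_amount) tuples,
    with 1 = good credit rating and 0 = bad, as in the text.
    """
    gain = 0.0
    for actual, predicted, amount in records:
        if predicted != 1:
            continue  # loan denied: neither profit nor loss
        if actual == 1:
            gain += profit_rate * amount  # interest earned on a repaid loan
        else:
            gain -= amount  # full amount lost to default
    return gain


# Hypothetical customers (actual, predicted, credit amount), not real data.
records = [
    (1, 1, 1000),  # good, approved: profit of 30% of 1000
    (0, 1, 5000),  # bad, approved: loss of 5000
    (1, 0, 2000),  # good, denied: no effect
    (0, 0, 800),   # bad, denied: no effect
]

print(round(weighted_gain(records), 2))
```

Running this once per model, with that model's predictions in the second position of each tuple, would give one total gain per method. Note how a single large defaulted loan can outweigh the profit from several small repaid ones, which is why the constant-amount comparison can be misleading.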
