Chapter 5. Credit Risk Detection and Prediction – Descriptive Analytics

In the last two chapters, you saw some interesting problems revolving around the retail and e-commerce domains. You now know how to detect and predict shopping trends from shopping patterns, as well as how to build recommendation systems. As you may remember from Chapter 1, Getting started with R and Machine Learning, the applications of machine learning are diverse, and we can apply the same concepts and techniques to solve a wide variety of real-world problems. We will be tackling a completely new problem here, but hold on to what you have learnt, because several of those concepts will come in handy soon!

In the next couple of chapters, we will be tackling a new problem related to the financial domain. We will be looking at customers of a particular German bank who could be credit risks for the bank, based on data that has been previously collected. We will perform descriptive and exploratory analysis on this data to highlight potential features in the dataset and to examine their relationship with credit risk. In the next step, we will build predictive models using machine learning algorithms and these data features to detect and predict customers who could be potential credit risks. You may remember that the two main things we need for this kind of analysis remain unchanged: data and algorithms.

You might be surprised to know that risk analysis is one of the top focus areas of financial organizations, including banks, investment firms, insurance firms, and brokerage firms. Each of these organizations often has dedicated teams for solving problems revolving around risk analysis. Frequently analyzed types of risk include credit risk, sales risk, fraud-related risk, and many more.

In this chapter, we will be focusing on the following topics:

  • Descriptive analytics of our credit risk dataset
  • Domain knowledge of the credit risk problem
  • Detailed analysis of dataset features
  • Exploratory analysis of the data
  • Visualizations on various data features
  • Statistical tests to determine feature significance

Always remember that domain knowledge is essential before solving any machine learning problem; otherwise, we will end up applying random algorithms and techniques blindly, which may not give the right results.

Types of analytics

Before we start tackling our next challenge, it will be useful to get an idea of the different types of analytics that make up the data science domain. We use a variety of data mining and machine learning techniques to solve different data problems. However, depending on the mechanism of the technique and its end result, we can broadly classify analytics into four different types, which are explained next:

  • Descriptive analytics: This is what we use when we have some data to analyze. We start by looking at the different attributes of the data, extract meaningful features, and use statistics and visualizations to understand what has already happened. The main aim of descriptive analytics is to get a broad idea of what kind of data we are dealing with and to summarize what has happened in the past. Almost 80% of all analytics in businesses today is descriptive.
  • Diagnostic analytics: This is sometimes clubbed together with descriptive analytics. Here the main objective is to delve deeper into the data to find specific patterns and answer questions such as "why did this occur?" Usually, it involves root-cause analysis to find out why something happened and which main factors were involved in its occurrence. Sometimes techniques such as regression modeling help in achieving this.
  • Predictive analytics: This step follows descriptive and diagnostic analytics. Once you have a good understanding of your data and of the relationships between its features, you build predictive models from the data using machine learning algorithms to predict what might happen in the future. Do remember that predictive modeling can only predict what might happen in the future, because all models are probabilistic in nature and nothing is 100 percent certain.
  • Prescriptive analytics: This is the final step in any analytics pipeline. It applies once you have built consistent predictive models with a good flow of clean data, so that you are able to predict what might happen in the future. You can then build systems that use these predictions to prescribe actions you might take to improve your business. Do remember that you need working predictive models with good data and an excellent feedback mechanism to achieve this.

Most organizations do a lot of descriptive analytics and some amount of predictive analytics. However, it is really difficult to implement prescriptive analytics because of ever-changing business conditions and data streams and the problems associated with them, the most common being data sanitization issues. We will be touching upon descriptive analytics in this chapter before moving on to predictive analytics in the next chapter to solve our problem related to credit risk analytics.
