A convenient plotting method lets us see the relative importance of the predictors at a glance. It is provided by the varImpPlot() function:
varImpPlot(random_forest)
As you can see, we get two relevant messages here, one related to the general behavior of our metrics and one specific to our data:
- The accuracy-based and Gini-based measures tend to agree: if you rank the variables by importance using either one, you get nearly the same order.
- Random forest confirms the conclusions drawn from the other models, namely that company revenues and the ROS index are relevant variables for predicting whether a customer will default.
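If you want to check the first point numerically rather than by eye, you can extract the scores behind the plot with importance() and compare the two rankings. What follows is only a minimal sketch: it fits a toy forest on the built-in iris data, not our random_forest object, and toy_forest, imp, accuracy_rank, gini_rank, and rho are names I made up for illustration. It also assumes the forest is fitted with importance = TRUE, which is needed for the accuracy-based measure to be computed.

```r
library(randomForest)

set.seed(42)  # make the toy fit reproducible

# Toy model on the built-in iris data (an assumption for illustration,
# not the chapter's random_forest object).
# importance = TRUE is required to get the accuracy-based measure.
toy_forest <- randomForest(Species ~ ., data = iris, importance = TRUE)

# Matrix of importance scores, one row per predictor
imp <- importance(toy_forest)

# Rank predictors by each measure (rank 1 = most important)
accuracy_rank <- rank(-imp[, "MeanDecreaseAccuracy"])
gini_rank     <- rank(-imp[, "MeanDecreaseGini"])

# Side-by-side view of the two rankings
ranking <- data.frame(accuracy_rank, gini_rank)
print(ranking[order(ranking$accuracy_rank), ])

# Spearman correlation between the rankings: values near 1 mean
# the two measures order the predictors in almost the same way
rho <- cor(accuracy_rank, gini_rank, method = "spearman")
print(rho)
```

To run the same check on our model, replace toy_forest with random_forest, provided it was trained with importance = TRUE.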
We are now done with random forests, and it is time for the final clash: combining the results from the different models to define a list of counterparts that have probably defaulted. But first, I want to share with you the random forest cheat sheet, as I did for all of the other models.