Interpreting confusion matrix results

How do you think you could interpret those results? Looking at all of the computed metrics, we generally see that random forest is performing better than the other two models, but which metrics should we focus on? As is always the case, to answer this question we should go back to the objective of our analysis: we want to predict, from a list of customers, which ones have probably defaulted. Since our output will then be employed by the internal audit department to perform further analyses, we can afford the inconvenience of selecting a counterpart and then discovering that it has not actually defaulted. On the other hand, failing to select a counterpart that has actually defaulted could constitute a serious problem for the following steps of the analysis.

We are saying here that we care more about type II errors, that is, false negatives, than about type I errors, that is, false positives, where positive here is the prediction of a default. That is why simply considering accuracy, which measures overall how good a model is at avoiding both type I and type II errors, would not be the best choice. Going back to the list of computed metrics, which do you think is the best one to correctly take type II errors into consideration? Among the most relevant for us, we should necessarily include the metric that measures exactly how many of the real positives were detected by the model, and therefore also how many were missed. This quantity, TP / (TP + FN), is more commonly called recall or sensitivity; in what follows we refer to it as precision.
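
To make the difference between these metrics concrete, here is a minimal sketch in R. The four counts are purely hypothetical and are not taken from our models; they only illustrate how the quantities relate. Note that the share of real positives detected, TP / (TP + FN), is commonly called recall or sensitivity, whereas precision in the standard sense is TP / (TP + FP).

```r
# Illustrative counts only (hypothetical, NOT taken from our three models).
# "Positive" means the model predicts a default.
tp <- 90   # defaulted customers correctly flagged
fn <- 10   # defaulted customers missed (type II errors)
fp <- 30   # non-defaulted customers wrongly flagged (type I errors)
tn <- 870  # non-defaulted customers correctly left out

accuracy  <- (tp + tn) / (tp + tn + fp + fn)  # penalizes both error types
recall    <- tp / (tp + fn)                   # share of real defaults detected
precision <- tp / (tp + fp)                   # share of flagged customers that really defaulted

round(c(accuracy = accuracy, recall = recall, precision = precision), 2)
```

With these made-up counts, accuracy is 0.96 even though one real default in ten is missed (recall of 0.9), which is why accuracy alone can hide the type II errors we care about.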

Using the counts from the printed confusion matrices, we can now produce a simple plot to compare the three models on this metric:

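library(ggplot2)

# Share of real positives detected by each model,
# computed from the printed confusion matrices
data <- data.frame(model = c("logistic",
                             "support_vector",
                             "random_forest"),
                   precision = c(8352 / (164 + 8352),
                                 8356 / (160 + 8356),
                                 7923 / (7923 + 593)))

# Bar chart of the metric per model, labelled with the rounded value
ggplot(data = data, aes(x = model, y = precision, label = round(precision, 2))) +
  geom_bar(stat = 'identity') +
  geom_text()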

As you can see, on the precision index our random forest performs slightly worse than the other two models, although all three perform quite well overall. Let's now see whether combining the results from all three models leads to an increase in performance in terms of precision.
