Multiclass classification

There are a number of approaches to learning in multiclass problems. Techniques such as random forests and discriminant analysis handle multiple classes natively, while some techniques and/or packages will not, for example, generalized linear models with glm() in base R. As of this writing, the caretEnsemble package, unfortunately, will not work with multiclass problems. However, the Machine Learning in R (mlr) package does support multiple classes as well as ensemble methods. If you are familiar with scikit-learn for Python, one could say that mlr endeavors to provide the same functionality for R. The mlr and caret-based packages are quickly becoming my favorites for almost any business problem. I intend to demonstrate how powerful the package is on a multiclass problem, then conclude by showing how to build an ensemble on the Pima data.
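
Before the full demonstration, here is a minimal sketch of the basic mlr workflow on a multiclass problem. It uses R's built-in iris data purely for illustration, and the learner name classif.randomForest assumes the randomForest package is installed; none of the settings here are the ones we will tune later:

    # a minimal, illustrative mlr workflow: task -> learner -> train -> predict
    library(mlr)

    # define a classification task on the three-class iris data
    iris.task <- makeClassifTask(data = iris, target = "Species")

    # a random forest learner that returns class probabilities
    rf <- makeLearner("classif.randomForest", predict.type = "prob")

    # fit the model and evaluate accuracy on the task
    rf.mod <- train(rf, iris.task)
    rf.pred <- predict(rf.mod, task = iris.task)
    performance(rf.pred, measures = acc)

The same task/learner/train/predict pattern carries through the rest of the mlr examples, including tuning and the wrapper functions discussed next.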

For the multiclass problem, we will look at how to tune a random forest and then examine how to take a GLM and turn it into a multiclass learner using the "one versus rest" technique. With this technique, we build a binary probability prediction for each class versus all the others, then combine those predictions to assign each observation its final class. The technique allows you to extend any binary classification method to multiclass problems, and it can often outperform dedicated multiclass learners.
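
As a rough sketch of how one-versus-rest looks in mlr, the following wraps a binary logistic regression learner, classif.logreg, in makeMulticlassWrapper() so that it can handle a three-class task; the iris data and the default settings are placeholders for illustration rather than the exact code we will use later:

    # an illustrative one-versus-rest sketch using mlr's multiclass wrapper
    library(mlr)

    iris.task <- makeClassifTask(data = iris, target = "Species")

    # a binary learner: classif.logreg calls glm() with a binomial family
    logreg <- makeLearner("classif.logreg")

    # the wrapper fits one binary model per class versus all the others,
    # then combines them to assign each observation a single predicted class
    ovr <- makeMulticlassWrapper(logreg, mcw.method = "onevsrest")

    ovr.mod <- train(ovr, iris.task)
    ovr.pred <- predict(ovr.mod, task = iris.task)
    performance(ovr.pred, measures = acc)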

One quick note: don't confuse the terminology of multiclass and multilabel. In the former, an observation can be assigned to one and only one class, while in the latter, it can be assigned to multiple classes. An example of the latter is a text document that could be labeled as both politics and humor. We will not cover multilabel problems in this chapter.
