More Classification Techniques - K-Nearest Neighbors and Support Vector Machines

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
                                                                                                                       - H.G. Wells

In Chapter 3, Logistic Regression and Discriminant Analysis, we discussed using logistic regression to determine the probability that a predicted observation belongs to a categorical response, which is what we refer to as a classification problem. Logistic regression was just the beginning of classification methods; there are a number of additional techniques that we can use to improve our predictions.

In this chapter, we will delve into two nonlinear techniques: K-Nearest Neighbors (KNN) and Support Vector Machines (SVM). These techniques are more sophisticated than what we've discussed earlier because the assumption of linearity can be relaxed: a linear combination of the features is no longer needed to define the decision boundary. Be forewarned, though, that this does not always translate to superior predictive ability. Additionally, these models can be a bit problematic for business partners to interpret, and they can be computationally inefficient. When used wisely, they provide a powerful complement to the other tools and techniques discussed in this book. They can be used for continuous outcomes in addition to classification problems; however, for the purposes of this chapter, we will focus only on the latter.

After a high-level background on the techniques, we will lay out the business case and then put both of them to the test in order to determine which of the two performs better, starting with KNN.
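Before the full treatment later in the chapter, the core KNN idea can be made concrete with a minimal sketch: a new observation is assigned the majority class among its k closest training points. The sketch below is illustrative only, written in plain Python with toy data (the function and data are not from the book, which develops the technique with its own tools):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters, labeled 0 and 1
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
y = [0, 0, 0, 1, 1, 1]

print(knn_predict(X, y, (1.1, 0.9)))  # near the first cluster  -> 0
print(knn_predict(X, y, (5.1, 5.0)))  # near the second cluster -> 1
```

Note that no model is fit in advance; the prediction is computed directly from the stored training data at query time, which is why KNN can become computationally expensive as the training set grows.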
