K-Nearest Neighbors and Naive Bayes

In the previous chapter, we learned about computationally intensive methods. In contrast, this chapter discusses simple methods to balance them out! We cover two techniques here: k-nearest neighbors (KNN) and Naive Bayes. Before touching on KNN, we explain the curse of dimensionality with a simulated example. Subsequently, a breast cancer medical example is used to predict whether a cancer is malignant or benign using KNN, as previewed in the sketch that follows.
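
The KNN task can be previewed with a minimal sketch, assuming scikit-learn's bundled Wisconsin breast cancer data stands in for the chapter's dataset; n_neighbors=5 is an illustrative choice rather than a tuned value:

    # Minimal KNN sketch: classify tumors as malignant or benign
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier

    data = load_breast_cancer()
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.3, random_state=42)

    # Scale features first: KNN's distance computation otherwise lets
    # large-magnitude features dominate the neighbor search
    scaler = StandardScaler().fit(X_train)
    knn = KNeighborsClassifier(n_neighbors=5)  # illustrative k, not tuned
    knn.fit(scaler.transform(X_train), y_train)
    print('Test accuracy:', knn.score(scaler.transform(X_test), y_test))

Note that scaling addresses unequal feature magnitudes; the curse of dimensionality, covered by the chapter's simulated example, is a separate problem in which distances between points become less discriminative as the number of features grows.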

In the final section of the chapter, Naive Bayes is explained with spam/ham message classification, which also involves applying natural language processing (NLP) techniques consisting of the following basic preprocessing and modeling steps; a minimal sketch of the full pipeline appears after the list:

  • Punctuation removal
  • Word tokenization and lowercase conversion
  • Stopword removal
  • Stemming
  • Lemmatization with POS tagging
  • Conversion of words into TF-IDF vectors to create a numerical representation of the text
  • Application of the Naive Bayes model to the TF-IDF vectors to predict whether a message is spam or ham, on both the train and test data
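
Here is a minimal sketch of that pipeline, assuming NLTK and scikit-learn are installed; the two toy messages are hypothetical stand-ins for the chapter's spam/ham dataset:

    import string
    import nltk
    from nltk.corpus import stopwords, wordnet
    from nltk.stem import PorterStemmer, WordNetLemmatizer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # One-time NLTK resource downloads (resource names can vary by version):
    # nltk.download('punkt'); nltk.download('stopwords')
    # nltk.download('wordnet'); nltk.download('averaged_perceptron_tagger')

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words('english'))

    def penn_to_wordnet(tag):
        # Map Penn Treebank POS tags to WordNet POS codes for the lemmatizer
        if tag.startswith('J'):
            return wordnet.ADJ
        if tag.startswith('V'):
            return wordnet.VERB
        if tag.startswith('R'):
            return wordnet.ADV
        return wordnet.NOUN

    def preprocess(message):
        # Punctuation removal
        text = message.translate(str.maketrans('', '', string.punctuation))
        # Word tokenization and lowercase conversion
        tokens = [t.lower() for t in nltk.word_tokenize(text)]
        # Stopword removal
        tokens = [t for t in tokens if t not in stop_words]
        # Stemming is listed above as its own step; it is left inactive here
        # so the POS tagger sees dictionary words rather than stems:
        # tokens = [stemmer.stem(t) for t in tokens]
        # Lemmatization with POS tagging
        tokens = [lemmatizer.lemmatize(t, penn_to_wordnet(tag))
                  for t, tag in nltk.pos_tag(tokens)]
        return ' '.join(tokens)

    # Hypothetical toy messages standing in for the chapter's dataset
    messages = ['WINNER!! Claim your free prize now',
                'Are we still on for lunch today?']
    labels = ['spam', 'ham']

    # Conversion of words into TF-IDF vectors
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform([preprocess(m) for m in messages])

    # Naive Bayes on the TF-IDF vectors
    model = MultinomialNB()
    model.fit(X, labels)
    print(model.predict(vectorizer.transform([preprocess('Free prize, claim now!')])))

In practice the messages would be split into train and test sets before fitting, and either stemming or lemmatization (rather than both) is usually chosen; the sketch keeps every listed step visible as a line of code or a comment.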