Summary

In this chapter, we learned about classification using naive Bayes. This algorithm constructs tables of probabilities that are used to estimate the likelihood that new examples belong to various classes. The probabilities are calculated using a formula known as Bayes' theorem, which specifies how dependent events are related. Although Bayes' theorem can be computationally expensive to process, a simplified version that makes so-called "naive" assumptions about the independence of features is capable of being used with extremely large datasets.

The naive Bayes classifier is often used for text classification. To illustrate its effectiveness, we employed naive Bayes on a classification task involving filtering spam SMS messages. Preparing the text data for analysis required the use of specialized R packages for text processing and visualization. Ultimately, the model was able to classify nearly 98 percent of all SMS messages correctly as spam or ham.

In the next chapter, we will examine a set of two more machine learning methods. Each performs classification by partitioning data into groups of similar values.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset