This chapter introduces the most common and simple generative classifiers—Naïve Bayes. As a reminder, generative classifiers are supervised learning algorithms that attempt to fit a joint probability distribution, p(X,Y), of two events X and Y, representing two sets of observed and hidden (or latent) variables, x and y.
In this chapter, you will learn, and hopefully appreciate, the simplicity of the Naïve Bayes technique through a concrete example. Then, you will build a Naïve Bayes classifier to predict stock price movement, given some prior technical indicators in the analysis of financial markets.
Finally, you will apply Naïve Bayes to text mining by predicting stock prices, using financial news feed and press releases.
Let's start with a refresher course in basic statistics.
Given two events or observations, X and Y, the joint probability of X and Y is defined as . If the observations X and Y are not related, an assumption known as conditional independence, then p(X,Y) = p(X).p(Y). The conditional probability of event Y, given X, is defined as p(Y|X)=p(X,Y)/p(X).
These two definitions are quite simple. However, probabilistic reasoning can be difficult to read in the case of large numbers of variables and sequences of conditional probabilities. As a picture is worth a thousand words, researchers introduced graphical models to describe a probabilistic relation between random variables [5:1].
There are two categories of graphs, and therefore, graphical models:
Directed graphical models are directed acyclic graphs that have been introduced to:
A Bayesian network is a directed graphical model defining a join probability over a set of variables [5:2].
The two join probabilities, p(X,Y) and p(X,Y,Z), can be graphically modeled using Bayesian networks, as follows:
The conditional probability p(Y|X) is represented by an arrow directed from the output (or symptoms) Y to the input (or cause) X. Elaborate models can be described as a large directed graph between variables.
Here is an example of a real-world Bayesian network; the functioning of a smoke detector:
This representation may be a bit counterintuitive, as the vertices are directed from the symptoms (or output) to the cause (or input). Directed graphical models are used in many different models, besides Bayesian networks [5:3].
The Naïve Bayes models are probabilistic models based on the Bayes's theorem under the assumption of features independence, as mentioned in the Generative models section in Chapter 1, Getting Started.