Chapter 5. Naïve Bayes Classifiers

This chapter introduces the simplest and most common generative classifier: Naïve Bayes. As a reminder, generative classifiers are supervised learning algorithms that attempt to fit a joint probability distribution, p(X,Y), over two sets of variables, X and Y, representing the observed and the hidden (or latent) variables, respectively.

In this chapter, you will learn, and hopefully appreciate, the simplicity of the Naïve Bayes technique through a concrete example. Then, you will build a Naïve Bayes classifier to predict stock price movement, given some prior technical indicators in the analysis of financial markets.

Finally, you will apply Naïve Bayes to text mining by predicting stock prices from financial news feeds and press releases.

Probabilistic graphical models

Let's start with a refresher course in basic statistics.

Given two events or observations, X and Y, the joint probability of X and Y is defined as p(X,Y) = p(X|Y).p(Y). If the observations X and Y are not related, an assumption known as independence, then p(X,Y) = p(X).p(Y). The conditional probability of event Y, given X, is defined as p(Y|X) = p(X,Y)/p(X).
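These definitions can be illustrated with a few lines of Python. The following sketch estimates joint, marginal, and conditional probabilities from a small table of co-occurrence counts; the weather events and counts are purely hypothetical.

```python
# Hypothetical co-occurrence counts: counts[(x, y)] = number of
# observations with X = x and Y = y.
counts = {
    ("rain", "wet"): 30,
    ("rain", "dry"): 10,
    ("sun", "wet"): 5,
    ("sun", "dry"): 55,
}
total = sum(counts.values())

def p_joint(x, y):
    """Joint probability p(X, Y) estimated from counts."""
    return counts[(x, y)] / total

def p_x(x):
    """Marginal p(X), summing the joint over all values of Y."""
    return sum(v for (xi, _), v in counts.items() if xi == x) / total

def p_y_given_x(y, x):
    """Conditional probability p(Y | X) = p(X, Y) / p(X)."""
    return p_joint(x, y) / p_x(x)
```

For the counts above, p_joint("rain", "wet") is 0.3, p_x("rain") is 0.4, and their ratio gives p_y_given_x("wet", "rain") = 0.75.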

These two definitions are quite simple. However, probabilistic reasoning can be difficult to read in the case of large numbers of variables and sequences of conditional probabilities. As a picture is worth a thousand words, researchers introduced graphical models to describe a probabilistic relation between random variables [5:1].

There are two categories of graphs, and therefore, graphical models:

  • Directed graphs such as Bayesian networks
  • Undirected graphs such as conditional random fields (refer to the Conditional random fields section in Chapter 7, Sequential Data Models)

Directed graphical models are directed acyclic graphs that have been introduced to:

  • Provide a simple way to visualize a probabilistic model
  • Describe the conditional dependence (or independence) between variables
  • Represent statistical inference in terms of graphical manipulation

A Bayesian network is a directed graphical model defining a joint probability over a set of variables [5:2].

The two joint probabilities, p(X,Y) and p(X,Y,Z), can be graphically modeled using Bayesian networks, as follows:

[Figure: Examples of probabilistic graphical models]

The conditional probability p(Y|X) is represented by an arrow directed from the output (or symptom) Y to the input (or cause) X. Elaborate models can be described as large directed graphs over many variables.
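A directed model is fully specified by a probability table for each node given its parents. The following sketch encodes the two-node model p(X,Y) = p(X).p(Y|X) with hypothetical tables and verifies that the resulting joint distribution is properly normalized.

```python
# Hypothetical two-node directed model X -> Y, encoded as
# conditional probability tables (CPTs).
p_x = {"true": 0.2, "false": 0.8}          # prior p(X)
p_y_given_x = {                             # CPT p(Y | X)
    "true":  {"true": 0.9, "false": 0.1},
    "false": {"true": 0.3, "false": 0.7},
}

def p_joint(x, y):
    """The joint factorizes along the graph: p(X, Y) = p(X) * p(Y | X)."""
    return p_x[x] * p_y_given_x[x][y]

# The joint probabilities over all assignments must sum to 1.
total = sum(p_joint(x, y) for x in p_x for y in ("true", "false"))
```

The same factorization extends to p(X,Y,Z): each additional node contributes one factor conditioned on its parents in the graph.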

Tip

Metaphor for graphical models

From a software engineering perspective, graphical models visualize probabilistic equations the same way the UML class diagram visualizes object-oriented source code.

Here is an example of a real-world Bayesian network; the functioning of a smoke detector:

  1. A fire may generate smoke.
  2. Smoke may trigger an alarm.
  3. A depleted battery may trigger an alarm.
  4. The alarm may alert the homeowner.
  5. The alarm may alert the fire department.
[Figure: A Bayesian network for smoke detectors]

This representation may be a bit counterintuitive, as the edges are directed from the symptoms (or output) to the cause (or input). Directed graphical models are used in many different models, besides Bayesian networks [5:3].
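The smoke-detector network can answer queries by brute-force enumeration over all variable assignments. The following is a minimal sketch with made-up probabilities (the code follows the causal direction, fire → smoke → alarm, rather than the diagnostic direction described above):

```python
# Hypothetical priors and CPTs for the smoke-detector network.
p_fire = 0.01
p_battery_low = 0.05
p_smoke_given_fire = {True: 0.9, False: 0.01}   # p(smoke | fire)
p_alarm_given = {                               # p(alarm | smoke, battery_low)
    (True, True): 0.99, (True, False): 0.95,
    (False, True): 0.30, (False, False): 0.001,
}

def p_assignment(fire, battery, smoke, alarm):
    """Joint probability of one full assignment, via the chain rule."""
    p = p_fire if fire else 1 - p_fire
    p *= p_battery_low if battery else 1 - p_battery_low
    p *= p_smoke_given_fire[fire] if smoke else 1 - p_smoke_given_fire[fire]
    p *= p_alarm_given[(smoke, battery)] if alarm \
        else 1 - p_alarm_given[(smoke, battery)]
    return p

bools = (True, False)
# Marginal p(alarm): sum out fire, battery, and smoke.
p_alarm_true = sum(p_assignment(f, b, s, True)
                   for f in bools for b in bools for s in bools)
# Diagnostic query p(fire | alarm) = p(fire, alarm) / p(alarm).
p_fire_and_alarm = sum(p_assignment(True, b, s, True)
                       for b in bools for s in bools)
p_fire_given_alarm = p_fire_and_alarm / p_alarm_true
```

Observing the alarm raises the probability of a fire well above its prior of 0.01; this "reasoning from symptom back to cause" is exactly the inference that the diagnostic arrows in the figure depict.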

Note

Plate models

There are several alternate representations of probabilistic models, besides the directed acyclic graph, such as the plate model commonly used for the latent Dirichlet allocation (LDA) [5:4].

The Naïve Bayes models are probabilistic models based on Bayes' theorem under the assumption of feature independence, as mentioned in the Generative models section in Chapter 1, Getting Started.
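Under this independence assumption, the class-conditional likelihood factorizes as p(x1, ..., xn | y) = p(x1|y)...p(xn|y), so training reduces to counting feature values per class. The following is a minimal sketch of this idea on a hypothetical toy dataset (feature names, labels, and the smoothing constant are all illustrative, not taken from the chapter's implementation):

```python
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (features_tuple, label).
    Returns class counts (for the prior) and per-feature value counts."""
    priors = Counter(label for _, label in samples)
    likelihoods = defaultdict(Counter)   # likelihoods[(i, label)][value]
    for features, label in samples:
        for i, value in enumerate(features):
            likelihoods[(i, label)][value] += 1
    return priors, likelihoods

def predict(priors, likelihoods, features):
    """Score each class by p(y) * prod_i p(xi | y); return the argmax."""
    n = sum(priors.values())
    scores = {}
    for label, count in priors.items():
        score = count / n                # prior p(y)
        for i, value in enumerate(features):
            table = likelihoods[(i, label)]
            # Laplace smoothing (+1), assuming binary-valued features
            score *= (table[value] + 1) / (sum(table.values()) + 2)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training set: (price trend, volume) -> trading signal.
samples = [(("up", "high"), "buy"), (("up", "low"), "buy"),
           (("down", "high"), "sell"), (("down", "low"), "sell")]
priors, likelihoods = train(samples)
```

With this data, predict(priors, likelihoods, ("up", "high")) returns "buy": the "up" feature dominates the score, exactly as the independence assumption intends.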
