Getting to know the Bayes theorem

At its core, Naïve Bayes classification is nothing more than keeping track of which feature gives evidence to which class. The way the features are designed determines the model that is used to learn. The so-called Bernoulli model only cares about Boolean features; whether a word occurs once or multiple times in a tweet does not matter. In contrast, the Multinomial model uses word counts as features. For the sake of simplicity, we will use the Bernoulli model to explain how to use Naïve Bayes for sentiment analysis. We will then use the Multinomial model to set up and tune our real-world classifiers.
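To make the difference between the two feature designs concrete, here is a minimal sketch. The function names, the two-word vocabulary, and the sample tweet are made up for illustration, and the whitespace tokenization is a simplifying assumption.

```python
from collections import Counter


def bernoulli_features(tweet, vocabulary):
    """Map each vocabulary word to True if it occurs at least once."""
    words = set(tweet.lower().split())
    return {word: word in words for word in vocabulary}


def multinomial_features(tweet, vocabulary):
    """Map each vocabulary word to the number of times it occurs."""
    counts = Counter(tweet.lower().split())
    return {word: counts[word] for word in vocabulary}


# Hypothetical vocabulary and tweet, chosen only for illustration.
vocabulary = ["awesome", "crazy"]
tweet = "awesome awesome day and a crazy night"

bernoulli = bernoulli_features(tweet, vocabulary)
multinomial = multinomial_features(tweet, vocabulary)
print(bernoulli)    # Boolean occurrence per word
print(multinomial)  # occurrence count per word
```

For this tweet, the Bernoulli features only record that both words are present, while the Multinomial features also record that "awesome" appears twice.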

Let's assume the following meanings for the variables that we will use to explain Naïve Bayes:

Variable    Meaning
$c$         This is the class of a tweet (positive or negative; for this explanation, we ignore the neutral label)
$F_1$       The word "awesome" occurs at least once in the tweet
$F_2$       The word "crazy" occurs at least once in the tweet

During training, we learn the Naïve Bayes model, which is the probability of a class $c$ when we already know the features $F_1$ and $F_2$. This probability is written as $P(c \mid F_1, F_2)$.

Since we cannot estimate $P(c \mid F_1, F_2)$ directly, we apply a trick discovered by Bayes:

$$P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$$

If we substitute $A$ with the probability of both "awesome" and "crazy", and think of $B$ as being our class $c$, we arrive at the relationship that helps us to later retrieve the probability for the data instance belonging to the specified class:

$$P(F_1, F_2) \cdot P(c \mid F_1, F_2) = P(c) \cdot P(F_1, F_2 \mid c)$$

This allows us to express $P(c \mid F_1, F_2)$ by means of the other probabilities:

$$P(c \mid F_1, F_2) = \frac{P(c) \cdot P(F_1, F_2 \mid c)}{P(F_1, F_2)}$$

We could also describe this as follows:

$$\text{posterior} = \frac{\text{prior} \cdot \text{likelihood}}{\text{evidence}}$$

The prior and the evidence are easily determined:

  • $P(c)$ is the prior probability of the class $c$ without knowing about the data. We can estimate this quantity by simply calculating the fraction of all training data instances belonging to that particular class.
  • $P(F_1, F_2)$ is the evidence, or the probability of the features $F_1$ and $F_2$.
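Both quantities can be estimated by counting. The following is a minimal sketch, assuming simple relative-frequency estimates; the five-tweet training set and the helper names are hypothetical and exist only for this illustration.

```python
# Hypothetical toy training set of (tweet, label) pairs.
training_data = [
    ("what an awesome day", "positive"),
    ("awesome and crazy concert", "positive"),
    ("this traffic is crazy", "negative"),
    ("crazy awful service", "negative"),
    ("awesome crazy ride", "positive"),
]


def has_word(tweet, word):
    """Bernoulli feature: does the word occur at least once?"""
    return word in tweet.lower().split()


def estimate_prior(data, label):
    """Fraction of training instances that belong to `label`."""
    return sum(1 for _, y in data if y == label) / len(data)


def estimate_evidence(data, f1, f2):
    """Fraction of training instances whose (F1, F2) values equal (f1, f2)."""
    matches = sum(
        1
        for tweet, _ in data
        if has_word(tweet, "awesome") == f1 and has_word(tweet, "crazy") == f2
    )
    return matches / len(data)


prior_positive = estimate_prior(training_data, "positive")    # 3 of 5 tweets
evidence_both = estimate_evidence(training_data, True, True)  # 2 of 5 tweets
print(prior_positive, evidence_both)
```

With so few tweets the estimates are very coarse; the point is only that neither quantity needs more than a pass over the training data.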

The tricky part is the calculation of the likelihood $P(F_1, F_2 \mid c)$. It is the value describing how likely it is to see the $F_1$ and $F_2$ feature values if we know that the class of the data instance is $c$. To estimate this, we need to do some thinking.
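However the likelihood ends up being estimated, combining it with the prior and the evidence is plain arithmetic. The sketch below uses made-up prior and likelihood values for the two classes, and it assumes the evidence is obtained by summing prior times likelihood over the classes (the law of total probability) rather than by counting.

```python
def posteriors(priors, likelihoods):
    """Return the posterior probability of every class via Bayes' theorem."""
    # Evidence: sum of prior * likelihood over all classes.
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    if evidence == 0:
        raise ValueError("evidence is zero; these feature values never occur")
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}


# Hypothetical values, chosen only for illustration.
priors = {"positive": 0.5, "negative": 0.5}       # prior of each class
likelihoods = {"positive": 0.3, "negative": 0.1}  # likelihood of the features given each class

result = posteriors(priors, likelihoods)
predicted = max(result, key=result.get)  # class with the highest posterior
print(result, predicted)
```

Because the evidence is the same for every class, it only rescales the posteriors so that they sum to one; it does not change which class wins.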
