Using Naïve Bayes to classify

Given a new tweet, the only part left is to calculate the probabilities:

$P(C=\text{pos} \mid F_1, F_2) = \dfrac{P(C=\text{pos}) \cdot P(F_1 \mid C=\text{pos}) \cdot P(F_2 \mid C=\text{pos})}{P(F_1, F_2)}$

$P(C=\text{neg} \mid F_1, F_2) = \dfrac{P(C=\text{neg}) \cdot P(F_1 \mid C=\text{neg}) \cdot P(F_2 \mid C=\text{neg})}{P(F_1, F_2)}$

Then we choose the class $C_{best}$ as the one with the higher probability.

As the denominator, $P(F_1, F_2)$, is the same for both classes, we can simply ignore it without changing the winning class.

Note, however, that we are no longer calculating any real probabilities. Instead, we are estimating which class is more likely given the evidence. This is another reason why Naïve Bayes is so robust: it is not interested in the true probabilities, but only in which class is more likely. In short, we can write:

$c_{best} = \operatorname*{arg\,max}_{c \,\in\, C} \; P(C = c) \cdot P(F_1 \mid C = c) \cdot P(F_2 \mid C = c)$

This simply says that we calculate the part after arg max for all classes of $C$ (pos and neg in our case) and return the class that results in the highest value.
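
To make the decision rule concrete, here is a minimal Python sketch of it. The function name c_best and the dictionary layout for the priors and likelihoods are our own choices for illustration, not anything prescribed by the formula:

```python
def c_best(priors, likelihoods, features):
    """Return the class c maximizing P(C=c) * prod_i P(F_i = f_i | C=c).

    priors:      dict mapping class name -> P(C=c)
    likelihoods: dict mapping class name -> one table per feature,
                 where table[value] = P(F_i = value | C=c)
    features:    observed feature values (f1, f2, ...)
    """
    def score(c):
        # Unnormalized score: prior times the per-feature likelihoods.
        p = priors[c]
        for table, value in zip(likelihoods[c], features):
            p *= table[value]
        return p

    return max(priors, key=score)
```

We will fill this structure with concrete numbers from the toy corpus below.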

But, for the following example, let's stick to real probabilities and do some calculations to see how Naïve Bayes works. For the sake of simplicity, we will assume that Twitter allows only the two aforementioned words, awesome and crazy, and that we have already manually classified a handful of tweets:

Tweet            Class
awesome          Positive tweet
awesome          Positive tweet
awesome crazy    Positive tweet
crazy            Positive tweet
crazy            Negative tweet
crazy            Negative tweet

In this example, we have the word crazy in both a positive and a negative tweet to emulate some of the ambiguity you will often find in the real world (for example, being crazy about soccer versus a crazy idiot).

In this case, we have six total tweets, out of which four are positive and two negative, which results in the following priors:

$P(C=\text{pos}) = \dfrac{4}{6} \approx 0.67 \qquad P(C=\text{neg}) = \dfrac{2}{6} \approx 0.33$

This means that, without knowing anything about the tweet itself, it would be wise to assume that the tweet is positive.
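
As a small sketch of how these priors fall out of the data (the tweets list and variable names are just our own encoding of the preceding table):

```python
from collections import Counter

# The toy corpus from the preceding table: (text, class) pairs.
tweets = [
    ("awesome", "pos"), ("awesome", "pos"), ("awesome crazy", "pos"),
    ("crazy", "pos"), ("crazy", "neg"), ("crazy", "neg"),
]

class_counts = Counter(label for _, label in tweets)
priors = {c: count / len(tweets) for c, count in class_counts.items()}
print(priors)  # {'pos': 0.666..., 'neg': 0.333...}
```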

We're still missing the calculation of $P(F_1 \mid C)$ and $P(F_2 \mid C)$, which are the probabilities for the two features, $F_1$ and $F_2$, conditioned on the class $C$.

This is calculated as the number of tweets in which we have seen the concrete feature, divided by the number of tweets that have been labeled with the class $C$. Let's say we want to know the probability of seeing awesome in a tweet, given that its class is positive; we then have:

$P(F_1 = 1 \mid C=\text{pos}) = \dfrac{3}{4} = 0.75$
This is because out of the four positive tweets, three contained the word awesome. Obviously, the probability of not having awesome in a positive tweet is its complement:

$P(F_1 = 0 \mid C=\text{pos}) = 1 - 0.75 = 0.25$
Similarly, for the rest (omitting the case that a word does not occur in a tweet):

$P(F_2 = 1 \mid C=\text{pos}) = \dfrac{2}{4} = 0.5$

$P(F_1 = 1 \mid C=\text{neg}) = \dfrac{0}{2} = 0$

$P(F_2 = 1 \mid C=\text{neg}) = \dfrac{2}{2} = 1$
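
Continuing the sketch, these likelihood tables can be estimated directly from the same toy corpus (the layout of likelihoods matches the c_best helper defined earlier):

```python
# Per-class feature likelihoods, estimated as the fraction of tweets of
# class c that contain the word: P(F_i = 1 | C=c).
words = ["awesome", "crazy"]  # F1 and F2

likelihoods = {}
for c in class_counts:
    class_tweets = [text for text, label in tweets if label == c]
    tables = []
    for word in words:
        p = sum(word in text.split() for text in class_tweets) / len(class_tweets)
        tables.append({1: p, 0: 1 - p})
    likelihoods[c] = tables

print(likelihoods["pos"][0][1])  # P(F1=1 | C="pos") = 0.75
print(likelihoods["neg"][1][1])  # P(F2=1 | C="neg") = 1.0
```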
For the sake of completeness, we will also compute the evidence so that we can see real probabilities in the following example tweets. For two concrete values of $F_1$ and $F_2$, we can calculate the evidence as follows:

$P(F_1, F_2) = P(F_1, F_2 \mid C=\text{pos}) \cdot P(C=\text{pos}) + P(F_1, F_2 \mid C=\text{neg}) \cdot P(C=\text{neg})$

Using the naïve independence assumption to factor $P(F_1, F_2 \mid C)$, this leads to the following values:

$P(F_1=1, F_2=0) = 0.75 \cdot 0.5 \cdot 0.67 + 0 \cdot 0 \cdot 0.33 \approx 0.25$

$P(F_1=0, F_2=1) = 0.25 \cdot 0.5 \cdot 0.67 + 1 \cdot 1 \cdot 0.33 \approx 0.42$

$P(F_1=1, F_2=1) = 0.75 \cdot 0.5 \cdot 0.67 + 0 \cdot 1 \cdot 0.33 \approx 0.25$
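
These values can be reproduced with a small helper that sums the class-conditional products over both classes; evidence is a name we introduce here purely for illustration:

```python
def evidence(features):
    # P(F1, F2) = sum over classes c of P(C=c) * P(F1|C=c) * P(F2|C=c)
    total = 0.0
    for c in priors:
        p = priors[c]
        for table, value in zip(likelihoods[c], features):
            p *= table[value]
        total += p
    return total

print(round(evidence((1, 0)), 2))  # 0.25
print(round(evidence((0, 1)), 2))  # 0.42
print(round(evidence((1, 1)), 2))  # 0.25
```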

Now we have all the data we need to classify new tweets. The only work left is to parse the tweet and extract its features:

Tweet            F1   F2   Class probabilities                                       Classification
awesome          1    0    P(C=pos|F1=1, F2=0) = 1,   P(C=neg|F1=1, F2=0) = 0        Positive
crazy            0    1    P(C=pos|F1=0, F2=1) ≈ 0.2, P(C=neg|F1=0, F2=1) ≈ 0.8      Negative
awesome crazy    1    1    P(C=pos|F1=1, F2=1) = 1,   P(C=neg|F1=1, F2=1) = 0        Positive
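
Putting the pieces of the sketch together, we can reproduce the table's posteriors; classify_tweet is again our own illustrative helper built on the priors, likelihoods, and evidence defined above:

```python
def classify_tweet(tweet):
    # Turn the raw text into the feature vector (F1, F2), then compute the
    # normalized posterior P(C=c | F1, F2) for each class.
    features = tuple(int(word in tweet.split()) for word in words)
    posteriors = {}
    for c in priors:
        p = priors[c]
        for table, value in zip(likelihoods[c], features):
            p *= table[value]
        posteriors[c] = p / evidence(features)
    return posteriors

for tweet in ["awesome", "crazy", "awesome crazy"]:
    print(tweet, classify_tweet(tweet))
# awesome       -> {'pos': 1.0, 'neg': 0.0}
# crazy         -> {'pos': 0.2, 'neg': 0.8}
# awesome crazy -> {'pos': 1.0, 'neg': 0.0}
```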


So far, so good. The classification of trivial tweets seems to assign correct labels to the tweets. The question remains, however, how we should treat words that did not occur in our training corpus. After all, with the preceding formula, new words will always be assigned a probability of zero.
