Naive Bayes classification

The previous example used a single word, Lottery. Here we add two more words, Million and Unsubscribe, to show how actual classifiers work. Let us construct the likelihood table for the appearance of the three words (W1, W2, and W3), as shown in the following table for 100 emails:

When a new message is received, the posterior probability is calculated to determine whether the email is spam or ham. Assume we have an email that contains the terms Lottery and Unsubscribe but not the word Million. Given these details, what is the probability that it is spam?

Using Bayes' theorem, we can define the problem with Lottery = Yes, Million = No, and Unsubscribe = Yes:
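Written out, and assuming that W1, W2, and W3 stand for Lottery, Million, and Unsubscribe respectively, with ¬ marking a word that is absent, Bayes' theorem for this message would read roughly as follows:

```latex
P(\text{Spam} \mid W_1 \cap \neg W_2 \cap W_3)
  = \frac{P(W_1 \cap \neg W_2 \cap W_3 \mid \text{Spam}) \, P(\text{Spam})}
         {P(W_1 \cap \neg W_2 \cap W_3)}
```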

Solving the preceding equation has high computational complexity because the words depend on each other. As more words are added, the number of possible intersecting events grows exponentially, and a huge amount of memory is needed to process them all. The practical workaround is to assume that the words are independent of one another given the class (class-conditional independence), which is where the Naive prefix in the Naive Bayes classifier comes from. When two events are independent, we can write P(A ∩ B) = P(A) * P(B). This form is much easier to compute and requires less memory:
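Under the independence assumption, and with the assumed notation W1 = Lottery, W2 = Million, W3 = Unsubscribe (¬ marks an absent word), the joint likelihood in the numerator splits into one factor per word. A sketch of the resulting spam posterior:

```latex
P(\text{Spam} \mid W_1 \cap \neg W_2 \cap W_3)
  = \frac{P(W_1 \mid \text{Spam}) \, P(\neg W_2 \mid \text{Spam}) \,
          P(W_3 \mid \text{Spam}) \, P(\text{Spam})}
         {P(W_1 \cap \neg W_2 \cap W_3)}
```

Each factor in the numerator can be read from a per-word likelihood table, so the number of quantities to store grows linearly with the number of words rather than exponentially.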

In a similar way, we calculate the probability for ham messages as follows:
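With the same assumed notation (W1 = Lottery, W2 = Million, W3 = Unsubscribe, ¬ for an absent word), the ham posterior would take the matching form, with every spam term swapped for its ham counterpart:

```latex
P(\text{Ham} \mid W_1 \cap \neg W_2 \cap W_3)
  = \frac{P(W_1 \mid \text{Ham}) \, P(\neg W_2 \mid \text{Ham}) \,
          P(W_3 \mid \text{Ham}) \, P(\text{Ham})}
         {P(W_1 \cap \neg W_2 \cap W_3)}
```

The denominator does not depend on the class, so it is the same for spam and ham.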

We then substitute the values from the preceding likelihood table into the equations. Because we are taking the ratio of spam to ham, and both equations share the same denominator, we can simply ignore the denominator terms. The overall likelihood of spam is:
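The substitution step can be sketched in Python. The function name class_score and all numbers below are hypothetical, chosen only to illustrate the arithmetic; they are not the values from the likelihood table. The score is the class prior multiplied by one factor per word: P(word | class) if the word is present, and 1 - P(word | class) if it is absent.

```python
from math import prod


def class_score(prior, word_likelihoods, observed):
    """Unnormalized naive Bayes score: prior times one factor per word.

    word_likelihoods maps word -> P(word present | class).
    observed maps word -> True if the word appears in the message.
    """
    factors = (
        p if observed[word] else 1.0 - p
        for word, p in word_likelihoods.items()
    )
    return prior * prod(factors)


# Message with Lottery and Unsubscribe present, Million absent.
message = {"lottery": True, "million": False, "unsubscribe": True}

# Hypothetical priors and likelihoods, for illustration only.
spam_score = class_score(
    0.2, {"lottery": 0.5, "million": 0.25, "unsubscribe": 0.5}, message
)
ham_score = class_score(
    0.8, {"lottery": 0.1, "million": 0.05, "unsubscribe": 0.2}, message
)

print(spam_score, ham_score)
```

The shared denominator is left out on purpose, since it cancels when the two scores are compared.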

The ratio is 0.008864/0.004349 ≈ 2.04, which means that this message is about twice as likely to be spam as ham. We can also convert the likelihoods into probabilities as follows:

P(Spam) = 0.008864/(0.008864+0.004349) = 0.67

P(Ham) = 0.004349/(0.008864+0.004349) = 0.33
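The same normalization can be checked with a few lines of Python. This is a minimal sketch; the helper name normalize is an assumption, while the two likelihood values are the ones given in the text.

```python
def normalize(scores):
    """Divide each class likelihood by the total so the results sum to 1."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Overall likelihoods of spam and ham from the text.
likelihoods = {"spam": 0.008864, "ham": 0.004349}
probabilities = normalize(likelihoods)

for label, p in probabilities.items():
    print(label, round(p, 2))
```

The rounded results are 0.67 for spam and 0.33 for ham.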

Converting the likelihood values into probabilities makes the result easier to present and lets us apply decision thresholds, and so on.
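As one illustration of a threshold, the sketch below labels a message as spam only when its spam probability reaches a chosen cutoff. The function name classify and the cutoff values 0.5 and 0.9 are assumptions made for this example, not values from the text.

```python
def classify(p_spam, threshold=0.5):
    """Return "spam" if the spam probability meets the threshold, else "ham"."""
    return "spam" if p_spam >= threshold else "ham"


p_spam = 0.67  # spam probability computed for the example message

print(classify(p_spam))                 # default cutoff of 0.5
print(classify(p_spam, threshold=0.9))  # stricter cutoff
```

With the default cutoff the message is labeled spam. With the stricter cutoff it is labeled ham, which trades missed spam for fewer legitimate emails being filtered.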
