To illustrate why this is more useful, let's run the following scenario on two sample series:
a = pd.Series([10, 10, 10, 10])
b = pd.Series([12, 8, 8, 12])

np.sqrt(np.mean((b - a)**2)) / np.mean(a)
This generates the following output:
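Working through the arithmetic by hand shows what this number means. The following sketch simply decomposes the expression above, treating a as the actual values and b as the predictions:

errors = b - a                  # [2, -2, -2, 2]
mse = np.mean(errors ** 2)      # mean of [4, 4, 4, 4] = 4
rmse = np.sqrt(mse)             # 2
rmse / np.mean(a)               # 2 / 10 = 0.2, an error of about 20% of the typical value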
Now, compare that to the simple mean of the errors:
(b-a).mean()
This generates the following output:
Clearly, the first statistic is the more meaningful one: the simple mean comes out to 0 because the positive and negative errors cancel exactly, even though every prediction is off by 2, while the RMSE divided by the mean correctly reports an error of roughly 20%. Now, let's run it for our model:
np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)
This generates the following output:
Suddenly, our awesome model looks a lot less awesome. Let's take a look at some of the predictions our model made versus the actual values in the data.
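The plotting call below uses a deltas DataFrame that holds the predicted and actual values side by side. It was presumably assembled earlier from the test-set results; a minimal sketch of how it could be built, assuming y_pred and y_actual are the aligned arrays used in the error calculation above, might look like this:

# pair each prediction with its actual value (assumes identical ordering/index)
deltas = pd.DataFrame({'predicted': y_pred,
                       'actual': y_actual})

With deltas in place, we can plot the first 30 predictions against the actuals: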
deltas[['predicted','actual']].iloc[:30,:].plot(kind='bar', figsize=(16,8))
The preceding code generates the following output:
Based on what we can see here, the model (at least for this sample) tends to modestly underpredict the virality of the typical article, but heavily underpredicts it for a small number of articles. Let's see which ones those are:
all_data.loc[test_index[:30],['title', 'fb']].reset_index(drop=True)
The preceding code results in the following output:
From the preceding output, we can see that an article about Malala and an article about a husband complaining about how much his stay-at-home wife costs him far exceeded our model's predictions. Both would seem to have high emotional valence.
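Rather than eyeballing the bar chart, the worst underpredictions can also be pulled out programmatically. A quick sketch using the deltas frame from above (with the column names assumed there):

# articles whose actual share counts most exceed the model's predictions
residuals = deltas['actual'] - deltas['predicted']
residuals.nlargest(10)

Assuming deltas is in the same order as test_index, the resulting positions can then be looked up against the title column of all_data, just as in the snippet above.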