Detecting trends in time series

In previous sections, we analyzed the most frequent keywords and phrases without taking the time frame into account. However, a brand can also benefit from a temporal dimension: a dynamic analysis of the content of posts and comments over time.

Our goal in this section is to analyze, as a time series, the moments of highest engagement with posts in terms of likes and shares, and then see what those posts were about.

Firstly, we convert the date strings into datetime objects:

df_comments['date'] = df_comments['created_time'].apply(pd.to_datetime) 
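Note that pandas also offers a vectorized form of this conversion, which produces the same result and is usually faster on large data frames (shown here only as an equivalent alternative to the line above):

# Equivalent vectorized conversion of the whole column at once
df_comments['date'] = pd.to_datetime(df_comments['created_time'])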

The next operation transforms the data frame into a time series by setting the datetime column as its index:

df_comments_ts = df_comments.set_index(['date']) 

Finally, we subset our data frame to keep only the verbatims from the beginning of 2015 onward:

df_comments_ts = df_comments_ts['2015-01-01':] 

We execute the same operations on the data frame containing the posts so that we can make comparisons:

df_posts['date'] = df_posts['created_time'].apply(pd.to_datetime) 
df_posts_ts = df_posts.set_index(['date']) 
df_posts_ts = df_posts_ts['2015-01-01':] 

In the next step, we will visualize the results to identify the weeks in which brand posts had the highest and lowest numbers of shares and likes.

The choice of weeks instead of days or months is purely arbitrary.
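Changing the granularity only requires changing the resampling rule. As an illustrative sketch (assuming the df_posts_ts data frame created above), the same call produces daily, weekly, or monthly averages:

# Resample the posts time series at different granularities
# (on recent pandas versions, use .mean(numeric_only=True) if non-numeric columns are present)
daily = df_posts_ts.resample('D').mean()    # daily averages
weekly = df_posts_ts.resample('W').mean()   # weekly averages (used below)
monthly = df_posts_ts.resample('M').mean()  # monthly averages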

We create a data frame that contains the average number of likes and shares per week:

dx = df_posts_ts.resample('W').mean() 
dx.index.name = 'date' 
dx = dx.reset_index() 

Then, we plot a chart showing the progression of likes over time:

p = ggplot(dx, aes(x='date', y='likes')) + geom_line() 
p = p + xlab("Date") + ylab("Number of likes") + ggtitle("Facebook Google Page") 
print(p) 

Then, we apply the same method to shares:

p = ggplot(dx, aes(x='date', y='shares')) + geom_line() 
p = p + xlab("Date") + ylab("Number of shares") + ggtitle("Facebook Google Page") 
print(p) 

From the two plots, we can see a few peaks that generated the largest numbers of likes and shares. There is a distinct peak in September 2015 in terms of likes, and a big peak toward the end of 2016 in terms of shares.
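Rather than reading the peaks off the charts, we can also confirm them programmatically with idxmax (a small sketch based on the dx data frame defined above):

# Weeks with the highest average number of likes and shares
peak_likes_week = dx.loc[dx['likes'].idxmax(), 'date'] 
peak_shares_week = dx.loc[dx['shares'].idxmax(), 'date'] 
print(peak_likes_week, peak_shares_week) 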

It is interesting to examine the vocabulary behind these peaks of likes and shares. We will investigate these periods by comparing the words used in brand posts and in user comments.

We define a function that takes as parameters a time series of posts, a time series of comments, the name of the column containing keywords, and the criterion of comparison ('shares' or 'likes'):

def max_wordcloud(ts_df_posts, ts_df_comments, columnname, criterium='shares'): 

Firstly, the function computes an average number of shares/likes per week:

    mean_week = ts_df_posts.resample('W').mean()  

Then, it searches for the first and last day of the global peak week in the posts time series:

    start_week = (mean_week[criterium].idxmax() - datetime.timedelta(days=7)).strftime('%Y-%m-%d') 
    end_week = mean_week[criterium].idxmax().strftime('%Y-%m-%d') 

It then creates word clouds with the previously defined viz_wordcloud function, applied to the peak period of both posts and comments:

    viz_wordcloud(ts_df_posts[start_week:end_week], columnname) 
    viz_wordcloud(ts_df_comments[start_week:end_week], columnname) 

This code allows us to visualize the most popular vocabulary during the period of the time series that generated the highest number of shares or likes.
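For example, the function could be called for both criteria as follows (a sketch: 'message_tokens' is a hypothetical name for the keyword column, and viz_wordcloud is the helper defined earlier):

# Inspect the vocabulary around the peak week of shares and of likes
# ('message_tokens' is a hypothetical column name holding the keywords)
max_wordcloud(df_posts_ts, df_comments_ts, 'message_tokens', criterium='shares') 
max_wordcloud(df_posts_ts, df_comments_ts, 'message_tokens', criterium='likes') 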
