Modeling tweet topics

In machine learning and natural language processing, a topic model is a type of statistical model used to discover the abstract topics that occur in a collection of documents. Twitter offers a good use case for illustrating this concept: suppose we want to analyze an individual's (or an organization's) tweets to discover any overriding themes. Let's look at a simple example.

If you have a Twitter account, you can try this exercise on your own tweets fairly easily, and then apply the same process to any other tweet archive you want to focus on or model. First, we need a tweet archive file.

Under Settings, you can request an archive of your tweets. When the archive is ready, you'll receive an email with a link to download it.

Then save the downloaded file locally.

Now that we have a data source to work with, we can move the tweets into a list object (we'll call it x) and then convert that list into an R data frame object (df1).
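A minimal base-R sketch of this step follows. It assumes the unzipped archive contains a tweets.csv file with a text column (the layout used by older Twitter archives; newer archives may differ), and it writes three made-up sample tweets to a temporary file so the example runs on its own. The file name, column name, and sample tweets are all assumptions.

```r
# Assumption: the archive provides a CSV with one tweet per row in a "text"
# column. To keep the sketch self-contained, write a small made-up CSV to a
# temporary file; in practice, point read.csv() at your own tweets.csv.
archive_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "tweet_id,text",
  "1,Learning R for text mining today",
  "2,Text mining tweets with the tm package",
  "3,Word clouds make text data easy to read"
), archive_csv)

# Read the archive, keeping the tweet text as character (not factor)
tweets <- read.csv(archive_csv, stringsAsFactors = FALSE)

# Move the tweet text into a list object, x
x <- as.list(tweets$text)

# Convert the list into a data frame, df1, with one row per tweet
df1 <- data.frame(text = unlist(x), stringsAsFactors = FALSE)
str(df1)
```

With a real archive, only the read.csv() path changes; the list and data frame steps stay the same.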

Once the tweets are in a data frame, we use the R tm package to convert them into a Corpus object, tm's container for a collection of text documents.
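The sketch below shows one way to do this with tm, using made-up sample tweets in place of df1. It uses tm's VCorpus() constructor (Corpus() is the more general entry point), and the clean-up transformations (lowercasing, removing punctuation, numbers, and English stop words) are common practice that we assume here; they are not steps stated in this section.

```r
library(tm)

# Hypothetical sample data standing in for the df1 data frame of tweets
df1 <- data.frame(text = c(
  "Learning R for text mining today",
  "Text mining tweets with the tm package",
  "Word clouds make text data easy to read"
), stringsAsFactors = FALSE)

# One document per tweet; VectorSource treats each element as a document
corpus <- VCorpus(VectorSource(df1$text))

# Assumed clean-up steps (common practice, not taken from the original):
corpus <- tm_map(corpus, content_transformer(tolower))      # lowercase
corpus <- tm_map(corpus, removePunctuation)                 # drop punctuation
corpus <- tm_map(corpus, removeNumbers)                     # drop digits
corpus <- tm_map(corpus, removeWords, stopwords("english")) # drop stop words
corpus <- tm_map(corpus, stripWhitespace)                   # collapse spaces

# Look at the first cleaned tweet
content(corpus[[1]])
```

Without the clean-up, words such as "Text" and "text" would be counted as different terms, and very common words like "the" would crowd out the informative ones.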

Next, we convert the Corpus into a document-term matrix (a DocumentTermMatrix object in tm). This is a matrix that describes the frequency of the terms occurring in a collection of documents, in this case, our collection of tweets: each row is a document (tweet), each column is a term, and each cell counts how often that term appears in that document.
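A minimal sketch of this step, assuming made-up sample tweets and the same kind of clean-up a typical tm workflow applies, builds the matrix with DocumentTermMatrix() and prints it with inspect():

```r
library(tm)

# Hypothetical sample tweets standing in for the real archive
tweets <- c(
  "Learning R for text mining today",
  "Text mining tweets with the tm package",
  "Word clouds make text data easy to read"
)
corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Rows are tweets (documents), columns are terms, cells are term counts.
# By default, terms shorter than 3 characters (here "r" and "tm") are dropped.
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

The minimum word length comes from the wordLengths control option of termFreq(); pass control = list(wordLengths = c(1, Inf)) to DocumentTermMatrix() if short terms matter for your tweets.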

Word clouding

With the document-term matrix built, we can show the relative importance of the words in our tweets with a word cloud (also known as a tag cloud), which draws more frequent words in larger type. We can do this with the R package wordcloud.
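The wordcloud() function takes a vector of words and a matching vector of frequencies. One way to get them from the document-term matrix, sketched here with made-up sample tweets, is to sum each term's column and sort the totals:

```r
library(tm)
library(wordcloud)  # the package used to draw the cloud

# Hypothetical sample tweets standing in for the real archive
tweets <- c(
  "Learning R for text mining today",
  "Text mining tweets with the tm package",
  "Word clouds make text data easy to read"
)
corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)

# Total count of each term across all tweets, most frequent first
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
head(freq, 5)

# Keep words and counts side by side for inspection
word_freqs <- data.frame(word = names(freq), freq = unname(freq))
```

as.matrix() converts tm's sparse matrix into a dense one, which is fine for a personal archive but can use a lot of memory for very large collections.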

Finally, let's generate the word cloud visual.
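A minimal sketch of the plotting call follows, with term frequencies computed by summing the columns of the document-term matrix built from made-up sample tweets. The parameter values (min.freq, max.words, the random seed, and the Dark2 palette from RColorBrewer) are illustrative choices, not settings taken from this section. The plot is written to a temporary PDF so the example also runs non-interactively.

```r
library(tm)
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal() color palettes

# Hypothetical sample tweets standing in for the real archive
tweets <- c(
  "Learning R for text mining today",
  "Text mining tweets with the tm package",
  "Word clouds make text data easy to read"
)
corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

# Write to a temporary PDF so the sketch runs without a screen;
# drop pdf() and dev.off() to draw in the plot window instead
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
set.seed(1234)  # word placement is randomized; fix it for repeatable layouts
wordcloud(words = names(freq), freq = freq,
          min.freq = 1,          # default is 3, too high for a tiny sample
          max.words = 100,       # upper limit on words drawn
          random.order = FALSE,  # place words in decreasing frequency order
          colors = brewer.pal(8, "Dark2"))
dev.off()
```

Because random.order = FALSE places words in decreasing order of frequency, the most frequent terms are drawn first, near the center, and in the largest type, which is what makes a dominant theme stand out.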

It seems there may be a theme here! The word cloud shows that south and carolinas are the most prominent words; since a word cloud sizes each word by its frequency, these are the words that occur most often in the tweets.
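A word cloud gives a visual impression; to check which terms really dominate, you can also list the high-frequency terms directly. The sketch below uses tm's findFreqTerms() on a document-term matrix built from made-up sample tweets, with an arbitrary threshold of two occurrences:

```r
library(tm)

# Hypothetical sample tweets standing in for the real archive
tweets <- c(
  "Learning R for text mining today",
  "Text mining tweets with the tm package",
  "Word clouds make text data easy to read"
)
corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)

# Terms whose total count across all tweets is at least 2
frequent <- findFreqTerms(dtm, lowfreq = 2)
frequent
```

Raising lowfreq narrows the list to the handful of terms that would appear largest in the cloud.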
