Data frame creation

As the old joke, and bit of wisdom, goes:

"How can you tell when a politician is lying? Their lips are moving!"

If you have not already done so, install the following packages, then load the magrittr and sotu packages (magrittr is installed as a dependency of tidyverse):

> install.packages("ggplot2")

> install.packages("ggraph")

> install.packages("igraph")

> install.packages("quanteda")

> install.packages("qdap")

> install.packages("tidytext")

> install.packages("tidyverse")

> install.packages("sotu")

> install.packages("topicmodels")

> library(magrittr)

> library(sotu)

The data ships with the sotu package, so we load its two datasets into the workspace like this:

> data(sotu_text)

> data(sotu_meta)

It is easy to turn this into a single data frame with everything we need by adding the raw text to the metadata as a new column:

> sotu_meta$text <- sotu_text

Here are the column names. I recommend you spend a few minutes exploring this data on your own as well:

> colnames(sotu_meta)
[1] "president" "year" "years_active" "party" "sotu_type"
[6] "text"
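
The column assignment above only works cleanly when there is exactly one text per metadata row, in the same order. Here is a minimal sketch of that step on toy data. The names toy_meta and toy_text are hypothetical stand-ins, not objects from the sotu package:

```r
# Hypothetical toy stand-ins for sotu_meta and sotu_text (not the real data)
toy_meta <- data.frame(
  president = c("President A", "President B"),
  year = c(1900L, 1901L),
  stringsAsFactors = FALSE
)
toy_text <- c(
  "Fellow citizens of the Senate.",
  "The state of the union is strong."
)

# Guard: one text per metadata row, otherwise the columns would not line up
stopifnot(length(toy_text) == nrow(toy_meta))

# Attach the raw text as a new character column
toy_meta$text <- toy_text

colnames(toy_meta)
str(toy_meta)
```

A check like this is cheap insurance before running the same assignment on the real objects.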

The text column holds the data of interest: the full text of each address as a single character string. Before we start analyzing the data, we need to tokenize the text while keeping each token linked to its President. In other words, we put one token per row per document. A token can be a character, a word, an n-gram (a sequence of n consecutive words), or a sentence. This sets us up for working with the data in tidy format:

> sotu_meta %>%
tidytext::unnest_tokens(word, text) -> sotu_unnest

All we did was tell the unnest_tokens() function to take the text column and turn it into a column called word. As we shall see, the function accommodates n-grams but defaults to single words. It also converts all text to lowercase by default; when we tackle n-grams, we'll set that option (to_lower) to FALSE. Here is what the resulting tibble looks like:

> sotu_unnest
# A tibble: 1,965,212 x 6
   president          year years_active party       sotu_type word
   <chr>             <int> <chr>        <chr>       <chr>     <chr>
 1 George Washington  1790 1789-1793    Nonpartisan speech    fellow
 2 George Washington  1790 1789-1793    Nonpartisan speech    citizens
 3 George Washington  1790 1789-1793    Nonpartisan speech    of
 4 George Washington  1790 1789-1793    Nonpartisan speech    the
 5 George Washington  1790 1789-1793    Nonpartisan speech    senate
 6 George Washington  1790 1789-1793    Nonpartisan speech    and
 7 George Washington  1790 1789-1793    Nonpartisan speech    house
 8 George Washington  1790 1789-1793    Nonpartisan speech    of
 9 George Washington  1790 1789-1793    Nonpartisan speech    representatives
10 George Washington  1790 1789-1793    Nonpartisan speech    i
# ... with 1,965,202 more rows
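
To make the one-token-per-row idea concrete, here is a rough base R sketch of the same reshaping on toy data. It is only an approximation for illustration: unnest_tokens() relies on the tokenizers package and handles punctuation and Unicode more carefully. The toy data frame and the simple splitting rule are assumptions of this sketch, not part of the sotu data:

```r
# Hypothetical toy data: two short "speeches", each with a speaker label
toy <- data.frame(
  president = c("President A", "President B"),
  text = c("Fellow Citizens of the Senate", "The Union is strong"),
  stringsAsFactors = FALSE
)

# Lowercase the text, then split on runs of characters that are
# neither alphanumeric nor apostrophes (a simplified word tokenizer)
tokens <- strsplit(tolower(toy$text), "[^[:alnum:]']+")

# Repeat each row's metadata once per token: one token per row per document
toy_unnest <- data.frame(
  president = rep(toy$president, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)
toy_unnest

# Bigrams: pair each word with the word that follows it in the same text
bigrams <- lapply(tokens, function(w) {
  if (length(w) < 2) character(0) else paste(head(w, -1), tail(w, -1))
})
bigrams
```

With tidytext itself, the bigram version would be along the lines of unnest_tokens(bigram, text, token = "ngrams", n = 2), which is the form we will rely on when we tackle n-grams.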

With our data ready, let's get started. 
