Sentiment analysis

"We shall nobly save, or meanly lose, the last, best hope of earth."
Abraham Lincoln

In this section, we'll take a look at the various sentiment lexicons available in tidytext. Then, we'll apply them to a subset of the addresses from before, during, and after the Civil War. To get started, let's explore the sentiments dataset that comes with tidytext:

> table(sentiments$lexicon)

   AFINN     bing loughran      nrc
    2476     6788     4149    13901

The four sentiment options and researchers associated with them are as follows:

  • AFINN: Finn Årup Nielsen
  • bing: Bing Liu et al.
  • loughran: Loughran and McDonald
  • nrc: Mohammad and Turney

The AFINN lexicon scores words on a scale from -5 (most negative) to +5 (most positive). The bing lexicon has a simple binary negative or positive label; loughran provides six different categories, including negative, positive, and such things as superfluous. With nrc, you get ten categories: eight emotions such as anger or trust, plus negative and positive. Here is a glance at a few words and their associated sentiment classifications with nrc:

> get_sentiments("nrc")
# A tibble: 13,901 x 2
word sentiment
<chr> <chr>
1 abacus trust
2 abandon fear
3 abandon negative
4 abandon sadness
5 abandoned anger
6 abandoned fear
7 abandoned negative
8 abandoned sadness
9 abandonment anger
10 abandonment fear
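
To make the structure of an nrc-style lexicon concrete, here is a minimal sketch on toy data. The rows in toy_nrc and toy_words are hypothetical stand-ins that I made up for illustration, not the real lexicon or the address text. It shows why you would normally filter the lexicon to one sentiment before joining: otherwise each token is repeated once per category it carries.

```r
# Toy NRC-style lexicon: hypothetical rows, NOT the real NRC data
toy_nrc <- tibble::tibble(
  word      = c("abandon", "abandon", "abandon", "abacus", "force"),
  sentiment = c("fear", "negative", "sadness", "trust", "anger")
)

# Toy tokenized text, one word per row (also hypothetical)
toy_words <- tibble::tibble(word = c("abandon", "force", "abacus", "union"))

# How many categories does each lexicon word carry?
category_counts <- dplyr::count(toy_nrc, word, name = "n_categories")

# Joining against the full lexicon repeats a token once per category
joined_all <- dplyr::inner_join(toy_words, toy_nrc, by = "word")

# Filtering the lexicon first keeps only tokens tagged with that sentiment
toy_anger    <- dplyr::filter(toy_nrc, sentiment == "anger")
joined_anger <- dplyr::inner_join(toy_words, toy_anger, by = "word")

print(category_counts)
print(nrow(joined_all))    # "abandon" contributes three rows on its own
print(nrow(joined_anger))  # only "force" is tagged as anger here
```

Note that "union" drops out of both joins because it is not in the toy lexicon at all; an inner join silently discards any word without a match.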

You see that a word can have multiple sentiment categories. Let's see whether Lincoln expressed anger in his 1862 attempt to mollify his political opponents:

> nrc_anger <- tidytext::get_sentiments("nrc") %>% 
dplyr::filter(sentiment == "anger")

> sotu_tidy %>%
dplyr::filter(year == 1862) %>%
dplyr::inner_join(nrc_anger) %>%
dplyr::count(word, sort = TRUE)
Joining, by = "word"
# A tibble: 62 x 2
word n
<chr> <int>
1 slavery 13
2 slave 12
3 demand 5
4 force 5
5 money 5
6 abolish 4
7 rebellion 4
8 cash 3
9 deportation 3
10 fugitive 3
# ... with 52 more rows

OK, that is interesting. Words such as money and cash are counted as anger, which might be an indication of the challenge of taking qualitative sentiment rankings developed recently and applying them to historical documents. We'll expand the analysis now by looking at addresses from 1853 to 1872 using the bing lexicon. We will build a data frame of the total positive and negative word counts, using that to calculate an overall sentiment score (positive minus negative) for each year:

> sentiment <- sotu_tidy %>%
dplyr::inner_join(tidytext::get_sentiments("bing")) %>%
dplyr::filter(year > 1852 & year < 1873) %>%
dplyr::count(president, year, sentiment) %>%
tidyr::spread(sentiment, n, fill = 0) %>%
dplyr::mutate(sentiment = positive - negative) %>%
dplyr::arrange(year)
Joining, by = "word"
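
If you want to see the count, reshape, and subtract steps in isolation, here is a minimal sketch on a toy data frame. The toy_joined rows are hypothetical, standing in for tokens that have already been joined to a bing-style lexicon. I've also assumed a recent version of tidyr and used pivot_wider(), which is the newer equivalent of spread(); that is a swap on my part, not what the pipeline above uses.

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical tokens already labelled positive or negative
toy_joined <- tibble::tibble(
  year      = c(1861, 1861, 1861, 1862, 1862, 1863),
  sentiment = c("positive", "negative", "negative",
                "positive", "positive", "positive")
)

toy_scores <- toy_joined %>%
  # one row per year and sentiment label, with the word count in n
  dplyr::count(year, sentiment) %>%
  # one column per label; years with no words for a label get 0, not NA
  tidyr::pivot_wider(names_from = sentiment, values_from = n,
                     values_fill = 0) %>%
  # net score: positive words minus negative words
  dplyr::mutate(sentiment = positive - negative) %>%
  dplyr::arrange(year)

print(toy_scores)
```

The fill value matters: in the toy data, 1862 and 1863 have no negative words, and without a fill of 0 the subtraction would return NA for those years.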

You can explore the sentiment data frame on your own, but in the meantime, here is a plot of sentiment by president and year:

> ggplot2::ggplot(sentiment, ggplot2::aes(year, sentiment, fill = president)) +
ggplot2::geom_col(show.legend = FALSE) +
ggplot2::facet_wrap(~ president, ncol = 2, scales = "free_x") +
ggthemes::theme_pander()

The output of the preceding code is as follows:

The pre-war presidents had negative sentiment, I guess as things fell apart. Arguably, Buchanan was the worst president ever. Not even Jimmy Carter was as bad. It is interesting how positive Grant is, given the difficulties of Reconstruction and having to fight a near-guerrilla war in the South. He is as underrated a president as there is. Enough of my historical ruminations. It is an easy task to find and portray sentiment in text data using tidytext. Indeed, here is an example of which words contribute most to positive or negative sentiment:

> sotu_tidy %>%
dplyr::inner_join(tidytext::get_sentiments("bing")) %>%
dplyr::count(word, sentiment, sort = TRUE) %>%
dplyr::ungroup()
Joining, by = "word"
# A tibble: 3,592 x 3
word sentiment n
<chr> <chr> <int>
1 peace positive 2021
2 free positive 1306
3 progress positive 1157
4 support positive 961
5 protection positive 864
6 proper positive 840
7 recommend positive 836
8 debt negative 795
9 freedom positive 744
10 secure positive 724
# ... with 3,582 more rows

Peace is the number one positive word, despite its elusiveness, and the number one negative word is debt. Oh well, good luck with that!
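
Because that output is sorted by overall count, the negative words are scattered among the positive ones. Here is a minimal sketch of one way to pull the top words for each sentiment separately. The toy_word_counts table is hypothetical, with made-up counts rather than the real ones, and I'm assuming dplyr 1.0.0 or later for slice_max().

```r
library(magrittr)  # provides the %>% pipe

# Hypothetical word counts in the same shape as the output above
toy_word_counts <- tibble::tibble(
  word      = c("peace", "free", "debt", "war", "progress", "crisis"),
  sentiment = c("positive", "positive", "negative",
                "negative", "positive", "negative"),
  n         = c(20L, 13L, 8L, 6L, 11L, 3L)
)

top_words <- toy_word_counts %>%
  dplyr::group_by(sentiment) %>%
  # keep the two most frequent words within each sentiment group
  dplyr::slice_max(n, n = 2) %>%
  dplyr::ungroup()

print(top_words)
```

On the real data, you would pipe the counted result into the same group_by() and slice_max() steps, choosing however many words per sentiment you want to inspect.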

One of the things to consider in processing text is which resolution of it helps facilitate learning. We've worked only with single words up to this point, so let's shift gears to word combinations, or n-grams.
