Polarity scores

We will use the sentimentr R package to score the sentiment of the articles we have collected.
Let's look at how to score the article titles using the sentiment function:

> sentiment.score <- sentiment(match.refined$TITLE)
> head(sentiment.score)
   element_id sentence_id word_count   sentiment
1:          1           1          8  0.00000000
2:          2           1         11  0.00000000
3:          3           1          9 -0.13333333
4:          4           1          9 -0.08333333
5:          5           1         11  0.07537784
6:          6           1          9  0.00000000
>

The sentiment function in sentimentr calculates a polarity score for each sentence of each text, so a text with multiple sentences receives one score per sentence. In practice, most scores fall between -1 and 1, although the score is not strictly bounded. A negative score indicates that the sentence has negative polarity, and a positive score indicates that it is positive; the larger the magnitude, the stronger the polarity. A score of 0 indicates a neutral sentence.

However, we need the score at an article level and not at a sentence level, so we can take an average value of the score across all the sentences in a text.

Calculate the average value of the sentiment scores for each article:

> sentiment.score <- sentiment.score %>% group_by(element_id) %>%
+   summarise(sentiment = mean(sentiment))
> head(sentiment.score)
# A tibble: 6 x 2
  element_id   sentiment
       <int>       <dbl>
1          1  0.00000000
2          2  0.00000000
3          3 -0.13333333
4          4 -0.08333333
5          5  0.07537784
6          6  0.00000000

Here, element_id identifies the individual article. By grouping by element_id and averaging the sentence scores, we get one sentiment score per article.
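
The same aggregation can be reproduced without dplyr. The following is a minimal sketch in base R on made-up data; toy.scores is a hypothetical stand-in for the sentence-level output of sentiment, and its values are invented for illustration:

```r
# Toy sentence-level scores in the same shape as the output of sentiment().
# The values are made up for illustration.
toy.scores <- data.frame(
  element_id  = c(1L, 1L, 2L, 3L, 3L, 3L),
  sentence_id = c(1L, 2L, 1L, 1L, 2L, 3L),
  sentiment   = c(0.50, -0.10, 0.00, -0.30, 0.00, 0.30)
)

# Average the sentence scores within each element (article).
article.scores <- aggregate(sentiment ~ element_id, data = toy.scores, FUN = mean)
print(article.scores)
```

Note that a plain mean lets strongly positive and strongly negative sentences cancel out, as they do for the third element here.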

Let's update the match.refined data frame with the polarity scores:

> match.refined$polarity <- sentiment.score$sentiment
> head(match.refined)
      ID    cosine                                                                             TITLE
1  38081 1.0000000                              PRECIOUS-Gold ticks lower, US dollar holds near peak
2  38069 0.3779645 PRECIOUS-Bullion drops nearly 1 pct on dollar, palladium holds near 2-1/2-yr high
3 231136 0.2672612                        Dollar steady near 3-1/2 month lows vs. yen, Aussie weaker
4 334088 0.2672612                           Canadian dollar falls amid lower than expected GDP data
5 276011 0.2519763                       Gold holds near four-month low as ECB move on rates awaited
6 394401 0.2390457                    Dollar Tree Will Buy Competitor Family Dollar For $8.5 Billion
          PUBLISHER CATEGORY    polarity
1           Reuters        b  0.00000000
2           Reuters        b  0.00000000
3            NASDAQ        b -0.13333333
4          CTV News        b -0.08333333
5 Business Standard        b  0.07537784
6     The Inquisitr        b  0.00000000
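
The assignment above relies on the scores being in the same row order as match.refined. Since element_id is the position of each text in the vector passed to sentiment, the scores can also be looked up explicitly. A minimal sketch, using the hypothetical data frames articles and scores in place of the real data:

```r
# Hypothetical stand-ins for match.refined and the per-article scores.
articles <- data.frame(
  TITLE = c("Gold ticks lower", "Dollar steady", "Canadian dollar falls"),
  stringsAsFactors = FALSE
)
scores <- data.frame(
  element_id = c(3L, 1L, 2L),          # deliberately out of order
  sentiment  = c(-0.08, 0.00, -0.13)
)

# element_id is the position of the text passed to sentiment(),
# so look up each row's score by its index instead of relying on row order.
articles$polarity <- scores$sentiment[match(seq_len(nrow(articles)), scores$element_id)]
print(articles)
```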

Before we move on, let's spend some time understanding the inner workings of our dictionary-based sentiment method. The sentiment function utilizes a sentiment lexicon (Jockers, 2017) from the lexicon package. It preprocesses the given text as follows:

Paragraphs are split into sentences
Sentences are split into words
All punctuation is removed except commas, semicolons, and colons
Finally, words are indexed by their position; for example, w_{5,2,3} denotes the third word in the second sentence of the fifth paragraph
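
The preprocessing steps above can be sketched in base R as follows. This is a simplified illustration, not sentimentr's own tokenizer: the helper prepare and the example paragraph are hypothetical, and keeping commas, semicolons, and colons as separate tokens is a choice made for this sketch.

```r
# Simplified sketch of the preprocessing steps; sentimentr's own
# tokenizer handles more cases (abbreviations, numbers, and so on).
paragraph <- "Gold ticks lower, but the dollar holds near its peak. Traders await the ECB!"

# 1. Split the paragraph into sentences at ., ! or ? followed by whitespace.
sentences <- unlist(strsplit(paragraph, "(?<=[.!?])\\s+", perl = TRUE))

# 2. Keep commas, semicolons, and colons as tokens of their own,
#    then drop every other punctuation character.
prepare <- function(s) {
  s <- tolower(s)
  s <- gsub("([,;:])", " \\1 ", s)
  s <- gsub("[^a-z0-9,;: ]", "", s)
  words <- strsplit(s, " +")[[1]]   # 3. Split the sentence into words.
  words[words != ""]
}
words <- lapply(sentences, prepare)

# words[[j]][k] is the k-th word of the j-th sentence of this paragraph.
print(words[[1]])
print(words[[2]])
```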

Each word is looked up in the lexicon; positive and negative words are tagged with +1 and -1 respectively. Let's call the words that receive a score the polarized words. Not all words receive a score; only those found in the lexicon do. We can pass a custom lexicon to the sentiment function through the polarity_dt parameter. For each polarized word, a number of words before it and after it are considered, and together they are called a polarized context cluster. The size of this window can be set by the user through the n.before and n.after parameters. The words in the polarized context cluster can be tagged as any of the following:

neutral
negator
amplifier
de-amplifier
adversative conjunctions

A dictionary of these words can be passed through the valence_shifters_dt parameter. The neighboring words are tagged by looking them up in this dictionary. The weights for amplifiers and adversative conjunctions are passed through the amplifier.weight and adversative.weight parameters. Each polarized word is then weighted based on polarity_dt and on the number of surrounding valence shifters that are tagged as amplifiers or adversative conjunctions. Neutrally tagged words carry no weight. For more details about the weights and scoring, refer to the R help page for the sentiment function (?sentiment).
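
To make the idea of a polarized context cluster concrete, here is a toy scorer in base R. It is a deliberately simplified sketch and not sentimentr's actual formula: the mini-lexicons and the function score_sentence are hypothetical, only negators and amplifiers are handled, and the argument names merely mirror the parameters of sentiment.

```r
# Hypothetical mini-lexicons, for illustration only.
polarity <- c(good = 1, strong = 1, weak = -1, bad = -1)
negators <- c("not", "never")
amplifiers <- c("very", "really")

score_sentence <- function(words, n.before = 2, n.after = 1, amplifier.weight = 0.8) {
  total <- 0
  for (i in which(words %in% names(polarity))) {
    # Context cluster: the words around the polarized word at position i.
    window <- setdiff(max(1, i - n.before):min(length(words), i + n.after), i)
    context <- words[window]
    n.neg <- sum(context %in% negators)
    n.amp <- sum(context %in% amplifiers)
    # Amplifiers scale the polarity up; an odd number of negators flips its sign.
    total <- total + polarity[[words[i]]] * (1 + amplifier.weight * n.amp) * (-1)^n.neg
  }
  # Normalize by the square root of the word count.
  total / sqrt(length(words))
}

plain.score <- score_sentence(c("the", "data", "is", "good"))
shifted.score <- score_sentence(c("the", "data", "is", "not", "very", "good"))
print(plain.score)
print(shifted.score)
```

In the second sentence, "not" and "very" both fall inside the window before "good", so the word's polarity is amplified and then flipped to negative.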
