Polarity scores

We will use the sentimentr R package to score the sentiment of the articles we have collected.
Let's see how to score the article titles using the sentiment function:

> library(sentimentr)
> sentiment.score <- sentiment(match.refined$TITLE)
> head(sentiment.score)
element_id sentence_id word_count sentiment
1: 1 1 8 0.00000000
2: 2 1 11 0.00000000
3: 3 1 9 -0.13333333
4: 4 1 9 -0.08333333
5: 5 1 11 0.07537784
6: 6 1 9 0.00000000
>

The sentiment function in sentimentr calculates a polarity score for every sentence of every text, so a text with multiple sentences gets one row per sentence, identified by sentence_id. Scores typically fall between -1 and 1, although they are not strictly bounded. A score near -1 indicates that the sentence has a very negative polarity, a score near 1 means that the sentence is very positive, and a score of 0 means that the sentence is neutral.
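To see the per-sentence behavior, here is a minimal sketch that scores two made-up texts. The example strings are assumptions for illustration and are not part of the article data; get_sentences is the sentence splitter that ships with sentimentr.

```r
library(sentimentr)

# Two illustrative texts (not from the article data):
# the first has two sentences, the second has one.
texts <- c("I love this phone. The battery is terrible.",
           "It arrived on Monday.")

# Split each text into sentences, then score every sentence.
scores <- sentiment(get_sentences(texts))

# One row per sentence; element_id identifies the text,
# sentence_id the sentence within that text.
print(scores)
```

The first text contributes two rows (one per sentence) and the second contributes one, which is why an extra aggregation step is needed to obtain a single score per text.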

However, we need the score at an article level and not at a sentence level, so we can take an average value of the score across all the sentences in a text.

Calculate the average value of the sentiment scores for each article:

> sentiment.score <- sentiment.score %>% group_by(element_id) %>%
+ summarise(sentiment = mean(sentiment))
> head(sentiment.score)
# A tibble: 6 x 2
element_id sentiment
<int> <dbl>
1 1 0.00000000
2 2 0.00000000
3 3 -0.13333333
4 4 -0.08333333
5 5 0.07537784
6 6 0.00000000

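Here, element_id refers to the individual article. By grouping by element_id and calculating the average, we get the sentiment score at an article level. In the rows shown, each title is a single sentence, so the averages match the sentence-level scores from the previous output. We now have the scores for each article.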
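The effect of the averaging is easier to see on a text with more than one sentence. The following sketch uses a small hand-built table of sentence scores (the numbers are invented for illustration) and applies the same group_by and summarise steps from dplyr.

```r
library(dplyr)

# Hypothetical sentence-level scores: text 1 has two sentences, text 2 has one.
sentence.scores <- data.frame(
  element_id  = c(1, 1, 2),
  sentence_id = c(1, 2, 1),
  sentiment   = c(0.5, -0.1, 0.2)
)

# Collapse to one row per text by averaging its sentence scores.
article.scores <- sentence.scores %>%
  group_by(element_id) %>%
  summarise(sentiment = mean(sentiment))

print(article.scores)
```

Text 1 ends up with the mean of 0.5 and -0.1, that is 0.2. sentimentr also provides a sentiment_by function that aggregates scores for you; its default averaging function down-weights zero scores, so its results can differ from a plain mean.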

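Let's update the match.refined data frame with the polarity scores. Because element_id follows the order of the input titles, the scores line up with the rows of match.refined: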

> match.refined$polarity <- sentiment.score$sentiment
> head(match.refined)
ID cosine TITLE
1 38081 1.0000000 PRECIOUS-Gold ticks lower, US dollar holds near peak
2 38069 0.3779645 PRECIOUS-Bullion drops nearly 1 pct on dollar, palladium holds near 2-1/2-yr high
3 231136 0.2672612 Dollar steady near 3-1/2 month lows vs. yen, Aussie weaker
4 334088 0.2672612 Canadian dollar falls amid lower than expected GDP data
5 276011 0.2519763 Gold holds near four-month low as ECB move on rates awaited
6 394401 0.2390457 Dollar Tree Will Buy Competitor Family Dollar For $8.5 Billion
PUBLISHER CATEGORY polarity
1 Reuters b 0.00000000
2 Reuters b 0.00000000
3 NASDAQ b -0.13333333
4 CTV News b -0.08333333
5 Business Standard b 0.07537784
6 The Inquisitr b 0.00000000
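The assignment above relies on the scores being in the same row order as the data frame. If that order is ever in doubt, the scores can be aligned explicitly by element_id. This is a minimal base R sketch on toy data; the articles and scores objects are hypothetical stand-ins for match.refined and sentiment.score.

```r
# Toy stand-ins (hypothetical data, not the article set).
articles <- data.frame(TITLE = c("first title", "second title", "third title"),
                       stringsAsFactors = FALSE)

# Scores deliberately stored out of order.
scores <- data.frame(element_id = c(3, 1, 2),
                     sentiment  = c(-0.1, 0.0, 0.3))

# Row i of the articles corresponds to element_id i,
# so look up each row index in the element_id column.
row.index <- match(seq_len(nrow(articles)), scores$element_id)
articles$polarity <- scores$sentiment[row.index]

print(articles)
```

With this lookup, each row receives the score whose element_id equals its row number, regardless of how the score table is sorted.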

Before we move on, let's spend some time understanding the inner workings of our dictionary-based sentiment method. The sentiment function utilizes a sentiment lexicon (Jockers, 2017) from the lexicon package. It preprocesses the given text as follows:

  • Paragraphs are split into sentences
  • Sentences are split into words
  • All punctuation is removed except commas, semicolons, and colons
  • Finally, words are stored as tuples, for example, w_{5,2,3} means the third word in the second sentence of the fifth paragraph
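The preprocessing steps above can be sketched in a few lines of base R. This is a simplified illustration and not sentimentr's actual implementation; the example text and the tokenize helper are assumptions made for this sketch.

```r
# Illustrative single-paragraph text (not from the article data).
text <- "Gold fell, but the dollar held. Traders were relieved."

# Step 1: split the paragraph into sentences at ., ! or ? followed by whitespace.
sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))

# Steps 2 and 3: split a sentence into words, dropping all punctuation
# except commas, semicolons, and colons, which become tokens of their own.
tokenize <- function(s) {
  s <- tolower(s)
  s <- gsub("[^a-z0-9,;: ]", "", s)   # remove other punctuation
  s <- gsub("([,;:])", " \\1 ", s)    # isolate pause punctuation
  strsplit(trimws(s), "\\s+")[[1]]
}

# Step 4: a list indexed by sentence, each entry indexed by word position.
words <- lapply(sentences, tokenize)

# Third word of the second sentence of this paragraph.
words[[2]][3]
```

Here the comma in the first sentence is kept as its own token, and words[[2]][3] plays the role of the tuple index for the third word of the second sentence.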

Each word is looked up in the lexicon; positive and negative words are tagged with +1 and -1 respectively. Let's call the words that have received a score the polarized words. Not all words receive a score; only those found in the lexicon do. We can pass a custom lexicon to the sentiment function through the polarity_dt parameter. For each polarized word, a number of words before it and a number of words after it are considered, and together they are called a polarized context cluster. The size of this window can be set by the user through the n.before and n.after parameters. The words in the polarized context cluster can be tagged as any of the following:

  • neutral
  • negator
  • amplifier
  • de-amplifier
  • adversative conjunctions
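To make the idea of a polarized context cluster concrete, here is a toy base R sketch. The tiny lexicon and shifter table are invented for illustration, and the helper functions are hypothetical; only the names n.before and n.after mirror arguments of the sentiment function.

```r
tokens <- c("the", "market", "is", "not", "very", "strong", "today")

# Hypothetical miniature dictionaries.
lexicon  <- c(strong = 1, weak = -1)
shifters <- c(not = "negator", very = "amplifier")

n.before <- 2  # words considered before a polarized word
n.after  <- 1  # words considered after a polarized word

# Positions of the polarized words, that is, words found in the lexicon.
polarized <- which(tokens %in% names(lexicon))

# Words around position i, excluding the polarized word itself.
context_cluster <- function(tokens, i, n.before, n.after) {
  idx <- seq(max(1, i - n.before), min(length(tokens), i + n.after))
  tokens[setdiff(idx, i)]
}

# Tag each context word; anything not in the shifter table is neutral.
tag_words <- function(words, shifters) {
  tags <- unname(shifters[words])
  tags[is.na(tags)] <- "neutral"
  tags
}

cluster <- context_cluster(tokens, polarized[1], n.before, n.after)
tags    <- tag_words(cluster, shifters)
print(data.frame(word = cluster, tag = tags))
```

In this sketch the only polarized word is "strong", and its cluster contains a negator ("not"), an amplifier ("very"), and a neutral word ("today").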

A dictionary of these words can be passed through the valence_shifters_dt parameter. The neighboring words are tagged by looking them up in this dictionary. The weights for amplifiers and adversative conjunctions are passed through the amplifier.weight and adversative.weight parameters. Each polarized word is weighted based on its value in polarity_dt, and also based on the number of valence shifters surrounding it that are tagged as amplifiers or adversative conjunctions. Neutrally tagged words carry no weight. For more details about weighting and scoring, refer to the R help page for the sentiment function (?sentiment).
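The parameters discussed above can be set explicitly when calling sentiment. The sketch below is illustrative: the two sentences are made up, the weights and window sizes are example values chosen for this call, and the two dictionaries are the ones shipped in the lexicon package.

```r
library(sentimentr)

# Illustrative sentences: one with a negator, one with an amplifier.
texts <- get_sentences("I do not like it. I really like it.")

scores <- sentiment(
  texts,
  polarity_dt         = lexicon::hash_sentiment_jockers_rinker,  # polarized words
  valence_shifters_dt = lexicon::hash_valence_shifters,          # negators, amplifiers, ...
  n.before            = 5,     # context words before a polarized word
  n.after             = 2,     # context words after a polarized word
  amplifier.weight    = 0.8,   # example weight for amplifiers and de-amplifiers
  adversative.weight  = 0.25   # example weight for adversative conjunctions
)

print(scores)
```

The negator in the first sentence flips the polarity of "like", so that sentence scores below zero, while the second sentence scores above zero. Changing the weights or the window sizes is a straightforward way to explore how much the valence shifters influence the final score.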
