How it works...

As mentioned earlier, this section uses Jane Austen's novel Pride and Prejudice to walk through the steps involved in tidying the text data and extracting sentiments with publicly available lexicons.

Steps 1 and 2 load the required CRAN packages and the text. Steps 3 and 4 perform unigram tokenization and stop-word removal. Steps 5 and 6 extract and visualize the 10 most frequent words across all 62 chapters. Steps 7 to 12 demonstrate high-level and granular-level sentiment extraction using two widely used lexicons, bing and nrc.
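To make steps 3 to 6 concrete, here is a minimal Python sketch of the same idea (the recipe itself is written in R). It assumes a simple regex tokenizer and a tiny, hypothetical stop-word set; the recipe relies on a much larger stop-word dataset, so real counts will differ.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "was", "that",
    "her", "she", "he", "his", "i", "be", "not", "with", "as", "for", "had", "at",
}

def unigrams(text):
    """Unigram tokenization: lowercase the text and extract word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

def top_words(chapters, n=10):
    """Count non-stop-word unigrams across all chapters and return the n most frequent."""
    counts = Counter()
    for chapter in chapters:
        counts.update(remove_stop_words(unigrams(chapter)))
    return counts.most_common(n)

if __name__ == "__main__":
    sample = ["Elizabeth smiled at Darcy.", "Darcy smiled; Elizabeth laughed."]
    print(top_words(sample, n=3))
```

Each chapter is passed as one string here; in the recipe, the text is first arranged into a tidy one-token-per-row table before counting.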

Both lexicons contain lists of commonly used English words tagged with sentiments. In bing, each word is tagged with one of two high-level binary sentiments (positive or negative). In nrc, each word is tagged with one or more of ten granular sentiments (positive, negative, anger, anticipation, joy, fear, disgust, trust, sadness, and surprise).
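The difference between the two lexicon styles can be sketched in Python with two hypothetical miniature lexicons that only mimic the shape of bing (one label per word) and nrc (a set of labels per word); the entries below are illustrative and are not taken from the real lexicons.

```python
from collections import Counter

# Hypothetical bing-style lexicon: each word maps to exactly one binary label.
BING_LIKE = {"happy": "positive", "love": "positive", "angry": "negative", "sad": "negative"}

# Hypothetical nrc-style lexicon: each word maps to a set of granular labels.
NRC_LIKE = {
    "happy": {"joy", "positive", "trust"},
    "angry": {"anger", "negative"},
    "sad": {"negative", "sadness"},
}

def tag_binary(tokens, lexicon=BING_LIKE):
    """Count positive/negative tokens; words missing from the lexicon are ignored,
    much like an inner join between tokens and the lexicon."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)

def tag_granular(tokens, lexicon=NRC_LIKE):
    """Count every sentiment a token maps to, so one word can add to several categories."""
    counts = Counter()
    for t in tokens:
        counts.update(lexicon.get(t, ()))
    return counts
```

With the tokens `["happy", "angry", "sad", "tea"]`, the binary tagger yields one positive and two negative words, while the granular tagger spreads the same three words across six categories.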

Each 150-word block of text (called a "sentence" here) is tagged with a sentiment, as shown in the figure Distribution of number of positive and negative words across sentences of 150 words each. In step 13, each chapter is tagged as positive or negative depending on whether positive or negative bing words occur more often in it. Of the 62 chapters, 52 contain more positive words and 10 contain more negative words.
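A minimal Python sketch of the 150-word blocking and the chapter-wise majority rule follows. It assumes a hypothetical bing-style lexicon and tokens that are already lowercased; how the recipe handles a chapter with equal positive and negative counts is not stated, so the `"tie"` label is an assumption.

```python
from collections import Counter

# Hypothetical bing-style lexicon for illustration only.
LEXICON = {"good": "positive", "happy": "positive", "bad": "negative", "sad": "negative"}

def chunks(tokens, size=150):
    """Split a token list into consecutive blocks of `size` words (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def net_sentiment(tokens, lexicon=LEXICON):
    """Number of positive words minus number of negative words in a block."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return counts["positive"] - counts["negative"]

def chapter_label(tokens, lexicon=LEXICON):
    """Tag a chapter by whichever of positive or negative words occurs more often.
    Returning "tie" for equal counts is an assumption of this sketch."""
    net = net_sentiment(tokens, lexicon)
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "tie"

def tally(chapters, lexicon=LEXICON):
    """Count how many chapters receive each label."""
    return Counter(chapter_label(ch, lexicon) for ch in chapters)
```

Applying `net_sentiment` to each element of `chunks(tokens)` gives the per-block values behind a distribution plot like the one in the figure, and `tally` over all chapters gives the positive/negative chapter split.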
