As mentioned earlier, this section uses Jane Austen's novel Pride and Prejudice to walk through the steps involved in tidying the text and extracting sentiment with publicly available lexicons.
Steps 1 and 2 load the required CRAN packages and the text of the novel. Steps 3 and 4 perform unigram tokenization and stop word removal. Steps 5 and 6 extract and visualize the 10 most frequent words across all 62 chapters. Steps 7 to 12 extract both high-level (positive/negative) and more granular sentiment using two widely used lexicons, bing and nrc.
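Steps 3 to 6 can be approximated outside R. The following is a minimal Python sketch, not the section's actual code (which relies on R packages from CRAN). It lowercases the text, splits it into unigrams with a simple regular expression, drops stop words and counts the most frequent remaining words. The stop word list and the helper names `unigram_tokens`, `remove_stop_words` and `top_words` are hypothetical; a real analysis would use a complete stop word list and the full text of the novel.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration only;
# a real analysis would use a complete English stop word list.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "it", "must", "of", "that", "the", "to"}

# Opening sentence of Pride and Prejudice, used as a small sample input.
OPENING = ("It is a truth universally acknowledged, that a single man in "
           "possession of a good fortune, must be in want of a wife.")


def unigram_tokens(text):
    """Lowercase the text and split it into single-word tokens (unigrams)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop word set."""
    return [t for t in tokens if t not in stop_words]


def top_words(tokens, n=10):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    content_words = remove_stop_words(unigram_tokens(OPENING))
    print(top_words(content_words, n=10))
```

`Counter.most_common` returns words with equal counts in the order they were first encountered, so on a short sample like this one the "top 10" is simply the ten remaining content words in reading order; on a whole novel the counts separate clearly.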
The text is split into sections of 150 words each, and each section is tagged with a sentiment based on the positive and negative words it contains; the results are shown in the figure Distribution of number of positive and negative words across sentences of 150 words each. In step 13, each chapter is tagged as positive or negative according to which kind of bing-lexicon word occurs more often in it. Of the 62 chapters, 52 contain more positive words than negative ones, and 10 contain more negative words than positive ones.
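The section-level and chapter-level tagging can be sketched in the same way. This Python sketch assumes the text is already tokenized into one list of words per chapter, and it uses a hypothetical six-word lexicon in place of bing. It counts positive and negative lexicon hits in consecutive 150-word sections, then labels each chapter by whichever polarity occurs more often. The section does not say how ties were handled, so labelling them "neutral" here is an assumption, as are all helper names.

```python
from collections import Counter

# Hypothetical miniature polarity lexicon standing in for bing;
# the real bing lexicon labels thousands of English words.
LEXICON = {"good": "positive", "happy": "positive", "pleasure": "positive",
           "bad": "negative", "wrong": "negative", "angry": "negative"}


def polarity_counts(tokens, lexicon=LEXICON):
    """Count (positive, negative) lexicon hits in a list of tokens."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return counts["positive"], counts["negative"]


def label(pos, neg):
    """Majority polarity; ties become "neutral" (an assumption, not from the source)."""
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def section_sentiment(tokens, size=150, lexicon=LEXICON):
    """Split tokens into consecutive sections of `size` words and tag each one."""
    results = []
    for start in range(0, len(tokens), size):
        pos, neg = polarity_counts(tokens[start:start + size], lexicon)
        results.append((start // size, pos, neg, label(pos, neg)))
    return results


def chapter_sentiment(chapters, lexicon=LEXICON):
    """Tag each chapter by whichever polarity occurs more often in it."""
    return [label(*polarity_counts(tokens, lexicon)) for tokens in chapters]


if __name__ == "__main__":
    # Toy chapters: 300 tokens (two sections) and 3 tokens (one section).
    chapter1 = ["good"] * 3 + ["bad"] + ["filler"] * 296
    chapter2 = ["wrong", "bad", "happy"]
    print(section_sentiment(chapter1))
    labels = chapter_sentiment([chapter1, chapter2])
    print(Counter(labels))  # tally of chapter labels, as in step 13
```

Tallying the chapter labels with `Counter` yields the kind of summary reported above (positive versus negative chapter counts); with the real lexicon and text the numbers would come from the novel itself, not from these toy inputs.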