Getting ready

In this recipe, we will work with the famous Amazon Fine Food Reviews dataset from Kaggle, which can be downloaded from https://www.kaggle.com/snap/amazon-fine-food-reviews. This data consists of fine food reviews from Amazon and spans more than 10 years. We will only use the review texts and their summaries in our analysis.

Let's start by loading the required libraries:

pckgs <- c("textclean", "keras", "stringr", "tm", "qdap")
lapply(pckgs, library, character.only = TRUE, quietly = TRUE)
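If any of these packages is not installed, the library() calls stop with an error. A minimal sketch of a pre-check follows; the variable name missing_pckgs is our own and not part of the recipe:

```r
# Packages this recipe loads
pckgs <- c("textclean", "keras", "stringr", "tm", "qdap")

# Names in pckgs that are not found among the installed packages
missing_pckgs <- setdiff(pckgs, rownames(installed.packages()))

if (length(missing_pckgs) > 0) {
  # Report what is missing instead of failing partway through loading
  message("Not installed: ", paste(missing_pckgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported here needs to be installed before the loading step will succeed.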

Now, we read the first 10,000 reviews from the data and keep only two columns, Text and Summary:

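# stringsAsFactors = FALSE keeps both columns as character vectors on R versions before 4.0
reviews <- read.csv("data/Reviews.csv", nrows = 10000, stringsAsFactors = FALSE)[, c("Text", "Summary")]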
head(reviews)

The following screenshot shows a few records from the input data:

We are only interested in keeping those rows that have both text and summary information in the data:

reviews <- reviews[complete.cases(reviews),]
rownames(reviews) <- seq_len(nrow(reviews))
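Note that complete.cases() only drops rows containing NA, and read.csv() reads blank fields in character columns as empty strings rather than NA. The following sketch shows one possible way to also drop empty or whitespace-only entries. It runs on a small made-up data frame (toy, with hypothetical values) rather than the real reviews, and is an optional extra step, not part of the recipe:

```r
# Toy data standing in for the real reviews (hypothetical values)
toy <- data.frame(
  Text = c("Great coffee", "", "Stale chips", NA),
  Summary = c("Good", "Empty text", "  ", "No text"),
  stringsAsFactors = FALSE
)

# Convert empty or whitespace-only strings to NA in every column
toy[] <- lapply(toy, function(col) {
  col[!is.na(col) & trimws(col) == ""] <- NA
  col
})

# Keep only rows where both columns are present, then renumber the rows
toy <- toy[complete.cases(toy), ]
rownames(toy) <- seq_len(nrow(toy))
print(toy)
```

Only the first row survives here, because every other row has a missing, empty, or blank value in one of the two columns.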

In the next section, we will preprocess the input data and build a model for text summarization.
