How to do it...

Now that we're familiar with the data, let's look at it in more detail:

  1. Let's import the word index for the IMDb data:
word_index <- dataset_imdb_word_index()

We can look at the head of the word index using the following code:

head(word_index)

Here, we can see that there is a list of key-value pairs, where the key is the word and the value is the integer that it's mapped to:
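For example, we can look up the integer assigned to a single word directly. The word used here is just an illustrative token that we assume is in the vocabulary:

word_index[["great"]]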

Let's also look at the number of unique words in our word index:

length(word_index)

Here, we can see that there are 88,584 unique words in the word index:

  2. Now, we create a reversed list of key-value pairs from the word index. We will use this list to decode the reviews in the IMDb dataset:
reverse_word_index <- names(word_index)
names(reverse_word_index) <- word_index
head(reverse_word_index)

Here, we can see that the reversed word index list is a list of key-value pairs, where the key is the integer index and the value is the associated word:
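As a quick sanity check, we can map a single integer back to its word. The index used here is only an example; any index that appears in the word index will work:

reverse_word_index[["1"]]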

  3. Now, we decode the first review. Note that the word encodings are offset by 3 because the indices 0, 1, and 2 are reserved for padding, the start of a sequence, and out-of-vocabulary words, respectively:
decoded_review <- sapply(train_x[[1]], function(index) {
  # shift the index back by 3 to undo the reserved-index offset
  word <- if (index >= 3) reverse_word_index[[as.character(index - 3)]]
  if (!is.null(word)) word else "?"
})
cat(decoded_review)

The following screenshot shows the decoded version of the first review:
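Since we may want to decode other reviews too, the same logic can be wrapped in a small helper function. Note that decode_review is our own convenience name, and that it operates on the raw list form of train_x, that is, before the padding step that follows:

# reusable wrapper around the decoding logic shown previously
decode_review <- function(encoded) {
  words <- sapply(encoded, function(index) {
    word <- if (index >= 3) reverse_word_index[[as.character(index - 3)]]
    if (!is.null(word)) word else "?"
  })
  paste(words, collapse = " ")
}
cat(decode_review(train_x[[2]]))  # decode the second training review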

  4. Let's pad all the sequences to make them uniform in length:
train_x <- pad_sequences(train_x, maxlen = 80)
test_x <- pad_sequences(test_x, maxlen = 80)
cat('x_train shape:', dim(train_x), '\n')
cat('x_test shape:', dim(test_x), '\n')

All the sequences are padded to a length of 80:

Now, let's look at the first review after padding it:

train_x[1,]

Here, you can see that the review only has 80 indexes after padding:
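Note that, by default, pad_sequences pads and truncates at the start of each sequence (padding = "pre" and truncating = "pre"), so short reviews are left-padded with zeros and long reviews keep their last 80 tokens. A toy example makes this easy to see:

# a sequence shorter than maxlen is left-padded with zeros
pad_sequences(list(c(5, 6, 7)), maxlen = 5)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    5    6    7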

  5. Now, we build the model for sentiment classification and view its summary:
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 1000, output_dim = 128) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = 'sigmoid')
summary(model)

Here is the description of the model:
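The parameter counts in the summary follow directly from the layer shapes, so we can verify them by hand:

# embedding : input_dim * output_dim                   = 1000 * 128          = 128,000
# simple RNN: input weights + recurrent weights + bias = 128*32 + 32*32 + 32 = 5,152
# dense     : weights + bias                           = 32 * 1 + 1          = 33
embedding_params <- 1000 * 128
rnn_params <- 128 * 32 + 32 * 32 + 32
dense_params <- 32 * 1 + 1
cat('Total parameters:', embedding_params + rnn_params + dense_params, '\n')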

  6. Now, we compile the model and train it:
# compile model
model %>% compile(
  loss = 'binary_crossentropy',
  optimizer = 'adam',
  metrics = c('accuracy')
)

# train model
model %>% fit(
  train_x, train_y,
  batch_size = 32,
  epochs = 10,
  validation_split = .2
)
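As written, the fit() call discards the training history. If you want to inspect the learning curves, assign the result to a variable; the keras package returns a history object that can be plotted directly:

history <- model %>% fit(
  train_x, train_y,
  batch_size = 32,
  epochs = 10,
  validation_split = .2
)
plot(history)  # plots loss and accuracy per epoch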
  7. Finally, we evaluate the model's performance on the test data and print the metrics:
scores <- model %>% evaluate(
  test_x, test_y,
  batch_size = 32
)
cat('Test score:', scores[[1]], '\n')
cat('Test accuracy:', scores[[2]], '\n')

The following screenshot shows the performance metrics on the test data: 

By doing this, we achieved an accuracy of around 71% on the test data.
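Beyond the aggregate metrics, it can be instructive to score a few individual reviews. The following is a minimal sketch: predict() returns the sigmoid output, which we can read as the probability that a review is positive:

# score the first five padded test reviews
probs <- model %>% predict(test_x[1:5, ])
data.frame(probability = as.vector(probs),
           sentiment = ifelse(as.vector(probs) > 0.5, 'positive', 'negative'))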
