Experimenting with the maximum sequence length and the optimizer

Let's start by creating train and test data for the sequence of integers representing movie reviews and their labels using the following code:

c(c(train_x, train_y), c(test_x, test_y)) %<-% imdb
z <- NULL
for (i in 1:25000) {z[i] <- length(train_x[[i]])}
summary(z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   11.0   130.0   178.0   238.7   291.0  2494.0

In the preceding code, we store the length of each training sequence in z and then call summary(z) to obtain the minimum, first quartile, median, mean, third quartile, and maximum. The median sequence length is 178 words. In the previous sections, we padded the sequences to a maximum length of 100 so that they were all of equal length. In this experiment, we will increase this to 200 so that we have a number closer to the median value, as shown in the following code:

imdb <- dataset_imdb(num_words = 500)
c(c(train_x, train_y), c(test_x, test_y)) %<-% imdb
train_x <- pad_sequences(train_x, maxlen = 200)
test_x <- pad_sequences(test_x, maxlen = 200)
model <- keras_model_sequential()
model %>% layer_embedding(input_dim = 500,
                          output_dim = 16,
                          input_length = 200) %>%
        layer_flatten() %>%
        layer_dense(units = 16, activation = "relu") %>%
        layer_dense(units = 1, activation = "sigmoid")
model %>% compile(optimizer = "adamax",
                  loss = "binary_crossentropy",
                  metrics = c("acc"))
model_3 <- model %>% fit(train_x, train_y,
                         epochs = 10,
                         batch_size = 512,
                         validation_split = 0.2)
plot(model_3)

Another change we'll make is to use the adamax optimizer when compiling the model. Note that this is a variant of the popular adam optimizer. We keep everything else the same. After training the model, we plot the resulting loss and accuracy, as shown in the following plot:

From the preceding plot for loss and accuracy, we can observe the following:

  • The loss and accuracy values for the training and validation data show rapid improvements for about four epochs.
  • After four epochs, these improvements slow down for the training data.
  • For the validation data, the loss and accuracy values become flat for the last few epochs.
  • The plot doesn't show any cause for concern regarding overfitting.
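If you want more control over the optimizer than the string "adamax" provides, the keras package also exposes an optimizer_adamax() constructor. The following is a minimal sketch, not part of the original code; the hyperparameter values shown are the keras defaults for Adamax, and depending on your keras version the learning-rate argument is named lr or learning_rate:

library(keras)

# Compile step with an explicit optimizer object instead of the string "adamax";
# adjust the learning rate or beta values to experiment further.
model %>% compile(optimizer = optimizer_adamax(lr = 0.002,
                                               beta_1 = 0.9,
                                               beta_2 = 0.999),
                  loss = "binary_crossentropy",
                  metrics = c("acc"))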

Next, we need to calculate the loss and accuracy based on the test data using the following code:

model %>% evaluate(test_x, test_y)
$loss
[1] 0.3906249
$acc
[1] 0.82468

Looking at the preceding code, we can observe the following:

  • The model's loss and accuracy, based on the test data, are 0.391 and 0.825, respectively.
  • Both numbers indicate an improvement over the performance we obtained in the previous section.

To look into the model's sentiment classification performance even further, we can use the following code:

pred1 <- model %>% predict_classes(test_x)
table(Predicted = pred1, Actual = imdb$test$y)
         Actual
Predicted     0     1
        0  9970  1853
        1  2530 10647

From the preceding confusion matrix, which is based on movie reviews of test data, we can observe the following:

  • The correct classifications of negative (9,970) and positive movie reviews (10,647) are much closer now.
  • The correct classification of positive movie reviews is slightly better compared to the correct classification of negative reviews.
  • This model misclassifies a negative movie review as positive at a slightly higher rate (2,530) compared to a positive review being misclassified as a negative review (1,853).
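To turn these counts into summary metrics, we can compute accuracy, sensitivity, and specificity directly from the confusion matrix. The following is a small sketch rather than code from the original text; it reuses the pred1 object created above:

# Rebuild the confusion matrix and derive summary metrics from it
cm <- table(Predicted = pred1, Actual = imdb$test$y)
accuracy    <- sum(diag(cm)) / sum(cm)        # (9970 + 10647) / 25000
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # positives correctly identified
specificity <- cm["0", "0"] / sum(cm[, "0"])  # negatives correctly identified
round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3)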

Here, experimenting with the maximum sequence length and the type of optimizer used to compile the model improved the sentiment classification performance. You are encouraged to continue experimenting to improve the model's performance further.
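As one possible starting point, the following hypothetical loop (not taken from the original text) refits the same architecture for a few candidate maximum sequence lengths and records the final validation accuracy for each, so that different settings can be compared side by side. It assumes metrics = c("acc"), so the training history stores val_acc; newer keras versions report val_accuracy instead:

library(keras)

# Reload the raw (unpadded) sequences so each run can pad to its own length
c(c(train_x, train_y), c(test_x, test_y)) %<-% dataset_imdb(num_words = 500)

results <- data.frame()
for (len in c(100, 200, 300)) {
  x_tr <- pad_sequences(train_x, maxlen = len)
  m <- keras_model_sequential() %>%
    layer_embedding(input_dim = 500, output_dim = 16, input_length = len) %>%
    layer_flatten() %>%
    layer_dense(units = 16, activation = "relu") %>%
    layer_dense(units = 1, activation = "sigmoid")
  m %>% compile(optimizer = "adamax",
                loss = "binary_crossentropy",
                metrics = c("acc"))
  h <- m %>% fit(x_tr, train_y, epochs = 10, batch_size = 512,
                 validation_split = 0.2, verbose = 0)
  results <- rbind(results, data.frame(maxlen = len,
                                       val_acc = tail(h$metrics$val_acc, 1)))
}
results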
