Experimenting with an LSTM network with an additional layer

In this second experiment to improve the performance of the classification model, we will add an extra LSTM layer. Let's have a look at the following code:

# Model architecture
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")

# Compiling model
model %>% compile(optimizer = "adam",
                  loss = "binary_crossentropy",
                  metrics = c("acc"))

# Fitting model
model_three <- model %>% fit(train_x, train_y,
                             epochs = 10,
                             batch_size = 128,
                             validation_split = 0.2)

# Loss and accuracy plot
plot(model_three)

By adding an extra LSTM layer to the network, as shown in the preceding code, the total number of parameters with two LSTM layers increases to 32,673, compared to the 24,353 parameters we had previously with one LSTM layer. Note that the first LSTM layer is specified with return_sequences = TRUE so that it passes its full sequence of outputs to the second LSTM layer rather than only its final output. This increase in the number of parameters also leads to a longer training time for the network. We retain the Adam optimizer when compiling the model and keep everything else the same as in the previous model.
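
If you want to verify where these parameter counts come from, calling summary() on the model prints the output shape and number of parameters for each layer (the exact layout of the printout depends on the installed keras version):

# Layer-by-layer parameter counts:
# embedding:  500 * 32                   = 16,000
# each LSTM:  4 * (32*32 + 32*32 + 32)   =  8,320
# dense:      32 + 1                     =     33
# total:      16,000 + 2 * 8,320 + 33    = 32,673
summary(model)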

A simple flow chart for the network architecture with two LSTM layers used in this experiment is shown in the following diagram:

The preceding flow chart for the LSTM network highlights the two LSTM layers in the architecture and the activation functions used. In both LSTM layers, tanh is used as the default activation function, while in the dense layer we continue to use the sigmoid activation function that we used earlier.
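
If you prefer to make these defaults visible in the code, the same architecture can be written with the activation argument of every layer stated explicitly. This is only an illustrative sketch of the model defined earlier; the name model_explicit is used here just for clarity:

# Same architecture with the activation functions written out explicitly
model_explicit <- keras_model_sequential() %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, activation = "tanh", return_sequences = TRUE) %>%
  layer_lstm(units = 32, activation = "tanh") %>%
  layer_dense(units = 1, activation = "sigmoid")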

After training the model, the accuracy and loss values for each epoch are stored in model_three. We use the loss and accuracy values in model_three to develop the following plot:

From the loss and accuracy plot shown, we can make the following observations:

  • The plot for loss and accuracy values doesn't indicate the presence of an over-fitting problem since the curves for the training and validation data are close to each other.
  • As in the earlier model, the loss and accuracy for the validation data remain roughly flat over the last few epochs, indicating that ten epochs are sufficient for training the model and that increasing the number of epochs is not likely to improve the results (see the early-stopping sketch after this list).
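
If we did not want to fix the number of epochs in advance, we could instead let training stop automatically once the validation loss stops improving. The following is a minimal sketch using callback_early_stopping() from the keras package; the patience of 2 epochs, the upper limit of 25 epochs, and the object name model_three_es are assumptions made for illustration and were not used in the experiment above:

# Stop training once validation loss fails to improve for 2 consecutive epochs
model_three_es <- model %>% fit(train_x, train_y,
                                epochs = 25,
                                batch_size = 128,
                                validation_split = 0.2,
                                callbacks = list(
                                  callback_early_stopping(monitor = "val_loss",
                                                          patience = 2)))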

We can now obtain the loss, accuracy, and confusion matrix for the training data using the following code:

# Loss and accuracy
model %>% evaluate(train_x, train_y)
$loss
[1] 0.3396379

$acc
[1] 0.85504

pred <- model %>% predict_classes(train_x)

# Confusion Matrix
table(Predicted = pred, Actual = imdb$train$y)
         Actual
Predicted     0     1
        0 11245  2369
        1  1255 10131

From the preceding code output, we can make the following observations:

  • The loss and accuracy values based on training data are obtained as 0.339 and 0.855 respectively. Both loss and accuracy show improvement compared to the earlier two models.
  • We can use this model to make predictions for each review in the training data, compare them with actual labels, and then summarize the results in the form of a confusion matrix.
  • For the training data, the confusion matrix shows that the model correctly classifies negative movie reviews about 90% of the time (11,245 out of 12,500) and correctly classifies positive reviews about 81% of the time (10,131 out of 12,500); the sketch after this list shows how these rates can be computed from the confusion matrix.
  • So, although there is an overall improvement in the model performance, we continue to observe bias when correctly classifying one category compared to the other. 
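
To make the per-class rates quoted above explicit, we can compute them directly from the confusion matrix using base R. This is a small sketch; the object name cm_train is used here only for illustration:

# Column-wise proportions of the training confusion matrix
cm_train <- table(Predicted = pred, Actual = imdb$train$y)
round(prop.table(cm_train, margin = 2), 3)

# The diagonal gives the correct-classification rate per class:
# negative reviews: 11245 / 12500 ~ 0.90, positive reviews: 10131 / 12500 ~ 0.81
diag(prop.table(cm_train, margin = 2))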

After reviewing the performance of the model using training data, we will now repeat the process with the test data. Following is the code for obtaining the loss, accuracy, and confusion matrix:

# Loss and accuracy
model %>% evaluate(test_x, test_y)
$loss
[1] 0.3761043

$acc
[1] 0.83664

pred1 <- model %>% predict_classes(test_x)

# Confusion Matrix
table(Predicted = pred1, Actual = imdb$test$y)
         Actual
Predicted     0     1
        0 10916  2500
        1  1584 10000

From the preceding code output, we can make the following observations:

  • For the test data, the loss and accuracy values are 0.376 and 0.837 respectively. Both results show a better classification performance compared to the previous two models for the test data.
  • The confusion matrix shows that negative movie reviews are correctly classified at a rate of about 87.3%, and positive reviews are correctly classified at a rate of about 80%.
  • Hence, these results are consistent with those obtained using the training data and show a similar bias between the two categories.

To summarize, by adding an extra LSTM layer, we were able to improve the movie review sentiment classification performance of the model. However, we continue to observe a bias towards classifying one category more accurately than the other. Hence, although we obtained moderate success in improving model performance, there is scope to improve the classification performance of the model further.
