Experimenting with batch size, kernel size, and filters in CNNs

The code that will be used for this experiment is as follows:

# Model architecture
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 1500,
                  output_dim = 32,
                  input_length = 400) %>%
  layer_conv_1d(filters = 64,
                kernel_size = 4,
                padding = "valid",
                activation = "relu",
                strides = 1) %>%
  layer_max_pooling_1d(pool_size = 4) %>%
  layer_dropout(0.25) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 50, activation = "softmax")

# Compiling the model
model %>% compile(optimizer = "adam",
                  loss = "categorical_crossentropy",
                  metrics = c("acc"))

# Fitting the model
model_three <- model %>% fit(trainx, trainy,
                             epochs = 30,
                             batch_size = 8,
                             validation_data = list(validx, validy))

# Loss and accuracy plot
plot(model_three)

From the preceding code, we can make the following observations:

  • We have reduced the kernel size from 5 to 4.
  • We have increased the number of filters for the convolutional layer from 32 to 64.
  • We have reduced the batch size from 16 to 8 while training the model.
  • We have kept all other settings the same as what was used for the previous model.
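As a quick sanity check on what these settings imply for the shapes flowing through the network, the output lengths of the convolutional and pooling layers can be worked out by hand (the arithmetic below assumes "valid" padding and a stride of 1, as in the preceding code):

```r
# Sequence length after the 1D convolution:
# with "valid" padding and stride 1, output length = input_length - kernel_size + 1
conv_len <- 400 - 4 + 1         # 397 time steps, each with 64 filter channels

# Sequence length after max pooling with pool_size = 4 (non-overlapping windows)
pool_len <- floor(conv_len / 4) # 99 time steps passed on to the LSTM layer
```

The same shapes can be confirmed by calling summary(model) after building the architecture.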

The loss and accuracy values based on the training and validation data for each of the 30 epochs are stored in model_three. A plot of this data is as follows:

The plot for the loss and accuracy shows the following:

  • The accuracy values for the validation data remain flat for the last few epochs, whereas the accuracy for the training data continues to increase, albeit at a relatively slower pace, over the same epochs.
  • The loss values for the validation data start to increase during the last few epochs, whereas those for the training data continue to decrease.

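The rising validation loss is a typical sign of overfitting. One way to guard against it is to stop training once the validation loss stops improving, using the early-stopping callback provided by keras; the following is a sketch that assumes the same training objects as in the preceding code, and the patience value of 3 is an arbitrary choice for illustration:

```r
# Stop training when the validation loss has not improved
# for 3 consecutive epochs
model_three <- model %>% fit(trainx, trainy,
                             epochs = 30,
                             batch_size = 8,
                             validation_data = list(validx, validy),
                             callbacks = list(
                               callback_early_stopping(monitor = "val_loss",
                                                       patience = 3)))
```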
Now, we will obtain the loss and accuracy values for the training and test data using the evaluate function, as follows:

# Loss and accuracy for train data
model %>% evaluate(trainx, trainy)
$loss
[1] 0.1093387
$acc
[1] 0.9880419

# Loss and accuracy for test data
model %>% evaluate(testx, testy)
$loss
[1] 3.262691
$acc
[1] 0.337

From the preceding code and output, we can observe the following:

  • The loss and accuracy values based on the training data show an improvement compared to the previous two models.
  • For the test data, although the loss value is higher than for the first two models, the accuracy of about 34% is the best of the three models at classifying author articles.

The following bar plot shows the accuracy of correctly classifying the authors of articles in the test data:

From the preceding bar plot, we can observe the following:

  • The accuracy of correctly classifying articles from each author shows better performance compared to the previous two models, since no author is now classified with zero accuracy.
  • When we compare the three models on the test data, the first model classifies four authors with 50% or higher accuracy, whereas the second and third models classify eight and nine authors, respectively, at that level.

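A per-author accuracy of the kind shown in the bar plot can be computed along the following lines. This is a sketch, not the book's own plotting code; it assumes that testy is one-hot encoded and that the author labels run from 0 to 49, matching the 50-unit softmax layer:

```r
# Predicted class for each test article (labels assumed to be 0 to 49)
prob <- model %>% predict(testx)
pred <- apply(prob, 1, which.max) - 1

# Actual class, recovered from the one-hot encoded testy
actual <- apply(testy, 1, which.max) - 1

# Proportion of correctly classified articles for each author
acc_by_author <- tapply(pred == actual, actual, mean)
barplot(acc_by_author, xlab = "Author", ylab = "Accuracy")
```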
In this section, we carried out two experiments that showed that the author classification performance of the model can be improved further.
