Experimenting with a deeper network and more units in the hidden layers

After building three different neural network models with 203, 239, and 753 parameters respectively, we will now build a deeper neural network model containing a larger number of units in the hidden layers. The code used for this experiment is as follows:

# Model architecture
model <- keras_model_sequential()
model %>%
  layer_dense(units = 40, activation = 'relu', input_shape = c(21)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 30, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 20, activation = 'relu') %>%
  layer_dropout(rate = 0.2) %>%
  layer_dense(units = 3, activation = 'softmax')
summary(model)

OUTPUT
___________________________________________________________________________
Layer (type)                        Output Shape                   Param #
===========================================================================
dense_1 (Dense)                     (None, 40)                     880
___________________________________________________________________________
dropout_1 (Dropout)                 (None, 40)                     0
___________________________________________________________________________
dense_2 (Dense)                     (None, 30)                     1230
___________________________________________________________________________
dropout_2 (Dropout)                 (None, 30)                     0
___________________________________________________________________________
dense_3 (Dense)                     (None, 20)                     620
___________________________________________________________________________
dropout_3 (Dropout)                 (None, 20)                     0
___________________________________________________________________________
dense_4 (Dense)                     (None, 3)                      63
===========================================================================
Total params: 2,793
Trainable params: 2,793
Non-trainable params: 0
___________________________________________________________________________

# Compile model
model %>%
  compile(loss = 'categorical_crossentropy',
          optimizer = 'adam',
          metrics = 'accuracy')

# Fit model
model_four <- model %>%
  fit(training,
      trainLabels,
      epochs = 200,
      batch_size = 32,
      validation_split = 0.2)
plot(model_four)

As the preceding code and output show, this model has a total of 2,793 parameters, spread across three hidden layers with 40, 30, and 20 units respectively. To try to improve classification performance while guarding against overfitting, each hidden layer is followed by a dropout layer, with dropout rates of 40%, 30%, and 20%. For example, with a dropout rate of 0.4 (or 40%) after the first hidden layer, 40% of that layer's units are randomly set to zero during training. This helps to counter the overfitting that the larger number of hidden units might otherwise cause. We then compile and fit the model with the same settings that we used earlier, storing the loss and accuracy values after each epoch in model_four.
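As a quick sanity check on the 2,793 figure, each dense layer contributes one weight per input-unit pair plus one bias per unit, while dropout layers add no parameters. The following arithmetic, included here purely for illustration, reproduces the summary output:

# Hand-check of the parameter count (weights + biases per dense layer)
21 * 40 + 40            # dense_1: 880
40 * 30 + 30            # dense_2: 1,230
30 * 20 + 20            # dense_3: 620
20 * 3  + 3             # dense_4: 63
880 + 1230 + 620 + 63   # Total: 2,793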

A plot of the accuracy and loss values for the training and validation data is shown in the following graph:

Accuracy and loss for training and validation data

From the preceding plot, we can make the following observations:

  • Training loss and accuracy values stay approximately constant after about 150 epochs.
  • Accuracy values for validation data are mainly flat after about 75 epochs.
  • However, for loss, we see some divergence between the training and validation data after about 75 epochs, with the validation loss gradually increasing. This suggests the model begins to overfit after about 75 epochs; one way to respond is sketched after this list.
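One common way to curb this kind of divergence, shown here only as a sketch rather than as part of the original experiment, is to stop training once the validation loss stops improving. The keras package provides callback_early_stopping() for this; the monitor and patience values below are illustrative assumptions:

# Sketch: fit with early stopping so training halts once val_loss stalls
# (patience = 10 epochs is an assumed, illustrative value)
model %>%
  fit(training,
      trainLabels,
      epochs = 200,
      batch_size = 32,
      validation_split = 0.2,
      callbacks = list(
        callback_early_stopping(monitor = 'val_loss', patience = 10)))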

Let's now make predictions using test data and review the resulting confusion matrix to assess model performance, as shown in the following code:

# Predictions and confusion matrix
pred <- model %>%
  predict_classes(test)
table(Predicted = pred, Actual = testtarget)

OUTPUT
          Actual
Predicted   0   1   2
        0 431  34   7
        1  20  53   2
        2   9   7  40
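The headline figures quoted in the observations that follow can be computed directly from this table. A short check in R, re-entering the matrix by hand for illustration:

# Verify overall and per-class accuracy from the confusion matrix
cm <- matrix(c(431, 34,  7,
                20, 53,  2,
                 9,  7, 40),
             nrow = 3, byrow = TRUE,
             dimnames = list(Predicted = 0:2, Actual = 0:2))
sum(diag(cm)) / sum(cm)   # Overall accuracy: 524/603 = 0.869
diag(cm) / colSums(cm)    # Per-class accuracy: 0.937, 0.564, 0.816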

From the preceding confusion matrix, the following observations can be made:

  • The correct classifications for the 0, 1, and 2 categories are 431, 53, and 40 respectively.
  • The overall accuracy comes to 86.9% (524 out of 603), which is better than that of the first three models.
  • The model correctly classifies normal, suspect, and pathological cases about 93.7%, 56.4%, and 81.6% of the time respectively.
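As a portability aside that is not part of the original text: predict_classes() was removed from the keras R interface for TensorFlow 2.6 and later. On a newer installation, the same zero-based class labels can be recovered from the predicted probabilities, for example:

# Alternative for newer keras versions without predict_classes():
# pick the column with the highest predicted probability (zero-based)
prob <- model %>% predict(test)
pred <- max.col(prob) - 1
table(Predicted = pred, Actual = testtarget)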