Model architecture and related calculations

We start by creating a model with the keras_model_sequential() function. The code for the model architecture is as follows:

# Model architecture
model <- keras_model_sequential()
model %>%
  layer_conv_2d(filters = 32,
                kernel_size = c(3, 3),
                activation = 'relu',
                input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 64,
                kernel_size = c(3, 3),
                activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = 'relu') %>%
  layer_dropout(rate = 0.25) %>%
  layer_dense(units = 10, activation = 'softmax')

As shown in the preceding code, we add various layers to develop a CNN model. The input to this network has dimensions of 28 x 28 x 1, based on the height and width of the images, which are 28 pixels each; since the images are grayscale, the number of color channels is one. We use two-dimensional convolutional layers here, which are the standard choice for image data.

Note that layer_conv_2d is used for both grayscale and color images: the number of channels is simply given as the last element of input_shape (1 for grayscale, 3 for RGB color). Three-dimensional convolutional layers are instead used for volumetric data, such as video or medical scans.

Let's look at some calculations involving the first convolutional layer of the network, which will help us to appreciate the use of such layers compared to a densely connected layer. In a CNN, neurons in a layer are not connected to all the neurons in the next layer.

Here, the input layer receives an image with dimensions of 28 x 28 x 1. With the Keras defaults of a stride of 1 and 'valid' (no) padding, the output height is obtained by subtracting the kernel size of 3 from the input height of 28 and adding one, which gives 26; the same holds for the width. The output shape is therefore 26 x 26 x 32, where 32 is the number of output filters. Thus, the output has a reduced height and width but a greater depth. To arrive at the number of parameters, we compute 3 x 3 x 1 x 32 + 32 = 320, where 3 x 3 is the kernel_size, 1 is the number of channels of the image, 32 is the number of output filters, and the final 32 accounts for the bias terms.
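As a quick check, these calculations can be reproduced directly in R. The helper functions conv_out_dim and conv_params below are our own illustrations, not part of Keras, and assume the default stride of 1 and 'valid' padding:

```r
# Output dimension with 'valid' padding and a stride of 1:
# out = in - kernel + 1
conv_out_dim <- function(in_dim, kernel) in_dim - kernel + 1
conv_out_dim(28, 3)     # 26

# Parameters = kernel_h x kernel_w x channels_in x filters + filters (biases)
conv_params <- function(kernel, channels_in, filters) {
  kernel * kernel * channels_in * filters + filters
}
conv_params(3, 1, 32)   # 320
```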

If we compare this to a fully connected layer, we obtain a much larger number of parameters. In a fully connected layer, all 28 x 28 x 1 = 784 input neurons would be connected to all 26 x 26 x 32 = 21,632 output neurons, giving a total of 784 x 21,632 + 21,632 = 16,981,120 parameters. That is more than 53,000 times the number of parameters of the convolutional layer. This reduction, in turn, significantly lowers the processing time and thereby the processing cost.
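The comparison can be sketched in R as follows (a worked calculation, not Keras code):

```r
# Fully connected alternative: every input neuron connects to every output neuron
dense_alt <- 784 * 21632 + 21632   # weights + biases = 16,981,120
dense_alt / 320                    # roughly 53,066 times the 320 conv parameters
```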

The number of parameters for each layer is shown in the following model summary:

# Model summary
summary(model)
__________________________________________________________________
Layer (type)                   Output Shape              Param #
==================================================================
conv2d_1 (Conv2D)              (None, 26, 26, 32)        320
__________________________________________________________________
conv2d_2 (Conv2D)              (None, 24, 24, 64)        18496
__________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 12, 12, 64)        0
__________________________________________________________________
dropout_1 (Dropout)            (None, 12, 12, 64)        0
__________________________________________________________________
flatten_1 (Flatten)            (None, 9216)               0
__________________________________________________________________
dense_1 (Dense)                (None, 64)                 589888
__________________________________________________________________
dropout_2 (Dropout)            (None, 64)                 0
__________________________________________________________________
dense_2 (Dense)                (None, 10)                 650
==================================================================
Total params: 609,354
Trainable params: 609,354
Non-trainable params: 0
__________________________________________________________________

The output shape of the second convolutional layer is 24 x 24 x 64 (since 26 - 3 + 1 = 24), where 64 is the number of output filters. Here too, the output has a reduced height and width but a greater depth. To arrive at the number of parameters, we compute 3 x 3 x 32 x 64 + 64 = 18,496, where 3 x 3 is the kernel_size, 32 is the number of filters in the previous layer, 64 is the number of output filters, and the final 64 accounts for the bias terms.
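The same arithmetic, written out in R:

```r
# Second convolutional layer: 32 input channels (from the first layer), 64 filters
26 - 3 + 1             # output height and width: 24
3 * 3 * 32 * 64 + 64   # parameters: 18,496
```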

The next layer is the pooling layer, which is usually placed after a convolutional layer and performs a down-sampling operation. This helps to reduce processing time and also helps to reduce overfitting. To obtain the output shape, we divide 24 by 2, where 2 comes from the pool size that we specified. The output shape here is 12 x 12 x 64, and no new parameters are added. The pooling layer is followed by a dropout layer with the same output shape; once again, no new parameters are added.

In the flatten layer, we go from three dimensions (12 x 12 x 64) to one dimension by multiplying the three numbers, obtaining 9,216. This is followed by a densely connected layer with 64 units, whose number of parameters is 9,216 x 64 + 64 = 589,888. Next comes another dropout layer to reduce overfitting, which adds no parameters. Finally, the last layer is a densely connected layer with 10 units, one for each of the 10 fashion items; its number of parameters is 64 x 10 + 10 = 650. The total number of parameters is thus 609,354. In this CNN architecture, we use the relu activation function for the hidden layers and softmax for the output layer.
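Putting all of the layer calculations above together, we can verify the total reported by summary(model); this is a worked calculation assuming the architecture defined earlier:

```r
# Parameter count per layer, then summed
conv1  <- 3 * 3 * 1 * 32 + 32      # 320
conv2  <- 3 * 3 * 32 * 64 + 64     # 18,496
# pooling halves 24 to 12; pooling, dropout, and flatten add no parameters
dense1 <- 12 * 12 * 64 * 64 + 64   # flatten gives 12 * 12 * 64 = 9,216 inputs -> 589,888
dense2 <- 64 * 10 + 10             # 650
conv1 + conv2 + dense1 + dense2    # 609,354, matching the model summary
```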
