How it works...

The dimensions and characteristics of an input image are shown in steps 1 and 2, respectively. Every input image is processed in the first convolution layer using a set of filters, as defined in steps 4 and 5. The first convolution layer produces a set of 64 images (one for each filter). In addition, the resolution of these images is halved by the 2 x 2 max pooling, from 32 x 32 pixels to 16 x 16 pixels.
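The halving caused by 2 x 2 max pooling can be illustrated with a minimal pure-Python sketch. The helper name `max_pool_2x2` and the stand-in feature map values are hypothetical and chosen only for illustration; the recipe itself relies on the pooling operation of its deep learning library.

```python
def max_pool_2x2(image):
    """Apply 2 x 2 max pooling with stride 2 to a 2-D list of numbers."""
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            # Keep only the largest value in each 2 x 2 window.
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled


# A stand-in 32 x 32 feature map (arbitrary values, for illustration only).
feature_map = [[(r * 32 + c) % 7 for c in range(32)] for r in range(32)]
pooled = max_pool_2x2(feature_map)
print(len(feature_map), "x", len(feature_map[0]), "->",
      len(pooled), "x", len(pooled[0]))
```

Each output pixel summarizes a 2 x 2 block of the input, which is why both the height and the width are halved.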

The second convolution layer takes these 64 images as input and outputs 64 new images at a further reduced resolution of 8 x 8 pixels (again due to 2 x 2 max pooling). In the second convolution layer, a total of 64 x 64 = 4,096 filters are created, one for each pair of input and output channels. For each output channel, the responses of its 64 filters are summed, giving 64 output images (or channels). Remember that these 64 images of 8 x 8 resolution correspond to a single input image.
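The shapes described so far can be traced with a short sketch. It assumes that each convolution preserves height and width (so only the pooling shrinks the image) and that the input has 3 color channels; both are assumptions made for illustration, and `conv_block_shape` is a hypothetical helper, not part of the recipe.

```python
def conv_block_shape(shape, out_channels, pool=2):
    """Shape after a size-preserving convolution followed by pool x pool max pooling."""
    height, width, _in_channels = shape
    return (height // pool, width // pool, out_channels)


input_shape = (32, 32, 3)                      # assumption: 3 color channels
layer1_shape = conv_block_shape(input_shape, out_channels=64)
layer2_shape = conv_block_shape(layer1_shape, out_channels=64)

# One 2-D filter per (input channel, output channel) pair in the second layer.
filters_layer2 = layer1_shape[2] * layer2_shape[2]

print(layer1_shape)    # shape after the first convolution layer
print(layer2_shape)    # shape after the second convolution layer
print(filters_layer2)  # number of 2-D filters in the second layer
```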

Next, these 64 output images of 8 x 8 pixels are flattened into a single vector of length 4,096 (8 x 8 x 64), as defined in step 3. This vector is fed into the first fully connected layer of 1,024 neurons, as defined in step 6. The outputs of this layer are in turn fed into a second fully connected layer of 10 neurons (equal to num_classes). Each of these 10 neurons represents one class label, and their outputs are used to determine the (final) class of the image.
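The arithmetic behind the flattened vector and the two fully connected layers can be checked with a small sketch. The helper `dense_param_count` is hypothetical, and the inclusion of one bias per neuron is an assumption; the layer sizes are those given in the text.

```python
def dense_param_count(n_inputs, n_outputs, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    return n_inputs * n_outputs + (n_outputs if use_bias else 0)


num_classes = 10
flat_length = 8 * 8 * 64                      # 64 images of 8 x 8 pixels, flattened

fc1_params = dense_param_count(flat_length, 1024)
fc2_params = dense_param_count(1024, num_classes)

print(flat_length)  # length of the flattened vector
print(fc1_params)   # weights and biases in the first fully connected layer
print(fc2_params)   # weights and biases in the second fully connected layer
```

Under these assumptions, almost all of the fully connected parameters sit in the first layer, because every one of its 1,024 neurons is connected to all 4,096 inputs.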

First, the weights of all convolution and fully connected layers, up to the classification stage (the end of the CNN graph), are randomly initialized. At the classification stage, the classification error is computed from the true class and the predicted class, using a measure called cross entropy.
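As a rough illustration of the cross entropy error, the sketch below applies a softmax to the 10 output values and takes the negative log of the probability assigned to the true class. The function names and example values are hypothetical; the recipe would use its library's built-in loss rather than this hand-written version.

```python
import math


def softmax(logits):
    """Convert raw output values into probabilities that sum to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Negative log probability assigned to the true class."""
    return -math.log(softmax(logits)[true_class])


num_classes = 10
undecided = [0.0] * num_classes               # equal score for every class
confident = [0.0] * num_classes
confident[3] = 5.0                            # strongly favors class 3

print(cross_entropy(undecided, true_class=3))
print(cross_entropy(confident, true_class=3))
```

A network that gives every class the same score has an error of ln(10), and the error shrinks as more probability is placed on the true class.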

Then, the optimizer backpropagates the error through the convolution network using the chain rule of differentiation, after which the weights of the layers (or filters) are updated so as to reduce the error. One full cycle of forward and backward propagation is called an iteration. Thousands of such iterations are performed until the classification error falls to a sufficiently low value.
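The forward/backward cycle can be sketched on a toy model. The example below is not the CNN from this recipe: it is a single linear layer with 3 classes and 2 input features, trained on one made-up sample with plain gradient descent, and every name and value in it is hypothetical. For a softmax output with cross entropy error, the chain rule gives a gradient of (probability - target) with respect to each output value, which is what the update uses.

```python
import math


def softmax(logits):
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, features, true_class, learning_rate):
    """One iteration: forward pass, error, backward pass, weight update."""
    # Forward pass: one output value per class.
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[true_class])
    # Backward pass and update: d(loss)/d(weight[k][j]) = (p_k - y_k) * x_j.
    for k, row in enumerate(weights):
        error = probs[k] - (1.0 if k == true_class else 0.0)
        for j in range(len(row)):
            row[j] -= learning_rate * error * features[j]
    return loss


weights = [[0.0, 0.0] for _ in range(3)]      # toy starting weights
features = [1.0, 2.0]                         # one made-up input sample
losses = [train_step(weights, features, true_class=1, learning_rate=0.1)
          for _ in range(100)]
print(losses[0], losses[-1])                  # error before and after training
```

The error reported by each call is the one measured before that iteration's update, so the list shows how it falls as iterations accumulate.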

Generally, these iterations are performed using a batch of images instead of a single image to increase the efficiency of computation.
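Splitting the training data into batches can be sketched as follows. The helper `iterate_batches`, the batch size of 4, and the 10 placeholder image IDs are hypothetical values chosen for illustration only.

```python
def iterate_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


image_ids = list(range(10))                   # stand-ins for 10 training images
batches = list(iterate_batches(image_ids, batch_size=4))
print([len(batch) for batch in batches])
```

Each batch would be passed through one forward and backward propagation, so a single weight update reflects the error averaged over several images rather than one.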

The following image represents the convolution network designed in this chapter:
