Layers of CNNs

A CNN consists of an input layer, an output layer, and a number of hidden layers in between. The following are the typical hidden layers in a CNN:

  • Convolution: Assume that we have an image represented as pixels. A convolution takes a small matrix of weights, nearly always 3 x 3 in deep learning, multiplies every element of that matrix by the corresponding element of a 3 x 3 section of the image, and then adds all the products together to get the result of the convolution at that one point. The following diagram illustrates the process of convolution on a pixel (a short code sketch follows the diagram):

Convolution application on an image
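The same multiply-and-add can be expressed in a few lines of NumPy. This is a minimal sketch, assuming a single-channel image and a 3 x 3 kernel; the variable names and toy values are illustrative only:

```python
import numpy as np

def convolve_at(image, kernel, row, col):
    """Multiply a 3 x 3 kernel element-wise with the 3 x 3 patch of `image`
    centred at (row, col), then add all the products together."""
    patch = image[row - 1:row + 2, col - 1:col + 2]  # 3 x 3 section of the image
    return np.sum(patch * kernel)                    # element-wise multiply, then sum

# Toy 5 x 5 grayscale image and a 3 x 3 kernel (a vertical-edge detector)
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

print(convolve_at(image, kernel, row=2, col=2))  # result of the convolution at one point
```

Sliding the same kernel over every position of the image produces the full convolved output (strictly speaking, deep learning frameworks compute a cross-correlation, that is, the kernel is not flipped).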
  • Rectified Linear Unit (ReLU): A non-linear activation function that throws away the negatives in an input matrix. For example, assume we have a 3 x 3 matrix whose cells contain negative numbers, zeros, and positive numbers. Given this matrix as input, ReLU transforms every negative number to zero, leaves the other values unchanged, and returns the 3 x 3 matrix. ReLU is an activation function that can be defined as part of the CNN architecture. The following diagram demonstrates the function of ReLU in CNNs (a short code sketch follows the diagram):

Rectified Linear Unit (ReLU) in CNNs
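In code, ReLU is a one-line operation. A minimal NumPy sketch, using an illustrative 3 x 3 matrix:

```python
import numpy as np

def relu(x):
    """Replace every negative value with zero and leave the rest unchanged."""
    return np.maximum(0, x)

m = np.array([[-3,  0,  2],
              [ 5, -1,  0],
              [ 1,  4, -7]])

print(relu(m))
# [[0 0 2]
#  [5 0 0]
#  [1 4 0]]
```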
  • Max pooling: Max pooling is something that can be set as a layer in the CNN architecture. It allows the network to identify whether a specific characteristic was present in the previous layer. It slides a window over the input matrix and outputs the maximum value within each window. For example, with a 2 x 2 max pooling layer and a 4 x 4 matrix as input, the layer replaces each non-overlapping 2 x 2 block of the input with the highest value among its four cells. The output matrix thus obtained is an image representation with a reduced resolution. The following diagram illustrates the functionality of max pooling in a CNN (a code sketch follows the paragraph after the diagram):

Functionality of max pooling layer in CNNs

There are various reasons to apply max pooling: it reduces the number of parameters and the computational load, it helps to reduce overfitting, and, most importantly, it forces the neural network to see the larger picture, since in the previous layers it was focused on seeing bits and pieces of the image.
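A minimal sketch of 2 x 2 max pooling on a 4 x 4 matrix, written with plain NumPy reshaping rather than a deep learning framework; it assumes the input dimensions divide evenly by the pool size:

```python
import numpy as np

def max_pool_2x2(x):
    """Replace each non-overlapping 2 x 2 block of x with its maximum value."""
    rows, cols = x.shape
    blocks = x.reshape(rows // 2, 2, cols // 2, 2)  # split into 2 x 2 blocks
    return blocks.max(axis=(1, 3))                  # keep only the max of each block

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [3, 8, 4, 5]])

print(max_pool_2x2(x))
# [[6 4]
#  [8 9]]
```

The 4 x 4 input becomes a 2 x 2 output: a reduced-resolution representation that keeps only the strongest response in each region.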

  • Fully-connected layer: Also known as a dense layer, this involves a linear operation on the layer's input vector. The layer ensures that every input is connected to every output by a weight (a combined dense-plus-softmax sketch follows the softmax diagram below).
  • Softmax: An activation function that is generally applied at the last layer of a deep neural network. In a multiclass classification problem, we require the fully-connected output of the network to be interpreted as probabilities: the total probability of a particular observation (across all classes) should add up to 1, and the probability of the observation belonging to each class should lie between 0 and 1. Therefore, we transform each output of the fully-connected layer into a portion of a total sum. However, instead of taking the standard proportion, we apply a non-linear exponential function for a very specific reason: we would like to push the highest output as close to 1 as possible and the lower outputs as close to 0 as possible. Softmax does this by pushing the linear proportions closer to either 1 or 0.

The following diagram illustrates the softmax activation function:

Softmax activation function
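Taken together, the dense layer and softmax amount to a matrix multiplication followed by an exponential normalisation. The following is a minimal NumPy sketch of a fully-connected layer feeding a softmax; the four input features, three classes, and random weights are made up purely for illustration:

```python
import numpy as np

def dense(x, weights, bias):
    """Fully-connected (dense) layer: every input is connected to every output by a weight."""
    return x @ weights + bias

def softmax(z):
    """Turn raw scores into probabilities that lie between 0 and 1 and add up to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / np.sum(e)

rng = np.random.default_rng(0)
x = rng.normal(size=4)             # flattened features from the previous layer
weights = rng.normal(size=(4, 3))  # 4 inputs fully connected to 3 class outputs
bias = np.zeros(3)

scores = dense(x, weights, bias)
probs = softmax(scores)
print(probs, probs.sum())          # three probabilities that add up to 1
```

The exponential in softmax is what pushes the largest score towards 1 and the smaller scores towards 0, rather than returning plain linear proportions.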
  • Sigmoid: This is similar to softmax, except that it is applied to binary classification, such as cats versus dogs. With this activation function, the class to which the observation belongs is assigned a higher probability than the other class. Unlike softmax, the probabilities do not have to add up to 1 (a short sketch follows below).
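A minimal sketch of sigmoid applied independently to two illustrative class scores (the numbers are made up):

```python
import numpy as np

def sigmoid(z):
    """Squash each raw score independently into a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.3, -0.7])  # raw outputs for, say, the "dog" and "cat" classes
probs = sigmoid(scores)

print(probs, probs.sum())       # each value lies in (0, 1); the sum need not be 1
```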