What are CNNs?

A CNN, also known as a ConvNet, is one of the most widely used deep learning algorithms for computer vision tasks. Let's say we are performing an image-recognition task. Consider the following image. We want our CNN to recognize that it contains a horse:

How can we do that? When we feed the image to a computer, it basically converts it into a matrix of pixel values. The pixel values range from 0 to 255, and the dimensions of this matrix will be of [image width x image height x number of channels]. A grayscale image has one channel, and colored images have three channels red, green, and blue (RGB).

Let's say we have a colored input image with a width of 11 and a height of 11, that is 11 x 11, then our matrix dimension would be of [11 x 11 x 3]. As you can see in [11 x 11 x 3], 11 x 11 represents the image width and height and 3 represents the channel number, as we have a colored image. So, we will have a 3D matrix.

But it is hard to visualize a 3D matrix, so, for the sake of understanding, let's consider a grayscale image as our input. Since the grayscale image has only one channel, we will get a 2D matrix.

As shown in the following diagram, the input grayscale image will be converted into a matrix of pixel values ranging from 0 to 255, with the pixel values representing the intensity of pixels at that point:

The values given in the input matrix are just arbitrary values for our understanding.

Okay, now we have an input matrix of pixel values. What happens next? How does the CNN come to understand that the image contains a horse? CNNs consists of the following three important layers:

The convolutional layer
The pooling layer
The fully connected layer

With the help of these three layers, the CNN recognizes that the image contains a horse. Now we will explore each of these layers in detail.

Table of Contents for What are CNNs?

Create new playlist

Sign In

Sign Up

Table of Contents for
What are CNNs?