Convolutional neural networks

An ANN is a family of models inspired by biological neural networks (such as the human brain) that, starting from the mechanisms regulating natural neural networks, aims to simulate aspects of human thinking. ANNs are used to estimate or approximate functions that may depend on a large number of inputs, many of which are often unknown. They are generally presented as systems of interconnected neurons that exchange messages. Each connection has an associated weight whose value is adjusted through experience; this makes neural networks adaptable to various types of input and gives them the ability to learn.

ANNs define the neuron as a basic processing unit that performs a mathematical operation to generate one output from a set of inputs. The output of a neuron is a function of the weighted sum of the inputs plus a bias. Each neuron performs a very simple operation: it activates if the total signal received exceeds an activation threshold. The following figure shows a simple ANN architecture:
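The behavior just described can be sketched in a few lines of code. This is a minimal illustration, not the model used later in the chapter: a single neuron computing the weighted sum of its inputs plus a bias, followed by a simple step activation (fire only above the threshold).

```python
import numpy as np

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias (the net input z),
    # passed through a step activation: fire (1) only if z exceeds 0.
    z = np.dot(inputs, weights) + bias
    return 1 if z > 0 else 0

# Two inputs, both active: the neuron fires or not depending on the bias.
print(neuron(np.array([1.0, 1.0]), np.array([0.6, 0.6]), bias=-1.0))  # fires: 1
print(neuron(np.array([1.0, 1.0]), np.array([0.6, 0.6]), bias=-1.5))  # silent: 0
```

In a real network the hard threshold is replaced by a differentiable activation (sigmoid, ReLU, and so on) so that gradients can flow during training.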

Essentially, CNNs are ANNs. Just like the latter, CNNs are made up of neurons connected to one another by weighted branches; the trainable parameters of the network are once again the weights and the biases.

In a CNN, the connection pattern between neurons is inspired by the structure of the visual cortex in the animal brain. The individual neurons in this part of the brain respond to stimuli in a narrow region of the visual field, called the receptive field. The receptive fields of different neurons partially overlap so that together they cover the entire visual field. The response of a single neuron to stimuli within its receptive field can be mathematically approximated by a convolution operation.
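To make the analogy concrete, here is a hand-rolled sketch of that convolution operation (strictly speaking, a cross-correlation, as is conventional in deep-learning libraries). Each output value is the weighted sum over one receptive field of the input; the function name and shapes are illustrative, not from a particular library.

```python
import numpy as np

def conv2d(image, kernel):
    # 'Valid' 2D convolution: slide the kernel over the image and, at
    # each position, compute the weighted sum over that receptive field.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 2x2 kernel of ones sums each 2x2 receptive field of a 3x3 image.
print(conv2d(np.ones((3, 3)), np.ones((2, 2))))
# [[4. 4.]
#  [4. 4.]]
```

Note how each output neuron only "sees" a small patch of the input, and neighboring outputs look at overlapping patches, just like the overlapping receptive fields described above.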

Everything related to the training of a neural network (that is, forward/backward propagation and updating of the weights) also applies in this context; moreover, a whole CNN still uses a single differentiable cost function. However, CNNs make the specific assumption that their input has a precise data structure, such as an image, and this allows them to embed specific properties in their architecture to better process such data.

Normal neural networks stratified with an FC (fully connected) architecture, where every neuron of each layer is connected to all the neurons of the previous layer (excluding bias neurons), generally do not scale well as the size of the input data increases.

Let's take a practical example: suppose we want to analyze an image to detect objects. First, let's see how the image is represented. As we know, an image is encoded as a grid of small squares, each of which represents a pixel. To encode a color image, it is enough to assign each square a certain number of shades and color gradations, and then code each one by means of an appropriate sequence of bits. Here is a simple image encoding:

The number of squares in the grid defines the resolution of the image. For example, an image that is 1,600 pixels wide and 800 pixels high (1,600 x 800) contains 1,280,000 pixels, or 1.28 megapixels. Multiplying by the three color channels, we finally obtain 1,600 x 800 x 3 = 3,840,000 values. So each fully connected neuron in the first hidden layer would have 3,840,000 weights. And this is for a single neuron; over the whole network, the number of parameters would certainly become unmanageable!
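The arithmetic above is easy to verify. The hidden-layer size below is an illustrative assumption (the text does not specify one); it is only there to show how quickly the parameter count explodes.

```python
# Dimensions from the example: a 1,600 x 800 RGB image.
width, height, channels = 1600, 800, 3

pixels = width * height                      # 1,280,000 pixels
inputs_per_neuron = pixels * channels        # 3,840,000 weights per FC neuron

# Hypothetical (assumed) hidden layer of 1,000 fully connected neurons.
hidden_neurons = 1000
total_weights = inputs_per_neuron * hidden_neurons

print(pixels)             # 1280000
print(inputs_per_neuron)  # 3840000
print(total_weights)      # 3840000000 -- nearly 4 billion weights in one layer
```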

CNNs are designed to recognize visual patterns directly from pixel images, with zero or very limited preprocessing. They can recognize extremely variable patterns, such as freehand writing and images of the real world.

Typically, a CNN consists of several alternating convolution and subsampling (pooling) layers, followed by one or more final FC layers in the case of classification. The following figure shows a classic image-processing pipeline:

To solve real-world problems, these steps can be combined and stacked as often as necessary. For example, you can have two, three, or even more convolution layers, and you can add as many pooling layers as you want to reduce the size of the data.
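The pooling step mentioned above can also be sketched in a few lines. This is a minimal illustration of 2x2 max pooling with stride 2 (one common choice, not the only one), which keeps the strongest response in each window and halves each spatial dimension:

```python
import numpy as np

def max_pool(feature_map, size=2):
    # Non-overlapping max pooling: split the map into size x size
    # windows and keep only the largest value in each window.
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop edge rows/cols that don't fit
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

fmap = np.arange(16).reshape(4, 4)
print(max_pool(fmap))
# [[ 5  7]
#  [13 15]]
```

After pooling, a 4x4 feature map becomes 2x2: the data shrinks by a factor of four, which is exactly why stacking pooling layers keeps the network size manageable.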

As already mentioned, several types of layers are typically used in a CNN. The main ones are covered in the following sections.
