The hidden layer(s)

To understand how this layer works, let's go back to the regression algorithm we explained earlier. There, we expressed the predicted variable as a linear combination of features (area, number of rooms, and distance to the center) multiplied by their respective weights, W1, W2, and W3. Establishing the analogy with our neural network, the features correspond to the neurons and the weights to the edges that connect each pair of neurons:

Source: https://commons.wikimedia.org/wiki/File:Artificial_neural_network_pso.png, Cyberbotics Ltd.
CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0
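
To make the regression analogy concrete, here is a minimal Python sketch of that linear combination; the feature values and weights are made up for illustration:

# Features of one house and illustrative weights (made-up values)
area, rooms, distance = 120.0, 3.0, 2.5
w1, w2, w3 = 1500.0, 10000.0, -8000.0

# The regression prediction is a linear combination of features and weights
predicted_price = w1 * area + w2 * rooms + w3 * distance
print(predicted_price)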

The value of each feature is processed with the sigmoid function of its neuron (input layer; j neurons) to produce a probability value, Sj, which is then multiplied by the weight, Wij, of the edge that connects it to each neuron downstream (hidden layer; i neurons). Hence, the feature input to this neuron, i, in the hidden layer is a sum of products, with as many terms as there are neurons connected to it upstream (input layer; j neurons).
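
The following sketch, assuming NumPy and made-up numbers, shows what a single hidden neuron receives: the sigmoid outputs of the input neurons, each multiplied by the weight of its connecting edge, and then summed:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# S_j: probability values produced by the sigmoids of the j input neurons
s = sigmoid(np.array([0.2, -1.3, 0.8]))

# W_ij: weights of the edges connecting the input neurons to hidden neuron i
w_i = np.array([0.5, -0.4, 1.1])

# Feature input to hidden neuron i: a sum of products, one term per upstream neuron
f_i = np.sum(s * w_i)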

This result is the sum, over the index j, of all of the terms Sj multiplied by Wij, where j ranges over all of the neurons in the input layer connected to neuron i. The weights, Wij, of the edges connecting pairs of neurons are the parameters that the network learns (not to be confused with hyperparameters, such as the number of layers, which are fixed before training).

The neural structure of the hidden layers provides what we call intrinsic features, which are inherent properties of the network and do not have to be selected by the user (they are established by the designer of the neural network). What the user has to do is train the network to obtain the set of weights, Wij, that makes the network as predictive as possible with the available dataset. Here is where the magic of deep learning resides, because a well-designed architecture of layers can provide a very accurate predictive model. The downside is that you need a lot of data to get a well-trained network.
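
To give a flavor of what training means, here is a minimal sketch, assuming a single layer of weights, a toy dataset, a squared-error loss, and plain gradient descent; real deep networks train all of their layers at once with backpropagation:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 labeled examples, 3 features each
y = (X[:, 0] > 0).astype(float)      # toy labels for illustration

W = rng.normal(size=3)               # weights to be learned
lr = 0.1                             # learning rate

for _ in range(1000):
    p = sigmoid(X @ W)                               # predicted probabilities
    grad = X.T @ ((p - y) * p * (1 - p)) / len(y)    # gradient of the squared error
    W -= lr * grad                                   # adjust weights to reduce the error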

Recapping from the beginning, given an input image, you can calculate the feature input to each neuron, i, of a layer, Fi, from the probabilities of the previous layer, Sj, and the weights, Wij, of the edges connecting to neuron i:

Fi = Σj (Sj × Wij)
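
In code, computing this sum of products for every neuron of a layer at once amounts to a matrix-vector product; here is a minimal sketch with illustrative shapes and values:

import numpy as np

# S_j: probabilities coming from the previous layer (three neurons)
s = np.array([0.6, 0.1, 0.9])

# W_ij: one row of edge weights per downstream neuron i (two neurons)
W = np.array([[0.5, -0.4, 1.1],
              [0.3,  0.8, -0.2]])

# F_i = sum over j of S_j * W_ij, computed for both downstream neurons at once
F = W @ s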

Proceeding downstream layer by layer, you can finally obtain the probabilities of the neurons of the output layer and, therefore, answer with the prediction of what the analyzed image contains.
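
A complete forward pass is just this per-layer computation repeated until the output layer. Here is a minimal sketch, following the sigmoid-per-neuron description above (in practice, a softmax output layer is more common for classification), with random weights standing in for a trained network:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, weights):
    # Propagate the input downstream, one layer of edges at a time
    activation = x
    for W in weights:
        activation = sigmoid(W @ activation)
    return activation  # probabilities of the output neurons

# Illustrative network: 4 inputs -> 3 hidden neurons -> 2 output neurons
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
probs = forward(rng.normal(size=4), weights)
prediction = int(np.argmax(probs))  # index of the most probable class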

As was mentioned earlier, and given the complexity of the network structure, you may guess that, to train such a model, you need much more data than for traditional ML algorithms such as regression. More specifically, what you have to calculate are the values of as many parameters as there are edges connecting pairs of neurons. Once you achieve this milestone, you get a trained network that can be applied to unlabeled images to predict their content.
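
Since there is one weight to learn per edge, you can estimate the size of the training task by counting edges; a quick count for an illustrative fully connected layout:

# For fully connected layers, the number of edges between two consecutive
# layers is the product of their sizes (the layer sizes below are illustrative)
layers = [784, 128, 10]
n_weights = sum(a * b for a, b in zip(layers, layers[1:]))
print(n_weights)  # 784*128 + 128*10 = 101632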
