Figure 2.2: Feedforward neural network, consisting of an input layer, a hidden layer, and an output layer.
feedforward neural network. The first layer (on the left) is the input layer, with each neuron
representing an input to the neural network. The second layer is the hidden layer. Although the
neural network shown in Fig. 2.2 has only one hidden layer, this number can vary. The final
layer is the output layer, with each neuron representing an output of the neural network.
The neurons send their values to the neurons in the next layer. Typically, each neuron is
connected to all neurons in the next layer; this is called a fully connected neural network. Each
connection also has a weight, which multiplies the value passed to the neuron in the next layer.
Therefore, the input to each neuron in the next layer is a weighted summation of the outputs
of the previous layer. The weights of these connections are adjusted as the neural network learns:
the network tunes them during training to optimize its performance at a given task. Generally,
a loss function is used as a measure of the network error, and the weights are updated such
that the loss is minimized during training.
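To make the weighted summations and the gradient-based weight updates concrete, here is a minimal NumPy sketch of a two-layer fully connected network trained for a single step on a squared-error loss. The layer sizes, learning rate, and names such as `W1`, `W2`, and `forward` are illustrative choices rather than anything prescribed here, and activation functions are deliberately omitted for now (they are introduced next).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 3 inputs, 4 hidden neurons, 2 outputs.
W1 = rng.normal(size=(3, 4))  # weights from input layer to hidden layer
W2 = rng.normal(size=(4, 2))  # weights from hidden layer to output layer

def forward(x):
    # Each layer's input is a weighted summation of the
    # previous layer's outputs (no activation functions yet).
    h = x @ W1
    return h @ W2

x = rng.normal(size=(1, 3))      # one example input
target = np.array([[1.0, 0.0]])  # an arbitrary training target

# One step of gradient descent on a squared-error loss.
lr = 0.01
y = forward(x)
loss = np.mean((y - target) ** 2)

# Gradients of the loss with respect to each weight matrix,
# derived via the chain rule for this two-layer linear network.
dy = 2 * (y - target) / y.size
dW2 = (x @ W1).T @ dy
dW1 = x.T @ (dy @ W2.T)

# Update the weights so that the loss decreases.
W2 -= lr * dW2
W1 -= lr * dW1
```

Repeating the forward pass, gradient computation, and weight update over many training examples is, in essence, the training process described above.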
Each hidden neuron also transforms its input by applying an activation function, such as a
Rectified Linear Unit (ReLU) or a sigmoid function, to produce its output. The activation function
is helpful as it introduces nonlinearity into the neural network. Consider the case of a neural
network with no nonlinearities: since the output of each layer is a linear function of its input,
and a composition of linear functions is still a linear function, the relationship between the input
and the network output could be described by a function of the form F(x) = mx. To update the
weights using gradient descent, the gradient of this function would be calculated as m; the
gradient is therefore not a function of the input x. Moreover, in the case of linear activations,
there is no benefit in making the network deeper, as the function F(x) = mx can always be
estimated by a neural network with a single hidden layer. However, by using nonlinear activation
functions, we can make the networks deeper by introducing more hidden layers, enabling the
networks to model more complex functions.
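To make the linearity argument concrete, the following small NumPy check (the dimensions and random weights are arbitrary choices for illustration) verifies that two stacked linear layers collapse into a single matrix, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
x = rng.normal(size=(5, 3))  # a batch of 5 inputs

def relu(z):
    # Rectified Linear Unit: zero out negative values.
    return np.maximum(z, 0.0)

# Without activations, two layers collapse into one linear map:
# (x @ W1) @ W2 equals x @ M for the single matrix M = W1 @ W2,
# so depth adds no expressive power -- the F(x) = mx argument.
M = W1 @ W2
assert np.allclose((x @ W1) @ W2, x @ M)

# With a ReLU between the layers, no single matrix reproduces
# the mapping, so the network can represent nonlinear functions.
nonlinear = relu(x @ W1) @ W2
print(np.allclose(nonlinear, x @ M))  # False in general
```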