Let f and g both be real-valued functions of a single variable. Suppose that y = g(x) and z = f(g(x)) = f(y).
Then, the chain rule of derivatives states that:

$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}.$$
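As a quick numerical check of this rule, the sketch below picks hypothetical functions f(y) = sin(y) and g(x) = x² (illustrative choices, not from the text), applies the chain rule by hand, and compares the result against a central finite difference:

```python
import math

# Hypothetical example functions: f(y) = sin(y), g(x) = x**2,
# so z = sin(x**2) and the chain rule gives dz/dx = cos(x**2) * 2x.
def g(x):
    return x ** 2

def f(y):
    return math.sin(y)

def dz_dx(x):
    y = g(x)
    dz_dy = math.cos(y)   # f'(y)
    dy_dx = 2 * x         # g'(x)
    return dz_dy * dy_dx  # chain rule: dz/dx = dz/dy * dy/dx

x = 1.5
h = 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)  # central finite difference
print(dz_dx(x), numeric)  # the two values should agree closely
```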
Similarly, for a function of several variables, let $\mathbf{x} \in \mathbb{R}^m$, $\mathbf{y} \in \mathbb{R}^n$, $g: \mathbb{R}^m \to \mathbb{R}^n$, and $f: \mathbb{R}^n \to \mathbb{R}$, with $\mathbf{y} = g(\mathbf{x})$ and $z = f(\mathbf{y})$. Then,

$$\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j}\,\frac{\partial y_j}{\partial x_i}.$$
Therefore, the gradient of $z$ with respect to $\mathbf{x}$ is the product of the transposed Jacobian matrix $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ and the gradient vector $\nabla_{\mathbf{y}} z$. So, we have the chain rule of derivatives for a function of several variables as follows:

$$\nabla_{\mathbf{x}} z = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^{\top} \nabla_{\mathbf{y}} z.$$
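The sketch below makes this concrete with a hypothetical map $g: \mathbb{R}^2 \to \mathbb{R}^3$ and $f(\mathbf{y}) = \sum_j y_j^2$ (both chosen only for illustration), computing $\nabla_{\mathbf{x}} z$ as the Jacobian transpose applied to $\nabla_{\mathbf{y}} z$:

```python
import numpy as np

# Hypothetical example: g: R^2 -> R^3 with y = (x0*x1, x0 + x1, x0**2),
# and f(y) = sum(y**2), so z = f(g(x)). All names here are illustrative.
def g(x):
    return np.array([x[0] * x[1], x[0] + x[1], x[0] ** 2])

def jacobian_g(x):
    # Entry (j, i) is dy_j / dx_i, arranged as a 3x2 matrix dy/dx.
    return np.array([
        [x[1],     x[0]],
        [1.0,      1.0],
        [2 * x[0], 0.0],
    ])

x = np.array([1.5, -0.5])
y = g(x)
grad_y = 2 * y                      # nabla_y z for f(y) = sum(y**2)
grad_x = jacobian_g(x).T @ grad_y   # nabla_x z = (dy/dx)^T nabla_y z
print(grad_x)
```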
The learning algorithm of a neural network consists of a chain of such Jacobian–gradient multiplications, one for each operation in the network.
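To illustrate that chaining, here is a minimal sketch (an assumption for illustration, not a full training algorithm): a toy two-layer network with a tanh nonlinearity and a squared-norm loss, where the backward pass applies one Jacobian-transpose product per forward operation:

```python
import numpy as np

# Minimal backward-pass sketch: two affine maps with a tanh in between.
# The gradient of the scalar loss flows backward through one
# Jacobian-gradient multiplication per step of the forward computation.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Forward pass.
a = W1 @ x                 # first affine map
h = np.tanh(a)             # elementwise nonlinearity
y = W2 @ h                 # second affine map
z = 0.5 * np.sum(y ** 2)   # scalar loss

# Backward pass: each line is (Jacobian of that step)^T @ incoming gradient.
grad_y = y                        # dz/dy for z = 0.5 * ||y||^2
grad_h = W2.T @ grad_y            # Jacobian of y = W2 @ h is W2
grad_a = (1 - h ** 2) * grad_h    # tanh Jacobian is diagonal, so multiply elementwise
grad_x = W1.T @ grad_a            # Jacobian of a = W1 @ x is W1
print(grad_x)                     # nabla_x z
```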