Gradients with respect to V

After predicting the output, $\hat{y}$, we are in the final layer of the network. Since we are backpropagating, that is, going from the output layer to the input layer, our first weight will be $V$, which is the hidden-to-output layer weight.

We have learned throughout that the final loss is the sum of the losses over all the time steps. In a similar manner, our final gradient is the sum of the gradients at all the time steps, as follows:

$$\frac{\partial L}{\partial V} = \frac{\partial L_1}{\partial V} + \frac{\partial L_2}{\partial V} + \cdots + \frac{\partial L_T}{\partial V}$$

If we have $T$ layers, that is, the network is unrolled over $T$ time steps, then we can write the gradient of the loss with respect to $V$ as follows:

$$\frac{\partial L}{\partial V} = \sum_{j=1}^{T} \frac{\partial L_j}{\partial V}$$

Since the final equation of the LSTM, that is, $\hat{y}_t = \text{softmax}(Vh_t)$, is the same as in the RNN, calculating the gradient of the loss with respect to $V$ is exactly the same as what we computed for the RNN. Thus, we can directly write the following:

$$\frac{\partial L}{\partial V} = \sum_{j=1}^{T} (\hat{y}_j - y_j) \otimes h_j$$

Here, $\otimes$ denotes the outer product between the prediction error at time step $j$ and the hidden state $h_j$.
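To make the summation concrete, here is a minimal NumPy sketch that accumulates $\frac{\partial L}{\partial V}$ over the time steps. The function name grad_loss_wrt_V and the list-of-arrays layout are assumptions made for illustration, and the term $(\hat{y}_j - y_j) \otimes h_j$ assumes a softmax output with cross-entropy loss, as in the RNN derivation.

```python
import numpy as np

def grad_loss_wrt_V(y_hats, ys, hs):
    """Accumulate dL/dV over all time steps.

    y_hats : list of softmax outputs, each of shape (output_dim,)
    ys     : list of one-hot targets,  each of shape (output_dim,)
    hs     : list of hidden states,    each of shape (hidden_dim,)
    """
    output_dim, hidden_dim = y_hats[0].shape[0], hs[0].shape[0]
    dV = np.zeros((output_dim, hidden_dim))
    # dL/dV = sum over j of (y_hat_j - y_j) outer h_j
    for y_hat, y, h in zip(y_hats, ys, hs):
        dV += np.outer(y_hat - y, h)
    return dV

# Toy usage: T = 3 time steps, hidden_dim = 4, output_dim = 2
T, hidden_dim, output_dim = 3, 4, 2
hs = [np.random.randn(hidden_dim) for _ in range(T)]
logits = [np.random.randn(output_dim) for _ in range(T)]
y_hats = [np.exp(l) / np.exp(l).sum() for l in logits]          # softmax outputs
ys = [np.eye(output_dim)[np.random.randint(output_dim)] for _ in range(T)]  # one-hot targets
dV = grad_loss_wrt_V(y_hats, ys, hs)                             # shape (2, 4), same as V
```

Note that the gradient has the same shape as $V$ itself, since each outer product $(\hat{y}_j - y_j) \otimes h_j$ produces an (output_dim, hidden_dim) matrix and the time steps are simply summed.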
