Next, we calculate the gradients of the loss with respect to the input-to-hidden weights for all the gates and the content state. These gradients are computed in exactly the same way as the ones we computed with respect to the hidden-to-hidden weights, except that the last term is the input at the current time step instead of the previous hidden state, similar to what we saw when we covered the LSTM cell.
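The relationship described above can be checked numerically. The following is a minimal sketch, not the book's implementation: it assumes a common GRU formulation (update gate `z`, reset gate `r`, content state `c`), a single time step, and a toy squared-error loss on the hidden state. All names (`gru_step`, `input_weight_grads`, `U_z`, `W_z`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(params, x, h_prev):
    # Assumed GRU formulation; U_* are input-to-hidden, W_* are hidden-to-hidden.
    z = sigmoid(params["U_z"] @ x + params["W_z"] @ h_prev)        # update gate
    r = sigmoid(params["U_r"] @ x + params["W_r"] @ h_prev)        # reset gate
    c = np.tanh(params["U_c"] @ x + params["W_c"] @ (r * h_prev))  # content state
    h = z * c + (1.0 - z) * h_prev                                 # new hidden state
    return z, r, c, h

def loss_fn(params, x, h_prev, target):
    # Toy squared-error loss on the hidden state (an assumption for this demo).
    h = gru_step(params, x, h_prev)[3]
    return 0.5 * np.sum((h - target) ** 2)

def input_weight_grads(params, x, h_prev, target):
    z, r, c, h = gru_step(params, x, h_prev)
    dh = h - target                                   # dL/dh for the toy loss
    # Gradients with respect to each pre-activation.
    delta_z = dh * (c - h_prev) * z * (1.0 - z)
    delta_c = dh * z * (1.0 - c ** 2)
    delta_r = (params["W_c"].T @ delta_c) * h_prev * r * (1.0 - r)
    return {
        # Input-to-hidden weights: the last term is the input x.
        "U_z": np.outer(delta_z, x),
        "U_r": np.outer(delta_r, x),
        "U_c": np.outer(delta_c, x),
        # Hidden-to-hidden counterpart: same delta, last term is h_prev.
        "W_z": np.outer(delta_z, h_prev),
    }

def numerical_grad(params, name, x, h_prev, target, eps=1e-6):
    # Central finite differences, one weight entry at a time.
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(*params[name].shape):
        old = params[name][idx]
        params[name][idx] = old + eps
        plus = loss_fn(params, x, h_prev, target)
        params[name][idx] = old - eps
        minus = loss_fn(params, x, h_prev, target)
        params[name][idx] = old                       # restore the weight
        grad[idx] = (plus - minus) / (2.0 * eps)
    return grad

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
shapes = {"U_z": (n_hid, n_in), "U_r": (n_hid, n_in), "U_c": (n_hid, n_in),
          "W_z": (n_hid, n_hid), "W_r": (n_hid, n_hid), "W_c": (n_hid, n_hid)}
params = {name: rng.normal(scale=0.5, size=shape) for name, shape in shapes.items()}
x = rng.normal(size=n_in)
h_prev = rng.normal(size=n_hid)
target = rng.normal(size=n_hid)

grads = input_weight_grads(params, x, h_prev, target)
max_err = max(
    np.max(np.abs(grads[name] - numerical_grad(params, name, x, h_prev, target)))
    for name in grads
)
print("largest gap between analytic and numerical gradients:", max_err)
```

Note that `U_z` and `W_z` share the same `delta_z`; only the vector in the outer product changes. With multiple time steps, the per-step outer products would be summed, and `dh` would also need to include the gradient flowing back from later steps.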
We can write the gradient of the loss with respect to the input-to-hidden weights of the update gate as:
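As a sketch under assumed notation (these symbols are illustrative and not confirmed by the text): suppose the update gate is $z_t = \sigma(U_z x_t + W_z h_{t-1})$, the hidden state is $h_t = z_t \odot c_t + (1 - z_t) \odot h_{t-1}$, and $\delta^z_t$ denotes the gradient of the loss with respect to the update gate's pre-activation. The gradient then takes the form:

$$
\frac{\partial L}{\partial U_z} = \sum_t \delta^z_t \, x_t^\top, \qquad \delta^z_t = \frac{\partial L}{\partial h_t} \odot (c_t - h_{t-1}) \odot z_t \odot (1 - z_t)
$$

Here $\partial L / \partial h_t$ stands for the total gradient reaching $h_t$, including contributions from later time steps. The hidden-to-hidden counterpart would have $h_{t-1}^\top$ in place of $x_t^\top$.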
The gradient of the loss with respect to the input-to-hidden weights of the reset gate is represented as follows:
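Again as a sketch under assumed notation: suppose the reset gate is $r_t = \sigma(U_r x_t + W_r h_{t-1})$, the content state is $c_t = \tanh(U_c x_t + W_c (r_t \odot h_{t-1}))$, and $\delta^c_t$ denotes the gradient of the loss with respect to the content state's pre-activation. Because the reset gate influences the loss only through the content state, its gradient could be written as:

$$
\frac{\partial L}{\partial U_r} = \sum_t \delta^r_t \, x_t^\top, \qquad \delta^r_t = \left( W_c^\top \delta^c_t \right) \odot h_{t-1} \odot r_t \odot (1 - r_t)
$$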
The gradient of the loss with respect to the input-to-hidden weights of the content state is represented as follows:
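Under the same kind of assumption, with content state $c_t = \tanh(U_c x_t + W_c (r_t \odot h_{t-1}))$ and hidden state $h_t = z_t \odot c_t + (1 - z_t) \odot h_{t-1}$, one way to write this gradient is:

$$
\frac{\partial L}{\partial U_c} = \sum_t \delta^c_t \, x_t^\top, \qquad \delta^c_t = \frac{\partial L}{\partial h_t} \odot z_t \odot (1 - c_t^2)
$$

The factor $1 - c_t^2$ is the derivative of $\tanh$, and $z_t$ appears because the content state enters the hidden state scaled by the update gate.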