Gradients with respect to V

Since the final equation of the GRU, $\hat{y}_t = \mathrm{softmax}(V h_t)$, is the same as in the RNN, the gradient of the loss with respect to the hidden-to-output weight matrix $V$ is computed exactly as it was for the RNN. The RNN result therefore carries over directly.
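As a rough illustration of what carries over, the sketch below assumes a softmax output layer trained with cross-entropy loss summed over time steps (the loss is not stated in this passage, so treat that as an assumption). Under that assumption the gradient takes the familiar form $\partial L / \partial V = \sum_t (\hat{y}_t - y_t) \otimes h_t$. The function names, shapes, and random data are hypothetical, chosen only so that the formula can be compared against a finite-difference estimate.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(V, hs, ys):
    # Cross-entropy summed over time steps (assumed loss, see lead-in).
    return -sum(np.log(softmax(V @ h)[y]) for h, y in zip(hs, ys))

def grad_V(V, hs, ys):
    # Analytic gradient: sum over t of outer(y_hat_t - y_t, h_t).
    dV = np.zeros_like(V)
    for h, y in zip(hs, ys):
        err = softmax(V @ h)
        err[y] -= 1.0  # y_hat_t minus the one-hot target
        dV += np.outer(err, h)
    return dV

def numerical_grad_V(V, hs, ys, eps=1e-6):
    # Central finite differences, one entry of V at a time.
    num = np.zeros_like(V)
    for idx in np.ndindex(*V.shape):
        V_plus, V_minus = V.copy(), V.copy()
        V_plus[idx] += eps
        V_minus[idx] -= eps
        num[idx] = (loss(V_plus, hs, ys) - loss(V_minus, hs, ys)) / (2 * eps)
    return num

# Hypothetical sizes: 5 time steps, 4 hidden units, 3 output classes.
rng = np.random.default_rng(0)
T, hidden, vocab = 5, 4, 3
V = rng.normal(size=(vocab, hidden))
hs = rng.normal(size=(T, hidden))    # stand-ins for the GRU hidden states h_t
ys = rng.integers(0, vocab, size=T)  # target class index at each step

analytic = grad_V(V, hs, ys)
numeric = numerical_grad_V(V, hs, ys)
print(np.allclose(analytic, numeric, atol=1e-5))
```

The hidden states are treated as fixed inputs here, which is why the check does not depend on whether they came from a GRU or a plain RNN: $V$ only enters through the output equation.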
