Gradient clipping

We can use gradient clipping to mitigate the exploding gradient problem. In this method, we limit the size of the gradients, either by clipping each gradient value to a certain range or by rescaling the gradient according to a vector norm (say, L2). For instance, if we set the threshold to 0.7, then we keep the gradient values in the -0.7 to +0.7 range. If a gradient value falls below -0.7, we change it to -0.7, and similarly, if it exceeds 0.7, we change it to +0.7.
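The range-based rule with a threshold of 0.7 can be sketched in a few lines of plain Python. This is only an illustration: the function name `clip_by_value` and the sample gradient values are made up for this example and do not come from the text.

```python
def clip_by_value(grads, threshold):
    """Clamp every gradient component to the range [-threshold, +threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grads]


# Made-up gradient values, chosen so that some fall outside the range.
grads = [-1.5, -0.3, 0.0, 0.65, 2.0]
clipped = clip_by_value(grads, threshold=0.7)
print(clipped)  # [-0.7, -0.3, 0.0, 0.65, 0.7]
```

Only the two values outside the range change; values already within -0.7 to +0.7 pass through untouched.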

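Let $\hat{g}$ denote the gradient of the loss $L$ with respect to $W$:

$$\hat{g} = \frac{\partial L}{\partial W}$$

First, we compute the L2 norm of the gradient, that is, $\|\hat{g}\|$. If this norm exceeds the defined threshold, we rescale the gradient so that its norm equals the threshold, leaving its direction unchanged.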
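The norm-based rule can be sketched as follows. The exact update formula is not visible in this text, so the sketch assumes the commonly used form, in which a gradient whose L2 norm exceeds the threshold is multiplied by threshold / norm. The function name `clip_by_norm` and the sample gradient are hypothetical.

```python
import math


def clip_by_norm(grads, threshold):
    """Rescale grads so that their L2 norm is at most threshold.

    Assumed update rule: g <- (threshold / ||g||) * g when ||g|| > threshold.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= threshold:
        # Already within the threshold: return an unchanged copy.
        return list(grads)
    scale = threshold / norm
    return [g * scale for g in grads]


# Made-up gradient with L2 norm 5.0, well above the threshold of 0.7.
g_hat = [3.0, 4.0]
clipped = clip_by_norm(g_hat, threshold=0.7)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
print(clipped, clipped_norm)  # roughly [0.42, 0.56] with norm 0.7
```

Unlike clipping each value separately, this keeps the ratio between the components, so the gradient still points in the same direction and only its length shrinks.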
