The momentum method

The cost function may have regions of high curvature and small but consistent gradients. This is due to poor conditioning of the Hessian matrix and to variance in the stochastic gradient, and SGD can slow down considerably in such regions. The momentum algorithm accumulates an exponentially weighted moving average (EWMA) of previous gradients and moves in that direction instead of the local gradient direction that SGD would follow. The exponential weighting is controlled by the hyperparameter α ∈ [0, 1), which determines how quickly the effect of previous gradients decays. By combining gradients of opposite signs, the momentum method damps the oscillations in directions of high curvature.
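
To make the update concrete, here is a minimal sketch of one common formulation of momentum: the velocity v is updated as v ← αv − εg and the parameters as θ ← θ + v, where g is the current gradient. The learning rate ε (`lr` in the code), the quadratic test objective, its curvature values, and all function names are assumptions chosen for illustration; they do not come from the text. The example also uses exact rather than stochastic gradients, so it only illustrates the effect of poor conditioning.

```python
# Minimal sketch of gradient descent with momentum on an ill-conditioned
# quadratic. Assumptions (not from the text): the objective, the learning
# rate `lr`, the curvature values, and all names here are illustrative.

def grad(theta, curvatures):
    """Gradient of f(theta) = 0.5 * sum(h_i * theta_i ** 2)."""
    return [h * t for h, t in zip(curvatures, theta)]


def loss(theta, curvatures):
    """Value of the quadratic objective at theta."""
    return 0.5 * sum(h * t * t for h, t in zip(curvatures, theta))


def momentum_step(theta, velocity, g, lr, alpha):
    """One update: v <- alpha * v - lr * g, then theta <- theta + v."""
    velocity = [alpha * v - lr * gi for v, gi in zip(velocity, g)]
    theta = [t + v for t, v in zip(theta, velocity)]
    return theta, velocity


def run(alpha, steps=200, lr=0.035, curvatures=(1.0, 50.0)):
    """Run `steps` updates from (1, 1) and return the final loss."""
    theta = [1.0, 1.0]
    velocity = [0.0, 0.0]
    for _ in range(steps):
        g = grad(theta, curvatures)
        theta, velocity = momentum_step(theta, velocity, g, lr, alpha)
    return loss(theta, curvatures)


# alpha = 0 discards past gradients, which reduces to plain gradient descent.
loss_plain = run(alpha=0.0)
loss_momentum = run(alpha=0.9)
print("plain:", loss_plain, "momentum:", loss_momentum)
```

In this setup the curvature differs by a factor of 50 between the two directions, so the step size is limited by the steep direction while progress along the shallow one is slow. With α = 0.9 the accumulated velocity speeds up movement along the shallow direction, and the final loss after the same number of steps is lower than with α = 0.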
