AdaGrad

The AdaGrad algorithm adapts the learning rate of each parameter individually, scaling it inversely proportional to the square root of the sum of the squared values of all of that parameter's previous gradients. As a result, parameters whose gradients have historically been small receive larger updates, so larger moves are made along the gently sloped directions of the error surface. However, because this accumulation runs from the very beginning of training, some learning rates can diminish drastically before a good region of the error surface is reached. Even so, AdaGrad has performed very well on several deep learning tasks.
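The per-parameter update can be sketched in a few lines of NumPy. The snippet below is a minimal illustration of the idea, not a production optimizer; the function name adagrad_update, the accumulator grad_sq_sum, and the hyperparameter values lr=0.01 and eps=1e-8 are illustrative choices, not taken from the text.

```python
import numpy as np

def adagrad_update(param, grad, grad_sq_sum, lr=0.01, eps=1e-8):
    """One AdaGrad step for a parameter array (illustrative sketch).

    grad_sq_sum accumulates the element-wise squares of all past
    gradients; dividing by its square root shrinks the step for
    parameters with a history of large gradients and leaves larger
    steps for the gently sloped directions.
    """
    grad_sq_sum += grad ** 2                          # accumulate squared gradients
    adjusted_lr = lr / (np.sqrt(grad_sq_sum) + eps)   # per-parameter learning rate
    param -= adjusted_lr * grad                       # scaled gradient step
    return param, grad_sq_sum
```

Note that grad_sq_sum only ever grows, which is exactly why the effective learning rates shrink monotonically over training, as described above.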
