Gradient checking

Backpropagation, and neural networks in general, are a little difficult to conceptualize, so it is often not easy to understand how changing any of the model's (hyper)parameters will affect the outcome. Furthermore, an implementation can produce results that suggest the algorithm is working correctly, that is, the cost function decreases on each iteration of gradient descent. However, as with any complicated software, there can be hidden bugs that only manifest themselves under very specific conditions. One way to help eliminate these is through a procedure called gradient checking. This is a numerical way of approximating gradients, and we can understand it intuitively by examining the following diagram:

(Figure: gradient checking, the numerical approximation of the slope of the cost function J(w) around a point w)

The derivative of J(w) with respect to w can be approximated with a two-sided difference:

$$\frac{dJ(w)}{dw} \approx \frac{J(w + \epsilon) - J(w - \epsilon)}{2\epsilon}$$

Here, ε is a small perturbation, typically on the order of 10^-4.
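As a minimal sketch of this approximation in Python (the cost function J(w) = w² and the function name numerical_derivative are illustrative choices, not code from this book):

```python
def numerical_derivative(J, w, eps=1e-4):
    # Two-sided (central) difference approximation of dJ/dw
    return (J(w + eps) - J(w - eps)) / (2 * eps)

# Illustrative cost function: J(w) = w**2, whose true derivative is 2*w
J = lambda w: w ** 2
print(numerical_derivative(J, 3.0))   # prints approximately 6.0
```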

The preceding formula approximates the derivative when the parameter is a single value. In practice, we need to evaluate these derivatives for a cost function whose weights form a vector. We do this by taking the partial derivative with respect to each weight in turn, perturbing one weight at a time while holding the others fixed, as in the following example:

$$\frac{\partial J(\mathbf{w})}{\partial w_i} \approx \frac{J(w_1, \ldots, w_i + \epsilon, \ldots, w_n) - J(w_1, \ldots, w_i - \epsilon, \ldots, w_n)}{2\epsilon}$$
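A sketch of how this per-weight check might look in Python with NumPy follows; the function names, the example cost function, and the tolerance guideline in the comment are assumptions for illustration, not this book's implementation:

```python
import numpy as np

def numerical_gradient(J, w, eps=1e-4):
    # Approximate each partial derivative by perturbing one weight at a time
    grad = np.zeros_like(w, dtype=float)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (J(w_plus) - J(w_minus)) / (2 * eps)
    return grad

def gradient_check(J, analytic_grad, w, eps=1e-4):
    # Relative difference between the analytic (backprop) gradient and the
    # numerical one; values around 1e-7 or smaller usually indicate a
    # correct implementation
    num_grad = numerical_gradient(J, w, eps)
    return np.linalg.norm(analytic_grad - num_grad) / (
        np.linalg.norm(analytic_grad) + np.linalg.norm(num_grad))

# Illustrative check: J(w) = sum(w**2) has gradient 2*w
w = np.array([1.0, -2.0, 3.0])
J = lambda w: np.sum(w ** 2)
print(gradient_check(J, 2 * w, w))   # a very small relative difference
```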