Initialization of model parameters

Here's how the choice of initial point impacts the performance of iterative learning algorithms for deep neural networks:

  • Initial points can determine whether learning converges at all
  • Even when learning converges, how quickly it converges depends on the initial point
  • Initial points of comparable cost can yield different generalization errors

Initialization algorithms are mostly heuristics; the point of a good initialization is to enable faster learning. One important aspect of initialization is breaking the symmetry among the weights feeding the units of a hidden layer. Two units with the same activation function, connected to the same inputs at the same level of the network, will receive identical updates if they are initialized with the same weights. The reason a hidden layer contains multiple units is that each should learn a different function, and units that are updated identically can never diverge to learn different functions, as the sketch below illustrates.
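To make this concrete, here is a minimal NumPy sketch, assuming a toy one-hidden-layer network with illustrative shapes; it shows that two identically initialized hidden units receive identical gradients, so a gradient step can never separate them:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # 4 examples, 3 input features (toy data)
y = rng.normal(size=(4, 1))   # regression targets

W1 = np.full((3, 2), 0.5)     # both hidden units start with the SAME weights
b1 = np.zeros(2)
W2 = np.full((2, 1), 0.5)
b2 = np.zeros(1)

h = np.tanh(x @ W1 + b1)      # hidden activations: the two columns are identical
y_hat = h @ W2 + b2

# Backpropagation for mean squared error
d_out = (y_hat - y) / len(x)
dW2 = h.T @ d_out
dh = d_out @ W2.T
dz1 = dh * (1 - h ** 2)       # tanh derivative
dW1 = x.T @ dz1

# The gradient columns for the two hidden units match exactly, so after
# any number of gradient steps the units remain copies of each other.
print(np.allclose(dW1[:, 0], dW1[:, 1]))   # True
```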

A simple way to break symmetry is to use random initialization, sampling the weights from a Gaussian or uniform distribution. Bias parameters can be heuristically chosen constants. Choosing the magnitude of the weights involves a trade-off between optimization and regularization: regularization favors weights that are not very large, since large weights can lead to poor generalization performance, while optimization needs the weights to be large enough to propagate information through the network successfully.
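As a minimal sketch, assuming NumPy and illustrative layer sizes and function names, the following shows Gaussian initialization with a tunable scale, a constant-bias heuristic, and the well-known Glorot/Xavier uniform rule that ties the weight magnitude to the layer's fan-in and fan-out:

```python
import numpy as np

def init_layer(n_in, n_out, scale=0.01, bias_value=0.0, rng=None):
    # Zero-mean Gaussian weights break symmetry; `scale` controls the
    # optimization/regularization trade-off discussed above.
    if rng is None:
        rng = np.random.default_rng()
    W = rng.normal(0.0, scale, size=(n_in, n_out))
    b = np.full(n_out, bias_value)   # bias as a heuristically chosen constant
    return W, b

def glorot_uniform(n_in, n_out, rng=None):
    # Glorot/Xavier rule: sample uniformly in [-limit, limit] with
    # limit = sqrt(6 / (fan_in + fan_out)).
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W1, b1 = init_layer(784, 256, scale=0.01, bias_value=0.1)  # hypothetical sizes
W2 = glorot_uniform(256, 10)
```

Tying the scale to fan-in and fan-out keeps activation and gradient magnitudes roughly stable across layers, which addresses the optimization side of the trade-off without making the weights needlessly large.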
