Overcoming over-fitting using regularization

In the previous section, we established that high weight magnitudes are one of the reasons for over-fitting. In this section, we will look at ways to get around this problem, such as penalizing high weight magnitude values.

Regularization imposes a penalty for having high-magnitude weights in the model. L1 and L2 regularization are among the most commonly used regularization techniques and work as follows:

L2 regularization minimizes the weighted sum of squares of the weights at the specified layers of the neural network, in addition to minimizing the loss function (the sum of squared loss in the following formula):

$$\text{loss} = \sum_{i}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j} w_j^2$$

where $\lambda$ is the weightage associated with the regularization term and is a hyperparameter that needs to be tuned, $\hat{y}$ is the predicted value of $y$, and $w$ denotes the weight values across all the layers of the model.
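
As a concrete illustration, here is a minimal PyTorch sketch of adding an L2 penalty to the loss by hand. The model architecture, data shapes, and the `lambda_l2` value are assumptions made for the example, not taken from the text:

```python
import torch
import torch.nn as nn

# Illustrative model and data; the architecture and shapes are assumptions.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
lambda_l2 = 0.01  # regularization weightage; a hyperparameter to tune

x, y = torch.randn(32, 10), torch.randn(32, 1)
prediction = model(x)

# Sum of squares of the weight values across all layers (biases excluded)
l2_penalty = sum((p ** 2).sum() for name, p in model.named_parameters()
                 if 'weight' in name)
loss = loss_fn(prediction, y) + lambda_l2 * l2_penalty
loss.backward()
```

In practice, PyTorch optimizers expose an equivalent L2 penalty through their `weight_decay` argument, for example `torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.01)`.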

L1 regularization minimizes the weighted sum of absolute values of the weights at the specified layers of the neural network, in addition to minimizing the loss function (the sum of squared loss in the following formula):

$$\text{loss} = \sum_{i}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j} \left|w_j\right|$$
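
The L1 variant changes only the penalty term, with the squared weight giving way to its absolute value. The sketch below reuses the same illustrative assumptions as the L2 example above:

```python
import torch
import torch.nn as nn

# Illustrative model and data; the architecture and shapes are assumptions.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
lambda_l1 = 0.01  # regularization weightage; a hyperparameter to tune

x, y = torch.randn(32, 10), torch.randn(32, 1)
prediction = model(x)

# Sum of absolute weight values across all layers (biases excluded)
l1_penalty = sum(p.abs().sum() for name, p in model.named_parameters()
                 if 'weight' in name)
loss = loss_fn(prediction, y) + lambda_l1 * l1_penalty
loss.backward()
```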

This way, we ensure that the weights do not get customized only for the extreme cases in the training dataset (and thus fail to generalize on the test data).
