Bias versus variance

Prediction error has two primary sources: bias and variance. The best way to understand them is to imagine making predictions for the same data multiple times, each time with a model trained on a different sample. Bias measures how far the predictions are, on average, from the actual values; variance measures how much the predictions differ from one run to the next. Adding more features generally helps reduce bias: if we leave out a feature that is strongly correlated with the target while building a prediction model, the predictions will be systematically off. If your model has high variance, you can remove features to reduce it. A bigger dataset also helps reduce variance.
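To make the two quantities concrete, here is a minimal sketch that retrains two models many times and measures the bias and variance of their predictions at one test point. Everything in it is an assumption made for illustration: the true relationship `y = 2x`, the noise level, the training inputs, and the two models (a constant predictor that uses no features, and a least-squares line that uses `x`).

```python
import random
import statistics

# Hypothetical setup: the true relationship is y = 2x plus Gaussian noise.
XS = [0.0, 2.5, 5.0, 7.5, 10.0]  # fixed training inputs
NOISE_SD = 3.0
X_TEST = 9.0


def true_fn(x):
    return 2.0 * x


def fit_constant(xs, ys):
    """Ignore x entirely and always predict the mean of y (no features)."""
    mean_y = statistics.mean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least-squares line using x as the single feature."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def bias_and_variance(fit, trials=2000, seed=0):
    """Retrain on fresh noisy samples and summarize predictions at X_TEST."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(trials):
        ys = [true_fn(x) + rng.gauss(0.0, NOISE_SD) for x in XS]
        predictions.append(fit(XS, ys)(X_TEST))
    bias = statistics.mean(predictions) - true_fn(X_TEST)
    variance = statistics.pvariance(predictions)
    return bias, variance


const_bias, const_var = bias_and_variance(fit_constant)
line_bias, line_var = bias_and_variance(fit_line)
print(f"constant model: bias={const_bias:.2f}, variance={const_var:.2f}")
print(f"line model:     bias={line_bias:.2f}, variance={line_var:.2f}")
```

In this setup the constant model, which leaves out the one informative feature, is far off on average but changes little between runs. The line is close on average but its predictions spread out more.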
Here, we are going to use a simple dataset that is ill-posed. An ill-posed dataset is one where the number of samples is smaller than the number of predictors, as shown here:

y  x0  x1  x2  x3  x4  x5  x6  x7  x8
1   5   3   1   2   1   3   2   2   1
2   9   8   8   9   7   9   8   7   9
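One way to see why such a dataset is a problem is that many different weight vectors reproduce `y` exactly. The sketch below loads the table into plain Python lists and, for every pair of predictors, solves for the two weights that fit both samples with no error. The helper names and the choice to fit without an intercept are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# The dataset from the table: 2 samples, 9 predictors (x0..x8).
y = [1, 2]
X = [
    [5, 3, 1, 2, 1, 3, 2, 2, 1],
    [9, 8, 8, 9, 7, 9, 8, 7, 9],
]
n_samples, n_predictors = len(X), len(X[0])
print("ill-posed:", n_samples < n_predictors)


def exact_fit_with_pair(j, k):
    """Solve the 2x2 system using only predictors j and k (Cramer's rule).

    Returns a full-length weight vector, or None if the pair is singular.
    """
    a, b = X[0][j], X[0][k]
    c, d = X[1][j], X[1][k]
    det = a * d - b * c
    if det == 0:
        return None
    weights = [Fraction(0)] * n_predictors
    weights[j] = Fraction(y[0] * d - b * y[1], det)
    weights[k] = Fraction(a * y[1] - c * y[0], det)
    return weights


def predict(weights, row):
    """Linear prediction without an intercept."""
    return sum(w * x for w, x in zip(weights, row))


# Collect every pair of predictors that fits both samples exactly.
exact_fits = []
for j, k in combinations(range(n_predictors), 2):
    w = exact_fit_with_pair(j, k)
    if w is not None:
        exact_fits.append(((j, k), w))

print("pairs with an exact fit:", len(exact_fits))
```

Because the data alone cannot choose among these fits, some extra constraint is needed, which is where a penalty on the weights comes in.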

You can easily guess that, out of the nine predictors, only two have a strong correlation with y: x0 and x1. We will use this dataset with the lasso algorithm to see how well it works on such data.
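As a rough illustration of what the lasso does, here is a minimal coordinate-descent sketch in plain Python. It is not the implementation this chapter goes on to use. It minimizes squared error plus an L1 penalty on the weights, and the penalty strength `alpha=0.1`, the number of sweeps, and the absence of an intercept are all assumptions chosen for the example.

```python
# The dataset from the text: 2 samples, 9 predictors (x0..x8).
y = [1.0, 2.0]
X = [
    [5.0, 3.0, 1.0, 2.0, 1.0, 3.0, 2.0, 2.0, 1.0],
    [9.0, 8.0, 8.0, 9.0, 7.0, 9.0, 8.0, 7.0, 9.0],
]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_objective(weights, alpha):
    """(1 / 2n) * squared error + alpha * L1 norm of the weights."""
    n = len(y)
    residuals = [
        yi - sum(w * x for w, x in zip(weights, row))
        for row, yi in zip(X, y)
    ]
    squared_error = sum(r * r for r in residuals) / (2 * n)
    return squared_error + alpha * sum(abs(w) for w in weights)


def lasso_coordinate_descent(alpha, sweeps=1000):
    """Minimize the lasso objective (no intercept) one weight at a time."""
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes weight j.
            rho = 0.0
            for row, yi in zip(X, y):
                prediction = sum(w * x for w, x in zip(weights, row))
                rho += row[j] * (yi - prediction + weights[j] * row[j])
            rho /= n
            z = sum(row[j] ** 2 for row in X) / n
            weights[j] = soft_threshold(rho, alpha) / z
    return weights


weights = lasso_coordinate_descent(alpha=0.1)
for j, w in enumerate(weights):
    print(f"x{j}: {w:.4f}")
```

The soft-threshold step is what lets the lasso set a weight to exactly zero. A larger `alpha` zeroes more weights, and a large enough value zeroes all of them.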
