Learning as an optimization

In the previous sections, we saw various ways of evaluating our models and of defining the loss functions that we want to minimize. This suggests that a learning task can be viewed as an optimization problem. In an optimization problem, we are provided with a hypothesis space, which in this case is the set of all possible models, along with an objective function, on the basis of which we select the best-fitting model from the hypothesis space. In this section, we will discuss the various choices of objective functions and how they affect our learning task.

Empirical risk and overfitting

Let's consider the task of selecting a model, M, that minimizes the expectation of some loss function, E_{P*}[loss(ξ, M)], where P* is the original data-generating distribution. As we don't know P*, we generally use the dataset, D, that we have to get an empirical estimate of the expectation. Using D, we can define an empirical distribution, P̂_D, as follows:

P̂_D(A) = (1 / |D|) Σ_{ξ ∈ D} 1{ξ ∈ A}

Putting this in simple words, for some event, A, we assign it a probability equal to the fraction of our samples in which we have seen this event. Therefore, as we get more and more samples from the original distribution, P*, the empirical distribution, P̂_D, keeps getting closer and closer to the original distribution.
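The definition above can be sketched in a few lines of Python. This is a minimal illustration, not library code; the function name and the coin-flip samples are hypothetical:

```python
from collections import Counter

def empirical_distribution(samples):
    """Estimate P_hat_D(x) = count(x) / |D| from a list of observed assignments."""
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in counts.items()}

# Hypothetical dataset D: four samples of a single binary variable
samples = ["heads", "tails", "heads", "heads"]
p_hat = empirical_distribution(samples)
# p_hat assigns 0.75 to "heads" and 0.25 to "tails"
```

Note that any assignment not present in `samples` is simply absent from the returned dictionary, which is the same as assigning it probability 0.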

However, there are a few drawbacks to this approach that we need to keep in mind to avoid getting poor results. Think of a case when we have a lot of variables in the network, let's say n. Considering that each variable can only take two different states, our joint distribution over these variables will have 2^n different assignments. Now, let's say that we are provided with 1000 distinct samples from the original distribution. If we try to find the empirical distribution using this data, we will be assigning a probability of 0.001 to each of the 1000 assignments that were given to us and will assign 0 to the remaining (2^n - 1000) assignments. In real life, we want to predict over new data using our learned model, and it is highly possible that our training data doesn't contain all the possible events. In such cases, our trained model will overfit to the training data, as it assigns 0 probability to all the events that are not present in the training data.
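A quick sketch makes the scale of the problem concrete. Here we draw 1000 random joint assignments over n = 50 binary variables as a stand-in for the dataset, D (the sampling scheme is purely illustrative), and check what fraction of the 2^n possible assignments they can cover:

```python
import random

random.seed(0)
n = 50             # number of binary variables
num_samples = 1000

# Stand-in for the dataset D: 1000 joint assignments drawn at random
dataset = {tuple(random.randint(0, 1) for _ in range(n))
           for _ in range(num_samples)}

total_assignments = 2 ** n
coverage = len(dataset) / total_assignments
print(f"Seen at most {len(dataset)} of {total_assignments} assignments "
      f"({coverage:.2e} of the joint space)")

# Any assignment outside the dataset gets empirical probability 0
unseen = tuple(1 for _ in range(n))
p_unseen = 0.0 if unseen not in dataset else 1 / num_samples
```

With n = 50, the 1000 samples cover roughly one part in 10^12 of the joint space; every other assignment, including most of the events we will see at prediction time, receives probability 0.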

So, to avoid overfitting, we can limit our hypothesis space to simpler models. This leads to yet another problem: with a limited hypothesis space, we might not be able to find a model that fits the original distribution, even if we are provided with infinite data. This type of limitation in learning introduces an inherent error in the learned model, which is known as bias. Conversely, if we have a hypothesis space with more complex models, we can, in principle, correctly learn the actual distribution, P*. In that case, though, if we have too little data, our predictions will fluctuate wildly between datasets. As a result, we will have a learned model with high variance.
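The two failure modes can be seen side by side in a small sketch. The "true" joint distribution below is hypothetical, chosen so that the two binary variables are dependent: an unrestricted full-joint estimate is unbiased but fluctuates with small samples (variance), while a restricted model that wrongly assumes independence stays smooth but never converges to the truth (bias):

```python
import random

random.seed(1)

# Hypothetical "true" joint over two dependent binary variables X, Y
true_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def sample(n):
    outcomes, weights = zip(*true_joint.items())
    return random.choices(outcomes, weights=weights, k=n)

def full_joint_estimate(data):
    # Unrestricted model: raw empirical frequencies (low bias, high variance)
    return {xy: sum(d == xy for d in data) / len(data) for xy in true_joint}

def independent_estimate(data):
    # Restricted model: assumes P(X, Y) = P(X)P(Y), which is false here (bias)
    px = sum(x for x, _ in data) / len(data)
    py = sum(y for _, y in data) / len(data)
    return {(x, y): (px if x else 1 - px) * (py if y else 1 - py)
            for (x, y) in true_joint}
```

Even with a very large sample, `independent_estimate` converges to about 0.25 per cell instead of the true (0.4, 0.1, 0.1, 0.4); no amount of data removes that bias, whereas `full_joint_estimate` converges to the truth but is unreliable when the sample is tiny.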

In conclusion, we will always face a trade-off between bias and variance in our learned models. However, with very limited data, variance turns out to be the more dangerous of the two, as a high-variance model fails to learn anything close to the actual distribution, P*, at all.
