Regression evaluation

When dealing with regression, the most relevant concept is that of residuals. What are residuals? They are nothing more than the differences between the estimated values and the actual ones. Let's imagine we are training a model to predict the level of revenue from the investment in advertising. We estimated a regression model that associates one million euros of investment with a revenue of two and a half million euros. If, within our training dataset, the actual revenue for that amount of investment is, for instance, one million nine hundred thousand euros, we will have a residual of 600,000 euros. Applying this reasoning to every observation in the dataset you employed to train your regression model produces a whole new data series, the series of residuals.
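As a minimal sketch of this computation, the following Python snippet derives the residual series from a list of estimated revenues and a list of actual revenues. The first pair of values is the example from the text; the other two pairs, the variable names, and the `residuals` helper are hypothetical and only serve the illustration. The sign convention (estimated minus actual) follows the definition given above; many texts use actual minus estimated instead, which flips the sign but leaves the absolute values unchanged.

```python
# Revenues in euros. The first observation is the example from the text;
# the remaining two are made-up values for illustration only.
predicted = [2_500_000, 1_200_000, 3_100_000]  # revenues estimated by the model
actual = [1_900_000, 1_350_000, 3_000_000]     # revenues observed in the training data


def residuals(predicted, actual):
    """Return estimated minus actual for each observation."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [p - a for p, a in zip(predicted, actual)]


res = residuals(predicted, actual)
print(res)  # [600000, -150000, 100000]
```

A positive residual here means the model overestimated the revenue, and a negative one means it underestimated it.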

Residuals matter in regression models for at least two reasons:

  • They are a direct way to measure the accuracy of the model
  • They must satisfy some specific conditions before we can confirm that a regression model is appropriate for your data

Since the second topic is covered in more detail in Chapter 7, Our First Guess – a Linear Regression, let's focus now on residuals as a measure of model accuracy.

A very popular way to use residuals to evaluate the accuracy of a model is to compute the Mean Absolute Error (MAE). This measure is simply the mean of the absolute values of the residuals, that is, the sum of the absolute residuals divided by the number of observations n:

Mean Absolute Error = (Σ |residual|)/n

This measure is expressed in the same unit of measurement as the variable being predicted, and it tells you how far, on average, the model's predictions fall from the actual values.
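The formula above can be sketched in a few lines of plain Python. The `mean_absolute_error` function below is a hypothetical helper written for this example, not a library call, and all data values except the first pair (taken from the revenue example) are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Mean of the absolute residuals: (sum of |residual|) / n."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one observation is required")
    absolute_residuals = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(absolute_residuals) / len(absolute_residuals)


# Revenues in euros; only the first pair comes from the text.
predicted = [2_500_000, 1_200_000, 3_100_000]
actual = [1_900_000, 1_350_000, 3_000_000]

mae = mean_absolute_error(predicted, actual)
print(f"MAE: {mae:,.0f} euros")
```

Because the absolute value is taken before averaging, overestimates and underestimates do not cancel each other out, and the result is read directly in euros.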

 
