Summary

This chapter marks the beginning of our introduction to the algorithms that form the backbone of predictive modelling. These algorithms are converted into mathematical equations based on historical data, and these equations are the predictive models.

In this chapter, we discussed the simplest and most widely used predictive modelling technique: linear regression.

Here is a list of things that we learned in this chapter:

  • Linear regression assumes a linear relationship between an output variable and one or more predictor variables. A model with a single predictor variable is called a simple linear regression, while one with multiple predictor variables is called a multiple linear regression.
  • The coefficients of the linear relationship (model) are estimated using the least squares method, which minimizes the sum of squared residuals.
  • In Python, statsmodels.api and scikit-learn are two widely used libraries for implementing linear regression (a fitting sketch follows this list).
  • The coefficient of determination, R², is a good way to gauge how well the model explains the variation in the output variable. The higher the value of R², the smaller the error between the predicted and actual values, and the better the model.
  • Model statistics, such as the p-values associated with the coefficient estimates, the F-statistic, and the RSE, should be analyzed to further assess the quality of the model.
  • Multicollinearity is an issue that arises when two or more input variables in a multiple regression model are highly correlated. It inflates the variance of the coefficient estimates of the correlated variables. The Variance Inflation Factor (VIF) statistic can be used to identify the variables affected by multicollinearity; variables with a very high VIF should be removed from the model (see the VIF sketch after this list).
  • A dataset can be split into training and testing sets before the modelling process starts, in order to validate the model. K-fold cross-validation (about which we will learn more later) is another popular validation method (both appear in the validation sketch after this list).
  • scikit-learn has built-in methods for variable selection, a task that would take a lot of time if done manually (the validation sketch after this list includes one such method).
  • Categorical variables can be included in the model by converting them into dummy variables (see the final sketch after this list).
  • Some variables might need to be transformed before they are fitted to a linear model. For example, the output variable might exhibit a polynomial relationship with a predictor variable (also shown in the final sketch).
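
As a minimal sketch of the first few points above, the snippet below fits the same simple regression with both statsmodels.api and scikit-learn; the synthetic data and variable names are invented for illustration and are not from the chapter's examples. The statsmodels summary reports R², the coefficient p-values, and the F-statistic discussed above.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression

    # Toy data: a noisy linear relationship, y = 2x + 1 + noise
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 1))
    y = 2 * X[:, 0] + 1 + rng.normal(0, 1, size=100)

    # statsmodels: add an explicit intercept column, then fit by least squares
    ols = sm.OLS(y, sm.add_constant(X)).fit()
    print(ols.summary())    # R-squared, coefficient p-values, F-statistic
    print(ols.rsquared)     # R^2 on its own

    # scikit-learn: the same model through the estimator API
    lr = LinearRegression().fit(X, y)
    print(lr.intercept_, lr.coef_)  # estimated intercept and slope
    print(lr.score(X, y))           # R^2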
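
For the multicollinearity point, here is a VIF sketch using statsmodels; the two deliberately correlated predictors (x1 and x2) are invented for the example.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Toy data in which x2 is almost a copy of x1, while x3 is independent
    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    df = pd.DataFrame({
        "x1": x1,
        "x2": x1 + rng.normal(scale=0.1, size=200),  # highly correlated with x1
        "x3": rng.normal(size=200),
    })

    # VIF is computed per predictor; a constant column is added first
    X = sm.add_constant(df)
    vif = pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
        index=X.columns[1:],
    )
    print(vif)  # x1 and x2 show very high VIFs; x3 stays close to 1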
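
The next sketch covers validation and variable selection: a train/test split, k-fold cross-validation, and recursive feature elimination (one of scikit-learn's built-in selection methods). The synthetic dataset is an assumption for the example.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score, train_test_split

    # Synthetic data with 8 predictors, only 3 of which are informative
    X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                           noise=5.0, random_state=0)

    # Hold out a test set to validate the fitted model
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    lr = LinearRegression().fit(X_train, y_train)
    print("Test R^2:", lr.score(X_test, y_test))

    # K-fold cross-validation (k=5) as an alternative validation scheme
    print("CV R^2 scores:", cross_val_score(LinearRegression(), X, y, cv=5))

    # Recursive feature elimination picks a subset of predictors automatically
    rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X_train, y_train)
    print("Selected feature indices:", np.where(rfe.support_)[0])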
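
Finally, a sketch of dummy-variable encoding with pandas and a polynomial transformation with scikit-learn; the category values and the quadratic relationship are made up for illustration.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures

    # Categorical predictor -> dummy (one-hot) variables
    df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Chennai"]})
    # Drop one level to avoid the dummy-variable trap (perfect collinearity)
    print(pd.get_dummies(df["city"], drop_first=True))

    # Polynomial relationship: add an x^2 term, then fit linearly as usual
    rng = np.random.default_rng(2)
    x = rng.uniform(-3, 3, size=(100, 1))
    y = 1 + 2 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(0, 0.3, size=100)
    X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
    lr = LinearRegression().fit(X_poly, y)
    print(lr.intercept_, lr.coef_)  # roughly recovers (1, [2.0, 0.5])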

Linear regression is the simplest of all predictive models, but after going through this chapter, we should be able to appreciate the complexities involved in the process. There can be many variations, and fine-tuning the model is an elaborate process.

However, there is nothing to worry about. We have gathered all the armor we need to implement a linear regression and to understand the model coefficients and parameters. Variations in the model and in the kind of data shouldn't deter us in our endeavor; we just need to fine-tune the model using the methods discussed in this chapter.
