What is the collinearity assumption about? It states that the independent variables should not be too strongly correlated with each other. When they are, the beta coefficients remain unbiased, but their estimates become unstable and hard to interpret. Take for instance these variables:
| x1  | x2    | x3    |
|-----|-------|-------|
| 119 | 328.5 | 715.8 |
| 134 | 406.0 | 792.8 |
| 183 | 460.5 | 981.6 |
| 126 | 390.0 | 734.2 |
| 177 | 434.5 | 951.4 |
| 107 | 362.5 | 688.4 |
| 119 | 325.5 | 715.8 |
| 165 | 387.5 | 904.0 |
| 156 | 371.0 | 876.2 |
If we compute the pairwise linear correlation coefficients, we obtain the following:

| variable | x1    | x2    | x3    |
|----------|-------|-------|-------|
| x1       | 1.000 | 0.790 | 0.996 |
| x2       | 0.790 | 1.000 | 0.800 |
| x3       | 0.996 | 0.800 | 1.000 |
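A correlation matrix like the one above can be reproduced with a few lines of NumPy (a sketch; the array names are ours, and the decimal commas from the table are written as dots):

```python
import numpy as np

# The three variables from the table above
x1 = np.array([119, 134, 183, 126, 177, 107, 119, 165, 156], dtype=float)
x2 = np.array([328.5, 406.0, 460.5, 390.0, 434.5, 362.5, 325.5, 387.5, 371.0])
x3 = np.array([715.8, 792.8, 981.6, 734.2, 951.4, 688.4, 715.8, 904.0, 876.2])

# Pairwise Pearson correlation matrix; rows/columns follow the order x1, x2, x3
R = np.corrcoef([x1, x2, x3])
print(np.round(R, 3))
```

Note that `np.corrcoef` takes the variables as rows, and that the diagonal is 1 by construction (each variable is perfectly correlated with itself).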
As you can see, all three variables are correlated with one another. How does this influence our linear model estimation? It turns out that collinearity (between two variables) and multicollinearity (among more than two variables) tend to cause the following undesired side effects:
- Counterintuitive values of the beta coefficients (for instance, a negative slope for a variable where a positive contribution would be expected).
- Instability of the model to changes within the sample: the presence of multicollinearity tends to make the coefficient estimates highly sensitive to which observations are included.
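A quick synthetic illustration of the second point: below, the response `y` is generated from `x1` alone, yet regressing it on the nearly collinear pair `(x1, x3)` can produce coefficients that change noticeably when a single observation is dropped. The response and the noise model are our assumptions, not part of the original data:

```python
import numpy as np

rng = np.random.default_rng(0)

# x1 and x3 from the table above -- their correlation is about 0.996
x1 = np.array([119, 134, 183, 126, 177, 107, 119, 165, 156], dtype=float)
x3 = np.array([715.8, 792.8, 981.6, 734.2, 951.4, 688.4, 715.8, 904.0, 876.2])

# Hypothetical response: depends only on x1, plus noise (purely illustrative)
y = 2.0 * x1 + rng.normal(scale=5.0, size=x1.size)

def ols(X, y):
    """Ordinary least-squares coefficients for y = X @ beta."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Design matrix with intercept, x1 and x3 as regressors
X = np.column_stack([np.ones_like(x1), x1, x3])

# Fit on the full sample, then drop the first observation and refit
beta_full = ols(X, y)
beta_drop = ols(np.delete(X, 0, axis=0), np.delete(y, 0))
print(beta_full)
print(beta_drop)
```

With well-separated (uncorrelated) regressors the two fits would barely differ; with near-collinear ones, even small changes to the sample move the individual coefficients around, because many different combinations of `x1` and `x3` fit the data almost equally well.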
Taking one step back, we should formally define a way to decide whether collinearity is occurring between two or more variables. We can do this by computing the so-called tolerance.
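As a preview: the tolerance of a predictor x_j is commonly defined as 1 − R²_j, where R²_j is the coefficient of determination from regressing x_j on all the other predictors (its reciprocal is the variance inflation factor, VIF). A small tolerance means x_j is almost a linear combination of the others. Here is a sketch for our three variables (helper names are ours):

```python
import numpy as np

x1 = np.array([119, 134, 183, 126, 177, 107, 119, 165, 156], dtype=float)
x2 = np.array([328.5, 406.0, 460.5, 390.0, 434.5, 362.5, 325.5, 387.5, 371.0])
x3 = np.array([715.8, 792.8, 981.6, 734.2, 951.4, 688.4, 715.8, 904.0, 876.2])
data = np.column_stack([x1, x2, x3])

def tolerance(data, j):
    """Tolerance of column j: 1 - R^2 from regressing it on the other columns."""
    y = data[:, j]
    X = np.delete(data, j, axis=1)
    X = np.column_stack([np.ones(len(y)), X])      # add an intercept term
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    ss_res = resid @ resid                          # residual sum of squares
    ss_tot = (y - y.mean()) @ (y - y.mean())        # total sum of squares
    return 1.0 - (1.0 - ss_res / ss_tot)            # = ss_res / ss_tot

for j, name in enumerate(["x1", "x2", "x3"]):
    print(name, round(tolerance(data, j), 4))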
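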