Tolerance 

Despite its quite sonorous name, tolerance is simply the complement to one of the model performance metric R-squared, that is, 1 - R-squared. As you know from our discussions about model performance metrics, R-squared ranges from zero to one, and the closer it is to one, the more of the variability within our response variable the model is able to explain. Its complement is therefore a number expressing how much of the variability within the response variable is not explained by the model.

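The relevant point is which R-squared you have to consider when looking for collinearity. Since collinearity is a relationship among the explanatory variables, the R-squared we need is not the one of our final model, but the one of a model in which an explanatory variable is regressed on another one (or on all the others), which then plays the role of the response. To see how this works, we can look again at the numbers we have previously seen: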

x1     x2     x3
119    328.5  715.8
134    406    792.8
183    460.5  981.6
126    390    734.2
177    434.5  951.4
107    362.5  688.4
119    325.5  715.8
165    387.5  904
156    371    876.2

 

To compute, for instance, the tolerance associated with x1 and x3, we have to estimate a simple (univariate) linear model of x1 on x3 and look at its R-squared. You can easily do it by yourself: just run lm() with the formula x1 ~ x3 and call summary() on the result.
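A minimal sketch of this step in R, assuming the two columns of the table are typed in by hand as plain vectors (decimal points instead of commas); keeping them as standalone vectors makes the Call line of the output read exactly lm(formula = x1 ~ x3):

```r
# Values of x1 and x3 from the table above
x1 <- c(119, 134, 183, 126, 177, 107, 119, 165, 156)
x3 <- c(715.8, 792.8, 981.6, 734.2, 951.4, 688.4, 715.8, 904, 876.2)

# Simple linear model of x1 on x3: here x1 acts as the response
fit <- lm(x1 ~ x3)

# Print residuals, coefficients and R-squared
summary(fit)
```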

If everything goes right you should obtain something like the following:

Call:
lm(formula = x1 ~ x3)

Residuals:
    Min      1Q  Median      3Q     Max
-3.8230 -1.3607  0.7504  1.3871  3.8275

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -59.765450   6.517130  -9.171 3.77e-05 ***
x3            0.247804   0.007903  31.355 8.67e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.508 on 7 degrees of freedom
Multiple R-squared:  0.9929,  Adjusted R-squared:  0.9919
F-statistic: 983.1 on 1 and 7 DF,  p-value: 8.671e-09

Here you can locate the line Multiple R-squared: 0.9929, showing the R-squared of this model. We can now compute the tolerance as:

tolerance = 1 - R-squared = 1 - 0.9929 = 0.0071
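Rather than reading the number off the printed summary, the R-squared can be extracted from the object returned by summary(), which stores it in its r.squared element. A minimal sketch, again assuming the values are entered by hand:

```r
x1 <- c(119, 134, 183, 126, 177, 107, 119, 165, 156)
x3 <- c(715.8, 792.8, 981.6, 734.2, 951.4, 688.4, 715.8, 904, 876.2)

fit <- lm(x1 ~ x3)

# summary.lm keeps the (multiple) R-squared in r.squared
r2 <- summary(fit)$r.squared
tolerance <- 1 - r2

round(tolerance, 4)
# [1] 0.0071
```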

Is it good news? First of all, we should reason about the meaning of this number. What does it express? It tells us how much of the variability of x1 is not explained by x3. In the presence of collinearity, would it be high or low? If a variable is strongly correlated with another one, the variation of one of them goes hand in hand with, and therefore statistically explains, the variation of the other. In this situation the R-squared would be very high and its complement very low.
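The link between correlation and R-squared can be checked directly: in a simple linear model with an intercept, the R-squared equals the square of the Pearson correlation between the two variables, so the same tolerance can be obtained from cor(). A minimal sketch on the values of the table:

```r
x1 <- c(119, 134, 183, 126, 177, 107, 119, 165, 156)
x3 <- c(715.8, 792.8, 981.6, 734.2, 951.4, 688.4, 715.8, 904, 876.2)

r <- cor(x1, x3)                      # Pearson correlation between x1 and x3
tol_from_cor <- 1 - r^2               # tolerance from the squared correlation
tol_from_lm  <- 1 - summary(lm(x1 ~ x3))$r.squared

# Both routes give the same value, up to floating-point error
all.equal(tol_from_cor, tol_from_lm)
# [1] TRUE
```

The very high correlation is exactly what drives the tolerance towards zero.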

That is why, for a collinear variable, the tolerance is expected to be low. But how low? Looking around, you will find a threshold most of the time set to 0.10 and sometimes to 0.20. We can summarise this as follows:

  • Tolerance >= 0.10 (or 0.20, if the stricter threshold is adopted): we can exclude collinearity
  • Tolerance < 0.10 (or 0.20): we should conclude that collinearity is in place
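In general, the tolerance of an explanatory variable is computed by regressing it on all the other explanatory variables, not just on one of them. The sketch below does this for every column of a data frame and applies the threshold; the helper name tolerance_table and its threshold argument are hypothetical, chosen here for illustration. Since adding regressors can only raise the R-squared, the tolerance of x1 obtained this way can be no higher than the 0.0071 found above, well below both thresholds.

```r
# Hypothetical helper: tolerance of each column of `data`, obtained by
# regressing that column on all the remaining columns
tolerance_table <- function(data, threshold = 0.10) {
  tol <- sapply(names(data), function(v) {
    # Build e.g. x1 ~ x2 + x3 from the column names
    f <- reformulate(setdiff(names(data), v), response = v)
    1 - summary(lm(f, data = data))$r.squared
  })
  data.frame(variable  = names(tol),
             tolerance = unname(tol),
             collinear = unname(tol < threshold))
}

df <- data.frame(
  x1 = c(119, 134, 183, 126, 177, 107, 119, 165, 156),
  x2 = c(328.5, 406, 460.5, 390, 434.5, 362.5, 325.5, 387.5, 371),
  x3 = c(715.8, 792.8, 981.6, 734.2, 951.4, 688.4, 715.8, 904, 876.2)
)

tolerance_table(df)          # conventional 0.10 threshold
tolerance_table(df, 0.20)    # stricter 0.20 threshold
```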