4.11 Cautions and Pitfalls in Regression Analysis

This chapter has provided a brief introduction to regression analysis, one of the most widely used quantitative techniques in business. However, regression models are often misused, so they should be applied with caution. The most common pitfalls are described below.

If the assumptions of the regression model are not met, the statistical tests may not be valid. Any interval estimates may also be invalid, although the model can still be used for prediction purposes.

Correlation does not necessarily mean causation. Two variables (such as the price of automobiles and your annual salary) may be highly correlated with one another without either one causing the other to change. They may both be changing due to other factors, such as the economy in general or the inflation rate.
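The point can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the price index, the multipliers 200 and 400, and the noise levels are all hypothetical numbers chosen for the example, not data from this chapter. Both series are generated from a shared driver, and neither appears in the formula for the other, yet they come out strongly correlated.

```python
import random

def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical price index over 30 years: the shared driver (inflation).
price_index = [100 * 1.03 ** t for t in range(30)]

# Each series depends only on the index plus its own independent noise.
# Neither series is used to compute the other.
car_price = [200 * p + random.gauss(0, 1500) for p in price_index]
salary = [400 * p + random.gauss(0, 3000) for p in price_index]

r = pearson_r(car_price, salary)
print(f"correlation between car price and salary: {r:.3f}")
```

A regression of salary on car price would look convincing here, but changing one of the two variables would do nothing to the other.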

If multicollinearity is present in a multiple regression model, the model is still good for prediction, but interpretation of individual coefficients is questionable. The individual tests on the regression coefficients are not valid.
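A minimal simulation sketch of this behavior follows. Everything in it is an assumption made for illustration: the two predictors are constructed so that X2 is almost a copy of X1, the true coefficients are both set to 1, and the helper names fit_two_predictors and simulate are hypothetical. The same model is refit on 20 independent samples to see how much the estimate of b1 moves compared with a prediction from the fitted equation.

```python
import random

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # close to zero when x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

def simulate(seed, n=50):
    """Draw one sample in which x2 is nearly identical to x1, then fit it."""
    rng = random.Random(seed)
    x1 = [rng.uniform(0, 10) for _ in range(n)]
    x2 = [a + rng.gauss(0, 0.05) for a in x1]  # nearly collinear with x1
    # True model: b0 = 0, b1 = 1, b2 = 1, error standard deviation 1.
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return fit_two_predictors(x1, x2, y)

fits = [simulate(seed) for seed in range(20)]
b1_values = [b1 for _, b1, _ in fits]
# Prediction at a typical point (x1 = x2 = 5); the true mean of Y there is 10.
predictions = [b0 + b1 * 5 + b2 * 5 for b0, b1, b2 in fits]

print(f"b1 ranges from {min(b1_values):.2f} to {max(b1_values):.2f}")
print(f"prediction at x1 = x2 = 5 ranges from "
      f"{min(predictions):.2f} to {max(predictions):.2f}")
```

Under these assumptions the estimate of b1 swings widely from sample to sample, while the predicted value of Y stays close to 10. That is the pattern described above: usable predictions, unreliable individual coefficients.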

Using a regression equation beyond the range of X is very questionable. A linear relationship may exist within the range of values of X in the sample. What happens beyond this range is unknown; the linear relationship may become nonlinear at some point. For example, there is usually a linear relationship between advertising and sales within a limited range. As more money is spent on advertising, sales tend to increase even if everything else is held constant. However, at some point, increasing advertising expenditures will have less impact on sales unless the company does other things to help, such as opening new markets or expanding the product offerings. If advertising is increased and nothing else changes, the sales will probably level off at some point.
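The danger of extrapolating can be sketched numerically. The example below assumes a hypothetical sales response that levels off near 100 units as advertising grows; the function true_sales and all of its constants are invented for illustration. A line is fit only to advertising levels between 5 and 20 and is then used, inappropriately, at an advertising level of 100.

```python
import math

def fit_line(xs, ys):
    """Simple least-squares line y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

def true_sales(adv):
    """Hypothetical saturating response: sales level off near 100."""
    return 100 * (1 - math.exp(-adv / 50))

# The sample only covers advertising levels from 5 to 20.
ads = list(range(5, 21))
sales = [true_sales(a) for a in ads]
b0, b1 = fit_line(ads, sales)

for adv in (10, 20, 100):
    predicted = b0 + b1 * adv
    note = "inside sample range" if 5 <= adv <= 20 else "beyond sample range"
    print(f"adv={adv:>3}: line predicts {predicted:6.1f}, "
          f"actual {true_sales(adv):6.1f}  ({note})")
```

Inside the sampled range the line tracks the curve closely. At an advertising level of 100, it predicts far more sales than the assumed response curve delivers, because the leveling off never appeared in the sample.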

Related to the limitation regarding the range of X is the interpretation of the intercept (b0). Since the lowest value for X in a sample is often much greater than 0, the intercept is a point on the regression line beyond the range of X. Therefore, we should not be concerned if the t-test for this coefficient is not significant, as we should not be using the regression equation to predict a value of Y when X = 0. The intercept merely helps define the line that best fits the sample points.

Concluding from the F test that a linear regression model is helpful in predicting Y does not mean that a linear model describes the relationship best. While the linear model may explain much of the variability in Y, a nonlinear relationship might explain even more. Similarly, if it is concluded that no linear relationship exists, another type of relationship could still exist.
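A small constructed example shows the second case. The data below are invented for illustration: Y is exactly the square of X, with X running from -5 to 5. The helper r_squared is a hypothetical name for the usual coefficient of determination in simple regression.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

xs = list(range(-5, 6))      # X = -5, -4, ..., 5
ys = [x ** 2 for x in xs]    # Y depends on X exactly, but not linearly

r2_linear = r_squared(xs, ys)                       # Y regressed on X
r2_squared = r_squared([x ** 2 for x in xs], ys)    # Y regressed on X squared

print(f"r-squared, Y on X:         {r2_linear:.3f}")
print(f"r-squared, Y on X squared: {r2_squared:.3f}")
```

The straight-line model explains none of the variability in Y, even though Y is completely determined by X. Regressing on the squared term recovers the relationship in full.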

A statistically significant relationship does not necessarily have any practical value. With a large enough sample, a relationship can be statistically significant even though r² is only 0.01. Such a model would normally be of little use to a manager. Similarly, a high r² could arise by random chance if the sample is small. The F test must also show significance before any value is placed in r².
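Both situations can be checked with a little arithmetic. The sketch below relies on the standard identity that rewrites F = MSR/MSE in terms of the coefficient of determination, F = (r²/k) / ((1 - r²)/(n - k - 1)), where k is the number of independent variables. The sample sizes (10,000 and 5) and the small-sample value of 0.60 are assumptions chosen for illustration, and the critical values are taken from a standard F table at the 0.05 level.

```python
def f_statistic(r2, n, k=1):
    """Overall F statistic computed from r-squared, sample size n,
    and k independent variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Critical values of F at the 0.05 level, from a standard F table.
F_CRIT_1_3 = 10.13       # df1 = 1, df2 = 3
F_CRIT_1_LARGE = 3.84    # df1 = 1, df2 very large

# Case 1: very large sample, tiny r-squared.
f_large = f_statistic(0.01, 10_000)
# Case 2: very small sample, fairly high r-squared.
f_small = f_statistic(0.60, 5)

print(f"n=10000, r2=0.01: F = {f_large:.2f}, "
      f"significant: {f_large > F_CRIT_1_LARGE}")
print(f"n=5,     r2=0.60: F = {f_small:.2f}, "
      f"significant: {f_small > F_CRIT_1_3}")
```

In the first case the relationship is statistically significant yet explains only 1% of the variability in Y. In the second, the model appears to explain 60% of the variability, but the F test gives no grounds to trust it.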
