4.2 Simple Linear Regression

In any regression model, there is an implicit assumption (which can be tested) that a relationship exists between the variables. There is also some random error that cannot be predicted. The underlying simple linear regression model is

\[ Y = \beta_0 + \beta_1 X + \epsilon \tag{4-1} \]

where

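\[
\begin{aligned}
Y &= \text{dependent variable (response variable)} \\
X &= \text{independent variable (predictor variable or explanatory variable)} \\
\beta_0 &= \text{intercept (value of } Y \text{ when } X = 0\text{)} \\
\beta_1 &= \text{slope of the regression line} \\
\epsilon &= \text{random error}
\end{aligned}
\]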

The true values for the intercept and slope are not known, and therefore they are estimated using sample data. The regression equation based on sample data is given as

\[ \hat{Y} = b_0 + b_1 X \tag{4-2} \]

where

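\[
\begin{aligned}
\hat{Y} &= \text{predicted value of } Y \\
b_0 &= \text{estimate of } \beta_0 \text{, based on sample results} \\
b_1 &= \text{estimate of } \beta_1 \text{, based on sample results}
\end{aligned}
\]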

In the Triple A Construction example, we are trying to predict the sales, so the dependent variable (Y) would be sales. The variable we use to help predict sales is the Albany area payroll, so this is the independent variable (X). Although any number of lines can be drawn through these points to show a relationship between X and Y in Figure 4.1, the line that will be chosen is the one that in some way minimizes the errors. Error is defined as

\[
\begin{aligned}
\text{Error} &= (\text{Actual value}) - (\text{Predicted value}) \\
e &= Y - \hat{Y}
\end{aligned}
\tag{4-3}
\]

Since errors may be positive or negative, the average error could be zero even though there are extremely large errors—both positive and negative. To eliminate the difficulty of negative errors canceling positive errors, the errors can be squared. The best regression line will be defined as the one with the minimum sum of the squared errors. For this reason, regression analysis is sometimes called least squares regression.
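The cancellation problem can be seen with a short Python sketch. It uses the six Triple A observations listed in Table 4.2 and compares two candidate lines chosen purely for illustration: a flat line at Y = 7 and the sloped line Y = 2 + 1.25X (which turns out to be the line derived later in this section). The helper names are hypothetical.

```python
# Triple A Construction data as listed in Table 4.2 (X = payroll, Y = sales).
payroll = [3, 4, 6, 4, 2, 5]
sales = [6, 8, 9, 5, 4.5, 9.5]


def errors(intercept, slope, xs, ys):
    """Return e = Y - Y_hat for each observation under the line intercept + slope * X."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_of_squared_errors(errs):
    """Square each error so positive and negative errors cannot cancel."""
    return sum(e ** 2 for e in errs)


# Candidate A: a flat line at Y = 7 (illustrative choice).
flat_errors = errors(7.0, 0.0, payroll, sales)
# Candidate B: the sloped line Y = 2 + 1.25X (illustrative choice).
sloped_errors = errors(2.0, 1.25, payroll, sales)

print("flat:  ", sum(flat_errors), sum_of_squared_errors(flat_errors))
print("sloped:", sum(sloped_errors), sum_of_squared_errors(sloped_errors))
```

For both candidates the raw errors sum to zero, so the summed error cannot tell the lines apart. The sums of squared errors differ (22.5 for the flat line versus 6.875 for the sloped line), which is why the squared criterion is the one used to pick a line.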

Statisticians have developed formulas that we can use to find the equation of a straight line that would minimize the sum of the squared errors. The simple linear regression equation is

\[ \hat{Y} = b_0 + b_1 X \]

The following formulas can be used to compute the slope and the intercept:

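\[ \bar{X} = \frac{\sum X}{n} = \text{average (mean) of } X \text{ values} \]

\[ \bar{Y} = \frac{\sum Y}{n} = \text{average (mean) of } Y \text{ values} \]

\[ b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \tag{4-4} \]

\[ b_0 = \bar{Y} - b_1 \bar{X} \tag{4-5} \]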
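Equations 4-4 and 4-5 translate almost directly into code. The following is a minimal Python sketch rather than a reference implementation; the function name `fit_simple_linear_regression` and the input checks are assumptions added for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) computed with Equations 4-4 and 4-5."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the X values
    y_bar = sum(ys) / n  # mean of the Y values
    # Numerator of Equation 4-4: sum of (X - X_bar)(Y - Y_bar).
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of Equation 4-4: sum of (X - X_bar)^2.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b1 = s_xy / s_xx          # Equation 4-4
    b0 = y_bar - b1 * x_bar   # Equation 4-5
    return b0, b1


# Usage with the Triple A Construction data from Table 4.2.
b0, b1 = fit_simple_linear_regression([3, 4, 6, 4, 2, 5], [6, 8, 9, 5, 4.5, 9.5])
print(b0, b1)
```

The slope is computed first because the intercept formula depends on it. The guard against identical X values is there because the denominator of Equation 4-4 would otherwise be zero.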

The preliminary calculations are shown in Table 4.2. There are other “shortcut” formulas that are helpful when doing the computations on a calculator, and these are presented in Appendix 4.1. They will not be shown here, as computer software will be used for most of the other examples in this chapter.

Table 4.2 Regression Calculations for Triple A Construction

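| Y | X | \((X - \bar{X})^2\) | \((X - \bar{X})(Y - \bar{Y})\) |
| --- | --- | --- | --- |
| 6 | 3 | \((3 - 4)^2 = 1\) | \((3 - 4)(6 - 7) = 1\) |
| 8 | 4 | \((4 - 4)^2 = 0\) | \((4 - 4)(8 - 7) = 0\) |
| 9 | 6 | \((6 - 4)^2 = 4\) | \((6 - 4)(9 - 7) = 4\) |
| 5 | 4 | \((4 - 4)^2 = 0\) | \((4 - 4)(5 - 7) = 0\) |
| 4.5 | 2 | \((2 - 4)^2 = 4\) | \((2 - 4)(4.5 - 7) = 5\) |
| 9.5 | 5 | \((5 - 4)^2 = 1\) | \((5 - 4)(9.5 - 7) = 2.5\) |
| \(\sum Y = 42\) | \(\sum X = 24\) | \(\sum (X - \bar{X})^2 = 10\) | \(\sum (X - \bar{X})(Y - \bar{Y}) = 12.5\) |

\(\bar{Y} = 42/6 = 7\)

\(\bar{X} = 24/6 = 4\)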
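The columns of Table 4.2 can be reproduced row by row as a check on the hand arithmetic. This Python sketch assumes only the six (Y, X) pairs from the table; the variable names are illustrative.

```python
# (Y, X) pairs from Table 4.2.
payroll = [3, 4, 6, 4, 2, 5]
sales = [6, 8, 9, 5, 4.5, 9.5]

x_bar = sum(payroll) / len(payroll)
y_bar = sum(sales) / len(sales)

rows = []
for x, y in zip(payroll, sales):
    x_dev_sq = (x - x_bar) ** 2          # third column of the table
    cross = (x - x_bar) * (y - y_bar)    # fourth column of the table
    rows.append((y, x, x_dev_sq, cross))
    print(f"{y:>5} {x:>3} {x_dev_sq:>6.2f} {cross:>6.2f}")

# Column totals, which feed the slope formula.
sum_x_dev_sq = sum(row[2] for row in rows)
sum_cross = sum(row[3] for row in rows)
print("means: ", x_bar, y_bar)
print("totals:", sum_x_dev_sq, sum_cross)
```

The two totals are the denominator and numerator of the slope formula, so any arithmetic slip in a single row shows up immediately in the printed sums.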

Computing the slope and the intercept of the regression equation for the Triple A Construction Company example, we have

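\[
\begin{aligned}
\bar{X} &= \frac{\sum X}{6} = \frac{24}{6} = 4 \\
\bar{Y} &= \frac{\sum Y}{6} = \frac{42}{6} = 7 \\
b_1 &= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{12.5}{10} = 1.25 \\
b_0 &= \bar{Y} - b_1 \bar{X} = 7 - (1.25)(4) = 2
\end{aligned}
\]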

The estimated regression equation therefore is

\[ \hat{Y} = 2 + 1.25X \]

or

Sales=2+1.25(Payroll)

If the payroll next year is $600 million (X=6), then the predicted value would be

\[ \hat{Y} = 2 + 1.25(6) = 9.5 \]

or $950,000.
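The prediction step, including the unit conversions, can be sketched in Python as follows. The scaling constants are inferred from the example itself (X = 6 stands for $600 million of payroll, and a predicted value of 9.5 stands for $950,000 of sales); the function name is hypothetical.

```python
# Units inferred from the worked example (assumptions, not stated as constants in the text).
PAYROLL_UNIT_DOLLARS = 100_000_000  # X is measured in $100 millions
SALES_UNIT_DOLLARS = 100_000        # Y is measured in $100,000s


def predict_sales_dollars(payroll_dollars, b0=2.0, b1=1.25):
    """Predict sales in dollars from payroll in dollars using Y_hat = b0 + b1 * X."""
    x = payroll_dollars / PAYROLL_UNIT_DOLLARS  # convert payroll to model units
    y_hat = b0 + b1 * x                         # estimated regression equation
    return y_hat * SALES_UNIT_DOLLARS           # convert prediction back to dollars


predicted = predict_sales_dollars(600_000_000)
print(predicted)
```

Keeping the conversions inside the function avoids a common slip: plugging a raw dollar amount into an equation that was fitted on scaled values.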

One of the purposes of regression is to understand the relationship among variables. This model tells us that each time the payroll increases by $100 million (a one-unit increase in X), we would expect sales to increase by $125,000, since \(b_1 = 1.25\) and sales are measured in units of $100,000. This model helps Triple A Construction see how the local economy and company sales are related.
