4.7 Multiple Regression Analysis

The multiple regression model is a practical extension of the model we just observed. It allows us to build a model with several independent variables. The underlying model is

Y=β0+β1X1+β2X2++βkXk+ϵ
(4-16)

where

Y=dependent variable (response variable)Xi=ith independent variable (predictor variable or explanatory variable)β0=intercept (value of Y when all Xi = 0)β1=coefficient of the ith independent variablek=number of independent variablesϵ=random error

To estimate the values of these coefficients, a sample is taken and the following equation is developed:

Y^=b0+b1X1+b2X2++bkXk
(4-17)

where

Y^=predicted value of Yb0=sample intercept (and is an estimate of β0)bi=sample coefficient of ith variable (and is an estimate of βi)

Consider the case of Jenny Wilson Realty, a real estate company in Montgomery, Alabama. Jenny Wilson, owner and broker for this company, wants to develop a model to determine a suggested listing price for houses based on the size of the house and the age of the house. She selects a sample of houses that have sold recently in a particular area, and she records the selling price, the square footage of the house, the age of the house, and also the condition (good, excellent, or mint) of each house, as shown in Table 4.5. Initially Jenny plans to use only the square footage and age to develop a model, although she wants to save the information on condition of the house to use later. She wants to find the coefficients for the following multiple regression model:

Table 4.5 Jenny Wilson Real Estate Data

SELLING PRICE ($) SQUARE FOOTAGE AGE CONDITION
95,000 1,926 30 Good
119,000 2,069 40 Excellent
124,800 1,720 30 Excellent
135,000 1,396 15 Good
142,800 1,706 32 Mint
145,000 1,847 38 Mint
159,000 1,950 27 Mint
165,000 2,323 30 Excellent
182,000 2,285 26 Mint
183,000 3,752 35 Good
200,000 2,300 18 Good
211,000 2,525 17 Good
215,000 3,800 40 Excellent
219,000 1,740 12 Mint
Y^=b0+b1X1+b2X2

where

Y^=predicted value of dependent variable (selling price)b0=Y interceptX1andX2=value of the two independent variables (square footage and age), respectivelyb1andb2=slopes for X1 and X2, respectively

The mathematics of multiple regression becomes quite complex, so we leave formulas for b0, b1, and b2 to regression textbooks.3 Excel can be used to develop a multiple regression model just as it was used for a simple linear regression model. When entering the data in Excel, it is important that all of the independent variables are in adjoining columns to facilitate the input. From the Data tab in Excel, select Data Analysis and then Regression, as shown earlier, in Program 4.1A. This opens the Regression window to allow the input, as shown in Program 4.4A. Note that the X Range includes the data in two columns (B and C) because there are two independent variables. The Excel output that Jenny Wilson obtains is shown in Program 4.4B, and it provides the following equation:

Y^=b0+b1X1+b2X2=146,630.89+43.82X12898.69X2

Evaluating the Multiple Regression Model

A multiple regression model can be evaluated in a manner similar to the way a simple linear regression model is evaluated. Both the p-value for the F test and r2 can be interpreted the same with multiple regression models as they are with simple linear regression models. However, as there is more than one independent variable, the hypothesis that is being tested with the F test is that all the coefficients are equal to 0. If all these are 0, then none of the independent variables in the model is helpful in predicting the dependent variable.

Screenshot of the input screen with a pop up window open to specify data ranges.

Program 4.4A Input Screen for Jenny Wilson Realty Multiple Regression in Excel 2016

The Excel screenshot it titled Summary Output.

Program 4.4B Excel 2016 Output Screen for Jenny Wilson Realty Multiple Regression Example

To determine which of the independent variables in a multiple regression model is significant, a significance test on the coefficient for each variable is performed. While statistics textbooks can provide the details of these tests, the results of these tests are automatically displayed in the Excel output. The null hypothesis is that the coefficient is 0 (H0:βi=0), and the alternate hypothesis is that it is not 0 (H1:βi0). The test statistic is calculated in Excel, and the p-value is given. If the p-value is lower than the level of significance (α), then the null hypothesis is rejected, and it can be concluded that the variable is significant.

Jenny Wilson Realty Example

In the Jenny Wilson Realty example in Program 4.4B, the overall model is statistically significant and useful in predicting the selling price of the house because the p-value for the F test is 0.002. The r2 value is 0.6719, so 67% of the variability in selling price for these houses can be explained by the regression model. However, there were two independent variables in the model–-square footage and age. It is possible that one of these is significant and the other is not. The F test simply indicates that the model as a whole is significant.

Two significance tests can be performed to determine if square footage or age (or both) is significant. In Program 4.4B, the results of two hypothesis tests are provided. The first test for variable X1 (square footage) is

H0:β1=0H1:β10

Using a 5% level of significance (α=0.05), the null hypothesis is rejected because the p-value for this is 0.0013. Thus, square footage is helpful in predicting the price of a house.

Similarly, the variable X2 (age) is tested using the Excel output, and the p-value is 0.0039. The null hypothesis is rejected because this is less than 0.05. Thus, age is also helpful in predicting the price of a house.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset