Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

4.7 Multiple Regression Analysis

The multiple regression model is a practical extension of the model we just observed. It allows us to build a model with several independent variables. The underlying model is

Y = β_{0} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{k} X_{k} + ϵ

$Y = β_{0} + β_{1} X_{1} + β_{2} X_{2} + \dots + β_{k} X_{k} + ϵ$ (4-16)

where

\begin{array}{l} Y & = & dependent variable (response variable) \\ X_{i} & = & i th independent variable (predictor variable or explanatory variable) \\ β_{0} & = & intercept (value of Y when all X_{i} = 0) \\ β_{1} & = & coefficient of the i th independent variable \\ k & = & number of independent variables \\ ϵ & = & random error \end{array}

$\begin{array}{l} Y & = & dependent variable (response variable) \\ X_{i} & = & i th independent variable (predictor variable or explanatory variable) \\ β_{0} & = & intercept (value of Y when all X_{i} = 0) \\ β_{1} & = & coefficient of the i th independent variable \\ k & = & number of independent variables \\ ϵ & = & random error \end{array}$

To estimate the values of these coefficients, a sample is taken and the following equation is developed:

\hat{Y} = b_{0} + b_{1} X_{1} + b_{2} X_{2} + \dots + b_{k} X_{k}

$\hat{Y} = b_{0} + b_{1} X_{1} + b_{2} X_{2} + \dots + b_{k} X_{k}$ (4-17)

where

\begin{array}{l} \hat{Y} & = & predicted value of Y \\ b_{0} & = & sample intercept (and is an estimate of β_{0}) \\ b_{i} & = & sample coefficient of ith variable (and is an estimate of β_{i}) \end{array}

$\begin{array}{l} \hat{Y} & = & predicted value of Y \\ b_{0} & = & sample intercept (and is an estimate of β_{0}) \\ b_{i} & = & sample coefficient of ith variable (and is an estimate of β_{i}) \end{array}$

Consider the case of Jenny Wilson Realty, a real estate company in Montgomery, Alabama. Jenny Wilson, owner and broker for this company, wants to develop a model to determine a suggested listing price for houses based on the size of the house and the age of the house. She selects a sample of houses that have sold recently in a particular area, and she records the selling price, the square footage of the house, the age of the house, and also the condition (good, excellent, or mint) of each house, as shown in Table 4.5. Initially Jenny plans to use only the square footage and age to develop a model, although she wants to save the information on condition of the house to use later. She wants to find the coefficients for the following multiple regression model:

Table 4.5 Jenny Wilson Real Estate Data

SELLING PRICE ($)	SQUARE FOOTAGE	AGE	CONDITION
95,000	1,926	30	Good
119,000	2,069	40	Excellent
124,800	1,720	30	Excellent
135,000	1,396	15	Good
142,800	1,706	32	Mint
145,000	1,847	38	Mint
159,000	1,950	27	Mint
165,000	2,323	30	Excellent
182,000	2,285	26	Mint
183,000	3,752	35	Good
200,000	2,300	18	Good
211,000	2,525	17	Good
215,000	3,800	40	Excellent
219,000	1,740	12	Mint

\hat{Y} = b_{0} + b_{1} X_{1} + b_{2} X_{2}

$\hat{Y} = b_{0} + b_{1} X_{1} + b_{2} X_{2}$

where

\begin{array}{l} \hat{Y} & = & predicted value of dependent variable (selling price) \\ b_{0} & = & Y intercept \\ X_{1} and X_{2} & = & value of the two independent variables (square footage and age), respectively \\ b_{1} a n d b_{2} & = & slopes for X_{1} and X_{2}, respectively \end{array}

$\begin{array}{l} \hat{Y} & = & predicted value of dependent variable (selling price) \\ b_{0} & = & Y intercept \\ X_{1} and X_{2} & = & value of the two independent variables (square footage and age), respectively \\ b_{1} a n d b_{2} & = & slopes for X_{1} and X_{2}, respectively \end{array}$

The mathematics of multiple regression becomes quite complex, so we leave formulas for $b_{0} {, b}_{1},$ $b_{0} {, b}_{1},$ and $b_{2}$ $b_{2}$ to regression textbooks.³ Excel can be used to develop a multiple regression model just as it was used for a simple linear regression model. When entering the data in Excel, it is important that all of the independent variables are in adjoining columns to facilitate the input. From the Data tab in Excel, select Data Analysis and then Regression, as shown earlier, in Program 4.1A. This opens the Regression window to allow the input, as shown in Program 4.4A. Note that the X Range includes the data in two columns (B and C) because there are two independent variables. The Excel output that Jenny Wilson obtains is shown in Program 4.4B, and it provides the following equation:

\begin{array}{l} \hat{Y} & = & b_{0} + b_{1} X_{1} + b_{2} X_{2} \\ = & 146, 630.89 + 43.82 X_{1} - 2898.69 X_{2} \end{array}

$\begin{array}{l} \hat{Y} & = & b_{0} + b_{1} X_{1} + b_{2} X_{2} \\ = & 146, 630.89 + 43.82 X_{1} - 2898.69 X_{2} \end{array}$

Evaluating the Multiple Regression Model

A multiple regression model can be evaluated in a manner similar to the way a simple linear regression model is evaluated. Both the p-value for the F test and $r^{2}$ $r^{2}$ can be interpreted the same with multiple regression models as they are with simple linear regression models. However, as there is more than one independent variable, the hypothesis that is being tested with the F test is that all the coefficients are equal to 0. If all these are 0, then none of the independent variables in the model is helpful in predicting the dependent variable.

Screenshot of the input screen with a pop up window open to specify data ranges.

Figure 4.4A Full Alternative Text

The Excel screenshot it titled Summary Output.

Figure 4.4B Full Alternative Text

To determine which of the independent variables in a multiple regression model is significant, a significance test on the coefficient for each variable is performed. While statistics textbooks can provide the details of these tests, the results of these tests are automatically displayed in the Excel output. The null hypothesis is that the coefficient is $0 (H_{0} : β_{i} = 0)$ $0 (H_{0} : β_{i} = 0)$ , and the alternate hypothesis is that it is not 0 $(H_{1} : β_{i} \neq 0) .$ $(H_{1} : β_{i} \neq 0) .$ The test statistic is calculated in Excel, and the p-value is given. If the p-value is lower than the level of significance $(α)$ $(α)$ , then the null hypothesis is rejected, and it can be concluded that the variable is significant.

Jenny Wilson Realty Example

In the Jenny Wilson Realty example in Program 4.4B, the overall model is statistically significant and useful in predicting the selling price of the house because the p-value for the F test is 0.002. The $r^{2}$ $r^{2}$ value is 0.6719, so 67% of the variability in selling price for these houses can be explained by the regression model. However, there were two independent variables in the model–-square footage and age. It is possible that one of these is significant and the other is not. The F test simply indicates that the model as a whole is significant.

Two significance tests can be performed to determine if square footage or age (or both) is significant. In Program 4.4B, the results of two hypothesis tests are provided. The first test for variable $X_{1}$ $X_{1}$ (square footage) is

\begin{array}{l} H_{0} : β_{1} & = & 0 \\ H_{1} : β_{1} & \neq & 0 \end{array}

$\begin{array}{l} H_{0} : β_{1} & = & 0 \\ H_{1} : β_{1} & \neq & 0 \end{array}$

Using a 5% level of significance $(α = 0.05),$ $(α = 0.05),$ the null hypothesis is rejected because the p-value for this is 0.0013. Thus, square footage is helpful in predicting the price of a house.

Similarly, the variable $X_{2}$ $X_{2}$ (age) is tested using the Excel output, and the p-value is 0.0039. The null hypothesis is rejected because this is less than 0.05. Thus, age is also helpful in predicting the price of a house.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for
4.7 Multiple Regression Analysis

4.7 Multiple Regression Analysis

Table 4.5 Jenny Wilson Real Estate Data

Evaluating the Multiple Regression Model

Program 4.4A Input Screen for Jenny Wilson Realty Multiple Regression in Excel 2016

Program 4.4B Excel 2016 Output Screen for Jenny Wilson Realty Multiple Regression Example

Jenny Wilson Realty Example

Table of Contents for 4.7 Multiple Regression Analysis

Create new playlist

Sign In

Sign Up

Table of Contents for
4.7 Multiple Regression Analysis