4.5 Testing the Model for Significance

Both the MSE and $r^{2}$ provide a measure of accuracy in a regression model. However, when the sample size is too small, it is possible to get good values for both of these even if there is no relationship between the variables in the regression model. To determine whether these values are meaningful, it is necessary to test the model for significance.

To see if there is a linear relationship between X and Y, a statistical hypothesis test is performed. The underlying linear model was given in Equation 4-1 as

$Y = β_{0} + β_{1} X + ϵ$

If $β_{1} = 0$, then Y does not depend on X in any way. The null hypothesis says there is no linear relationship between the two variables (i.e., $β_{1} = 0$). The alternative hypothesis is that there is a linear relationship (i.e., $β_{1} \neq 0$). If the null hypothesis can be rejected, then we conclude that a linear relationship exists, so X is helpful in predicting Y. The F distribution is used for testing this hypothesis. Appendix D contains values for the F distribution that can be used when calculations are performed by hand. See Chapter 2 for a review of the F distribution. The results of the test can also be obtained from both Excel and QM for Windows.

The F statistic used in the hypothesis test is based on the MSE (seen in the previous section) and the mean squared regression (MSR). The MSR is calculated as

$MSR = \frac{SSR}{k}$ (4-14)

where

$k$ = number of independent variables in the model

The F statistic is

$F = \frac{MSR}{MSE}$ (4-15)

Based on the assumptions regarding the errors in a regression model, this calculated F statistic is described by the F distribution with

$\begin{array}{l} \text{Degrees of freedom for the numerator} = df_{1} = k \\ \text{Degrees of freedom for the denominator} = df_{2} = n - k - 1 \end{array}$

where

$k$ = the number of independent (X) variables

If there is very little error, the denominator (MSE) of the F statistic is very small relative to the numerator (MSR), and the resulting F statistic will be large. This is an indication that the model is useful. A significance level related to the value of the F statistic is then found. Whenever the F value is large, the observed significance level (p-value) will be low, indicating that an F value this large would be very unlikely to occur by chance if there were no linear relationship. When the F value is large (with a resulting small significance level), we can reject the null hypothesis that there is no linear relationship. We then conclude that there is a linear relationship and that the values of MSE and $r^{2}$ are meaningful.
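The effect of the error size on F can be seen in a short calculation. The following is a minimal Python sketch, not part of the original text: the function name `f_statistic` and the sums of squares passed to it are hypothetical, chosen only to show that a smaller SSE (with the same SSR) produces a larger F. It assumes MSE = SSE/(n − k − 1), the usual definition of the mean squared error.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F, df1, df2) for a regression with k independent variables."""
    df1 = k              # numerator degrees of freedom
    df2 = n - k - 1      # denominator degrees of freedom
    if df1 <= 0 or df2 <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    msr = ssr / df1      # Equation 4-14
    mse = sse / df2      # mean squared error (assumed definition)
    return msr, mse, msr / mse, df1, df2   # F from Equation 4-15


# Hypothetical sums of squares: same SSR, two different amounts of error.
_, _, f_small_error, _, _ = f_statistic(ssr=40.0, sse=2.0, n=12, k=1)
_, _, f_large_error, _, _ = f_statistic(ssr=40.0, sse=30.0, n=12, k=1)

print(f"F with little error: {f_small_error:.2f}")
print(f"F with more error:   {f_large_error:.2f}")
```

With the same SSR, the run with the smaller SSE yields the larger F, which is the pattern the paragraph above describes.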

The hypothesis test just described is summarized here:

Steps in Hypothesis Test for a Significant Regression Model

Specify null and alternative hypotheses:

$\begin{array}{rcl} H_{0} : β_{1} & = & 0 \\ H_{1} : β_{1} & \neq & 0 \end{array}$
Select the level of significance $(α)$. Common values are 0.01 and 0.05.
Calculate the value of the test statistic using the formula

$F = \frac{MSR}{MSE}$
Make a decision using one of the following methods:
1. Reject the null hypothesis if the test statistic is greater than the F value from the table in Appendix D. Otherwise, do not reject the null hypothesis:
  
  $\begin{array}{l} \text{Reject if } F_{\text{calculated}} > F_{α, df_{1}, df_{2}} \\ df_{1} = k \\ df_{2} = n - k - 1 \end{array}$
2. Reject the null hypothesis if the observed significance level, or p-value, is less than the level of significance ($α$). Otherwise, do not reject the null hypothesis:
  
  $\begin{array}{l} p\text{-value} = P(F > \text{calculated test statistic}) \\ \text{Reject if } p\text{-value} < α \end{array}$

Triple A Construction Example

To illustrate the process of testing the hypothesis about a significant relationship, consider the Triple A Construction example. Appendix D will be used to provide values for the F distribution.

Step 1.

$\begin{array}{rcll} H_{0} : β_{1} & = & 0 & \text{(no linear relationship between X and Y)} \\ H_{1} : β_{1} & \neq & 0 & \text{(linear relationship exists between X and Y)} \end{array}$
Step 2.

Select $α = 0.05$.
Step 3. Calculate the value of the test statistic. The MSE was already calculated to be 1.7188. The MSR is then calculated so that F can be found:

$\begin{array}{rcl} MSR & = & \frac{SSR}{k} = \frac{15.6250}{1} = 15.6250 \\ F & = & \frac{MSR}{MSE} = \frac{15.6250}{1.7188} = 9.09 \end{array}$
Step 4.
1. Reject the null hypothesis if the test statistic is greater than the F value from the table in Appendix D:
  
  $\begin{array}{l} df_{1} = k = 1 \\ df_{2} = n - k - 1 = 6 - 1 - 1 = 4 \end{array}$
The value of F associated with a 5% level of significance and with degrees of freedom 1 and 4 is found in Appendix D. Figure 4.5 illustrates this:

$\begin{array}{l} F_{0.05, 1, 4} = 7.71 \\ F_{\text{calculated}} = 9.09 \\ \text{Reject } H_{0} \text{ because } 9.09 > 7.71 \end{array}$

Thus, there is sufficient evidence to conclude that there is a statistically significant relationship between X and Y, so the model is helpful. The strength of this relationship is measured by $r^{2} = 0.69$. Thus, we can conclude that about 69% of the variability in sales (Y) is explained by the regression model based on local payroll (X).
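The arithmetic in Steps 3 and 4 can be checked with a few lines of Python. This is only a sketch using the figures quoted above (SSR = 15.6250, MSE = 1.7188, n = 6, k = 1, and the table value 7.71). Two relationships are assumptions not stated in this section: SSE is recovered as MSE × (n − k − 1), and $r^{2}$ is taken as SSR/(SSR + SSE).

```python
# Values quoted in the text for the Triple A Construction example.
n, k = 6, 1
ssr = 15.6250
mse = 1.7188
f_table = 7.71            # F for alpha = 0.05 with 1 and 4 degrees of freedom

df1 = k
df2 = n - k - 1
msr = ssr / df1           # Equation 4-14
f_calculated = msr / mse  # Equation 4-15

# Assumptions (standard identities, not given in this section):
# SSE = MSE * df2 and SST = SSR + SSE, so r^2 = SSR / SST.
sse = mse * df2
r_squared = ssr / (ssr + sse)

reject_h0 = f_calculated > f_table

print(f"df1 = {df1}, df2 = {df2}")
print(f"F calculated = {f_calculated:.2f}, table F = {f_table}")
print(f"Reject H0: {reject_h0}")
print(f"r^2 = {r_squared:.2f}")
```

The computed F and $r^{2}$ round to the 9.09 and 0.69 reported in the example.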

Figure 4.5 F Distribution for Triple A Construction Test for Significance (the curve rises steeply from the origin and then descends smoothly toward the x axis)

The Analysis of Variance (ANOVA) Table

When software such as Excel or QM for Windows is used to develop regression models, the output provides the observed significance level, or p-value, for the calculated F value. This is then compared to the level of significance $(α)$ to make the decision.

Table 4.4 provides summary information about the ANOVA table. This shows how the numbers in the last three columns of the table are computed. The last column of this table, labeled Significance F, is the p-value, or observed significance level, which can be used in the hypothesis test about the regression model.

Table 4.4 Analysis of Variance Table for Regression

	DF	SS	MS	F	SIGNIFICANCE F
Regression	k	SSR	MSR = SSR/k	MSR/MSE	P(F > MSR/MSE)
Residual	n − k − 1	SSE	MSE = SSE/(n − k − 1)
Total	n − 1	SST
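The layout of Table 4.4 can be reproduced from the sums of squares. The sketch below is illustrative only: `anova_rows` and `fmt` are hypothetical helper names, the Significance F column is left out because it requires the F distribution, and the value SSE = 6.875 for Triple A Construction is an assumption inferred from MSE × (n − k − 1) ≈ 1.7188 × 4 rather than a figure stated in this section.

```python
def anova_rows(ssr, sse, n, k):
    """Build the DF, SS, MS, and F entries of the regression ANOVA table."""
    df_reg, df_res = k, n - k - 1
    msr = ssr / df_reg
    mse = sse / df_res
    return [
        ("Regression", df_reg, ssr, msr, msr / mse),
        ("Residual", df_res, sse, mse, None),
        ("Total", n - 1, ssr + sse, None, None),   # SST = SSR + SSE
    ]


def fmt(value):
    """Format a table cell; blank for entries the table leaves empty."""
    if value is None:
        return ""
    return f"{value:.4f}" if isinstance(value, float) else str(value)


# Triple A Construction: SSR from the text, SSE inferred (see lead-in).
rows = anova_rows(ssr=15.625, sse=6.875, n=6, k=1)

print(f"{'':<12}{'DF':>4}{'SS':>10}{'MS':>10}{'F':>10}")
for label, df, ss, ms, f in rows:
    print(f"{label:<12}{fmt(df):>4}{fmt(ss):>10}{fmt(ms):>10}{fmt(f):>10}")
```

The regression and residual degrees of freedom add up to the total, n − 1, just as SSR and SSE add up to SST.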

Triple A Construction ANOVA Example

The Excel output that includes the ANOVA table for the Triple A Construction data is shown in the next section. The observed significance level for $F = 9.0909$ is given to be 0.0394. This means

$P (F > 9.0909) = 0.0394$

Because this probability is less than 0.05 $(α)$, we would reject the hypothesis of no linear relationship and conclude that there is a linear relationship between X and Y. Note in Figure 4.5 that the area under the curve to the right of 9.09 is clearly less than 0.05, which is the area to the right of the F value associated with a 0.05 level of significance.
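For readers who want to see where a value such as 0.0394 comes from, the tail area can be approximated numerically. The sketch below is one possible approach and is not how Excel or QM for Windows is documented to compute it. It relies on the standard identity $P(F > f) = I_{x}(df_{2}/2, df_{1}/2)$ with $x = df_{2}/(df_{2} + df_{1} f)$, where $I_{x}$ is the regularized incomplete beta function, and evaluates the integral with Simpson's rule. The function name `f_upper_tail` is hypothetical, and the code assumes $df_{2} \geq 2$ and $f > 0$ so that the integrand stays bounded.

```python
import math


def f_upper_tail(f_value, df1, df2, steps=2000):
    """Approximate P(F > f_value) for an F distribution with (df1, df2) degrees of freedom."""
    if df1 < 1 or df2 < 2 or f_value <= 0:
        raise ValueError("sketch assumes df1 >= 1, df2 >= 2, and f_value > 0")
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of intervals

    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_value)     # upper limit of the beta integral, 0 < x < 1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)

    # Divide the incomplete beta integral by B(a, b) to regularize it.
    return (h / 3.0) * total / math.exp(log_beta)


alpha = 0.05
p_value = f_upper_tail(9.0909, df1=1, df2=4)
print(f"p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Calling the same function with the table value 7.71 should return an area close to 0.05, which offers a quick consistency check against Appendix D.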
