4.5 Testing the Model for Significance

Both the MSE and r² provide a measure of accuracy in a regression model. However, when the sample size is too small, it is possible to get good values for both of these even if there is no relationship between the variables in the regression model. To determine whether these values are meaningful, it is necessary to test the model for significance.

To see if there is a linear relationship between X and Y, a statistical hypothesis test is performed. The underlying linear model was given in Equation 4-1 as

Y=β0+β1X+ϵ

If β1=0, then Y does not depend on X in any way. The null hypothesis says there is no linear relationship between the two variables (i.e., β1=0). The alternative hypothesis is that there is a linear relationship (i.e., β1≠0). If the null hypothesis can be rejected, then there is statistically significant evidence that a linear relationship exists, so X is helpful in predicting Y. The F distribution is used for testing this hypothesis. Appendix D contains values for the F distribution that can be used when calculations are performed by hand. See Chapter 2 for a review of the F distribution. The results of the test can also be obtained from both Excel and QM for Windows.

The F statistic used in the hypothesis test is based on the MSE (seen in the previous section) and the mean squared regression (MSR). The MSR is calculated as

MSR = SSR / k
(4-14)

where

k=number of independent variables in the model

The F statistic is

F = MSR / MSE
(4-15)

Based on the assumptions regarding the errors in a regression model, this calculated F statistic is described by the F distribution with

Degrees of freedom for the numerator = df1 = k
Degrees of freedom for the denominator = df2 = n - k - 1

where

k=the number of independent (X) variables
n=the number of observations
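As a rough illustration of Equations 4-14 and 4-15, the short Python sketch below computes MSR, F, and the two degrees of freedom from summary quantities. The function name `f_statistic` and its parameter names are hypothetical, chosen only for this example. The inputs shown (SSR = 15.6250, MSE = 1.7188, n = 6, k = 1) are the Triple A Construction values used later in this section.

```python
def f_statistic(ssr, mse, n, k):
    """Return (MSR, F, df1, df2) for a regression with k independent variables.

    Hypothetical helper for illustration only; it is not part of Excel,
    QM for Windows, or any library.
    """
    if k < 1 or n - k - 1 < 1:
        raise ValueError("need k >= 1 and n > k + 1")
    msr = ssr / k        # Equation 4-14: mean squared regression
    f = msr / mse        # Equation 4-15: F statistic
    df1 = k              # degrees of freedom for the numerator
    df2 = n - k - 1      # degrees of freedom for the denominator
    return msr, f, df1, df2


msr, f, df1, df2 = f_statistic(ssr=15.6250, mse=1.7188, n=6, k=1)
print(f"MSR = {msr:.4f}, F = {f:.2f}, df1 = {df1}, df2 = {df2}")
# prints: MSR = 15.6250, F = 9.09, df1 = 1, df2 = 4
```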

If there is very little error, the denominator (MSE) of the F statistic is very small relative to the numerator (MSR), and the resulting F statistic will be large. This is an indication that the model is useful. A significance level related to the value of the F statistic is then found. Whenever the F value is large, the observed significance level (p-value) will be low, indicating that an F value this large would be very unlikely to occur by chance if there were no linear relationship. When the F value is large (with a resulting small significance level), we can reject the null hypothesis that there is no linear relationship. We then conclude that there is a linear relationship and that the values of MSE and r² are meaningful.

The hypothesis test just described is summarized here:

Steps in Hypothesis Test for a Significant Regression Model

  1. Specify null and alternative hypotheses:

    H0: β1 = 0
    H1: β1 ≠ 0
  2. Select the level of significance (α). Common values are 0.01 and 0.05.

  3. Calculate the value of the test statistic using the formula

    F = MSR / MSE
  4. Make a decision using one of the following methods:

    1. Reject the null hypothesis if the test statistic is greater than the F value from the table in Appendix D. Otherwise, do not reject the null hypothesis:

      Reject if Fcalculated > Fα, df1, df2
      df1 = k
      df2 = n − k − 1
    2. Reject the null hypothesis if the observed significance level, or p-value, is less than the level of significance (α). Otherwise, do not reject the null hypothesis:

      p-value = P(F > calculated test statistic)
      Reject if p-value < α
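The two decision rules in step 4 can be sketched as a pair of small Python functions. This is only an illustration: the function names are hypothetical, and the critical value and p-value are assumed to be supplied from Appendix D or from software output rather than computed here. The numbers in the usage lines are the Triple A Construction values reported in this section.

```python
def reject_by_critical_value(f_calculated, f_critical):
    """Method 1: reject H0 when the calculated F exceeds the table value."""
    return f_calculated > f_critical


def reject_by_p_value(p_value, alpha):
    """Method 2: reject H0 when the p-value is below the significance level."""
    return p_value < alpha


# Triple A Construction: F = 9.09, table value F(0.05; 1, 4) = 7.71,
# observed significance level 0.0394, alpha = 0.05.
print(reject_by_critical_value(9.09, 7.71))   # True, so reject H0
print(reject_by_p_value(0.0394, 0.05))        # True, so reject H0
```

For the same α and the same degrees of freedom, the two methods lead to the same decision.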

Triple A Construction Example

To illustrate the process of testing the hypothesis about a significant relationship, consider the Triple A Construction example. Appendix D will be used to provide values for the F distribution.

  1. Step 1.

    H0: β1 = 0 (no linear relationship between X and Y)
    H1: β1 ≠ 0 (linear relationship exists between X and Y)
  2. Step 2.

    Select α = 0.05
  3. Step 3. Calculate the value of the test statistic. The MSE was already calculated to be 1.7188. The MSR is then calculated so that F can be found:

    MSR = SSR / k = 15.6250 / 1 = 15.6250
    F = MSR / MSE = 15.6250 / 1.7188 = 9.09
  4. Step 4.

    1. Reject the null hypothesis if the test statistic is greater than the F value from the table in Appendix D:

      df1 = k = 1
      df2 = n - k - 1 = 6 - 1 - 1 = 4

    The value of F associated with a 5% level of significance and with degrees of freedom 1 and 4 is found in Appendix D. Figure 4.5 illustrates this:

    F0.05, 1, 4 = 7.71
    Fcalculated = 9.09
    Reject H0 because 9.09 > 7.71

    Thus, there is sufficient evidence to conclude that there is a statistically significant relationship between X and Y, so the model is helpful. The strength of this relationship is measured by r² = 0.69. Thus, we can conclude that about 69% of the variability in sales (Y) is explained by the regression model based on local payroll (X).

Figure 4.5 F Distribution for Triple A Construction Test for Significance

The Analysis of Variance (ANOVA) Table

When software such as Excel or QM for Windows is used to develop regression models, the output provides the observed significance level, or p-value, for the calculated F value. This is then compared to the level of significance (α) to make the decision.

Table 4.4 provides summary information about the ANOVA table. This shows how the numbers in the last three columns of the table are computed. The last column of this table, labeled Significance F, is the p-value, or observed significance level, which can be used in the hypothesis test about the regression model.

Table 4.4 Analysis of Variance Table for Regression

             DF           SS     MS                        F            SIGNIFICANCE F
Regression   k            SSR    MSR = SSR / k             MSR / MSE    P(F > MSR / MSE)
Residual     n − k − 1    SSE    MSE = SSE / (n − k − 1)
Total        n − 1        SST
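To make the layout of Table 4.4 concrete, the following Python sketch fills in the DF, SS, MS, and F columns from the sums of squares. The function name `anova_table` and the dictionary layout are hypothetical. The Significance F column is left out because it requires the F distribution. In the usage lines, SSR = 15.6250 comes from the Triple A Construction example, while SSE = 6.875 is an assumption backed out from the reported MSE (1.7188 × 4 ≈ 6.875).

```python
def anova_table(ssr, sse, n, k):
    """Build the DF, SS, MS, and F entries of a regression ANOVA table.

    Hypothetical helper for illustration; the Total row uses SST = SSR + SSE.
    """
    df_regression = k
    df_residual = n - k - 1
    msr = ssr / df_regression
    mse = sse / df_residual
    return {
        "Regression": {"DF": df_regression, "SS": ssr, "MS": msr, "F": msr / mse},
        "Residual": {"DF": df_residual, "SS": sse, "MS": mse},
        "Total": {"DF": n - 1, "SS": ssr + sse},
    }


# SSE = 6.875 is an assumed value derived from MSE x df2, not quoted in the text.
table = anova_table(ssr=15.6250, sse=6.875, n=6, k=1)
for source, row in table.items():
    cells = ", ".join(f"{name} = {value:.4f}" if isinstance(value, float)
                      else f"{name} = {value}" for name, value in row.items())
    print(f"{source}: {cells}")
```

With these inputs the F entry is about 9.0909, matching the value reported in the Excel output discussed next.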

Triple A Construction ANOVA Example

The Excel output that includes the ANOVA table for the Triple A Construction data is shown in the next section. The observed significance level for F=9.0909 is given as 0.0394. This means

P(F>9.0909)=0.0394

Because this probability is less than 0.05 (α), we would reject the hypothesis of no linear relationship and conclude that there is a linear relationship between X and Y. Note in Figure 4.5 that the area under the curve to the right of 9.09 is clearly less than 0.05, which is the area to the right of the F value associated with a 0.05 level of significance.
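Software normally reports this p-value directly, but the calculation can be sketched with the Python standard library alone. The sketch below is not how Excel or QM for Windows computes it. It relies on the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1·f), where I_x is the regularized incomplete beta function, evaluated here with a continued fraction. All function names are hypothetical.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=200, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Return I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """Return P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


p_value = f_upper_tail(9.0909, df1=1, df2=4)
print(f"p-value = {p_value:.4f}")               # prints: p-value = 0.0394
print("Reject H0" if p_value < 0.05 else "Do not reject H0")
```

Evaluating the same function at the table value 7.71 gives a tail area of about 0.05, which is the comparison illustrated in Figure 4.5.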
