4.3 Measuring the Fit of the Regression Model

A regression equation can be developed for any variables X and Y, even random numbers. We certainly would not have any confidence in the ability of one random number to predict the value of another random number. How do we know that the model is actually helpful in predicting Y based on X? Should we have confidence in this model? Does the model provide better predictions (smaller errors) than simply using the average of the Y values?

In the Triple A Construction example, sales figures (Y) varied from a low of 4.5 to a high of 9.5, and the mean was 7. If each sales value is compared with the mean, we see how far they deviate from the mean, and we could compute a measure of the total variability in sales. Because Y is sometimes higher and sometimes lower than the mean, there may be both positive and negative deviations. Simply summing these values would be misleading because the negatives would cancel out the positives, making it appear that the numbers are closer to the mean than they actually are. To prevent this problem, we will use the sum of squares total (SST) to measure the total variability in Y:

SST = Σ(Y − Ȳ)²
(4-6)

If we did not use X to predict Y, we would simply use the mean of Y as the prediction, and the SST would measure the accuracy of our predictions. However, a regression line may be used to predict the value of Y, and while there are still errors involved, the sum of these squared errors will be less than the total sum of squares just computed. The sum of squares error (SSE) is

SSE = Σe² = Σ(Y − Ŷ)²
(4-7)

Table 4.3 provides the calculations for the Triple A Construction example. The mean (Ȳ = 7) is compared to each value, and we get

SST = 22.5

The prediction (Ŷ) for each observation is computed and compared to the actual value. This results in

SSE = 6.875

The SSE is much lower than the SST. Using the regression line has reduced the variability in the sum of squares by 22.5 − 6.875 = 15.625. This reduction is called the sum of squares regression (SSR) and indicates how much of the total variability in Y is explained by the regression model. Mathematically, this can be calculated as

SSR = Σ(Ŷ − Ȳ)²
(4-8)

Table 4.3 indicates

SSR = 15.625

There is a very important relationship among the sums of squares that we have computed:

(Sum of squares total) = (Sum of squares due to regression) + (Sum of squares error)

SST = SSR + SSE
(4-9)
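This identity can be checked directly from the data. The sketch below is plain Python; the observations, the mean, and the fitted line Ŷ = 2 + 1.25X are all taken from Table 4.3, nothing else is assumed.

```python
# Triple A Construction data (Table 4.3): sales Y, payroll X
Y = [6, 8, 9, 5, 4.5, 9.5]
X = [3, 4, 6, 4, 2, 5]

y_bar = sum(Y) / len(Y)               # mean of Y = 7
Y_hat = [2 + 1.25 * x for x in X]     # fitted values from the regression line

# The three sums of squares from Equations 4-6, 4-7, and 4-8
SST = sum((y - y_bar) ** 2 for y in Y)                 # total variability
SSE = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))    # unexplained (error)
SSR = sum((yh - y_bar) ** 2 for yh in Y_hat)           # explained by regression

print(SST, SSE, SSR)                  # 22.5 6.875 15.625
print(SST == SSR + SSE)               # the identity SST = SSR + SSE holds
```

Running this reproduces the column totals of Table 4.3 and confirms Equation 4-9 for this data set.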

Figure 4.2 displays the data for Triple A Construction. The regression line is shown, as is a line representing the mean of the Y values. The errors used in computing the sums of squares are shown on this graph. Notice how the sample points are closer to the regression line than they are to the mean.

Table 4.3 Sum of Squares for Triple A Construction

| Y   | X | (Y − Ȳ)²          | Ŷ                  | (Y − Ŷ)² | (Ŷ − Ȳ)² |
|-----|---|-------------------|--------------------|----------|----------|
| 6   | 3 | (6 − 7)² = 1      | 2 + 1.25(3) = 5.75 | 0.0625   | 1.5625   |
| 8   | 4 | (8 − 7)² = 1      | 2 + 1.25(4) = 7.00 | 1        | 0        |
| 9   | 6 | (9 − 7)² = 4      | 2 + 1.25(6) = 9.50 | 0.25     | 6.25     |
| 5   | 4 | (5 − 7)² = 4      | 2 + 1.25(4) = 7.00 | 4        | 0        |
| 4.5 | 2 | (4.5 − 7)² = 6.25 | 2 + 1.25(2) = 4.50 | 0        | 6.25     |
| 9.5 | 5 | (9.5 − 7)² = 6.25 | 2 + 1.25(5) = 8.25 | 1.5625   | 1.5625   |

Ȳ = 7

Σ(Y − Ȳ)² = 22.5, so SST = 22.5

Σ(Y − Ŷ)² = 6.875, so SSE = 6.875

Σ(Ŷ − Ȳ)² = 15.625, so SSR = 15.625


Figure 4.2 Deviations from the Regression Line and from the Mean

Coefficient of Determination

The SSR is sometimes called the explained variability in Y, while the SSE is the unexplained variability in Y. The proportion of the variability in Y that is explained by the regression equation is called the coefficient of determination and is denoted by r². Thus,

r² = SSR/SST = 1 − SSE/SST
(4-10)

Either the SSR or the SSE can be used to find r². For Triple A Construction, we have

r² = 15.625/22.5 = 0.6944

This means that about 69% of the variability in sales (Y) is explained by the regression equation based on payroll (X).
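As a quick numerical check, both forms of Equation 4-10 give the same answer. The snippet below is plain Python using only the sums of squares from Table 4.3:

```python
# Sums of squares for Triple A Construction (Table 4.3)
SST = 22.5     # total variability in Y
SSE = 6.875    # unexplained (error) variability
SSR = 15.625   # variability explained by the regression

r_squared = SSR / SST          # explained proportion of variability
print(round(r_squared, 4))     # 0.6944

# Equivalent form using the error sum of squares
print(round(1 - SSE / SST, 4)) # 0.6944
```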

If every point in the sample were on the regression line (meaning all errors are 0), then 100% of the variability in Y could be explained by the regression equation, so r² = 1 and SSE = 0. The lowest possible value of r² is 0, indicating that X explains 0% of the variability in Y. Thus, r² can range from a low of 0 to a high of 1. In developing regression equations, a good model will have an r² value close to 1.

Correlation Coefficient

Another measure related to the coefficient of determination is the coefficient of correlation. This measure also expresses the degree or strength of the linear relationship. It is usually expressed as r and can be any number between and including −1 and +1. Figure 4.3 illustrates possible scatter diagrams for different values of r. The value of r is the square root of r². It is negative if the slope is negative, and it is positive if the slope is positive. Thus,

r = ±√r²
(4-11)

For the Triple A Construction example with r² = 0.6944,

r = √0.6944 = 0.8333

We know it is positive because the slope is 1.25.
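A small sketch, again in plain Python, attaches the slope's sign to the square root; the slope b₁ = 1.25 is the coefficient from the Triple A regression line, and r² is the value computed above.

```python
import math

r_squared = 0.6944   # coefficient of determination from Equation 4-10
b1 = 1.25            # slope of the Triple A regression line

# r takes the magnitude sqrt(r^2) and the sign of the slope (Equation 4-11)
r = math.copysign(math.sqrt(r_squared), b1)
print(round(r, 4))   # 0.8333
```

Because b1 is positive, copysign leaves the square root positive; a negative slope would flip the sign of r without changing its magnitude.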


Figure 4.3 Four Values of the Correlation Coefficient
