Chapter 2

The Method of Least Squares

The Logic Behind Regression Procedure

The method of least squares, or regression analysis for a model with one explanatory variable, is sometimes called ordinary least squares (OLS). In this method a line, similar to equation (1.1) in Chapter 1, is claimed to represent real-life data such as the income and consumption data for the US for the years 1990–2010, which are plotted in Figure 2.1. Each dot on the graph represents a pair of income–consumption data for one year. The data are not sorted, because sorting would make no difference in the analysis. A line graph in which the data points for consecutive years are connected would usually be meaningless. Therefore, the customary way of presenting the data is a scatter plot, as in Figure 2.1, where income is on the horizontal axis and consumption is on the vertical axis.


Figure 2.1. Scatter plot of income–consumption for the US (years 1990–2010).
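A plot like Figure 2.1 can be reproduced with any plotting tool; the following is a minimal sketch in Python with matplotlib, using made-up income–consumption pairs since the BEA figures are not reproduced here.

import matplotlib.pyplot as plt

# Hypothetical income-consumption pairs, one per year; not the actual BEA data.
income      = [25000, 26500, 27800, 29300, 30900, 32400]
consumption = [23000, 24200, 25300, 26900, 28100, 29600]

plt.scatter(income, consumption)        # one dot per year, unconnected
plt.xlabel("Income")                    # income on the horizontal axis
plt.ylabel("Consumption")               # consumption on the vertical axis
plt.title("Scatter plot of income-consumption")
plt.show()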

Because pairs of income–consumption data for years 1990–2010 do not line up perfectly, some compromise must be made if one wishes to use a single straight line. The compromise must be logical. In fact, you have been using such a logical line for a long time, as will become clear shortly.

It is not unusual to report typical consumption. One way of defining typical is to use average consumption. Average consumption, which is simply the sum of the consumption values divided by the number of observations, is not a function of income. That means that regardless of the level of income, average consumption does not change. The graph of average consumption on an income–consumption coordinate system is a flat line parallel to the income axis, at the level where consumption equals the mean value of consumption. Because average consumption does not depend on income, the expected values for all the income–consumption pairs line up at the level of the average on the consumption axis, and the line connecting them is horizontal and parallel to the income axis. The flat dashed line in Figure 2.1 represents average consumption, which is $24,137. This line represents a model that provides an unbiased estimate of average consumption. The line is the best estimate when no other contributing factor, such as income, is considered. Model (1.8) from Chapter 1 is reproduced below as equation (2.1).

Consumption = β0 + β1 Income + ε (2.1)

If the marginal propensity to consume, MPC = β1, equals 0, then the consumption function becomes a constant, and the graph of a constant on a scatter plot is a flat line. This line, by virtue of representing the consumption data, will be in the middle of the scatter plot of consumption–income data points and can serve as a model representing the data.

When income is completely ignored, that is, when MPC = β1 = 0, the parameter of interest is average consumption. The regression estimate of such a model is presented in equation (2.2).

Ĉ = β̂0 (2.2)

It can be proven that β̂0 in equation (2.2) is actually the estimate of average consumption, the graph of which is the dashed horizontal line in Figure 2.1. As is evident, obtaining an estimate for a model that represents income–consumption in this way is as easy as finding the average of the consumption data. Therefore, a possible regression line for model (2.1) is the average of the dependent variable. A question remains unanswered: how good is this flat line in explaining real data? It turns out that the average of the dependent variable is not a “good” regression line, because many other alternative lines, and specifically the regression line obtained by the method of least squares, provide “better” results. The concept better refers to the smallness of the average squared error.
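The same point can be checked numerically. Below is a minimal sketch in Python with NumPy (the book's own computations use Excel), using made-up consumption figures: fitting a constant by least squares returns exactly the sample mean.

import numpy as np

# Made-up consumption figures; the point is the estimator, not the actual data.
C = np.array([21500.0, 22800.0, 24100.0, 25600.0, 26700.0])

# "Regressing" C on a constant alone: the design matrix is a single column of ones,
# so least squares picks the one number b0 that minimizes sum((C - b0)**2).
ones = np.ones((len(C), 1))
b0, *_ = np.linalg.lstsq(ones, C, rcond=None)

print(b0[0])     # the least-squares constant
print(C.mean())  # the sample mean of consumption -- the same value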

There are several error concepts, which are closely related. We will define some of these concepts first. We then use an appropriate error concept to demonstrate that the regression line obtained from the method of least squares is better than the one obtained from the average of the dependent variable. The first error concept is the deviation of observations from the mean, which is called individual error.

Definition 2.1

An individual error is the difference between an observed value and its expected value.

When only consumption is considered and the contribution of income in explaining variations in consumption is ignored, the expected value is represented by the mean of consumption. The differences of the observed values from the population mean result in 21 individual errors, depicted in Table 2.1, where “I ” is income, “C ” is consumption, “μC” is average consumption, (C – μC) is the difference between consumption and its average, Ĉ, pronounced C hat, is the regression estimate of consumption, and (C – Ĉ) is the deviation of consumption from its regression line. Comparison of the sums of (C – μC) and (C – Ĉ), the deviations of consumption from its average and from its regression on income, is useless, because both columns sum to zero, as is evident at the bottom of the columns labeled (C – μC) and (C – Ĉ), respectively. Note that both the mean of consumption (μC) and the regression estimates Ĉ are expected values of consumption based on two different procedures.

Table 2.1. Deviations from Average and Regression Line

Sources: Bureau of Economic Analysis, National Income and Product Account Tables: Table 2.3.5—Personal Consumption Expenditures by Major Type of Product. Bureau of Economic Analysis, GDP and Personal Income: SA1-3 Personal Income Summary.

Rule 2.1

The sum, and thus the average, of deviations of values from their expected value is always zero. In other words, the sum of individual errors is always zero.

This is due to the cancellation of negative and positive errors. One way to avoid this outcome is to square the deviations, which are presented in Table 2.1 in the columns with headings (C – μC)² and (C – Ĉ)², respectively.
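A quick numerical illustration of Rule 2.1, sketched in Python with made-up numbers: the deviations from the sample's own mean cancel out, while their squares do not.

import numpy as np

# Any made-up sample behaves the same way.
C = np.array([21500.0, 22800.0, 24100.0, 25600.0, 26700.0])

deviations = C - C.mean()
print(deviations.sum())         # 0, up to floating-point rounding
print((deviations ** 2).sum())  # strictly positive unless all values are identical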

Note the following relationships. The total for consumption is the same as the total for the average of consumption repeated 21 times in the column named mean consumption (μC), and the same as the total of the column of estimated consumption Ĉ. The sum of the estimated consumption values equals the sum of the actual values because, on average, there is no error in regression estimates. Note that the sum of squared deviations of consumption from the mean of consumption, Σ(C – μC)² = 760,086,436, is much larger than the sum of squared deviations of consumption from the regression estimates, Σ(C – Ĉ)² = 2,205,035. Comparing the sums of squared (SS) values in the last two columns of Table 2.1 is nevertheless unreasonable because the sums are based on different numbers of degrees of freedom. The totals for the last two columns of Table 2.1 are part of the regression output. Regress consumption on income and compare your results with those in Table 2.2. Instructions for performing regression analysis in Excel are provided in Chapter 3.

Table 2.2. Output for Regression of Consumption on Income for the Years 1990–2010 in the US

Abbreviation: ANOVA, analysis of variance.

Explanation of Output

Detailed explanation of the output is provided in Chapter 5. Here we only focus on two values, the sum of squares of the residual and the sum of squares total. It is all right to refer to these as the residual sum of squares and the total sum of squares, respectively. They are in rows two and three of the column “SS” in the ANOVA section of Table 2.2. Note that these two values are exactly the same as the sum of squared deviations of consumption from the regression line, Σ(C – Ĉ)² = 2,205,035, and the sum of squared deviations of consumption from its average, Σ(C – μC)² = 760,086,436, in Table 2.1, respectively. It is noteworthy that the regression SS is based on one degree of freedom, the residual SS is based on 19 degrees of freedom, and the total sum of squares is based on 20 degrees of freedom. This makes a direct comparison between these values meaningless. To make them comparable, the SS values are divided by their corresponding degrees of freedom to obtain mean squared (MS) values. As you see, there is no MS value reported for the total.

The sum of squares total represents the total variation in the dependent variable. In your earlier statistics course,1 you dealt with this value as the numerator of the sample variance. The sample variance consists of the squared values of the individual errors divided by the degrees of freedom. It shows the amount of variation in the dependent variable that cannot be explained by the mean of the dependent variable. To verify this outcome, calculate the variance of consumption for the data in Table 2.1. The command in Excel is

VAR.S(C2:C22) = 38,004,321.78

Dividing the total SS by its degrees of freedom gives 760,086,436/20 = 38,004,321.78.

In older versions of Excel the command was

VAR(C2:C22) = 38,004,321.78

Both numbers are rounded to two decimal places. Part of the total sum of squares, the previously unexplainable variation in the dependent variable of 760,086,436, can now be explained by the regression model. This amount, represented by the regression SS of 757,881,401, is displayed under the SS column in the row for “Regression.” This value is called the sum of squares regression or regression sum of squares. As stated earlier, although the difference between the two sums appears substantial, the comparison is misleading because different numbers of degrees of freedom underlie the sums. The sum for regression has one degree of freedom, while the sum for the total has 20 degrees of freedom. Other things equal, the sum of more numbers is greater than the sum of fewer numbers. To determine whether the portion explained by regression is statistically significant, the regression sum of squares and the residual sum of squares are divided by their corresponding degrees of freedom. The customary comparison is between the portion explained and the portion that is unexplained, as will be discussed in detail in Chapter 6.

The residual sum of squares is the amount of variation in the dependent variable that is unexplained by either the mean of the dependent variable or the regression line. Here too, the appropriate value is the average value instead of the sum. To obtain the average values of the regression sum of squares and the residual sum of squares, divide them by their respective degrees of freedom, which are 1 and 19 for this example. The results are called the regression mean square (757,881,401) and the residual mean square (116,054), respectively. These terms can also be called mean square regression and mean square residual, respectively. It is not unusual to refer to the residual as error in the above terms.

Verify that the MS for regression and for the residual in Table 2.2 is obtained by dividing the respective SS by the corresponding degrees of freedom. We will demonstrate this in an example in Chapter 3. The ratio of the mean square regression to the mean square residual provides the value of the F statistic: 757,881,401/116,054 ≈ 6,530, which is the value provided in the regression output in Table 2.2.
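The same arithmetic can be sketched in Python. The income–consumption pairs below are made up and only stand in for Table 2.1, so the numbers will differ from Table 2.2, but the SS, degrees-of-freedom, MS, and F calculations mirror the ANOVA table.

import numpy as np

# Made-up pairs; the purpose is the arithmetic of the ANOVA table, not the US figures.
income      = np.array([25000., 26500., 27800., 29300., 30900., 32400., 33800., 35100.])
consumption = np.array([23000., 24300., 25200., 27000., 28200., 29700., 30800., 32200.])

n, k = len(consumption), 2                      # observations; parameters b0 and b1
b1, b0 = np.polyfit(income, consumption, 1)     # slope and intercept by least squares
fitted = b0 + b1 * income

ss_total    = ((consumption - consumption.mean()) ** 2).sum()  # df = n - 1
ss_residual = ((consumption - fitted) ** 2).sum()              # df = n - k
ss_regress  = ss_total - ss_residual                           # df = k - 1

ms_regress  = ss_regress / (k - 1)
ms_residual = ss_residual / (n - k)
f_statistic = ms_regress / ms_residual          # the ratio reported in the ANOVA table

print(ms_regress, ms_residual, f_statistic)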

The above exercise demonstrates that regression analysis can reduce the amount of unexplained variation in the dependent variable. The F statistic is the measure that verifies whether the explained portion of the dependent variable exceeds the unexplained portion. The F statistic is discussed in more detail in Chapter 4 and more formally in Chapter 5.1 The above practice demonstrates that the regression line is better than the average of the dependent variable in explaining the variation of the values of the dependent variable. However, there are numerous other statistics that could be used to explain this variation. We need to establish which one produces the smallest average error. The idea of finding the smallest possible average error is a novel one. Unfortunately, it is useless if deviations from the mean are used because, as we showed earlier, the sums of deviations of observations from their mean and of deviations from the regression line are both zero. There is no smaller average error than zero error. The problem is not in the novelty of the method; it is in the fact that errors cancel each other out. One remedy is to square the errors to avoid such cancellation, which was used in the above demonstrations. The sum of squared errors (SSE) is never zero unless all the errors are identical and equal to zero, which can happen only if all the observations have the exact same values, another boring and unrealistic case. There are mathematical methods to find estimates of the parameters of the model, namely β0 and β1, such that the estimated equation has the least sum of squared errors, which is the origin of the name ordinary least squares (OLS) as an alternative to the name regression.

Minimizing the Squared (Individual) Errors

This section explains the concept and the method of minimizing the squared values of individual errors. As explained above, individual errors are not identical, reflecting the random nature of real-life data. There are as many individual errors in any given dataset as there are pairs, or rows, of data. Because in statistics we prefer to capture the nature and the essence of data in as few parameters as possible, it is better to use the average of the individual errors. As the sum of all individual errors is always zero, their average is also zero. Squaring each individual error resolves this problem: the SSE is always non-negative, and the trivial case of a zero sum of squared errors is excluded. Consequently, the greater the number of observations, the greater the SSE, other things being equal. This is the reason for dividing the sum of squared residuals by its degrees of freedom, which yields the average squared error. A more important factor affecting the size of the SSE is the choice of the estimator. As we saw earlier in this chapter, you have been using the mean of the dependent variable, which is consumption in the above example, as a possible regression line. We also established that having information on variable(s) that influence the dependent variable, in this example the income, reduces the unexplained portion of the dependent variable, enabling the researcher to provide a better estimate of the dependent variable. The term better means smaller variance, or less error. Better estimates result in better forecasting.

When only consumption data are considered, the sample mean of consumption provides the best regression line; nevertheless, it ignores the contribution of factors that economic theory offers, such as income. Note that in this context the line is a horizontal line with zero slope. Taking advantage of the extra information provided by income and the use of regression analysis improves the estimate. The regression line explains part of the previously unexplainable variation in the dependent variable. Because part of the previous error is explained by regression, it provides a better estimate than the average of the dependent variable. Once again, the adjective better is used to indicate smaller variance.

The regression line intersects the line representing the average of consumption at exactly the average points of both income and consumption. The intersection of the two lines is the center of the data. The center of the data is actually its center of gravity: if each observation, that is, each pair of income–consumption points, is given the same weight and placed on a plane, the center of the data is the point where the plane balances on a pin.
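This property can be checked numerically. In the sketch below (made-up pairs again), the fitted line evaluated at average income returns average consumption, so the line passes through the center of the data.

import numpy as np

# Made-up income-consumption pairs used only to illustrate the center-of-data property.
income      = np.array([25000., 26500., 27800., 29300., 30900., 32400.])
consumption = np.array([23000., 24300., 25200., 27000., 28200., 29700.])

b1, b0 = np.polyfit(income, consumption, 1)
print(b0 + b1 * income.mean())   # height of the regression line at average income
print(consumption.mean())        # average consumption -- the same number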

The simplest way to obtain estimates of the parameters of the model, namely β0 and β1, is to take the partial derivatives of the SSE with respect to each parameter, set them equal to zero, and then solve the two resulting equations for the values of β0 and β1 that minimize the SSE. This requires knowledge of calculus. Some of the books written for students majoring in statistics, mathematics, or closely related fields provide the derivation of the estimates of parameters β0 and β1, usually in an appendix.2 We will not concern ourselves with such derivations.
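For reference, and without reproducing the calculus, solving those two equations yields the standard least-squares formulas for the estimates (written with the “hat” notation introduced below):

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X},
\]

where X is the independent variable (income), Y is the dependent variable (consumption), and the bars denote sample means.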

The method of least squares provides estimates for parameters β0 and β1 that minimize the sum of squared individual errors. Note, however, that in the definition below

Individual error = observed – expected

The “expected” is no longer the mean or the average of the data; it is the regression estimate of the dependent variable.

The estimated values of the dependent variable and the parameters are represented by a “hat.”

Ŷ = β̂0 + β̂1X (2.3)

Thus, now

Individual error = Y – Ŷ (2.4)

For the consumption example, these values are shown under the column (C – Ĉ) in Table 2.1. Individual errors are represented by the vertical distance between the observed value and the regression line in Figure 2.2; only some of the lines are shown to avoid clutter. Note that in some cases the distance between an observation and the regression line exceeds the distance between the same observation and the line representing the mean of the dependent variable, which is consumption in our example. Nevertheless, the mean of the SSE obtained from the regression line is the smallest among all averages of squared errors, including the one based on the average of the dependent variable, which is also depicted in Figure 2.2.


Figure 2.2. Deviation of observations from mean and regression line.

Error

It is of vital importance not to confuse any of the error concepts discussed here with errors in measurement. Errors in measurement refer to incorrectly measuring or recording the values of the dependent or independent variables. In some social science disciplines, the absence of measurement error is called validity.

The notion of error plays an important role in statistics in general, and particularly in regression analysis. We have seen the individual error, the error term in the regression model, the SSE, and the mean squared error (MSE). As one might expect, there are close relationships among these concepts, some of which have been explained already. This section provides an opportunity to solidify these concepts and provide further insight into the error concept.

Definition 2.2

Error in statistics is what we are unable to explain. It is the difference between an observed value and its expected value.

The observed values are collected data about a phenomenon, such as consumption. The expected value, however, depends on the theory we use and the data that are available. For example, in the case of a single variable, the expected value is its mean. Therefore the error is

Error = Y – μ,

where μ, pronounced mu, is the population mean. When the population mean is not available, its estimate, the sample mean Ȳ, is used.

Error = Y – Ȳ (2.5)

An observant student will notice that the above two formulas were called individual error earlier to focus on the fact that for each row of data there is one such error.

When a regression model is proposed, the claim is that the regression line explains part of what was previously unexplainable. Individual errors can be represented by the vertical distance between an observed point and the flat line that represents the average. The claim of the regression model is that this vertical distance can be partitioned into a segment explainable by the regression line and a smaller segment, on average, that still cannot be explained, as depicted in Figure 2.2. Representing the estimated regression equation by Ŷ, the individual error is the same as in equation (2.4):

Individual error = Y – Ŷ

These individual errors are shown in Figure 2.2 as distances between the observations and the regression line instead of the line for average consumption. The previous individual error can no longer be called error because at least part of it is explained by the estimated regression equation. The partition can be demonstrated very easily by the following identity:

Y – Ȳ = Y – Ȳ (2.6)

Subtract and add the amount Ŷ on the right-hand side of the equation. Because the same amount is added and subtracted, the identity remains valid.

Y – Ȳ = Y – Ŷ + Ŷ – Ȳ (2.7)

Rearrange the right-hand side to obtain

Y – Ȳ = (Ŷ – Ȳ) + (Y – Ŷ)

The left-hand side is the deviation of the observed values of the dependent variable from their mean. The first term on the right-hand side represents the amount that is explained by regression, and the second term is what is left unexplained. It can be proven2 that squaring these individual errors and summing them up results in the following:

SST = SSR + SSE,

where SST is the sum of squares total, SSR is the sum of squares regression, and SSE is the sum of squared errors, or sum of squares residual. These are the three components in the column designated SS in the Excel output, which were discussed and analyzed above in theoretical form and will also be discussed in Chapter 3 using examples.
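The identity can also be verified numerically. The sketch below uses made-up income–consumption pairs, but the conclusion SST = SSR + SSE holds for any dataset fitted by least squares with an intercept.

import numpy as np

# Numerical check of the partition: each deviation from the mean splits into an
# explained piece and an unexplained piece, and the sums of squares obey SST = SSR + SSE.
income      = np.array([25000., 26500., 27800., 29300., 30900., 32400.])
consumption = np.array([23000., 24300., 25200., 27000., 28200., 29700.])

b1, b0 = np.polyfit(income, consumption, 1)
fitted = b0 + b1 * income

explained   = fitted - consumption.mean()   # Y-hat minus Y-bar
unexplained = consumption - fitted          # Y minus Y-hat

sst = ((consumption - consumption.mean()) ** 2).sum()
ssr = (explained ** 2).sum()
sse = (unexplained ** 2).sum()

print(np.isclose(sst, ssr + sse))           # True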

Dividing SSE by the corresponding degrees of freedom, which is n – k, that is, the number of (pairs of) observations minus the number of estimated parameters (b0 and b1, or 2 for the simplest case), results in the MSE, which is the same as a variance. Therefore, the variance of a regression model is its MSE. Hence, the square root of the MSE is the average error of the regression.
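As a worked instance with the figures reported in Table 2.2 (the residual SS of 2,205,035 is implied by the reported total and regression sums of squares), n = 21 and k = 2, so

\[
\text{MSE} = \frac{\text{SSE}}{n-k} = \frac{2{,}205{,}035}{21-2} \approx 116{,}054,
\qquad
\sqrt{\text{MSE}} \approx 341,
\]

meaning the regression misses actual consumption by roughly 341, in the units of consumption, on average.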
