Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Appendix A

Multiple Regression and Matrices

In order to understand how the computer carries out regression analysis—which automatically leads to a better understanding of regression—it is necessary to understand something about matrix math. This section provides a brief introduction to matrix math.

Matrices are used to describe linear equations by tracking the coefficients of linear transformations and to record data that depend on multiple parameters.

An Introduction to Matrix Math

A matrix is a set of numbers in a two-dimensional space. Matrices are almost always designated with capital letters in boldface type. In general, a matrix X has n rows and m columns, as shown in Equation A.1, where the first subscript denotes the row and the second denotes the column.

A.1

A matrix is said to have dimensions equal to n and m so that the matrix X has dimensions n × m (rows first, columns second). The following is a typical example of a 2 × 3 matrix.

A vector is a matrix with dimensions n × 1 or 1 × m, as shown in Equation A.2.

A.2

The following are typical examples of a 3 × 1 vector and a 1 × 3 vector.

A scalar can be considered a matrix of dimensions 1 × 1. The individual entries of any matrix or vector are scalars, but scalars can stand alone. A scalar is simply a number.

The transpose of a matrix X , designated $b01-math-0005$ , is a matrix in which the first row of X becomes the first column of $b01-math-0006$ , the second row of X the second column of $b01-math-0007$ , and so on, so that if

or if

Addition and Subtraction of Matrices

Addition and subtraction may be performed on matrices if each has equal rows and equal columns. The two matrices X and Y may be added, as shown in Equation A.3.

A.3

The matrix Y can also be subtracted from X, as shown in Equation A.4.

A.4

Multiplication of Matrices

Multiplication can be performed on matrices if the number of columns in the first matrix is equal to the number of rows in the second. Such matrices will be said to conform for multiplication. The two matrices $b01-math-0012$ and $b01-math-0013$ can be multiplied, and the resulting matrix $b01-math-0014$ will be of dimension 2 × 2, as shown in Equation A.5. Although $b01-math-0015$ when a and b are both scalars, in general, $b01-math-0016$ does not equal $b01-math-0017$ and will not have the same dimensions. When the operation $b01-math-0018$ is carried out, Y may be said to be premultiplied by $b01-math-0019$ . When $b01-math-0020$ is carried out, Y is postmultiplied by $b01-math-0021$ . A particularly useful matrix product is $b01-math-0022$ , which provides the sum of squares and cross products for $b01-math-0023$ .

A.5

Matrix Multiplication and Scalars

Any matrices or vectors that are to be multiplied must conform for multiplication, with one exception: Any matrix or vector of any dimension can be pre- or postmultiplied by a scalar, as shown in Equation A.6.

A.6

Finding the Determinant of a Matrix

Division is, in general, undefined for matrices. With matrices in which rows equal columns and the determinant of the matrix does not equal 0, multiplication of a matrix by its inverse produces the same result as division. The determinant of a matrix is a scalar defined only for matrices in which rows equal columns. For the 2 × 2 matrix W , the determinant of W may be written as shown in Equation A.7.

A.7

If any column of a matrix is a linear transformation of one or more other columns, the determinant will be 0. If, for example, the matrix V is as shown in Equation A.8, the determinant of V will be 0 and the inverse of $b01-math-0027$ will not exist.

A.8

In scalar arithmetic, $b01-math-0029$ . If the inverse of a matrix W exists (see Equations A.7 and A.8), then $b01-math-0030$ , where I is a matrix with 1 in each entry on the main diagonal and 0 everywhere else. For example, suppose W is as defined in Equation A.9 (the matrix whose determinant is shown in Equation A.7).

A.9

Then, W ^–1 is as defined in Equation A.10, where $b01-math-0032$ , as defined in Equation A.7.

A.10

Finally, $b01-math-0034$ , as shown in Equation A.11.

A.11

Whereas it is relatively easy to find the inverse of a 2 × 2 matrix, finding the inverse of a 3 × 3 or larger matrix becomes increasingly difficult. One method developed early on in linear algebra is the square root method—described, for example, in Harmon (1960). Today it is easier to rely on computer programs that have the capability to produce the matrix inverse in order to obtain it. Fortunately, Excel is one such computer program. In the next section, we look at this capability of Excel's and how it works to solve the simultaneous equation problem of multiple regression.

Matrix Capabilities of Excel

Excel provides several functions that are specifically matrix functions and that are particularly useful in regard to the solution of the multiple regression problem. (Interestingly, Excel refers to matrices as arrays, and that term will be used to refer to a matrix during much of this discussion. But it is often confusing to try to stay with one or the other, so if you see “matrix” and “array” used interchangeably here, just remember that they are basically the same things.) Excel provides four functions that are specifically designed to manipulate arrays (or matrices). These are =MMULT, =MINVERSE, =MDETERM, and =TRANSPOSE.

Excel's Array Functions: `=MMULT` , `=MINVERSE` , `=MDETERM` , and `=TRANSPOSE`

The first of these functions, =MMULT, multiplies any two arrays together as long as the two conform for multiplication. The second, =MINVERSE, finds the inverse of any square array that has a nonzero determinant. If the determinant of the array is 0, Excel will return NUM# in the cells selected. Excel returns the determinant of the array for =MDETERM and the transpose of the array for =TRANSPOSE. To see how these operate and the results obtained, let us look again at the data in Figure 12.1. We will also use the intercept variable (constant), as shown in Figure 12.4, and this is reproduced in Figure A.1. In Figure A.1, the area that is highlighted in black will be referred to with matrix notation as X (matrices other than those that are m × 1 or 1 × m are always designated with bold roman capitals). The cost column (not highlighted) will be designated with matrix notation as y (matrices that are m × 1 or 1 × m are called vectors and are always referred to with bold roman lowercase letters).

Image described by surrounding text. — **Figure A.1** Hospital data

Manipulating a Matrix with Excel

Figure A.2 shows the data given in Figure A.2, but with several modifications. Now, the independent variable array is designated as simply y , the independent variable array is designated as X , and a new array, designated as X ′, is shown in cells G2:P4. In the formula line in Figure A.2 is the expression {=TRANSPOSE(C2:E11)}. This is the transpose function, and the resulting array X 9 is the transpose of X . It should be noted that the braces surrounding the =TRANSPOSE function mean that it is one of those functions that enter data into more than one cell and must be invoked by holding down Ctrl+Shift and then pressing Enter.

Using the `MMULT` Function in Excel

The =MMULT array function can be used to produce two new arrays, X ′X and X ′y, which are shown in Figure A.3. The array X ′X is highlighted, and the formula for that array is shown in the formula bar. It can be seen there that the array is produced with {=MMULT(G2:P4,C2:E11)}. The braces again indicate that the function was completed with Ctrl+Shift+Enter. The array X ′y is produced with the array function {=MMULT(G2:P4,A2:A11)}.

There is something else to note about Figure A.3 and its comparison with the data in cells G3:G11 of Figure 12.4. The array X ′X includes all the values in cells G3 to G8, and the array X ′y includes the values in cells G9 to G11. The calculation of the X ′X and the X ′y arrays using the matrix capabilities of Excel produces all the values needed for the calculation of the coefficients b_j. It is important to note that the results shown in X ′X (3 × 3) would not be the same as the results for X ′X , which would be 10 × 10. Moreover, yX ′ does not even exist, because the two arrays in that format, (10 × 1) and (3 × 10), cannot be multiplied.

Solving the Regression Equations in Excel

Using the data in X ′X and X ′y from Figure A.3, it is possible to rewrite the second three expressions in Equation (12.7), as shown in Equation A.12. We have already seen that these three equations can be solved for the coefficients b_j with successive elimination. But with our understanding of matrix math, it can also be seen that Equation A.12 can be rewritten in matrix format, as shown in Equation A.14.

A.12

A.13

A.14

In matrix notation, the formulation in Equation A.14 is $b01-math-0039$ . Now, how can matrix math be used to solve this equation for b? If we were dealing strictly with scalar arithmetic and we had an equation such as $b01-math-0040$ , and x and y were known, we could solve for b by $b01-math-0041$ . But this formulation is the same as $b01-math-0042$ . The expression $b01-math-0043$ is the inverse of x and can also be written as $b01-math-0044$ . Similarly, in matrix terms, it turns out that if we can find the inverse of $b01-math-0045$ , we can solve the equation $b01-math-0046$ with the equation X ′X ⁻¹ X ′X b = X ′X ⁻¹ X y, which turns out to be X ′X ⁻¹ X y = b. Happily, Excel provides the ability to find X ′X ⁻¹. To see how Excel provides the inverse, we can look at Figure A.4. In Figure A.4, the inverse of $b01-math-0047$ is shown in the highlighted area, cells M7:O9. The {=MINVERSE()} function used to produce the inverse array is shown on the formula bar. Note again that it is enclosed in braces to denote that it was completed with Ctrl+Shift+Enter. Also, it is important to remember that the entire area consisting of cells M7:O9 was highlighted before the {=MINVERSE()} function was typed.

Explanation of Excel Output Displayed with Scientific Notation

A further point that should be made about the inverse array is that the numbers, particularly in cells M7, N7, and M8, may appear a little strange. The way these numbers are shown is known as scientific notation. In more familiar terms, the number 1.98E-05 is actually the number 0.0000198. The E-05 means that there are five decimal places before the decimal place in 1.98E-05. It should be recognized that any number that is followed by E with a minus sign and a number is a very small number. Excel uses the E convention to put numbers into cells that otherwise would be two narrow to display the string of decimal places before the actual values begin.

The E convention is also used for very large numbers. For example, a number that would be shown in Excel as 1.98E+11 would actually be 198,000,000,000. This is a relatively economical way for Excel to display very large numbers.

Now that we have found the inverse of $b01-math-0048$ , it is possible to solve for the vector b. This solution is shown in Figure A.5. The coefficients $b01-math-0049$ (found by the =MMULT function, as shown in the formula line of Figure A.5) are exactly the same (except for order and decimal places) as the b_j values found by using the built-in Excel package under Data Analysis &cmdarr; Regression. Refer to Figure 12.3 for the Data Analysis results.

The highlighted area, cells G12:G14, depicts results of finding b coefficients by using the =MMULT function. The formula used is =MMULT(M7:O9,K7:K9). — **Figure A.5** The b coefficients

Using the b Coefficients to Generate Regression Results

In general, the calculation of all the important results relative to multiple regression can all be developed, once the b coefficients are available. These include the predicted value of $b01-math-0050$ , the total sum of squares $b01-math-0051$ , regression sum of squares $b01-math-0052$ and error sum of squares $b01-math-0053$ , the value of $b01-math-0054$ and the correlation coefficient (R), the standard error of estimate $b01-math-0055$ , and the F tests. These follow the same development as that described for simple linear regression in Chapter 11. There is one place in which the remaining calculations diverge from that shown for simple linear regression, in the calculation of the standard errors for the coefficients b_j. The calculation of the standard errors of the coefficients for multiple regression relies on values that appear only in the matrix X ′X ⁻¹.

Recall that the standard error of estimate, Syx, the standard error of the predicted values of y, is found by dividing the error sum of squares, $b01-math-0056$ , by $b01-math-0057$ and taking the square root of the result. In simple linear regression, the standard error of b_j is found by dividing Syx by the square root of the sum of squares in $b01-math-0058$ . This is essentially the same as multiplying Syx by the square root of 1/SS_X. In multiple regression, the equivalent result is obtained by multiplying Syx by the square root of each of the main diagonal elements of $b01-math-0059$ .

To see the calculation of the standard errors for the coefficients b_j, consider Figure A.6. Here the value of Syx has been copied from Figure 12.3. The formula bar shows that the cell highlighted in white is calculated by multiplying the value in cell $M$11 (Syx) by the square root of the value in cell M7 (here 1.98E-05). Because cell $M$11 is fixed, the formula can then be copied into the other cells in row M and in rows N and O to produce, on the main diagonal, the standard errors for all the values of the coefficients b. The expression #NUM! that appears in the off-diagonal cells indicates, in this case, that we tried to take the square root of a negative number. In any case, it is the main diagonal elements that are of interest. It can be seen that except for ordering, they match the values found for the standard errors of the coefficients b in Figure 12.3. These standard error values can be used directly to perform the t test that determines whether the appropriate coefficients are different from 0.

Calculation of All Multiple Regression Results

Figure A.7 shows the analysis as given in Figure A.5 and Figure A.6. In addition, it shows all the other important results of multiple regression, as shown originally in Figure 12.3. The section of column A labeled $b01-math-0060$ was calculated using the matrix function =MMULT(C2:E11,G13:G15), which is essentially the formula for the prediction of the y values cast in matrix form. The values in column C labeled y − $b01-math-0061$ represent the difference between the actual values of y and the mean of y. The value in cell C14 was calculated with the formula =A2–AVERAGE($A$2:$A$11). That formula can be copied all the way to C23 to produce all the y − $b01-math-0062$ values. The value for the total sum of squares, SST, in cell H18, is calculated using the Excel function =SUMSQ(C14:C23).

The column labeled yh (for $b01-math-0063$ ) − yb (for $b01-math-0064$ ) was calculated by subtracting the predicted values of y from the average values for y. The formula for cell D14 is =A14–AVERAGE($A$2:$A$11). That formula can be copied to all the cells from D14 to D23 to create all the values of $b01-math-0065$ . The value of the sum of squares due to regression, SS_R, in cell H19 is calculated using =SUMSQ(D14:D23). The column labeled $b01-math-0066$ was calculated by subtracting each value of ŷ from y. The formula for cell E14 is =A2–A14. The sum of squares error, SS_E, in cell H20 was calculated using =SUMSQ(E14:E23).

Degrees of freedom in any multiple regression will always be $b01-math-0067$ for SS_T, equal to m (the number of predictor variables) for SS_R, and $b01-math-0068$ for SS_E. In the case of this analysis, these values are shown in cells I18:I20 as 9, 2, and 7, respectively. The mean square values in cells J19 and J20 are the sum of squares values divided by their degrees of freedom. The overall F test for the independence of y from the two variables x is calculated by dividing the mean square due to regression by the mean square error. The probability of the F value, p, is given by the Excel function =FDIST(K19,2,7).

The R² value in cell H22 is calculated by dividing $b01-math-0069$ by $b01-math-0070$ . The value of S.E. (standard error of estimate) in cell H23 is obtained by dividing $b01-math-0071$ by its degrees of freedom and taking the square root of the result. This value was used to multiply by the square root of the values in cells M12, N13, and 014 to produce the standard error of each value of b_j in cells H13:H15. The t values for each of the b_j are calculated by dividing each value of b by its standard error. The probabilities of the t values are given by =TDIST(t,d.f.,2). The degrees of freedom for any value of b_j are equal to the degrees of freedom for SS_E. Because two of the b_j values are negative, the =TDIST function will produce only the result #NUM!. In order to produce the actual probability of the coefficients, it is necessary to include the Excel =ABS function to get the absolute value of t before getting the probability of the t value. So, for example, the probability in cell J13 was obtained with =TDIST(ABS(I13),7,2). Table A.1 displays the formulas for the calculation of the multiple regression output from Figure A.7.

Table A.1 Formulas for Figure A.7

Cell or Column	Formula	Notes
G2:P4	`{=TRANSPOSE(C2:E11)}`	Remember that the braces signify that you must press Crtl+Shift to invoke the function.
G7:I9	`{=MMULT(G2:P4,C2:E11)}`	This generates the X × X matrix.
K7:K9	`{=MMULT(G2:P4,A2:A11)}`	This generates the X × y matrix.
M7:O9	`{=MINVERSE(G7:I9)}`	This generates the X × X⁻¹ matrix.
G13:G15	`{=MMULT(M7:O9,K7:K9)}`	This generates the b coefficients.
M12	`=323.9537*SQRT(M7)`	Copied from cell M12 through to cell M14 and over to cell O14
A14:A23	`{=MMULT(C2:E11,G13:G15)}`	This generates the ŷ values.
C14	`=A2–AVERAGE($A$2:$A$11)`	Copied down to cell C23
D14	`=A14–AVERAGE($A$2:$A$11)`	Copied down to cell D23
E14	`=A2–A14`	Copied down to cell E23
H18	`=SUMSQ(C14:C23)`	This generates the total sum of squares, SST.
H19	`=SUMSQ(D14:D23)`	This generates the sum of square error, SSR.
H20	`=SUMSQ(E14:E23)`	This generates the sum of square error, SSE.
I18	9	Equals n − 1 or 10 − 1 = 9
I19	2	Equals m or 2
I20	7	Equals n − m − 1 or 10 − 2 − 1=7
J19	`=H19/I19`	Copied down to cell J20
K19	`=J19/J20`
L19	`=FDIST(K19,2,7)`
H22	`=H19/H18`
H23	`=SQRT(H20/I20)`
H13	`=M12`
H14	`=N13`
H15	`=O14`
I13	`=G13/H13`	Copied down to cell I15
J13	`=TDIST(ABS(I13),I20,I19)`	Copied down to cell J15

Exercises for Appendix A

Use the data on the Hospitals worksheet of Chpt 12-1.xls, and replicate the analysis shown in Figure A.7.
1. Set up the data from the Hospitals worksheet, as shown in Figure A.7, with the variables X₁ and X₂ in cells C2:C11 and D2:D11, respectively, and supply a third independent variable, the constant, as shown in cells E2:E11.
2. Generate the transpose of the matrix X in cells G2:P4 by using =TRANSPOSE(C2:E11). Remember that the entire area into which the transpose matrix will go must be highlighted prior to invoking the function and that the function must be invoked with Ctrl+Shift+Enter.
3. Calculate the values of X′X in cells G7:I9 by using =MMULT (G2:P4,C2:E11). This function, too, requires highlighting of the entire three-by-three cell area and pressing Ctrl+Shift+Enter.
4. Use =MMULT(G2:P4,A2:A11) to calculate the values of X′y in cells K7:K9.
5. Use =MINVERSE(G7:I9) to obtain the inverse of the matrix X′X in cells M7:O9.
6. Calculate the values of b in cells G13:G15, using =MMULT (M7:O9,K7:K9).
7. Using the values of b and the original matrix X , generate the values of $b01-math-0072$ in cells A14:A23 with =MMULT(C2:D11,G13:G15).
8. Generate the values of $b01-math-0073$ in cells C14:C23, using y-AVERAGE ($A$2:$A$11), and calculate the total sum of squares (SS_T) in cell H18, using =SUMSQ(C14:C23).
9. Generate the values of $b01-math-0074$ in cells D14:D23, using yhat-AVERAGE ($A$2:$A$11), and calculate the regression sum of squares (SS_R) in cell H19, using =SUMSQ(D14:D23).
10. Generate the values of $b01-math-0075$ in cells E14:E23, and calculate the error sum of squares (SS_E) with =SUMSQ(E14:E23).
11. Calculate the value of R² in cell H22 by dividing the regression sum of squares by the total sum of squares.
12. Calculate the value of S.E. in cell H23 by taking the square root of $b01-math-0076$ .
13. Determine the degrees of freedom for $b01-math-0077$ , $b01-math-0078$ , and $b01-math-0079$ , and put these into cells H13:H15. Calculate the values of MS_R and MS_E in cells J19 and J20, and calculate the value of F in cell K19 to determine its probability in cell K19.
14. Calculate the values of the S.E. b by multiplying the value of S.E. by the square root of each value in cells M7:O9 and by putting the answer in cells M12:O14. Copy the main diagonal cells (the standard error of the coefficients) to cells H13:H15.
15. Calculate the values of t in cells I13:I15, and determine the probability of these values of t in cells J13:J15.
Give an interpretation of this analysis by briefly stating in words the following:
1. The meaning of the three b coefficients in cells G13:G15
2. The meaning of the three t tests and their probabilities
3. The meaning of the values of $b01-math-0080$ , $b01-math-0081$ , and $b01-math-0082$
4. The meaning of the F test and its probability
5. The meaning of $b01-math-0083$
6. The meaning of S.E.
Use the data on the SWC worksheet of Chpt 12-1.xls, and carry out an analysis, as shown in Figure A.7.
1. Supply a third independent variable—the constant.
2. Generate the transpose of the matrix X , using the =TRANSPOSE function. Remember that the entire area into which the transpose matrix will go must be highlighted prior to invoking the function, and that the function must be invoked with Ctrl+Shift+Enter.
3. Calculate the values of $b01-math-0084$ , using the =MMULT function. This function, too, requires highlighting the entire three-by-three cell area and pressing Ctrl+Shift+Enter.
4. Use =MMULT to calculate the values of X′y .
5. Use =MINVERSE to obtain the inverse of the matrix X′y .
6. Calculate the values of b, using the =MMULT function.
7. Using the values of b and the original matrix X , generate the values of with =MMULT.
8. Generate the values of $b01-math-0085$ , using y-AVERAGE, and calculate the total sum of squares (SS_T), using the =SUMSQ function.
9. Generate the values of $b01-math-0086$ , using $b01-math-0087$ , and calculate the regression sum of squares $b01-math-0088$ , using the =SUMSQ function.
10. Generate the values of $b01-math-0089$ , and calculate the error sum of squares (SS_E) with the =SUMSQ function.
11. Calculate the value of R² by dividing the regression sum of squares by the total sum of squares.
12. Calculate the value of S.E. by taking the square root of [SS_E/(n − m − 1)].
13. Determine the degrees of freedom for $b01-math-0090$ , $b01-math-0091$ , and $b01-math-0092$ . Calculate the values of $b01-math-0093$ and $b01-math-0094$ and the value of F and determine its probability.
14. Calculate the values of the S.E. b by multiplying the value of S.E. by the square root of each value in $b01-math-0095$ .
15. Calculate the values of t, and determine the probability of these values of t.
Give an interpretation of this analysis by briefly stating in words the following:
1. The meaning of the three b coefficients in cells G13:G15
2. The meaning of the three t tests and their probabilities
3. The meaning of the values of SS_T, SS_R, and SS_E
4. The meaning of the F test and its probability
5. The meaning of R²
6. The meaning of S.E.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Appendix A: Multiple Regression and Matrices

Create new playlist

Sign In

Sign Up