Chapter 12

Testing More Than Two Samples

IN THIS CHAPTER

Understanding why multiple t-tests won’t work

Working with ANOVA

Taking the next step after ANOVA

Working with repeated measures

Performing a trend analysis

Statistics would be limited if you could only make inferences about one or two samples. In this chapter, I discuss the procedures for testing hypotheses about three or more samples. I show what to do when samples are independent of one another, and what to do when they’re not. In both cases, I discuss what to do after you test the hypotheses.

I also introduce Excel data analysis tools that do the work for you. Although these tools aren’t at the level you’d find in a dedicated statistical package, you can combine them with Excel’s standard features to produce some sophisticated analyses.

Testing More Than Two

Imagine this situation. Your company asks you to evaluate three different methods for training its employees to do a particular job. You randomly assign 30 employees to one of the three methods. Your plan is to train them, test them, tabulate the results, and make some conclusions. Before you can finish the study, three people leave the company — one from the Method 1 group and two from the Method 3 group.

Table 12-1 shows the data.

TABLE 12-1 Data from Three Training Methods

                     Method 1   Method 2   Method 3
                     95         83         68
                     91         89         75
                     89         85         79
                     90         89         74
                     99         81         75
                     88         89         81
                     96         90         73
                     98         82         77
                     95         84
                                80
Mean                 93.44      85.20      75.25
Variance             16.28      14.18      15.64
Standard Deviation   4.03       3.77       3.96

Do the three methods provide different results, or are they so similar that you can’t distinguish among them? To decide, you have to carry out a hypothesis test:

H0: μ1 = μ2 = μ3

H1: Not H0

with α = .05.

A thorny problem

Sounds pretty easy, particularly if you’ve read Chapter 11. Take the mean of the scores from Method 1, the mean of the scores from Method 2, and do a t-test to see if they’re different. Follow the same procedure for Method 1 versus Method 2, and for Method 2 versus Method 3. If at least one of those t-tests shows a significant difference, reject H0. Nothing to it, right? Wrong. If your α is .05 for each t-test, you’re setting yourself up for a Type I error with a probability higher than you planned on. The probability that at least one of the three t-tests results in a significant difference is way above .05. In fact, it’s .14, which is way beyond acceptable. (If the three tests were independent, that probability would be 1 – (1 – .05)³ ≈ .14; the exact mathematics is a bit more involved, so I won’t elaborate.)

With more than three samples, the situation gets even worse. Four groups require six t-tests, and the probability that at least one of them is significant is .26. Table 12-2 shows what happens with increasing numbers of samples.

TABLE 12-2 The Incredible Increasing Alpha

Number of Samples   Number of Tests   Pr (At Least One Significant t)
3                   3                 .14
4                   6                 .26
5                   10                .40
6                   15                .54
7                   21                .66
8                   28                .76
9                   36                .84
10                  45                .90
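The values in Table 12-2 follow from treating the t-tests as independent. Here's a short Python sketch that reproduces them (the function name is mine, and independence is a simplifying assumption):

```python
# Probability of at least one Type I error in all pairwise t-tests at alpha = .05,
# treating the tests as independent. This reproduces Table 12-2.
alpha = 0.05

def familywise_alpha(num_samples):
    num_tests = num_samples * (num_samples - 1) // 2   # all pairwise comparisons
    return num_tests, 1 - (1 - alpha) ** num_tests

for k in range(3, 11):
    tests, p = familywise_alpha(k)
    print(f"{k} samples -> {tests} tests -> Pr(at least one significant t) = {p:.2f}")
```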

Carrying out multiple t-tests is clearly not the answer. So what do you do?

A solution

It’s necessary to take a different approach. The idea is to think in terms of variances rather than means.

I’d like you to think of variance in a slightly different way. The formula for estimating population variance, remember, is

$$ s^2 = \frac{\sum (x - \bar{x})^2}{N - 1} $$

Because the variance is almost a mean of squared deviations from the mean, statisticians also refer to it as Mean Square. In a way, that’s an unfortunate nickname: It leaves out “deviation from the mean,” but there you have it.

The numerator of the variance — excuse me, Mean Square — is the sum of squared deviations from the mean. This leads to another nickname, Sum of Squares. The denominator, as I say in Chapter 10, is degrees of freedom (df). So, the slightly different way to think of variance is

$$ \text{Variance} = \frac{\text{Sum of squared deviations from the mean}}{\text{degrees of freedom}} $$

You can abbreviate this as

$$ s^2 = \frac{SS}{df} $$

Now, on to solving the thorny problem. One important step is to find the Mean Squares hiding in the data. Another is to understand that you use these Mean Squares to estimate the variances of the populations that produced these samples. In this case, assume those variances are equal, so you’re really estimating one variance. The final step is to understand that you use these estimates to test the hypotheses I show you at the beginning of the chapter.

Three different Mean Squares are inside the data in Table 12-1. Start with the whole set of 27 scores, forgetting for the moment that they’re divided into three groups. Suppose you want to use those 27 scores to calculate an estimate of the population variance. (A dicey idea, but humor me.) The mean of those 27 scores is 85. I’ll call that mean the grand mean because it’s the average of everything.

So the Mean Square would be

$$ \frac{\sum (x - 85)^2}{27 - 1} = \frac{1770.0}{26} = 68.08 $$

The denominator has 26 (27 – 1) degrees of freedom. I refer to that variance as the total variance, or in the new way of thinking about this, the MSTotal. It’s often abbreviated as MST.

Here’s another variance to consider. In Chapter 11, I describe the t-test for two samples with equal variances. For that test, you put the two sample variances together to create a pooled estimate of the population variance. The data in Table 12-1 provide three sample variances for a pooled estimate: 16.28, 14.18, and 15.64. Assuming these numbers represent equal population variances, the pooled estimate is

$$ MS_W = \frac{(9 - 1)(16.28) + (10 - 1)(14.18) + (8 - 1)(15.64)}{(9 - 1) + (10 - 1) + (8 - 1)} = \frac{367.34}{24} = 15.31 $$

Because this pooled estimate comes from the variance within the groups, it’s called MSWithin, or MSW.
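The pooling is easy to verify: weight each sample variance by its degrees of freedom, then divide by the total degrees of freedom. Here's a minimal Python sketch using the numbers from Table 12-1:

```python
# Pooled estimate of the population variance (MS_Within) from the three sample
# variances in Table 12-1, weighting each by its degrees of freedom.
variances = [16.28, 14.18, 15.64]
sizes = [9, 10, 8]

ss_within = sum((n - 1) * v for n, v in zip(sizes, variances))  # about 367.34
df_within = sum(n - 1 for n in sizes)                           # 24
ms_within = ss_within / df_within                               # about 15.31
```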

One more Mean Square to go — the variance of the sample means around the grand mean. In this example, that means the variance in these numbers: 93.44, 85.20, and 75.25 — sort of. I said “sort of” because these are means, not scores. When you deal with means, you have to take into account the number of scores that produced each mean. To do that, you multiply each squared deviation by the number of scores in that sample.

So this variance is

$$ MS_B = \frac{9(93.44 - 85)^2 + 10(85.20 - 85)^2 + 8(75.25 - 85)^2}{3 - 1} = \frac{1402.7}{2} = 701.3 $$

The df for this variance is 2 (the number of samples – 1).

Statisticians, not known for their crispness of usage, refer to this as the variance between sample means. (Among is the correct word when you’re talking about more than two items.) This variance is known as MSBetween, or MSB.

So you now have three estimates of population variance: MST, MSW, and MSB. What do you do with them?

Remember that the original objective is to test a hypothesis about three means. According to H0, any differences you see among the three sample means are due strictly to chance. The implication is that the variance among those means is the same as the variance of any three numbers selected at random from the population.

If you could somehow compare the variance among the means (that’s MSB, remember) with the population variance, you could see if that holds up. If only you had an estimate of the population variance that’s independent of the differences among the groups, you’d be in business.

Ah … but you do have that estimate. You have MSW, an estimate based on pooling the variances within the samples. Assuming those variances represent equal population variances, this is a pretty solid estimate. In this example, it’s based on 24 degrees of freedom.

The reasoning now becomes: If MSB is about the same as MSW, you have evidence consistent with H0. If MSB is significantly larger than MSW, you have evidence that’s inconsistent with H0. In effect, you transform these hypotheses:

H0: μ1 = μ2 = μ3

H1: Not H0

into these

H0: σB2 ≤ σW2

H1: σB2 > σW2

Rather than multiple t-tests among sample means, you perform a test of the difference between two variances.

What is that test? In Chapter 11, I show you the test for hypotheses about two variances. It’s called the F-test. To perform this test, you divide one variance by the other. You evaluate the result against a family of distributions called the F-distribution. Because two variances are involved, two values for degrees of freedom define each member of the family.

For this example, F has df = 2 (for the MSB) and df = 24 (for the MSW). Figure 12-1 shows what this member of the F family looks like. For our purposes, it’s the distribution of possible F values if H0 is true. (Refer to “Visualizing the F-Distribution” in Chapter 11.)


FIGURE 12-1: The F-distribution with 2 and 24 degrees of freedom.

The test statistic for the example is

$$ F = \frac{MS_B}{MS_W} = \frac{701.3}{15.31} = 45.82 $$

What proportion of area does this value cut off in the upper tail of the F-distribution? From Figure 12-1, you can see that this proportion is microscopic, as the values on the horizontal axis only go up to 5. (And the proportion of area beyond 5 is tiny.) It’s way less than .05.

This means that it’s highly unlikely that differences among the means are due to chance. It means that you reject H0.

remember This whole procedure for testing more than two samples is called the analysis of variance, often abbreviated as ANOVA. In the context of an ANOVA, the denominator of an F-ratio has the generic name error term. The independent variable is sometimes called a factor. So this is a single-factor (or one-factor) ANOVA.

In this example, the factor is Training Method. Each instance of the independent variable is called a level. The independent variable in this example has three levels.

More complex studies have more than one factor, and each factor can have many levels.
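As a check on the arithmetic in this section, here's the whole single-factor ANOVA sketched in plain Python (no Excel required; the variable names are mine):

```python
# One-way ANOVA for the three training methods in Table 12-1.
method1 = [95, 91, 89, 90, 99, 88, 96, 98, 95]
method2 = [83, 89, 85, 89, 81, 89, 90, 82, 84, 80]
method3 = [68, 75, 79, 74, 75, 81, 73, 77]
groups = [method1, method2, method3]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)      # 85.0

# Between groups: each group mean's squared deviation is weighted by group size
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
df_between = len(groups) - 1                        # 2

# Within groups: pooled squared deviations around each group's own mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
df_within = len(all_scores) - len(groups)           # 24

ms_between = ss_between / df_between                # about 701.3
ms_within = ss_within / df_within                   # about 15.31
f_ratio = ms_between / ms_within                    # about 45.82
```

Because the calculated F is far beyond the critical value for 2 and 24 df, the decision is the same one the chapter reaches: reject H0.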

Meaningful relationships

Take another look at the Mean Squares in this example, each with its Sum of Squares and degrees of freedom. Before, when I calculated each Mean Square for you, I didn’t explicitly show you each Sum of Squares, but here I include them:

$$ MS_B = \frac{SS_B}{df_B} = \frac{1402.7}{2} = 701.3 $$
$$ MS_W = \frac{SS_W}{df_W} = \frac{367.3}{24} = 15.31 $$
$$ MS_T = \frac{SS_T}{df_T} = \frac{1770.0}{26} = 68.08 $$

Start with the degrees of freedom: dfB = 2, dfW = 24, and dfT = 26. Is it a coincidence that they add up? Hardly. It’s always the case that

$$ df_B + df_W = df_T $$

How about those Sums of Squares?

$$ SS_B + SS_W = 1402.7 + 367.3 = 1770.0 = SS_T $$

Again, this is no coincidence. In the analysis of variance, this always happens:

$$ SS_B + SS_W = SS_T $$

In fact, statisticians who work with the analysis of variance speak of partitioning (read “breaking down into non-overlapping pieces”) the SST into one portion for the SSB and another for the SSW, and partitioning the dfT into one amount for the dfB and another for the dfW.

After the F-test

The F-test enables you to decide whether or not to reject H0. After you decide to reject, then what? All you can say is that somewhere within the set of means, something is different from something else. The F-test doesn’t specify what those “somethings” are.

Planned comparisons

In order to get more specific, you have to do some further tests. Not only that, you have to plan those tests in advance of carrying out the ANOVA.

What are those tests? Given what I mention earlier, this might surprise you: t-tests. While this might sound inconsistent with the increased alpha of multiple t-tests, it’s not. If an analysis of variance enables you to reject H0, then it’s okay to use t-tests to turn the magnifying glass on the data and find out where the differences are. And as I’m about to show you, the t-test you use is slightly different from the one I discuss in Chapter 11.

These post-ANOVA t-tests are called planned comparisons. Some statisticians refer to them as a priori tests. I illustrate by following through with the example. Suppose before you gathered the data, you had reason to believe that Method 1 would result in higher scores than Method 2, and that Method 2 would result in higher scores than Method 3. In that case, you plan in advance to compare the means of those samples in the event your ANOVA-based decision is to reject H0.

The formula for this kind of t-test is

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MS_W\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$

It’s a test of

H0: μ1 ≤ μ2

H1: μ1 > μ2

MSW takes the place of the pooled estimate sp2 I show you in Chapter 11. In fact, when I introduced MSW, I showed how it’s just a pooled estimate that can incorporate variances from more than two samples. The df for this t-test is dfW, rather than (n1 – 1) + (n2 – 1).

For this example, the Method 1 versus Method 2 comparison is

$$ t = \frac{93.44 - 85.20}{\sqrt{15.31\left(\frac{1}{9} + \frac{1}{10}\right)}} = 4.59 $$

With df = 24, this value of t cuts off a minuscule portion of area in the upper tail of the t-distribution. The decision is to reject H0.

The planned comparison t-test formula I show you matches up with the t-test for two samples. You can write the planned comparison t-test formula in a way that sets up additional possibilities. Start by writing the numerator

$$ \bar{x}_1 - \bar{x}_2 $$

a bit differently:

$$ (+1)\bar{x}_1 + (-1)\bar{x}_2 $$

The +1 and –1 are comparison coefficients. I refer to them, in a general way, as c1 and c2. In fact, c3 and $\bar{x}_3$ can enter the comparison, even if you’re just comparing $\bar{x}_1$ with $\bar{x}_2$:

$$ (+1)\bar{x}_1 + (-1)\bar{x}_2 + (0)\bar{x}_3 $$

The important thing is that the coefficients add up to zero.

Here’s how the comparison coefficients figure into the planned comparison t-test formula for a study that involves three samples:

$$ t = \frac{\sum c\bar{x}}{\sqrt{MS_W \sum \frac{c^2}{n}}} $$

Applying this formula to Method 2 versus Method 3:

$$ t = \frac{(0)(93.44) + (1)(85.20) + (-1)(75.25)}{\sqrt{15.31\left(\frac{0^2}{9} + \frac{1^2}{10} + \frac{(-1)^2}{8}\right)}} = \frac{9.95}{1.86} = 5.36 $$

The value for t indicates the results from Method 2 are significantly higher than the results from Method 3.

You can also plan a more complex comparison — say, Method 1 versus the average of Method 2 and Method 3. Begin with the numerator. That would be

$$ \bar{x}_1 - \frac{\bar{x}_2 + \bar{x}_3}{2} $$

With comparison coefficients, you can write this as

$$ (1)\bar{x}_1 + \left(-\tfrac{1}{2}\right)\bar{x}_2 + \left(-\tfrac{1}{2}\right)\bar{x}_3 $$

If you’re more comfortable with whole numbers, you can write it as:

$$ (2)\bar{x}_1 + (-1)\bar{x}_2 + (-1)\bar{x}_3 $$

Plugging these whole numbers into the formula gives you

$$ t = \frac{(2)(93.44) + (-1)(85.20) + (-1)(75.25)}{\sqrt{15.31\left(\frac{2^2}{9} + \frac{(-1)^2}{10} + \frac{(-1)^2}{8}\right)}} = \frac{26.44}{3.20} = 8.26 $$

Again, strong evidence for rejecting H0.
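The coefficient version of the formula is easy to put into code. Here's a sketch (the function name is mine; the means and MSW come from the training-method example):

```python
# Planned comparison t-test using comparison coefficients.
def comparison_t(means, sizes, coeffs, ms_within):
    # The coefficients must add up to zero for a valid comparison
    assert sum(coeffs) == 0
    numerator = sum(c * m for c, m in zip(coeffs, means))
    denominator = (ms_within * sum(c * c / n for c, n in zip(coeffs, sizes))) ** 0.5
    return numerator / denominator

means = [93.44, 85.20, 75.25]   # Method 1, 2, 3 means
sizes = [9, 10, 8]
ms_within = 15.31               # pooled within-groups Mean Square

t_2_vs_3 = comparison_t(means, sizes, [0, 1, -1], ms_within)     # about 5.36
t_1_vs_avg = comparison_t(means, sizes, [2, -1, -1], ms_within)  # about 8.26
```

Changing the coefficient list is all it takes to run a different comparison.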

Unplanned comparisons

Things would get boring if your post-ANOVA testing were limited to comparisons you have to plan in advance. Sometimes you want to snoop around your data and see whether anything interesting reveals itself. Sometimes something jumps out at you that you didn’t anticipate.

When this happens, you can make comparisons you didn’t plan on. These comparisons are called a posteriori tests, post hoc tests, or simply unplanned comparisons. Statisticians have come up with a wide variety of these tests, many of them with exotic names and many of them dependent on special sampling distributions.

The idea behind these tests is that you pay a price for not having planned them in advance. That price has to do with stacking the deck against rejecting H0 for the particular comparison.

Of all the unplanned tests available, the one I like best is a creation of famed statistician Henry Scheffé. As opposed to esoteric formulas and distributions, you start with the test I already showed you, and then add a couple of easy-to-do extras.

The first extra is to understand the relationship between t and F. I’ve shown you the F-test for three samples. You can also carry out an F-test for two samples. That F-test has dfB = 1 and dfW = (N1 – 1) + (N2 – 1). The df for the t-test, of course, is (N1 – 1) + (N2 – 1). Hmmm … seems like they should be related somehow.

They are. The relationship between the two-sample t and the two-sample F is

$$ F = t^2 $$

Now I can tell you the steps for performing Scheffé’s test:

  1. Calculate the planned comparison t-test.
  2. Square the value to create F.
  3. Find the critical value of F for dfB and dfW at α = .05 (or whatever α you choose).
  4. Multiply this critical F by the number of samples – 1.

    The result is your critical F for the unplanned comparison. I’ll call this F’.

  5. Compare the calculated F to F’.

    If the calculated F is greater, reject H0 for this test. If it’s not, don’t reject H0 for this test.

Imagine that in the example, you didn’t plan in advance to compare the mean of Method 1 with the mean of Method 3. (In a study involving only three samples, that’s hard to imagine, I grant you.) The t-test is

$$ t = \frac{93.44 - 75.25}{\sqrt{15.31\left(\frac{1}{9} + \frac{1}{8}\right)}} = 9.57 $$

Squaring this result gives

$$ F = t^2 = 91.61 $$

For F with 2 and 24 df and α = .05, the critical value is 3.403. (You can look that up in a table in a statistics textbook or you can use the worksheet function F.INV.RT.) So

$$ F' = (3 - 1)(3.403) = 6.81 $$

Because the calculated F, 91.61, is greater than F’, the decision is to reject H0. You have evidence that Method 1’s results are different from Method 3’s results.
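Here's the same Scheffé test sketched in Python. The critical F of 3.403 is the value the text looks up (F.INV.RT in Excel); the other numbers come from the example, and slight rounding differences from the text's 91.61 are expected:

```python
# Scheffé's post hoc test: Method 1 vs. Method 3.
ms_within = 367.32 / 24         # pooled within-groups Mean Square (SS_W / df_W)
mean1 = 841 / 9                 # Method 1 mean, about 93.44
mean3 = 75.25                   # Method 3 mean
n1, n3 = 9, 8

t = (mean1 - mean3) / (ms_within * (1 / n1 + 1 / n3)) ** 0.5
f = t ** 2                      # about 91.6
f_crit = 3.403                  # critical F for 2 and 24 df at alpha = .05
f_prime = (3 - 1) * f_crit      # about 6.81
reject_h0 = f > f_prime         # True: Method 1 differs from Method 3
```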

Data analysis tool: Anova: Single Factor

The calculations for the ANOVA can get intense. Excel has a data analysis tool that does the heavy lifting. It’s called Anova: Single Factor. Figure 12-2 shows this tool along with the data for the preceding example.


FIGURE 12-2: The Anova: Single Factor data analysis tool dialog box.

The steps for using this tool are:

  1. Enter the data for each sample into a separate data array.

    For this example, the data in the Method 1 sample are in column B, the data in the Method 2 sample are in column C, and the data for the Method 3 sample are in column D.

  2. Select Data | Data Analysis to open the Data Analysis dialog box.
  3. In the Data Analysis dialog box, scroll down the Analysis Tools list and select Anova: Single Factor.
  4. Click OK to open the Anova: Single Factor dialog box.

    This is the dialog box in Figure 12-2.

  5. In the Input Range box, enter the cell range that holds all the data.

    For the example, the data are in $B$2:$D$12. (Note the $ signs for absolute referencing.)

  6. If the cell ranges include column headings, select the Labels check box.

    I included the headings in the ranges, so I selected the box.

  7. The Alpha box has 0.05 as a default. Change that value if you’re so inclined.
  8. In the Output Options, select a radio button to indicate where you want the results.

    I selected New Worksheet Ply to put the results on a new page in the worksheet.

  9. Click OK.

    Because I selected New Worksheet Ply, a newly created page opens with the results.

Figure 12-3 shows the tool’s output, after I expand the columns. The output features two tables: SUMMARY and ANOVA. The SUMMARY table provides summary statistics of the samples — the number in each group, the group sums, averages, and variances. The ANOVA table presents the Sums of Squares, df, Mean Squares, F, P-value, and critical F for the indicated df. The P-value is the proportion of area that the F cuts off in the upper tail of the F-distribution. If this value is less than .05, reject H0.


FIGURE 12-3: Output from the Anova: Single Factor analysis tool.

Comparing the means

Excel’s ANOVA tool does not provide a built-in facility for carrying out planned (or unplanned) comparisons among the means. With a little ingenuity, however, you can use the Excel worksheet function SUMPRODUCT to do those comparisons.

The worksheet page with the ANOVA output is the launching pad for the planned comparisons. In this section, I take you through one planned comparison — the mean of Method 1 versus the mean of Method 2.

Begin by creating columns that hold important information for the comparisons. Figure 12-4 shows what I mean. I put the comparison coefficients in column J, the squares of those coefficients in column K, and the reciprocal of each sample size (1/n) in column L.


FIGURE 12-4: Carrying out a planned comparison.

A few rows below those cells, I put t-test-related information — the t-test numerator, the denominator, and the value of t. I use separate cells for the numerator and denominator to simplify the formulas. You can put them together in one big formula and just have a cell for t, but it’s hard to keep track of everything.

SUMPRODUCT takes arrays of cells, multiplies the numbers in the corresponding cells, and sums the products. (This function is on the Math & Trig Functions menu, not the Statistical Functions menu.) I used SUMPRODUCT to multiply each coefficient by each sample mean and then add the products. I stored that result in K11. That’s the numerator for the planned comparison t-test. The formula for K11 is

=SUMPRODUCT(J5:J7,D5:D7)

The array J5:J7 holds the comparison coefficients, and D5:D7 holds the sample means.

K12 holds the denominator. I selected K12 in Figure 12-4 so that you could see its formula in the Formula bar:

=SQRT(D13*(SUMPRODUCT(K5:K7,L5:L7)))

D13 has the MSW. SUMPRODUCT multiplies the squared coefficients in K5:K7 by the reciprocals of the sample sizes in L5:L7 and sums the products. SQRT takes the square root of the whole thing.

K13 holds the value for t. That’s just K11 divided by K12.

K14 presents the P-value for t — the proportion of area that t cuts off in the upper tail of the t-distribution with df = 24. The formula for that cell is

=T.DIST.RT(K13,C13)

The arguments are the calculated t (in K13) and the degrees of freedom for MSW (in C13).

If you change the coefficients in J5:J7, you instantaneously create and complete another comparison.

In fact, I’ll do that right now, and show you Scheffé’s post hoc comparison. That one, in this example, compares the mean of Method 1 with the mean of Method 3. Figure 12-5 shows the extra information for this test, starting a couple of rows below the t-test.


FIGURE 12-5: Carrying out a post hoc comparison.

Cell K16 holds F, the square of the t value in K13. K17 has F’, the product of C12 (dfB, which is the number of samples – 1) and G12 (the critical value of F for 2 and 24 degrees of freedom and α = .05). K16 is greater than K17, so reject H0 for this comparison.

Another Kind of Hypothesis, Another Kind of Test

The preceding ANOVA works with independent samples. As Chapter 11 explains, sometimes you work with matched samples. For example, sometimes a person provides data in a number of different conditions. In this section, I introduce the ANOVA you use when you have more than two matched samples.

This type of ANOVA is called repeated measures. You’ll see it called other names, too, like randomized blocks or within subjects.

Working with repeated measures ANOVA

To show how this works, I extend the example from Chapter 11. In that example, ten people participate in a weight-loss program. Table 12-3 shows their data over a three-month period.

TABLE 12-3 Data for the Weight-Loss Example

Person   Before   One Month   Two Months   Three Months   Mean
1        198      194         191          188            192.75
2        201      203         200          196            200.00
3        210      200         192          188            197.50
4        185      183         180          178            181.50
5        204      200         195          191            197.50
6        156      153         150          145            151.00
7        167      166         167          166            166.50
8        197      197         195          192            195.25
9        220      215         209          205            212.25
10       186      184         179          175            181.00
Mean     192.4    189.5       185.8        182.4          187.525

Is the program effective? This question calls for a hypothesis test:

H0: μBefore = μ1 = μ2 = μ3

H1: Not H0

Once again, I set α = .05

As in the previous ANOVA, start with the variances in the data. The MST is the variance in all 40 scores from the grand mean, which is 187.525:

$$ MS_T = \frac{\sum (x - 187.525)^2}{40 - 1} = \frac{12410.0}{39} = 318.2 $$

The people participating in the weight-loss program also supply variance. Each one’s overall mean (his or her average over the four measurements) varies from the grand mean. Because these data are in the rows, I call this MSRows:

$$ MS_{Rows} = \frac{4\sum (\bar{x}_{Row} - 187.525)^2}{10 - 1} = \frac{11631.7}{9} = 1292.4 $$

The means of the columns also vary from the grand mean:

$$ MS_{Columns} = \frac{10\sum (\bar{x}_{Column} - 187.525)^2}{4 - 1} = \frac{569.1}{3} = 189.7 $$

One more source of variance is in the data. Think of it as the variance left over after you pull out the variance in the rows and the variance in the columns from the total variance. Actually, it’s more correct to say that it’s the Sum of Squares left over when you subtract the SSRows and the SSColumns from the SST.

This variance is called MSError. As I say earlier, in the ANOVA the denominator of an F is called an error term. So the word error here gives you a hint that this MS is a denominator for an F.

To calculate MSError, you use the relationships among the Sums of Squares and among the df.

$$ SS_{Error} = SS_T - SS_{Rows} - SS_{Columns} = 12410.0 - 11631.7 - 569.1 = 209.2 $$
$$ df_{Error} = df_T - df_{Rows} - df_{Columns} = 39 - 9 - 3 = 27 $$

Here’s another way to calculate the dfError:

$$ df_{Error} = (\text{number of rows} - 1)(\text{number of columns} - 1) = (10 - 1)(4 - 1) = 27 $$

To perform the hypothesis test, you calculate the F:

$$ F = \frac{MS_{Columns}}{MS_{Error}} = \frac{189.7}{7.75} = 24.49 $$

With 3 and 27 degrees of freedom, the critical F for α = .05 is 2.96. (Look it up or use the Excel worksheet function F.INV.RT.) The calculated F is larger than the critical F, so the decision is to reject H0.

What about an F involving MSRows? That one doesn’t figure into H0 for this example. If you find a significant F, all it shows is that people are different from one another with respect to weight, and that doesn’t tell you very much.
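Here's the whole repeated measures partition sketched in Python, using the data from Table 12-3 (variable names are mine):

```python
# Repeated measures ANOVA for the weight-loss data:
# SS_Total = SS_Rows + SS_Columns + SS_Error.
data = [
    [198, 194, 191, 188], [201, 203, 200, 196], [210, 200, 192, 188],
    [185, 183, 180, 178], [204, 200, 195, 191], [156, 153, 150, 145],
    [167, 166, 167, 166], [197, 197, 195, 192], [220, 215, 209, 205],
    [186, 184, 179, 175],
]
n_rows, n_cols = len(data), len(data[0])            # 10 people, 4 occasions
scores = [x for row in data for x in row]
grand_mean = sum(scores) / len(scores)              # 187.525

ss_total = sum((x - grand_mean) ** 2 for x in scores)
ss_rows = n_cols * sum((sum(r) / n_cols - grand_mean) ** 2 for r in data)
col_means = [sum(r[j] for r in data) / n_rows for j in range(n_cols)]
ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
ss_error = ss_total - ss_rows - ss_cols             # what's left over

df_cols = n_cols - 1                                # 3
df_error = (n_rows - 1) * (n_cols - 1)              # 27

f_ratio = (ss_cols / df_cols) / (ss_error / df_error)   # about 24.49
```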

As is the case with the ANOVA I showed you earlier, you plan comparisons to zero in on the differences. You can use the same formula, except you substitute MSError for MSW:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MS_{Error}\left(\frac{1}{n} + \frac{1}{n}\right)}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{2\,MS_{Error}}{n}}} $$

The formula works out to the expression on the right because in a repeated measures design all the Ns are the same.

The df for this test is dfError.

For Scheffé’s post hoc test, you also follow the same procedure as earlier and substitute MSError for MSW. The only other change is to substitute dfColumns for dfB and substitute dfError for dfW when you find F’.
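For example, here's a sketch of the planned comparison of Before versus Three Months. The MSError value comes from the repeated measures ANOVA of the weight-loss data; the comparison itself isn't worked out in the text, so treat it as an illustration:

```python
# Planned comparison for matched samples: Before vs. Three Months.
# With equal n per occasion, t = (mean1 - mean2) / sqrt(2 * MS_Error / n).
ms_error = 209.175 / 27          # SS_Error / df_Error from the weight-loss ANOVA
n = 10                           # people measured on each occasion
mean_before, mean_3months = 192.4, 182.4

t = (mean_before - mean_3months) / (2 * ms_error / n) ** 0.5   # about 8.03
```

With df = 27, a t that large cuts off a tiny upper-tail area, so this comparison is significant.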

Getting trendy

In situations like the one in the weight-loss example, you have an independent variable that’s quantitative — its levels are numbers (0 months, 1 month, 2 months, 3 months). Not only that, but in this case, the intervals are equal.

With that kind of an independent variable, it’s often a good idea to look for trends in the data rather than just plan comparisons among means. If you graph the means in the weight-loss example, they seem to approximate a line, as Figure 12-6 shows. Trend analysis is the statistical procedure that examines that pattern. The objective is to see if the pattern contributes to the significant differences among the means.


FIGURE 12-6: The means for the weight-loss example.

A trend can be linear, as it apparently is in this example, or nonlinear (in which the means fall on a curve). In this example, I only deal with linear trend.

To analyze a trend, you use comparison coefficients — those numbers you use in planned comparisons. You just use them in a slightly different way than you did before.

Here, you use comparison coefficients to find a Sum of Squares for linear trend. I abbreviate that as SSLinear. This is a portion of SSColumns. In fact,

$$ SS_{Linear} + SS_{Nonlinear} = SS_{Columns} $$

Also,

$$ df_{Linear} + df_{Nonlinear} = df_{Columns} $$

After you calculate SSLinear, you divide it by dfLinear to produce MSLinear. This is extremely easy because dfLinear = 1. Divide MSLinear by MSError and you have an F. If that F is higher than the critical value of F with df = 1 and dfError at your α –level, then weight is decreasing in a linear way over the time period of the weight-loss program.

The comparison coefficients are different for different numbers of samples. For four samples, the coefficients are –3, –1, 1, and 3. To form the SSLinear, the formula is

$$ SS_{Linear} = \frac{N\left(\sum c\bar{x}\right)^2}{\sum c^2} $$

In this formula, N is the number of people and c represents the coefficients. Applying the formula to this example,

$$ SS_{Linear} = \frac{10\left[(-3)(192.4) + (-1)(189.5) + (1)(185.8) + (3)(182.4)\right]^2}{(-3)^2 + (-1)^2 + (1)^2 + (3)^2} = \frac{10(-33.7)^2}{20} = 567.85 $$

This is such a large proportion of SSColumns that SSNonlinear is really small:

$$ SS_{Nonlinear} = SS_{Columns} - SS_{Linear} = 569.08 - 567.85 = 1.23 $$

As I point out earlier, df = 1, so MSLinear is conveniently the same as SSLinear.

Finally,

$$ F = \frac{MS_{Linear}}{MS_{Error}} = \frac{567.85}{7.75} = 73.30 $$

The critical value for F with 1 and 27 degrees of freedom and α = .05 is 4.21 (which I use F.INV.RT to calculate). Because the calculated value is larger than the critical value, statisticians would say the data shows a significant linear component. This, of course, verifies what you see shown earlier in Figure 12-6.
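The trend calculation can be sketched in a few lines of Python. The column means and coefficients come from the example; the MSError value comes from the repeated measures ANOVA earlier in this section:

```python
# Linear trend analysis for the weight-loss means:
# SS_Linear = n * (sum of c * mean)^2 / (sum of c^2).
col_means = [192.4, 189.5, 185.8, 182.4]   # Before, 1, 2, 3 months
coeffs = [-3, -1, 1, 3]                    # linear-trend coefficients for four samples
n = 10                                     # people per occasion

ss_linear = (n * sum(c * m for c, m in zip(coeffs, col_means)) ** 2
             / sum(c * c for c in coeffs))             # about 567.85
ms_error = 209.175 / 27                    # error term from the weight-loss ANOVA
f_linear = ss_linear / ms_error            # about 73.3; critical F(1, 27) is 4.21
```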

Data analysis tool: Anova: Two Factor Without Replication

Huh? Is that a misprint? Two-Factor??? Without Replication?? What’s that all about?

Here’s the story: If you’re looking through the data analysis tools for something like Anova: Single Factor Repeated Measures, you won’t find it. The tool you’re looking for is there, but it’s hiding out under a different name.

Figure 12-7 shows this tool’s dialog box along with the data for the preceding weight-loss example.


FIGURE 12-7: The Anova: Two Factor Without Replication data analysis tool dialog box.

The steps for using this tool are:

  1. Type the data for each sample into a separate data array. Put the label for each person in a data array.

    For this example, the labels for Person are in column B. The data in the Before sample are in column C, the data in the 1 Month sample are in column D, the data for the 2 Month sample are in column E, and the data for the 3 Month sample are in column F.

  2. Select DATA | Data Analysis to open the Data Analysis dialog box.
  3. In the Data Analysis dialog box, scroll down the Analysis Tools list and select Anova: Two Factor Without Replication.
  4. Click OK to open the Anova: Two Factor Without Replication dialog box.

    This is the dialog box shown in Figure 12-7.

  5. In the Input Range box, type the cell range that holds all the data.

    For the example, the data are in $B$2:$F$12. Note the $ signs for absolute referencing. Note also — and this is important — the Person column is part of the data.

  6. If the cell ranges include column headings, select the Labels option.

    I included the headings in the ranges, so I selected the box.

  7. The Alpha box has 0.05 as a default. Change that value if you want a different α.
  8. In the Output Options, select a radio button to indicate where you want the results.

    I selected New Worksheet Ply to put the results on a new page in the worksheet.

  9. Click OK.

    Because I selected New Worksheet Ply, a newly created page opens with the results.

Figure 12-8 shows the tool’s output, after I expand the columns. The output features two tables: SUMMARY and ANOVA.


FIGURE 12-8: Output from the Anova: Two Factor Without Replication data analysis tool.

The SUMMARY table is in two parts. The first part provides summary statistics for the rows. The second part provides summary statistics for the columns. Summary statistics include the number of scores in each row and in each column along with the sums, means, and variances.

The ANOVA table presents the Sums of Squares, df, Mean Squares, F, P-values, and critical F-ratios for the indicated df. The table features two values for F. One F is for the rows, and the other is for the columns. The P-value is the proportion of area that the F cuts off in the upper tail of the F-distribution. If this value is less than .05, reject H0.

Although the ANOVA table includes an F for the rows, this doesn’t concern you in this case, because H0 is only about the columns in the data. Each row represents the data for one person. A high F just implies that people are different from one another, and that’s not news.

Analyzing trend

Excel’s Anova: Two-Factor Without Replication tool does not provide a way for performing a trend analysis. As with the planned comparisons, a little ingenuity takes you a long way. The Excel worksheet functions SUMPRODUCT and SUMSQ help with the calculations.

The worksheet page with the ANOVA output gives the information you need to get started. In this section, I take you through the analysis of linear trend.

I start by putting the comparison coefficients for linear trend into J15 through J18, as shown in Figure 12-9.


FIGURE 12-9: Carrying out a trend analysis.

In J22 through J24, I put information related to SSLinear — the numerator, the denominator, and the value of the Sum of Squares. I use separate cells for the numerator and denominator to simplify the formulas.

As I point out earlier, SUMPRODUCT takes arrays of cells, multiplies the numbers in the corresponding cells, and sums the products. (This function is on the Math & Trig menu, not the Statistical Functions menu.) I used SUMPRODUCT to multiply each coefficient by each sample mean and then add the products. I stored that result in J22. That’s the numerator for the SSLinear. I selected J22 so that you could see its formula in the Formula bar:

=B15*SUMPRODUCT(J15:J18,D15:D18)^2

The value in B15 is the number in each column. The array J15:J18 holds the comparison coefficients, and D15:D18 holds the column means.

J23 holds the denominator. Its formula is

=SUMSQ(J15:J18)

SUMSQ (another function on the Math & Trig Functions menu) squares the coefficients in J15:J18 and adds them.

J24 holds the value for SSLinear. That’s J22 divided by J23.

Figure 12-9 also shows that in the ANOVA table I’ve inserted two rows above the row for Error. One row holds the SS, df, MS, F, P-Value, and critical F for Linear, and the other holds these values for Nonlinear. SSNonlinear in B26 is B24-B25.

The F for Linear is D25 divided by D27. The formula for the P-Value in F25 is

=F.DIST.RT(E25,C25,C27)

The first argument, E25, is the F. The second and third arguments are the df.

The formula for the critical F in F25 is

=F.INV.RT(0.05,C25,C27)

The first argument is α, and the second and third are the df.
