7
Other Experimental Designs

Introduction

Over the years, the need to address important research questions efficiently, in the face of various constraints on experimental materials, protocols, and costs, has led to the creation of new experimental designs. We have seen some examples:

  1. The shoe research manager needed to compare three tread designs in a situation where it was advantageous (for reasons of statistical precision) to use blocks (boys) that had only two experimental units (feet) in each block. Balanced incomplete block designs, developed for agricultural needs, provided a clever and efficient way to do that.
  2. Industrial processes posed new problems that agriculturally motivated designs did not address adequately. Industrial processes often involve a large number of process variables, and running an experiment with a full factorial set of treatment combinations is often prohibitive because of cost or time requirements. This constraint led to experiments with all factors at only two or three levels. Then, when there are enough factors that even these full factorial treatment combinations become excessive, research led to the use of cleverly selected fractions of the set of all possible factor combinations. There is no free lunch, though. We usually pay a cost for fractionalization in that the design then cannot detect interactions that might be important. If subject-matter context supports assuming certain interactions are negligible, this cost is often acceptable. If the experiment is a “screening” experiment, aimed at finding the most important factor effects as a preliminary to deeper experimentation with a subset of potential factors, this sort of staging can be an effective strategy. These trade-offs and decisions are based on subject-matter knowledge, helped by theory and analyses pertaining to statistical precision and the efficiency of candidate designs. It is not easy to make these decisions. They are not readily reduced to an algorithm.

Thus far in this text, we have considered two basic families of experimental designs: the completely randomized design and the randomized block design. In these designs, treatments are either assigned to a single set of experimental units completely at random, or they are randomly assigned to experimental units within each of multiple blocks of experimental units. Variations within these two families of designs have had to do with treatment or block structure. Both blocks and treatments can have multifactor structures, and both block and treatment factors can be either quantitative or qualitative. These aspects of the design affect the analysis, but do not change the basic structure of the design—the way treatments are assigned to experimental units. As broad as these two families are, extensions or modifications are still called for in many contexts. In this chapter, we consider some additional designs that modify or extend the basic CRD and RCB designs in various ways. Designs that depart from these basic structures are also discussed.

Latin Square Design

Example: Gasoline additives and car emissions

Reducing the emissions from automobiles is an area of ongoing widespread research. Federal and state governments set progressively lower limits, and regulators evaluate automobiles and fuels to determine whether limits are met or adequate progress is being made. Researchers therefore search for effective and cost-effective methods to reduce emissions. In Chapter 5 we saw an experiment pertaining to the effects of the ethanol/gasoline/air mix on CO emissions from automobiles. We now consider another emissions reduction example from Box, Hunter, and Hunter (2005). The following story is mine.

Suppose that a chemical manufacturer has developed three candidate gasoline additives. The director of research wants to do some testing to compare the additives. Preliminary tests on the company’s test engine have shown some apparent differences among the additives, but the director now wants a more stringent, realistic test. He wants to test these additives under the sorts of conditions that real drivers driving real cars in real traffic impose. He hires a testing lab to design and conduct an experiment.

After some fruitful discussion, the test lab rep says, “How about this: You provide us with four cars, and we will instrument them. You choose the cars to cover a reasonable variety of makes and sizes. They will be instrumented to capture and measure the emissions produced by a drive of whatever specified duration and driving conditions we decide on. I will employ just one driver in order to eliminate driver-to-driver variability and hold down costs. Driving each of the four cars, he will make one run using gasoline treated with each of the three additives plus a control run with no additive, in a random order, with suitable purging of the fuel tank, fuel lines, carburetor, and exhaust pipes and chambers between runs. Thus, a total of 16 test drives.” Showing off, he adds: “This is a randomized block experimental design with four blocks (the cars) and four treatments (the additives, counting the control condition of ‘no additive’ as one of the additives).”


The director replies, “I think drivers can make a difference. Two drivers driving the same car are apt to produce different amounts of emissions (from the engines, mind you) just because of driving style. I think we need more drivers in the experiment.”

The lab rep says, “OK, I’ll hire four different drivers. I’ll (randomly) assign each driver to one of the cars for the duration of the test. Each of the four car/driver combinations will make single runs using each of the four additives, in a random order, as before. Still a total of 16 test drives.”

DIRECTOR:

“The problem with that, I think, is that if we see an apparent difference among car/driver pairs, we won’t know whether it’s the car or the driver. I don’t want one carmaker looking bad in this test just because its driver happened to be lead-footed.”

LAB REP:

“OK, then. Suppose we run tests with all 16 combinations of four drivers and four cars. For each of those combinations we’ll make the four runs with the four additives, as before. Now, our experiment is a randomized block with 16 blocks, structured as a 4 × 4 factorial combination of cars and drivers, and four runs in each block, one with each additive. (He sketches a table showing this design.) This makes for a total of 64 test runs. That’s going to cost you.” And he gives a figure.

DIRECTOR:

“Wow! That’s too much.” He then one-ups the lab rep by asking, “Say, have you ever heard of a Latin square design?” (He heard about that design from his neighbor who happens to be an FLS.) “That might be appropriate here.”

LAB REP:

“I’ll look into it.”

A week later, the lab rep reports back. “That Latin thingy looks like a possible solution. Here’s a table (Table 7.1) that shows the combinations of cars, drivers, and additives we will run.” The experiment still has a total of 16 runs.

Table 7.1 4 × 4 Latin Square Design for Additive Experiment. a

Driver  Car
        A    B    C    D
I       A1   A2   A4   A3
II      A4   A3   A1   A2
III     A2   A4   A3   A1
IV      A3   A1   A2   A4

a Cell entries denote the additive assigned to a given car/driver combination.

“This table has a special characteristic: Each additive appears once in each row and each column. That is, each additive is run once with each car and each driver. Thus, when you compare the Additive means, each mean is calculated over runs that include all four cars and all four drivers. This balance means that we will get fairly clean comparisons of the effects of the four additives: the car and driver effects will cancel out. My own FLS, who happens to be my sister-in-law, says that the assumption we’re making is that the differences (in emissions) among additives should be consistent across cars and drivers and car/driver combinations. Similarly, we’re assuming that the difference between drivers should be consistent across cars; lead-footed in one car, lead-footed in the others.”

DIRECTOR:

“Seems reasonable to me. Let’s roll.”

They then work out the protocol. Each test drive is to follow a specified course of 150 miles, with a mix of urban and rural driving, at the posted speed limits, conditions permitting. Each driver’s four runs will be done in a random order. The engines, fuel tanks, and exhaust systems will be carefully purged between runs. At the end of each run, the cumulative emissions data, and perhaps other variables such as elapsed time, fuel consumption, and ambient temperature, will be collected and entered into an Excel spreadsheet. The test managers will also check the odometers to be sure the drive covered 150 miles.

Incidentally, as is often the case, there is a measurement issue in this experiment. You could measure emissions per mile driven or emissions per gallon of fuel consumed (or both). The former measures the combined effect of engine design and additive. The latter measures additive effect more directly.

The experiment was conducted without a hitch, and the emissions data (in coded units) are given in Table 7.2.

Table 7.2 Results of Car Emission Latin Square Experiment. a

Source: Reproduced from BHH (2005, Table 4.8, p. 157), with permission of John Wiley & Sons.

Driver  Car
        A    B    C    D
I       A1   CL   A2   A3
        19   24   23   26
II      A2   A3   A1   CL
        23   24   19   30
III     CL   A2   A3   A1
        15   14   16   16
IV      A3   A1   CL   A2
        19   18   19   16

a Emissions as a function of car, driver, and additive.

Details

Randomization in a Latin square design is done by starting with a basic Latin square and then randomly assigning the factor levels to the symbols. In this experiment, when the additives were randomly assigned to symbols, the no-additive treatment received the symbol A2 in Table 7.1; it is denoted by CL, for control, in Table 7.2. The random assignment also gave the symbol A4 to the company’s additive A2, while the company’s A1 and A3 happened to receive the symbols A1 and A3. The additive labels in each cell of Table 7.2 are the chemical company’s additive names, not the generic symbols in Table 7.1. I point all this out in case any alert reader wonders why the A2s in Table 7.1 are not in the same boxes as they are in Table 7.2.

(All this detail is part of my story creation. The original example in BHH did not identify one of the additive levels as the control condition (no additive), but that seems to me the right thing to do in an experiment like this. While we want to know whether any additive enables car manufacturers to meet regulatory limits, we also want to know how much reduction it provides compared to adding nothing. That knowledge could help define further research.)
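To make this recipe concrete, here is a minimal sketch in Python. It is my illustration, not the test lab’s actual procedure: it starts from a basic cyclic 4 × 4 square, permutes rows and columns (a common additional randomization step that preserves the Latin property), and then randomly attaches cars, drivers, and additives to the generic labels. The labels and the seed are made up.

```python
import random

rng = random.Random(2023)   # arbitrary seed, for reproducibility

# A basic (cyclic) 4 x 4 Latin square on generic symbols.
base = [["A1", "A2", "A3", "A4"],
        ["A2", "A3", "A4", "A1"],
        ["A3", "A4", "A1", "A2"],
        ["A4", "A1", "A2", "A3"]]

# Randomly permute rows and columns (this preserves the Latin property) ...
rows = rng.sample(range(4), 4)
cols = rng.sample(range(4), 4)
square = [[base[r][c] for c in cols] for r in rows]

# ... then randomly attach the real factor levels to the generic labels.
drivers = rng.sample(["I", "II", "III", "IV"], 4)   # row labels
cars = rng.sample(["A", "B", "C", "D"], 4)          # column labels
additive_for = dict(zip(["A1", "A2", "A3", "A4"],
                        rng.sample(["CL", "add1", "add2", "add3"], 4)))

print("cars:", cars)
for d, row in zip(drivers, square):
    print("driver", d, ":", [additive_for[s] for s in row])
```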

Let’s look at the Latin square design in the contexts we have seen in previous chapters. First, note that the Latin square design could be called a very incomplete block design. In this example, we have 16 blocks, the car/driver combinations, with only one experimental unit (a prescribed 150-mile drive) per block. It’s going to be difficult (impossible) to measure the variability among experimental units or variability among treatment differences within a block in this situation (as with the boys’ shoes). All of our information about additive differences comes from interblock comparisons; there is no intrablock information.

In the Latin square, each additive is assigned to four blocks, but systematically, not at random. That is why the Latin square design is not a special case of the randomized block design. The assignment is constrained to achieve the balance discussed earlier. Randomization enters the design, as described previously in this section, by randomly assigning the A, B, C, and D labels to the four cars in the experiment and by similarly randomly assigning the labels for drivers and additives. Also, when order might be a concern, as here, the 16 runs should be done in a random order. If they don’t have a schedule laid out, and a test manager to assure the schedule is followed, test personnel might be tempted to run a more convenient order. For example, all four drivers might do their A1-additive runs on Monday, their A2-additive runs on Tuesday, etc. That reduces the chance of putting the wrong additive in a car on any given day. But it’s possible that there could be a learning curve or a boredom trend at work in this test, so a finding of apparent differences among additives could really be the effect of learning or boredom in repeatedly driving the 150-mile course.

Second, note that the Latin square is a fractional factorial arrangement of blocks and treatments. The emissions experiment has three four-level factors (two block factors, one treatment factor). One replication of the full set of factorial combinations (the alternative design that was rejected as too expensive) would have 4 × 4 × 4 = 64 runs. The 4 × 4 Latin square design specifies a particular 1/4 fraction of those 64 runs.

Analysis 1: Plot the data

I have repeatedly said that initial data plots should show all the dimensions of the data, if possible. The fractional nature of this experiment makes it impossible to produce a meaningful data display showing all four dimensions of the Latin square design: cars, drivers, additives, and emissions. For example, if you plot emissions versus additive, by car, the four points for a given car differ not only by additive but also by driver, so you cannot graphically isolate the effects of car, driver, and additive. This is the graphical manifestation of the fact, as will be seen, that the fractional nature of this design makes it impossible to evaluate interactions in the ANOVA.

Under the assumption of no interaction, we can (and in this case, have to) go straight to main-effect plots, which are given in Figure 7.1. Each point in Figure 7.1 is the average for a given factor level, averaged over the four runs in the experiment that were done at that factor level. For example, the four runs for additive A1 included all four cars and all four drivers, as specified in the Latin square design table. All three plots have the same vertical axis for ease of comparison.


Figure 7.1 Average Emissions by Car, Driver, and Additive.
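If you want to reproduce this kind of display yourself, here is a sketch in Python, assuming the pandas and matplotlib packages are available. The 16 runs are transcribed from Table 7.2; the column names are mine.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Emissions data transcribed from Table 7.2 (coded units); CL is the control.
runs = [
    ("I","A","A1",19),("I","B","CL",24),("I","C","A2",23),("I","D","A3",26),
    ("II","A","A2",23),("II","B","A3",24),("II","C","A1",19),("II","D","CL",30),
    ("III","A","CL",15),("III","B","A2",14),("III","C","A3",16),("III","D","A1",16),
    ("IV","A","A3",19),("IV","B","A1",18),("IV","C","CL",19),("IV","D","A2",16),
]
df = pd.DataFrame(runs, columns=["driver", "car", "additive", "emissions"])

# One panel per factor; each point is the mean of the four runs at that level.
fig, axes = plt.subplots(1, 3, sharey=True, figsize=(9, 3))
for ax, factor in zip(axes, ["car", "driver", "additive"]):
    means = df.groupby(factor)["emissions"].mean()
    ax.plot(means.index, means.values, "o-")
    ax.set_xlabel(factor.capitalize())
axes[0].set_ylabel("Average emissions")
plt.tight_layout()
plt.show()
```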

Figure 7.1 shows, by the greater spread among the four drivers, that drivers apparently have more of an effect on car emissions than do cars (which might be an indication that emissions were measured on a per-gallon-consumed basis, not a per-mile-driven basis) or additives. Additive was a qualitative treatment factor, as far as we know, so the apparent linear trend for additives in the figure is not meaningful. It is just happenstance in the random assignment of labels to additives. Now, if we found out that A1–A3 were three decreasing concentration levels of one additive (and CL was zero additive), then additive is in fact a quantitative factor, and we would want to plot the emission averages versus concentration to see if increasing concentrations resulted in reduced emissions. What the additive main-effect plot shows is that all three additives resulted in lower emissions than the control treatment of no additive. That’s an encouraging sign. We will soon see if that apparent difference is “real or random.”

First, though, if there are no, or only minor, differences among cars, then it makes sense to plot the data in a way that ignores cars. One of the nice features of the Latin square (and other well-chosen fractional factorials) is that if one of the three factors in the experiment turns out to have a negligible effect, the design collapses to a balanced two-factor design. By ignoring cars, we’re left with a 4 × 4 arrangement of drivers (blocks) and additives (treatments) in what is a randomized block design with one replication of each treatment in each block. Thus, we display the data in an interaction plot. Figure 7.2 provides that interaction plot of emissions versus driver.


Figure 7.2 Interaction Plot of Emissions Data by Additive and Driver.

Figure 7.2 shows an intriguing pattern. The lower-left panel shows that for drivers I and II (the red and black plotting symbols), there was a substantial difference in emissions for the four additives, while for drivers III and IV (green and blue), there was not. This looks like a classic case of interaction, but with no replication, we have to treat these inconsistencies as random variation. Further experimentation is necessary to resolve the issue of real versus random interaction. Based on these data, I would like to have an off-the-record chat with these drivers to see if they deviated from the assigned drive in any way. If a driver drove over the speed limit and then took a half-hour break midway through the run for lunch or other refreshment, that might affect his car’s emissions. Not accusin’, just sayin’.

From the graphical evidence, we might choose A1 as the winning additive: it had the lowest emissions for two drivers and not so bad for the other two drivers. On the other hand, if we could get everybody to drive like drivers III and IV (the green and blue symbols), we might be able to get away with using no additive. Obviously, though, we need a lot more data before making any such decisions that have nationwide ramifications. Some possible follow-on experiments are discussed in a later section.

ANOVA

The ANOVA for this Latin square experiment can separate out the variation associated with the main effects of car, driver, and additive. No interactions can be evaluated. The MSs for all three ANOVA entries are calculated from the variances of the four means in each of the panels in Figure 7.1. The ANOVA for the emissions data (Table 7.3) shows that only the differences among drivers stand out appreciably above the residual error variability. This result is consistent with the visual impression in Figure 7.1. The aforementioned data plots have shown us the inconsistencies that contribute to this error variability.

Table 7.3 ANOVA for Emissions Experiment.

Source DF SS MS F P
Car 3  24 8.0 1.5 .31
Driver 3 216 72.0 13.5 .004
Additive 3  40 13.3 2.5 .16
Error 6  32 5.3
Total 15 312

Just as we simplified plots of the data by ignoring cars, we can simplify the ANOVA by dropping the car source of variability (which means merging these three df and corresponding SS with the error line in the ANOVA). (Mathematically, we’re taking the car effect out of the statistical model underlying the analysis.) The result, in Table 7.4, doesn’t change our conclusions: substantial differences among drivers, some evidence of differences on average among additives, but no way to test for interaction. We need a bigger and better experiment to decide if we need better additives or better drivers in order to reduce automobile emissions.

Table 7.4 Reduced ANOVA of Emissions Data.

Source DF SS MS F P
Driver 3 216 72.0 11.6 .002
Additive 3 40 13.3 2.14 .17
Error 9 56 6.2
Total 15 312
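For readers following along in software, here is a sketch, assuming the statsmodels package, of how the Table 7.3 and Table 7.4 decompositions can be computed. Because the Latin square is balanced, the sequential sums of squares do not depend on the order of terms in the model formula. Note that the formulas contain no interaction terms; with only 16 runs, none are estimable, which is the fractional-design cost discussed earlier.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# The same 16 runs as in the plotting sketch (Table 7.2).
runs = [
    ("I","A","A1",19),("I","B","CL",24),("I","C","A2",23),("I","D","A3",26),
    ("II","A","A2",23),("II","B","A3",24),("II","C","A1",19),("II","D","CL",30),
    ("III","A","CL",15),("III","B","A2",14),("III","C","A3",16),("III","D","A1",16),
    ("IV","A","A3",19),("IV","B","A1",18),("IV","C","CL",19),("IV","D","A2",16),
]
df = pd.DataFrame(runs, columns=["driver", "car", "additive", "emissions"])

# Full model: three main effects only (Table 7.3).
full = ols("emissions ~ C(car) + C(driver) + C(additive)", data=df).fit()
print(sm.stats.anova_lm(full, typ=1))

# Reduced model: dropping car pools its df and SS into the residual (Table 7.4).
reduced = ols("emissions ~ C(driver) + C(additive)", data=df).fit()
print(sm.stats.anova_lm(reduced, typ=1))
```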

Discussion

This example illustrates a larger truth: findings in a tightly controlled laboratory experiment may not carry over to a much noisier environment, especially one involving people who have a myriad of unpredictable ways to use and abuse the scientist’s or engineer’s creations. Variability happens! Anybody who makes or sells consumer products knows this. We saw this phenomenon manifested in the case study in Chapter 1 and in the textile production example in Chapter 6. Machines were consistent; human operators were not (because of human nature, not deliberate actions). One thing, though: the results of this experiment validated the research director’s aversion to running the experiment with one driver only. It also validated his recognition that the additives needed to be evaluated in realistic driving situations, not just in the lab. His company would like eventually to market their additive to millions of drivers. Betting the company’s bottom line on only lab data, or on one driver’s data, is not an acceptable risk. Inadequate market research has torpedoed more than one product. Remember New Coke? Remember the Edsel? (Well, probably not.)

Follow-on experiments

There are various ways to extend this emissions experiment. These include:

  1. Repeat the exact same Latin square experiment: same drivers, cars, and additives, same test and measurement protocols. This would provide a direct measure of repeatability—the variability between complete repetitions of the experiment.
  2. Run the same Latin square again but with four different drivers.
  3. Run a different Latin square, by rerandomizing the assignments of factor levels, with the same drivers, cars, and additives. Choose the second Latin square so that the combined set of 32 runs will help one separate out some of the interactions. (The combinatorial problems that can be worked in this situation are beyond the scope of this text.)
  4. Run a different Latin square with four new drivers, same four cars and additives.
  5. Run the 48 runs needed to complete the full 4³ = 64 combinations of drivers, cars, and additives. The design can be analyzed as a complete three-way classification. Note, though, that this two-stage design (and randomizations) is not the same as an RBD with 16 blocks and four treatments assigned randomly to four experimental units in each block. The analysis would need to reflect the staging of the experiment which, in effect, is an additional blocking factor.

Here, the additive manufacturer is considering augmenting the first Latin square with another Latin square to help resolve some of the questions raised by the first experiment. The lapse in time between the two experiments could introduce some new sources of variation. The drivers might say, “You mean you want me to take four more drives over the same course? Boring.” (The test director might want an observer to ride with each driver to assure protocol is followed and to collect ancillary data—for example, traffic conditions.) In hindsight, it might have been good to consider the alternative of a replicated Latin square design (total of 32 runs) which would fall between the 16-run Latin square and the 64-run randomized block.

Some of these replicated Latin square designs are used in the context of “repeated measures” designs discussed later in this chapter. In these designs, individual experimental units are measured repeatedly. They may also have different treatments applied to them sequentially.

Exercise

Write out the ANOVA tables (Source and df) for alternative designs 1, 2, and 5 in the above list.

Extensions

The emission example had two blocking factors and one treatment factor. Other experimental situations may have one blocking factor and two treatment factors. This turns the design into an incomplete block design. For the 4 × 4 Latin square, we would have four blocks of four experimental units and 16 treatments. The Latin square layout would define which four treatment combinations are assigned to each block, and those four combinations would be randomly assigned to the four experimental units (runs, in this case) in a block.

Latin squares of any size can be constructed. Cochran and Cox (1957) catalog some designs up to 12 × 12. Not all physical situations lend themselves to having three factors with the same number of levels. However, there are some tricks to pull. For example, in the emissions situation, another candidate design could have been a 5 × 5 Latin square with five cars and five drivers. The treatments could have been the three additives and two control runs.

In the basic Latin square design, the three factors are generically called rows, columns, and treatments. It is possible that any of these three generic factors could be factorial combinations of other factors. For example, a 6 × 6 Latin square might be run in which the six treatments were the six combinations of a three-level and a two-level factor. Then, the 5 df for treatments in the ANOVA could be separated into, say, factor F1 with 2 df, factor F2 with 1 df, and F1 × F2 interaction with 2 df.

Another extension is to add a fourth factor. If the Latin square is of dimension at least 3 × 3, this can usually be done in a balanced way (the curious exception is the 6 × 6 square, for which no such balanced arrangement exists). This four-factor design is called—are you ready?—a Graeco-Latin square. Table 7.5 gives a 4 × 4 Graeco-Latin square design.

Table 7.5 4 × 4 Graeco-Latin Design.

        Column
Row     1    2    3    4
1       A1   B3   C4   D2
2       B2   A4   D3   C1
3       C3   D1   A2   B4
4       D4   C2   B1   A3

The alphanumeric characters in the table indicate the combinations of treatment factors to be run in each row/column combination. For example, the upper left entry of A1 means that, say, when driver 1 uses car 1, the treatment combination will be factor1 at the A level and factor2 at the “1” level. Note that each number occurs once in each row and column and once with each letter. This balance means that in the ANOVA, we can separate the SS into rows, columns, Factor1, and Factor2, each with three df. This leaves 3 df for error. Under the (strong) assumption of no interactions among the blocking and treatment factors, this experiment provides clean estimates of the effects of all four factors.

In fractional factorial terminology, the 4 × 4 Graeco-Latin square is a 1/16th fraction of a 4⁴ set of factor combinations.
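The balance claims are easy to verify by machine. Here is a small Python check, using the cells of Table 7.5, that the letters and the numbers each form a Latin square and that the two squares are orthogonal: each letter/number pair occurs exactly once.

```python
# Cells of Table 7.5: letter = factor1 level, number = factor2 level.
table = [["A1", "B3", "C4", "D2"],
         ["B2", "A4", "D3", "C1"],
         ["C3", "D1", "A2", "B4"],
         ["D4", "C2", "B1", "A3"]]

def is_latin(square):
    """Each symbol appears exactly once in every row and every column."""
    n = len(square)
    rows_ok = all(len(set(row)) == n for row in square)
    cols_ok = all(len({square[r][c] for r in range(n)}) == n for c in range(n))
    return rows_ok and cols_ok

letters = [[cell[0] for cell in row] for row in table]
numbers = [[cell[1] for cell in row] for row in table]
pairs = {cell for row in table for cell in row}

print(is_latin(letters))   # True: the letters form a Latin square
print(is_latin(numbers))   # True: the numbers form a Latin square
print(len(pairs) == 16)    # True: all 16 letter/number pairs are distinct
```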

In the Table 7.5 design, the second treatment factor could be the order of the runs. In the preceding Latin square experiment, each driver’s four runs were done in a random order. Alternatively, we could balance the run orders using this Graeco-Latin square. Thus, driver 1’s first run would be car (column) 1 using additive A. Her second run would be car 4 using additive D, then car 2 with B, and then car 3 with C.

Latin square and Graeco-Latin square designs offer a chance to evaluate the effects of three or four factors with a minimum of runs. They yield clean estimates of the effects of these factors only if there are no interactions among any of them, as is the case with fractional factorial arrangements in any situation. However, as is stated in BHH (2005, p. 160), to use a Latin square design to study process factors known to interact is an “abuse” of the design. Subject-matter knowledge is required to avoid this abuse and the resulting misleading conclusions.

Split-Unit Designs

Consider again the commercial-scale tomato fertilizer experiments discussed in Chapter 3. Suppose it was decided to run the experiment with 300 plants treated with Fertilizer A and 300 plants treated with Fertilizer C. Rather than randomly assign fertilizers to individual plants, suppose it was decided that the experimental unit would be a plot with 30 plants (perhaps a 6 × 5 grid, perhaps a row). Fertilizers would be applied simultaneously to these groups of contiguous plants, which is much more convenient than applying fertilizer one plant at a time. Now, the experiment would have a total of 20 experimental units (each consisting of 30 contiguous plants), and each fertilizer would be randomly assigned to 10 of these eus.

Suppose that the experimenter decides that the amount of fertilizer is another important factor. If the experiment is done with just one level of fertilizer and we see a difference between Fertilizers A and C, could it be that the same difference would occur if we had used either a higher or lower amount of fertilizer? Also, will applying more fertilizer lead to more tomatoes? If so, will increased yield offset the increased expense? I don’t want to wait until the next crop and run experiments at a different level of fertilizer. Can we vary the amount of fertilizer in the current design? In particular, could we consider three levels of fertilizer, say, low, medium, and high? Inquiring (scientific) minds want to know. The tomato mogul asks his FLS, “How can we include fertilizer amount in the experiment?”

The experiment now has six treatments: the six combinations of two fertilizers each at three levels. The FLS comes up with two experimental designs.

Let’s suppose we start over and redefine the experimental unit structure as groups of 10 plants. Our layout would then have 60 eus. With each fertilizer at three amounts, this makes a total of six treatments. We would then randomly assign each of these six fertilizer/amount combinations to 10 eus each. This would be a completely randomized experiment with six treatments in a 2 × 3 structure with 10 replicates (of groups of 10 plants) of each.

On the other hand, suppose we split each eu of 30 plants that have already been randomly assigned a fertilizer into three subunits of 10 contiguous plants each. Within each experimental unit, then, we would randomly assign the three fertilizer levels to one subunit each. This way, we could measure the effect of fertilizer level within each experimental unit (contiguous group of 30 plants). The variability among the three subunits within an experimental unit should be less than the variability among the experimental units over the whole field. Thus, we ought to be able to estimate the effect of fertilizer level more precisely than with the completely randomized experiment.

In this second experimental design, the design at the subunit level is a randomized block design. Each “main-plot” unit (30 plants) is a block of three “subplot” units, randomly assigned the three levels of the fertilizer assigned to the main plot. We started with a completely randomized design for the fertilizer factor and then, in essence, embedded a randomized block design for the level factor. This gives us an experiment with two sizes of experimental units and with two levels of randomization. Figure 7.3 illustrates the experimental structure and randomization. It shows a subset of the main units and their fertilizer assignment; then within each main unit, the three subunits are shown with their randomly assigned levels of fertilizer.


Figure 7.3 Schematic Illustrating Split-Unit Design for Tomato Fertilizer Experiment. The design has 20 main-plot units, with each fertilizer randomly assigned to 10 units. Each main-plot unit is divided into three subplots, and the three fertilizer levels are randomly assigned to one subplot unit each.
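Here is a sketch of this two-stage randomization in Python. It is illustrative only: the seed, the level names, and the printed layout are mine, not an actual field plan.

```python
import random

rng = random.Random(42)   # arbitrary seed, for reproducibility

# Stage 1: randomize the two fertilizers to the 20 main-plot units.
fertilizers = ["A"] * 10 + ["C"] * 10
rng.shuffle(fertilizers)

# Stage 2: within each main plot, randomize the three amounts to subplots.
layout = []
for plot, fert in enumerate(fertilizers, start=1):
    amounts = ["low", "medium", "high"]
    rng.shuffle(amounts)
    layout.append((plot, fert, amounts))

for plot, fert, amounts in layout[:3]:   # show the first few main plots
    print(f"main plot {plot}: fertilizer {fert}, subplot order {amounts}")
```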

This hybrid design is called a split-unit or split-plot design. The latter term reflects the design’s origins in agricultural experiments such as the tomato fertilizer experiment, where plots of land are split into subplots. Split unit is a more general term: one experimental unit is split into subunits to which subsequent treatments are applied. Manufacturing processes often involve a number of sequential steps, and such situations make split-unit experimental designs feasible and practical. At selected steps, the material being processed can be split into subbatches for the application of factors involved in the next step.

Split-unit experiments can be difficult to recognize. A spreadsheet of data can look like any of a variety of multifactor experiments. It can take a lot of detective work to find out what factors, if any, are blocking factors and which are treatment factors and, critically, what were the experimental units for the application of the treatment factors. You have to understand the experimental protocol before you can make sense of the data and extract any message hidden in that cloud (see front cover). The following example is a case of aggregated units, rather than split units. One treatment factor is applied to individual units; then subgroups of units are aggregated for the application of the second treatment factor. So, the experiment has treatment factors that are applied to different sizes of experimental units via aggregation, rather than splitting. Recognizing this is key to running the correct ANOVA.

Example: Corrosion Resistance

I again turn to the classic Box, Hunter, and Hunter (2005) text for an example. As always, the following story is my own embellishment.


A team of chemists in a chemical products company is charged with developing coating materials that will prevent corrosion on structures such as buildings and bridges made of steel. Protection from corrosion is important to assure structural integrity and to minimize the amount of inspection and maintenance that is required.

Coating is applied to steel parts by spraying the coating on a part and then curing (baking) the part at a specified temperature for a specified time. At this stage of their investigation, the chemists have selected four coating materials to compare. They have settled on what the curing time should be, but they want their experiment to include curing temperature as an experimental variable to be investigated. The reason for this objective is that an ideal coating material should provide good corrosion prevention over a range of curing temperatures. This is an important characteristic because steel companies and other customers for their coatings will not necessarily be able to control curing temperature very precisely and consistently. A coating that is effective when cured at 370°F, say, but ineffective if cured at 360°F or 380°F is not a good product, particularly if the user’s furnace cannot control temperature that precisely. The terminology sometimes used is to say that such a coating is not “robust.” Robust products mean better structures, happier customers, and more sales. That’s the ultimate goal. More discussion of robust designs is given later in this chapter.

Developing and improving chemical products is a statistics-rich environment. Efficient experimentation is essential to success. Thus, large, successful chemical companies have in-house departments of friendly, local statisticians. Smaller companies often hire university or private statistical consultants, which is the case here.

Two chemists meet with a statistician from the local university. They describe the coating and curing processes and tell the statistician that they have 24 steel bars to use in the experiment. They say that after a bar is coated and cured, it is subjected to a corrosive environment, and then corrosion resistance is measured by standard techniques.

After some discussion, it is decided to experiment at three temperatures, 360, 370, and 380°F, for reasons discussed earlier. The statistician suggests that they run all 12 treatment combinations, four materials at three temperatures, on two bars each. The treatment combinations will be randomly assigned to bars and run in a random order. Thus, the design would be a completely randomized experiment with 12 treatments applied independently to two randomly selected experimental units each. The FLS even goes so far as to lay out the random treatment assignment and a random order of experimentation. He also stresses the importance of true replication: between each run, the coating and curing processes must be shut down and restarted in order to really have multiple independent applications of treatments to experimental units.

The chemists frown. “It takes a lot of time to shut down the temperature chamber and restart it. Besides, our chamber will hold more than a single bar. If we cure several bars at a time, we’ll save a lot of time and expense.” They suggest putting two bars with each coating in the chamber for a given temperature setting. Thus, they could do eight bars at 360°F, say, as one “heat,” another eight at 370°F, and the remaining eight at 380°F. Coatings would be randomly assigned to bars within each heat group. That way, the experiment could be done in three heats, rather than 24. “Neat, huh?”

“I’m sorry,” says the FLS (searching for a polite way to say that this is a bad idea). “With only one heat at each of the three temperatures, your experiment won’t have any replication of the curing-temperature treatment. We won’t be able to estimate the inherent variability of the process and won’t be able to tell whether any apparent differences among temperatures are real or random.”

“I have an idea, though,” he says. “How about if we do heats of four bars—one with each coating material? This will mean a total of six heats, two at each temperature, thus (minimally) replicating the temperature treatment. We should randomly assign temperatures to groups of four bars and randomly order these six heats. We also should randomize the oven positions for the four bars in each heat. How about this as a workable compromise?”

“Well, OK,” say the chemists. “But could we run the heats in this order: 360, 370, 380, 380, 370, 360? Ramping up and then ramping back down will save us time.”

The FLS is not too happy with this plan, but he doesn’t want to push too hard. He asks, “In your experience is there any carry-over effect, or hysteresis, when you do this?” “No, don’t think so,” respond the chemists. So, that is the way the experiment is run.

Note that this experiment has treatments applied to two different experimental units. It is not a completely randomized design. Coating materials are assigned to single bars with six bars being randomly assigned to each of the four coatings. Thus, for the coating treatment, the design is a CRD, and a single bar is the experimental unit that gets the randomly assigned coating material. However, the curing-temperature treatment is applied to groups of four bars; thus, the group of four bars is the experimental unit for the temperature treatment. We therefore have a split-unit design, in reverse. We got this experimental structure by aggregating subunit experimental units (the coated bars), rather than splitting main units, as in the aforementioned agricultural example. The result is the same, though: different experimental units for different treatment factors in the same experiment. Our analysis will have to reflect that experimental structure.

Table 7.6 Corrosion-Resistance Experiment Data.

Reproduced from BHH (2005, Table 9.1, p. 336), with permission of John Wiley and Sons.

Temp. (°F)  Heat  Coating
                  1     2     3     4
360          1     67    73    83    89
             6     33     8    46    54
370          2     65    91    87    86
             5    140   142   121   150
380          3    155   127   147   212
             4    108   100    90   153

Analysis 1: Plot the data

Figure 7.4 gives a plot of the corrosion-resistance measurements versus the order of the heats, by coating material. The temperatures are also shown, ordered in the sequence in which they were done, as discussed. All 24 data points are shown in Figure 7.4, and each point is identified by its coating, curing temperature, and heat order. Thus, all dimensions of the data are captured in the plot.


Figure 7.4 Corrosion Resistance Plotted as a Function of Heat, Curing Temperature, and Coating.

Figure 7.4 exhibits some erratic behavior. The two 360°F heats, heats 1 and 6, differ substantially, as do the pairs of heats at 370 and 380°F. There is little difference between heats 1 and 2, run at 360 and 370°, respectively, while on the ramp down, there is a large difference between these two temperatures in heats 5 and 6. The data look like the chemists might not have been able to control the heat chamber temperature closely enough to be sure that they were getting reliable results. For example, consider heats 4 and 5. The lower temperature (370) yielded better resistance than the higher temperature (380); subject-matter knowledge says that this should not happen. For heats 2 and 3, run at the same pair of temperatures, the order is reversed. If there is a temperature control problem in the experiment, this is embarrassing, given that the chemists picked the temperatures out of concern that their customers may not be able to control temperature closely enough to get consistently good corrosion resistance. Or, maybe some data were mislabeled. The pattern would make more sense if the heat 4 and 5 data were interchanged. It’s all a little disturbing, but, unfortunately, we cannot go to the source and pursue these questions.

The picture with respect to coatings looks more informative. Coating 4 is fairly consistently the best—the highest or near-highest corrosion resistance in each heat, especially the two 380°F heats.

To show the relationship of corrosion resistance to temperature, Figure 7.5 gives scatter plots of those two variables for each coating separately. The data points are labeled by the heat number to show how the bars are grouped in heats.


Figure 7.5 Plot of Corrosion Resistance versus Curing Temperature by Coating.

Figure 7.5 shows that there is some improvement of corrosion resistance as temperature is increased, but the pattern is different for the four coatings. This evidence of coating by temperature interaction is more clearly seen in an interaction plot (Fig. 7.6).


Figure 7.6 Interaction Plot of Corrosion Resistance Averages versus Temperature by Coating.

In Figure 7.6, each plotting point is the average corrosion resistance over the two heats run at the selected temperature. This figure shows that Coating 4 has markedly better corrosion resistance when cured at 380°F than do the other three coatings which have corrosion resistances that level off between 370 and 380°F.

ANOVA

Now, let’s construct the ANOVA for this experiment. At the main-unit level, the experimental unit is a “heat,” which is a group of four bars, all simultaneously cured at one temperature. There are six such main units; two for each of the three temperatures. Thus, at this level, the experiment is a completely randomized design with three treatments and two replicates for each. This part of the ANOVA therefore has the following structure.

Main-Unit ANOVA.

Source df
Temperature 2
Error1 3

The ANOVA entry labeled Error1 is pure main-unit experimental error: it is the variability between the two heats at each temperature, pooled across the three temperatures. I use the label Error1 to distinguish this level of experimental variability from the subunit variability, which comes next.

The subunit in this experiment is a single bar, and there are four in each main-unit group of bars (a heat). Each of the bars in each heat was randomly assigned one of the four coating materials. Thus, for the subunit part of the experiment, the design is a randomized block design with six blocks (the heats) of four experimental units (individual bars), and as is the case for a randomized block design, each of these subunits is randomly assigned one of the four coating materials. Each coating is applied to one bar in each “block.” Thus, the subunit ANOVA has the structure of a randomized block design with six blocks, four treatments, and only one replicate of each treatment in a block, as follows.

Subunit ANOVA.

Source df
Heats  5
Coatings  3
Error2 15

There is overlap between these two ANOVAs. The 5 df for heats in the subunit ANOVA are the same 5 df in the main-unit ANOVA, separated there into temperature (2 df) and heats within temperatures (3 df), which is Error1. Error in the subunit ANOVA is the heat by coating interaction—the variability of coating differences from heat to heat. That error needs to be further separated.

The error in the subunit ANOVA can be further resolved because of the structure of the heats: three temperatures, two main units for each. That is, the six heats were not nominally identical runs as was the case in Chapter 6 for the batches of material used in the penicillin production experiment. Part of the heat by coating interaction is actually the temperature by coating interaction, with 6 df (of error’s 15 df). The remaining 9 df consist of heat by coating interaction (3 df) within each temperature. (In each temperature, there are two heats crossed with four coatings. Thus, there are (2 − 1) × (4 − 1) = 3 df for interaction.) Pooling these interactions across the three temperatures makes up the 9 df. This entry is labeled Error2, the subunit residual error. The full ANOVA has the structure shown in Table 7.7.

Table 7.7 ANOVA Structure for Corrosion-Resistance Experiment.

Source df Main unit
Temp  2
Error1  3
------------------- ------ ------------
Coating  3 Subunit
Temp × Coating  6
Error2  9
Total 23

The full ANOVA for the corrosion-resistance data is given in Table 7.8. Some statistical software is capable of separating the main unit and subunit ANOVAs, particularly the two error terms. Note the difference in the two error mean squares. Alternatively, one can run a one-way ANOVA for the temperature and heats within temperature and a two-way ANOVA on temperature and coating to get the entries in the Table 7.8 ANOVA. Error1 is the variability among heats (groups of four bars). Error2 is the variability among bars within the same heat.

Table 7.8 ANOVA for Corrosion-Resistance Split-Unit Experiment.

Source DF SS MS F P
Temp 2 26 519 13 260  2.8 .21
Error1 3 14 440 4813 38.7 .00
--------------------------------------------------------------------------------------------------------------------
Coating 3 4289 1430 11.5 .002
Temp*coating 6 3270 545  4.4 .024
Error2 9 1121 125
Total 23 49 639
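Here is a sketch, assuming the pandas and statsmodels packages, of how the Table 7.8 analysis can be computed from the Table 7.6 data. The trick is to label the two heats within each temperature as replicates so that the C(temp):C(rep) term carries the 3-df main-unit error (Error1); general-purpose ANOVA routines will not automatically test temperature against that term, so the whole-unit F-ratio is formed by hand.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Corrosion-resistance data from Table 7.6, keyed by (temperature, heat);
# each list holds the resistances for coatings 1-4.
heats = {
    (360, 1): [67, 73, 83, 89],     (360, 6): [33, 8, 46, 54],
    (370, 2): [65, 91, 87, 86],     (370, 5): [140, 142, 121, 150],
    (380, 3): [155, 127, 147, 212], (380, 4): [108, 100, 90, 153],
}
rep_of = {1: 1, 6: 2, 2: 1, 5: 2, 3: 1, 4: 2}   # replicate label within temp
rows = [(t, h, rep_of[h], c, y)
        for (t, h), ys in heats.items() for c, y in enumerate(ys, start=1)]
df = pd.DataFrame(rows, columns=["temp", "heat", "rep", "coating", "y"])

# C(temp):C(rep) spans heats-within-temperature (Error1, 3 df);
# the residual (9 df) is the subunit error (Error2).
m = ols("y ~ C(temp) + C(temp):C(rep) + C(coating) + C(temp):C(coating)",
        data=df).fit()
aov = sm.stats.anova_lm(m, typ=1)
print(aov)

# Temperature must be tested against Error1, not Error2:
print("F(temp) =",
      aov.loc["C(temp)", "mean_sq"] / aov.loc["C(temp):C(rep)", "mean_sq"])
```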

Often, it is the case that main-unit variability is larger than subunit variability. The two error MSs for these data have this relationship: the Error1 MS is about 40 times the Error2 MS. This reflects the patterns seen in Figure 7.4. The connected lines for each coating are roughly parallel (small interaction), while the two heats within each temperature differ substantially.


Interpretation of the ANOVA starts with the F-ratio and P-value for the temperature by coating interaction. The small P-value reflects the pattern in Figure 7.6 in which the Coating 4 increase in corrosion resistance at 380°F differs from the other three coatings. The ANOVA confirms (as it should) the visual impression in Figure 7.6. If maximizing corrosion resistance is the objective, subsequent analyses such as confidence and prediction intervals would focus on Coating 4 and its temperature dependence. The producer of Coating 4 would instruct users to cure the coating at 380°F. However, this experiment started with the objective of finding a coating that delivered adequate corrosion resistance across the whole temperature range. Such “robustness” would enable a customer to produce corrosion-resistant steel even if they were not able to control temperature accurately within the 360–380°F range considered in the experiment. To address this issue, we need a definition of what minimum corrosion resistance is considered necessary. I won’t make up that part of the story.

My data-snooping habits would lead me to look deeper into why there is so much variability between the two heats run at each temperature. Could the assigned temperatures have been missed or the data mislabeled? For example, if heats 4 and 5 were reversed, the pattern in Figure 7.4 would be more like what I would expect, at least for the 370–380–380–370 middle four heats. The dismal performance of heat 6 at 360°F for all four coatings makes me wonder if the actual temperature was 360°F or if the curing time was not run as prescribed. Sometimes experiments raise more questions than they resolve. That’s why the sainted Sir Ronald A. Fisher (1938) stated,

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

The post mortem continues—there are two other aspects of this experiment that might be at least partially responsible for the unusual variability seen in the experiment:

  1. The four coatings are actually defined by two other factors, B, the base treatment, and F, the finish, each at two levels.
  2. The four bars were randomly assigned to four positions in the furnace (see Table 9.1 in BHH (2005, p. 336)).

The two-factor structure of the coatings means that the three df for coating in the ANOVA can be separated into the ANOVA entries of B, F, and B × F, each with 1 df. When this is done (it is shown in BHH), the interaction is quite significant, meaning that the four coatings can be treated as four distinct coatings, as I have done. There are no consistent B or F effects to explain the coating differences.

Additionally, the temperature by coating interaction in the ANOVA can be separated into T × B, T × F, and T × B × F components, each with 2 df. In this case, only the T × B × F interaction is significant. High-order interactions are a sign of erratic patterns in the data.

The random, and hence unbalanced, assignment of bars to positions could also be a contributor to the variation in the data. I will leave it to the intrepid instructor or student to investigate this aspect of the data. The position and coating assignments are in Table 9.1a on p. 336 of BHH (2005). Another potential exercise is to redesign the experiment so that a possible position effect could be evaluated and balanced, so that the coating effects could be more cleanly estimated.

Discussion

As stated prior to this example, it is often difficult to recognize split-unit designs after the fact. You certainly cannot detect splitting and grouping from a data table. Those who plan and conduct the experiment and know where experimental units were split or merged along the way may not appreciate the effect these actions have on the subsequent statistical analysis. When the data are dropped on the FLS’s desk, the full story may not be told. I have spent a lot of time asking experimenters polite questions, trying to reconstruct the scene of the crime, so to speak. Presented with the data organized as in Table 7.6, I would ask what this heat variable is and perhaps soon find out that the four bars in each heat were cured simultaneously, while coatings were applied to individual bars. If heat had not been shown in the table, I would probably note that there were consistent differences between the groups of four bars in each temperature and wonder if the experimenters had just arranged the data that way or if these differences reflected something about how the experiment was conducted.

Now, let us suppose (erroneously!) that the FLS asked to analyze the data looked at the table of data and said, “That looks like a completely randomized design with 12 treatments, consisting of combinations of four coatings and three temperatures, with two experimental units in each treatment combination” (kind of like the poison/antidote experiment in Chapter 5). The ANOVA would then be that of a two-way classification with two replications, as shown in Table 7.9.

Table 7.9 Incorrect ANOVA of Corrosion-Resistance Experiment: Split-Unit Design Structure Not Recognized.

Source DF SS MS F P
Temp 2 26 519 13 260 10.2 .003
Coating 3 4289 1430 1.10 .39
Interaction 6 3270 545 .42 .85
Error 12 15 561 1297
Total 23 49 639

This ANOVA merges the whole-unit error and subunit error that were properly separated in the split-unit ANOVA in Table 7.8. The whole-unit oranges are mixed with the subunit grapes into a bad-tasting fruit salad. Taking the analysis at face value, one would conclude that there is no temperature by coating interaction (i.e., the coating differences are consistent across temperatures, relative to the experimental error) and no overall differences among coatings. Wrong! Temperature, as we have seen in our careful and correct analysis, has a strong effect on corrosion resistance, particularly for Coating Material 4. The conclusions drawn from the correct split-unit ANOVA are just the opposite (!): no difference among temperatures, and coating differences that are not consistent across temperature. (Jones and Nachtsheim (2009) also contrast the correct and incorrect analyses of this experiment and provide much more information on split-plot or split-unit designs.)
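Continuing the sketch above (it reuses df, ols, and sm from that code), the incorrect analysis of Table 7.9 corresponds to simply leaving out the heat structure, which pools Error1 and Error2 into a single 12-df residual:

```python
# Wrong: ignores that temperature was applied to heats (groups of four bars),
# not to individual bars, so the two error strata get pooled.
wrong = ols("y ~ C(temp) * C(coating)", data=df).fit()
print(sm.stats.anova_lm(wrong, typ=1))
```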

The morals of this story:

  1. Design matters.
  2. I’ll say it again: you can’t tell how an experiment was conducted just by looking at a data table.

Split-unit designs can be used in settings other than agricultural or industrial. For example, education researchers might evaluate teaching materials and testing methods as follows: first, some number of classes (defined, e.g., by location, grade level, and teacher) would be selected for the experiment. These might be chosen to be reasonably homogeneous, or they might be selected to span some range of demographic characteristics, depending on context and issues that the experimenters want to consider. The different teaching materials under consideration would be assigned to classes, either completely at random or randomly within demographic or geographic blocks. At the end of the instructional period, each class would be randomly divided into subgroups of students, and then the testing methods would be randomly assigned to these subgroups. (This assignment should be controlled by the experimenters. If test-method assignment is left to individual teachers, some bias could creep in—well-meaning teacher: “I think Susie will do best if she is tested by Method B.”)

Thus, in this experiment, for the treatment factor of teaching material, the experimental unit is a class. For the treatment factor of testing method, the experimental unit is a subgroup within a class. Conceptually, this is the same as the tomato grower’s experiment with fertilizer types and amounts discussed earlier in this section. (“The nice thing about statistics is that the nouns may change, but the verbs stay the same,” said Prof. Carl Marshall (Oklahoma State University, ca. 1965).)

Additionally, in this experiment, the researchers are apt to record many demographic variables related to schools, teachers, and students, and such data may help explain whatever differences are found in the effect of teaching materials and testing methods. We may find out that Susie would do best under Method C.

Repeated Measures Designs

There are a variety of experimental situations in which the experimental units in an experiment are measured repeatedly over the course of the experiment. Some examples are as follows.

In a marketing experiment that compares three product displays and their effects on sales, suppose that 12 stores have been selected for the experiment. The experimental protocol would be to put up one display for a 2-month period, and monthly sales would be recorded. Then another display would be put up for another 2-month test period, and then the third display would be put up for its 2-month test period. Thus, each experimental unit (store) is subjected sequentially to each of three treatments (product displays). It is entirely possible that the order of presentation could be important, so the display orders could be balanced. There are six ways to order the three displays. Two stores each would be randomly assigned to each order.

As another example, consider a missile component that is required to operate successfully under a variety of environmental conditions, such as low temperature, high temperature, vibration, and mechanical shock. Each component in the experiment would be exposed to all of these environments in an assigned order, and performance characteristics would be measured during or after each environment. Subject-matter context could specify a particular order or the experiment could be designed, as in the marketing experiment, so that all possible orders are tested. Another possible extension is that each component would be exposed to multiple sequences of environments.

In medical experiments, subjects are often treated and measured repeatedly over time to evaluate and compare different medical interventions and also to see if there was a time trend in the responses.

Experimental designs for situations in which experimental units are measured repeatedly under various protocols for assigning treatments before or during the experimental period are called repeated measures designs. As we will see, the data layout and analysis can resemble randomized block or split-unit designs. However, they differ in that “time” is not a treatment that can be randomly assigned in a repeated measures design. If a subject’s cholesterol is measured monthly for 12 months, “month” is not a treatment factor that can be randomly assigned. Time marches on in order, not randomly. Because randomization protocols define an experimental design, not just the data table layout, repeated measures experiments are not “the same as” the randomized block or split-unit families of designs. The experimental units in a repeated measures experiment could either be a single group of eus or groups of eus in various blocks. Rather than a single response measurement on each eu, a suite of response measurements is made on each eu.

Example: Effects of drugs on heart rate

Authors George Milliken and Dallas Johnson (2009) in their book with the conflicted title, Analysis of Messy Data, Volume 1: Designed Experiments (I say “conflicted” because in my view experiments are designed so as not to lead to “messy data”; however, missing data or botched protocols can result in data that do not have the balance and structure of the intended design—hence messy data), provide an example of a repeated measures experiment that is not messy.

Twenty-four female human subjects (not another experiment on rats, you may be relieved to know) have been selected for an experiment aimed at evaluating the effect of two prescription drugs on heart rate. After a drug is administered, heart rate is measured four times at 5 min intervals. The purpose, I conjecture, could be twofold. An elevated heart rate could be a potential undesirable side effect that the experimenters want to find and evaluate. On the other hand, if the objective of the drugs is to increase a subject’s heart rate, then questions of interest would be the rapidity with which the heart rate becomes elevated and the length of time that it stays at an elevated level.


The treatment assignment was done by randomly dividing the group of subjects into three groups of eight people: one to receive Drug A, one to receive Drug B, and the third group to receive a placebo (labeled Drug C, for control).

If I had been involved in planning this experiment, I would have recommended that each subject’s heart rate be measured, perhaps four times at 5 min intervals, before the drug is administered and then afterward, as planned. This way, each subject would serve as her own control, so that drug effects could be measured within subjects and evaluated against within-subject experimental error (à la the boys’ shoes experiment). This experiment, though, was done with a control group of subjects, so in the experiment under consideration, the drug effects will have to be evaluated against among-subject variability.

The data for this experiment are given in Table 7.10.

Table 7.10 Heart Rate Data. a

Source: Reproduced from Milliken and Johnson (2009, Table 26.4, p. 506), with permission of Chapman and Hall/CRC Press.

Subject Drug t1 t2 t3 t4
 1 A 72 86 81 77
 2 A 78 83 88 81
 3 A 71 82 81 75
 4 A 72 83 83 69
 5 A 66 79 77 66
 6 A 74 83 84 77
 7 A 62 73 78 70
 8 A 69 75 76 70
 9 B 85 86 83 80
10 B 82 86 80 84
11 B 71 78 70 75
12 B 83 88 79 81
13 B 86 85 76 76
14 B 85 82 83 80
15 B 79 83 80 81
16 B 83 84 78 81
17 C 69 73 72 74
18 C 66 62 67 73
19 C 84 90 88 87
20 C 80 81 77 72
21 C 72 72 69 70
22 C 65 62 65 61
23 C 75 69 69 68
24 C 71 70 65 63

a Table entries are the heart rates at 5 min intervals for each subject after taking the indicated drug.

Analysis 1: Plot the data

The data are shown in Figure 7.7, a connected scatter plot of pulse rate versus time, by drug and subject. (The data points are linked by subject; subject identifiers are included only so that any outliers in the data can be traced to the particular subject.)

Figure 7.7 indicates that Drug A subjects’ pulse rates rose and then fell over the 20 min test period, while Drug B subjects’ pulse rates were elevated (compared to the control group, C) over the whole period. The mostly consistent rise and fall of pulse rate for all eight Drug A subjects over the four measurements, though, seems a little unusual. The placebo, Drug C, as would be expected of a proper placebo, shows a generally flat pattern at a lower average pulse rate than Drug B. These differences can be seen more clearly in the interaction plot for drugs and time (averaged over subjects) in Figure 7.8.


Figure 7.8 Average Pulse Rate versus Time by Drug.

Figure 7.8 shows considerable interaction: the average pulse rate versus time patterns are markedly different for the three drugs.

The ANOVA structure for this experiment is the same as for the corrosion-resistance split-unit experiment (under certain simplifying assumptions about the variation across time periods within subjects). At the subject level, the design of this experiment is a completely randomized design with three treatments, each applied to eight experimental units. Within each drug, there is a two-way classification of the responses, subjects (8) by times (4). The ANOVA will have two error terms: among subjects (Error1) and within subjects (Error2). Table 7.11 gives the ANOVA and confirms what our eye sees. The significant drug by time interaction confirms the Figure 7.8 visual impression that the drug effects over time are not at all consistent over the three drugs; they are not just manifestations of the random variability among subjects.

Table 7.11 ANOVA for Drug/Pulse Rate Experiment.

Source df SS MS F P
Drug 2 1333 666.5 6.0 .009
Error1 21 2337 111.3 15.0 .000
Time 3 290 96.5 13.0 .000
Drug × time 6 527 87.9 11.8 .000
Error2 63 469 7.5
Total 95 4957
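
For readers who want to check the arithmetic, here is a minimal numpy sketch that computes the two-error-term decomposition directly from the Table 7.10 data. The sums-of-squares formulas follow the split-unit-style structure described above; this is an illustrative script, not the authors’ analysis code.

```python
import numpy as np

# Heart rate data from Table 7.10: 8 subjects per drug, 4 times per subject.
rates = {
    "A": [[72,86,81,77],[78,83,88,81],[71,82,81,75],[72,83,83,69],
          [66,79,77,66],[74,83,84,77],[62,73,78,70],[69,75,76,70]],
    "B": [[85,86,83,80],[82,86,80,84],[71,78,70,75],[83,88,79,81],
          [86,85,76,76],[85,82,83,80],[79,83,80,81],[83,84,78,81]],
    "C": [[69,73,72,74],[66,62,67,73],[84,90,88,87],[80,81,77,72],
          [72,72,69,70],[65,62,65,61],[75,69,69,68],[71,70,65,63]],
}
y = np.array([rates[d] for d in "ABC"])        # shape: (3 drugs, 8 subjects, 4 times)

grand = y.mean()
ss_total = ((y - grand) ** 2).sum()                               # df = 95
drug_m = y.mean(axis=(1, 2))
ss_drug = 32 * ((drug_m - grand) ** 2).sum()                      # df = 2
subj_m = y.mean(axis=2)
ss_err1 = 4 * ((subj_m - drug_m[:, None]) ** 2).sum()             # subjects within drugs, df = 21
time_m = y.mean(axis=(0, 1))
ss_time = 24 * ((time_m - grand) ** 2).sum()                      # df = 3
cell_m = y.mean(axis=1)
ss_int = 8 * ((cell_m - drug_m[:, None] - time_m + grand) ** 2).sum()  # df = 6
ss_err2 = ss_total - ss_drug - ss_err1 - ss_time - ss_int         # df = 63

ms_drug, ms_err1 = ss_drug / 2, ss_err1 / 21
ms_time, ms_int, ms_err2 = ss_time / 3, ss_int / 6, ss_err2 / 63
print(f"F(drug)      = {ms_drug / ms_err1:.1f}")   # whole-unit test, against Error1
print(f"F(time)      = {ms_time / ms_err2:.1f}")   # within-unit tests, against Error2
print(f"F(drug*time) = {ms_int / ms_err2:.1f}")
```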

If the purpose of the drugs was to elevate pulse rate, the interaction plot shows that both drugs were successful, but Drug B achieved the increase earlier and sustained it longer than Drug A did.

Discussion

Some of the experimental situations in the previous chapters could have been addressed with a repeated measures design. For example, consider the market research objective in Chapter 4. The objective was to compare the effects of three store displays on shampoo sales. A completely randomized design was used in which 15 stores were selected for the experiment, and each display was randomly assigned to five stores. An alternative way to design the experiment would be to have each of the 15 stores run the three displays successively with a protocol that specifies an appropriate time lag between displays. Because order could have an effect, the six different orders could be assigned in a balanced way: three of the orders could be assigned to two stores each, and three of the orders could be assigned to three stores each. Or, the experiment might have been done with either 12 or 18 total stores to keep things balanced.

In the present chapter, the Latin square experiment involving cars, drivers, and additives can be thought of as a repeated measures design. Each driver was subjected to a sequence of car/additive combinations. Learning or boredom effects were a concern in this experiment.

Extensions

An extension of the drug/heart rate experiment would be to repeat the experiment with the same subjects but assign each group to another drug. This could be done again so that every subject is tested on all three drugs. The time between tests (each with its four pulse rate measurements) would need to be long enough for the effects of the previous drug to dissipate. This is a crossover design (Cochran and Cox 1957; Oehlert 2000). This class of designs is often used in agricultural experiments. For example, plots of land in a tomato field that got Fertilizer A this year would get Fertilizer C next year, and vice versa.

Robust Designs

Introduction

The concept of robustness and the role of designed experiments were mentioned in the coating material experiment earlier in this chapter: the goal was to implement a curing process that is insensitive to curing-temperature variations within the range that a coating purchaser can maintain when using the material. The coating would have good corrosion resistance as long as the curing temperature is kept within the specified limits. Demonstrating or achieving robustness is also an objective in several examples in the preceding chapters. Producers of crop seed or fertilizers want to demonstrate that their product is effective in a variety of growing conditions, so a randomized block experiment, with blocks spanning the anticipated growing conditions, provides a means of evaluating product robustness with respect to those conditions. Producers of a fuel additive would like the benefit of the additive (reduced emissions) to be robust to driving conditions, so tests defined by a Latin square design were conducted using multiple cars and drivers to evaluate the additives and the robustness of their emission control to variation in cars and drivers. As I have tried to stress, experiments don’t end with an ANOVA and its P-values. The real payoff is what is learned and how that learning can be used to improve products, processes, and programs. Experiments aimed at achieving robustness are an important category of experiments. This section deals with a particular approach to designing experiments whose purpose is to demonstrate robustness or find sources of nonrobustness.

Variance transmission

Robustness can be expressed in terms of designing a system or process so as to minimize the variability of its output. Figure 7.9 illustrates this graphically for the design of a pendulum to be used, say, in a grandfather clock. There is a mathematical, physics-based relationship between the period of a pendulum (the time for one swing) and the pendulum’s length. The curved line in Figure 7.9 depicts that relationship; note that the curve is plotted on a semi-log scale. Pendulum lengths vary, just due to manufacturing variability, as represented by the probability distributions along the horizontal axis. The variability of lengths is transmitted, mathematically, into variability of the pendulum’s period, as shown along the vertical axis. Gears in the clock translate the pendulum’s motion into movement of the clock hands that tell us what time it is (I assume that readers have seen clocks with hands). A manufacturer will have spec limits on clock accuracy, so high variability of the pendulum’s period among clocks, due to variation in pendulum lengths, would mean that more clocks would fail the specs and have to have their pendulums replaced.


Figure 7.9 The Transmission of Variation in the Length of a Pendulum to Variability of the Pendulum’s Period.

Source: Reproduced from BHH (2005, p. 551), with the permission of John Wiley & Sons

To minimize variability of the pendulum’s period, the message from Figure 7.9 is that the pendulum should be as long as possible. Size matters. (Can I say that?) Note that the transmitted variation at x = 10 cm is considerably less than it is at x = 2 cm. That’s (in part) why tall grandfather clocks cost more than short ones or table clocks.
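
The transmitted variation can be quantified with the delta method. Here is a short sketch using the standard small-angle pendulum formula, T = 2π√(L/g); the manufacturing standard deviation of length is an assumed value chosen purely for illustration.

```python
import numpy as np

g = 980.0                                   # gravity, cm/s^2
sigma_L = 0.05                              # assumed sd of pendulum length, cm (illustrative)

def period(L_cm):
    """Small-angle pendulum period, in seconds."""
    return 2 * np.pi * np.sqrt(L_cm / g)

for L in (2.0, 10.0):
    dT_dL = np.pi / np.sqrt(g * L)          # derivative of the period with respect to length
    sigma_T = dT_dL * sigma_L               # delta-method approximation to sd of the period
    print(f"L = {L:4.1f} cm: period = {period(L):.3f} s, sd(period) = {sigma_T:.5f} s")
```

Running this shows that, for the same length variability, the standard deviation of the period at L = 10 cm is less than half of what it is at L = 2 cm, which is the message of Figure 7.9.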

In this example, the mathematical relationship between system performance (y) and one particular design variable (x) is known from physics. In more general and more complex situations, the relationship between system performance and system design is not known. As an example, in the case study in Chapter 1, the relationship between measured pull strength and the location at which a wire bond was pulled was not known. Designed experiments, though, were conducted, in essence, to estimate that relationship and then use the findings to better control the testing process.

In the 1980s, the concept of robust design received a lot of attention, thanks to the work of Genichi Taguchi (1991). He recommended a variety of designed experiments and data analyses that would lead to the design and manufacture of robust systems. Taguchi’s methods would find the design parameters that would minimize the variability of system output in a variety of situations. Conferences were held by statisticians to understand and critique these statistical methods.

One example I remember hearing at the time is the following: a manufacturer of ceramic pots was experiencing a high rate of breakage (sound familiar?). Investigation led to the finding that the oven in which the pots were baked had a very uneven temperature distribution (sound familiar?). The solution proposed was to buy a new and better oven that would provide a more uniform temperature throughout. This would be a very expensive solution. However, process engineers, through experimentation with the ceramic design parameters (perhaps guided by Professor Taguchi, I don’t remember), found that by increasing the amount of ash in the clay, the pots became less temperature sensitive (more robust to temperature differences), so less breakage occurred. Ash is cheap, so this was a much less expensive solution. A celebration ensued and bonuses were paid (I’m making that part up).

Taguchi experimental designs aimed at determining robust processes or products have the following structure. First, known or potential factors (or variables) that affect a product’s performance need to be identified. These factors fall into two categories:

  1. Control factors. These are characteristics of the product, such as dimensions and materials, that the designer has control of and that will ultimately define the product.
  2. Noise factors. These are possible influences on product performance outside of the designer’s control, such as environmental conditions or uncontrolled and unavoidable deviations in product characteristics such as the variation of pendulum lengths in Figure 7.9.

Then, a Taguchi experimental design is constructed as follows. First, choose an “inner array” of design factors. These arrays are generally factorial or fractional factorial combinations of two- or three-level design factors.

Next, an “outer array” of noise factors is determined—again, usually two- or three-level factorial or fractional factorial combinations of the noise factors.

Then, the complete set of experimental conditions is obtained by running the outer array of noise factors at each combination of design factors in the inner array.

Figure 7.10 illustrates this compounding of inner and outer arrays graphically.

c7-fig-0010

Figure 7.10 An Inner 3 × 3 Array of Two Design Factors and an Outer 2³ Array of Three Noise Factors.

Source: Reproduced from BHH (2005, p. 553), with the permission of John Wiley & Sons

In Figure 7.10, there are a total of 9 × 8 (=72) treatment combinations in the experiment. Depending on how the experiment might be conducted, there might be blocking, replication, or unit splitting in the conduct of the experiment. Statistical analysis would need to reflect these aspects of the experiment.

Now, how should the data be analyzed to determine the most robust set of design factors? This is where Taguchi methodology departs from convention by calculating “signal-to-noise” summary statistics at each design combination and then picking the combination of control factors that maximizes signal to noise. Further experimentation might be done to refine the choice.

For the experimental array in Figure 7.10, the signal-to-noise analysis might be as follows:

  1. Calculate ȳ and s, the mean and standard deviation of the eight responses at each of the nine combinations of design factors.
  2. Calculate the relative standard deviation, s/ȳ, at each of the nine settings of design factors. This is a particular noise-to-signal ratio.
  3. Choose the combination with the smallest relative standard deviation, and possibly conduct further experimentation and analysis to refine and confirm the choice of design factor levels (a computational sketch of these steps follows this list).
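
Here is a minimal numeric sketch of steps 1 through 3. The response values are made up for illustration; in a real analysis they would be the measured outputs at each inner-array run.

```python
import numpy as np

# Made-up responses for illustration: 9 inner-array (design) combinations,
# each run at the same 8 outer-array (noise) combinations.
rng = np.random.default_rng(1)
y = rng.normal(loc=100, scale=5, size=(9, 8))

ybar = y.mean(axis=1)            # step 1: mean at each design combination
s = y.std(axis=1, ddof=1)        # step 1: standard deviation across the noise array
rsd = s / ybar                   # step 2: relative standard deviation (noise-to-signal)
best = int(rsd.argmin())         # step 3: most robust design combination
print(f"Design combination {best} is most robust: s/ybar = {rsd[best]:.3f}")
```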

To illustrate an alternative to the Taguchi signal-to-noise approach, I need to commit a little bit of mathematics.

Mathematical model: Robustness

Process output, y, is a function of design factors and noise factors, expressed as follows:

y = f(x, w)

where x is a set of control factors and w is a set of noise factors; w can be controlled in an experiment but not in actual use.

The nature of f(x,w) determines how variability of w will be transmitted into variability of y, which means variability of product performance, which means quality and customer satisfaction, good or bad. Thus, the experimental design and subsequent data analysis will be aimed at estimating this relationship. That estimate will then provide a means of finding the x settings that are most robust to variation in w. Here is a simple illustrative example.

Consider a situation in which there are two control factors, x1 and x2, and one noise factor, w. Suppose the statistical relationship between a response variable, y, and these three factors is:

y = a0 + a1x1 + a2x2 + cw + dx2w + e

where e is the effect of additional noise factors not controlled in the experiment, and a0, a1, a2, c, and d are coefficients to be estimated from the data. In use, both e and w are random variables.

Note that the cross-product term in this model, dx2w, means that the effect of x2 on y is amplified by w. There is interaction between these two factors, in the statistical sense: the effect of x2 on y is different for different values of w.

By using the properties of sums of random variables and the product of a constant and a random variable, under the aforementioned model, the variance of y is

Var(y) = (c + dx2)² Var(w) + Var(e)

Thus, to minimize the variance of y, x2 needs to be chosen to minimize (c + dx2)². If feasible, the optimum choice, as a little algebra shows, is x2 = −c/d, in which case w has no effect on the variation of the process output. If this solution for x2 is not feasible, the optimal choice of x2 is the feasible value that minimizes |c + dx2|. The experiment and data analysis would provide estimates of c and d, from which the most robust x2 setting would be determined.
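
As a small worked sketch, suppose the fitted model yields the estimates below; the numbers are hypothetical, chosen only to show the two cases.

```python
# Hypothetical estimates of c and d from the fitted model (illustrative values only).
c_hat, d_hat = 3.0, -1.5

x2_star = -c_hat / d_hat                     # zeroes the coefficient (c + d*x2) on w
print(f"Unconstrained robust setting: x2 = {x2_star:.2f}")

# If x2 is restricted to a feasible set, minimize |c + d*x2| over that set instead.
feasible = [0.0, 0.5, 1.0, 1.5]
best = min(feasible, key=lambda x2: abs(c_hat + d_hat * x2))
print(f"Best feasible setting: x2 = {best}")
```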

What about x1? Its effect on y is not affected by w, so x1 can be dialed up or down so that the performance characteristic meets its target value without transmitting additional variation. In our pendulum example, once the pendulum length has been specified, gear ratios can be adjusted appropriately so that the clock keeps accurate time.

Robust design is sometimes said to “capitalize on interaction.” This simple example illustrates how that can be done.

Concluding comments

There are many contexts in which robustness is an important design goal. Questionnaires need to be worded clearly so that confusion and misinterpretation do not cloud the results. Products need to operate reliably across a range of operating conditions. Online systems for signing up for health insurance need to cope with variations in demand and in user needs and capabilities. Designed experiments, together with appropriate data analysis and communication of results, help achieve robust products and processes. Taguchi and subsequent researchers and practitioners in the United States (see, e.g., Phadke 1989; Wu and Hamada 2000; Wikipedia 2014a) encouraged a heightened focus on product and process robustness that deserves wide recognition.

Optimal Designs

Introduction

For the most part, the experiments used to illustrate the design families in this book have come to us fully formed with respect to choices of treatment and blocking factors, the number and levels of these factors, the combinations of factors that are included in the experiment, the number of replications, and the responses and ancillary variables that were measured. I have indicated that subject-matter knowledge and issues, along with statistical and economic considerations and even personalities, must interact to make these key design decisions in order to have some assurance before the experiment that the experiment has the capability of answering pertinent questions. For example, in the ethanol experiment in Chapter 5, we took it on faith that the experimenters had identified two important and pertinent x-variables that might influence CO emissions from automobiles. We did have some knowledge from the literature that the ranges of the x-variables were reasonable. We do not know, however, why the experimenters chose to run two replications of a 3 × 3 set of treatment (x-variable) combinations. It could have been cost; it could have been that previous, related work had found that a second-order polynomial at least roughly described the relationship between the x-variables and CO emissions. It could be that they saw this design in another textbook.

There are of course many other suites of treatment designs that could have been run. For example, a response surface design with a 2 × 2 factorial set of points, plus four star points, plus a center point would again make for a total of nine treatment combinations. The levels that define the 2 × 2 points and the star points would be other design options. The nine selected treatment combinations could all have been replicated twice for 18 runs. Or, a 4 × 4 set of points, with only one replication, plus two center points, would have been another way to distribute 18 treatment design points around the selected x1–x2 region. And, of course, there is probably no strong reason to make exactly 18 runs, so the possibilities are endless.

For the most part, in this book, we have considered experimental designs with rectangular experimental regions—multifactorial structures. Nature and theory are not always this neat and tidy. There will be constraints that preclude the designs, both treatment designs and experimental designs, that we have discussed and illustrated. Active research in this area is aimed at finding optimal designs that incorporate specific constraints and objectives in determining an experimental design. This approach is better than trying to force a basic design into a constrained situation. Once again, statisticians have risen to the occasion of finding designs that better fit real-world situations.

Finding “optimal experimental designs”

Statistical and computational tools exist for evaluating candidate designs. These start with various measures of the precision of estimates and predictions that would be obtained from the experiment. Estimates and predictions generally depend on the nature of the function ultimately fitted to the experiment’s data—the model. But we don’t necessarily know that function, you say. That’s why we’re running the experiment, for goodness’ sakes! Relax. For planning purposes, we always make some assumptions. This is directly analogous to the sample size analyses we did in earlier chapters in which we made assumptions, for example, about the error variance and the Normal distribution in order to determine how many observations we needed to achieve a desired confidence interval width or to meet power curve targets.

In the rat food example in Chapter 4, the scientists expected the relationship between growth rate and supplement amount to be linear. Under that assumption, the optimum design would put half of the runs at each end of the selected range of the x-variable. Prudence, though (she was the FLS involved in the study), led to a design with several intermediate x-values. We can, and in general should, vary the assumptions and see how the design changes in response. It should also be pointed out that basic designs, such as a randomized block design with multifactor treatments, are linked to particular statistical models.

Suppose, for planning purposes, that we assume that a second-order polynomial is a good approximation to the relationship in nature, or in the laboratory experiment, between CO and x1 and x2. (“All models are wrong; some are useful,” said George Box.) Under this assumption, we know the mathematical function of the data by which the coefficients will be estimated and predictions will be calculated. We know the formulas for the precision of those estimates and predictions, up to the value of sigma. These formulas are all a function of the experimental design: the location of the design points and the number and locations of replications. We and our computers can do those calculations for every candidate design, compare the results, and try to balance cost and precision. We (or your friendly, local statistician) could repeat the whole exercise for another candidate family, such as a bivariate log-linear model. Subject-matter knowledge and theory can identify candidate models. We might also conjecture, for planning purposes, that if we choose a design that does a good job of fitting the second-order model, it might still be OK (robust enough) for fitting another model that we ultimately determine when we analyze the data. Methods for finding model-robust designs (as opposed to model-dependent designs) are an area of active research (see, e.g., Smucker, del Castillo, and Rosenberger 2011 and references therein).

This comparison of designs would be an unwieldy analysis, so statisticians have developed summary measures of a design’s “goodness.” These include:

  1. The average of the variances of the estimated coefficients
  2. The maximum of the variances of the estimated coefficients
  3. The maximum variance of predictions in a specified region of explanatory variables

These variances are all multiples of the experimental error variance; the multiplier is a function of the array of design points in the design.

Instead of evaluating a possibly large number of candidate designs, we can (let the computer) work the problem in reverse. We can give the computer a large number of possible design points, such as a fine grid of points in the x1–x2 region of the ethanol experiment, and then tell the computer, “I want to fit a quadratic model using only 18 points in this region. Find me the 18 points that minimize the maximum variance of predictions in the x1–x2 region (criterion 3).” And the computer will do it (see, e.g., JMP, jmp.com). We can repeat the process for n = 24 and 36, say, to see how the design changes and precision improves with these larger experiments and then consider cost–benefit trade-offs.
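
To make the idea concrete, here is a toy candidate-set search in Python: a greedy point-exchange pass over a grid that minimizes the maximum relative prediction variance of a quadratic model (criterion 3). This is a sketch of the general approach, not the algorithm JMP or any particular package uses; the grid, run size, and convergence tolerance are arbitrary choices.

```python
import numpy as np

def quad_row(x1, x2):
    # Design-matrix row for a full second-order model in two factors.
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

# Candidate set: a 7 x 7 grid on the coded region [-1, 1] x [-1, 1].
grid = [(a, b) for a in np.linspace(-1, 1, 7) for b in np.linspace(-1, 1, 7)]
F = np.array([quad_row(a, b) for a, b in grid])

def max_pred_var(idx):
    # Maximum relative prediction variance f'(X'X)^-1 f over the candidate grid.
    X = F[idx]
    try:
        XtX_inv = np.linalg.inv(X.T @ X)
    except np.linalg.LinAlgError:
        return np.inf
    return max(f @ XtX_inv @ f for f in F)

rng = np.random.default_rng(0)
idx = list(rng.choice(len(grid), size=18, replace=False))  # random 18-run starting design
best = max_pred_var(idx)
improved = True
while improved:                     # greedy point exchange until no swap helps
    improved = False
    for i in range(18):
        for j in range(len(grid)):
            trial = idx.copy()
            trial[i] = j
            val = max_pred_var(trial)
            if val < best - 1e-9:
                idx, best, improved = trial, val, True
print("Selected design points:", sorted(set(grid[j] for j in idx)))
print(f"Max relative prediction variance: {best:.3f}")
```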

Experimental designs obtained in this way are sometimes called “computer-aided designs.” For a concise discussion of these designs, see the online NIST Engineering Statistics Handbook, Section 5.5.2 (NIST 2014). In addition to quantitative criteria for deciding on a design, graphical methods are also useful (Anderson-Cook, Montgomery, and Myers 2009).

Design augmentation

In several examples in this book, I discussed possible further experimentation that might be done to follow up on the results of the experiment under discussion. For example, a follow-up multifactor experiment might be conducted in an adjacent factor region. The present experiment would give an idea of the nature of the relationship between the response and the treatment factors, so one could use that information to specify a candidate set of additional treatment combinations and find the optimal design for augmenting the current data to best enhance the fit of the model fitted to the initial experiment.

Some other situations in which optimal, computer-aided experimental designs are valuable are:

  1. Constraints on the experimental region eliminate some points in a full or fractional array of block or treatment combinations.
  2. Block structures and sizes for conventional experiments may not be appropriate.

Computational resources, which of course were not available when the CRD and RCB design families and the other designs in the preceding chapters were developed, provide a great opportunity for expanding the actual use of experimental design, which happens also to be the purpose of this book. For a case-study-based introduction to optimal design, and for illustrations of the wide range of situations in which optimal designs can provide context-specific experimental designs, see Goos and Jones (2011). That book is written as a dialogue between the two authors, asking and answering questions that a reader might have. Another pertinent reference is Atkinson, Donev, and Tobias (2007). Wikipedia (2014b) provides an overview and additional references.

Assignment

Choose a topic of interest to you and an issue to investigate with an experiment. Develop three experimental designs for investigating this issue: (i) Latin square, (ii) split unit, and (iii) repeated measures. Discuss and compare your three alternatives. Show how you would plot the data and lay out the ANOVA tables for each.

References

  1. Anderson-Cook, C., Montgomery, D., and Myers, R. (2009) Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 3rd ed., John Wiley & Sons, New York.
  2. Atkinson, A. C., Donev, A. N., and Tobias, R. D. (2007) Optimum Experimental Designs, with SAS, Oxford University Press, Oxford.
  3. Box, G., Hunter, W. G., and Hunter, J. S. (1978, 2005) Statistics for Experimenters, John Wiley & Sons, Inc., New York.
  4. Cochran, W., and Cox, G. (1957) Experimental Design, John Wiley & Sons, Inc., New York.
  5. Fisher, R. A. (1938) Presidential Address to the First Indian Statistical Congress, http://www.economics.soton.ac.uk/staff/aldrich/fisherguide/quotations.htm.
  6. Goos, P., and Jones, B. (2011) Optimal Design of Experiments: A Case Study Approach, John Wiley & Sons, Ltd, Chichester.
  7. Jones, B., and Nachtsheim, C. (2009) Split-Plot Designs: What, Why, and How, Journal of Quality Technology, 41(4): 340–361.
  8. Milliken, G., and Johnson, D. (2009) Analysis of Messy Data Volume 1: Designed Experiments, 2nd ed., Chapman and Hall/CRC Press, Boca Raton, FL.
  9. NIST-SEMATECH (2014) Engineering Statistics Online Handbook, Section 5.5.2. http://itl.nist.gov/div898/handbook/.
  10. Oehlert, G. (2000) A First Course in Design and Analysis of Experiments, W. H. Freeman, New York.
  11. Phadke, M. (1989) Quality Engineering Using Robust Design, Prentice-Hall, Englewood Cliffs, NJ.
  12. Smucker, B., del Castillo, E., and Rosenberger, J. (2011) Exchange Algorithms for Constructing Model-Robust Designs, Journal of Quality Technology, 43, 1–15.
  13. Taguchi, G. (1991) System of Experimental Design: Engineering Methods to Optimize Quality and Minimize Cost, Quality Resources, White Plains, NY.
  14. Wikipedia (2014a) Robust Parameter Design, http://en.wikipedia.org/wiki/Robust_parameter_design.
  15. Wikipedia (2014b) Optimal Design, http://en.wikipedia.org/wiki/Optimal_design.
  16. Wu, C. F., and Hamada, M. (2000) Experiments: Planning, Analysis, and Parameter Design Optimization. John Wiley & Sons, Inc., New York.