Chapter 11

ANOVA (Analysis of Variance)

Abstract

Analysis of variance is a technique for testing hypotheses. It is also a very useful tool for obtaining estimates of various parameters, or functions of the parameters, involved in the two models. In this chapter we introduce one-way ANOVA and two-way ANOVA. The basic principle underlying the technique is that the total variation in the dependent variable is broken into two parts: the part attributed to specific causes, known as the variation between the samples, and the part attributed to chance, called the variation within the samples.

Keywords

Assumptions; one-way ANOVA; two-way ANOVA

11.1 Introduction

In this chapter we briefly introduce analysis of variance (ANOVA), a statistical test used to determine whether more than two population means are equal. The test uses the F-distribution (probability distribution) and information about the variance within each population and between the populations to decide whether the variability between and within the populations differs significantly. It is based on the comparison of the average value of a common component. The method of ANOVA tests the hypotheses:

H0: μ1 = μ2 = ... = μk
H1: Not all the means are equal

The purposes of the ANOVA test are:

1. To see whether more than two population means are equal.

2. To understand the difference between the within-sample estimate of the variance and the between-sample estimate of the variance, and how to calculate them. The within-sample variance is often called the unexplained variation.

The between-sample variance is the average of the squared deviations of each sample mean from the mean of all the data, and it is an estimate of σ2 only if the null hypothesis H0 is true.

When the null hypothesis is false this variance is relatively large and by comparing it with the within-sample variance we can tell statistically whether H0 is true or not.

The ANOVA technique performs the test in one go and is therefore considered an important technique of analysis. The basic principle underlying the technique is that the total variation in the dependent variable is broken into two parts: one which can be attributed to some specific causes and the other which may be attributed to chance. The one attributed to the specific causes is known as the variation between the samples, and the one attributed to chance is called the variation within the samples. In ANOVA it is assumed that each of the samples is drawn from a normal population and that each of these populations has an equal variance.

The measure of variability used in the analysis of variance is called a “mean square,” i.e.,

Mean square = (Sum of squared deviations from the mean) / (Degrees of freedom)

ANOVA stands for Analysis of Variance; it is a statistical technique for detecting whether or not the means of different groups are all equal when there are more than two populations.
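As a small numerical illustration of this definition, the sketch below (assuming Python with NumPy; the four values are arbitrary) computes a mean square for a single sample.

import numpy as np

x = np.array([7.0, 4.0, 6.0, 8.0])        # an arbitrary illustrative sample
ss = np.sum((x - x.mean()) ** 2)          # sum of squared deviations from the mean
df = len(x) - 1                           # degrees of freedom
mean_square = ss / df                     # here this is simply the sample variance
print(ss, df, mean_square)                # 8.75, 3, approximately 2.917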

11.2 Assumptions

1. The samples are independently drawn.

2. The populations are normally distributed with common variance.

3. The observations occur at random and are independent of each other within the groups.

4. The effects of various components are additive.

5. Variances of populations are equal.

11.3 One-Way ANOVA

The one-way ANOVA is also called a single-factor analysis of variance because there is only one independent variable or factor. The null hypothesis is that the independent population means are equal.

The one-way ANOVA compares the means of the samples or groups in order to make inferences about whether the corresponding population means are equal. It involves a single independent variable.

When using one-way analysis of variance, the process of looking up the resulting value of F in an F-distribution table is reliable under the following assumptions:

• The values in each of the groups (as a whole) follow the normal curve.

• The population averages may differ (though the null hypothesis is that all of the group averages are equal).

• The population standard deviations are equal.

The assumption that the groups follow the normal curve is the usual one made in most significance tests, though here it is somewhat stronger in that it is applied to several groups at once. Of course many distributions do not follow the normal curve, so here is one reason that ANOVA may give incorrect results. It would be wise to consider whether it is reasonable to believe that the groups’ distributions follow the normal curve.

Of course, allowing different population averages imposes no restriction on the use of ANOVA; the null hypothesis, as usual, is what allows us to do the computations that yield F.

The third assumption, that the populations' standard deviations are equal, is important in principle, and it can only be approximately checked by using the sample standard deviations as rough estimates. In practice statisticians feel safe in using ANOVA if the largest sample SD is not larger than twice the smallest.
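The two informal checks described above can be carried out quickly in software. The following is a minimal sketch, assuming Python with NumPy and SciPy; the three small samples are purely illustrative (they reuse the wheat yields of Example 11.1 later in this chapter).

import numpy as np
from scipy import stats

# Illustrative groups; replace with your own samples.
groups = [np.array([7, 4, 6, 8]),
          np.array([3, 5, 5, 7]),
          np.array([4, 5, 4, 2])]

for i, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)               # Shapiro-Wilk test of normality
    print(f"Group {i}: Shapiro-Wilk p-value = {p:.3f}")

sds = [g.std(ddof=1) for g in groups]     # sample standard deviations
print("SD ratio (largest/smallest):", max(sds) / min(sds))
# If the ratio is below 2, the equal-variance assumption is usually considered safe.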

Completely randomized design involves the testing of equality of means of two or more groups. In this design there is only one independent variable and one dependent variable. The dependent variable is metric whereas the independent variable is a categorical variable. We briefly explain the steps involved in one-way ANOVA.

Step 1: Set up null hypothesis (H0) and alternative hypothesis (H1).

Step 2: Find the total T of all observations in all the samples, i.e.,

T = ΣX1 + ΣX2 + ... + ΣXk

Step 3: Find the value of the correction factor T²/N, where

N = n1 + n2 + ... + nk

Step 4: Calculate the sum of squares of deviations SST where

SST = ΣX1² + ΣX2² + ... + ΣXk² − T²/N

Step 5: Find the sum of squares of deviations between the samples SSB where

SSB = [(ΣX1)²/n1 + (ΣX2)²/n2 + ... + (ΣXk)²/nk] − T²/N

Step 6: Calculate the sum of squares within the samples (SSW), where

SSW = SST − SSB

Step 7: Find degrees of freedom v1=df1=k−1 (k=number of columns) and degrees of freedom

v2 = df2 = N − k

Step 8: Find the mean square deviation between the samples (MSB), where

MSB = SSB/v1

and the mean square deviation within the samples (MSW), where

MSW = SSW/v2

Step 9: In this step calculate the F statistic by applying the formula

F = MSB/MSW when MSB > MSW

and

F = MSW/MSB when MSW > MSB

Step 10: In this final step, compare the calculated value of F with the tabulated value. If the calculated value is less than the table value of F, accept the null hypothesis; if the calculated value is greater than the table value of F, reject the null hypothesis.


ANOVA Table

Source of variation                Sum of squares   Degrees of freedom   Mean square   Statistic F
Between varieties (column means)   SSB              df1 = v1             MSB           F = MSB/MSW
Within samples (errors)            SSW              df2 = v2             MSW
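The steps above translate directly into a short computation. The following is a minimal sketch, assuming Python with NumPy; one_way_anova is a hypothetical helper name, not part of any standard library, and it always reports F = MSB/MSW (the usual convention) rather than switching the ratio as in Step 9.

import numpy as np

def one_way_anova(samples):
    """Return SSB, SSW, df1, df2, MSB, MSW, and F for a list of samples."""
    k = len(samples)                                    # number of samples (columns)
    N = sum(len(s) for s in samples)                    # total number of observations
    T = sum(np.sum(s) for s in samples)                 # grand total of all observations
    cf = T ** 2 / N                                     # correction factor T^2 / N
    sst = sum(np.sum(np.square(s)) for s in samples) - cf        # total sum of squares
    ssb = sum(np.sum(s) ** 2 / len(s) for s in samples) - cf     # between-samples SS
    ssw = sst - ssb                                     # within-samples SS
    df1, df2 = k - 1, N - k                             # degrees of freedom
    msb, msw = ssb / df1, ssw / df2                     # mean squares
    return ssb, ssw, df1, df2, msb, msw, msb / msw      # F statistic

# Applied to the data of Example 11.1 below, this returns SSB = 12.5, SSW = 21.5,
# MSB = 6.25, MSW = 2.39 (approx.), and F = 2.62 (approx.):
print(one_way_anova([[7, 4, 6, 8], [3, 5, 5, 7], [4, 5, 4, 2]]))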

The method is explained with the help of an example given below:

Example 11.1: Three varieties A, B, and C of wheat were sown in 4 plots each and the following yields in tonnes per acre were obtained:

A B C
7 3 4
4 5 5
6 5 4
8 7 2

(The table value of F at 5% level of significance for (2, 9) degrees of freedom is 4.26.)

Test the significance of difference between yields of the varieties.

Solution: Null Hypothesis H0: Varieties of wheat are not significantly different from each other in their yielding capacities.

Alternative Hypothesis H1: Varieties of wheat are significantly different from each other in their yielding capacities.

We have the following table:

Sample A            Sample B            Sample C
X1      X1²         X2      X2²         X3      X3²
7       49          3       9           4       16
4       16          5       25          5       25
6       36          5       25          4       16
8       64          7       49          2       4

n1 = 4, n2 = 4, n3 = 4, N = n1 + n2 + n3 = 4 + 4 + 4 = 12 and k = 3

ΣX1 = 25, ΣX2 = 20, ΣX3 = 15

ΣX1² = 165, ΣX2² = 108, ΣX3² = 61

T = ΣX1 + ΣX2 + ΣX3 = 25 + 20 + 15 = 60

T² = (60)² = 3600

Correction factor = T²/N = 3600/12 = 300

Total sum of squares (SST) = ΣX1² + ΣX2² + ΣX3² − T²/N = 165 + 108 + 61 − 300 = 334 − 300 = 34

Degrees of freedom df1 = v1 = k − 1 = 3 − 1 = 2

Degrees of freedom df2 = v2 = N − k = 12 − 3 = 9

Sum of squares between the samples (SSB) = [(ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3] − T²/N = [(25)²/4 + (20)²/4 + (15)²/4] − 300 = 1250/4 − 300 = 312.5 − 300 = 12.5

Sum of squares within the varieties (SSW) = SST − SSB = 34 − 12.5 = 21.5

Mean square between varieties MSB = SSB/v1 = 12.5/2 = 6.25

Mean square within varieties MSW = SSW/v2 = 21.5/9 = 2.39 (approx.)

Test statistic F = MSB/MSW = 6.25/2.39 = 2.615

ANOVA Table

Source of variation                Sum of squares   Degrees of freedom   Mean square   Statistic F
Between varieties (column means)   SSB = 12.5       df1 = v1 = 2         MSB = 6.25    F = MSB/MSW = 2.615
Within samples (errors)            SSW = 21.5       df2 = v2 = 9         MSW = 2.39
Total                              34.0

Since the calculated value of F (2.615) is less than the table value of F for (2, 9) df at the 5% level of significance, we accept the null hypothesis, i.e., the varieties do not differ significantly in their yielding capacities.
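As a cross-check on Example 11.1, the same F statistic and the 5% critical value for (2, 9) degrees of freedom can be obtained with SciPy. This is only a sketch assuming scipy.stats is available; it is not part of the worked solution above.

from scipy import stats

A = [7, 4, 6, 8]
B = [3, 5, 5, 7]
C = [4, 5, 4, 2]

F, p = stats.f_oneway(A, B, C)               # one-way ANOVA F test
F_crit = stats.f.ppf(0.95, dfn=2, dfd=9)     # upper 5% point of F(2, 9)
print(F, p, F_crit)                          # F is about 2.62, F_crit about 4.26, so H0 is accepted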

11.3.1 Two-Way ANOVA

A one-way analysis is used to compare populations for one variable or factor; the one-way ANOVA measures the significant effect of one independent variable. The two-way analysis of variance (ANOVA) test is an extension of the one-way ANOVA test that examines the influence of two different categorical independent variables on one dependent variable. The name "two-way" comes from the fact that each item is classified in two ways, as opposed to one. The test is useful when we wish to compare the effect of multiple levels of two factors and have multiple observations at each level.

An important advantage of this design is that it is more efficient than its one-way counterpart. There are two assignable sources of variation (for example, age and gender), and this helps to reduce error variation, thereby making the design more efficient. Unlike one-way ANOVA, it enables us to test the effect of two factors at the same time. The assumptions in both versions remain the same: normality, independence, and equality of variance. One can also test for interaction between the factors provided there is more than one observation in each cell. The only restriction is that the number of observations in each cell has to be equal (there is no such restriction in the case of one-way ANOVA).

11.3.1.1 Assumptions

The assumptions in both versions remain the same—normality, independence, and equality of variance.

• The populations from which the samples were obtained must be normally or approximately normally distributed.

• The samples must be independent.

• The variances of the populations must be equal.

• The groups must have the same sample size.

11.3.1.2 Hypothesis

There are three sets of hypotheses with the two-way ANOVA.

The null hypothesis for each set is given below:

1. The population means of the first factor are equal. This is like the one-way ANOVA for the row factor.

2. The population means of the second factor are equal. This is like the one-way ANOVA for the column factor.

3. There is no interaction between the two factors. This is similar to performing a test for independence with contingency tables.

The two independent variables in a two-way ANOVA are called factors. The idea is that there are two variables, factors, which affect the dependent variable. Each factor will have two or more levels within it, and the degrees of freedom for each factor is one less than the number of levels.

11.4 Working Rule

The following steps are involved in the short-cut method for ANOVA:

Step 1: Set up null hypothesis (H0) and alternative hypothesis (H1)

Step 2: Find the sum of the values of all items of all samples and denote it by T
i.e.,

T = ΣX1 + ΣX2 + ... + ΣXk

Step 3: Find the value of the correction factor = T²/N

Step 4: Find the sum of squares of all items of all samples

Step 5: Find the total sum of squares SST, where

SST = ΣX1² + ΣX2² + ... + ΣXk² − T²/N

Step 6: Find the sum of squares between the samples (i.e., between the columns) SSC, where

SSC = [(ΣX1)²/n1 + (ΣX2)²/n2 + ... + (ΣXk)²/nk] − T²/N

Step 7: Find the sum of squares between the rows SSR (computed in the same way as SSC, but from the row totals), and then the error (residual) sum of squares SSE, where

SSE = SST − SSC − SSR

Step 8: Set up ANOVA table and calculate F, which is the test statistic.

Source of variation   Sum of squares   Degrees of freedom   Mean square   Variance ratio F
Between columns       SSC              c − 1                MSC           FC = MSC/MSE
Between rows          SSR              r − 1                MSR           FR = MSR/MSE
Error                 SSE              (c − 1)(r − 1)       MSE
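The working rule can be sketched in code for a table with one observation per cell (rows form one factor and columns the other). This is a minimal sketch assuming Python with NumPy; two_way_anova is a hypothetical helper name, not part of any standard library.

import numpy as np

def two_way_anova(table):
    """table: r x c array with one observation per cell; returns SSC, SSR, SSE, FC, FR."""
    data = np.asarray(table, dtype=float)
    r, c = data.shape
    T = data.sum()                                    # grand total
    cf = T ** 2 / data.size                           # correction factor T^2 / N
    sst = np.sum(data ** 2) - cf                      # total sum of squares
    ssc = np.sum(data.sum(axis=0) ** 2) / r - cf      # between columns (each column has r values)
    ssr = np.sum(data.sum(axis=1) ** 2) / c - cf      # between rows (each row has c values)
    sse = sst - ssc - ssr                             # error (residual) sum of squares
    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return ssc, ssr, sse, msc / mse, msr / mse        # FC and FR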

Example 11.2: The following table gives the yields of four varieties of wheat (A, B, C, and D) obtained from three plots of land. Test whether there is a significant difference (1) in the yields of the four varieties of wheat and (2) between the plots of land with regard to yield.

Plot of land Yield A Yield B Yield C Yield D
I 3 4 6 6
II 6 4 5 3
III 6 6 4 7


(For (3, 6) df, F0.05 = 4.76, and for (2, 6) df, F0.05 = 5.14.)

Solution:

Null hypothesis

1. There is no significant difference in the yield of four varieties of wheat.

2. There is no significant difference in the plots of land with regard to yield.

X1       X1²      X2       X2²      X3       X3²      X4       X4²
3        9        4        16       6        36       6        36
6        36       4        16       5        25       3        9
6        36       6        36       4        16       7        49
ΣX1=15   ΣX1²=81  ΣX2=14   ΣX2²=68  ΣX3=15   ΣX3²=77  ΣX4=16   ΣX4²=94

We have

ΣX1 = 15, ΣX2 = 14, ΣX3 = 15, ΣX4 = 16

n1 = 3, n2 = 3, n3 = 3, n4 = 3, N = 3 + 3 + 3 + 3 = 12

T = ΣX1 + ΣX2 + ΣX3 + ΣX4 = 15 + 14 + 15 + 16 = 60

Correction factor = T²/N = (60)²/12 = 3600/12 = 300

Total sum of squares SST = ΣX1² + ΣX2² + ΣX3² + ΣX4² − T²/N = 81 + 68 + 77 + 94 − 300 = 320 − 300 = 20

Sum of squares between the samples (i.e., between columns) SSC = [(ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + (ΣX4)²/n4] − T²/N = [(15)²/3 + (14)²/3 + (15)²/3 + (16)²/3] − 300 = 902/3 − 300 = 300.67 − 300 = 0.67

Sum of squares between the rows (i.e., between plots) SSR = [(19)²/4 + (18)²/4 + (23)²/4] − 300 = 1214/4 − 300 = 303.5 − 300 = 3.5

Error SSE = SST − (SSC + SSR) = 20 − (0.67 + 3.5) = 15.83

Source of variation           Sum of squares   Degrees of freedom (v)   Mean square            Variance ratio F
Between columns (varieties)   0.67             v1 = c − 1 = 3           MSC = SSC/v1 = 0.223   FC = MSC/MSE = 0.085
Between rows (plots)          3.50             v2 = r − 1 = 2           MSR = SSR/v2 = 1.75    FR = MSR/MSE = 0.663
Error                         15.83            (c − 1)(r − 1) = 6       MSE = 15.83/6 = 2.638

For (3, 6) df, the calculated value of FC is less than the table value of F, hence we accept the null hypothesis, i.e., there is no significant difference in the yields of the four varieties of wheat.

For (2, 6) df, the calculated value of FR is less than the table value of F, hence we accept the null hypothesis, i.e., there is no significant difference between the plots of land with regard to yield.
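As a cross-check on Example 11.2, the sums of squares and variance ratios can be reproduced directly with NumPy. This is a sketch under the same layout as the example: rows are plots I to III, columns are varieties A to D, one observation per cell.

import numpy as np

yields = np.array([[3, 4, 6, 6],
                   [6, 4, 5, 3],
                   [6, 6, 4, 7]], dtype=float)
r, c = yields.shape
cf = yields.sum() ** 2 / yields.size                   # correction factor = 300
sst = np.sum(yields ** 2) - cf                         # total SS = 20
ssc = np.sum(yields.sum(axis=0) ** 2) / r - cf         # between varieties, about 0.67
ssr = np.sum(yields.sum(axis=1) ** 2) / c - cf         # between plots = 3.5
sse = sst - ssc - ssr                                  # error, about 15.83
mse = sse / ((c - 1) * (r - 1))
fc = (ssc / (c - 1)) / mse                             # about 0.08, well below 4.76
fr = (ssr / (r - 1)) / mse                             # about 0.66, well below 5.14
print(sst, ssc, ssr, sse, fc, fr)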

Exercise 11.1

1. A farmer applied three types of fertilizers on four separate plots. The figures on yield per acre are tabulated below:

Fertilizers/Plots   A    B    C    D    Total
Nitrogen            6    4    8    6    24
Potash              7    6    6    9    28
Phosphates          8    5    10   9    32
Total               21   15   24   24   84


Find out whether the plots are materially different in fertility, and also whether the three fertilizers make any material difference in yields.

2. Three varieties A, B, and C of wheat were sown in four plots each and the following yields in tonnes per acre were obtained:

A B C
8 7 2
4 5 5
6 5 4
7 3 4


Test the significance of difference between the yield of the varieties.
(Table value of F at 5% level of significance for (2, 9) is 4.26.)

3. The three samples below have been obtained from normal populations with equal variances. Test the hypothesis at 5% level that the population means are equal.

X Y Z
8 7 12
10 5 9
7 10 13
14 9 12
11 9 14


(Table value of F at 5% level of significance for (2, 12) is 3.88).

4. Three varieties A, B, and C of wheat were sown in a number of plots and the following yields in tonnes per acre were obtained:

A B C
20 18 25
21 20 28
 17 22
23 17 28
16 25 32
20 15  


Test at 5% level of significance whether the average yields under the different varieties of seed show a significant difference.
(Table value of F at 5% level for (2, 12) is 3.88.)

5. The following data represent the number of units of a product produced by three different workers using three different types of machines:

Workers Machine A Machine B Machine C
X 8 32 20
Y 28 36 38
Z 6 28 14


Test (a) whether the mean productivity is the same for the different machine types, and (b) whether the three workers differ with respect to mean productivity. (Table value of FC at 5% level for (2, 4) df is 6.95, and the table value of FR is 6.95.)
