Chapter 9

Chi-Square Distribution

Abstract

This chapter discusses the chi-square distribution, a measure that enables us to find the degree of discrepancy between observed and expected frequencies. We discuss the uses of the chi-square test as a test of goodness of fit, as a test of independence of attributes, and as a test of homogeneity.

Keywords

Calculation of expected frequencies; chi-square distribution; mean of chi-square distribution; variance of chi-square distribution; conditions for using chi-square; test for goodness of fit; test for independence of attributes and homogeneity chi-square

9.1 Introduction

In this chapter we introduce the chi-square distribution, a measure that enables us to find the degree of discrepancy between the observed and expected frequencies and then to determine whether that discrepancy can be attributed to chance (i.e., to sampling error).

The chi-square is denoted by the symbol χ2. It was discovered by Helmert in 1875 and was rediscovered by Karl Pearson in 1900. Chi-square is never negative: the value of χ2 lies between 0 and ∞.

Since χ2 is not derived from the observations in a population, it is not a parameter.

Chi-square test is not a parametric test.

The χ2 distribution is used for testing the goodness of fit. It is used for finding association and relation between attributes. It is also used to test the homogeneity of independent estimates of population. Chi-square is computed on the basis of frequencies in a sample and the value of χ2 so obtained is a statistic.

Chi-Square Test (χ2):

The χ2 statistic is defined as

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where Oi=Observed frequency; Ei=Expected frequency.

9.2 Contingency Table

A classification table with r rows and c columns of observed frequencies is called a contingency table. A 2×2 contingency table is of the form:

a  b
c  d

If the data is divided into m classes A1, A2, …, Am according to an attribute A and n classes B1, B2, …, Bn according to another attribute B, the m×n contingency table can be formed as follows:

Attributes  B1   B2   …  Bj   …  Bn
A1          O11  O12  …  O1j  …  O1n
A2          O21  O22  …  O2j  …  O2n
⋮           ⋮    ⋮       ⋮       ⋮
Ai          Oi1  Oi2  …  Oij  …  Oin
⋮           ⋮    ⋮       ⋮       ⋮
Am          Om1  Om2  …  Omj  …  Omn

where Oij denotes the frequency in the ith row and jth column, i.e., the frequency of the cell belonging to both the classes Ai and Bj.

9.3 Calculation of Expected Frequencies

Consider the 2×2 contingency table (Table 9.1).

Table 9.1

2×2 Contingency table

  Total
a b a+b
c d c+d
Total a+c b+d N=a+b+c+d

where a, b, c, and d are observed frequencies.

The expected frequencies corresponding to the cell frequencies a, b, c, and d can be calculated as shown in Table 9.2.

Table 9.2

Expected frequency table for 2×2 contingency table

(a+c)(a+b)/N    (b+d)(a+b)/N
(a+c)(c+d)/N    (b+d)(c+d)/N

This method can be extended to compute the expected frequencies of an m×n contingency table: the expected frequency of any cell is (row total × column total)/N.
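The rule "row total × column total, divided by N" can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `expected_frequencies` is hypothetical, and the 2×2 data are the observed frequencies used later in Example 9.11.

```python
def expected_frequencies(observed):
    """Return the table of expected frequencies, row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed frequencies of Example 9.11 (rows: males, females).
table = [[60, 30], [15, 45]]
print(expected_frequencies(table))  # [[45.0, 45.0], [30.0, 30.0]]
```

The same function works unchanged for any m×n table, since it only uses the marginal totals.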

9.4 Chi-Square Distribution

Let samples of size n be drawn from a normal population with standard deviation σ. If for each sample we calculate the statistic χ2, a sampling distribution of χ2 can be obtained. It is given by

$$y = f(\chi^2) = y_0\, e^{-\frac{1}{2}\chi^2} (\chi^2)^{\frac{1}{2}(\nu-2)} = y_0\, e^{-\frac{1}{2}\chi^2}\, \chi^{\nu-2}$$

where ν = n − 1 is the number of degrees of freedom and y0 is a constant depending on ν such that the total area under the curve is one. The χ2 distributions corresponding to various values of ν are shown in Fig. 9.1.

Figure 9.1 Level of significance

Maximum value of y:

We have

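$$y = y_0\, e^{-\frac{1}{2}\chi^2} (\chi^2)^{\frac{1}{2}(\nu-2)} \tag{9.1}$$

Differentiating Eq. (9.1) with respect to χ2 we get

$$\frac{dy}{d\chi^2} = y_0\left[\tfrac{1}{2}(\nu-2)(\chi^2)^{\frac{1}{2}(\nu-4)}\, e^{-\frac{1}{2}\chi^2} - \tfrac{1}{2}\, e^{-\frac{1}{2}\chi^2} (\chi^2)^{\frac{1}{2}(\nu-2)}\right]$$

Setting dy/dχ2 = 0 for the maximum value of y we get

χ2 = ν − 2 for ν ≥ 2 (since χ2 cannot be negative)

The constant y0 is chosen in such a way that the area under the curve from χ2 = 0 to ∞ is unity.

In this case

$$y_0 = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}$$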

9.4.1 Characteristic Function of χ2 Distribution

We have

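$$\varphi_{\chi^2}(t) = E\left(e^{it\chi^2}\right) = \int_0^\infty e^{it\chi^2} f(\chi^2)\, d\chi^2 = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \int_0^\infty e^{-\frac{1}{2}(1-2it)\chi^2} (\chi^2)^{\frac{\nu}{2}-1}\, d\chi^2 = (1-2it)^{-\nu/2}$$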

9.5 Mean and Variance of Chi-Square

The moment generating function of χ2 with respect to the origin is

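$$\begin{aligned}
M_0(t) &= \int_0^\infty e^{t\chi^2}\, \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, e^{-\frac{1}{2}\chi^2} (\chi^2)^{\frac{\nu}{2}-1}\, d(\chi^2) \\
&= \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \int_0^\infty \exp\left\{-\left(\frac{1-2t}{2}\right)\chi^2\right\} (\chi^2)^{\frac{\nu}{2}-1}\, d(\chi^2) \\
&= \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \cdot \frac{\Gamma\left(\frac{1}{2}\nu\right)}{\left\{\frac{1}{2}(1-2t)\right\}^{\nu/2}} \\
&= (1-2t)^{-\nu/2}, \quad (|2t|<1) \\
&= 1 + \frac{\nu}{2}(2t) + \frac{\frac{1}{2}\nu\left(\frac{1}{2}\nu+1\right)}{2!}(2t)^2 + \cdots + \frac{\frac{1}{2}\nu\left(\frac{1}{2}\nu+1\right)\cdots\left(\frac{1}{2}\nu+r-1\right)}{r!}(2t)^r + \cdots
\end{aligned}$$

$$\mu'_r = \text{coefficient of } \frac{t^r}{r!} = 2^r \cdot \tfrac{1}{2}\nu\left(\tfrac{1}{2}\nu+1\right)\left(\tfrac{1}{2}\nu+2\right)\cdots\left(\tfrac{1}{2}\nu+r-1\right) = \nu(\nu+2)(\nu+4)\cdots(\nu+2r-2)$$

Hence the mean value of χ2 is ν and the variance is ν(ν+2) − ν² = 2ν.

Thus, writing n for the number of degrees of freedom, $(\chi^2 - n)/\sqrt{2n}$ is a standard variate.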
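The product formula for the moments about the origin can be checked numerically. The following is a minimal sketch; the function name `raw_moment` and the choice of 5 degrees of freedom are illustrative assumptions, not part of the text.

```python
def raw_moment(nu, r):
    """r-th moment about the origin: nu * (nu + 2) * ... * (nu + 2r - 2)."""
    product = 1
    for k in range(r):
        product *= nu + 2 * k
    return product


nu = 5  # illustrative number of degrees of freedom
mean = raw_moment(nu, 1)
variance = raw_moment(nu, 2) - mean ** 2
print(mean, variance)  # 5 10
```

For any ν the sketch returns a mean of ν and a variance of 2ν, in agreement with the result above.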

9.6 Additive Property of Independent Chi-Square Variate

Theorem 1

If $\chi_1^2$ and $\chi_2^2$ are independent χ2 variates with n1 and n2 degrees of freedom, then $\chi_1^2 + \chi_2^2$ is a χ2 variate with n1+n2 degrees of freedom.

Proof

The moment generating function of $\chi_1^2 + \chi_2^2$ is

$$(\text{m.g.f. of } \chi_1^2)\,(\text{m.g.f. of } \chi_2^2) = (1-2t)^{-n_1/2}\,(1-2t)^{-n_2/2} = (1-2t)^{-(n_1+n_2)/2}$$

which is the moment generating function of χ2 variate with n1+n2 degrees of freedom.

Hence proved.

Theorem 2

The chi-square distribution tends to normal distribution as n tends to infinity.

Proof

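For the standard variate $(\chi^2 - n)/\sqrt{2n}$ we have

$$M(t) = e^{-nt/\sqrt{2n}}\, M_0\!\left(\frac{t}{\sqrt{2n}}\right) = e^{-nt/\sqrt{2n}} \left(1 - \frac{2t}{\sqrt{2n}}\right)^{-n/2}$$

Therefore

$$\log_e M(t) = -\frac{nt}{\sqrt{2n}} - \frac{n}{2}\log_e\left(1 - \frac{2t}{\sqrt{2n}}\right)$$

or

$$\log_e M(t) = -\frac{nt}{\sqrt{2n}} + \frac{n}{2}\left(\frac{2t}{\sqrt{2n}} + \frac{1}{2}\left(\frac{2t}{\sqrt{2n}}\right)^2 + \cdots\right)$$

or

$$\log_e M(t) = \frac{1}{2}t^2 + O\!\left(\frac{1}{\sqrt{n}}\right)$$

Further, as n → ∞,

$$\log_e M(t) \to \frac{1}{2}t^2$$

or

$$M(t) \to e^{\frac{1}{2}t^2}$$

which is the moment generating function of the standard normal distribution.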

Hence the result.

Example 9.1: Show that the value of χ2 for the contingency table

Classes  A    A1   Total
B        a    b    a+b
B1       c    d    c+d
Total    a+c  b+d  N=a+b+c+d

where a, b, c, and d are the cell frequencies, calculated from the independence frequencies, is

$$\chi^2 = \frac{N(ad-bc)^2}{(a+b)(c+d)(b+d)(a+c)}, \quad (N=a+b+c+d)$$

Solution: Since the marginal totals are fixed, the probability for a member to belong to class A is (a+c)/N, which is a constant. Similarly, the probability for it to belong to class B is (a+b)/N.

Further, the attributes being independent, the probability for it to belong to both classes A and B is $\left(\frac{a+b}{N}\right)\left(\frac{a+c}{N}\right)$.

The expected frequency of the cell whose observed frequency is a, denoted by E(A), is given by

$$E(A) = N\left(\frac{a+b}{N}\right)\left(\frac{a+c}{N}\right) = \frac{(a+b)(a+c)}{N}$$

That is, the expected frequency in each cell = (column total × row total)/grand total.

Similarly we get

$$E(B) = \frac{(a+b)(b+d)}{N}, \quad E(C) = \frac{(c+d)(a+c)}{N}, \quad E(D) = \frac{(c+d)(b+d)}{N}$$

Note that a − E(A) = [a(a+b+c+d) − (a+b)(a+c)]/N = (ad − bc)/N, and each of the other three differences likewise equals ±(ad − bc)/N.

By the definition of χ2 we have

$$\begin{aligned}
\chi^2 &= \frac{[a-E(A)]^2}{E(A)} + \frac{[b-E(B)]^2}{E(B)} + \frac{[c-E(C)]^2}{E(C)} + \frac{[d-E(D)]^2}{E(D)} \\
&= \frac{(ad-bc)^2}{N}\left[\frac{1}{(a+b)(a+c)} + \frac{1}{(a+b)(b+d)} + \frac{1}{(c+d)(a+c)} + \frac{1}{(c+d)(b+d)}\right] \\
&= \frac{(ad-bc)^2}{N}\left[\frac{N}{(a+b)(a+c)(b+d)} + \frac{N}{(a+c)(c+d)(b+d)}\right] \\
&= (ad-bc)^2\left[\frac{1}{(a+b)(a+c)(b+d)} + \frac{1}{(a+c)(c+d)(b+d)}\right] \\
&= (ad-bc)^2\left[\frac{c+d+a+b}{(a+b)(a+c)(b+d)(c+d)}\right] \\
&= \frac{N(ad-bc)^2}{(a+b)(c+d)(b+d)(a+c)}
\end{aligned}$$

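Example 9.2: Show that for 2 degrees of freedom the probability p of a value of χ2 greater than $\chi_0^2$ is $e^{-\frac{1}{2}\chi_0^2}$ and hence that $\chi_0^2 = 2\log_e(1/p)$. Deduce the value of $\chi_0^2$ when p=0.05.

Solution: We know that

$$p = P(\chi^2 > \chi_0^2) = \frac{1}{2^{(\nu-2)/2}\,\Gamma(\nu/2)} \int_{\chi_0}^{\infty} e^{-\chi^2/2}\, \chi^{\nu-1}\, d\chi$$

When ν = 2,

$$p = \frac{1}{2^0\,\Gamma(2/2)} \int_{\chi_0}^{\infty} e^{-\chi^2/2}\, \chi\, d\chi = \left[-e^{-\chi^2/2}\right]_{\chi_0}^{\infty} = e^{-\frac{1}{2}\chi_0^2}$$

Hence

$$e^{-\frac{1}{2}\chi_0^2} = p \quad \text{or} \quad e^{\frac{1}{2}\chi_0^2} = \frac{1}{p}$$

or

$$\frac{\chi_0^2}{2} = \log_e\left(\frac{1}{p}\right)$$

or

$$\chi_0^2 = 2\log_e\left(\frac{1}{p}\right)$$

When p=0.05 we have

$$\chi_0^2 = 2\log_e\left(\frac{1}{0.05}\right) = 2\log_e(20) = 5.991$$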
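The closed form for 2 degrees of freedom is easy to evaluate numerically. The snippet below is a small sketch (the function name `chi_square_critical_2df` is hypothetical) that reproduces the value for p=0.05, which is also the table value for 2 df used elsewhere in this chapter.

```python
import math


def chi_square_critical_2df(p):
    """Value chi0^2 with upper-tail probability p, valid for 2 df only."""
    return 2 * math.log(1 / p)


print(round(chi_square_critical_2df(0.05), 3))  # 5.991
```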

Example 9.3: Prove that for a χ2 distribution with n degrees of freedom

$$\mu_{r+1} = 2r(\mu_r + n\mu_{r-1}), \quad (r>0)$$

Solution: The moment generating function of the χ2 distribution with n degrees of freedom about the mean is

$$\mu(t) = e^{-nt}(1-2t)^{-n/2}$$

Taking logarithms on both sides and differentiating we get

$$\frac{\mu'(t)}{\mu(t)} = -n + \frac{n}{2}\left(\frac{2}{1-2t}\right)$$

i.e.,

$$(1-2t)\,\mu'(t) = 2nt\,\mu(t)$$

Differentiating r times with respect to t and using Leibnitz theorem we get

$$(1-2t)\,\mu^{(r+1)}(t) + r(-2)\,\mu^{(r)}(t) = 2nt\,\mu^{(r)}(t) + 2nr\,\mu^{(r-1)}(t)$$

Putting t=0 and using the relation

$$\mu_r = \left[\frac{d^r}{dt^r}\mu(t)\right]_{t=0}$$

we get

$$\mu_{r+1} = 2r(\mu_r + n\mu_{r-1})$$

9.7 Degrees of Freedom

The number of degrees of freedom is the number of values in a set that may be assigned arbitrarily, i.e., the number of independent variables. It is denoted by the symbol ν. If there are n normalized variables subject to k linear constraints, then the number of degrees of freedom is ν=n−k.

If the data are given in the form of a row containing n observations, then the number of degrees of freedom is n−1. Similarly, if the data are given in the form of a column, the number of degrees of freedom is n−1, where n is the number of observations in the column. If there are r rows and c columns, the number of degrees of freedom is (r−1)(c−1).

9.8 Conditions for Using Chi-Square Test

1. The total number of observations used in this test must be large (i.e., n≥50).

2. Each of the observations making up the sample for the χ2 test should be independent of the others.

3. The test is wholly dependent on the degrees of freedom.

4. The frequencies used in the χ2 test should be absolute, not relative.

5. The expected frequency of any item or cell should not be less than 5. If it is less than 5, then the frequencies from the adjacent items or cells should be pooled together in order to make it 5 or more (preferably not less than 10).

6. The observations collected for χ2 should be based on the method of random sampling.

7. The constraints on the cell frequencies, if any, should be linear.

9.9 Uses of Chi-Square Test

The chi-square test is an important test. We require only the degrees of freedom to use it. It is a powerful test. It is used

1. as a test of goodness of fit.

2. as a test of independence of attributes, and

3. as a test of homogeneity.

9.9.1 Chi-Square Test as a Test of Goodness of Fit

The chi-square test is applied as a test of goodness of fit to determine whether the actual (i.e., observed) frequencies agree with the expected (i.e., theoretical) frequencies. The degrees of freedom in this case are ν = n − 1, where n is the number of classes into which the observations are grouped.

Example 9.4: In 90 throws of a die, face 1 turned up 9 times, face 2 or 3 turned up 27 times, face 4 or 5 turned up 36 times, and face 6 turned up 18 times. Test at the 10% level whether the die is honest, it being given that χ2 for 3 df=6.25 at 10% level of significance.

Solution:

H0: The die is honest.

H1: The die is not honest.

Expected frequency for each face = 90 × 1/6 = 15 (and hence 30 for each pair of faces)

Level of significance=10% (i.e., 0.10)

Degrees of freedom=4−1=3

Chi-square value for 3 df at 10% level of significance=6.25

We have the following table:

Face turned  Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²/Ei
1            9            15           −6         2.4
2 or 3       27           30           −3         0.3
4 or 5       36           30           6          1.2
6            18           15           3          0.6
Total        90           90                      4.5

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 4.5$$

Since the calculated value of χ2 is less than the table value at 10% level of significance and for 3 df, we accept the null hypothesis and conclude that the die is honest.
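The tabular computation in Example 9.4 can be reproduced with a short Python sketch. The function name `chi_square_statistic` is an illustrative choice; the data and the table value 6.25 are those given in the example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [9, 27, 36, 18]   # faces 1, 2 or 3, 4 or 5, 6
expected = [15, 30, 30, 15]  # 90 throws of an honest die
chi2 = chi_square_statistic(observed, expected)
table_value = 6.25           # given for 3 df at the 10% level

decision = "accept H0" if chi2 < table_value else "reject H0"
print(round(chi2, 4), decision)  # 4.5 accept H0
```

The same function applies to every goodness-of-fit example in this section; only the observed and expected lists change.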

Example 9.5: Find whether or not the following observed distribution of phenotypes in a sample of 384 Drosophila flies shows a significant goodness of fit with the proposed Mendelian 9:3:3:1 distribution (test at 5% level of significance).

Phenotypes AB Ab aB ab Total
Number of flies 232 76 58 18 384

Solution: Degrees of freedom=4−1=3

Null hypothesis H0: The phenotypes are distributed in the ratio 9:3:3:1,

i.e., Variation is not significant.

Alternative hypothesis H1: The phenotypes are not distributed in the ratio 9:3:3:1.

Level of significance=α=0.05

Table value χ2=7.82 for 3 df

Observed Oi  Expected Ei        (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
232          (384/16)×9=216     16         256         1.1852
76           (384/16)×3=72      4          16          0.2222
58           (384/16)×3=72      −14        196         2.7222
18           (384/16)×1=24      −6         36          1.5000
Total=384                                              5.6296

The calculated value of χ2 (5.6296) for 3 df at 5% level of significance is less than the table value. Hence we accept H0, i.e., variation is not significant.

Example 9.6: Among 64 offspring of a certain cross between guinea pigs, 32 were red, 10 were black, and 22 were white. According to the genetic model these numbers should be in the ratio 9:3:4. Are the data consistent with the model at 5% level?

Solution:

H0: Data are consistent with the model.

H1: Data are not consistent with the model.

Level of significance=5%

Degrees of freedom=n−1=3−1=2

Table value of χ2 for 2 df at 5% level=5.991

Expected frequencies are in the ratio 9:3:4

i.e.,

$$64\left(\frac{9}{16}\right) = 36, \quad 64\left(\frac{3}{16}\right) = 12, \quad 64\left(\frac{4}{16}\right) = 16$$

Observed Oi  Expected Ei  (Oi − Ei)²  (Oi − Ei)²/Ei
32           36           16          0.4444
10           12           4           0.3333
22           16           36          2.25
Total 64     64           56          3.0277

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 3.0277$$

The calculated value of χ2 for 2 df at 5% level of significance is less than the table value. Hence we accept H0, i.e., data are consistent with the model.

Example 9.7: A sample analysis of examination results of 500 students was made. It was found that 230 students had failed, 160 had secured third class, 80 were placed in second class, and 30 got first class. Are these figures commensurate with the general examination result, which is in the ratio 4:3:2:1 for the various categories, respectively?

Solution: Null Hypothesis H0: The observed results are commensurate with the general examination result.

Alternative Hypothesis H1: The observed results are not commensurate with the general examination result.

Level of significance=5%

Degrees of freedom=n−1=4−1=3

The total frequency=N=500

Table value of χ2 for 3 df at 5% level of significance=7.81

Dividing 500 in the ratio 4:3:2:1

We get 200, 150, 100, and 50

Therefore the expected frequencies are 200, 150, 100, and 50 corresponding to the observed frequencies 230, 160, 80, and 30.

Class/division  Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²/Ei
Failed          230          200          30         4.5000
Third           160          150          10         0.6667
Second          80           100          −20        4.0000
First           30           50           −20        8.0000

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 17.1667$$

Since the calculated value of χ2 is greater than the table value of χ2, the null hypothesis is rejected.

Example 9.8: The table below gives the number of aircraft accidents that occurred during the various days of the week. Test whether the accidents are uniformly distributed over the week.

Days Mon Tue Wed Thu Fri Sat
No. of accidents 14 18 12 11 15 14


Solution: Null Hypothesis H0: The accidents are uniformly distributed over the week.

Alternative Hypothesis H1: The accidents are not uniformly distributed over the week.

Total number of accidents=14+18+12+11+15+14=84

Level of significance=5%

Degrees of freedom=6−1=5

Table value of χ2=11.07

The expected frequency of accidents for each day = 84/6 = 14

Day    Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
Mon    14           14           0          0           0
Tue    18           14           4          16          1.1428
Wed    12           14           −2         4           0.2857
Thu    11           14           −3         9           0.6428
Fri    15           14           1          1           0.0714
Sat    14           14           0          0           0
Total  84           84                                  2.1427

Since 2.1427<11.07, the calculated value of χ2 at 5% level, for 5 df is less than the table value of χ2.

The null hypothesis is accepted. That is the accidents are uniformly distributed over the week.

Example 9.9: Four coins are tossed 160 times and the following results were obtained:

Number of heads 0 1 2 3 4
Observed frequencies 17 52 54 31 6


Under the assumption that coins are balanced, find the expected frequencies of getting 0, 1, 2, 3, or 4 heads and test the goodness of fit.

Solution:

Null Hypothesis H0: The coins are balanced.

Alternative hypothesis H1: The coins are not balanced.

Level of significance=5%

Degrees of freedom=5−1=4

Table value of χ2=9.488

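We have N=160, p=1/2, q=1/2

The expected frequencies of 0, 1, 2, 3, and 4 heads are

$$160 \times \binom{4}{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^4 = 10, \quad 160 \times \binom{4}{1}\left(\tfrac{1}{2}\right)^1\left(\tfrac{1}{2}\right)^3 = 40, \quad 160 \times \binom{4}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^2 = 60,$$

$$160 \times \binom{4}{3}\left(\tfrac{1}{2}\right)^3\left(\tfrac{1}{2}\right)^1 = 40, \quad 160 \times \binom{4}{4}\left(\tfrac{1}{2}\right)^4\left(\tfrac{1}{2}\right)^0 = 10$$

No. of heads  Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
0             17           10           7          49          4.900
1             52           40           12         144         3.600
2             54           60           −6         36          0.600
3             31           40           −9         81          2.025
4             6            10           −4         16          1.600
Total         160          160                                 12.725

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 12.725$$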

The calculated value of χ2 is greater than the table value of χ2 at 5% level of significance and 4 df. Therefore null hypothesis is rejected. The coins are not balanced. Hence the fit is poor.
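The binomial expected frequencies of Example 9.9 can be generated programmatically rather than term by term. The sketch below uses `math.comb` from the Python standard library (Python 3.8 or later); the function name `binomial_expected` is an illustrative assumption.

```python
from math import comb


def binomial_expected(trials, n_coins, p=0.5):
    """Expected number of trials showing k heads, for k = 0..n_coins."""
    return [trials * comb(n_coins, k) * p ** k * (1 - p) ** (n_coins - k)
            for k in range(n_coins + 1)]


expected = binomial_expected(160, 4)
observed = [17, 52, 54, 31, 6]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected)        # [10.0, 40.0, 60.0, 40.0, 10.0]
print(round(chi2, 3))  # 12.725
```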

Example 9.10: Fit a Poisson distribution to the following data and test the goodness of fit:

x 0 1 2 3 4 5 6
F 275 72 30 7 5 2 1


Solution: Null Hypothesis H0: The Poisson fit is good to the given data.

Alternative Hypothesis H1: The Poisson fit is not a good fit to the given data.

Level of significance=5%

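$$\text{Mean of the distribution} = \frac{\sum f_i x_i}{\sum f_i} = \frac{0+72+60+21+20+10+6}{275+72+30+7+5+2+1} = \frac{189}{392} = 0.482$$

The expected frequencies of 0, 1, 2, 3, 4, 5, and 6 successes are obtained by using the recurrence formula for the Poisson distribution, in which each frequency is the previous one multiplied by (mean/x).

Therefore the expected frequencies are

$$\begin{aligned}
N(0) &= 392\, e^{-0.482} = 242.1 \\
N(1) &= 392\, e^{-0.482}(0.482) = 116.7 \\
N(2) &= 116.7\left(\frac{0.482}{2}\right) = 28.12 \\
N(3) &= 28.12\left(\frac{0.482}{3}\right) = 4.52 \\
N(4) &= 4.52\left(\frac{0.482}{4}\right) = 0.54 \\
N(5) &= 0.54\left(\frac{0.482}{5}\right) = 0.052 \\
N(6) &= 0.052\left(\frac{0.482}{6}\right) = 0.004
\end{aligned}$$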

The frequency table is:

x 0 1 2 3 4 5 6
Observed frequency 275 72 30 7 5 2 1
Expected frequency 242.1 116.7 28.1 4.5 0.5 0.1 0


Since the last four frequencies are small, we regroup the last four frequencies and obtain the following table:

Calculation of χ2

Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
275          242.1        32.9       1082.41     4.4709
72           116.7        −44.7      1998.09     17.1216
30           28.1         1.9        3.61        0.1285
15           5.1          9.9        98.01       19.217
Total                                            40.938

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 40.938$$

Degrees of freedom for the Poisson fit=n−2=4−2=2

Table value of χ2 at 5% for 2 df=5.991

The calculated value of χ2 is 40.938 which is greater than the table value.

We reject null hypothesis and conclude that the Poisson fit is not a good fit to the given data.
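The Poisson fit of Example 9.10 can be sketched in Python using the same recurrence. The function name `poisson_expected` is hypothetical. Note that the sketch carries the unrounded mean 189/392 through the calculation, so its expected frequencies and χ2 differ slightly in the later decimals from the hand computation, which rounds the mean to 0.482 and the frequencies to one decimal place.

```python
import math


def poisson_expected(total, mean, max_x):
    """Expected frequencies N(0..max_x) via N(x) = N(x-1) * mean / x."""
    freqs = [total * math.exp(-mean)]        # N(0) = N * e^(-mean)
    for x in range(1, max_x + 1):
        freqs.append(freqs[-1] * mean / x)   # recurrence step
    return freqs


xs = [0, 1, 2, 3, 4, 5, 6]
observed = [275, 72, 30, 7, 5, 2, 1]
total = sum(observed)
mean = sum(x * f for x, f in zip(xs, observed)) / total
expected = poisson_expected(total, mean, max(xs))

# Pool the last four classes, as in the worked example.
obs_pooled = observed[:3] + [sum(observed[3:])]
exp_pooled = expected[:3] + [sum(expected[3:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
print(round(mean, 3), [round(e, 1) for e in expected], round(chi2, 2))
```

Whatever the rounding, the statistic is far above the table value 5.991 for 2 df, so the conclusion of the example is unchanged.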

9.9.2 Test for Independence of Attributes

The χ2 test can also be applied to test the association between the attributes such as honesty, smoking, drinking, etc. when the sample data is presented in the form of contingency table with any number of rows and columns.

Example 9.11: The following table gives the classification of 150 workers according to sex and nature of work. Test whether the nature of work is independent of the sex of the worker.

 Stable Unstable Total
Males 60 30 90
Females 15 45 60
Total 75 75 150


Solution: Null Hypothesis H0: The nature of the work is independent of the sex of the worker.

Alternative Hypothesis H1: The nature of the work is not independent of the sex of the worker.

Degrees of freedom=(r−1) (c−1)=(2−1) (2−1)=1

Level of significance=5%

Table value of χ2=3.84

Expected frequencies are given in the following table:

         Stable         Unstable
Males    75×90/150=45   75×90/150=45
Females  75×60/150=30   75×60/150=30

Calculation of χ2

Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
60           45           15         225         5.00
15           30           −15        225         7.50
30           45           −15        225         5.00
45           30           15         225         7.50
Total                                            25

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 25$$

The calculated value of χ2 is greater than the table value at 5% level and 1 df. Hence we reject the null hypothesis.

We conclude that the nature of work is not independent of the sex of the worker.
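The whole test of independence (expected frequencies, χ2, and degrees of freedom) can be sketched for a table of any size. The function name `chi_square_independence` is hypothetical; the data are those of Example 9.11, and 3.84 is the table value quoted there.

```python
def chi_square_independence(observed):
    """Return (chi-square, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = chi_square_independence([[60, 30], [15, 45]])
print(chi2, df)     # 25.0 1
print(chi2 > 3.84)  # True, so H0 (independence) is rejected
```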

Example 9.12: In a certain sample of 2000 families, 1400 families are consumers of tea. Out of 1800 Hindu families, 1236 families consume tea. Use the χ2 test to state whether there is any significant difference in the consumption of tea between Hindu and non-Hindu families.

Solution: From the given data, the 2×2 contingency table that can be formed is given below.

                   Hindu  Non-Hindu  Total
Consuming tea      1236   164        1400
Not consuming tea  564    36         600
Total              1800   200        2000

Null Hypothesis H0: The attributes are independent, i.e., there is no significant difference between the communities as far as consuming of tea is concerned.

Alternative Hypothesis H1: The attributes are not independent.

Level of significance: 5%

df=1, Table value of χ2 at 5% for 1 df=3.841

The expected frequencies corresponding to the given observed frequencies are given in the table below:

1800×1400/2000=1260   200×1400/2000=140
1800×600/2000=540     200×600/2000=60

Computation of χ2

Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
1236         1260         −24        576         0.457
564          540          24         576         1.067
164          140          24         576         4.114
36           60           −24        576         9.600
Total                                            15.238

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 15.238$$

The calculated value of χ2 is higher than the table value of χ2 at 5% and 1 df. Therefore the null hypothesis is rejected. The two communities differ significantly as far as consumption of tea is concerned.

Example 9.13: A tobacco company claims that there is no relationship between smoking and lung ailments. To investigate the claim, a random sample of 300 males in the age group of 40 to 50 years is given a medical test. The observed sample results are tabulated below:

 Lung ailment Nonlung ailment Total
Smokers 75 105 180
Nonsmokers 25 95 120
Total 100 200 300

On the basis of this information, can it be concluded that smoking and lung ailments are independent (given $\chi^2_{0.05}$ = 3.841 for 1 df)?

Solution: Null hypothesis H0: The smoking and lung ailments are not associated.

Alternative Hypothesis H1: The smoking and lung ailments are associated.

Level of significance=5%

Table value of χ2 for 1 df=3.841

The expected frequencies are:

100×180/300=60   200×180/300=120
100×120/300=40   200×120/300=80

Computation of χ2

Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
75           60           15         225         3.750
25           40           −15        225         5.625
105          120          −15        225         1.875
95           80           15         225         2.813
Total                                            14.063

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 14.063$$

For 1 df at 5% level of significance the calculated value of χ2 is more than the table value. Hence we reject the null hypothesis.

Therefore smoking and lung ailments are not independent.

Example 9.14: Given the following contingency table for hair color and eye color, find the value of χ2. Is there good association between the two?

                   Hair color
Eye color   Fair   Brown  Black  Total
Blue        15     5      20     40
Gray        20     10     20     50
Brown       25     15     20     60
Total       60     30     60     150

Solution: Null Hypothesis H0: The two attributes hair color and eye color are independent.

Alternative Hypothesis H1: The two attributes hair color and eye color are not independent.

Level of significance=5%

Table value of χ2 for 4 df at 5% level of significance=9.488

The expected frequencies are:

60×40/150=16   30×40/150=8    60×40/150=16
60×50/150=20   30×50/150=10   60×50/150=20
60×60/150=24   30×60/150=12   60×60/150=24

Computation of χ2

Observed Oi  Expected Ei  (Oi − Ei)  (Oi − Ei)²  (Oi − Ei)²/Ei
15           16           −1         1           0.0625
5            8            −3         9           1.125
20           16           4          16          1
20           20           0          0           0
10           10           0          0           0
20           20           0          0           0
25           24           1          1           0.042
15           12           3          9           0.75
20           24           −4         16          0.666
Total                                            3.6458

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 3.6458$$

The calculated value of χ2 for (3−1)(3−1)=2×2=4 df at 5% level is less than the table value 9.488. Hence we accept the null hypothesis, i.e., the hair color and eye color are independent.

Example 9.15: The following table shows the result of an experiment to investigate the effect of vaccination induced on the animals against a particular disease. Use the χ2 test to test the hypothesis that the vaccinated and unvaccinated groups, i.e., vaccination and the disease are independent.

 Got disease Did not get disease
Vaccinated 9 42
Not vaccinated 17 28

(Value of χ2 for 1 df at 5% level is equal to 3.841).

Solution: We have a=9, b=42, c=17, d=28

N=a+b+c+d=9+42+17+28=96

Null Hypothesis H0: The vaccination and disease are independent.

Alternative Hypothesis H1: The vaccination and disease are not independent.

Table value of χ2 for 1 df at 5% level=3.841

$$\chi^2 = \frac{N(ad-bc)^2}{(a+b)(c+d)(b+d)(a+c)} = \frac{96(9\times 28 - 17\times 42)^2}{(51)(45)(70)(26)} = \frac{96(-462)^2}{4176900} = 4.906$$

The calculated value of χ2 is more than the table value of χ2 at 5% level and 1 df. Therefore we reject H0 and conclude that the disease and vaccination are not independent.
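For a 2×2 table the shortcut formula of Example 9.1 avoids computing the expected frequencies altogether. A minimal sketch follows (the function name `chi_square_2x2` is an illustrative assumption), applied to the data of Example 9.15:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square of a 2x2 table via N(ad - bc)^2 / product of marginal totals."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (b + d) * (a + c))


print(round(chi_square_2x2(9, 42, 17, 28), 3))  # 4.906
```

Applied to the table of Example 9.11, the same function returns 25, in agreement with the cell-by-cell computation there.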

9.9.2.1 Yates' Correction

In a 2×2 contingency table, if any cell frequency is small, Yates' correction is necessary. If the expected frequency in any cell is less than 5 with 1 df, we apply Yates' correction: we subtract 0.5 from the absolute difference between the observed and expected frequencies. Yates' correction is a correction for continuity, and the formula after making it is

$$\chi^2 = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i} \quad (0.5 \text{ is Yates' correction})$$

Example 9.16: Two batches, each of 12 animals, are taken for a test of inoculation. One batch was inoculated and the other batch was not inoculated. The frequencies of the dead and surviving animals are given below in both cases. Can the inoculation be regarded as effective against the disease?

                Dead  Survived  Total
Inoculated      2     10        12
Not inoculated  8     4         12
Total           10    14        24

($\chi^2_{0.05}$ for 1 df=3.841)

Solution: Null Hypothesis H0: Inoculation and the disease are independent.

Alternative Hypothesis H1: Inoculation and the disease are not independent.

Degrees of freedom=(r−1)(c−1)=(2−1)(2−1)=1

Level of significance=5%

Table value of χ2 for 1 df at 5% level=3.841

Frequencies in the cells are small (a and d are small), so we make Yates' correction. The expected frequencies are given in the table below:

10×12/24=5   14×12/24=7
10×12/24=5   14×12/24=7

Computation of χ2

Observed Oi  Expected Ei  |Oi − Ei|  |Oi − Ei| − 0.5  (|Oi − Ei| − 0.5)²  (|Oi − Ei| − 0.5)²/Ei
2            5            3          2.5              6.25                1.25
10           7            3          2.5              6.25                0.89285
8            5            3          2.5              6.25                1.25
4            7            3          2.5              6.25                0.89285
Total                                                                     4.2857

$$\chi^2 = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i} = 4.2857$$

For 1 df at 5% level of significance, the calculated value of χ2 is greater than the table value. Hence H0 is rejected. Inoculation and the disease are not independent; we conclude that inoculation is effective against the disease.
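The corrected statistic of Example 9.16 differs from the ordinary one only by the 0.5 subtracted from each absolute difference, as the following sketch shows. The function name `chi_square_yates` is hypothetical; the observed and expected values are those of the example.

```python
def chi_square_yates(observed, expected):
    """Chi-square with Yates' correction: sum of (|O - E| - 0.5)^2 / E."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


observed = [2, 10, 8, 4]
expected = [5, 7, 5, 7]
chi2 = chi_square_yates(observed, expected)
print(round(chi2, 4))  # 4.2857
print(chi2 > 3.841)    # True, so H0 is rejected
```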

9.9.3 Homogeneity Chi-Square

Chi-square test may be used to test the homogeneity of the attributes in respect of particular characteristics. It is performed to decide whether separate samples are sufficiently uniform to be added together. Chi-square test may also be used to test the population variance.

9.9.4 Chi-Square Distribution of Sample Variance

Let σ2 denote the population variance and S2 denote the sample variance. The statistic $(n-1)S^2/\sigma^2$ has a χ2 distribution with n−1 degrees of freedom. It is very useful in making inferences about the population variance σ2 by using the sample variance S2. It is also used in making an interval estimate of the population variance, which is given by

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}$$

where α is the level of significance, and $\chi^2_{\alpha}$ is the value of the χ2 distribution giving an area α to the right of $\chi^2_{\alpha}$.

The test may also be used as hypothesis test for the value of the population variance.

9.9.5 Testing a Hypothesis About the Variance of Normally Distributed Population—Decision Rule

Let $\sigma_0^2$ denote the hypothesized value of the population variance. Then the decision rule for accepting or rejecting H0 is as follows:

1. Two-tailed test

Hypotheses                        Decision rule
H0: $\sigma^2 = \sigma_0^2$       Accept H0 if the computed value of χ2 lies between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ (table values)
H1: $\sigma^2 \ne \sigma_0^2$     Reject H0 if the computed value of χ2 < $\chi^2_{1-\alpha/2}$ or > $\chi^2_{\alpha/2}$ (table values)

2. One-tailed test

Hypotheses                        Decision rule
H0: $\sigma^2 \le \sigma_0^2$     Accept H0 if computed value of χ2 < table value of $\chi^2_{\alpha}$
H1: $\sigma^2 > \sigma_0^2$       Reject H0 if computed value of χ2 > table value of $\chi^2_{\alpha}$

9.9.5.1 Solved Examples

Example 9.17: According to the census report of 2004, the numbers of women to every 1000 men in the following 5 states/union territories are as follows:

In Andaman 846, in Delhi 821, in Chandigarh 777, in Daman and Diu 710, in Dadra and Nagar Haveli 812. By an appropriate statistical method determine whether the 1:1 sex ratio among the human population can be attributed to the 5 regions ($\chi^2_{0.05}$ for 4 df=9.488).

Solution: Null hypothesis H0: The sex ratio is 1:1 among the human population of the 5 states.

Alternative hypothesis H1: The sex ratio is not 1:1 among the human population of the 5 states.

Table value of χ2 for 4 df at 5% level=9.488

Computation of χ2 for 5 states

Region                  Sex    Observed Oi  Expected Ei  |Oi − Ei|  (Oi − Ei)²/Ei  χ2      df
Andaman                 Men    1000         923          77         6.42           12.84   1
                        Women  846          923          77         6.42
Delhi                   Men    1000         910.5        89.5       8.79           17.58   1
                        Women  821          910.5        89.5       8.79
Chandigarh              Men    1000         888.5        111.5      13.99          27.98   1
                        Women  777          888.5        111.5      13.99
Daman and Diu           Men    1000         855          145        24.59          49.18   1
                        Women  710          855          145        24.59
Dadra and Nagar Haveli  Men    1000         906          94         9.75           19.5    1
                        Women  812          906          94         9.75
Total                   Men    5000                                                127.08  5
                        Women  3966

Chi-square for summed data

Sex    Observed Oi  Expected Ei  |Oi − Ei|  (Oi − Ei)²/Ei  df
Men    5000         4483         517        59.62          2−1=1
Women  3966         4483         517        59.62

             Chi-square  df
Total        127.08      5
Summed       119.2       1
Homogeneity  7.88        4

The calculated homogeneity value of χ2 (7.88) for 4 df is less than the table value of χ2 at 5% level of significance. We accept H0.
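The three quantities in the last table (total, summed, and homogeneity χ2) can be sketched as follows. The function name `two_class_chi_square` is hypothetical. Because the sketch does not round the intermediate values, its homogeneity χ2 comes out near 7.87 rather than exactly the 7.88 obtained by hand.

```python
def two_class_chi_square(men, women):
    """Chi-square of one sample against a 1:1 ratio (expected = half the total)."""
    expected = (men + women) / 2
    return ((men - expected) ** 2 + (women - expected) ** 2) / expected


women_per_1000_men = [846, 821, 777, 710, 812]

# Sum of the five separate chi-squares (5 df).
total_chi2 = sum(two_class_chi_square(1000, w) for w in women_per_1000_men)

# Chi-square of the pooled (summed) data (1 df).
summed_chi2 = two_class_chi_square(1000 * len(women_per_1000_men),
                                   sum(women_per_1000_men))

# Homogeneity chi-square is the difference (5 - 1 = 4 df).
homogeneity_chi2 = total_chi2 - summed_chi2
homogeneity_df = len(women_per_1000_men) - 1
print(round(total_chi2, 2), round(summed_chi2, 2), round(homogeneity_chi2, 2))
```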

Example 9.18: Heights in cm of 10 students are given below:

61, 65, 62, 66, 68, 70, 64, 65, 68, 71

Can we say that variance of the distribution heights of all students from which the above sample of 10 students was drawn is equal to 50?

Solution: We have

$$\text{Mean of the sample} = \frac{61+65+62+66+68+70+64+65+68+71}{10} = \frac{660}{10} = 66$$

Null Hypothesis H0: $\sigma^2 = \sigma_0^2 = 50$

Alternative Hypothesis H1: $\sigma^2 \ne \sigma_0^2$, i.e., $\sigma^2 \ne 50$ (Two-tailed test)

Level of significance: 5% (α=0.05); $\chi^2_{1-\alpha/2} = \chi^2_{0.975} = 2.70$ for 9 df

Computation of χ2

xi   (xi − x̄)  (xi − x̄)²
61   −5         25
65   −1         1
62   −4         16
66   0          0
68   2          4
70   4          16
64   −2         4
65   −1         1
68   2          4
71   5          25

Test statistic is

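$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} = \frac{(n-1)}{\sigma_0^2}\cdot\frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{\sum (x_i - \bar{x})^2}{\sigma_0^2} = \frac{96}{50} = 1.92$$

Since the calculated value of χ2 for 9 df is less than the value of $\chi^2_{1-\alpha/2}$ at 5% level of significance, we reject the null hypothesis and conclude that the population variance is not 50.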
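The test statistic of Example 9.18 can be computed directly from the sample, as in the sketch below. The function name `variance_chi_square` is an illustrative assumption, and the lower table value 2.70 is the one quoted in the example.

```python
def variance_chi_square(sample, sigma0_sq):
    """Chi-square statistic (n - 1) S^2 / sigma0^2 = sum (x - mean)^2 / sigma0^2."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / sigma0_sq


heights = [61, 65, 62, 66, 68, 70, 64, 65, 68, 71]
chi2 = variance_chi_square(heights, 50)
print(chi2)         # 1.92
print(chi2 < 2.70)  # True: below the lower table value, so H0 is rejected
```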

Exercise 9.1

1. The following figures show the distribution of digits in numbers chosen at random from a telephone directory:

Digits 0 1 2 3 4 5 6 7 8 9
Frequency 1026 1107 997 996 1075 993 1107 972 964 853

Test whether the digits may be taken to occur equally frequently in the directory ($\chi^2_{0.05}$ = 16.919 for 9 df).

(Hint: Expected frequency of each digit = 10,000/10 = 1000)

Ans: Null hypothesis is rejected

2. A die is thrown 264 times with the following results. Show that the die is biased:

No. appeared on the die 1 2 3 4 5 6
Frequency 40 32 28 58 54 60



(Given χ2 for 5 df at 5% level=11.07)

3. On the basis of information given below about the treatment of 200 patients suffering from a disease, state whether the new treatment is comparatively superior to the conventional treatment?

 Favorable Not favorable Total
New treatment 60 30 90
Conventional treatment 40 70 110

($\chi^2_{0.05}$ = 3.841 for 1 df)

Ans: Null hypothesis is rejected. The new treatment is superior to the conventional treatment

4. Two hundred digits were chosen at random from a set of tables. The frequencies of the digits were

Digit 0 1 2 3 4 5 6 7 8 9
Frequency 18 19 23 21 16 25 22 20 21 15

Use the χ2 test to assess the correctness of the hypothesis that the digits were distributed in equal numbers in the tables from which these were chosen (for 9 df at 5% level of significance the χ2 value is 16.919).

5. In a sample of owls, it is found that red male are 35, red female are 70, gray male are 50, gray female are 45. Coloration is due to the plumage. Is the coloration independent of sex of the sample?

Ans: The coloration of the individuals is not independent of sex

6. The following table gives the frequencies of occurrence of the digits 0, 1, 2, …, 9, in the last place in the four-figure logarithms of numbers. Examine if there is any peculiarity.

Digit 0 1 2 3 4 5 6 7 8 9
Frequency 6 16 15 10 12 12 3 2 9 5

(χ2 value for 9 df at 5% level of significance is 16.919)

Ans: H0 rejected. There is peculiarity

7. A die was thrown 498 times. Denoting x to be the number appearing on the top face of it, the observed frequency of x is given below:

x 1 2 3 4 5 6
f 69 78 85 82 86 98

What opinion would you form about the accuracy of the die (χ2 value for 5 df at 5% level of significance is 11.07)?

Ans: Die is unbiased

8. Among 64 offspring of a certain cross between guinea pigs 34 were red, 10 were black, and 30 were white. According to the genetic model these numbers should be in the ratio 9:3:4. Are the data consistent with the model at 5% level?

Ans: The data are consistent with the model

9. In an investigation into the health and nutrition of two groups of children of different social status, the following results are obtained:

Health Social status Total
 Poor Rich  
Below normal 130 20 150
Normal 102 108 210
Above normal 24 96 120



Discuss the relation between the health and their social status.

Ans: Health and social status are associated, i.e., H0 is rejected

10. The following table gives a classification of a sample of 160 plants by their flower color and flatness of leaf:

 Flat leaves Curled leaves Total
White flower 99 36 135
Red flower 20 5 25
Total 119 41 160


Test whether the flower color is independent of the flatness of leaf.

Ans: H0 is accepted, flower color is independent of the flatness of leaf

11. From the following information, state whether the two attributes viz., condition of house and condition of child are independent:

Condition of the child Condition of the house Total
 Clean Dirty  
Clean 69 51 120
Fairly clean 81 20 101
Dirty 35 44 79
Total 185 115 300


(χ2 at 5% level for 2 df=5.991)

Ans: H0 rejected. There is an association between the condition of the child and condition of the house

12. A certain drug was administered to 500 people out of a total of 800 included in the sample to test its efficacy against typhoid. The results are given below:

 Typhoid No typhoid Total
Drug 200 300 500
No drug 280 20 300
Total 480 320 800


On the basis of the data can we say that the drug is effective in preventing typhoid?

Ans: The drug is effective

13. In an experiment on the immunization of goats from anthrax the following results were obtained. Derive your inference on the vaccine.

 Died of anthrax Survived Total
Inoculated with vaccine 2 10 12
Not inoculated 6 6 12
Total 8 16 24

Ans: H0 is accepted. The vaccine is ineffective in controlling the disease

14. Fifty students selected at random from 500 students enrolled in a computer program were classified according to age and grade points, giving the following data:

Grade points Age 20 and under Age 21–30 Age above 30
Up to 5.0 3 5 2
5.1 to 7.5 8 7 5
7.6 to 10 4 8 8

Test at 5% level of significance the hypothesis that the age and grade points are independent (χ2 at 5% level for 2 df=5.991).

Ans: H0 accepted. Grade points and age are independent of each other

15. The following data relate to the sales of a certain commodity in demand during a time of trade depression:

Sales District not hit by depression District hit by depression Total
Satisfactory 250 80 330
Not satisfactory 140 30 170
Total 390 110 500

Do these data suggest that the sales are significantly affected by depression (χ2 value for 1 df at 5% level of significance is 3.841)?

Ans: H0 is accepted. The sales are not significantly affected by depression

16. A survey of 200 families having three children selected at random gave the following results:

Male births 0 1 2 3
No. of families 40 58 62 40


Test the hypothesis that male and female births are equally likely at 5% level of significance (χ2 value for 3 df at 5% level of significance is 7.82).

17. Two researchers adopted different sampling techniques while investigating the same group of students to find the number of students falling into different intelligence levels. The results are as follows:

Researchers Below average Average Above average Genius Total
X 86 60 44 10 200
Y 40 33 25 2 100
Total 126 93 69 12 300


Would you say that the sampling techniques adopted by the two researchers are significantly different (χ2 values at 5% level of significance for 2 df and 3 df are 5.991 and 7.82, respectively)?

18. In the following data, find whether there is any significant difference in the habit of taking soft drinks among categories of employees.

Soft drinks Clerks Teachers Officers
Pepsi 10 25 65
Thumbs-up 15 30 65
Fanta 50 60 30


(χ2 value for 4 df at 5% level of significance is 9.488)

Ans: H0 rejected. Habit of drinking soft drinks depends on the category

19. A firm manufacturing rivets wants to limit variations in their length as much as possible. The lengths (in cm) of 10 rivets manufactured by a new process are:

2.15 1.99 2.05 2.12 2.17
2.01 1.98 2.03 2.25 1.92


Examine whether the new process can be considered superior to the old if the old population has standard deviation 0.145 cm (χ2 value for 9 df at 5% level of significance is 16.919).

Ans: H0 accepted. The new process cannot be considered superior to the old process
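
For a test on a population variance, the statistic is the sum of squared deviations from the sample mean divided by the hypothesized variance. The Python sketch below is an illustration only and rests on two assumptions: the first length is read as 2.15 cm (a value of 21.5 cm would be out of line with the other rivets), and H0 states that the standard deviation is 0.145 cm.

```python
# Rivet lengths in cm; the first value is assumed to be 2.15.
lengths = [2.15, 1.99, 2.05, 2.12, 2.17,
           2.01, 1.98, 2.03, 2.25, 1.92]

sigma0 = 0.145  # hypothesized population standard deviation (cm)

n = len(lengths)
mean = sum(lengths) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in lengths)

chi2 = ss / sigma0 ** 2  # chi-square statistic with n - 1 df
df = n - 1

print("chi-square =", round(chi2, 3), "with", df, "df")
```

The resulting value, roughly 4.46, is well below the tabulated 16.919 for 9 df, which agrees with the answer above.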

20. Four coins were tossed 160 times and the following results were obtained:

No. of heads 0 1 2 3 4
Observed frequencies 17 52 54 31 6


Under the assumption that the coins are unbiased, find the expected frequencies of getting 0, 1, 2, 3, or 4 heads and test the goodness of fit (χ2 value for 4 df at 5% level of significance is 9.488).

Ans: The fit is poor
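
The expected frequencies here come from the binomial distribution with 4 trials and probability 1/2 of a head. The following Python sketch is offered as one way to check the arithmetic, not as the book's own working; the variable names are illustrative.

```python
from math import comb

tosses = 160
coins = 4
p = 0.5  # probability of a head for an unbiased coin

observed = [17, 52, 54, 31, 6]  # 0, 1, 2, 3, 4 heads

# Binomial expected frequencies: N * C(4, k) * p^k * (1 - p)^(4 - k).
expected = [tosses * comb(coins, k) * p ** k * (1 - p) ** (coins - k)
            for k in range(coins + 1)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print("expected =", expected)
print("chi-square =", round(chi2, 3), "with", df, "df")
```

The expected frequencies are 10, 40, 60, 40, and 10, and the statistic of 12.725 exceeds 9.488, consistent with the answer that the fit is poor.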

21. Out of 8000 graduates in a city, 800 are males; out of 1600 graduate employees 120 are females. Use the χ2 test to determine if any distinction is made in appointment on the basis of sex (χ2 value for 1 df at 5% level of significance is 3.841).

Ans: H0 rejected. There is distinction

22. A survey of 200 families having three children selected at random gave the following results:

Male birth 0 1 2 3
No. of families 40 58 62 40


Test the hypothesis that male and female births are equally likely at 5% level of significance (χ2 value for 3 df at 5% level of significance is 7.82).

Ans: H0 rejected

23. In the accounting department of a bank, 100 accounts are selected at random and examined for errors. The following results have been obtained:

No. of errors 0 1 2 3 4 5 6
No. of accounts 36 40 19 2 0 2 1


Does this information verify that the errors are distributed according to the Poisson probability law (χ2 value for 3 df at 5% level of significance is 7.815)?

Ans: H0 accepted
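
Fitting a Poisson distribution requires estimating the mean from the sample and then computing expected frequencies. The Python sketch below shows one possible working under stated assumptions: the mean is estimated as the sample mean, and the classes with 3 or more errors are pooled into one class so that no expected frequency is very small. The exercise does not specify the pooling, so this choice is an assumption.

```python
import math

errors = [0, 1, 2, 3, 4, 5, 6]
accounts = [36, 40, 19, 2, 0, 2, 1]

n = sum(accounts)
# Sample mean number of errors per account, used as the Poisson mean.
mean = sum(k * f for k, f in zip(errors, accounts)) / n

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Assumption: pool the classes 3, 4, 5, 6 into a single "3 or more" class.
observed = accounts[:3] + [sum(accounts[3:])]
expected = [n * poisson_pmf(k, mean) for k in range(3)]
expected.append(n - sum(expected))  # remaining probability mass

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("mean =", mean)
print("chi-square =", round(chi2, 3))
```

With this pooling there are four classes and one estimated parameter, so the usual count gives 4 − 1 − 1 = 2 df. The statistic, about 1.46, is below the tabulated values for both 2 df (5.991) and 3 df (7.815), so H0 is accepted either way.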
