In this chapter we study some commonly used procedures in the theory of testing of hypotheses. In Section 10.2 we describe the classical procedure for constructing tests based on likelihood ratios. This method is sufficiently general to apply to multiparameter problems and is especially useful in the presence of nuisance parameters, that is, unknown parameters in the model that are of no inferential interest. Most of the normal theory tests described in Sections 10.3 to 10.5, and those in Chapter 12, can be derived by the methods of Section 10.2. In Sections 10.3 to 10.5 we list some commonly used normal-theory tests; in Section 10.3 we also deal with goodness-of-fit tests. In Section 10.6 we look at the hypothesis testing problem from a decision-theoretic viewpoint and describe Bayes and minimax tests.
In Chapter 9 we saw that UMP tests do not exist for some problems of hypothesis testing. It was suggested that we restrict attention to smaller classes of tests and seek UMP tests in these subclasses or, alternatively, seek tests which are optimal against local alternatives. Unfortunately, some of the reductions suggested in Chapter 9, such as invariance, do not apply to all families of distributions.
In this section we consider a classical procedure for constructing tests that has some intuitive appeal and that frequently, though not necessarily, leads to optimal tests. Also, the procedure leads to tests that have some desirable large-sample properties.
Recall that for testing the simple hypotheses H0: θ = θ0 against H1: θ = θ1, the Neyman-Pearson MP test is based on the ratio f1(x)/f0(x). If we interpret the numerator as the best possible explanation of x under H1 and the denominator as the best possible explanation of x under H0, then it is reasonable to consider the ratio

r(x) = sup{L(θ; x): θ ∈ Θ1} / sup{L(θ; x): θ ∈ Θ0}

as a test statistic for testing H0: θ ∈ Θ0 against H1: θ ∈ Θ1. Here L(θ; x) is the likelihood function of x. Note that for each x for which the MLEs of θ under Θ1 and Θ0 exist, the ratio is well defined and free of θ and can be used as a test statistic. Clearly, we should reject H0 if r(x) is large.
The statistic r can be inconvenient to work with, since one or both of the suprema in the ratio may fail to be attained.
Let θ = (θ1, θ2, …, θk) be a vector of parameters taking values in Θ, and let X be a random vector with PDF (PMF) fθ. Consider the problem of testing the null hypothesis H0: θ ∈ Θ0 against the alternative H1: θ ∈ Θ1 = Θ − Θ0, using the generalized likelihood ratio (GLR)

λ(x) = sup{L(θ; x): θ ∈ Θ0} / sup{L(θ; x): θ ∈ Θ}.
We leave it to the reader to show that the statistics λ(X) and r(X) lead to the same criterion for rejecting H0.
The numerator of the likelihood ratio λ is the best explanation of X (in the sense of maximum likelihood) that the null hypothesis H0 can provide, and the denominator is the best possible explanation of X. H0 is rejected if there is a much better explanation of X than the best one provided by H0.
It is clear that 0 ≤ λ(x) ≤ 1 and that H0 is rejected for small values of λ(x), say λ(x) < c. The constant c is determined from the size restriction
If the distribution of λ is continuous (that is, the DF is absolutely continuous), any size α is attainable. If, however, λ(X) is a discrete RV, it may not be possible to find a likelihood ratio test whose size exactly equals α. This problem arises because of the nonrandomized nature of the likelihood ratio test and can be handled by randomization. The following result holds.
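As a concrete sketch of the GLR computation, consider testing H0: μ = μ0 against H1: μ ≠ μ0 for a sample from N(μ, σ²) with σ² known. The supremum in the denominator is attained at the MLE μ̂ = X̄, and the ratio reduces to λ = exp{−n(X̄ − μ0)²/(2σ²)}. The data below are hypothetical:

```python
import math

def glr_normal_known_sigma(xs, mu0, sigma):
    """GLR for H0: mu = mu0 vs H1: mu != mu0, sigma known.

    The denominator sup over mu of L(mu; x) is attained at the MLE
    mu-hat = xbar, so lambda = exp(-n * (xbar - mu0)^2 / (2 * sigma^2)).
    """
    n = len(xs)
    xbar = sum(xs) / n
    return math.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))

lam = glr_normal_known_sigma([1.0, 2.0, 1.0, 2.0], mu0=1.0, sigma=1.0)
print(lam)                 # small lambda is evidence against H0
print(-2 * math.log(lam))  # here n*(xbar - mu0)^2/sigma^2 = 4*(0.5)^2 = 1
```

Small values of λ correspond to large values of the equivalent statistic −2 log λ, whose large-sample behavior is discussed below.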
The GLR test is of the type obtained in Section 9.4 for families with an MLR, except on the boundary of the null hypothesis. In other words, if the size of the test happens to be exactly α, the likelihood ratio test is a UMP level α test. Since X is a discrete RV, however, it may not be possible to attain size exactly α. We have
If such a c′ does not exist, we choose an integer c′ such that
The situation in Example 1 is not unique. For the one-parameter exponential family it can be shown (Birkes [7]) that a GLR test of H0: θ ≤ θ0 against H1: θ > θ0 is UMP of its size. The result also holds for the dual problem and, in fact, for a much wider class of one-parameter families of distributions.
The GLR test is especially useful when θ is multidimensional and we wish to test a hypothesis concerning only some of its components. The remaining components act as nuisance parameters.
The computations in Example 2 could be slightly simplified by using Theorem 2. Indeed, (X̄, S²) is a minimal sufficient statistic for θ = (μ, σ²), and since X̄ and S² are independent, the likelihood is the product of the PDFs of X̄ and S². We note that X̄ is N(μ, σ²/n) and that (n − 1)S²/σ² is χ²(n − 1). We leave it to the reader to carry out the details.
In Example 3 we can obtain the same GLR test by focusing attention on the joint sufficient statistic (X̄, Ȳ, S1², S2²), where S1² and S2² are the sample variances of the X's and the Y's, respectively. In order to write down the likelihood function, we note that X̄, Ȳ, S1², and S2² are independent RVs. The distributions of X̄ and S1² are the same as in Example 2, except that m is the sample size; the distributions of Ȳ and S2² require the appropriate modifications. We leave it to the reader to carry out the details. It turns out that the GLR test coincides with the UMP unbiased test in this case.
In certain situations the GLR test does not perform well. We reproduce here an example due to Stein and Rubin.
We will use the generalized likelihood ratio procedure quite frequently hereafter because of its simplicity and wide applicability. The exact distribution of the test statistic under H0 is generally difficult to obtain (despite what we saw in Examples 1 to 3 above), and evaluation of the power function is also not possible in many problems. Recall, however, that under certain conditions the asymptotic distribution of the MLE is normal. This result can be used to prove the following large-sample property of the GLR under H0, which solves the problem of computing the cutoff point c, at least when the sample size is large.
We will not prove this result here; the reader is referred to Wilks [118, p. 419]. The regularity conditions are essentially the ones associated with Theorem 8.7.4. In Example 2 the number of parameters unspecified under H0 is one (namely, σ2), and under H1 two parameters are unspecified (μ and σ2), so that the asymptotic chi-square distribution will have 1 d.f. Similarly, in Example 3, the d.f. = 4 − 3 = 1.
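When the limiting chi-square distribution has one degree of freedom, as in Examples 2 and 3, the tail probability can even be computed from the normal distribution, since P{χ²(1) > y} = P{|Z| > √y} = erfc(√(y/2)). A small sketch (the cutoff 3.841 is the familiar χ² table value for 1 d.f. at α = 0.05):

```python
import math

def chi2_sf_1df(y):
    # P(chi-square with 1 d.f. exceeds y) = P(|Z| > sqrt(y)) = erfc(sqrt(y/2))
    return math.erfc(math.sqrt(y / 2.0))

# -2 log lambda equal to the 5% critical value should give p close to 0.05
p = chi2_sf_1df(3.841)
print(p)
```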
find the GLR test of against .
.
In this section we consider a variety of tests where the test statistic has an exact or a limiting chi-square distribution. Chi-square tests are also used for testing some nonparametric hypotheses and will be taken up again in Chapter 13.
We begin with tests concerning variances in sampling from a normal population. Let X1, X2, …, Xn be iid N(μ, σ²) RVs, where σ² is unknown. We wish to test hypotheses of the type H0: σ = σ0 (or σ ≤ σ0, or σ ≥ σ0), where σ0 is some given positive number. We summarize the tests in the following table.
Reject H0 at level α if | | | |
H0 | H1 | μ Known | μ Unknown
I. σ ≤ σ0 | σ > σ0 | Σ(Xi − μ)²/σ0² > χ²(n, α) | (n − 1)S²/σ0² > χ²(n − 1, α)
II. σ ≥ σ0 | σ < σ0 | Σ(Xi − μ)²/σ0² < χ²(n, 1 − α) | (n − 1)S²/σ0² < χ²(n − 1, 1 − α)
III. σ = σ0 | σ ≠ σ0 | Σ(Xi − μ)²/σ0² > χ²(n, α/2) or < χ²(n, 1 − α/2) | (n − 1)S²/σ0² > χ²(n − 1, α/2) or < χ²(n − 1, 1 − α/2)
Remark 1. All these tests can be derived by the standard likelihood ratio procedure. If μ is unknown, tests I and II are UMP unbiased (and UMP invariant). If μ is known, tests I and II are UMP (see Example 9.4.5). For test III we have chosen the constants c1, c2 so that each tail has probability α/2. This is the customary procedure, even though it destroys the unbiasedness property of the test, at least for small samples.
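A sketch of the μ-unknown statistic (n − 1)S²/σ0² follows; the data and σ0 are hypothetical, and the chi-square critical values come from tables:

```python
def chi2_variance_stat(xs, sigma0_sq):
    # (n - 1) * S^2 / sigma0^2, compared with chi-square(n - 1) quantiles
    n = len(xs)
    xbar = sum(xs) / n
    s_sq = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return (n - 1) * s_sq / sigma0_sq

stat = chi2_variance_stat([4.0, 6.0, 4.0, 6.0], sigma0_sq=1.0)
print(stat)   # (n - 1) S^2 = 4, so the statistic is 4.0
```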
A test based on a chi-square statistic is also used for testing the equality of several proportions. Let X1, X2, …, Xk be independent RVs with Xi ~ b(ni, pi), i = 1, 2, …, k.
If n1, n2, …, nk are large, we can use Theorem 1 to test H0: p1 = p2 = ⋯ = pk = p against all alternatives. If p is known, we compute
and reject H0 if the computed value exceeds χ²(k, α). In practice p will be unknown. Let n = Σ ni. Then the likelihood function is
so that
The MLE of p under H0 is therefore given by
that is,
Under certain regularity assumptions (see Cramér [17, pp. 426–427]) it can be shown that the statistic
is asymptotically χ²(k − 1). Thus the test rejects H0: p1 = p2 = ⋯ = pk, p unknown, at level α if the computed value of the statistic exceeds χ²(k − 1, α).
It should be remembered that the tests based on Theorem 1 are all large-sample tests and hence not exact, in contrast to the tests concerning the variance discussed above, which are all exact. In the case k = 1, UMP tests of H0: p ≤ p0 and H0: p ≥ p0 exist and can be obtained by the MLR method described in Section 9.4. For testing H0: p = p0, the usual test is UMP unbiased.
In the case k = 2, if n1 and n2 are large, a test based on the normal distribution can be used instead of Theorem 1. In this case the statistic
where p̂ = (X1 + X2)/(n1 + n2), is asymptotically N(0, 1) under H0: p1 = p2. If p is known, one uses p instead of p̂. It is not too difficult to show that Z² is equal to Y1, so that the two tests are equivalent.
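The equivalence of the two statistics is easy to check numerically. The following sketch computes the pooled-p̂ chi-square statistic for k samples and, for k = 2, the Z statistic; the counts are hypothetical:

```python
import math

def pooled_chi2(xs, ns):
    # Y = sum (x_i - n_i p_hat)^2 / (n_i p_hat (1 - p_hat)), with
    # p_hat = (sum x_i) / (sum n_i); limiting chi-square(k - 1) under H0
    p_hat = sum(xs) / sum(ns)
    return sum((x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
               for x, n in zip(xs, ns))

def two_sample_z(x1, n1, x2, n2):
    p_hat = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(
        p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

y = pooled_chi2([30, 40], [100, 100])
z = two_sample_z(30, 100, 40, 100)
print(y, z * z)   # the two values agree
```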
For small samples the so-called Fisher-Irwin test is commonly used; it is based on the conditional distribution of X1 given X1 + X2. Let T = X1 + X2. Then
where
It follows that
On the boundary p1 = p2 = p of any of the hypotheses p1 ≤ p2, p1 ≥ p2, or p1 = p2, we note that the conditional distribution of X1, given T = t, does not depend on p, so that
which is a hypergeometric distribution. For testing H0: p1 ≥ p2 against H1: p1 < p2, this conditional test rejects H0 if X1 ≤ k(t), where k(t) is the largest integer for which P{X1 ≤ k(t) | T = t} ≤ α. Obvious modifications yield critical regions for testing the other hypotheses against the corresponding alternatives.
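Because the conditional distribution is hypergeometric and free of the common p, the test can be carried out exactly. A sketch with hypothetical counts (x1 = 1 success out of n1 = 5, with t = 5 total successes out of n1 + n2 = 10):

```python
from math import comb

def fisher_irwin_pvalue(x1, n1, t, n2):
    """One-sided P{X1 <= x1 | X1 + X2 = t} under p1 = p2 (hypergeometric).

    Assumes t - j <= n2 for all j in 0..x1; in general the summation
    range must be restricted to feasible values of j.
    """
    denom = comb(n1 + n2, t)
    return sum(comb(n1, j) * comb(n2, t - j) for j in range(0, x1 + 1)) / denom

p = fisher_irwin_pvalue(x1=1, n1=5, t=5, n2=5)
print(p)   # 26/252, about 0.103
```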
In applications a wide variety of problems can be reduced to the multinomial distribution model. We therefore consider the problem of testing the parameters of a multinomial distribution. Let (X1, X2, …, Xk−1) be a sample from a multinomial distribution with parameters n, p1, p2, …, pk−1, and let us write Xk = n − (X1 + ⋯ + Xk−1) and pk = 1 − (p1 + ⋯ + pk−1). The difference between the model of Theorem 1 and the multinomial model is that in the former the Xi's are independent, whereas here they are not.
To use Theorem 2 to test H0: pi = pi0, i = 1, 2, …, k, we need only compute the quantity
from the sample; if n is large, we reject H0 if the computed value exceeds χ²(k − 1, α).
Theorem 2 has much wider applicability, and we will later study its application to contingency tables. Here we consider the application of Theorem 2 to testing the null hypothesis that the DF of an RV X has a specified form.
The proof of Theorem 3 is obvious. One frequently selects A1, A2,…,Ak as disjoint intervals. Theorem 3 is especially useful when one or more of the parameters associated with the DF F are unknown. In that case the following result is useful.
Remark 2. Any application of Theorem 3 or 4 requires that we choose sets A1, A2, …, Ak, and frequently these are chosen to be disjoint intervals. As a rule of thumb, we choose the length of each interval in such a way that the probability under H0 is approximately 1/k. Moreover, it is desirable to have each expected frequency ei ≥ 5. If any of the ei's is < 5, the corresponding interval is pooled with one or more adjoining intervals to make the cell frequency at least 5. The number of degrees of freedom, if any pooling is done, is the number of classes after pooling, minus 1, minus the number of parameters estimated.
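A sketch of the goodness-of-fit computation, with hypothetical observed counts and equal null cell probabilities (after any pooling, the statistic is computed in the same way):

```python
def gof_chi2(observed, probs):
    # sum of (o_i - e_i)^2 / e_i with e_i = n * p_i; limiting chi-square,
    # d.f. = (number of cells) - 1 - (number of parameters estimated)
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

stat = gof_chi2([18, 22, 21, 19, 25, 15], [1 / 6] * 6)
print(stat)   # each e_i = 20, so the statistic is 60/20 = 3.0
```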
Finally, we consider a test of homogeneity of several multinomial distributions. Suppose we have c samples of sizes n1, n2, …, nc from c multinomial distributions. Let the probabilities associated with the jth population be (p1j, p2j, …, prj), where Σᵢ pij = 1, j = 1, 2, …, c. Given observations (x1j, x2j, …, xrj), with Σᵢ xij = nj, we wish to test H0: pi1 = pi2 = ⋯ = pic = pi, say, for i = 1, 2, …, r. The case c = 1 is covered by Theorem 2. By Theorem 2, for each j
has a limiting χ²(r − 1) distribution. Since the samples are independent, the statistic
has a limiting chi-square distribution with c(r − 1) d.f. If the pi's are unknown, we use the MLEs
for pi and we see that the statistic
has a limiting chi-square distribution with c(r − 1) − (r − 1) = (r − 1)(c − 1) d.f., since r − 1 independent parameters are estimated. We reject H0 at (approximate) level α if the computed value exceeds χ²((r − 1)(c − 1), α).
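A sketch of the homogeneity statistic with pooled estimates p̂i = (row total)/(grand total); the 2 × 2 table below is hypothetical, with (r − 1)(c − 1) = 1 d.f.:

```python
def homogeneity_chi2(table):
    # table[i][j] = count of category i in sample j
    r = len(table)
    c = len(table[0])
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    row_totals = [sum(table[i]) for i in range(r)]
    grand = sum(col_totals)
    stat = 0.0
    for i in range(r):
        p_hat = row_totals[i] / grand      # pooled MLE of p_i
        for j in range(c):
            e = col_totals[j] * p_hat      # expected count n_j * p_hat_i
            stat += (table[i][j] - e) ** 2 / e
    return stat

stat = homogeneity_chi2([[30, 50], [70, 50]])
print(stat)   # 25/3, about 8.33, referred to chi-square with 1 d.f.
```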
Under the null hypothesis that the proportions of viewers who prefer the four types of programs are the same in each city, the maximum likelihood estimates of the pi are given by
Here p1 = proportion of people who prefer mystery, and so on. The following table gives the expected frequencies under H0.
Expected Number of Responses Under H0 | ||||
Program Type | Toledo | Columbus | Cleveland | Cincinnati |
Mystery | 150×0.33 = 49.5 | 200×0.33 = 66 | 250×0.33 = 82.5 | 200×0.33 = 66 |
Soap | 150×0.24 = 36 | 200×0.24 = 48 | 250×0.24 = 60 | 200×0.24 = 48 |
Comedy | 150×0.28 = 42 | 200×0.28 = 56 | 250×0.28 = 70 | 200×0.28 = 56 |
News | 150×0.15 = 22.5 | 200×0.15 = 30 | 250×0.15 = 37.5 | 200×0.15 = 30 |
Sample Size | 150 | 200 | 250 | 200
It follows that
Since r = 4 and c = 4, the number of degrees of freedom is (r − 1)(c − 1) = 9, and we note that under H0
With such a large P-value we can hardly reject H0. The data do not offer any evidence to conclude that the proportions in the four cities are different.
Type of Meter | 1 | 2 | 3 | 4 |
Number of Breakdowns Reported | 30 | 40 | 33 | 47 |
Is there evidence to conclude that the chances of failure of the four types are not equal (Natrella [75, p. 9-4])?
Category | A | B | C | D |
Proportion | 0.87 | 0.09 | 0.03 | 0.01 |
A new lot of 1336 thermometers is submitted by the manufacturer for inspection and test, and the following distribution into the four categories results:
Category | A | B | C | D |
Number of Thermometers Reported | 1188 | 91 | 47 | 10 |
Does this new lot of thermometers differ from the previous experience with regard to proportion of thermometers in each category (Natrella [75, p. 9-2])?
X-value | 0–1.99 | 2–3.99 | 4–5.99 | 6–7.99 | 8–9.99 |
Frequency | 38 | 55 | 54 | 41 | 62 |
Do these data offer any evidence that the program is not written properly?
Number of trials | 1 | 2 | 3 | 4 | 5 or more |
Frequency | 40 | 32 | 15 | 7 | 6 |
Can we conclude that the coin is fair?
x | 0 | 1 | 2 | 3 | 4 |
Frequency | 8 | 46 | 55 | 40 | 11
Face Value | Die 1 | Die 2 | Die 3 |
1 | 50 | 62 | 38 |
2 | 48 | 55 | 60 |
3 | 69 | 61 | 64 |
4 | 45 | 54 | 58 |
5 | 71 | 78 | 73 |
6 | 77 | 50 | 67 |
Sample Size | 360 | 360 | 360 |
Are all the dice equally loaded? That is, test the hypothesis H0: pi1 = pi2 = pi3, i = 1, 2, …, 6, where pi1 is the probability of getting an i with die 1, and so on.
Party Affiliation | |||
Preference | Democrat | Republican | Independent |
Albert | 160 | 70 | 90 |
Basu | 32 | 45 | 25 |
Chatfield | 30 | 23 | 15 |
Undecided | 28 | 12 | 20 |
Sample Size | 250 | 150 | 150 |
Are the proportions of voters in favor of Albert, Basu, and Chatfield the same within each political affiliation?
In this section we investigate one of the most frequently used types of tests in statistics, the tests based on a t-statistic. Let X1, X2, …, Xn be a random sample from N(μ, σ²), and, as usual, let us write
The tests for usual null hypotheses about the mean can be derived using the GLR method. In the following table we summarize the results.
Reject H0 at level α if | | | |
H0 | H1 | σ² Known | σ² Unknown
I. μ ≤ μ0 | μ > μ0 | √n(X̄ − μ0)/σ > z(α) | √n(X̄ − μ0)/S > t(n − 1, α)
II. μ ≥ μ0 | μ < μ0 | √n(X̄ − μ0)/σ < −z(α) | √n(X̄ − μ0)/S < −t(n − 1, α)
III. μ = μ0 | μ ≠ μ0 | √n|X̄ − μ0|/σ > z(α/2) | √n|X̄ − μ0|/S > t(n − 1, α/2)
Remark 1. A test based on a t-statistic is called a t-test. The t-tests in I and II are called one-tailed tests; the t-test in III, a two-tailed test.
Remark 2. If σ2 is known, tests I and II are UMP and test III is UMP unbiased. If σ2 is unknown, the t-tests are UMP unbiased and UMP invariant.
Remark 3. If n is large we may use normal tables instead of t-tables. The assumption of normality may also be dropped because of the central limit theorem. For small samples care is required in applying the proper test, since the tail probabilities under normal distribution and t-distribution differ significantly for small n (see Remark 6.4.2).
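A sketch of the one-sample t statistic √n(X̄ − μ0)/S; the data are hypothetical, and the cutoff t(n − 1, α) comes from tables:

```python
import math

def one_sample_t(xs, mu0):
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return math.sqrt(n) * (xbar - mu0) / s

t = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(t)   # sqrt(5) * (3 - 2) / sqrt(2.5) = sqrt(2)
```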
We next consider the two-sample case. Let X1, X2, …, Xm and Y1, Y2, …, Yn be independent random samples from N(μ1, σ1²) and N(μ2, σ2²), respectively. Let us write
and
The statistic Sp² = [(m − 1)S1² + (n − 1)S2²]/(m + n − 2) is sometimes called the pooled sample variance. The following table summarizes the two-sample tests comparing μ1 and μ2:
H0 | H1 | Reject H0 at level α if | |
(δ = Known Constant) | | σ1², σ2² Known | σ1², σ2² Unknown, σ1² = σ2²
I. μ1 − μ2 ≤ δ | μ1 − μ2 > δ | (X̄ − Ȳ − δ)/√(σ1²/m + σ2²/n) > z(α) | (X̄ − Ȳ − δ)/[Sp√(1/m + 1/n)] > t(m + n − 2, α)
II. μ1 − μ2 ≥ δ | μ1 − μ2 < δ | (X̄ − Ȳ − δ)/√(σ1²/m + σ2²/n) < −z(α) | (X̄ − Ȳ − δ)/[Sp√(1/m + 1/n)] < −t(m + n − 2, α)
III. μ1 − μ2 = δ | μ1 − μ2 ≠ δ | |X̄ − Ȳ − δ|/√(σ1²/m + σ2²/n) > z(α/2) | |X̄ − Ȳ − δ|/[Sp√(1/m + 1/n)] > t(m + n − 2, α/2)
Remark 4. The case of most interest is that in which δ = 0. If μ1, μ2 are unknown and σ1² = σ2² = σ² with σ² unknown, then Sp² is an unbiased estimate of σ². In this case all the two-sample t-tests are UMP unbiased and UMP invariant. Before applying the t-test, one should first make sure that σ1² = σ2², which means applying another test to the data. We will consider this test in the next section.
Remark 5. If m + n − 2 is large, we may use normal tables; if both m and n are large, we can drop the assumption of normality, using the CLT.
Remark 6. The problem of equality of means in sampling from several populations will be considered in Chapter 12.
Remark 7. The two sample problem when , both unknown, is commonly referred to as Behrens-Fisher problem. The Welch approximate t-test of is based on a random number of d.f. f given by
where
and the t-statistic
with f d.f. This approximation has been found to be quite good even for small samples. The formula for f generally leads to a noninteger number of d.f.; linear interpolation in the t-table can be used to obtain the required percentiles.
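The approximate d.f. can be written in the equivalent Welch–Satterthwaite form f = (s1²/m + s2²/n)² / [(s1²/m)²/(m − 1) + (s2²/n)²/(n − 1)]; a sketch with hypothetical values:

```python
def welch_df(s1_sq, m, s2_sq, n):
    # Welch-Satterthwaite approximate degrees of freedom
    a = s1_sq / m
    b = s2_sq / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))

f = welch_df(4.0, 10, 4.0, 10)
print(f)   # equal variances and sample sizes give f = m + n - 2 = 18
```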
Quite frequently one samples from a bivariate normal population with means μ1, μ2, variances σ1², σ2², and correlation coefficient ρ, the hypothesis of interest being one concerning μ1 − μ2. Let (X1, Y1), (X2, Y2), …, (Xn, Yn) be a sample from a bivariate normal distribution with parameters μ1, μ2, σ1², σ2², and ρ. Then Dj = Xj − Yj is N(μ1 − μ2, σ²), where σ² = σ1² + σ2² − 2ρσ1σ2. We can therefore treat D1, D2, …, Dn as a sample from a normal population. Let us write
The following table summarizes the resulting tests:
H0 | H1 | Reject H0 at level α if
(d0 = Known Constant) | |
I. μ1 − μ2 ≤ d0 | μ1 − μ2 > d0 | √n(D̄ − d0)/S_D > t(n − 1, α)
II. μ1 − μ2 ≥ d0 | μ1 − μ2 < d0 | √n(D̄ − d0)/S_D < −t(n − 1, α)
III. μ1 − μ2 = d0 | μ1 − μ2 ≠ d0 | √n|D̄ − d0|/S_D > t(n − 1, α/2)
Remark 8. The case of most importance is that in which d0 = 0. All the t-tests based on the Dj's are UMP unbiased and UMP invariant. If σ is known, one can base the test on a standardized normal RV, but in practice such an assumption is quite unrealistic. If n is large, one can replace t-values by the corresponding critical values under the normal distribution.
Remark 9. Clearly, it is not necessary to assume that (X1, Y1),…,(Xn, Yn) is a sample from a bivariate normal population. It suffices to assume that the differences Di form a sample from a normal population.
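A sketch of the paired-t computation on hypothetical pairs, using the differences Dj = Xj − Yj:

```python
import math

def paired_t(xs, ys, d0=0.0):
    # t = sqrt(n) * (dbar - d0) / s_D on the differences d_j = x_j - y_j
    ds = [x - y for x, y in zip(xs, ys)]
    n = len(ds)
    dbar = sum(ds) / n
    s_d = math.sqrt(sum((d - dbar) ** 2 for d in ds) / (n - 1))
    return math.sqrt(n) * (dbar - d0) / s_d

t = paired_t([5.0, 6.0, 7.0, 6.0, 6.0], [4.0, 4.0, 4.0, 4.0, 4.0])
print(t)   # differences 1,2,3,2,2: t = sqrt(5)*2/sqrt(0.5) = 2*sqrt(10)
```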
Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Hours Gained | 0.7 | −1.1 | 3.4 | 0.8 | 2.0 | 0.1 | −0.2 | 3.0 |
Assuming that these patients form a random sample from a population of such patients and that the number of additional hours gained from the drug is a normal random variable, test the hypothesis that the drug has no effect at level α.
Perform a test at level 0.05 to see whether the two methods differ with regard to their average performance (Natrella [75, p. 3-23]).
Perform an appropriate test of the hypothesis that the two averages are the same against a one-sided alternative that the average of Method A exceeds that of Method B. Use . (Natrella [75, p. 3-38]).
Year | ||||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | ||
Fraternity | 2.4 | 2.0 | 2.3 | 2.1 | 2.1 | 2.0 | 2.0
Nonfraternity | 2.4 | 2.2 | 2.5 | 2.4 | 2.3 | 1.8 | 1.9 |
Assuming that the populations were normal, test at the 0.025 level of significance whether membership in a fraternity is detrimental to grades.
The term F-test refers to tests based on an F-statistic. Let X1, X2, …, Xm and Y1, Y2, …, Yn be independent samples from N(μ1, σ1²) and N(μ2, σ2²), respectively. We recall that (m − 1)S1²/σ1² and (n − 1)S2²/σ2² are independent chi-square RVs, so that the RV

F = (S1²/σ1²)/(S2²/σ2²)

is distributed as F(m − 1, n − 1).
The following table summarizes the F-tests:
Reject H0 at level α if | | | |
H0 | H1 | μ1, μ2 Known | μ1, μ2 Unknown
I. σ1 ≤ σ2 | σ1 > σ2 | [Σ(Xi − μ1)²/m]/[Σ(Yj − μ2)²/n] > F(m, n, α) | S1²/S2² > F(m − 1, n − 1, α)
II. σ1 ≥ σ2 | σ1 < σ2 | [Σ(Xi − μ1)²/m]/[Σ(Yj − μ2)²/n] < F(m, n, 1 − α) | S1²/S2² < F(m − 1, n − 1, 1 − α)
III. σ1 = σ2 | σ1 ≠ σ2 | [Σ(Xi − μ1)²/m]/[Σ(Yj − μ2)²/n] > F(m, n, α/2) or < F(m, n, 1 − α/2) | S1²/S2² > F(m − 1, n − 1, α/2) or < F(m − 1, n − 1, 1 − α/2)
Remark 1. Recall (Remark 6.4.5) that F(m, n, 1 − α) = 1/F(n, m, α), so that only upper-tail percentiles need be tabulated.
Remark 2. The tests described above can be easily obtained from the likelihood ratio procedure. Moreover, in the important case where μ1, μ2 are unknown, tests I and II are UMP unbiased and UMP invariant. For test III we have chosen equal tails, as is customarily done for convenience even though the unbiasedness property of the test is thereby destroyed.
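A sketch of the μ-unknown statistic S1²/S2² on hypothetical data; for the lower tail one uses the reciprocal identity of Remark 1 with a single F table:

```python
def f_stat(xs, ys):
    # ratio of sample variances S1^2 / S2^2, distributed F(m-1, n-1)
    # when sigma1^2 = sigma2^2
    def sample_var(zs):
        zbar = sum(zs) / len(zs)
        return sum((z - zbar) ** 2 for z in zs) / (len(zs) - 1)
    return sample_var(xs) / sample_var(ys)

f = f_stat([1.0, 3.0, 5.0, 7.0], [2.0, 4.0, 6.0])
print(f)   # (20/3) / 4 = 5/3
```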
An important application of the F-test involves the case where one is testing the equality of the means of two normal populations under the assumption that the variances are the same, that is, testing whether the two samples come from the same population. Let X1, X2, …, Xm and Y1, Y2, …, Yn be independent samples from N(μ1, σ1²) and N(μ2, σ2²), respectively. If σ1² = σ2² = σ² but σ² is unknown, the t-test rejects H0: μ1 = μ2 if |T| > c, where c is selected so that the test has level α2 and T is the two-sample t-statistic based on the pooled sample variance, s1², s2² being the sample variances. If first an F-test is performed to test σ1² = σ2², and then a t-test to test μ1 = μ2, at levels α1 and α2, respectively, the probability of accepting both hypotheses when they are true is
and if F is independent of T, this probability is (1 − α1)(1 − α2). It follows that the combined test has significance level α = 1 − (1 − α1)(1 − α2) = α1 + α2 − α1α2, so that α ≤ α1 + α2. In fact, α will be close to α1 + α2, since for small α1 and α2 the product α1α2 will be close to 0.
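The arithmetic of the combined level is worth checking: with α1 = α2 = 0.05 one gets 0.05 + 0.05 − 0.0025 = 0.0975, just under α1 + α2 = 0.10. A one-line sketch:

```python
def combined_level(a1, a2):
    # 1 - (1 - a1)(1 - a2) = a1 + a2 - a1*a2
    return a1 + a2 - a1 * a2

alpha = combined_level(0.05, 0.05)
print(alpha)   # 0.0975
```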
We show that F is independent of T whenever σ1² = σ2². The statistic V is a complete sufficient statistic for the parameter (μ1, μ2, σ²) (see Theorem 8.3.2). Since the distribution of F does not depend on μ1, μ2, and σ², it follows (Problem 5) that F is independent of V whenever σ1² = σ2². But T is a function of V alone, so that F must be independent of T also.
In Example 1, the combined test has a significance level of
Let X1, X2, …, Xn be a sample from a probability distribution with PDF (PMF) fθ, θ ∈ Θ. In Section 8.8 we described the general decision problem: once the statistician observes x, she has a set of options available, and the problem is to find a decision function δ that minimizes the risk in some sense. Thus a minimax solution requires the minimization of the maximum risk over θ ∈ Θ of R(θ, δ), while a Bayes solution requires the minimization of the average risk R(π, δ), where π is the a priori distribution on Θ. In Remark 9.2.1 we considered the problem of hypothesis testing as a special case of the general decision problem. The action set contains two points, a0 and a1; a0 corresponds to the acceptance of H0, and a1 corresponds to the rejection of H0. Suppose that the loss function is defined by
Then
A minimax solution to the problem of testing H0: θ ∈ Θ0 against H1: θ ∈ Θ1, where Θ0 and Θ1 are disjoint, is to find a rule δ that minimizes
We will consider here only the special case of testing H0: θ = θ0 against H1: θ = θ1. In that case we want to find a rule δ that minimizes
We will show that the solution is to reject H0 if
provided that the constant k is chosen so that
where δ is the rule defined in (5); that is, the minimax rule δ is obtained if we choose k in (5) so that
or, equivalently, we choose k so that
Let δ* be any other rule. If , then and δ* cannot be minimax. Thus, , which means that
By the Neyman-Pearson lemma, rule δ is the most powerful of its size, so that its power must be at least that of δ*, that is,
so that
It follows that
and hence that
This means that
and thus
Note that in the discrete case one may need some randomization procedure in order to achieve equality in (8).
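As an illustrative sketch (the distributions and equal losses below are assumptions, not taken from the text): for N(θ, 1) with θ0 = 0 and θ1 = 1, the equal-risk condition reduces to 1 − Φ(c) = Φ(c − 1) for the cutoff c of the rejection region x > c, and by symmetry c = 1/2. The condition can be solved numerically by bisection:

```python
import math

def Phi(z):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def minimax_cutoff(theta0=0.0, theta1=1.0, lo=-10.0, hi=10.0):
    # solve P_theta0(X > c) = P_theta1(X <= c) by bisection;
    # g is decreasing in c, positive at lo and negative at hi
    def g(c):
        return (1.0 - Phi(c - theta0)) - Phi(c - theta1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = minimax_cutoff()
print(c)   # symmetry gives c = 0.5
```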
We next consider the problem of testing H0: θ ∈ Θ0 against H1: θ ∈ Θ1 from a Bayesian point of view. Let π(θ) be the a priori probability distribution on Θ.
Then
The Bayes solution is a decision rule that minimizes R(π, δ). In what follows we restrict our attention to the case where both H0 and H1 contain exactly one point each, that is, Θ0 = {θ0} and Θ1 = {θ1}. Let π0 = π(θ0) and π1 = π(θ1) = 1 − π0. Then
where , .
The a posteriori distribution of θ is given by
Thus
It follows that we reject H0 (that is, take action a1) if
which is the case if and only if
as asserted.
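A sketch of the Bayes test for two simple hypotheses under 0–1 loss (the loss, the normal densities, and the prior below are illustrative assumptions): compute the posterior probability of θ0 and reject H0 when it falls below 1/2, equivalently when π1 f1(x) > π0 f0(x):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi))

def bayes_test(x, theta0, theta1, pi0):
    # posterior probability of theta0; with 0-1 loss, reject H0
    # when theta0 is the less probable value a posteriori
    num = pi0 * normal_pdf(x, theta0)
    post0 = num / (num + (1 - pi0) * normal_pdf(x, theta1))
    return post0, post0 < 0.5

post0, reject = bayes_test(x=2.0, theta0=0.0, theta1=1.0, pi0=0.5)
print(post0, reject)   # post0 = 1/(1 + e^1.5), so H0 is rejected
```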
Remark 1. In the Neyman-Pearson lemma we fixed α, the probability of rejecting H0 when it is true, and minimized β, the probability of accepting H0 when it is false. Here we no longer have a fixed level α for the probability of rejecting H0 when it is true. Instead we allow it to assume any value as long as R(π, δ), defined in (12), is minimized.
Remark 2. It is easy to generalize Theorem 1 to the case of multiple decisions. Let X be an RV with PDF (PMF) fθ, where θ can take any of the k values θ1, θ2, …, θk. The problem is to observe x and decide which of the θi is the correct value of θ. Let us write πi = π(θi), i = 1, 2, …, k, with Σπi = 1, for the prior probability distribution on {θ1, θ2, …, θk}. Let
The problem is to find a rule δ that minimizes R(π, δ). We leave it to the reader to show that a Bayes solution is to decide in favor of θi if
where any point lying in more than one such region is assigned to any one of them.
to test against . If the a priori distribution on θ is , , and , find the Bayes solution. Find the power of the test at and .