In this chapter we study some commonly used procedures in the theory of testing of hypotheses. In Section 10.2 we describe the classical procedure for constructing tests based on likelihood ratios. This method is sufficiently general to apply to multiparameter problems and is especially useful in the presence of nuisance parameters, that is, unknown parameters in the model that are of no inferential interest. Most of the normal theory tests described in Sections 10.3 to 10.5, and those in Chapter 12, can be derived by the methods of Section 10.2. In Sections 10.3 to 10.5 we list some commonly used normal-theory tests; in Section 10.3 we also deal with goodness-of-fit tests. In Section 10.6 we look at the hypothesis testing problem from a decision-theoretic viewpoint and describe Bayes and minimax tests.
In Chapter 9 we saw that UMP tests do not exist for some problems of hypothesis testing. It was suggested that we restrict attention to smaller classes of tests and seek UMP tests in these subclasses or, alternatively, seek tests which are optimal against local alternatives. Unfortunately, some of the reductions suggested in Chapter 9, such as invariance, do not apply to all families of distributions.
In this section we consider a classical procedure for constructing tests that has some intuitive appeal and that frequently, though not necessarily, leads to optimal tests. Also, the procedure leads to tests that have some desirable large-sample properties.
Recall that for testing the simple hypotheses H0: θ = θ0 against H1: θ = θ1, the Neyman-Pearson MP test is based on the ratio f1(x)/f0(x). If we interpret the numerator as the best possible explanation of x under H1 and the denominator as the best possible explanation of x under H0, then it is reasonable to consider the ratio

r(x) = sup{L(θ; x): θ ∈ Θ1} / sup{L(θ; x): θ ∈ Θ0}

as a test statistic for testing H0: θ ∈ Θ0 against H1: θ ∈ Θ1. Here L(θ; x) is the likelihood function of x. Note that for each x for which the MLEs of θ under Θ1 and Θ0 exist, the ratio is well defined and free of θ and can be used as a test statistic. Clearly, we should reject H0 if r(x) is large.
The statistic r can be inconvenient to work with, since one or both of the suprema in the ratio may fail to be attained.
Let θ = (θ1, θ2, …, θk) be a vector of parameters taking values in Θ, and let X be a random vector with PDF (PMF) fθ. Consider the problem of testing the null hypothesis H0: θ ∈ Θ0 against the alternative H1: θ ∈ Θ1 = Θ − Θ0, using the generalized likelihood ratio (GLR)

λ(x) = sup{L(θ; x): θ ∈ Θ0} / sup{L(θ; x): θ ∈ Θ}.
We leave it to the reader to show that the statistics λ(X) and r(X) lead to the same criterion for rejecting H0.
The numerator of the likelihood ratio λ is the best explanation of X (in the sense of maximum likelihood) that the null hypothesis H0 can provide, and the denominator is the best possible explanation of X. H0 is rejected if there is a much better explanation of X than the best one provided by H0.
It is clear that 0 ≤ λ(x) ≤ 1 and that H0 is rejected for small values of λ(x), say λ(x) < c. The constant c is determined from the size restriction
If the distribution of λ is continuous (that is, the DF is absolutely continuous), any size α is attainable. If, however, λ(X) is a discrete RV, it may not be possible to find a likelihood ratio test whose size exactly equals α. This problem arises because of the nonrandomized nature of the likelihood ratio test and can be handled by randomization. The following result holds.
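As a concrete sketch of the GLR computation, consider testing H0: μ = μ0 against H1: μ ≠ μ0 for a sample from N(μ, σ²) with σ² known. The supremum in the denominator is attained at the MLE μ̂ = X̄, and the ratio reduces to λ = exp{−n(X̄ − μ0)²/(2σ²)}. The data below are hypothetical:

```python
import math

def glr_normal_known_sigma(xs, mu0, sigma):
    """GLR for H0: mu = mu0 vs H1: mu != mu0, sigma known.

    The denominator sup over mu of L(mu; x) is attained at the MLE
    mu-hat = xbar, so lambda = exp(-n * (xbar - mu0)^2 / (2 * sigma^2)).
    """
    n = len(xs)
    xbar = sum(xs) / n
    return math.exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))

lam = glr_normal_known_sigma([1.0, 2.0, 1.0, 2.0], mu0=1.0, sigma=1.0)
print(lam)                 # small lambda is evidence against H0
print(-2 * math.log(lam))  # here n*(xbar - mu0)^2/sigma^2 = 4*(0.5)^2 = 1
```

Small values of λ correspond to large values of the equivalent statistic −2 log λ, whose large-sample behavior is discussed below.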
The GLR test is of the type obtained in Section 9.4 for families with an MLR, except on the boundary of the null hypothesis. In other words, if the size of the test happens to be exactly α, the likelihood ratio test is a UMP level α test. Since X is a discrete RV, however, it may not be possible to attain size exactly α. We have
If such a c′ does not exist, we choose an integer c′ such that
The situation in Example 1 is not unique. For the one-parameter exponential family it can be shown (Birkes [7]) that a GLR test of H0: θ ≤ θ0 against H1: θ > θ0 is UMP of its size. The result also holds for the dual problem and, in fact, for a much wider class of one-parameter families of distributions.
The GLR test is especially useful when θ is multidimensional and we wish to test a hypothesis concerning only some of its components. The remaining components act as nuisance parameters.
The computations in Example 2 could be slightly simplified by using Theorem 2. Indeed, (X̄, S²) is a minimal sufficient statistic for θ = (μ, σ²), and since X̄ and S² are independent, the likelihood is the product of the PDFs of X̄ and S². We note that X̄ is N(μ, σ²/n) and that (n − 1)S²/σ² is χ²(n − 1). We leave it to the reader to carry out the details.
In Example 3 we can obtain the same GLR test by focusing attention on the joint sufficient statistic (X̄, Ȳ, S1², S2²), where S1² and S2² are the sample variances of the X's and the Y's, respectively. In order to write down the likelihood function, we note that X̄, Ȳ, S1², and S2² are independent RVs. The distributions of X̄ and S1² are the same as in Example 2, except that m is the sample size; the distributions of Ȳ and S2² require the appropriate modifications. We leave it to the reader to carry out the details. It turns out that the GLR test coincides with the UMP unbiased test in this case.
In certain situations the GLR test does not perform well. We reproduce here an example due to Stein and Rubin.
We will use the generalized likelihood ratio procedure quite frequently hereafter because of its simplicity and wide applicability. The exact distribution of the test statistic under H0 is generally difficult to obtain (despite what we saw in Examples 1 to 3 above), and evaluation of the power function is also not possible in many problems. Recall, however, that under certain conditions the asymptotic distribution of the MLE is normal. This result can be used to prove the following large-sample property of the GLR under H0, which solves the problem of computing the cutoff point c, at least when the sample size is large.
We will not prove this result here; the reader is referred to Wilks [118, p. 419]. The regularity conditions are essentially the ones associated with Theorem 8.7.4. In Example 2 the number of parameters unspecified under H0 is one (namely, σ2), and under H1 two parameters are unspecified (μ and σ2), so that the asymptotic chi-square distribution will have 1 d.f. Similarly, in Example 3, the d.f. = 4 − 3 = 1.
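When the limiting chi-square distribution has one degree of freedom, as in Examples 2 and 3, the tail probability can even be computed from the normal distribution, since P{χ²(1) > y} = P{|Z| > √y} = erfc(√(y/2)). A small sketch (the cutoff 3.841 is the familiar χ² table value for 1 d.f. at α = 0.05):

```python
import math

def chi2_sf_1df(y):
    # P(chi-square with 1 d.f. exceeds y) = P(|Z| > sqrt(y)) = erfc(sqrt(y/2))
    return math.erfc(math.sqrt(y / 2.0))

# -2 log lambda equal to the 5% critical value should give p close to 0.05
p = chi2_sf_1df(3.841)
print(p)
```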
find the GLR test of against .
.
In this section we consider a variety of tests where the test statistic has an exact or a limiting chi-square distribution. Chi-square tests are also used for testing some nonparametric hypotheses and will be taken up again in Chapter 13.
We begin with tests concerning variances in sampling from a normal population. Let X1, X2, …, Xn be iid N(μ, σ²) RVs, where σ² is unknown. We wish to test hypotheses of the type H0: σ = σ0 (or σ ≤ σ0, or σ ≥ σ0), where σ0 is some given positive number. We summarize the tests in the following table.
Reject H0 at level α if | | | |
H0 | H1 | μ Known | μ Unknown
I. σ ≤ σ0 | σ > σ0 | Σ(Xi − μ)²/σ0² > χ²(n, α) | (n − 1)S²/σ0² > χ²(n − 1, α)
II. σ ≥ σ0 | σ < σ0 | Σ(Xi − μ)²/σ0² < χ²(n, 1 − α) | (n − 1)S²/σ0² < χ²(n − 1, 1 − α)
III. σ = σ0 | σ ≠ σ0 | Σ(Xi − μ)²/σ0² > χ²(n, α/2) or < χ²(n, 1 − α/2) | (n − 1)S²/σ0² > χ²(n − 1, α/2) or < χ²(n − 1, 1 − α/2)
Remark 1. All these tests can be derived by the standard likelihood ratio procedure. If μ is unknown, tests I and II are UMP unbiased (and UMP invariant). If μ is known, tests I and II are UMP (see Example 9.4.5). For test III we have chosen the constants c1, c2 so that each tail has probability α/2. This is the customary procedure, even though it destroys the unbiasedness property of the test, at least for small samples.
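A sketch of the μ-unknown statistic (n − 1)S²/σ0² follows; the data and σ0 are hypothetical, and the chi-square critical values come from tables:

```python
def chi2_variance_stat(xs, sigma0_sq):
    # (n - 1) * S^2 / sigma0^2, compared with chi-square(n - 1) quantiles
    n = len(xs)
    xbar = sum(xs) / n
    s_sq = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return (n - 1) * s_sq / sigma0_sq

stat = chi2_variance_stat([4.0, 6.0, 4.0, 6.0], sigma0_sq=1.0)
print(stat)   # (n - 1) S^2 = 4, so the statistic is 4.0
```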
A test based on a chi-square statistic is also used for testing the equality of several proportions. Let X1, X2, …, Xk be independent RVs with Xi ~ b(ni, pi), i = 1, 2, …, k.
If n1, n2, …, nk are large, we can use Theorem 1 to test H0: p1 = p2 = ⋯ = pk = p against all alternatives. If p is known, we compute
and reject H0 if the computed value exceeds χ²(k, α). In practice p will be unknown. Let n = Σ ni. Then the likelihood function is
so that
The MLE of p under H0 is therefore given by
that is,
Under certain regularity assumptions (see Cramér [17, pp. 426–427]) it can be shown that the statistic
is asymptotically χ²(k − 1). Thus the test rejects H0: p1 = p2 = ⋯ = pk, p unknown, at level α if the computed value of the statistic exceeds χ²(k − 1, α).
It should be remembered that the tests based on Theorem 1 are all large-sample tests and hence not exact, in contrast to the tests concerning the variance discussed above, which are all exact. In the case k = 1, UMP tests of H0: p ≤ p0 and H0: p ≥ p0 exist and can be obtained by the MLR method described in Section 9.4. For testing H0: p = p0, the usual test is UMP unbiased.
In the case k = 2, if n1 and n2 are large, a test based on the normal distribution can be used instead of Theorem 1. In this case the statistic
where p̂ = (X1 + X2)/(n1 + n2), is asymptotically N(0, 1) under H0: p1 = p2. If p is known, one uses p instead of p̂. It is not too difficult to show that Z² is equal to Y1, so that the two tests are equivalent.
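The equivalence of the two statistics is easy to check numerically. The following sketch computes the pooled-p̂ chi-square statistic for k samples and, for k = 2, the Z statistic; the counts are hypothetical:

```python
import math

def pooled_chi2(xs, ns):
    # Y = sum (x_i - n_i p_hat)^2 / (n_i p_hat (1 - p_hat)), with
    # p_hat = (sum x_i) / (sum n_i); limiting chi-square(k - 1) under H0
    p_hat = sum(xs) / sum(ns)
    return sum((x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
               for x, n in zip(xs, ns))

def two_sample_z(x1, n1, x2, n2):
    p_hat = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(
        p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

y = pooled_chi2([30, 40], [100, 100])
z = two_sample_z(30, 100, 40, 100)
print(y, z * z)   # the two values agree
```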
For small samples the so-called Fisher-Irwin test is commonly used; it is based on the conditional distribution of X1 given X1 + X2. Let T = X1 + X2. Then
where
It follows that
On the boundary p1 = p2 = p of any of the hypotheses p1 ≤ p2, p1 ≥ p2, or p1 = p2, we note that the conditional distribution of X1, given T = t, does not depend on p, so that
which is a hypergeometric distribution. For testing H0: p1 ≥ p2 against H1: p1 < p2, this conditional test rejects H0 if X1 ≤ k(t), where k(t) is the largest integer for which P{X1 ≤ k(t) | T = t} ≤ α. Obvious modifications yield critical regions for testing the other hypotheses against the corresponding alternatives.
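Because the conditional distribution is hypergeometric and free of the common p, the test can be carried out exactly. A sketch with hypothetical counts (x1 = 1 success out of n1 = 5, with t = 5 total successes out of n1 + n2 = 10):

```python
from math import comb

def fisher_irwin_pvalue(x1, n1, t, n2):
    """One-sided P{X1 <= x1 | X1 + X2 = t} under p1 = p2 (hypergeometric).

    Assumes t - j <= n2 for all j in 0..x1; in general the summation
    range must be restricted to feasible values of j.
    """
    denom = comb(n1 + n2, t)
    return sum(comb(n1, j) * comb(n2, t - j) for j in range(0, x1 + 1)) / denom

p = fisher_irwin_pvalue(x1=1, n1=5, t=5, n2=5)
print(p)   # 26/252, about 0.103
```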
In applications a wide variety of problems can be reduced to the multinomial distribution model. We therefore consider the problem of testing the parameters of a multinomial distribution. Let (X1, X2, …, Xk−1) be a sample from a multinomial distribution with parameters n, p1, p2, …, pk−1, and let us write Xk = n − (X1 + ⋯ + Xk−1) and pk = 1 − (p1 + ⋯ + pk−1). The difference between the model of Theorem 1 and the multinomial model is that in the former the Xi's are independent, whereas here they are not.
To use Theorem 2 to test H0: pi = pi0, i = 1, 2, …, k, we need only compute the quantity
from the sample; if n is large, we reject H0 if the computed value exceeds χ²(k − 1, α).
Theorem 2 has much wider applicability, and we will later study its application to contingency tables. Here we consider the application of Theorem 2 to testing the null hypothesis that the DF of an RV X has a specified form.
The proof of Theorem 3 is obvious. One frequently selects A1, A2,…,Ak as disjoint intervals. Theorem 3 is especially useful when one or more of the parameters associated with the DF F are unknown. In that case the following result is useful.
Remark 2. Any application of Theorem 3 or 4 requires that we choose sets A1, A2, …, Ak, and frequently these are chosen to be disjoint intervals. As a rule of thumb, we choose the length of each interval in such a way that the probability under H0 is approximately 1/k. Moreover, it is desirable to have each expected frequency ei ≥ 5. If any of the ei's is < 5, the corresponding interval is pooled with one or more adjoining intervals to make the cell frequency at least 5. The number of degrees of freedom, if any pooling is done, is the number of classes after pooling, minus 1, minus the number of parameters estimated.
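A sketch of the goodness-of-fit computation, with hypothetical observed counts and equal null cell probabilities (after any pooling, the statistic is computed in the same way):

```python
def gof_chi2(observed, probs):
    # sum of (o_i - e_i)^2 / e_i with e_i = n * p_i; limiting chi-square,
    # d.f. = (number of cells) - 1 - (number of parameters estimated)
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

stat = gof_chi2([18, 22, 21, 19, 25, 15], [1 / 6] * 6)
print(stat)   # each e_i = 20, so the statistic is 60/20 = 3.0
```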
Finally, we consider a test of homogeneity of several multinomial distributions. Suppose we have c samples of sizes n1, n2, …, nc from c multinomial distributions. Let the probabilities associated with the jth population be (p1j, p2j, …, prj), where Σᵢ pij = 1, j = 1, 2, …, c. Given observations (x1j, x2j, …, xrj), with Σᵢ xij = nj, we wish to test H0: pi1 = pi2 = ⋯ = pic = pi, say, for i = 1, 2, …, r. The case c = 1 is covered by Theorem 2. By Theorem 2, for each j
has a limiting χ²(r − 1) distribution. Since the samples are independent, the statistic
has a limiting chi-square distribution with c(r − 1) d.f. If the pi's are unknown, we use the MLEs
for pi and we see that the statistic
has a limiting chi-square distribution with c(r − 1) − (r − 1) = (r − 1)(c − 1) d.f., since r − 1 independent parameters are estimated. We reject H0 at (approximate) level α if the computed value exceeds χ²((r − 1)(c − 1), α).
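A sketch of the homogeneity statistic with pooled estimates p̂i = (row total)/(grand total); the 2 × 2 table below is hypothetical, with (r − 1)(c − 1) = 1 d.f.:

```python
def homogeneity_chi2(table):
    # table[i][j] = count of category i in sample j
    r = len(table)
    c = len(table[0])
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    row_totals = [sum(table[i]) for i in range(r)]
    grand = sum(col_totals)
    stat = 0.0
    for i in range(r):
        p_hat = row_totals[i] / grand      # pooled MLE of p_i
        for j in range(c):
            e = col_totals[j] * p_hat      # expected count n_j * p_hat_i
            stat += (table[i][j] - e) ** 2 / e
    return stat

stat = homogeneity_chi2([[30, 50], [70, 50]])
print(stat)   # 25/3, about 8.33, referred to chi-square with 1 d.f.
```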
Under the null hypothesis that the proportions of viewers who prefer the four types of programs are the same in each city, the maximum likelihood estimates of the pi are given by
Here p1 = proportion of people who prefer mystery, and so on. The following table gives the expected frequencies under H0.
Expected Number of Responses Under H0 | ||||
Program Type | Toledo | Columbus | Cleveland | Cincinnati |
Mystery | 150×0.33 = 49.5 | 200×0.33 = 66 | 250×0.33 = 82.5 | 200×0.33 = 66 |
Soap | 150×0.24 = 36 | 200×0.24 = 48 | 250×0.24 = 60 | 200×0.24 = 48 |
Comedy | 150×0.28 = 42 | 200×0.28 = 56 | 250×0.28 = 70 | 200×0.28 = 56 |
News | 150×0.15 = 22.5 | 200×0.15 = 30 | 250×0.15 = 37.5 | 200×0.15 = 30 |
Sample Size | 150 | 200 | 250 | 200
It follows that
Since r = 4 and c = 4, the number of degrees of freedom is (r − 1)(c − 1) = 9, and we note that under H0
With such a large P-value we can hardly reject H0. The data do not offer any evidence to conclude that the proportions in the four cities are different.
Type of Meter | 1 | 2 | 3 | 4 |
Number of Breakdowns Reported | 30 | 40 | 33 | 47 |
Is there evidence to conclude that the chances of failure of the four types are not equal (Natrella [75, p. 9-4])?
Category | A | B | C | D |
Proportion | 0.87 | 0.09 | 0.03 | 0.01 |
A new lot of 1336 thermometers is submitted by the manufacturer for inspection and test, and the following distribution into the four categories results:
Category | A | B | C | D |
Number of Thermometers Reported | 1188 | 91 | 47 | 10 |
Does this new lot of thermometers differ from the previous experience with regard to proportion of thermometers in each category (Natrella [75, p. 9-2])?
X-value | 0–1.99 | 2–3.99 | 4–5.99 | 6–7.99 | 8–9.99 |
Frequency | 38 | 55 | 54 | 41 | 62 |
Do these data offer any evidence that the program is not written properly?
Number of trials | 1 | 2 | 3 | 4 | 5 or more |
Frequency | 40 | 32 | 15 | 7 | 6 |
Can we conclude that the coin is fair?
x | 0 | 1 | 2 | 3 | 4 |
Frequency | 8 | 46 | 55 | 40 | 11
Face Value | Die 1 | Die 2 | Die 3 |
1 | 50 | 62 | 38 |
2 | 48 | 55 | 60 |
3 | 69 | 61 | 64 |
4 | 45 | 54 | 58 |
5 | 71 | 78 | 73 |
6 | 77 | 50 | 67 |
Sample Size | 360 | 360 | 360 |
Are all the dice equally loaded? That is, test the hypothesis H0: pi1 = pi2 = pi3, i = 1, 2, …, 6, where pi1 is the probability of getting an i with die 1, and so on.
Party Affiliation | |||
Preference | Democrat | Republican | Independent |
Albert | 160 | 70 | 90 |
Basu | 32 | 45 | 25 |
Chatfield | 30 | 23 | 15 |
Undecided | 28 | 12 | 20 |
Sample Size | 250 | 150 | 150 |
Are the proportions of voters in favor of Albert, Basu, and Chatfield the same within each political affiliation?
In this section we investigate one of the most frequently used types of tests in statistics, the tests based on a t-statistic. Let X1, X2, …, Xn be a random sample from N(μ, σ²), and, as usual, let us write
The tests for usual null hypotheses about the mean can be derived using the GLR method. In the following table we summarize the results.
Reject H0 at level α if | | | |
H0 | H1 | σ² Known | σ² Unknown
I. μ ≤ μ0 | μ > μ0 | √n(X̄ − μ0)/σ > z(α) | √n(X̄ − μ0)/S > t(n − 1, α)
II. μ ≥ μ0 | μ < μ0 | √n(X̄ − μ0)/σ < −z(α) | √n(X̄ − μ0)/S < −t(n − 1, α)
III. μ = μ0 | μ ≠ μ0 | √n|X̄ − μ0|/σ > z(α/2) | √n|X̄ − μ0|/S > t(n − 1, α/2)
Remark 1. A test based on a t-statistic is called a t-test. The t-tests in I and II are called one-tailed tests; the t-test in III, a two-tailed test.
Remark 2. If σ2 is known, tests I and II are UMP and test III is UMP unbiased. If σ2 is unknown, the t-tests are UMP unbiased and UMP invariant.
Remark 3. If n is large we may use normal tables instead of t-tables. The assumption of normality may also be dropped because of the central limit theorem. For small samples care is required in applying the proper test, since the tail probabilities under normal distribution and t-distribution differ significantly for small n (see Remark 6.4.2).
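A sketch of the one-sample t statistic √n(X̄ − μ0)/S; the data are hypothetical, and the cutoff t(n − 1, α) comes from tables:

```python
import math

def one_sample_t(xs, mu0):
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    return math.sqrt(n) * (xbar - mu0) / s

t = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(t)   # sqrt(5) * (3 - 2) / sqrt(2.5) = sqrt(2)
```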
We next consider the two-sample case. Let X1, X2, …, Xm and Y1, Y2, …, Yn be independent random samples from N(μ1, σ1²) and N(μ2, σ2²), respectively. Let us write
and
The statistic Sp² = [(m − 1)S1² + (n − 1)S2²]/(m + n − 2) is sometimes called the pooled sample variance. The following table summarizes the two-sample tests comparing μ1 and μ2:
H0 | H1 | Reject H0 at level α if | |
(δ = Known Constant) | | σ1², σ2² Known | σ1², σ2² Unknown, σ1² = σ2²
I. μ1 − μ2 ≤ δ | μ1 − μ2 > δ | (X̄ − Ȳ − δ)/√(σ1²/m + σ2²/n) > z(α) | (X̄ − Ȳ − δ)/[Sp√(1/m + 1/n)] > t(m + n − 2, α)
II. μ1 − μ2 ≥ δ | μ1 − μ2 < δ | (X̄ − Ȳ − δ)/√(σ1²/m + σ2²/n) < −z(α) | (X̄ − Ȳ − δ)/[Sp√(1/m + 1/n)] < −t(m + n − 2, α)
III. μ1 − μ2 = δ | μ1 − μ2 ≠ δ | |X̄ − Ȳ − δ|/√(σ1²/m + σ2²/n) > z(α/2) | |X̄ − Ȳ − δ|/[Sp√(1/m + 1/n)] > t(m + n − 2, α/2)
Remark 4. The case of most interest is that in which δ = 0. If μ1, μ2 are unknown and σ1² = σ2² = σ² with σ² unknown, then Sp² is an unbiased estimate of σ². In this case all the two-sample t-tests are UMP unbiased and UMP invariant. Before applying the t-test, one should first make sure that σ1² = σ2², which means applying another test to the data. We will consider this test in the next section.
Remark 5. If m + n − 2 is large, we may use normal tables; if both m and n are large, we can drop the assumption of normality, using the CLT.
Remark 6. The problem of equality of means in sampling from several populations will be considered in Chapter 12.
Remark 7. The two sample problem when , both unknown, is commonly referred to as Behrens-Fisher problem. The Welch approximate t-test of is based on a random number of d.f. f given by
where
and the t-statistic
with f d.f. This approximation has been found to be quite good even for small samples. The formula for f generally leads to a noninteger number of d.f.; linear interpolation in the t-table can be used to obtain the required percentiles.
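The approximate d.f. can be written in the equivalent Welch–Satterthwaite form f = (s1²/m + s2²/n)² / [(s1²/m)²/(m − 1) + (s2²/n)²/(n − 1)]; a sketch with hypothetical values:

```python
def welch_df(s1_sq, m, s2_sq, n):
    # Welch-Satterthwaite approximate degrees of freedom
    a = s1_sq / m
    b = s2_sq / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))

f = welch_df(4.0, 10, 4.0, 10)
print(f)   # equal variances and sample sizes give f = m + n - 2 = 18
```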
Quite frequently one samples from a bivariate normal population with means μ1, μ2, variances σ1², σ2², and correlation coefficient ρ, the hypothesis of interest being one concerning μ1 − μ2. Let (X1, Y1), (X2, Y2), …, (Xn, Yn) be a sample from a bivariate normal distribution with parameters μ1, μ2, σ1², σ2², and ρ. Then Dj = Xj − Yj is N(μ1 − μ2, σ²), where σ² = σ1² + σ2² − 2ρσ1σ2. We can therefore treat D1, D2, …, Dn as a sample from a normal population. Let us write
The following table summarizes the resulting tests:
H0 | H1 | Reject H0 at level α if
(d0 = Known Constant) | |
I. μ1 − μ2 ≤ d0 | μ1 − μ2 > d0 | √n(D̄ − d0)/S_D > t(n − 1, α)
II. μ1 − μ2 ≥ d0 | μ1 − μ2 < d0 | √n(D̄ − d0)/S_D < −t(n − 1, α)
III. μ1 − μ2 = d0 | μ1 − μ2 ≠ d0 | √n|D̄ − d0|/S_D > t(n − 1, α/2)
Remark 8. The case of most importance is that in which d0 = 0. All the t-tests based on the Dj's are UMP unbiased and UMP invariant. If σ is known, one can base the test on a standardized normal RV, but in practice such an assumption is quite unrealistic. If n is large, one can replace t-values by the corresponding critical values under the normal distribution.
Remark 9. Clearly, it is not necessary to assume that (X1, Y1),…,(Xn, Yn) is a sample from a bivariate normal population. It suffices to assume that the differences Di form a sample from a normal population.
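A sketch of the paired-t computation on hypothetical pairs, using the differences Dj = Xj − Yj:

```python
import math

def paired_t(xs, ys, d0=0.0):
    # t = sqrt(n) * (dbar - d0) / s_D on the differences d_j = x_j - y_j
    ds = [x - y for x, y in zip(xs, ys)]
    n = len(ds)
    dbar = sum(ds) / n
    s_d = math.sqrt(sum((d - dbar) ** 2 for d in ds) / (n - 1))
    return math.sqrt(n) * (dbar - d0) / s_d

t = paired_t([5.0, 6.0, 7.0, 6.0, 6.0], [4.0, 4.0, 4.0, 4.0, 4.0])
print(t)   # differences 1,2,3,2,2: t = sqrt(5)*2/sqrt(0.5) = 2*sqrt(10)
```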
Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
Hours Gained | 0.7 | −1.1 | 3.4 | 0.8 | 2.0 | 0.1 | −0.2 | 3.0 |
Assuming that these patients form a random sample from a population of such patients and that the number of additional hours gained from the drug is a normal random variable, test the hypothesis that the drug has no effect at level α.
Perform a test at level 0.05 to see whether the two methods differ with regard to their average performance (Natrella [75, p. 3-23]).
Perform an appropriate test of the hypothesis that the two averages are the same against a one-sided alternative that the average of Method A exceeds that of Method B. Use . (Natrella [75, p. 3-38]).
Year | ||||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | ||
Fraternity | 2.4 | 2.0 | 2.3 | 2.1 | 2.1 | 2.0 | 2.0
Nonfraternity | 2.4 | 2.2 | 2.5 | 2.4 | 2.3 | 1.8 | 1.9 |
Assuming that the populations were normal, test at the 0.025 level of significance whether membership in a fraternity is detrimental to grades.
The term F-test refers to tests based on an F-statistic. Let X1, X2, …, Xm and Y1, Y2, …, Yn be independent samples from N(μ1, σ1²) and N(μ2, σ2²), respectively. We recall that (m − 1)S1²/σ1² and (n − 1)S2²/σ2² are independent chi-square RVs, so that the RV

F = (S1²/σ1²)/(S2²/σ2²)

is distributed as F(m − 1, n − 1).
The following table summarizes the F-tests:
Reject H0 at level α if | | | |
H0 | H1 | μ1, μ2 Known | μ1, μ2 Unknown
I. σ1 ≤ σ2 | σ1 > σ2 | [Σ(Xi − μ1)²/m]/[Σ(Yj − μ2)²/n] > F(m, n, α) | S1²/S2² > F(m − 1, n − 1, α)
II. σ1 ≥ σ2 | σ1 < σ2 | [Σ(Xi − μ1)²/m]/[Σ(Yj − μ2)²/n] < F(m, n, 1 − α) | S1²/S2² < F(m − 1, n − 1, 1 − α)
III. σ1 = σ2 | σ1 ≠ σ2 | [Σ(Xi − μ1)²/m]/[Σ(Yj − μ2)²/n] > F(m, n, α/2) or < F(m, n, 1 − α/2) | S1²/S2² > F(m − 1, n − 1, α/2) or < F(m − 1, n − 1, 1 − α/2)
Remark 1. Recall (Remark 6.4.5) that F(m, n, 1 − α) = 1/F(n, m, α), so that only upper-tail percentiles need be tabulated.
Remark 2. The tests described above can be easily obtained from the likelihood ratio procedure. Moreover, in the important case where μ1, μ2 are unknown, tests I and II are UMP unbiased and UMP invariant. For test III we have chosen equal tails, as is customarily done for convenience even though the unbiasedness property of the test is thereby destroyed.
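A sketch of the μ-unknown statistic S1²/S2² on hypothetical data; for the lower tail one uses the reciprocal identity of Remark 1 with a single F table:

```python
def f_stat(xs, ys):
    # ratio of sample variances S1^2 / S2^2, distributed F(m-1, n-1)
    # when sigma1^2 = sigma2^2
    def sample_var(zs):
        zbar = sum(zs) / len(zs)
        return sum((z - zbar) ** 2 for z in zs) / (len(zs) - 1)
    return sample_var(xs) / sample_var(ys)

f = f_stat([1.0, 3.0, 5.0, 7.0], [2.0, 4.0, 6.0])
print(f)   # (20/3) / 4 = 5/3
```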
An important application of the F-test involves the case where one is testing the equality of the means of two normal populations under the assumption that the variances are the same, that is, testing whether the two samples come from the same population. Let X1, X2, …, Xm and Y1, Y2, …, Yn be independent samples from N(μ1, σ1²) and N(μ2, σ2²), respectively. If σ1² = σ2² = σ² but σ² is unknown, the t-test rejects H0: μ1 = μ2 if |T| > c, where c is selected so that the test has level α2 and T is the two-sample t-statistic based on the pooled sample variance, s1², s2² being the sample variances. If first an F-test is performed to test σ1² = σ2², and then a t-test to test μ1 = μ2, at levels α1 and α2, respectively, the probability of accepting both hypotheses when they are true is
and if F is independent of T, this probability is (1 − α1)(1 − α2). It follows that the combined test has significance level α = 1 − (1 − α1)(1 − α2) = α1 + α2 − α1α2, so that α ≤ α1 + α2. In fact, α will be close to α1 + α2, since for small α1 and α2 the product α1α2 will be close to 0.
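The arithmetic of the combined level is worth checking: with α1 = α2 = 0.05 one gets 0.05 + 0.05 − 0.0025 = 0.0975, just under α1 + α2 = 0.10. A one-line sketch:

```python
def combined_level(a1, a2):
    # 1 - (1 - a1)(1 - a2) = a1 + a2 - a1*a2
    return a1 + a2 - a1 * a2

alpha = combined_level(0.05, 0.05)
print(alpha)   # 0.0975
```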
We show that F is independent of T whenever σ1² = σ2². The statistic V is a complete sufficient statistic for the parameter (μ1, μ2, σ²) (see Theorem 8.3.2). Since the distribution of F does not depend on μ1, μ2, and σ², it follows (Problem 5) that F is independent of V whenever σ1² = σ2². But T is a function of V alone, so that F must be independent of T also.
In Example 1, the combined test has a significance level of
Let X1, X2, …, Xn be a sample from a probability distribution with PDF (PMF) fθ, θ ∈ Θ. In Section 8.8 we described the general decision problem: once the statistician observes x, she has a set of options available, and the problem is to find a decision function δ that minimizes the risk in some sense. Thus a minimax solution requires the minimization of the maximum risk over θ ∈ Θ of R(θ, δ), while a Bayes solution requires the minimization of the average risk R(π, δ), where π is the a priori distribution on Θ. In Remark 9.2.1 we considered the problem of hypothesis testing as a special case of the general decision problem. The action set contains two points, a0 and a1; a0 corresponds to the acceptance of H0, and a1 corresponds to the rejection of H0. Suppose that the loss function is defined by
Then
A minimax solution to the problem of testing H0: θ ∈ Θ0 against H1: θ ∈ Θ1, where Θ0 and Θ1 are disjoint, is to find a rule δ that minimizes
We will consider here only the special case of testing H0: θ = θ0 against H1: θ = θ1. In that case we want to find a rule δ that minimizes
We will show that the solution is to reject H0 if
provided that the constant k is chosen so that
where δ is the rule defined in (5); that is, the minimax rule δ is obtained if we choose k in (5) so that
or, equivalently, we choose k so that
Let δ* be any other rule. If , then and δ* cannot be minimax. Thus, , which means that
By the Neyman-Pearson lemma, rule δ is the most powerful of its size, so that its power must be at least that of δ*, that is,
so that
It follows that
and hence that
This means that
and thus
Note that in the discrete case one may need some randomization procedure in order to achieve equality in (8).
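As an illustrative sketch (the distributions and equal losses below are assumptions, not taken from the text): for N(θ, 1) with θ0 = 0 and θ1 = 1, the equal-risk condition reduces to 1 − Φ(c) = Φ(c − 1) for the cutoff c of the rejection region x > c, and by symmetry c = 1/2. The condition can be solved numerically by bisection:

```python
import math

def Phi(z):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def minimax_cutoff(theta0=0.0, theta1=1.0, lo=-10.0, hi=10.0):
    # solve P_theta0(X > c) = P_theta1(X <= c) by bisection;
    # g is decreasing in c, positive at lo and negative at hi
    def g(c):
        return (1.0 - Phi(c - theta0)) - Phi(c - theta1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c = minimax_cutoff()
print(c)   # symmetry gives c = 0.5
```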
We next consider the problem of testing H0: θ ∈ Θ0 against H1: θ ∈ Θ1 from a Bayesian point of view. Let π(θ) be the a priori probability distribution on Θ.
Then
The Bayes solution is a decision rule that minimizes R(π, δ). In what follows we restrict our attention to the case where both H0 and H1 contain exactly one point each, that is, Θ0 = {θ0} and Θ1 = {θ1}. Let π0 = π(θ0) and π1 = π(θ1) = 1 − π0. Then
where , .
The a posteriori distribution of θ is given by
Thus
It follows that we reject H0 (that is, take action a1) if
which is the case if and only if
as asserted.
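A sketch of the Bayes test for two simple hypotheses under 0–1 loss (the loss, the normal densities, and the prior below are illustrative assumptions): compute the posterior probability of θ0 and reject H0 when it falls below 1/2, equivalently when π1 f1(x) > π0 f0(x):

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi))

def bayes_test(x, theta0, theta1, pi0):
    # posterior probability of theta0; with 0-1 loss, reject H0
    # when theta0 is the less probable value a posteriori
    num = pi0 * normal_pdf(x, theta0)
    post0 = num / (num + (1 - pi0) * normal_pdf(x, theta1))
    return post0, post0 < 0.5

post0, reject = bayes_test(x=2.0, theta0=0.0, theta1=1.0, pi0=0.5)
print(post0, reject)   # post0 = 1/(1 + e^1.5), so H0 is rejected
```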
Remark 1. In the Neyman-Pearson lemma we fixed α, the probability of rejecting H0 when it is true, and minimized β, the probability of accepting H0 when it is false. Here we no longer have a fixed level α for the probability of rejecting H0 when it is true. Instead we allow it to assume any value as long as R(π, δ), defined in (12), is minimized.
Remark 2. It is easy to generalize Theorem 1 to the case of multiple decisions. Let X be an RV with PDF (PMF) fθ, where θ can take any of the k values θ1, θ2, …, θk. The problem is to observe x and decide which of the θi is the correct value of θ. Let us write πi = π(θi), i = 1, 2, …, k, with Σπi = 1, for the prior probability distribution on {θ1, θ2, …, θk}. Let
The problem is to find a rule δ that minimizes R(π, δ). We leave it to the reader to show that a Bayes solution is to decide in favor of θi if
where any point lying in more than one such region is assigned to any one of them.
to test against . If the a priori distribution on θ is , , and , find the Bayes solution. Find the power of the test at and .