Chapter 4

Statistics and Sufficiency

4.1. Samples and empirical distributions

DEFINITION 4.1.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model. We call a measurable map from $(E, \mathcal{B})$ to $(E', \mathcal{B}')$ a statistic with values in $(E', \mathcal{B}')$.

COMMENT 4.1.– It is important to emphasize the fact that a statistic does not depend on P ∈ P. The measurable map images is not a statistic. A decision function, on the other hand, is always a statistic.

In this section, we indicate certain properties of some common statistics. We first give some definitions:

– Let μ be a probability on a measurable space $(E_0, \mathcal{B}_0)$. A sequence $X_1, \dots, X_n$ of n independent random variables with distribution μ is called a sample of size n of the distribution μ. The result of n independent draws following μ is called a realization of this sample.

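– The measure $\mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{(X_i)}$ (where $\delta_{(a)}$ denotes the Dirac measure at the point a) is called the empirical distribution associated with the theoretical distribution μ. The empirical distribution is therefore a (random) probability on $(E_0, \mathcal{B}_0)$ and, when $E_0 = \mathbb{R}$, we define in a natural way:

The empirical distribution function:

$$F_n(x) = \mu_n(]-\infty, x]) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{X_i \le x\}}, \quad x \in \mathbb{R}.$$

The empirical mean:

$$\bar{X}_n = \int x \, d\mu_n(x) = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

The empirical variance:

$$S_n^2 = \int (x - \bar{X}_n)^2 \, d\mu_n(x) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2.$$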

– In a similar way, we define the empirical moments, the empirical median, etc. In $\mathbb{R}^p$, we define the empirical mean and the empirical covariance matrix.

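All of these random quantities are statistics when the statistical model is of the form $(E_0^n, \mathcal{B}_0^{\otimes n}, (\mu^{\otimes n})_{\mu \in \mathcal{P}_0})$, where $\mathcal{P}_0$ is a family of probabilities on $(E_0, \mathcal{B}_0)$.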
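As an illustration of these definitions, the following minimal sketch computes the empirical distribution function, mean and variance of a real-valued realization. The function names and the data are hypothetical, and the variance uses the divisor n (the variance of the empirical distribution), which is an assumed convention here.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, where F_n(x) is the fraction of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right counts the observations that are <= x
        return bisect_right(ordered, x) / n

    return F_n


def empirical_mean(sample):
    return sum(sample) / len(sample)


def empirical_variance(sample):
    # Variance of the empirical distribution: divisor n (assumed convention)
    m = empirical_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


realization = [2.0, 1.0, 4.0, 1.0]  # hypothetical data
F_n = empirical_cdf(realization)
print(F_n(0.5), F_n(1.0), F_n(3.0), F_n(4.0))  # 0.0 0.5 0.75 1.0
print(empirical_mean(realization))              # 2.0
print(empirical_variance(realization))          # 1.5
```

Each function takes only the observations as input and never the unknown distribution, which is what makes these quantities statistics.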

4.1.1. Properties of the empirical distribution and the associated statistics

We have the following properties:

1) For f $\mathcal{B}_0$-measurable and bounded, $\int f \, d\mu_n \to \int f \, d\mu$ almost surely when n → ∞.

2) $nF_n(x)$ follows a binomial distribution with parameters n and $\mu(]-\infty, x])$.

3) Let F be the distribution function of μ. Then $\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to 0$ almost surely (the Glivenko–Cantelli theorem).

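4) If F is continuous,

$$\sqrt{n} \, \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \xrightarrow{d} K$$

where K has the distribution function $P(K \le x) = \sum_{k=-\infty}^{+\infty} (-1)^k e^{-2k^2x^2}$ for x > 0, and $P(K \le x) = 0$ for x ≤ 0.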

5) If images, images.

6) If images:

images

where μ4 denotes the fourth-order central moment of X1.

images

With the exception of property (4), whose proof falls outside the scope of this book, the other properties are straightforward consequences of classical theorems in probability theory.
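Properties (3) and (4) can be observed numerically. The sketch below is only an illustration under stated assumptions: the theoretical distribution is taken to be uniform on [0, 1], so that F(x) = x, and the function name and the seed are arbitrary choices.

```python
import random


def sup_distance_uniform(n, rng):
    """sup_x |F_n(x) - F(x)| for n draws from the uniform law on [0, 1], where F(x) = x."""
    xs = sorted(rng.random() for _ in range(n))
    # For a continuous F, the supremum is reached at a sample point,
    # just before or just after the jump of F_n.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)  # fixed seed, only for reproducibility
for n in (100, 10_000):
    d = sup_distance_uniform(n, rng)
    # d shrinks with n (property 3) while sqrt(n) * d stays of order 1 (property 4)
    print(n, d, n ** 0.5 * d)
```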

4.2. Sufficiency

In this section, we study a very important class of statistics: those that contain all the information given by the observations. We say that they are sufficient. We will conduct this study using a slightly more general notion: that of a sufficient sub-σ-algebra.

DEFINITION 4.2.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model.

1) $\mathcal{S}$, a sub-σ-algebra of $\mathcal{B}$, is said to be sufficient if there exists a version of the conditional probability $P^{\mathcal{S}}$ common to all the distributions $P \in \mathcal{P}$.

2) A statistic T, with values in $(E', \mathcal{B}')$, is said to be sufficient if the σ-algebra $T^{-1}(\mathcal{B}')$ is sufficient.

INTERPRETATION 4.1.– If T is sufficient, the conditional distribution given T no longer depends on P, which means that T(X) contains all the information about P carried by X.
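This interpretation can be checked exactly on a small case. The sketch below assumes a sample of n independent Bernoulli(p) variables, which is not an example taken from the text, and computes the conditional distribution of the sample given T(x) = x_1 + … + x_n for two values of p. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_law_given_sum(n, p, t):
    """P(X = x | T(X) = t) for X = (X_1, ..., X_n) i.i.d. Bernoulli(p) and T(x) = sum(x)."""

    def prob(x):
        s = sum(x)
        return p ** s * (1 - p) ** (n - s)

    points = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    total = sum(prob(x) for x in points)
    return {x: prob(x) / total for x in points}


law_a = conditional_law_given_sum(3, Fraction(1, 5), 2)
law_b = conditional_law_given_sum(3, Fraction(7, 10), 2)
print(law_a == law_b)  # True: both are uniform on the three points with two ones
```

The two conditional laws coincide, so once T is known the sample carries no further information about p.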

4.2.1. The factorization theorem

We say that $(E, \mathcal{B}, \mathcal{P})$ is dominated by a σ-finite measure m (or that $\mathcal{P}$ is dominated) if every distribution in $\mathcal{P}$ has a density with respect to m. The following theorem, called the “factorization theorem”, gives a characterization of the sufficient sub-σ-algebras in the case of a dominated model.

THEOREM 4.1.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model dominated by a probability $P^* \in \mathcal{P}$. A necessary and sufficient condition for the sub-σ-algebra $\mathcal{S}$ to be sufficient is that there exists, for all $P \in \mathcal{P}$, a version of $dP/dP^*$ that is $\mathcal{S}$-measurable. We may then choose a version of $(P^*)^{\mathcal{S}}$ as a common version of $P^{\mathcal{S}}$.

PROOF.–

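Necessary condition: Let $\mathcal{S}$ be a sufficient sub-σ-algebra and let $N(\cdot \mid \cdot)$ be a version of $P^{\mathcal{S}}$ common to every distribution $P \in \mathcal{P}$.

Consequently, we have:

$$P(B \cap S) = \int_S N(B \mid x) \, dP(x), \quad B \in \mathcal{B}, \ S \in \mathcal{S}, \ P \in \mathcal{P},$$

or again, since $N(B \mid \cdot)$ is $\mathcal{S}$-measurable:

$$P(B \cap S) = \int_S N(B \mid \cdot) \, dP_{\mathcal{S}},$$

where $P_{\mathcal{S}}$ denotes the restriction of P to $\mathcal{S}$.

Let $\varphi_P$ be the ($\mathcal{S}$-measurable!) density of $P_{\mathcal{S}}$ with respect to $P^*_{\mathcal{S}}$. We have:

$$P(B \cap S) = \int_S N(B \mid \cdot) \, \varphi_P \, dP^*_{\mathcal{S}} = \int_S N(B \mid \cdot) \, \varphi_P \, dP^*,$$

that is, from a fundamental property of the conditional expectation,

$$P(B \cap S) = \int_{B \cap S} \varphi_P \, dP^*.$$

Taking S = E, we see that $\varphi_P$ is therefore an $\mathcal{S}$-measurable version of $dP/dP^*$.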

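Sufficient condition: Let $\varphi_P$ be an $\mathcal{S}$-measurable version of $dP/dP^*$ ($P \in \mathcal{P}$) and $N(\cdot \mid \cdot)$ be a version of $(P^*)^{\mathcal{S}}$.

Since, for $F \in \mathcal{S}$, $1_F \varphi_P$ is $\mathcal{S}$-measurable, we have:

$$\int 1_B \, 1_F \, \varphi_P \, dP^* = \int N(B \mid \cdot) \, 1_F \, \varphi_P \, dP^*, \quad B \in \mathcal{B},$$

which is rewritten as:

$$P(B \cap F) = \int_F N(B \mid \cdot) \, dP,$$

and therefore $N(B \mid \cdot)$ is a version of $P^{\mathcal{S}}(B)$ for all $B \in \mathcal{B}$ and all $P \in \mathcal{P}$.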

From Doob’s lemma, we immediately have the following corollary:

COROLLARY 4.1.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model dominated by a probability $P^* \in \mathcal{P}$. The statistic T with values in $(E', \mathcal{B}')$ is sufficient if and only if, for all $P \in \mathcal{P}$, there exists a positive and $\mathcal{B}'$-measurable $g_P$ such that:

$$\frac{dP}{dP^*}(x) = g_P(T(x))$$

for $P^*$-almost every x of E.

More generally, we have the following corollary:

COROLLARY 4.2.– If $(E, \mathcal{B}, \mathcal{P})$ is dominated by some σ-finite m, then T is sufficient if and only if, for all $P \in \mathcal{P}$,

$$\frac{dP}{dm}(x) = g_P(T(x)) \, h(x)$$

for m-almost every x in E, where h is positive and $\mathcal{B}$-measurable and $g_P$ is positive and $\mathcal{B}'$-measurable.

PROOF.– As the complete proof of this result is difficult, we make the following additional hypothesis: there exists $P^* \in \mathcal{P}$ such that $dP^*/dm$ is strictly positive. Under these conditions, the model is dominated by $P^*$ and, by Corollary 4.1, T is sufficient if and only if $g_P \circ T$ is a version of $dP/dP^*$. The identity $dP/dm = dP/dP^* \cdot dP^*/dm$ then lets us conclude by setting $h = dP^*/dm$.
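The factorization of Corollary 4.2 can be made concrete on a simple dominated model. The sketch below assumes a sample from the uniform distribution on [0, θ], which is an illustration chosen here and not an example from the text, with T(x) = max(x_i). The names `joint_density`, `g` and `h` are hypothetical.

```python
import math


def joint_density(xs, theta):
    """Density of a sample from the uniform law on [0, theta] (Lebesgue reference measure)."""
    return math.prod((1 / theta if 0 <= x <= theta else 0.0) for x in xs)


def g(t, theta, n):
    # Factor that depends on the sample only through t = max(xs)
    return theta ** (-n) if t <= theta else 0.0


def h(xs):
    # Factor that does not depend on the parameter theta
    return 1.0 if min(xs) >= 0 else 0.0


xs = [0.3, 0.9, 0.5]  # hypothetical observations
for theta in (0.7, 1.0, 2.0):
    lhs = joint_density(xs, theta)
    rhs = g(max(xs), theta, len(xs)) * h(xs)
    print(theta, math.isclose(lhs, rhs))  # True for each theta
```

Under these assumptions the density factorizes as g(T(x)) h(x) for every θ, so the maximum of the sample is a sufficient statistic.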

4.3. Examples of sufficient statistics – an exponential model

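EXAMPLE 4.1. The Gaussian case.– We consider the model $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}, (\mathcal{N}(m, \sigma^2)^{\otimes n})_{(m, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+^*})$. It is dominated by the Lebesgue measure on $\mathbb{R}^n$, and the density is:

[4.1] $$f(x_1, \dots, x_n; m, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - m)^2\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} x_i^2 + \frac{m}{\sigma^2} \sum_{i=1}^{n} x_i - \frac{n m^2}{2\sigma^2}\right)$$

$\left(\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^2\right)$ is therefore a sufficient statistic (Corollary 4.2).

– We may also use the decomposition:

$$\sum_{i=1}^{n} (x_i - m)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - m)^2$$

which shows that $(\bar{X}, S^2)$ is sufficient.

– If we modify the model by fixing $\sigma^2$, then $\sum_{i=1}^{n} X_i$ is sufficient, following from the second equality of [4.1].

– If we modify the model by fixing m, then $S^2$ is not sufficient but $\sum_{i=1}^{n} (X_i - m)^2$ is, following from the first equality of [4.1].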
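A numerical check of the Gaussian case, as a sketch with hypothetical data: two different samples that share the same sum and the same sum of squares must have the same density for every (m, σ²). The second sample below is the first one reflected about its mean, which preserves both sums.

```python
import math


def log_density(xs, m, var):
    """Logarithm of the density of a Gaussian N(m, var) sample at the point xs."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sum((x - m) ** 2 for x in xs) / (2 * var)


xs = [1.0, 2.0, 4.0]
mean = sum(xs) / len(xs)
ys = [2 * mean - x for x in xs]  # reflection about the mean: same sum, same sum of squares

print(math.isclose(sum(xs), sum(ys)))
print(math.isclose(sum(x * x for x in xs), sum(y * y for y in ys)))
for m, var in ((0.0, 1.0), (0.3, 2.0), (-1.5, 0.5)):
    # Same value of the sufficient statistic, hence the same density
    print(math.isclose(log_density(xs, m, var), log_density(ys, m, var)))  # True
```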

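EXAMPLE 4.2.– Let $([0, 1]^{2n}, \mathcal{B}_{[0, 1]^{2n}}, (\lambda_C^{\otimes n})_{C \in \mathcal{C}})$ be a model, where $\mathcal{C}$ denotes the set of non-empty compact convex sets in $[0, 1]^2$ and $\lambda_C$ denotes a uniform distribution on C.

$\mathcal{C}$ is equipped with the Borel σ-algebra generated by the distance

$$d(C_1, C_2) = \lambda(C_1 \Delta C_2)$$

where λ is the Lebesgue measure and Δ denotes the symmetric difference, i.e. $C_1 \Delta C_2 = (C_1 \cup C_2) \setminus (C_1 \cap C_2)$.

Then, the mapping T that associates the convex envelope of the points $(x_i, y_i)$, i = 1,…, n with $(x_1, y_1, \dots, x_n, y_n) \in [0, 1]^{2n*}$ (see footnote 1) is continuous and therefore measurable. Because the density of $\lambda_C^{\otimes n}$ with respect to the Lebesgue measure on $[0, 1]^{2n}$ is written as:

$$\frac{1}{\lambda(C)^n} \prod_{i=1}^{n} 1_C(x_i, y_i) = \frac{1}{\lambda(C)^n} \, 1_{\{T(x_1, y_1, \dots, x_n, y_n) \subset C\}}$$

T is a sufficient statistic.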
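The key step in this example is that all the sample points lie in a convex set C exactly when their convex envelope does. The sketch below illustrates this with hypothetical points, taking C to be a disc contained in the unit square, and computes the envelope with Andrew's monotone chain algorithm, a standard method chosen here for brevity.

```python
def cross(o, a, b):
    """Cross product of the vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the convex envelope, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_disc(point, centre, radius):
    return (point[0] - centre[0]) ** 2 + (point[1] - centre[1]) ** 2 <= radius ** 2


sample = [(0.2, 0.2), (0.8, 0.3), (0.5, 0.5), (0.4, 0.9), (0.5, 0.3)]  # hypothetical
hull = convex_hull(sample)
print(hull)

centre = (0.5, 0.5)
for radius in (0.40, 0.45):
    all_points_in = all(in_disc(p, centre, radius) for p in sample)
    hull_in = all(in_disc(v, centre, radius) for v in hull)
    print(radius, all_points_in == hull_in)  # True: the indicator depends only on the hull
```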

Figure 4.1. The convex envelope of a sample of size n taken from $\lambda_C$

EXAMPLE 4.3. Exponential models.–

DEFINITION 4.3.– An exponential model is a dominated statistical model where the density is of the form:

$$f_\theta(x) = A(\theta) \, h(x) \, \exp \langle Q(\theta), T(x) \rangle$$

In this formula, $\langle \cdot, \cdot \rangle$ denotes the scalar product of $\mathbb{R}^p$ and T is assumed to be measurable.

This family of models contains numerous common examples (see Exercise 4.1). The model in Example 4.1 is exponential, whereas the model in Example 4.2 is not.

The form of fθ shows that T is a sufficient statistic.
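As a small illustration, the Bernoulli distribution (chosen here because it is not among the exercises) can be written in exponential form: p^x (1 - p)^(1 - x) = (1 - p) exp(x log(p / (1 - p))). The sketch below checks this identity numerically. The names `A`, `Q` and `T` are notation assumed for this sketch.

```python
import math


def bernoulli_pmf(x, p):
    """Density of the Bernoulli(p) law with respect to the counting measure on {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)


def exponential_form(x, p):
    """Same density written as A(p) * exp(Q(p) * T(x)) with T(x) = x."""
    A = 1 - p
    Q = math.log(p / (1 - p))
    T = x
    return A * math.exp(Q * T)


for p in (0.2, 0.5, 0.9):
    for x in (0, 1):
        print(p, x, math.isclose(bernoulli_pmf(x, p), exponential_form(x, p)))  # True
```

For a sample of size n the densities multiply, so the exponent becomes Q(p) (x_1 + … + x_n) and the sum of the observations is sufficient.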

4.4. Use of a sufficient statistic

We will see that if the model possesses a sufficient statistic T, we may restrict ourselves to decision functions that are functions of T; this is the Rao-Blackwell theorem.

LEMMA 4.1. Jensen’s inequality.– Let $f: \mathbb{R}^p \to \mathbb{R}$ be a μ-integrable, convex function, where μ is a probability on $\mathbb{R}^p$ such that $\int \|x\| \, d\mu(x) < \infty$. Then

$$f\left(\int x \, d\mu(x)\right) \le \int f \, d\mu.$$

This inequality is strict if f is strictly convex and μ is non-degenerate.

PROOF.–

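– Let us set $x_0 = \int x \, d\mu(x)$. The convexity of f is reflected by the existence of a linear functional $l_{x_0}$ such that:

$$f(x) \ge f(x_0) + l_{x_0}(x - x_0), \quad x \in \mathbb{R}^p.$$

Integrating both sides of this inequality, and noting that $\int l_{x_0}(x - x_0) \, d\mu(x) = 0$ by linearity, we find:

[4.2] $$\int f \, d\mu \ge f(x_0).$$

– If $f(x) > f(x_0) + l_{x_0}(x - x_0)$ for $x \ne x_0$ and if $\mu(\{x_0\}) < 1$, [4.2] is strict, as equality would give

$$\int \left[ f(x) - f(x_0) - l_{x_0}(x - x_0) \right] d\mu(x) = 0,$$

from which

$$f(x) = f(x_0) + l_{x_0}(x - x_0) \quad \mu\text{-almost everywhere},$$

which implies that $\mu(\{x_0\}) = 1$.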
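Jensen's inequality is easy to verify on a discrete distribution. The sketch below uses hypothetical values and weights with the strictly convex function f(x) = x², and also shows the degenerate case in which equality holds.

```python
def expectation(values, weights, f=lambda x: x):
    """Integral of f with respect to the discrete probability sum_i weights[i] * delta(values[i])."""
    return sum(w * f(x) for x, w in zip(values, weights))


def f(x):
    return x * x  # strictly convex


values = [-1.0, 0.0, 3.0]
weights = [0.5, 0.25, 0.25]       # non-degenerate probability

x0 = expectation(values, weights)  # mean of the distribution
print(f(x0), expectation(values, weights, f))  # 0.0625 2.75 (strict inequality)

degenerate = [0.0, 0.0, 1.0]      # all the mass at the point 3.0
x1 = expectation(values, degenerate)
print(f(x1), expectation(values, degenerate, f))  # 9.0 9.0 (equality)
```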

THEOREM 4.2. Rao-Blackwell theorem.– Let $(E, \mathcal{B}, (P_\theta)_{\theta \in \Theta})$ be a statistical model and T be a regular sufficient statistic (i.e. the common version of $P^{T^{-1}(\mathcal{B}')}$ is regular on $\mathcal{B}$). We suppose that $D = \mathbb{R}^p$ and that the preference relation is defined by a loss function L such that L(θ, ·) is convex for all θ ∈ Θ. Then, if S is a decision function that is integrable for all θ, the conditional expectation $E(S \mid T)$ is preferable to S.

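PROOF.– Let N(B, T) be a regular version of $P^{T^{-1}(\mathcal{B}')}$, independent of θ. Then

$$E_\theta(S \mid T) = \int S(x) \, N(dx, T), \quad \theta \in \Theta.$$

$E(S \mid T)$ is therefore well defined.

Now, if R(θ, S) = +∞, the result is clear. Otherwise, we may write:

$$L(\theta, E(S \mid T)) = L\left(\theta, \int S(x) \, N(dx, T)\right) \le \int L(\theta, S(x)) \, N(dx, T) = E_\theta(L(\theta, S) \mid T)$$

(from Jensen’s inequality for $\mu = N(\cdot, T(X)) S^{-1}$), i.e., taking expectations,

$$R(\theta, E(S \mid T)) \le R(\theta, S), \quad \theta \in \Theta.$$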

COMMENT 4.2.– If L(θ, ·) is strictly convex for at least one value of θ and if $N(\cdot, T) S^{-1}$ is never degenerate, then $E(S \mid T)$ is strictly preferable to S.

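CONCLUSION.– The Rao–Blackwell theorem states that if T is sufficient, then we may use the model $(E', \mathcal{B}', (P_\theta T^{-1})_{\theta \in \Theta})$ induced by T and consider the induced risk function:

$$R'(\theta, S') = \int L(\theta, S'(t)) \, d(P_\theta T^{-1})(t) = R(\theta, S' \circ T).$$

Thus, if $S'_0$ is an optimal decision function in this model, then $S'_0 \circ T$ is optimal in the initial model.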
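The improvement given by the Rao–Blackwell theorem can be computed exactly in a small case. The sketch below assumes a Bernoulli(p) sample of size n with quadratic loss, which is an illustration chosen here and not taken from the text. The crude estimator S(x) = x_1 is compared with its conditional expectation given T(x) = x_1 + … + x_n, which equals T/n by symmetry.

```python
from fractions import Fraction
from itertools import product


def risk(estimator, n, p):
    """Quadratic risk E_p[(estimator(X) - p)^2] for X i.i.d. Bernoulli(p), by enumeration."""
    total = Fraction(0)
    for x in product((0, 1), repeat=n):
        s = sum(x)
        total += p ** s * (1 - p) ** (n - s) * (estimator(x) - p) ** 2
    return total


def S(x):
    return x[0]  # crude estimator: the first observation only


def S_rb(x):
    return Fraction(sum(x), len(x))  # E(S | T) with T = sum(x), by symmetry


p = Fraction(3, 10)
print(risk(S, 4, p), risk(S_rb, 4, p))  # 21/100 21/400
```

Under these assumptions the risk drops from p(1 - p) to p(1 - p)/n, and the improved estimator depends on the sample only through T.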

4.5. Exercises

EXERCISE 4.1.– Show that the following models are exponential, and determine a sufficient statistic in each case:

1) The set of Gamma distributions Γ(α, β), α, β > 0.

2) The Gaussian distributions on $\mathbb{R}^p$ parametrized by their expectations and their covariance matrices.

3) The family of Poisson distributions.

4) Every model of the form $(E_0^n, \mathcal{B}_0^{\otimes n}, (P^{\otimes n})_{P \in \mathcal{P}_0})$ where $(E_0, \mathcal{B}_0, \mathcal{P}_0)$ is already an exponential model.

EXERCISE 4.2.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model. We say that a sub-σ-algebra $\mathcal{C}$ of $\mathcal{B}$ is free if, for all $A \in \mathcal{C}$, P(A) is constant as P varies in $\mathcal{P}$. A statistic is said to be free if the sub-σ-algebra that it generates is free.

1) Let $\mathcal{S}$ be a sufficient and complete sub-σ-algebra and $\mathcal{C}$ be a sub-σ-algebra of $\mathcal{B}$. Show that $\mathcal{C}$ is free if and only if $\mathcal{S}$ and $\mathcal{C}$ are independent for all $P \in \mathcal{P}$.

2) Let $X_1, \dots, X_n$ be a sample of size n of the normal distribution $\mathcal{N}(m, \sigma^2)$, $m \in \mathbb{R}$, $\sigma \in \mathbb{R}_+^*$. Show that the statistic:

images

where r is a given positive number, is free. What are the consequences of this? (We will assume that images, where images, is complete.)

EXERCISE 4.3.–

1) We fish until we obtain r fish of a particular type, $r \in \mathbb{N}^*$ being known. Let X be the random variable associated with the number of fish that need to be caught, and p ∈ ]0,1[ the proportion of fish of the type considered (we will assume that this proportion does not vary as we fish). What is the distribution followed by X? Is this model exponential? If the answer is yes, give the normal parameter(s) and put the model in the canonical form.

2) We consider an urn containing N objects of which NA are of type A. We draw n objects without replacing them, and we call X the random variable associated with the number of objects of type A obtained in these n draws. Give the distribution of X. Is this model exponential?

3) We consider the Pareto distribution of density images. Is this model exponential? If the answer is yes, give the normal parameter(s) and put it in the canonical form. Write the distribution of the sample of size n.

EXERCISE 4.4.– The negative normal distribution with the parameter μ > 0 has the density:

images

1) Calculate the maximum likelihood estimator for an n-sample from this distribution.

2) Show that it is an exponential model for which we will determine the normal parameters. From this, deduce E(X) and Var (X). Using the properties of the exponential model, find the maximum likelihood estimator and give its limit in distribution.

Hint: Use results from Chapters 5 and 7.


1 Here, $[0, 1]^{2n*}$ denotes the set of 2n-tuples of elements in [0, 1] whose convex envelope is of positive measure. This set is assumed to be equipped with the trace topology from the usual topology of $[0, 1]^{2n}$.
