DEFINITION 4.1.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model. We call a measurable map from $(E, \mathcal{B})$ to $(E', \mathcal{B}')$ a statistic with values in $(E', \mathcal{B}')$.
COMMENT 4.1.– It is important to emphasize that a statistic does not depend on $P \in \mathcal{P}$. A measurable map whose expression involves the unknown distribution P is therefore not a statistic. A decision function, on the other hand, is always a statistic.
In this section, we indicate certain properties of some common statistics. We first give some definitions:
– Let μ be a probability on a measurable space $(E_0, \mathcal{B}_0)$. A sequence $X_1, \dots, X_n$ of n independent random variables with distribution μ is called a sample of size n of the distribution μ. The result of n independent draws following μ is called a realization of this sample.
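– The measure $\mu_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{(X_i)}$ (where $\delta_{(a)}$ denotes the Dirac measure at the point a) is called the empirical distribution associated with the theoretical distribution μ. The empirical distribution is therefore a (random) probability on $(E_0, \mathcal{B}_0)$, and, when $E_0 = \mathbb{R}$, we define in a natural way:
– The empirical distribution function: $F_n(x) = \mu_n(]-\infty, x]) = \frac{1}{n}\sum_{i=1}^{n} 1_{\{X_i \le x\}}$, $x \in \mathbb{R}$.
– The empirical mean: $\bar{X} = \int x\,d\mu_n(x) = \frac{1}{n}\sum_{i=1}^{n} X_i$.
– The empirical variance: $S^2 = \int (x - \bar{X})^2\,d\mu_n(x) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$.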
– In a similar way, we define the empirical moments, the empirical median, etc. In $\mathbb{R}^p$, we define the empirical mean and the empirical covariance matrix.
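All of these random quantities are statistics when the statistical model is of the form $(E_0^n, \mathcal{B}_0^{\otimes n}, (\mu^{\otimes n})_{\mu \in \mathcal{P}_0})$, where $\mathcal{P}_0$ is a family of probabilities on $(E_0, \mathcal{B}_0)$.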
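The empirical quantities defined above can be computed directly from a realization of the sample. The following is a minimal sketch for real-valued observations; the function names and the sample values are illustrative assumptions, not notation from the text. The empirical variance is taken as the variance of the empirical distribution, so the divisor is n.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return the distribution function of the empirical distribution."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # Proportion of observations that are <= x.
        return bisect_right(xs, x) / n

    return F_n


def empirical_mean(sample):
    return sum(sample) / len(sample)


def empirical_variance(sample):
    # Variance of the empirical distribution: divisor n (assumption), not n - 1.
    m = empirical_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


sample = [2.0, 4.0, 4.0, 5.0]  # hypothetical realization
F_n = empirical_cdf(sample)
mean = empirical_mean(sample)
variance = empirical_variance(sample)
```

None of these functions takes the distribution μ as an argument, which is what makes the corresponding random quantities statistics.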
We have the following properties:
1) For f $\mathcal{B}_0$-measurable and bounded, $\int f\,d\mu_n \to \int f\,d\mu$ almost surely when n → ∞.
2) nFn (x) follows a binomial distribution with parameters n and μ(] – ∞, x]).
3) Let F be the distribution function of μ. Then $\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to 0$ almost surely when n → ∞ (the Glivenko–Cantelli theorem).
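4) If F is continuous, $\sqrt{n}\,\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to K$ in distribution when n → ∞,
where K has the distribution function $x \mapsto \sum_{k=-\infty}^{+\infty} (-1)^k e^{-2k^2x^2}$ for $x > 0$ (and 0 for $x \le 0$).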
5) If , .
6) If :
where μ4 denotes the fourth-order central moment of X1.
With the exception of property (4), whose proof falls outside the scope of this book, the other properties are straightforward consequences of classical theorems in probability theory.
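Property (3) can be illustrated numerically. The sketch below assumes that μ is the uniform distribution on [0, 1], so that F(x) = x, and computes $\sup_x |F_n(x) - F(x)|$ exactly from the order statistics. The seed and the sample sizes are arbitrary choices made for the illustration.

```python
import random


def sup_distance_to_uniform(sample):
    """Return sup_x |F_n(x) - x| for a sample with values in [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    # F_n only jumps at the order statistics, so the supremum is reached
    # at a jump or immediately before one.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)  # fixed seed, chosen arbitrarily
distances = {
    n: sup_distance_to_uniform([rng.random() for _ in range(n)])
    for n in (100, 10_000)
}
```

By property (4), the distance for a sample of size n is of order $1/\sqrt{n}$, which is the behavior one should expect to see in `distances`.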
In this section, we study a very important class of statistics: those that contain all the information given by the observations. We say that they are sufficient. We will conduct this study using a slightly more general notion: that of a sufficient sub-σ-algebra.
DEFINITION 4.2.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model.
1) $\mathcal{S}$, a sub-σ-algebra of $\mathcal{B}$, is said to be sufficient if there exists a version of the conditional probability $P^{\mathcal{S}}$ common to all the distributions $P \in \mathcal{P}$.
2) A statistic T, with values in $(E', \mathcal{B}')$, is said to be sufficient if the σ-algebra $T^{-1}(\mathcal{B}')$ is sufficient.
INTERPRETATION 4.1.– If T is sufficient, the conditional distribution of the observation X given T(X) no longer depends on P, which means that T(X) contains all the information carried by X.
We say that $(E, \mathcal{B}, \mathcal{P})$ is dominated by a σ-finite measure m (or that $\mathcal{P}$ is dominated) if every distribution in $\mathcal{P}$ has a density with respect to m. The following theorem, called the “factorization theorem”, gives a characterization of the sufficient sub-σ-algebras in the case of a dominated model.
THEOREM 4.1.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model dominated by a probability $P^* \in \mathcal{P}$. A necessary and sufficient condition for the sub-σ-algebra $\mathcal{S}$ to be sufficient is that there exists, for all $P \in \mathcal{P}$, a version of dP/dP* that is $\mathcal{S}$-measurable. We may then choose a version of the conditional probability $P^{*\mathcal{S}}$ as a common version of $P^{\mathcal{S}}$.
PROOF.–
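– Necessary condition: Let $\mathcal{S}$ be a sufficient sub-σ-algebra and let $N(\cdot \mid \cdot)$ be a version of $P^{\mathcal{S}}$ common to every distribution $P \in \mathcal{P}$.
Consequently, for all $B \in \mathcal{B}$ and $F \in \mathcal{S}$, we have:
$P(B \cap F) = \int_F N(B \mid x)\,dP(x)$,
or again, since $N(B \mid \cdot)$ is $\mathcal{S}$-measurable:
$P(B \cap F) = \int_F N(B \mid x)\,dP_{\mathcal{S}}(x)$,
where $P_{\mathcal{S}}$ denotes the restriction of P to $\mathcal{S}$.
Let $\varphi_P$ be the ($\mathcal{S}$-measurable!) density of $P_{\mathcal{S}}$ with respect to $P^*_{\mathcal{S}}$. We have:
$P(B \cap F) = \int_F N(B \mid x)\,\varphi_P(x)\,dP^*_{\mathcal{S}}(x) = \int_F N(B \mid x)\,\varphi_P(x)\,dP^*(x)$,
that is, from a fundamental property of the conditional expectation ($N(B \mid \cdot)$ being a version of $P^*(B \mid \mathcal{S})$),
$P(B \cap F) = \int_{B \cap F} \varphi_P\,dP^*$.
Taking F = E, $\varphi_P$ is therefore an $\mathcal{S}$-measurable version of dP/dP*.
– Sufficient condition: Let $\varphi_P$ be an $\mathcal{S}$-measurable version of dP/dP* ($P \in \mathcal{P}$) and $N(\cdot \mid \cdot)$ be a version of $P^{*\mathcal{S}}$.
Since, for $F \in \mathcal{S}$, $1_F \varphi_P$ is $\mathcal{S}$-measurable, we have, for all $B \in \mathcal{B}$:
$\int_F 1_B\,\varphi_P\,dP^* = \int_F N(B \mid \cdot)\,\varphi_P\,dP^*$,
which is rewritten as:
$P(B \cap F) = \int_F N(B \mid \cdot)\,dP$,
and therefore $N(B \mid \cdot)$ is a version of $P(B \mid \mathcal{S})$ for all $B \in \mathcal{B}$ and all $P \in \mathcal{P}$.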
From Doob’s lemma, we immediately have the following corollary:
COROLLARY 4.1.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model dominated by a probability $P^* \in \mathcal{P}$. The statistic T with values in $(E', \mathcal{B}')$ is sufficient if and only if, for all $P \in \mathcal{P}$, there exists a positive and $\mathcal{B}'$-measurable $g_P$ such that:
$\frac{dP}{dP^*}(x) = g_P(T(x))$
for P*-almost every x of E.
More generally, we have the following corollary:
COROLLARY 4.2.– If $(E, \mathcal{B}, \mathcal{P})$ is dominated by some σ-finite m, then T is sufficient if and only if, for all $P \in \mathcal{P}$,
$\frac{dP}{dm}(x) = g_P(T(x))\,h(x)$
for m-almost every x in E, where h is positive and $\mathcal{B}$-measurable and $g_P$ is positive and $\mathcal{B}'$-measurable.
PROOF.– As the complete proof of this result is a challenging one, we make the following additional hypothesis: there exists $P^* \in \mathcal{P}$ such that dP*/dm is strictly positive. Under these conditions, the model is dominated by P* and, by Corollary 4.1, T is sufficient if and only if $g_P \circ T$ is a version of dP/dP*. The identity dP/dm = dP/dP* · dP*/dm then lets us conclude by setting h = dP*/dm.
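EXAMPLE 4.1. The Gaussian case.– We consider the model $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}, \mathcal{N}(m, \sigma^2)^{\otimes n}, (m, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+^*)$. It is dominated by the Lebesgue measure on $\mathbb{R}^n$, and the density is:
$f(x_1, \dots, x_n) = (\sigma\sqrt{2\pi})^{-n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - m)^2\right] = (\sigma\sqrt{2\pi})^{-n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2 + \frac{m}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n m^2}{2\sigma^2}\right]$ [4.1]
$\left(\sum_{i=1}^{n} X_i, \sum_{i=1}^{n} X_i^2\right)$ is therefore a sufficient statistic (Corollary 4.2).
– We may also use the decomposition:
$\sum_{i=1}^{n}(x_i - m)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - m)^2$,
which shows that $(\bar{X}, S^2)$ is sufficient.
– If we modify the model by fixing σ2, then $\bar{X}$ is sufficient, following from the second equality of [4.1].
– If we modify the model by fixing m, then S2 is not sufficient but $\sum_{i=1}^{n}(X_i - m)^2$ is, following from the first equality of [4.1].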
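The Gaussian example can be checked numerically: two different samples with the same values of $\sum x_i$ and $\sum x_i^2$ have the same likelihood for every (m, σ2). The sketch below is illustrative only; the function names and the two samples are assumptions chosen so that both sums coincide (6 and 18).

```python
import math


def loglik_direct(xs, m, sigma2):
    """Gaussian log-likelihood computed from the raw observations."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - m) ** 2 for x in xs) / (2 * sigma2))


def loglik_from_stats(n, s1, s2, m, sigma2):
    """Same log-likelihood, computed only from s1 = sum x_i and s2 = sum x_i^2."""
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - (s2 - 2 * m * s1 + n * m * m) / (2 * sigma2))


xs = [0.0, 3.0, 3.0]  # sum = 6, sum of squares = 18
ys = [1.0, 1.0, 4.0]  # different sample, same two sums

params = [(0.7, 1.0), (-2.0, 0.5), (2.0, 3.0)]  # arbitrary (m, sigma2) pairs
direct_xs = [loglik_direct(xs, m, s) for m, s in params]
direct_ys = [loglik_direct(ys, m, s) for m, s in params]
via_stats = [loglik_from_stats(3, 6.0, 18.0, m, s) for m, s in params]
```

The three lists agree up to rounding, so the data enter the likelihood only through the two sums.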
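EXAMPLE 4.2.– Let $([0,1]^{2n*}, \mathcal{B}_{[0,1]^{2n*}}, \lambda_C^{\otimes n}, C \in \mathcal{C})$ be a model, where $\mathcal{C}$ denotes the set of non-empty compact convex sets in $[0, 1]^2$ and $\lambda_C$ denotes the uniform distribution on C.
$\mathcal{C}$ is equipped with the Borel σ-algebra generated by the distance
$d(C_1, C_2) = \lambda(C_1 \Delta C_2)$,
where λ denotes the Lebesgue measure on $\mathbb{R}^2$ and Δ denotes the symmetric difference, i.e. $C_1 \Delta C_2 = (C_1 \cup C_2) \setminus (C_1 \cap C_2)$.
Then, the mapping T that associates the convex envelope of the points $(x_i, y_i)$, i = 1,…, n, with $(x_1, y_1, \dots, x_n, y_n) \in [0,1]^{2n*}$ $^1$ is continuous and therefore measurable. Because the density of $\lambda_C^{\otimes n}$ with respect to $\lambda^{\otimes n}$ is written as:
$f_C(x_1, y_1, \dots, x_n, y_n) = \lambda(C)^{-n} \prod_{i=1}^{n} 1_C(x_i, y_i) = \lambda(C)^{-n}\, 1_{\{T(x_1, y_1, \dots, x_n, y_n) \subset C\}}$,
T is a sufficient statistic.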
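The role of the convex envelope can be made concrete with a short computation. The sketch below restricts C to axis-parallel rectangles in $[0, 1]^2$, an assumption made only to keep the membership test and the area trivial, and computes the convex envelope with the standard monotone-chain algorithm. It then evaluates the density of n uniform points on C once from all the points and once from the vertices of their envelope.

```python
def cross(o, a, b):
    """Cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the convex envelope (monotone chain, counter-clockwise)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_rectangle(point, rect):
    x0, x1, y0, y1 = rect
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1


def area(rect):
    x0, x1, y0, y1 = rect
    return (x1 - x0) * (y1 - y0)


def density_direct(points, rect):
    """Density of n uniform points on the rectangle, from all the points."""
    inside = all(in_rectangle(p, rect) for p in points)
    return area(rect) ** (-len(points)) if inside else 0.0


def density_via_hull(points, rect):
    """Same density, using only the vertices of the convex envelope."""
    inside = all(in_rectangle(v, rect) for v in convex_hull(points))
    return area(rect) ** (-len(points)) if inside else 0.0


points = [(0.1, 0.1), (0.9, 0.1), (0.5, 0.5), (0.5, 0.9), (0.5, 0.2)]
hull = convex_hull(points)
rectangles = [(0.0, 1.0, 0.0, 1.0), (0.2, 1.0, 0.0, 1.0), (0.0, 1.0, 0.0, 0.95)]
```

Because each rectangle is convex, it contains all the points exactly when it contains the vertices of their envelope, so the two functions return the same value for every rectangle.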
EXAMPLE 4.3. Exponential models.–
DEFINITION 4.3.– An exponential model is a dominated statistical model where the density is of the form:
In this formula, $\langle \cdot, \cdot \rangle$ denotes the scalar product of $\mathbb{R}^p$ and T is assumed to be measurable.
This family of models contains numerous common examples (see Exercise 4.1). The model in Example 4.1 is exponential, whereas the model in Example 4.2 is not.
The form of fθ shows that T is a sufficient statistic.
We will see that if the model possesses a sufficient statistic T, we may restrict ourselves to decision functions that are functions of T; this is the Rao-Blackwell theorem.
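LEMMA 4.1. Jensen’s inequality.– Let $f: \mathbb{R}^p \to \mathbb{R}$ be a μ-integrable, convex function, where μ is a probability on $\mathbb{R}^p$ such that $\int \|x\|\,d\mu(x) < \infty$. Then
$f\left(\int x\,d\mu(x)\right) \le \int f\,d\mu$.
This inequality is strict if f is strictly convex and μ is non-degenerate.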
PROOF.–
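– Let us set $x_0 = \int x\,d\mu(x)$. The convexity of f is reflected by the existence of a linear functional $\ell_{x_0}$ such that:
$f(x) \ge f(x_0) + \ell_{x_0}(x - x_0)$, $x \in \mathbb{R}^p$.
Integrating both sides of this inequality, we find:
$\int f\,d\mu \ge f(x_0) + \ell_{x_0}\left(\int x\,d\mu(x) - x_0\right) = f(x_0)$. [4.2]
– If $f(x) > f(x_0) + \ell_{x_0}(x - x_0)$ for $x \ne x_0$ and if $\mu(\{x_0\}) < 1$, [4.2] is strict, as equality in [4.2] would mean
$\int \left[f(x) - f(x_0) - \ell_{x_0}(x - x_0)\right] d\mu(x) = 0$,
from which
$f(x) = f(x_0) + \ell_{x_0}(x - x_0)$ for μ-almost every x,
which implies that $\mu(\{x_0\}) = 1$.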
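A small numerical check of the lemma, in the simplest setting: p = 1 and μ a discrete probability with finitely many atoms. The function name, the atoms and the weights below are illustrative assumptions.

```python
def jensen_gap(points, weights, f):
    """Return (integral of f d(mu)) - f(integral of x d(mu)) for a discrete mu on R."""
    x0 = sum(w * x for x, w in zip(points, weights))
    return sum(w * f(x) for x, w in zip(points, weights)) - f(x0)


def square(x):
    return x * x  # strictly convex


# Non-degenerate mu: the gap is strictly positive.
gap_spread = jensen_gap([0.0, 1.0, 4.0], [0.5, 0.25, 0.25], square)

# Degenerate mu (a single atom): the gap vanishes.
gap_point = jensen_gap([2.0], [1.0], square)
```

For the square function the gap is the variance of μ, which vanishes exactly when μ is degenerate, in line with the strictness statement of the lemma.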
THEOREM 4.2. Rao–Blackwell theorem.– Let $(E, \mathcal{B}, (P_\theta)_{\theta \in \Theta})$ be a statistical model and T be a regular sufficient statistic (i.e. the common version of $P_\theta^{T^{-1}(\mathcal{B}')}$ is regular on $\mathcal{B}$). We suppose that $D = \mathbb{R}^p$ and that the preference relation is defined by a loss function L such that L(θ, ·) is convex for all θ ∈ Θ. Then, if S is a decision function that is integrable for all θ, $E(S \mid T)$ is preferable to S.
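PROOF.– Let N(B, T) be a regular version of $P_\theta^{T^{-1}(\mathcal{B}')}$, independent of θ. Then
$E(S \mid T) = \int S(x)\,N(dx, T)$
is well defined.
Now, if R(θ, S) = +∞, the result is clear. Otherwise, we may write:
$R(\theta, S) = E_\theta[L(\theta, S)] = E_\theta\left[E(L(\theta, S) \mid T)\right] \ge E_\theta\left[L(\theta, E(S \mid T))\right]$
(from Jensen’s inequality applied to the probability $N(\cdot, T(X))S^{-1}$), i.e.
$R(\theta, S) \ge R(\theta, E(S \mid T))$.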
COMMENT 4.2.– If L(θ, ·) is strictly convex for at least one value of θ and if $N(\cdot, T)S^{-1}$ is never degenerate, then $E(S \mid T)$ is strictly preferable to S.
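The theorem can be illustrated in a model where everything is computable by enumeration. The sketch below assumes a sample of n Bernoulli variables with parameter p, the quadratic loss, the (crude) decision function S(x) = x_1 and the sufficient statistic T(x) = x_1 + … + x_n. None of these choices comes from the text; they are assumptions made for the example. Given T = t, all configurations with t ones are equally likely whatever p is, so E(S | T = t) = t/n.

```python
from itertools import product


def S(x):
    """Crude estimator of p: the first observation only."""
    return x[0]


def rao_blackwellized(x):
    """E(S | T) evaluated at x, with T(x) = sum(x): equals T(x) / n."""
    return sum(x) / len(x)


def conditional_mean_of_S(n, t):
    """E(S | T = t) by enumeration; the conditional law is uniform, free of p."""
    configs = [x for x in product((0, 1), repeat=n) if sum(x) == t]
    return sum(S(x) for x in configs) / len(configs)


def quadratic_risk(estimator, n, p):
    """Exact risk E_p[(estimator(X) - p)^2], summing over all 2^n outcomes."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        t = sum(x)
        total += p ** t * (1 - p) ** (n - t) * (estimator(x) - p) ** 2
    return total


n, p = 4, 0.3
risk_S = quadratic_risk(S, n, p)
risk_RB = quadratic_risk(rao_blackwellized, n, p)
```

With these values the risk of S is p(1 − p) = 0.21 and the risk of E(S | T) is p(1 − p)/n = 0.0525, so conditioning on the sufficient statistic divides the risk by n in this example.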
CONCLUSION.– The Rao–Blackwell theorem states that if T is sufficient, then we may use the model induced by T and consider the induced risk function:
$R'(\theta, \varphi) = \int_{E'} L(\theta, \varphi(t))\,d(P_\theta T^{-1})(t)$.
Thus, if $\varphi$ is an optimal decision function in this model, then $\varphi \circ T$ is optimal in the initial model.
EXERCISE 4.1.– Show that the following models are exponential, and determine a sufficient statistic in each case:
1) The set of Gamma distributions Γ(α, β), α, β > 0.
2) The Gaussian distributions on $\mathbb{R}^p$ parametrized by their expectation values and their covariance matrices.
3) The family of Poisson distributions.
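4) Every model of the form $(E_0^n, \mathcal{B}_0^{\otimes n}, \{P^{\otimes n} : P \in \mathcal{P}_0\})$ where $(E_0, \mathcal{B}_0, \mathcal{P}_0)$ is already an exponential model.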
EXERCISE 4.2.– Let $(E, \mathcal{B}, \mathcal{P})$ be a statistical model. We say that a sub-σ-algebra $\mathcal{A}$ of $\mathcal{B}$ is free if, for all $A \in \mathcal{A}$, P(A) is constant as P varies in $\mathcal{P}$. A statistic is said to be free if the sub-σ-algebra that it generates is free.
1) Let $\mathcal{S}$ be a sufficient and complete sub-σ-algebra and $\mathcal{A}$ be a sub-σ-algebra of $\mathcal{B}$. Show that $\mathcal{A}$ is free if and only if $\mathcal{A}$ and $\mathcal{S}$ are independent for all $P \in \mathcal{P}$.
2) Let $X_1, \dots, X_n$ be a sample of size n of the normal distribution $\mathcal{N}(m, \sigma^2)$, $m \in \mathbb{R}$, $\sigma \in \mathbb{R}_+^*$. Show that the statistic:
where r is a given positive number, is free. What are the consequences of this? (We will assume that , where , is complete.)
EXERCISE 4.3.–
1) We fish until we obtain r fish of a particular type, $r \in \mathbb{N}^*$ being known. Let X be the random variable associated with the number of fish that need to be caught, and p ∈ ]0,1[ the proportion of fish of the type that we consider (we will assume that this proportion does not vary as we fish). What is the distribution followed by X? Is this model exponential? If the answer is yes, give the natural parameter(s) and put the model in the canonical form.
2) We consider an urn containing N objects of which NA are of type A. We draw n objects without replacing them, and we call X the random variable associated with the number of objects of type A obtained in these n draws. Give the distribution of X. Is this model exponential?
3) We consider the Pareto distribution of density . Is this model exponential? If the answer is yes, give the natural parameter(s) and put it in the canonical form. Write the distribution of the sample of size n.
EXERCISE 4.4.– The negative normal distribution with the parameter μ > 0 has the density:
1) Calculate the maximum likelihood estimator for an n-sample from this distribution.
2) Show that it is an exponential model, and determine its natural parameters. From this, deduce E(X) and Var(X). Using the properties of the exponential model, find the maximum likelihood estimator and give its limit in distribution.
Hint: Use results from Chapters 5 and 7.
1 Here, $[0, 1]^{2n*}$ denotes the set of 2n-tuples of elements in [0, 1] whose convex envelope is of positive measure. This set is assumed to be equipped with the trace topology from the usual topology of $[0, 1]^{2n}$.