Chapter 5

Point Estimation

5.1. Generalities

5.1.1. Definition – examples

Let images be a statistical model and let g be a map from Θ to a set A with a σ-algebra images.

DEFINITION 5.1.– An estimator of g(θ) is a measurable map from E to A.

In an estimation problem, the set of decisions is therefore the set in which the function g of the parameter takes its values. The task is, in light of the observations, to assign a value to g(θ) (a “function of the parameter”).

COMMENT 5.1.– If the model is written in the form images, we may consider g to be a function of P.

EXAMPLE 5.1.– In all of the examples below, images.

1) images.

2) images.

3) images, where uθ denotes a uniform distribution on [0, θ], images; g(θ) = θ.

4) P = a set of distributions of the form images, where Pm, D is a probability on images with expectation value m and covariance matrix D; g(θ) = (m, D).

5) P = the set of distributions of the form Pn, where P is a probability on images with a density fP with respect to the Lebesgue measure λ (or a known, or a square-integrable density, etc.); g(θ) = θ = fP.

Here images, or L2(λ), etc., respectively).

6) P = the set of distributions of the form Pn, where P describes the set of all probabilities on images; g(θ) = θ = FP (distribution function of P) (or also g(θ) = P).

7) P = the set of distributions of the form Pn, where P describes the set of distributions on images with positive density such that, if (X, Y) follows such a distribution, x ↦ E(Y|X = x) is defined (Y is therefore integrable or positive); g(θ) = g(P) = E(Y | X = x).

8) P = the set of distributions of the form Pn, where P describes the set of probabilities on images with compact support SP; g(θ) = g(P) = SP.

COMMENT 5.2.– We see that A may take a great variety of forms: images, a function space and even a class of sets (point 8) which may also, after taking the quotient, be equipped with a metric (d(A, B) = λ(A Δ B)).

5.1.2. Choice of a preference relation

1) In general, if images, we define a preference relation using the loss function:

(g(θ) − a)2

The associated risk function is then the quadratic error or the quadratic risk:

R(θ, T) = Eθ{[g(θ) − T]2}

where T denotes the estimator (i.e. the decision function).

If Eθ(T) = g(θ), then R(θ, T) is the variance of T and we write R(θ, T) = Varθ(T).

COMMENT 5.3.– We may also use loss functions of the form c(θ)(g(θ) − a)2, which leads to the same preference relation between estimators, provided that c(θ) is strictly positive for all θ.

The loss function |g(θ) − a|, which generally provides a different preference relation, is sometimes used. However, it gives a less convenient cost function than the quadratic error.

2) If images, we define a preference relation in a similar way.

First of all, given X = (X1, …, Xp), a random variable with values in images such that images, we define its matrix of second-order moments by setting:

CX = (E(XiXj))1≤i, j≤p

For centered X, CX coincides with the covariance matrix of X.

We may consider CX as defining a symmetric linear operator of images. It is then straightforward to verify that:

[5.1] 〈CXy, z〉 = E[〈X, y〉〈X, z〉] for all y, z

where images denotes the scalar product of images.

In particular,

[5.2] 〈CXy, y〉 = E[〈X, y〉2] for all y

Conversely, if D is a symmetric linear operator of images having property [5.2], it also has property [5.1], since

images

From this, we deduce that D = CX.

We now define a partial order relation images on the symmetric linear operators of images by setting images if and only if images for all images.

PROPERTIES.–

images

The above property allows us to define a preference relation images on the set of estimators of g(θ) by writing:

[5.3] images

If S and T are of square-integrable norm (for all θ), we also have:

images

Relation images may be interpreted in the following way: T is preferable to S if and only if, for all y ∈ images, 〈T, y〉 is preferable to 〈S, y〉 as an estimator of 〈g(θ), y〉 with respect to the quadratic error.

3) Relation images has the drawback of not providing a numerical measure of the risk associated with an estimator. This observation leads us to define a relation images by the equivalence

[5.4] images

This relation is less refined than the previous one since, if T images S and if {e1, …, ep} denotes the canonical basis of images, then we have, for all θ ∈ Θ,

images

Note that if T is of square-integrable norm, then images is the trace of the matrix CT−g(θ).

5.2. Sufficiency and completeness

5.2.1. Sufficiency

In estimation theory, the Rao–Blackwell theorem takes the following form:

THEOREM 5.1.Let g be a function of the parameter, with values in images, and let T be an estimator of g(θ) that is integrable for all θ. If U is a sufficient statistic and if images denotes the σ-algebra generated by U, then we have:

images

PROOF.– The Rao–Blackwell theorem states that images is preferable to 〈T, y〉 for the estimation of 〈g(θ), y〉. Since images, we deduce that images and consequently images.

APPLICATION: Symmetrization of an estimator.– Given a statistical model of the form images where P0 is a family of probabilities on images, let us consider the statistic S defined by:

S(x1, …, xn) = (x(1), x(2), …, x(n))

where x(1) denotes the smallest xi, x(2) denotes the smallest remaining xi, …, and x(n) denotes the largest xi.

This statistic is called the order statistic. The σ-algebra that it generates is the σ-algebra of symmetric Borel sets (i.e. the σ-algebra of Borel sets that are invariant under permutation of the coordinates). The images-measurable random variables are the symmetric random variables.

We then have the following theorem.

THEOREM 5.2.If the statistical model is of the form images, then the order statistic is regularly sufficient.

PROOF.– Let images and images, then

images

and, since B is symmetric, we have the equality:

images

where 𝔖 denotes the set of permutations of the first n natural numbers.

The function images being images-measurable, it constitutes a regular version of images; this version is independent of P.

If T is an integrable estimator of g(P) with values in images, then Theorems 5.1 and 5.2 show that images is preferable to T, where images is the symmetrization of T, i.e.

images

since the distribution determined by U(x1, …, xn) is images.

CONCLUSION.– If the observations come from a sample, we may restrict ourselves to the study of estimators that are symmetric with respect to the data.
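To make the symmetrization above concrete, here is a minimal numerical sketch in Python (numpy assumed available; the weighted estimator T below is a hypothetical example, not taken from the text). It averages an asymmetric estimator of the mean over all permutations of the data; the result is a symmetric estimator, here the sample mean, which is preferable by Theorems 5.1 and 5.2.

import itertools
import numpy as np

def symmetrize(T, x):
    # Average the estimator T over all n! permutations of the observations.
    return np.mean([T(np.array(p)) for p in itertools.permutations(x)])

def T(x):
    # Hypothetical asymmetric estimator of the mean: weights favor the first observations.
    w = np.arange(len(x), 0, -1, dtype=float)
    return np.dot(w, x) / w.sum()

x = np.array([1.2, 0.7, 2.5, 1.9])
print(T(x))               # depends on the order of the data
print(symmetrize(T, x))   # symmetric; here it coincides with the sample mean
print(x.mean())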

5.2.2. Complete statistics

DEFINITION 5.2.– Let images be a statistical model and let images be a sub-σ-algebra of images. It is said to be complete if, for every real, integrable, images-measurable statistic U:

EP(U) = 0 for all P ∈ P ⇒ P(U = 0) = 1 for all P ∈ P

A statistic S is said to be complete if the sub-σ-algebra that it generates is complete.

Placing ourselves under the hypotheses of Theorem 5.1, and denoting by images the set of estimators T′ of g(P) such that EP(T′) = EP(T) for all P ∈ P, we then have:

THEOREM 5.3. Lehmann–Scheffé theorem.– If U is a regularly sufficient and complete statistic, images is optimal in images for images and images.

PROOF.– We first note that images (a property of the conditional expectation). However,

images

with

images

Since U is complete, we deduce that P[φ(U) = 0] = 1, ∀P ∈ P. images is therefore unique, up to an equivalence, and it is preferable to T′ for every T′ ∈ images: it is optimal in images.

EXAMPLE 5.2.–

1) Given the exponential model (Example 4.3, section 4.3) of density A(θ) · exp images with respect to the Lebesgue measure λp on images, where θ describes Θ, an open convex set in images, T is sufficient and complete. Indeed,

images

The injectivity of the Laplace transform1 then implies that the signed measure φ·λT−1 is zero, which means that φ = 0 λT−1 almost everywhere, i.e. φ(T) = 0 λ almost everywhere, from which Pθ[φ(T) = 0] = 1.

2) The order statistic U is sufficient and complete in P, and in images where images denotes the set of discrete distributions of images as well as in images, where images denotes the set of distributions of images which have a density of the form images with the images and where the Ij are pairwise-disjoint bounded intervals.

Let us show this result for images (and therefore for P as well); for images, we may consult [FRA 57, pp. 27–30], but the proof is similar – it also uses the following lemma:

LEMMA 5.1.Let images(p1, …, pn) be a homogeneous polynomial such that images(p1, …, pn) = 0 for (p1, …, pn) ∈ [0, 1]n andpi = 1. Q is then identically zero.

PROOF.– Q being homogeneous, we may replace the conditions on the pi with the conditions p1 ≥ 0, …, pn ≥ 0.

If we then write Q as a polynomial in pn, the condition Q(p1, …, pn) = 0 for p1 ≥ 0, …, pn ≥ 0 leads to the fact that the coefficients of the powers of pn are zero. Since these coefficients are homogeneous polynomials in n − 1 variables, we deduce the lemma by induction.

It now suffices to consider images, and to write that for a symmetric function g:

images

and deduce that images, as the integral is a homogeneous polynomial of degree n that vanishes identically (Lemma 5.1) and the coefficient of p1 ⋯ pn is images, since g is symmetric.

5.3. The maximum-likelihood method

5.3.1. Definition

Point estimation methods vary considerably, and depend greatly on the problem under consideration. We first refer to the Bayesian methods which we developed in section 2.3. One quite general method is that of the maximum likelihood.

DEFINITION 5.3.– Let images be a dominated statistical model, and let T be an estimator of θ. We say that T is a maximum-likelihood estimator if:

f(x, T(x)) = supθ∈Θ f(x, θ) for all x

REMARK 5.1.–

1) If Θ is an open set of images and if f(x, ·) is differentiable for all x, then a maximum-likelihood estimator is a solution to the system of likelihood equations:

[5.5] images

Of course, a solution to [5.5] is not necessarily a maximum-likelihood estimator.

2) The random function f(X, ·) is called the likelihood associated with the considered model.

EXAMPLE 5.3.–

1) X = (X1, …, Xn) a sample of images. We set

images

The likelihood equation:

images

has the solution

images

We easily verify that

images

(using the fact that, for u > 0, u – 1 – log u ≥ 0).

2) X = (X1, …, Xn) a sample with uniform distribution on images. Then:

images

X(1) and X(n) – 1 are both maximum-likelihood estimators for θ: the maximum-likelihood estimator is not unique.

3) Θ = the set of compact convex sets of images with positive Lebesgue measure.

Pθ = a uniform distribution on θ, θ ∈ Θ.

X = (X1, …, Xn) a sample taken from Pθ, n ≥ 3.

A maximum-likelihood estimator of θ is the convex envelope of (X1, …, Xn). Indeed,

f(X, θ) = [λ(θ)]−n ∏i 1θ(Xi)

A maximum-likelihood estimator is sought among the θ such that f(X, θ) > 0, therefore among the θ that contain the convex envelope of the sample. Among these, it is necessary to seek those which maximize f(X, θ) and therefore minimize λ(θ): the result is the convex envelope.
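As an illustration of point 3, the following sketch in Python (scipy assumed available; the uniform sampling on the unit square is a hypothetical choice of the true θ) computes the maximum-likelihood estimate as the convex envelope of the sample.

import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=(50, 2))       # n = 50 points drawn uniformly on the unit square

hull = ConvexHull(sample)                          # convex envelope of (X1, ..., Xn)
print("number of extreme points:", len(hull.vertices))
print("area of the estimated set:", hull.volume)   # in dimension 2, 'volume' is the area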

5.3.2. Maximum likelihood and sufficiency

PROPOSITION 5.1.– If T is a sufficient statistic for θ, every maximum-likelihood estimator is, up to an equivalence, a function of T.

PROOF.–

f(x, θ) = g[T(x), θ] h(x) (factorization theorem).

– The set of x such that h(x) = 0 is of zero measure for all Pθ.

Now, if h(x) > 0, a maximizer images of θ ↦ f(x, θ) is also a maximizer of θ ↦ g[T(x), θ], and is therefore a function of T(x).

COMMENT 5.4.– A maximum-likelihood estimator is not always a sufficient statistic.

EXAMPLE 5.4.X = (X1, …, Xn) a sample of a uniform distribution on [θ, 2θ], 0 < θ < +∞. (X(1), …, X(n)) is sufficient and minimal, and the maximum-likelihood estimator of X(n)/2 is not therefore sufficient.

5.3.3. Calculating maximum-likelihood estimators

It is often difficult to explicitly solve the likelihood equation, even in the regular cases where the solution is unique.

We first indicate an important case where the maximum-likelihood estimator is unique:

PROPOSITION 5.2.– If X = (X1, …, Xn) is a sample of size n with density:

images

θ = (θ1, …, θk) ∈ Θ, an open set in images (exponential model), and if the matrix [∂2ϕ/∂θi∂θj] is positive-definite, ∀θ ∈ Θ, then the maximum-likelihood estimator images of θ is the unique solution to:

[5.6] images

COMMENT 5.5.– We will later prove that ϕ is infinitely differentiable.

PROOF.– images maximizes images. Differentiating, we deduce that images is the solution to [5.6], since the fact that [∂2ϕ/∂θi∂θj] is positive-definite implies the existence of a unique solution to [5.6] which, furthermore, is a maximum of the density of X.

Let us now give an example where we do not have an explicit solution: if X1, …, Xn is a sample of a Cauchy distribution with density 1/{π[1 + (x − θ)2]}, images, then the likelihood equation is written ∑i 2(Xi − θ)/[1 + (Xi − θ)2] = 0 and it may have multiple solutions. We may then use numerical methods:

5.3.3.1. The Newton–Raphson method

We write:

images

with θ1 given.

For ν = 0, we find, as an approximation of images:

images

In general, we write:

images

Under regularity conditions, and for large enough k, θk is a good approximation of images.
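As a sketch of the method on the Cauchy example (Python, numpy assumed; the sample and the starting point are hypothetical), each iteration replaces θk by θk minus the ratio of the first to the second derivative of the log-likelihood.

import numpy as np

def score(theta, x):
    # First derivative of the log-likelihood of a Cauchy(theta, 1) sample.
    d = x - theta
    return np.sum(2 * d / (1 + d**2))

def score_derivative(theta, x):
    # Second derivative of the log-likelihood.
    d = x - theta
    return np.sum(2 * (d**2 - 1) / (1 + d**2)**2)

def newton_raphson(x, theta0, n_iter=20):
    theta = theta0
    for _ in range(n_iter):
        theta -= score(theta, x) / score_derivative(theta, x)
    return theta

rng = np.random.default_rng(1)
x = rng.standard_cauchy(500) + 3.0                  # hypothetical true location: 3
print(newton_raphson(x, theta0=np.median(x)))       # start from the empirical median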

REMARK 5.2.–

1) In the case of the Cauchy distribution considered above, it is often interesting to take the empirical median images as an estimator of θ:

images

The determination of this estimator does not require any calculation, but images is slightly worse than images.

2) We will see later that the maximum-likelihood estimator often has good asymptotic properties.

5.4. Optimal unbiased estimators

As we saw in Chapter 2, it is necessary to restrict the set of decision rules in the hope of obtaining an optimal rule for the envisaged problem.

In estimation theory, we often restrict ourselves to the search for unbiased, minimum-dispersion estimators.

We will begin by studying some existence conditions of unbiased estimators in a more detailed manner than is generally done: we will see that the existence conditions of such an estimator are quite restrictive.

5.4.1. Unbiased estimation

DEFINITION 5.4.– Let images be a statistical model and let g be a map from Θ to images. An estimator T of g(θ) is said to be unbiased if for all θ ∈ Θ, T is Pθ-integrable and

Eθ(T) = g(θ)

EXAMPLE 5.5.–

1) In the case of a sample on images, the empirical mean X̄ is an unbiased estimator of the mean, if it exists.

2) In the case of a sample on images and for g(θ) = Fθ(t), where Fθ is the distribution function of Pθ: the empirical distribution function:

Fn(t) = (1/n) ∑i 1{Xi ≤ t}

is an unbiased estimator of Fθ(t).
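The unbiasedness of the empirical distribution function follows from Eθ[1{Xi ≤ t}] = Fθ(t); the following Monte Carlo sketch (Python, numpy and scipy assumed; the exponential distribution is a hypothetical choice of Pθ) illustrates it numerically.

import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(2)
t, n, n_rep = 1.0, 20, 100_000

samples = rng.exponential(1.0, size=(n_rep, n))
F_n_t = (samples <= t).mean(axis=1)        # empirical distribution function at t, one value per replication
print(F_n_t.mean())                        # Monte Carlo estimate of E[F_n(t)]
print(expon.cdf(t))                        # F_theta(t); the two values should be close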

In the usual cases, the definition of an unbiased estimator is in agreement with the general definition of an unbiased decision function given in Chapter 2. More precisely, we have the following result:

PROPOSITION 5.3.– Let T be an estimator of g(θ) such that:

1) images,

2) images;

then T is unbiased if and only if:

3) images.

PROOF.– For all (θ, θ′) ∈ Θ2, we may write:

images

Writing the previous relation for θ′ = θ and taking the difference, we obtain:

images

Passing to expectations, we obtain:

images

Result (3) is therefore equivalent to:

4) images.

Then, if T is unbiased, we have (4) and therefore (3). Conversely, if (3) is satisfied, then we have (4) and taking θ′ such that g(θ′) = Eθ(T) in (4), we deduce that T is unbiased.

5.4.1.1. Existence of an unbiased estimator

Here is a general result relating to the existence of an unbiased estimator obtained by Bickel and Lehmann [BIC 69].

We first give some definitions:

DEFINITION 5.5.– Given a statistical model of the form images, an estimator of g(P) is said to be an estimator of order n.

Assuming that g takes values in images, we say that g is estimable without bias if there exist a natural number n and an unbiased estimator of order n for g(P).

If g is estimable without bias, the degree of g is the smallest natural number n such that g(P) has an unbiased estimator of order n.

EXAMPLE 5.6.– In images, the variance is estimable without bias, with degree 2.
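An order-2 unbiased estimator of the variance is, for example, T(X1, X2) = (X1 − X2)2/2, since E[(X1 − X2)2] = 2 Var(X1) for independent, identically distributed X1 and X2; the degree being 2, no unbiased estimator of order 1 exists. A minimal Monte Carlo check (Python, numpy assumed; the Gaussian sampling distribution is a hypothetical choice):

import numpy as np

rng = np.random.default_rng(3)
sigma2 = 2.5                                         # hypothetical true variance
x = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, 2))

T = (x[:, 0] - x[:, 1])**2 / 2                       # order-2 estimator (X1 - X2)^2 / 2
print(T.mean())                                      # Monte Carlo mean, close to sigma2
print(sigma2)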

THEOREM 5.4. Bickel–Lehmann theorem.– If P0 is convex, and if g is estimable without bias, then the following conditions are equivalent:

1) the degree of g is n;

2) – g[αP + (1 – α)Q] is a polynomial in α of degree ≤ n, for all (P, Q) ∈ P0 × P0;

– ∀P ∈ P0, ∃Q ∈ P0 such that g[αP + (1 − α)Q] is exactly of degree n.

The conditions of this theorem may seem surprising, but they are clarified by the following remark: if Tn is an unbiased estimator of order n for g, we have:

images

and the first term in this equality is a polynomial in α of degree ≤ n.

For the proof, we refer to [BOS 87b].

The Bickel–Lehmann theorem shows above all the rarity of parameters allowing an unbiased estimator. It is clear, for example, that the standard deviation is not estimable without bias: α ↦ σ[αP + (1 − α)Q] is in general not a polynomial in α.

COMMENT 5.6.– We may show that condition (2) of Theorem 5.4 does not ensure the existence of an unbiased estimator of g (even if n = 1).

5.4.2. Unbiased minimum-dispersion estimator

THEOREM 5.5.Let T be an estimator of g(θ) where g takes values in images. We suppose that T is unbiased and is such that images for all θ. The two following conditions are then equivalent:

1) T is unbiased and of minimum dispersion (i.e. T is optimal for images in the family of unbiased estimators with square-integrable norm).

2) For every real-valued statistic U, which is centered and square integrable for all θ, we have:

Eθ[U〈T, y〉] = 0, ∀θ ∈ Θ, ∀y

PROOF.–

– 1) ⇒ 2)

The result being evident for y = 0, we may suppose that y ≠ 0. Then T + αU y is an unbiased estimator for g(θ), of square-integrable norm images. Since T is optimal, we have:

images

from which

images

Letting α tend to 0+, we note that Eθ [UT , y〉] ≥ 0. We therefore have Eθ [UT , y〉] = 0, otherwise we may make γ < 0 for a well-chosen α < 0.

– 2) ⇒ 1)

Let S be unbiased such that images; we set images. Uy satisfies the hypotheses of (2), therefore

images

from which

images

that is

images

COMMENT 5.7.– Let us set T = (T1, …, Tp). It is clear that in condition (2), we may replace:

images

with

images

5.4.2.1. Application to an exponential model

Let Pθ = exp [〈θ, T〉 − ϕ(θ)] · μ, θ ∈ Θ, be an exponential model. We suppose that Θ is an open set in images and that μ is a σ-finite measure on images.

Let us first of all state the following lemma.

LEMMA 5.2.ϕ is infinitely differentiable.

PROOF.– To simplify the notation, we will only conduct the proof for p = 1.

Let θ0 ∈ Θ and ε > 0 be such that [θ0 − 2ε, θ0 + 2ε] ⊂ Θ.

For integer k ≥ 1, we are given a constant ck such that:

images

Then, for θ ∈ ]θ0 − ε, θ0 + ε[,

[5.7] images

This last function is μ-integrable since:

[5.8] images

Inequality [5.7] and the dominated convergence theorem show that we may differentiate [5.8] k times under the integral sign.

We then have the following theorem:

THEOREM 5.6.– T is an optimal unbiased estimator for grad ϕ(θ) and its covariance matrix is (∂2ϕ/∂θi∂θj).

PROOF.–

1) From Lemma 5.2, we may differentiate the equality:

images

from which, setting T(x) = (T1(x), …, Tk(x)),

[5.9] images

and

[5.10] images

i) images
ii) images

2) Let U be such that EθU = 0, for all θ, i.e.

images

In images, we have (the proof is analogous in images):

images

and the dominating function is fixed and μ-integrable. We may therefore differentiate under the integral sign:

images

The second integral vanishes, and finally Eθ(TU) = 0; T is optimal.

5.4.2.2. Application to the Gaussian model

PROPOSITION 5.4.– Let (X1, …, Xn) be a sample of size n (≥ 2) of images. Then (X̄, [n/(n − 1)]S2) is unbiased and of minimum dispersion for (m, σ2).

PROOF.– We use Theorem 5.5.

Let U be such that Eθ(U) = 0, then:

[5.11] images

We then differentiate with respect to m:

images

Taking [5.11] into account, we have:

[5.12] images

that is

images

We differentiate again:

[5.13] images

Taking account of [5.11] and [5.12], this implies:

images

We now differentiate [5.11] with respect to σ2:

images

that is

images

Yet images, therefore

images

REMARK 5.3.–

1) We have proved both of the following additional results: X̄ is optimal and unbiased for m, and [n/(n − 1)]S2 is optimal and unbiased for σ2.

2) Theorem 5.5 does not directly apply to θ = (m,σ2) (see equation [4.1] in section 4.3).

5.4.2.3. Use of the Lehmann–Scheffé theorem

If T is unbiased and S is a complete and sufficient statistic, Theorem 5.3 affirms that images is unbiased and of minimum dispersion.

EXAMPLE 5.7.– For the model images where P0 contains the discrete distributions, the empirical distribution function Fn(x) is optimal and unbiased for the distribution function at the point x, since this estimator is symmetric and the order statistic is then sufficient and complete.

5.4.3. Criticism of unbiased estimators

– Unbiased estimators have, in general, the drawback of not being admissible.

EXAMPLE 5.8.–

   1) Let (X1, …, Xn) be a sample of images is then preferable to images.

   2) In the model images is preferable to images.

– Some unbiased estimators are not strict (an estimator T of g(θ) is said to be strict if Pθ(T ∈ g(Θ)) = 1, ∀θ ∈ Θ).

EXAMPLE 5.9.– Given the model [θδ(1) + (1 − θ)δ(−1)]n, θ ∈]1/2, 1[, X̄ is the best unbiased estimator of 2θ − 1, but it is not strict. sup(X̄, 0) is strict and preferable to X̄.

5.5. Efficiency of an estimator

5.5.1. The Fréchet-Darmois-Cramer-Rao inequality

We consider the model images where Θ is an open set in images and we seek to estimate g(θ), where g is differentiable. We suppose that dPθ = Lθ · μ (with Lθ > 0 on a fixed open set in images) and we denote an unbiased estimator of g by T. Then

images

Supposing that it is possible to differentiate under the integral sign, we obtain:

images

and

images

from which

images

that is

images

From Schwarz’s inequality, we have:

images

from which, finally

[5.14] Varθ(T) ≥ [g′(θ)]2/I(θ), where I(θ) = Eθ{[(∂/∂θ) log Lθ]2},

supposing that I(θ) is non-zero and finite for all θ. I(θ) is called the Fisher information. We also say that [5.14] is the information inequality.

5.5.1.1. Calculating I(θ)

1) Since

images

we have:

images

that is

[5.15] images

2) If we can differentiate a second time under the integral sign, we have:

[5.16] I(θ) = −Eθ[(∂2/∂θ2) log Lθ]

Indeed, from

images

we obtain, by differentiating,

images

hence the result, since

images

3) Case of independent variables with the same distribution:

Since log Lθ(X) = ∑i log f(Xi, θ), then

images

and

[5.17] I(θ) = n Eθ{[(∂/∂θ) log f(X1, θ)]2}

since Eθ[(∂/∂θ) log f(Xi, θ) · (∂/∂θ) log f(Xj, θ)] is zero for i ≠ j, as the covariance of two independent random variables.

Under the conditions of (2), we also have:

I(θ) = −n Eθ[(∂2/∂θ2) log f(X1, θ)]
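These two expressions for I(θ) can be checked numerically. Below is a minimal Monte Carlo sketch in Python (numpy assumed); the Poisson model with parameter λ is a hypothetical choice, for which (∂/∂λ) log f(x, λ) = x/λ − 1 and the information of one observation is 1/λ; for a sample of size n, the information is n/λ.

import numpy as np

rng = np.random.default_rng(4)
lam, n_rep = 3.0, 500_000

x = rng.poisson(lam, size=n_rep)
score = x / lam - 1.0                  # (d/d lambda) log f(x, lambda) for the Poisson model
print((score**2).mean())               # E[score^2], Monte Carlo estimate of the information of one observation
print((x / lam**2).mean())             # -E[d^2/d lambda^2 log f], same quantity as in [5.16]
print(1.0 / lam)                       # analytic value; for an n-sample the information is n / lambda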

5.5.1.2. Properties of the Fisher information

PROPERTY (α).– Let images, i = 1, …, k, be statistical models conforming to the hypotheses of regularity seen previously. We consider the product model images. Then, in clear notation,

I(θ) = I1(θ) + ⋯ + Ik(θ)

(whose proof is analogous to that of equation [5.17]).

PROPERTY (β).–

THEOREM 5.7.– Let S be a statistic with values in images such that the induced model images is dominated by λm and such that:

[5.18] images

where Λ(s, θ) denotes the density of (PθS−1). Then:

[5.19] images

and the equality holds if S is sufficient.

PROOF.– Indeed,

images

Now, if S is sufficient,

images

from which

images

and from [5.19],

images

We deduce that Iθ, S = Iθ.

COMMENT 5.8.– Condition [5.18] is satisfied in the usual cases, as

images

may be written as:

images

If we can differentiate under the integral sign, we have:

images

that is

images

therefore [5.18] is satisfied.

CONCLUSION.– (α) and (β) are natural properties that we may expect from a quantity of information: I(θ) satisfies them under quite general regularity conditions.

5.5.1.3. The case of a biased estimator

Let T be an integrable estimator of g(θ). We set:

B(θ) = Eθ(T) − g(θ)

T is then an unbiased estimator of g(θ) + B(θ). If B is differentiable and if the regularity conditions of the Cramer–Rao inequality are satisfied, then:

Varθ(T) ≥ [g′(θ) + B′(θ)]2/I(θ)

5.5.2. Efficiency

DEFINITION 5.6.– An unbiased estimator T is said to be efficient if:

[5.20] Varθ(T) = [g′(θ)]2/I(θ), ∀θ ∈ Θ

Existence condition: [5.20] holds if and only if:

images

therefore if and only if

images

Integrating, we find:

images

i.e. the model is exponential.

COMMENT 5.9.– In the particular case where A(θ) = θ, we saw that T is the best unbiased estimator of g(θ) = ϕ′(θ) (Theorem 5.5). The property obtained here is more precise: an unbiased estimator may be optimal without being efficient, as we will see later.

5.5.3. Extension to images

THEOREM 5.8.Let images be a statistical model where Θ is an open set in images. Let g be a differentiable map from Θ into images. We make the following assumptions:

1) Pθ = L(·, θ) · μ where L(x, θ) > 0 μ almost everywhere.

2) Lθ is differentiable and

images

3) images, and the covariance matrix Iθ of Uθ = grad log L(X, θ) (called the Fisher information matrix) is invertible.

4) T is an unbiased estimator of g(θ) such that the equality:

images

may be differentiated under the integral sign.

Then, if Dθ (T) is the covariance matrix of T,

Dθ(T) ≥ ΔθIθ−1Δθ′ (Cramer–Rao inequality)2

where Δθ = ((∂/∂θj)gi(θ)).

PROOF.– Let us set

images

then, for images,

images

But

images

and

images

and yet, EθUθ = 0, therefore images and

images

So

images

Bringing together the obtained results, we find:

images

COROLLARY 5.1.– Under the previous hypotheses and with g(θ) = θ, we find:

images

PROOF.– Indeed, the Cramer–Rao inequality implies here that:

images, 3

from which we find the result by using an orthonormal basis of images.

5.5.3.1. Properties of the information matrix

We have properties analogous to the case of images under regularity hypotheses. The details are left to the reader. We therefore have:

1) images

2) Iθ = I1 (θ) + … + Ih(θ) (matrix sum);

3) Iθ,S ≤ Iθ (i.e. Iθ − Iθ,S semi-positive-definite), and the equality holds for sufficient S.

5.5.3.2. Efficiency

The efficiency is defined as in images: an estimator is efficient if:

Dθ(T) = ΔθIθ−1Δθ′, ∀θ ∈ Θ

As in images, we have:

PROPOSITION 5.5.– Under the hypotheses of the previous theorem, with the additional hypotheses that k = p and that Δθ is invertible, a necessary condition for the existence of an efficient unbiased estimator is that the family (Pθ) be exponential.

PROOF.– Let T be an efficient unbiased estimator. Then Dθ(Z) = 0, therefore Z = 0 Pθ almost everywhere, therefore also μ almost everywhere. Consequently,

images

that is

images

from which, by integrating, we find the result.

Conversely: if Pθ = exp[〈θ, T(x)〉 − ϕ(θ)] · μ, θ ∈ Θ, where Θ is an open set of images, then T is an efficient unbiased estimator of grad ϕ(θ).

PROOF.– Theorem 5.5 implies that T is unbiased and that:

images

but

images

therefore

images

Furthermore, from log L(X, θ) = 〈θ, T〉 − ϕ(θ), we obtain:

images

from which

images

then

images

and finally

images

COMMENT 5.10.– (X̄, [n/(n − 1)]S2) is optimal but not efficient for the estimation of (m, σ2) in the Gaussian case.

Indeed,

images

while

images

In particular, [n/(n − 1)]S2 is not efficient for the estimation of σ2. The details of the calculation are left to the reader.
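A quick numerical check of this comment (Python, numpy assumed; the values of m, σ2 and n are hypothetical): the variance of the optimal unbiased estimator [n/(n − 1)]S2 is 2σ4/(n − 1), strictly larger than the Cramer–Rao bound 2σ4/n, so the bound is not attained.

import numpy as np

rng = np.random.default_rng(5)
m, sigma2, n, n_rep = 0.0, 4.0, 10, 200_000

x = rng.normal(m, np.sqrt(sigma2), size=(n_rep, n))
s2_unbiased = x.var(axis=1, ddof=1)          # [n/(n - 1)] S^2, the optimal unbiased estimator of sigma^2

print(s2_unbiased.var())                     # Monte Carlo variance of the estimator
print(2 * sigma2**2 / (n - 1))               # exact variance: 2 sigma^4 / (n - 1)
print(2 * sigma2**2 / n)                     # Cramer-Rao bound: 2 sigma^4 / n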

5.5.4. The non-regular case

5.5.4.1. “Superefficient” estimators

In the case of the estimation of a real parameter from a sample, the Cramer–Rao inequality shows that the variance of an unbiased estimator is at best of order 1/n as n tends to infinity. However, we may sometimes obtain variances of a smaller order when the validity conditions of the Cramer–Rao inequality are not met. An estimator with this property is said to be “superefficient”.

EXAMPLE 5.10.– Let X1, …, Xn be a sample of a uniform distribution on [0, θ], θ > 0. T = [(n + 1)/n]X(n) is a superefficient unbiased estimator of θ. Indeed, VarθT = θ2/[n(n + 2)].
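A minimal simulation of this example (Python, numpy assumed; θ, n and the number of replications are hypothetical) shows that T is unbiased and that its variance is of order 1/n2 instead of the usual 1/n.

import numpy as np

rng = np.random.default_rng(6)
theta, n, n_rep = 2.0, 50, 100_000

x = rng.uniform(0.0, theta, size=(n_rep, n))
T = (n + 1) / n * x.max(axis=1)              # unbiased estimator based on the largest observation

print(T.mean())                              # close to theta (unbiasedness)
print(T.var())                               # Monte Carlo variance
print(theta**2 / (n * (n + 2)))              # exact value theta^2 / [n(n + 2)]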

5.5.4.2. Cramer–Rao-type inequalities

The differentiability hypotheses of Theorem 5.8 often being inconvenient in practice, numerous authors have established Cramer–Rao-type inequalities which do not require these conditions. Their results are often of the following form:

PROPOSITION 5.6.– images being a statistical model and g being a real-valued function of the parameter, we denote by Uθ,θ′ a real random variable, indexed by G ⊂ Θ × Θ and such that:

images

(C3) If S and T are two square-integrable estimators of g(θ) such that m(θ) = Eθ(S) = Eθ(T), θ ∈ Θ, then Eθ[Uθ,θ′ · T] = Eθ[Uθ,θ′ · S], (θ, θ′) ∈ G. In other words, Covθ(Uθ,θ′, T) = φm(θ, θ′).

Under these conditions,

images

where Gθ = {θ′: (θ, θ′) ∈G}.

PROOF.– Schwarz’s inequality is written as:

images

which allows us to conclude the proof in a straightforward way, taking account of the hypotheses made on Uθ, θ′.

EXAMPLE 5.11.– If Pθ = fθ · λ, we may take Uθ,θ′ = (fθ′/fθ) − 1, on the condition that we suppose θ ↦ Pθ to be injective, and that Eθ[(fθ′/fθ)2] < +∞ for (θ, θ′) ∈ G.

We thus obtain the Chapman–Robbins inequality [CHA 51].

5.6. The linear regression model

5.6.1. Generalities

Given:

x, an n × p matrix with known elements.

β, an unknown p-dimensional parameter.

Y, an observed random variable with values in images.

E, an unobserved centered random variable with values in images, and with covariance matrix CE = σ2 · In where σ2 ∈]0, ∞[ is unknown and In denotes the identity matrix of images.

The problem: estimate β knowing

[5.21] Y = xβ + E

INTERPRETATION 5.1.– Y is a linear function of the observed variable x. It is also influenced by unobserved variables, which give rise to the perturbation E.

GENERALIZATION.– If the covariance matrix of E is written as σ2V, where V is a known positive-definite matrix, then there is a regular n × n matrix D such that:

DVD′ = In

Let us set

Y* = DY, x* = Dx, E* = DE

Thus

[5.22] Y* = x*β + E*

with

CE* = DCE D′ = σ2DVD′ = σ2In

and we are returned to the previous model.

The model defined by [5.21] is called the linear regression model.

EXAMPLE 5.12.– Let us consider the testing, on n plots of a field, of different amounts of manure so as to find the optimal amount.

If we try the amount zi on the ith plot, we may represent the measurement of the yield in the form Yi = β0 + β1zi + β2zi2 + Ei.

This is relation [5.21] with

x the n × 3 matrix whose ith row is (1, zi, zi2), β = (β0, β1, β2)′, Y = (Y1, …, Yn)′ and E = (E1, …, En)′.

5.6.2. Estimation of the parameter – the Gauss–Markov theorem

Supposing that x is of rank p, the column vectors υ1, …, υp of x generate Ep, a p-dimensional vector subspace of images. As β describes the parameter space, η = xβ describes Ep.

Now, to estimate β, we denote the projection of Y onto Ep by Ŷ and we set:

xβ̂ = Ŷ

To determine β̂ it suffices to write:

x′(Y − xβ̂) = 0

hence

x′xβ̂ = x′Y

and since x′x is regular,

β̂ = (x′x)−1x′Y

which also shows the uniqueness of β̂.

β̂ is called the Gauss–Markov estimator of β.

COMMENT 5.11.– If n = p, we find:

β̂ = x−1Y

otherwise the part (x′x)−1 cannot be factorized.

THEOREM 5.9. Gauss–Markov theorem.– Let Y = xβ + E be a full-rank linear model (i.e. x is of rank p). The Gauss–Markov estimator is then the best linear unbiased estimator (BLUE) for β.

PROOF.–

1) β̂ is linear (in Y) and unbiased, as

E(β̂) = (x′x)−1x′E(Y) = (x′x)−1x′xβ = β

2) Let aY be an unbiased linear estimator of β, where a is a p × n matrix. We have:

E(aY) = axβ = β

This property being true for all β, we have ax = Ip.

Also, the covariance matrix of aY is written:

D(aY) = σ2aa′

We set:

s = x′x and b = a − s−1x′

hence

bb′ = aa′ − axs−1 − s−1x′a′ + s−1x′xs−1 = aa′ − s−1

since s−1 is symmetric and ax = Ip.

Finally,

D(aY) = σ2aa′ = σ2s−1 + σ2bb′ = D(β̂) + σ2bb′, since D(β̂) = σ2(x′x)−1 = σ2s−1

therefore

D(aY) ≥ D(β̂)

and D(aY) − D(β̂) = σ2bb′ is semi-positive-definite.

ADDITIONALLY.–

1) σ̂2 = ‖Y − xβ̂‖2/(n − p) is unbiased for σ2 (n > p); see the numerical sketch below.

2) If the rank of x is less than p, each images has a unique BLUE images where images is any solution of images.

3) If E is Gaussian, so too are Y and β̂; β̂ and σ̂2 are then optimal in the class of all the unbiased estimators of β and σ2.
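The following sketch (Python, numpy assumed; the design matrix, the true β and σ2 are hypothetical) computes the Gauss–Markov estimator by solving the normal equations x′xβ̂ = x′Y, together with the unbiased estimator σ̂2 of remark 1 above.

import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 3
x = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 10, n)])
beta = np.array([1.0, 2.0, -0.5])                 # hypothetical true parameter
Y = x @ beta + rng.normal(0.0, 1.0, n)            # linear model Y = x beta + E, sigma^2 = 1

beta_hat = np.linalg.solve(x.T @ x, x.T @ Y)      # Gauss-Markov estimator (x'x)^{-1} x'Y
residuals = Y - x @ beta_hat
sigma2_hat = residuals @ residuals / (n - p)      # unbiased estimator of sigma^2

print(beta_hat)
print(sigma2_hat)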

5.7. Exercises

EXERCISE 5.1.– Let X be a random variable that follows a uniform distribution on (0, θ), θ ∈]0, 1[.

1) What is the maximum-likelihood estimator of θ?

2) Determine an unbiased estimator of θ.

3) What is the Bayesian estimator of θ associated with the a priori distribution dτ = 2θ1]0, 1[(θ)dθ?

4) Compare the quadratic errors of these three estimators for values of θ.

EXERCISE 5.2.– Let X be a real random variable with a uniform distribution on [θ, 2θ], 1/2 < θ < 2. Construct an unbiased estimator of log θ whose variance vanishes when θ = 1.

EXERCISE 5.3.– Let T be an unbiased Bayesian estimator of images.

1) Show that the Bayesian risk of T associated with the quadratic loss function is zero.

2) Deduce that images is not a Bayesian estimator of θ in the model images.

EXERCISE 5.4.– Look for regularity conditions under which

images

EXERCISE 5.5.– Let X1, …, Xn be a sample of a Gamma distribution Γ(1/θ, 1), θ > 0.

1) What is the Cramer–Rao limit for the variance of an unbiased estimator of exp(1/θ)?

2) What is the optimal unbiased estimator of exp(1/θ)? Is it efficient?

EXERCISE 5.6.– Let X1, …, Xn be a sample taken from the distribution images. We want to estimate m2.

1) Use the relation images to construct an unbiased estimator of m2 based on the second-order empirical moment. Calculate the variance of this estimator.

2) We consider the estimator:

images

Show that it is unbiased and determine its variance. Compare this with the previous estimator. Could we have predicted this result?

EXERCISE 5.7.– Consider the Euclidean space images equipped with the usual scalar product: images. Let F be a vector subspace of images, and let images be the orthogonal complement of F. Write uF or images for the orthogonal projection operator of images onto F or images, respectively. A is the matrix of uF in the canonical basis, and I is the identity matrix.

1) Show that At = A and A2 = A. To which linear mapping does the matrix IA correspond?

2) Show that uF and images are simultaneously diagonalizable in an orthonormal basis of images. Determine the matrices for uF and images on this basis. Show that the transformation matrix P obeys Pt = P−1.

3) Let X = (X1, …, Xn) be a Gaussian vector with distribution images.

   i) Calculate Cov((IA)X, AX). What can we say about the variables (IA)X and AX? This result constitutes Cochran’s theorem.

   ii) What are the distributions of the vectors PtAX and Pt(IA)X?

   iii) We recall that the χ2 (n)-distribution is that of the variable images.

   Show that we have 〈AX, AX〉 = 〈Pt AX, PtAX〉. Deduce the distribution of 〈AX, AX〉 as well as that of 〈(IA)X, (IA)X〉.

   iv) We set images and images. Show that images and S are independent and determine their distributions.

EXERCISE 5.8.– Let Xi be the number of defective light bulbs observed at the end of a production line at time i, for i = 1, …, n. We wish to estimate the probability of having no defective light bulbs (P(X = 0)).

1) For this, we count the number Nn of Xi, i = 1, …, n, equal to 0 and we estimate P(X = 0) by Nn/n.

   i) Show, assuming the Xi are independent and identically distributed, that Nn/n is an unbiased estimator of P(X = 0). Calculate its quadratic loss, and give its limiting distribution. Give a confidence interval for P(X = 0) with a level of 95%.

   ii) Calculate the previous quantities in the case of a Poisson distribution images.

2) Supposing that images, estimate λ by images, and P(X = 0) by images.

   i) Show that images is biased. Calculate its variance and its bias. Determine asymptotic equivalents to the previous quantities.

   ii) Show that we may choose tn such that images is unbiased. Comment on the result.

3) The aim of this part is to compare the estimators obtained in (1) and (2).

   i) In the case where the Xi do not follow a Poisson distribution, study the convergence of images and images toward P(X = 0). Comment on the result.

   ii) In the case where the Xi follow a Poisson distribution, which estimator(s) do you prefer? Explain why this result is intuitive.

EXERCISE 5.9.– Let X be an observed random variable that follows a Poisson distribution with the parameter λ > 0:

images

1) Construct an empirical unbiased estimator T for e−λ.

2) Calculate the variance of this estimator and compare it to the limit of the Cramer–Rao inequality. Could we have predicted this result?

3) Show that T is the only unbiased estimator of e−λ.

EXERCISE 5.10.– Let X1, …, Xn be a sample of size n of a Poisson distribution with parameter θ ∈]0, +∞ [. We seek to estimate various functions of θ.

1) Show that images is a sufficient and complete statistic. Deduce an optimal unbiased estimator for θ.

2) To estimate θ, we choose an a priori distribution Γ(α, β) with density:

images

where α > 0, β > 0, and images.

Determine the Bayesian estimator of θ for this a priori distribution.

3) Compare the quadratic error of the previous estimators for α = β = 1 and θ = 1/2 (true value of the parameter).

4) We now wish to estimate θk where k is an integer >1.

   i) Express θ2, then θk, k > 2, as a function of the moments of X1. Deduce optimal unbiased estimators Uk for the θk.

   ii) Determine the maximum-likelihood estimator Vk of θk.

   iii) Calculate images. Deduce the quadratic error of U2 and V2. Show that U2 is efficient and that V2 is asymptotically efficient.

EXERCISE 5.11.– Let (X1, …, Xn) be an n-sample of the density distribution:

images

1) Determine A as a function of θ.

2) What is the maximum-likelihood estimator? Is it sufficient? Is it unbiased?

3) Calculate the Fisher information. Calculate the variance of the maximum-likelihood estimator. Compare them.

1 For a detailed study of the Laplace transform, we refer to [BAS 78].

2 This inequality means that images is a semi-positive-definite matrix. M′ denotes the transpose of the matrix M.

3 images is the covariance matrix of images as

images
