Chapter 8

Non-Parametric Methods and Robustness

8.1. Generalities

A statistical model (E, ℰ, (Pθ)θ∈Θ) is said to be non-parametric if Θ is “vast”. When Θ is a vector space or a convex set, “vast” generally means “of infinite dimension”; otherwise the distinction between parametric and non-parametric models is not so clear.

EXAMPLE 8.1.–

1) A Gaussian or exponential model is parametric.

2) Let P0 be the set of probabilities on (ℝ, ℬℝ) dominated by the Lebesgue measure λ. The model (ℝ, ℬℝ, P0) is non-parametric (we may set θ = dP/dλ, and Θ is a convex set of infinite dimension in L1(λ)).

3) Let P1 be the set of probabilities on (ℝ, ℬℝ) which have a unique median. The model (ℝ, ℬℝ, P1) is non-parametric.

Non-parametric methods are interesting for three principal reasons:

1) They avoid errors due to the choice of a specific but often erroneous parametric model.

2) They guide the user in the choice of a parametric model.

3) In certain cases, they provide initial estimators for the parameters of a parametric model from which we may construct more precise estimators by successive approximations.

The theory of robustness is the study of decision rules whose efficiency is resistant to small deformations of a statistical model. There are therefore analogies between non-parametric and robust methods.

8.2. Non-parametric estimation

8.2.1. Empirical estimators

The empirical measure μn = (1/n) Σi=1..n δXi, based on the sample (X1,…, Xn), allows us to construct numerous non-parametric estimators which have good asymptotic properties. We refer to Chapter 4 (section 4.1) for details.

8.2.2. Distribution and density estimation

When the model is dominated by the Lebesgue measure (see Example 8.1(2)), the empirical measure μn is not a strict estimator of the distribution, since it has no density. We are therefore led to regularize it by “distributing” the masses 1/n situated at the points Xi; a general method consists of regularizing μn by convolution: we are given a bounded probability density K such that lim|y|→∞ |y K(y)| = 0, and a positive sequence (hn) which tends to 0, and we set:

\tilde{P}_n = \mu_n * K_{h_n}, \qquad K_{h_n}(x) = \frac{1}{h_n} K\!\left(\frac{x}{h_n}\right)

Then:

\tilde{P}_n(B) = \frac{1}{n} \sum_{i=1}^{n} \int_B \frac{1}{h_n} K\!\left(\frac{x - X_i}{h_n}\right) dx, \qquad B \in \mathcal{B}_{\mathbb{R}}

therefore, P̃n has the density:

f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right)

where fn is an estimator of f = dP/dλ whose convergence will be studied.

EXAMPLE 8.2.–

1) If K = (1/2) 1[−1,1], we obtain the natural estimator:

f_n(x) = \frac{F_n(x + h_n) - F_n(x - h_n)}{2 h_n}

where Fn denotes the empirical distribution function.

2) If K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, the obtained estimator is a mixture of Gaussian densities:

f_n(x) = \frac{1}{n h_n \sqrt{2\pi}} \sum_{i=1}^{n} \exp\!\left(-\frac{(x - X_i)^2}{2 h_n^2}\right)
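As a quick numerical illustration of the estimator fn (not part of the text), here is a minimal Python sketch. The Gaussian kernel, the bandwidth hn = n^(−1/5), and the N(0, 1) sample are illustrative choices, not prescribed above.

```python
import math
import random

def parzen_density(sample, x, h):
    """f_n(x) = (1 / (n h)) * sum_i K((x - X_i) / h), with a Gaussian kernel K."""
    kernel = lambda u: math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(5000)]
h = len(sample) ** (-0.2)                      # h_n ~ n^(-1/5), a common bandwidth choice
estimate = parzen_density(sample, 0.0, h)
true_value = 1.0 / math.sqrt(2.0 * math.pi)    # density of N(0, 1) at x = 0
```

With n = 5000 the estimate at 0 is close to the true value 1/√(2π), in line with Theorem 8.1 below.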

8.2.2.1. Convergence of the estimator

The following results are due to Parzen [PAR 62].

LEMMA 8.1.– Let H be a real, bounded, λ-integrable function such that:

\lim_{|y| \to \infty} |y\, H(y)| = 0

We set:

g_n(x) = \int \frac{1}{h_n} H\!\left(\frac{x - y}{h_n}\right) g(y)\, dy

where g is λ-integrable and (hn) → 0+. Then, at every point x where g is continuous:

\lim_{n \to \infty} g_n(x) = g(x) \int H(y)\, dy

PROOF.– Since ∫ H(y)dy = ∫ (1/hn) H(y/hn) dy, we have:

\Delta_n := g_n(x) - g(x) \int H(y)\, dy = \int \frac{1}{h_n} H\!\left(\frac{x - y}{h_n}\right) \left[ g(y) - g(x) \right] dy

Then, for all δ > 0,

|\Delta_n| \le \sup_{|y - x| \le \delta} |g(y) - g(x)| \int |H(u)|\, du \;+\; \frac{1}{\delta} \sup_{|u| \ge \delta/h_n} |u\, H(u)| \int |g(y)|\, dy \;+\; |g(x)| \int_{|u| \ge \delta/h_n} |H(u)|\, du

We deduce that, for all ε > 0, |Δn| < ε for well-chosen δ and for large enough n.

THEOREM 8.1.– If hn → 0 and nhn → +∞, then fn(x) → f(x) in quadratic mean at every point x where f is continuous.

PROOF.– K and K2 verify the conditions on the function H from Lemma 8.1. Consequently:

E f_n(x) = \int \frac{1}{h_n} K\!\left(\frac{x - y}{h_n}\right) f(y)\, dy \longrightarrow f(x)

and

h_n\, E\!\left[ \frac{1}{h_n^2} K^2\!\left(\frac{x - X_1}{h_n}\right) \right] = \int \frac{1}{h_n} K^2\!\left(\frac{x - y}{h_n}\right) f(y)\, dy \longrightarrow f(x) \int K^2(y)\, dy

From this, we deduce:

\operatorname{Var} f_n(x) = \frac{1}{n} \operatorname{Var}\!\left[ \frac{1}{h_n} K\!\left(\frac{x - X_1}{h_n}\right) \right] \le \frac{1}{n h_n} \cdot h_n\, E\!\left[ \frac{1}{h_n^2} K^2\!\left(\frac{x - X_1}{h_n}\right) \right] \sim \frac{f(x) \int K^2(y)\, dy}{n h_n}

therefore, Var fn(x) → 0 and

E\left( f_n(x) - f(x) \right)^2 = \operatorname{Var} f_n(x) + \left( E f_n(x) - f(x) \right)^2 \longrightarrow 0

REMARK 8.1.–

1) It may be shown that images.

2) Under stronger conditions, chief among which is the existence of f(r), we have:

E\left( f_n(x) - f(x) \right)^2 = O\!\left( n^{-2r/(2r+1)} \right) \quad \text{for } h_n = c\, n^{-1/(2r+1)}

8.2.3. Regression estimation

(Xi, Yi), 1 ≤ i ≤ n, denoting a two-dimensional sample of (X, Y) such that E(Y | X) is defined, we seek to estimate a specified version y = r(x) of this regression of Y on X.

Considerations analogous to those in the previous section lead us to construct a non-parametric estimator:

r_n(x) = \frac{\sum_{i=1}^{n} Y_i\, K\!\left(\frac{x - X_i}{h_n}\right)}{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right)}

Under regularity conditions, it may be shown that rn(x) → r(x) in quadratic mean when nhn → +∞. The use of rn is of interest whenever r is not an affine function.

Application to prediction: If we observe X1,…, Xn+1 and Y1,…, Yn, then rn(Xn+1) is a predictor (or a prediction) of Yn+1.
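The regression estimator rn, a ratio of kernel-weighted sums, can be sketched as follows (a Python illustration; the Gaussian kernel, the bandwidth, and the test regression r(x) = sin x are assumptions chosen for the example, sin x being a typical non-affine r).

```python
import math
import random

def nadaraya_watson(xs, ys, x, h):
    """r_n(x): kernel-weighted average of the Y_i around x."""
    k = lambda u: math.exp(-u * u / 2)   # Gaussian kernel; its normalization cancels in the ratio
    weights = [k((x - xi) / h) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)

random.seed(1)
xs = [random.uniform(-2.0, 2.0) for _ in range(4000)]
ys = [math.sin(xi) + random.gauss(0.0, 0.2) for xi in xs]  # r(x) = sin x, non-affine
r_hat = nadaraya_watson(xs, ys, 1.0, 0.15)
```

Evaluating nadaraya_watson at a freshly observed Xn+1 gives exactly the predictor rn(Xn+1) discussed above.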

EXAMPLE 8.3.–

1) Xj is the mean air pressure on Day j, and Yj is the amount of rainfall on Day j + 1: rn(Xn+1) is a predictor of the rainfall on Day n + 2.

2) X1,…, Xn+1 are the levels of cholesterol observed in the blood of n + 1 patients, and Y1,…, Yn are the levels of calcium observed in the blood of the first n patients: rn(Xn+1) is a prediction of the calcium level for the (n + 1)th patient.

8.3. Non-parametric tests

8.3.1. The χ2 test

Let μ be a probability on a measurable space (E, ℰ); we seek to test H0 = {μ} against an alternative that will be specified later.

For this, we take {A1,…, Ak}, a measurable partition of E such that pj = μ(Aj) > 0, j = 1,…, k.

The construction of this test introduces the kernel of the space generated by the indicator functions 1A1,…, 1Ak. The following lemma gives the definition and the properties of this kernel:

LEMMA 8.2.– Let (E, ℰ, m) be a measure space (where m is σ-finite), and e(g1,…, gk) be the vector space generated by the functions g1,…, gk that belong to L2(m). Finally, let h1,…, hk′ (k′ ≤ k) be an orthonormal basis of e(g1,…, gk). The function K, defined by:

K(x, y) = \sum_{j=1}^{k'} h_j(x)\, h_j(y), \qquad x, y \in E

is independent of the chosen basis. K is called the kernel of e(g1,…, gk).

PROOF.– Let ge(g1,…, gk). We have:

images

Now let K′ be a second kernel associated with any orthonormal basis. As K(x, ·) and K′(z, ·) are in ge(g1,…, gk), we have:

images

and since K and K′ are symmetric, we have K′ = K.

Then let h1,…, hk be an orthonormal basis, in L2(μ), of the space e(1A1,…, 1Ak) generated by the indicators of the Aj, with h1 ≡ 1, and K be the kernel of this space. We set:

T_n = \frac{1}{n} \sum_{i, i' = 1}^{n} K(X_i, X_{i'})

where X1,…, Xn is a sample of the distribution μ; then EμTn = n + k − 1.

Moreover, let us consider the k-dimensional random vector:

Y_n = \left( \frac{1}{\sqrt{n}} \left( \sum_{i=1}^{n} h_j(X_i) - n\, \delta_{j1} \right) \right)_{1 \le j \le k}

From the central limit theorem in ℝk, we have:

Y_n \xrightarrow{\mathcal{D}} Y

where Y = (0, ξ2,…, ξk) with ξ2,…, ξk independent and with distribution N(0, 1).

Since convergence in distribution is conserved by continuous transformations, we have:

\|Y_n\|^2 \xrightarrow{\mathcal{D}} \|Y\|^2 = \sum_{j=2}^{k} \xi_j^2

therefore, ‖Yn‖2 converges in distribution to a χ2 with k − 1 degrees of freedom.

However

\|Y_n\|^2 = \frac{1}{n} \sum_{j=1}^{k} \left( \sum_{i=1}^{n} h_j(X_i) - n\, \delta_{j1} \right)^2

therefore

\|Y_n\|^2 = \frac{1}{n} \sum_{i, i' = 1}^{n} \sum_{j=1}^{k} h_j(X_i)\, h_j(X_{i'}) - 2 \sum_{i=1}^{n} h_1(X_i) + n

Yet, following from the lemma,

\sum_{j=1}^{k} h_j(X_i)\, h_j(X_{i'}) = K(X_i, X_{i'})

hence, since h1 ≡ 1 gives Σi h1(Xi) = n,

\|Y_n\|^2 = T_n - 2n + n = T_n - n

and finally

T_n - n \xrightarrow{\mathcal{D}} \chi^2(k - 1)

We may construct a test based on Tn − n, with critical region {Tn − n > c}. This is the χ2 test.

Since

T_n - n = \sum_{j=1}^{k} \frac{(N_j - n p_j)^2}{n p_j}, \qquad N_j = \sum_{i=1}^{n} \mathbf{1}_{A_j}(X_i)

the alternative, H1, will be the set of distributions such that P(X1 ∈ Aj) ≠ pj for at least one value of j.

If c is such that P(χ2(k − 1) > c) = α, we deduce that the obtained test is consistent and of asymptotic size α, since Tn − n → +∞ almost surely when ν ∈ H1.
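A minimal numerical sketch of the χ2 test (an illustration, not from the text): the statistic Σj (Nj − npj)2/(npj) is computed for four equiprobable categories, once under a true H0 and once under a deliberately false one; 7.815 is the standard 5% critical value of χ2(3).

```python
import random

def chi2_statistic(counts, probs):
    """T_n - n = sum_j (N_j - n p_j)^2 / (n p_j)."""
    n = sum(counts)
    return sum((nj - n * pj) ** 2 / (n * pj) for nj, pj in zip(counts, probs))

random.seed(2)
draws = [random.randrange(4) for _ in range(2000)]     # 4 equiprobable categories
counts = [draws.count(j) for j in range(4)]
stat_h0 = chi2_statistic(counts, [0.25] * 4)           # H0 true: stat ~ chi^2(3)
stat_h1 = chi2_statistic(counts, [0.7, 0.1, 0.1, 0.1]) # false H0: stat explodes with n
# 5% critical value of chi^2(3) is about 7.815; reject when the statistic exceeds it
```

The consistency claim above corresponds to stat_h1 growing without bound as the sample size increases.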

REMARK 8.2.– The previous test only allows us to verify that P(X1 ∈ Aj) = pj for all j. To have a more precise test, we must vary k as a function of n.

REMARK 8.3.– In practice, the problem is posed in a more complicated manner: the pj are replaced by pj(θ), where θ is a parameter with values in Θ ⊂ ℝd. The test statistic is then of the form:

\sum_{j=1}^{k} \frac{\left( N_j - n\, p_j(\hat{\theta}_n) \right)^2}{n\, p_j(\hat{\theta}_n)}

where k > d + 1, and θ̂n is the maximum likelihood estimator of θ. Then, under regularity conditions, it may be shown that this statistic converges in distribution to a χ2 with k − 1 − d degrees of freedom.

8.3.2. The Kolmogorov–Smirnov test

Recall: If F0 is a continuous distribution function and if:

D_n = \sup_{x \in \mathbb{R}} \left| F_n(x) - F_0(x) \right|

where Fn denotes the empirical distribution function associated with a sample of size n drawn from F0, then:

\sqrt{n}\, D_n \xrightarrow{\mathcal{D}} K

where the distribution function of K is

P(K \le x) = \sum_{j = -\infty}^{+\infty} (-1)^j e^{-2 j^2 x^2}, \qquad x > 0

We thus have a test with critical region {√n Dn > wn}: the Kolmogorov–Smirnov test.

If wn = w with P(K > w) = α, we have a test of asymptotic size α for testing F = F0 against F ≠ F0. For F ≠ F0, we have Dn → supx |F(x) − F0(x)| > 0 almost surely and consequently √n Dn → +∞: the test is consistent.
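The statistic Dn is easy to compute exactly, since the supremum is attained at the order statistics. A Python sketch (the uniform sample, the false H0, and the 5% asymptotic critical value 1.358 for √n Dn are illustrative inputs):

```python
import math
import random

def ks_statistic(sample, F0):
    """D_n = sup_x |F_n(x) - F0(x)|; the sup is attained at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(abs((i + 1) / n - F0(x)), abs(i / n - F0(x)))
               for i, x in enumerate(xs))

random.seed(3)
sample = [random.random() for _ in range(1000)]   # true F: uniform on [0, 1]
F_true = lambda x: min(max(x, 0.0), 1.0)
F_wrong = lambda x: F_true(x) ** 2                # a false H0
dn_true = ks_statistic(sample, F_true)
dn_wrong = ks_statistic(sample, F_wrong)
# Asymptotic 5% critical value for sqrt(n) * D_n: about 1.358
```

Under the false H0, Dn stabilizes near sup|F − F0| = 0.25, so √n Dn diverges, as the consistency argument above predicts.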

COMMENT 8.1.– This test uses more information than the χ2 test: it is often more precise.

8.3.3. The Cramér–von Mises test

The Cramér–von Mises test uses the statistic:

\Delta_n = n \int_{\mathbb{R}} \left( F_n(x) - F_0(x) \right)^2 dF_0(x)

It may be shown that Δn converges in distribution to C, where the distribution function of C is that of an “infinite χ2” distribution, which is written as

C = \sum_{j=1}^{\infty} \frac{\chi_j^2(1)}{j^2 \pi^2}

where the χj2(1) are independent χ2 random variables with one degree of freedom.

From this, we have the consistent test of asymptotic size α and critical region Δn > c, with P(C > c) = α.

This test is more robust than the Kolmogorov–Smirnov test: it is more resistant to deformations of the statistical model.
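For computation, the integral form of Δn reduces to the classical formula Δn = 1/(12n) + Σi (F0(x(i)) − (2i − 1)/(2n))², taken over the order statistics. A small Python sketch (the uniform example is an assumption for illustration):

```python
def cramer_von_mises(sample, F0):
    """Delta_n = n * integral of (F_n - F0)^2 dF0, via the classical
    computing formula over the order statistics x_(1) <= ... <= x_(n)."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (F0(x) - (2 * i + 1) / (2 * n)) ** 2 for i, x in enumerate(xs))

# A sample sitting exactly on the uniform quantiles attains the minimal value 1/(12 n):
ideal = [(2 * i + 1) / 20 for i in range(10)]
delta = cramer_von_mises(ideal, lambda x: x)
```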

8.3.4. Rank test

Tests based on the “ranks” of the observations are easy to put into practice and possess good asymptotic properties. Here we give some information about the Wilcoxon test.

Let X1,…, Xn and Y1,…, Ym be two independent samples of real random variables with respective densities f and g. We wish to test H0: f = g against H1: f ≠ g, and for this, we set:

U = \sum_{i=1}^{n} \sum_{j=1}^{m} \mathbf{1}_{\{X_i \le Y_j\}}

We have E(U) = nm P(X1 ≤ Y1) and, if H0 is true, then:

E(U) = \frac{nm}{2}

Furthermore, a simple calculation shows that:

\operatorname{Var}_{H_0}(U) = \frac{nm(n + m + 1)}{12}

from which we have the Wilcoxon test, with critical region,

\left| U - \frac{nm}{2} \right| > c\, \sqrt{\frac{nm(n + m + 1)}{12}}

It may be established that U is asymptotically Gaussian and that this test is consistent for g such that:

P(X_1 \le Y_1) \ne \frac{1}{2}
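The steps above translate directly into code. A Python sketch (the Gaussian samples with a shift of 1 are an assumed alternative; 1.96 is the usual two-sided 5% normal critical value):

```python
import math
import random

def mann_whitney_u(xs, ys):
    """U = #{(i, j) : X_i <= Y_j}."""
    return sum(1 for x in xs for y in ys if x <= y)

random.seed(4)
xs = [random.gauss(0.0, 1.0) for _ in range(80)]
ys = [random.gauss(1.0, 1.0) for _ in range(80)]   # shifted sample: H0 is false
n, m = len(xs), len(ys)
u = mann_whitney_u(xs, ys)
# Standardize with the H0 moments E(U) = nm/2 and Var(U) = nm(n+m+1)/12:
z = (u - n * m / 2) / math.sqrt(n * m * (n + m + 1) / 12)   # ~ N(0, 1) under H0
reject = abs(z) > 1.96
```

Since P(X1 ≤ Y1) ≠ 1/2 under this alternative, the standardized statistic drifts away from 0 and the test rejects.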

8.4. Robustness

The study of robust methods is quite delicate. We will simply give two examples and indicate the general definition1.

8.4.1. An example of a robust test

We wish to test θ = 0 against θ > 0 in the translation model Xi = θ + εi, 1 ≤ i ≤ n, where the εi are i.i.d. with distribution P ∈ P0.

If P0 is the set of Gaussian distributions N(0, σ2), Student's t-test T is uniformly most powerful (UMP) unbiased, and its critical region is of the form {√n X̄n / Sn > c}, where Sn denotes the empirical standard deviation.

Now, if P0 is the set of symmetric distributions with densities, we may use the one-sample Wilcoxon test V; this is the test with critical region:

\sum_{i=1}^{n} R_i\, \mathbf{1}_{\{X_i > 0\}} > c

where Ri is the rank of |Xi| among the |Xj| (in other words, Ri = ri if exactly ri of the |Xj| are less than or equal to |Xi|).

To determine the asymptotic relative efficiency eV/T of the two tests of size α ∈ ]0, 1[, we denote by βn the power of T at the point θ for a sample of size n, and by νn the size2 of the sample for which V has power βn at the point θ.

Then:

e_{V/T} = \lim_{n \to \infty} \frac{n}{\nu_n} = 12\, \sigma^2 \left( \int_{\mathbb{R}} f^2(x)\, dx \right)^2

where σ2 is the variance of P and f is its density.

It may be shown that eV/T varies from 0.864 (for P well chosen and with compact support) to +∞ (for σ2 = +∞ or f2 non-integrable), passing through the value 3/π ≃ 0.955 for the Gaussian distribution: V is much more resistant than T to deformations of the model, i.e. it is a more robust test than T.
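The Gaussian value 0.955 can be checked numerically: for the N(0, 1) density, ∫ f2 = 1/(2√π), so eV/T = 12σ2(∫ f2)2 = 3/π. The sketch below (illustrative Python; the grid [−8, 8] with step 10⁻³ is an arbitrary numerical choice) evaluates the integral by a Riemann sum.

```python
import math

def pitman_are(sigma2, int_f_squared):
    """e_{V/T} = 12 * sigma^2 * (integral of f^2 d(lambda))^2."""
    return 12.0 * sigma2 * int_f_squared ** 2

# Riemann-sum check for the N(0, 1) density, where the exact value is 1 / (2 sqrt(pi)):
f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
dx = 1e-3
int_f2 = sum(f(-8.0 + k * dx) ** 2 for k in range(16000)) * dx
efficiency = pitman_are(1.0, int_f2)   # close to 3 / pi, i.e. about 0.955
```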

8.4.2. An example of a robust estimator

Given the contamination model

images

we determine the asymptotic efficiency3 of an estimator T = (Tn) of θ by the formula:

e(T) = \lim_{n \to \infty} \frac{1}{I_n E_n}

where In is the Fisher information on θ and En is the quadratic error of the estimator Tn.

For Tn = X̄n, the empirical mean, the following table is obtained (independent of θ):

images

We see that the efficiency of X̄n decreases rapidly when the contamination of the Gaussian distribution increases.

Now, with [a] denoting the integer part of the number a and X(1),…, X(n) an ordered sample, we set:

\bar{X}_n^{(\alpha)} = \frac{1}{n - 2[n\alpha]} \sum_{i = [n\alpha] + 1}^{n - [n\alpha]} X_{(i)}

where α is given in ]0, 1/2[. This estimator of θ is called the α-truncated mean (it is obtained by eliminating the [nα] smallest and the [nα] largest observations).

For α = 0.03, we obtain the following table:

images

where we see that the truncated mean X̄n(α) is more robust than the empirical mean X̄n.
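The effect is easy to reproduce numerically. A Python sketch (the N(10, 1) sample contaminated by 2% of gross errors at 1000 is an assumed scenario; α = 0.03 matches the table above):

```python
import random

def trimmed_mean(sample, alpha):
    """alpha-truncated mean: drop the [n*alpha] smallest and [n*alpha] largest values."""
    xs = sorted(sample)
    n = len(xs)
    k = int(n * alpha)
    kept = xs[k:n - k]
    return sum(kept) / len(kept)

random.seed(5)
data = [random.gauss(10.0, 1.0) for _ in range(980)] + [1000.0] * 20  # 2% gross errors
plain_mean = sum(data) / len(data)        # ruined by the contamination
robust_mean = trimmed_mean(data, 0.03)    # stays near the true center 10
```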

8.4.3. A general definition of a robust estimator

A general definition of robustness was proposed by Hampel [HAM 71].

Let (Tn) be an estimator of θ associated with the model (Pθ⊗n, θ ∈ Θ). We say that it is robust in P0,θ if:

\forall \varepsilon > 0,\ \exists \delta > 0 :\ \rho(P, P_{0,\theta}) < \delta \implies \sup_{n} \rho\left( \mathcal{L}_P(T_n),\ \mathcal{L}_{P_{0,\theta}}(T_n) \right) < \varepsilon

where ρ is the Prokhorov distance, defined by:

\rho(P, Q) = \inf \left\{ \varepsilon > 0 : P(A) \le Q(A^\varepsilon) + \varepsilon \ \text{for every Borel set } A \right\}

with A^ε = {x : d(x, A) < ε}.

Note that the Prokhorov distance captures the deformation of the statistical model due to rounding and gross errors.

8.5. Exercises

EXERCISE 8.1.– For every distribution function F, we define the generalized inverse:

F^{-1}(u) = \inf\{x : F(x) \ge u\}, \qquad u \in\, ]0, 1[

with the convention inf ∅ = +∞.

Let (Xi)i≥1 be an independent and identically distributed sequence. We write Fn for the empirical distribution function. With regard to the consistency and the normality of the empirical quantiles Fn−1(u), we have:

1) If #{x: F(x) = u} ≤ 1, then Fn−1(u) converges almost surely to F−1(u).

2) Let 0 < u1 < … < uk < 1. We suppose that the function F is differentiable at the points F−1(u1),…, F−1(uk), with a strictly positive derivative at these points. Then:

\sqrt{n} \left( F_n^{-1}(u_1) - F^{-1}(u_1), \dots, F_n^{-1}(u_k) - F^{-1}(u_k) \right) \xrightarrow{\mathcal{D}} N(0, C)

where the matrix C is defined by

C_{i,j} = \frac{\min(u_i, u_j) - u_i u_j}{F'\!\left(F^{-1}(u_i)\right) F'\!\left(F^{-1}(u_j)\right)}

Let (Zi)i≥1 be an independent and identically distributed sequence. We suppose that Z1 has a known, symmetric, and strictly positive density f. We observe Xi = λZi + θ for i = 1,…, n, λ being strictly positive. We write FZ and FX for the respective distribution functions of Z1 and X1.

1) Show that FX(x) = FZ((x − θ)/λ).

2) Let u ∈ ]0, 1/2[. Express θ and λ as functions of FZ−1(u), FX−1(u), and FX−1(1 − u).

3) Give two strongly consistent estimators, θ̂n and λ̂n, of θ and λ, based on the empirical quantiles Fn−1(u) and Fn−1(1 − u) of the observations (Xi)1≤i≤n.

4) We further suppose that the density f is continuous. Determine the asymptotic behavior of:

\sqrt{n} \left( \hat{\theta}_n - \theta,\ \hat{\lambda}_n - \lambda \right)
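As a numerical illustration of the consistency of empirical quantiles (not part of the exercise), the sketch below computes Fn−1(1/2) for an Exp(1) sample, whose true median is ln 2; the distribution and sample size are assumptions chosen for the example.

```python
import math
import random

def empirical_quantile(sample, u):
    """Generalized inverse of F_n: F_n^{-1}(u) is the ceil(n*u)-th order statistic."""
    xs = sorted(sample)
    n = len(xs)
    return xs[max(0, math.ceil(n * u) - 1)]

random.seed(6)
sample = [random.expovariate(1.0) for _ in range(20000)]
med = empirical_quantile(sample, 0.5)
true_median = math.log(2.0)   # median of Exp(1)
```

The asymptotic variance formula of part 2) predicts a standard deviation of order 1/(2 f(ln 2) √n) here, consistent with the observed accuracy.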

EXERCISE 8.2.– Let X be a random variable with distribution function F. Let (X1,…, Xn) be an i.i.d. sample of Fθ(x) = F(xθ). We are interested in the estimation of θ when F is the distribution function of a symmetric random variable of variance σ2.

1) i) Show that EθX1 = θ and that θ = argmint Eθ(X1 − t)2.

   ii) Show that X̄n satisfies:

\bar{X}_n = \operatorname{argmin}_t \sum_{i=1}^{n} (X_i - t)^2

2) i) Show that if F is continuous and strictly increasing, then Fθ−1(1/2) = θ and θ = argmint Eθ|X1 − t|.

Hint: Use the equation:

|x - t| = \int_{\mathbb{R}} \mathbf{1}_{\{\min(x, t) < s < \max(x, t)\}}\, ds

and Fubini’s theorem to conveniently rewrite Eθ|X − t|.

ii) Calculate argmint Σi=1..n |Xi − t|. What happens if n is odd?

3) Supposing F is the distribution function of the normal distribution N(0, σ2), compare the variances of the limits in distribution of the estimators of θ constructed from the mean and the empirical median.

4) Same question as above when F has the density (1/2)e−|x|.

EXERCISE 8.3.– A statistician observes n independent and identically distributed random variables with distribution images, and wishes to estimate θ > 0. He proposes the following three estimators:

images

where Fn−1 is the (generalized) inverse of the empirical distribution function.

1) Explain the ideas leading to the proposition of each estimator.

2) For each estimator θ̂i,n (i = 1, 2, 3), give the limit in distribution of ai,n(θ̂i,n − θ), where the sequences ai,n are chosen such that we obtain non-degenerate limits in distribution.

3) Which estimator do you prefer?

EXERCISE 8.4.– Let (Xn)n≥1 be a sequence of independent and identically distributed random variables.

1) What is the distribution of nFn(x) = Σi=1..n 1{Xi ≤ x}? From this, deduce that of Fn(x), the empirical distribution function for fixed x. Show that, for all x, limn→∞ Fn(x) = F(x) a.s. where F is the distribution function of X1.

2) Let us suppose F to be continuous. Let ε > 0 be such that N = 1/ε is an integer.

   i) Show that there exists a sequence −∞ = z0 < z1 < … < zN−1 < zN = +∞ (depending on ε) such that F(zk) = k/N, k = 0,…, N.

   ii) Show that, for every element x of [zk, zk+1], Fn(x) − F(x) ≤ Fn(zk+1) − F(zk+1) + ε and Fn(x) − F(x) ≥ Fn(zk) − F(zk) − ε.

   iii) Deduce that supx∈ℝ |Fn(x) − F(x)| → 0 almost surely.

EXERCISE 8.5.– Consider n real i.i.d. random variables X1,…, Xn, following the Cauchy distribution with density:

f(x) = \frac{1}{\pi \left( 1 + (x - m)^2 \right)}

1) Take the empirical mean X̄n as an estimator of m.

   i) What is the distribution of X̄n?

   ii) Study the convergence of X̄n in quadratic mean, in probability, and in distribution.

   iii) What do you think of this estimator?

2) We now arrange the data in increasing value and we write X(1) < X(2) < … < X(n) for the obtained values. To estimate m, we set:

\hat{m}_n = X_{(\lceil n/2 \rceil)}

   i) Show that m̂n = Fn−1(1/2), where Fn is the empirical distribution function.

   ii) Show that, for the considered distribution, P(X1 < m) = 1/2.

   iii) Using the previous exercise, show that m̂n → m almost surely. Comment on this result.

Hint: You may show that images. where F is the distribution function of X1.
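The contrast between the two estimators of this exercise is easy to see numerically (an illustration, not the solution): Cauchy variates are generated by the inverse transform m + tan(π(U − 1/2)), and the empirical median concentrates around m while the empirical mean, itself Cauchy-distributed, does not.

```python
import math
import random

random.seed(7)
m = 3.0
# Cauchy(m) variates by inverse transform: m + tan(pi * (U - 1/2)), U uniform on (0, 1)
data = [m + math.tan(math.pi * (random.random() - 0.5)) for _ in range(100001)]
mean_est = sum(data) / len(data)            # again Cauchy: no concentration around m
median_est = sorted(data)[len(data) // 2]   # strongly consistent for m
```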

EXERCISE 8.6. (Non-parametric regression estimation).– Let (Xn, Yn), n ≥ 1, be a sequence of independent random variables, with values in ℝ2, of the same distribution, with the continuous, strictly positive density f(x, y). We suppose that Yn is integrable and we wish to estimate the regression E(Y1 | X1), that is the function:

r(x) = E(Y_1 \mid X_1 = x)

where x ∈ ℝ.

r may then be written in the form:

r(x) = \frac{\varphi(x)}{f_X(x)}, \qquad \text{where } \varphi(x) = \int y\, f(x, y)\, dy \ \text{ and } \ f_X(x) = \int f(x, y)\, dy

To estimate it from a sample of size n, we set:

r_n(x) = \frac{\varphi_n(x)}{f_n(x)}, \qquad \varphi_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} Y_i\, K\!\left(\frac{x - X_i}{h_n}\right), \quad f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right)

where K is a continuous, symmetric, bounded, and strictly positive density and where hn → 0 and nhn → ∞ when n → ∞.

1) We suppose that E(Y12) < ∞. Show that:

E\, \varphi_n(x) \longrightarrow \varphi(x)

2) Use the results obtained in the estimation of the density to deduce that:

f_n(x) \to f_X(x) \ \text{ and } \ \varphi_n(x) \to \varphi(x) \quad \text{in quadratic mean}

3) Establish the decomposition:

r_n - r = \frac{1}{f_n} \left[ (\varphi_n - \varphi) - r\, (f_n - f_X) \right]

where x is omitted.

4) We suppose that φn is bounded. Show that:

r_n(x) \xrightarrow{P} r(x)

EXERCISE 8.7.– In this exercise, a lower bound for the Fisher information associated with the translation model X = μ + ε is sought, where the unknown parameter is μ ∈ ℝ. We suppose that E(ε) = 0, E(ε2) = σ2 (known), and that ε has a density f, which is assumed to be strictly positive and continuously differentiable on ℝ.

1) Recall the expression of the Fisher information I associated with μ, which we will assume to be finite in the following.

2) Using a very simple unbiased estimator while we make use of only one observation, show that I ≥ 1/σ2. Deduce that if we use n independent observations with the same distribution as X, and if there exists an unbiased estimator that attains the Cramer–Rao bound 1/(nI), then its variance is at most σ2/n.

3) In this question, we wish to determine the densities f which reach the lower bound for I.

   i) Show that ∫ xf′(x − μ)dx = −1 and that:

\int (x - \mu)\, f'(x - \mu)\, dx = -1

   Deduce that E(εf′(ε)/f(ε)) = −1.

   ii) Using the conditions for equality in the Cauchy–Schwarz inequality, show that I = 1/σ2 if and only if ε is of distribution N(0, σ2).

   iii) Supposing that we make use of n i.i.d. observations with the same distribution as X, and that there exists an unbiased estimator that attains the Fisher limit 1/(nI), show that if ε is not of distribution N(0, σ2), then this estimator has a quadratic loss which is strictly less than that of the empirical mean X̄n.

EXERCISE 8.8.–Let Xi, i = 1,…, n (n ≥ 2), be i.i.d. with a distribution of density:

images

1) Give the joint density fn(x1,…, xn; θ) of the observations. From this, deduce the maximum likelihood estimator of θ, giving its distribution. Construct, using this estimator, an unbiased estimator, images, and calculate its variance.

2) Compare this with the results obtained in the previous exercise.

EXERCISE 8.9.– A type of mouse is afflicted by an illness M with a rate of 20%. We wish to know if the absorption of a certain product increases this rate. Of 100 mice having absorbed the product, 27 are afflicted by M.

1) Carry out a classical test of size α = 0.05.

2) Carry out a χ2 goodness-of-fit test with size α = 0.05.

3) Compare the obtained results.

EXERCISE 8.10. (Estimation by explosion).– Let (Xn, n ≥ 1) and (Yn, n ≥ 1) be two sequences of real random variables defined on the probability space (Ω, 𝒜, P) such that:

Y_n = g(X_n), \qquad n \ge 1

where g is an unknown function, continuous on ℝ.

Furthermore, let K be a real, continuous, strictly positive density, which verifies lim|x|→∞ x2K(x) = 0. We set:

f_n(x, y) = \frac{1}{n h_n^2} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right) K\!\left(\frac{y - Y_i}{h_n}\right)

where hn > 0 verifies limn→∞ hn = 0.

We wish to estimate g from the observations (Xi, Yi), 1 ≤ in, using “the explosion” of fn.

1) Establish the following preliminary results:

   i) K is bounded.

   ii) There exist some α and β > 0 such that K(u) ≥ β for |u| ≤ α.

   iii) If yg(x) and if ε ∈ ]0, 1/2|yg(x)|[, there exists h > 0 such that:

images

2) Show that, for fixed x,

images

when n → ∞. You may use (i) and (iii).

3) Supposing that g is Lipschitzian, of order k at the point x (i.e. that |g(x′) − g(x)| ≤ k|x′ − x| for all x′ ∈ ℝ), establish the lower bound:

images

4) We further suppose that the real random variables Xn are i.i.d. and of continuous and strictly positive density f. Show that:

images

and that, if images,

images

5) An estimator gn of g is defined by setting:

images

Show that gn (x) is images-measurable, and that

images

on the condition that images.


1 For a complete exposition of robustness, the reader may consult [HUB 09].

2 ±1.

3 If it exists.
