Chapter 15

Prediction

15.1. Generalities

Let $X = (X_t, t \in \mathbb{Z})$ be a real, square-integrable process with basis space $(\Omega, \mathcal{A}, P)$, let $\mathcal{F}_n$ be the sub-σ-algebra generated by $X_{n-j}$, $j \geq 0$, and let $\mathcal{M}_n$ be the closed linear subspace of $L^2(\Omega, \mathcal{A}, P)$ generated by the same variables and the constant 1.

We wish to predict Xn+h from the observed variables X1,…,Xn. The strictly positive integer h is called the horizon of the prediction.

With respect to the quadratic error, the best predictor of $X_{n+h}$ given $X_{n-j}$, $j \geq 0$, is the conditional expectation

\[ E\big(X_{n+h} \mid \mathcal{F}_n\big). \]

This is the orthogonal projection in $L^2(\Omega, \mathcal{A}, P)$ of $X_{n+h}$ onto $L^2(\Omega, \mathcal{F}_n, P)$.

The best linear predictor of $X_{n+h}$ is its orthogonal projection onto $\mathcal{M}_n$. If $(X_t)$ is Gaussian, it coincides with $E(X_{n+h} \mid \mathcal{F}_n)$.

A statistical predictor is a known function of the data:

\[ \hat{X}_{n+h} = g(X_1, \ldots, X_n). \]

The prediction error is written as:

\[ E\big(X_{n+h} - \hat{X}_{n+h}\big)^2 = E\big(X_{n+h} - E(X_{n+h} \mid \mathcal{F}_n)\big)^2 + E\big(E(X_{n+h} \mid \mathcal{F}_n) - \hat{X}_{n+h}\big)^2, \]

since $X_{n+h} - E(X_{n+h} \mid \mathcal{F}_n)$ is orthogonal to every square-integrable, $\mathcal{F}_n$-measurable variable and $\hat{X}_{n+h}$ is $\mathcal{F}_n$-measurable.

The error $E\big(X_{n+h} - E(X_{n+h} \mid \mathcal{F}_n)\big)^2$ being structural, the statistician must seek to minimize the “statistical error” $E\big(E(X_{n+h} \mid \mathcal{F}_n) - \hat{X}_{n+h}\big)^2$.

The linear prediction error is similarly written as the sum of a statistical error and a structural error.

One may, generally speaking, distinguish between two classes of prediction methods: empirical methods and those based on the introduction of a model. This distinction is, in fact, imprecise, as empirical methods often contain an underlying model for which they are optimal.

15.2. Empirical methods of prediction

15.2.1. The empirical mean

This is the predictor:

\[ \hat{X}_{n+h} = \bar{X}_n = \frac{1}{n}\sum_{t=1}^{n} X_t. \]

It has good properties for a model of the form:

\[ X_t = m + \varepsilon_t, \qquad t \in \mathbb{Z}, \]

where $m \in \mathbb{R}$ and $(\varepsilon_t)$ is a (strong) white noise.

Then $E(X_{n+h} \mid \mathcal{F}_n) = m$, and the prediction error is written as:

\[ E\big(X_{n+h} - \bar{X}_n\big)^2 = \sigma^2\Big(1 + \frac{1}{n}\Big), \qquad \text{where } \sigma^2 = E\varepsilon_t^2. \]

Note that this predictor is unbiased, i.e. $E\big(\bar{X}_n - X_{n+h}\big) = 0$. When $\varepsilon_t$ is Gaussian, we know that $\bar{X}_n$ is the best unbiased statistical predictor of $X_{n+h}$. $\bar{X}_n$ may be calculated recursively using the formula:

\[ \bar{X}_{n+1} = \bar{X}_n + \frac{1}{n+1}\big(X_{n+1} - \bar{X}_n\big). \]
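To fix ideas, here is a minimal Python sketch (the variable names are ours, not the text's) of the empirical-mean predictor, its recursive update, and the theoretical error $\sigma^2(1 + 1/n)$:

import numpy as np

rng = np.random.default_rng(0)

# Model X_t = m + eps_t, with (eps_t) a strong (i.i.d.) white noise
m, sigma, n = 2.0, 1.0, 200
X = m + sigma * rng.standard_normal(n)

# Empirical-mean predictor of X_{n+h} (the same value for every horizon h)
x_bar = X.mean()

# Recursive form: xbar_k = xbar_{k-1} + (X_k - xbar_{k-1}) / k
x_bar_rec = 0.0
for k, x in enumerate(X, start=1):
    x_bar_rec += (x - x_bar_rec) / k

print(x_bar, x_bar_rec)          # identical up to rounding errors
print(sigma**2 * (1 + 1 / n))    # theoretical prediction error sigma^2 (1 + 1/n)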

15.2.2. Exponential smoothing

This method, which is widely used in practice, consists of assigning weights to the observations that tend to 0 at an exponential rate:

\[ \hat{X}_{n+1} = c \sum_{j=0}^{n-1} q^j X_{n-j}, \]

where $0 < q < 1$ and $c$ is a normalization constant.

Usually, we choose c = 1 − q and 0.7 ≤ q ≤ 0.95.

One may empirically determine q by comparing predictions with observations. Set

\[ \Delta(q) = \sum_{t=n_0+1}^{n} \big(\hat{X}_t(q) - X_t\big)^2, \]

where $\hat{X}_t(q)$ denotes the predictor associated with $q$ and with the data $X_1, \ldots, X_{t-1}$, and $n_0$ is chosen to be large enough (e.g. $n_0 = [n/2]$), and choose the $q$ that minimizes $\Delta(q)$.

We will see in section 15.3 that exponential smoothing is optimal for a very particular underlying model.

The predictor is calculated recursively using the relation:

\[ \hat{X}_{n+1} = (1 - q)X_n + q\hat{X}_n. \]
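The following Python sketch (our own illustration; `exp_smooth_forecasts` and `select_q` are hypothetical names) implements this recursion together with the empirical choice of q by minimization of Δ(q):

import numpy as np

def exp_smooth_forecasts(X, q):
    # forecasts[t] predicts X[t] from X[0], ..., X[t-1] via the recursion
    # hat_X_{t+1} = (1 - q) X_t + q hat_X_t; forecasts[0] = X[0] is a convention of ours
    forecasts = np.empty(len(X))
    forecasts[0] = X[0]
    for t in range(1, len(X)):
        forecasts[t] = (1 - q) * X[t - 1] + q * forecasts[t - 1]
    return forecasts

def select_q(X, grid=np.linspace(0.70, 0.95, 26)):
    # Delta(q) = sum over t > n0 of (hat_X_t(q) - X_t)^2, with n0 = n // 2
    n0 = len(X) // 2
    deltas = [np.sum((exp_smooth_forecasts(X, q)[n0:] - X[n0:]) ** 2) for q in grid]
    return grid[int(np.argmin(deltas))]

rng = np.random.default_rng(1)
X = np.cumsum(rng.standard_normal(300))           # a slowly varying series for the demonstration
q_star = select_q(X)
print(q_star, exp_smooth_forecasts(X, q_star)[-1])  # chosen q and forecast of the last value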

15.2.3. Naive predictors

These are defined by:

\[ \hat{X}_{n+h} = X_n, \qquad h \geq 1. \]

These are good predictors when the observed phenomenon varies little, or rarely. Thus, in meteorology, the prediction “the weather tomorrow will be the same as today” is correct 75% of the time.

In fact, $X_n$ is the best predictor if and only if:

\[ E\big(X_{n+1} \mid X_1, \ldots, X_n\big) = X_n, \qquad n \geq 1, \]

that is, if and only if $(X_t, t \geq 1)$ is a martingale.

Notably, this is the case for the random walk model Xn = ε1 + … + εn, n ≥ 1.
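A quick simulation (an illustration of ours, not from the text) compares the naive predictor with the empirical mean on simulated random walks; the naive predictor attains a mean squared error close to the innovation variance, whereas the empirical mean does much worse:

import numpy as np

rng = np.random.default_rng(2)
n, n_rep = 200, 5000
err_naive, err_mean = [], []
for _ in range(n_rep):
    X = np.cumsum(rng.standard_normal(n + 1))    # random walk X_k = eps_1 + ... + eps_k
    err_naive.append((X[n] - X[n - 1]) ** 2)     # naive predictor X_n for X_{n+1}
    err_mean.append((X[n] - X[:n].mean()) ** 2)  # empirical mean, for comparison
print(np.mean(err_naive), np.mean(err_mean))     # about 1 versus a much larger value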

15.2.4. Trend adjustment

This consists of adjusting the trend of the observed process to be a function of the form $\sum_{j=1}^{k} \alpha_j f_j(t)$, where the $f_j$ are given and the $\alpha_j$ are to be estimated.

The chosen, linearly independent fj may be power functions, exponentials or logarithms, periodic functions, etc.

One underlying model could be of the form:

[15.1] \[ X_t = \sum_{j=1}^{k} \alpha_j f_j(t) + \varepsilon_t. \]

The $\alpha_j$ can be estimated using the least-squares method, by minimizing

[15.2] \[ \sum_{t=1}^{n}\Big(X_t - \sum_{j=1}^{k} \alpha_j f_j(t)\Big)^2. \]

Equalities [15.1] for t = 1,…, n are written in matrix form:

\[ X = Z\alpha + \varepsilon, \]

where $X$ is the observation vector, $\varepsilon$ is the noise vector with n components, $\alpha$ is the vector of parameters to be estimated, and $Z$ is an $n \times k$ matrix whose element in the $t$th row and $j$th column is $f_j(t)$. $Z$ is assumed to be of rank k.

With this notation, the minimization of [15.2] gives the unique solution:

\[ \hat{\alpha} = (Z'Z)^{-1}Z'X, \]

where $Z'$ is the transpose of $Z$.

It may be shown that $\hat{\alpha}$ is the unbiased linear estimator1 of minimal variance (Gauss–Markov theorem; see Theorem 5.9).

Thus, the predictor obtained at the horizon h is:

\[ \hat{X}_{n+h} = \sum_{j=1}^{k} \hat{\alpha}_j f_j(n + h). \]

This is the unbiased linear predictor with minimal variance.
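As an illustration, here is a short Python sketch of trend adjustment by least squares; the regressors chosen below (a constant, a linear trend and a periodic term) are ours, for demonstration only:

import numpy as np

rng = np.random.default_rng(3)
n, h = 100, 5
t = np.arange(1, n + 1)

# Regressors f_1(t) = 1, f_2(t) = t, f_3(t) = cos(2 pi t / 12); an illustrative choice
Z = np.column_stack([np.ones(n), t, np.cos(2 * np.pi * t / 12)])
alpha_true = np.array([1.0, 0.05, 2.0])
X = Z @ alpha_true + rng.standard_normal(n)

# Least-squares estimator  alpha_hat = (Z'Z)^{-1} Z'X
alpha_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)

# Predictor at horizon h:  hat_X_{n+h} = sum_j alpha_hat_j f_j(n + h)
z_future = np.array([1.0, n + h, np.cos(2 * np.pi * (n + h) / 12)])
print(alpha_hat, z_future @ alpha_hat)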

15.3. Prediction in the ARIMA model

We first suppose that the observed process is an ARMA(p,q), and we seek a linear prediction of Xn+h.

Recall that an ARMA process has the two following representations (see Chapter 14, equations [14.6] and [14.8]):

\[ X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}, \qquad t \in \mathbb{Z}, \]

and

\[ X_t = \sum_{j=1}^{\infty} \pi_j X_{t-j} + \varepsilon_t, \qquad t \in \mathbb{Z}. \]

Since $(\varepsilon_t)$ is the innovation of the process, we deduce the best linear predictor with horizon 1:

[15.3] \[ \tilde{X}_{n+1} = \sum_{j=1}^{\infty} \psi_j \varepsilon_{n+1-j} \]

and

[15.4] \[ \tilde{X}_{n+1} = \sum_{j=1}^{\infty} \pi_j X_{n+1-j}. \]

Relation [15.3] is not exploitable in practice, as the $\varepsilon_{n+1-j}$ are not observed. If $(X_t)$ is an AR(p) process, we may use [15.4], replacing the non-zero $\pi_j$ with the conditional maximum likelihood estimator (section 14.5.2), whence the predictor

\[ \hat{X}_{n+1} = \sum_{j=1}^{p} \hat{\pi}_j X_{n+1-j}. \]

Note that the predictor is no longer linear, since the $\hat{\pi}_j$ are functions of $X_1, \ldots, X_n$.
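Here is a minimal Python sketch of this predictor for an AR(2); for simplicity the coefficients are estimated by least squares on the lagged design, which, for a Gaussian AR model, essentially coincides with the conditional maximum likelihood estimator of section 14.5.2:

import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 2
phi = np.array([0.5, -0.3])               # true AR(2) coefficients (illustrative values)
X = np.zeros(n)
for t in range(p, n):
    X[t] = phi @ X[t - p:t][::-1] + rng.standard_normal()

# Least-squares estimation on the lagged design (a stand-in for the conditional
# maximum likelihood estimator, to which it is essentially equivalent here)
Y = X[p:]
Z = np.column_stack([X[p - j:n - j] for j in range(1, p + 1)])
pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)

# One-step predictor  hat_X_{n+1} = sum_{j=1}^p pi_hat_j X_{n+1-j}
x_next = pi_hat @ X[-p:][::-1]
print(pi_hat, x_next)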

In other cases, the situation is more complicated since a direct estimation of the $\pi_j$ is not used. We may, however, obtain such an estimator by considering relation [14.8] from Chapter 14:

\[ 1 - \sum_{j=1}^{\infty} \pi_j z^j = \frac{P(z)}{Q(z)}, \qquad |z| \leq 1, \]

where $P$ and $Q$ are replaced by their estimators $\hat{P}$ and $\hat{Q}$.
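Given estimators of the autoregressive and moving-average polynomials, the $\hat{\pi}_j$ can then be obtained by a power-series (long) division; the following sketch (our illustration; `pi_coefficients` is a hypothetical name) does this for an ARMA(1,1):

import numpy as np

def pi_coefficients(phi, theta, n_terms=8):
    # Coefficients pi_j of the AR(infinity) form  X_t = sum_{j>=1} pi_j X_{t-j} + eps_t,
    # obtained from  1 - sum_{j>=1} pi_j z^j = P(z)/Q(z)  with
    # P(z) = 1 - phi_1 z - ... - phi_p z^p  and  Q(z) = 1 - theta_1 z - ... - theta_q z^q
    P = np.r_[1.0, -np.asarray(phi, dtype=float)]
    Q = np.r_[1.0, -np.asarray(theta, dtype=float)]
    c = np.zeros(n_terms + 1)              # power-series coefficients of P/Q
    for j in range(n_terms + 1):
        acc = P[j] if j < len(P) else 0.0
        acc -= sum(Q[i] * c[j - i] for i in range(1, min(j, len(Q) - 1) + 1))
        c[j] = acc                         # long division: P = Q * c
    return -c[1:]                          # pi_j = -c_j for j >= 1

# ARMA(1,1) with estimated values hat_phi = 0.5 and hat_theta = 0.4 (illustrative)
print(pi_coefficients([0.5], [0.4]))       # 0.1, 0.04, 0.016, ... (geometric decay)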

For prediction in the ARIMA model, we refer to [BOS 87a], [BRO 91] and [GOU 83].

We only examine one particular case: consider an IMA(1,1) process defined by:

\[ X_t - X_{t-1} = \varepsilon_t - \theta\varepsilon_{t-1}, \]

where 0 < θ < 1.

A recursive calculation then shows that

\[ \tilde{X}_{n+1} = (1 - \theta)\sum_{j=0}^{\infty} \theta^j X_{n-j}, \]

and the best predictor is obtained by exponential smoothing with q = θ.
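A small simulation, given only as a sanity check (the setup is ours), illustrates this: on an IMA(1,1) path, exponential smoothing with q = θ attains a one-step prediction error close to the innovation variance, i.e. close to the structural error:

import numpy as np

rng = np.random.default_rng(5)
theta, n = 0.8, 2000
eps = rng.standard_normal(n + 1)

# IMA(1,1):  X_t - X_{t-1} = eps_t - theta * eps_{t-1}
X = np.cumsum(eps[1:] - theta * eps[:-1])

# Exponential smoothing with q = theta, in recursive form
pred = np.empty(n)
pred[0] = X[0]
for t in range(1, n):
    pred[t] = (1 - theta) * X[t - 1] + theta * pred[t - 1]

# After a burn-in, the one-step error should be close to the innovation variance (= 1)
print(np.mean((X[200:] - pred[200:]) ** 2))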

15.4. Prediction in continuous time

As an example, let us examine the case of an Ornstein–Uhlenbeck process (see Example 13.1, part 3):

\[ X_t = \int_{-\infty}^{t} e^{-\theta(t-s)}\,dW(s), \qquad t \in \mathbb{R}, \]

with θ > 0.

Then

\[ E\big(X_{t+h} \mid X_s,\, s \leq t\big) = e^{-\theta h}X_t, \qquad h > 0. \]

Indeed, $e^{-\theta h}X_t$ is a square-integrable function of $(X_s, s \leq t)$, and for $s \leq t$:

\[ E\Big[\big(X_{t+h} - e^{-\theta h}X_t\big)X_s\Big] = E\Big[\Big(\int_{t}^{t+h} e^{-\theta(t+h-u)}\,dW(u)\Big)X_s\Big] = 0, \]

as Wiener processes have independent increments, $X_{t+h} - e^{-\theta h}X_t$ is $\sigma\big(W_{t+k} - W_t,\ 0 \leq k \leq h\big)$-measurable and $X_s$ is $\sigma\big(W_v - W_u,\ -\infty < u \leq v \leq s\big)$-measurable.

Therefore, $e^{-\theta h}X_t$ is the best linear predictor of $X_{t+h}$ given $(X_s, s \leq t)$, but since Ornstein–Uhlenbeck processes are Gaussian, it is also the best nonlinear predictor.

A straightforward calculation shows that the prediction error is written as:

\[ E\big(X_{t+h} - e^{-\theta h}X_t\big)^2 = \frac{1 - e^{-2\theta h}}{2\theta}. \]

For small h, it is therefore of order h, and as h becomes infinite, it tends to

\[ \frac{1}{2\theta} = \operatorname{Var}(X_t). \]

Estimating θ, the statistical predictor $e^{-\hat{\theta}h}X_t$ is obtained.
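The following Python sketch simulates an Ornstein–Uhlenbeck path on a fine grid using its exact autoregressive transition, estimates θ from the lag-one autocorrelation of the grid values (an illustrative estimator of ours, not the maximum likelihood estimator), and forms the statistical predictor $e^{-\hat{\theta}h}X_T$:

import numpy as np

rng = np.random.default_rng(6)
theta, dt, n, h = 0.7, 0.01, 20000, 1.0

# Exact discretization of the Ornstein-Uhlenbeck dynamics on a grid of step dt:
# X_{t+dt} = e^{-theta dt} X_t + N(0, (1 - e^{-2 theta dt}) / (2 theta))
a = np.exp(-theta * dt)
s = np.sqrt((1 - a ** 2) / (2 * theta))
X = np.empty(n)
X[0] = rng.normal(0.0, np.sqrt(1 / (2 * theta)))   # stationary initial value
for t in range(1, n):
    X[t] = a * X[t - 1] + s * rng.normal()

# Illustrative estimator of theta from the lag-one autocorrelation of the grid values
rho1 = np.corrcoef(X[:-1], X[1:])[0, 1]
theta_hat = -np.log(rho1) / dt

# Statistical predictor of X_{T+h} given the observed path, and the structural error
print(theta_hat, np.exp(-theta_hat * h) * X[-1], (1 - np.exp(-2 * theta * h)) / (2 * theta))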

15.5. Exercises

EXERCISE 15.1.– Let $(\varepsilon_t, t \in \mathbb{Z})$ be a strong white noise and Y be a real random variable such that $v^2 = EY^2 \in\, ]0, +\infty[$ and $EY = 0$. We assume that Y and the process $(\varepsilon_t)$ are independent, and we set

[15.5] \[ X_t = Y + \varepsilon_t, \qquad t \in \mathbb{Z}. \]

1) Show that (Xt) is a strictly stationary process. Calculate its autocovariance. Does it have a spectral density?

2) Show that images converges in mean square when n tends to infinity and determine its limit. Do the same for the sequences

images

3) Show that images, where images is the closed vector space generated by (Xu, uts).

4) Show that [15.5] is the Wold decomposition of (Xt).

5) We now seek to predict Xn+1, the measure of error being the quadratic error.

   i) What is the best linear prediction of $X_{n+1}$ based on $(X_t, t \leq n)$?

   ii) What is the best linear prediction of $X_{n+1}$ based on $(X_t, 1 \leq t \leq n)$?

   iii) Calculate the prediction errors associated with the two predictors above. Study their asymptotic behavior.

   iv) A statistician observes X1,…, Xn. Which predictor of Xn+1 might he/she choose?

6) Setting images, study the asymptotic behavior of images. Construct an estimator and a confidence interval for images.

EXERCISE 15.2.– Let $X = (X_t, t \geq 0)$ be a measurable process with values in $\mathbb{R}^d$. We will suppose that the density $f_{s,t}$ of $(X_s, X_t)$ exists for every pair $(s, t)$ such that $s \neq t$, and that the density $f$ of $X_t$ does not depend on $t$. We set:

images

We denote by images the usual norm on Lp(λ), where λ is the Lebesgue measure on images, and we make the hypothesis:

images

where p ∈ [1,+∞[.

Now, to estimate f from $(X_t, 0 \leq t \leq T)$, we construct the kernel estimator:

\[ f_T(x) = \frac{1}{T h_T^d} \int_{0}^{T} K\Big(\frac{x - X_t}{h_T}\Big)\,dt, \]

where the kernel K and the bandwidth $h_T > 0$ are chosen by the statistician. In particular, we will choose K such that $K \in L^q(\lambda)$, with (1/p) + (1/q) = 1.

1) Show that the variance of fT(x) satisfies:

images

specifying the constant Cp.

2) We now suppose that f is of the class C2, and that it and its (partial) derivatives are bounded. Evaluate the asymptotic bias of fT(x) when T → +∞.

3) How is hT chosen to optimize the asymptotic quadratic error of fT(x)?

4) Comment on the results obtained when p = 2 or p = ∞.

5) In the following, X is a one-dimensional, stationary, Gaussian process. Study the condition images relative to X using the autocorrelation ρ(u) = Corr(X0, Xu), images.

6) Use fT to construct estimators of E(X0) and V(X0). Show that these estimators are convergent. Study their asymptotic quadratic errors. Comment on the results.

7) We wish to predict $X_{T+h}$ ($h > 0$) from $(X_t, 0 \leq t \leq T)$. For this, we construct a kernel regression estimator. Study its asymptotic quadratic error, making some convenient hypotheses similar to the one above.

Deduce a predictor of $X_{T+h}$ and study the behavior of the statistical error of the prediction when T → +∞ (h fixed).

8) We now suppose that X is an Ornstein–Uhlenbeck process with parameter θ > 0. Determine $E(X_{T+h} \mid X_T)$. Compare the predictor obtained in question 7 with the parametric predictor obtained by replacing θ by the maximum likelihood estimator θ* in $E(X_{T+h} \mid X_T)$.

EXERCISE 15.3.– (Information inequality for prediction). Reconsider the information inequality (Chapter 5), where g(θ) is replaced by g(X, θ) and the estimator is replaced by an unbiased predictor p(X), i.e. a predictor such that

\[ E_\theta\, p(X) = E_\theta\, g(X, \theta), \qquad \theta \in \Theta. \]

Show, giving the necessary regularity conditions, that we obtain:

\[ E_\theta\big(p(X) - g(X, \theta)\big)^2 \;\geq\; \frac{\Big(E_\theta\,\dfrac{\partial g}{\partial \theta}(X, \theta)\Big)^2}{I(\theta)}, \qquad \theta \in \Theta, \]

where $I(\theta)$ is the Fisher information.

EXERCISE 15.4.– Given a Poisson process (Nt, t ≥ 0) with intensity θ, we observe X = NT and we wish to predict NT+h (h > 0), which is equivalent to the prediction of Eθ(NT+h|NT).

1) Determine Eθ(NT+h|NT).

2) Show that p(NT) = ((T+h)/T)NT is an unbiased predictor (see Exercise 15.3).

3) Show that p(NT) is efficient (i.e. it reaches the bound obtained in Exercise 15.3).


1 With respect to the data.
