Chapter 9

Introduction to Statistics for Stochastic Processes

9.1. Modeling a family of observations

Let (x_t, t ∈ S) be a family of observations of a phenomenon which may be physical, economic, biological, etc. To model the mechanism that generates the x_t, we may suppose them to be realizations of random variables (X_t, t ∈ S) that are, in general, correlated. The overall phenomenon is described by (X_t, t ∈ T), where S ⊆ T and t is generally interpreted as a time: (X_t, t ∈ T) is said to be a stochastic process or a random function.

If T is denumerable, we have a discrete-time process, and if T is an interval of ℝ, a continuous-time process. If the set S of observation times is itself random, we say that we observe a point process (this notion will be elaborated subsequently).

EXAMPLE 9.1.–

Discrete-time processes:

   1) The daily electricity consumption of Paris.

   2) The monthly number of vehicle registrations in France.

   3) The annual production of gasoline.

   4) The evolution of a population: growth, the extinction of surnames, the propagation of epidemics.

   5) The evolution of sunspots over the past two centuries.

   6) The sequence of results obtained by a sportsman.

Continuous-time processes:

   1) The trajectory of a particle immersed in a fluid, where it is subjected to successive collisions with the molecules of the fluid.

   2) The reading from an electrocardiogram.

   3) The variation in concentration of a chemical solution during a reaction.

   4) The evolution of stock prices during a session.

   5) The number of calls which reach a telephone exchange in an interval of time [0, t], t ≥ 0.

Point processes:

   1) The sequence of instants where telephone calls reach an exchange.

   2) The arrival times of customers at a service window.

   3) A sequence of disasters (earthquakes, car accidents, etc.).

   4) Spatial distributions of plants or animals.

   5) The position of vehicles at a given instant on a portion of road.

9.2. Processes

Let (Ω, 𝒜, P) be a probability space and (E, ℬ) be a measurable space (𝒜 and ℬ are σ-algebras, and P is a probability on 𝒜). Moreover, let (X_t, t ∈ T) be a family of random variables defined on (Ω, 𝒜, P) and with values in (E, ℬ).

We say that (X_t, t ∈ T), or (X_t), is a stochastic process with basis space (Ω, 𝒜, P) and with state space (E, ℬ); T is called the time set.

For fixed ω in Ω, t ↦ X_t(ω) is the trajectory of the point ω. For fixed t in T, ω ↦ X_t(ω) is the state of the process at the moment t.
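
To fix ideas, here is a minimal numerical sketch in Python (the choices are purely illustrative: numpy's random generator plays the role of the basis space, and the process is a simple random walk). A row of the array is a trajectory; a column is the state of the process at a fixed time.

import numpy as np

rng = np.random.default_rng(0)
n_omega, n_times = 5, 100

# each row simulates one point omega of Omega; columns index the times t;
# the random-walk dynamics are an arbitrary illustrative choice
X = rng.standard_normal((n_omega, n_times)).cumsum(axis=1)

trajectory = X[2, :]   # t -> X_t(omega): the trajectory of a fixed omega
state = X[:, 10]       # omega -> X_t(omega): the state at the fixed time t = 10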

9.2.1. The distribution of a process

Let us consider the mapping

X : ω ↦ (X_t(ω), t ∈ T),   (Ω, 𝒜) → (E^T, ζ),

where the σ-algebra ζ is generated by the coordinate mappings Π_t, t ∈ T, with

Π_t(x) = x_t,   x = (x_s, s ∈ T) ∈ E^T.

The relation X_t = Π_t ∘ X implies that:

X^{−1}(Π_t^{−1}(B)) = X_t^{−1}(B) ∈ 𝒜,   B ∈ ℬ, t ∈ T.

Since the Π_t^{−1}(B), B ∈ ℬ, t ∈ T, generate ζ, we conclude that X is measurable (with respect to 𝒜 and ζ).

The distribution P_X of X, defined by:

P_X(C) = P(X^{−1}(C)),   C ∈ ζ,

is called the distribution of the process (X_t).

The process (Π_t, t ∈ T) defined on (E^T, ζ, P_X) is called the canonical process of (X_t), and it has the same distribution as (X_t).

The distributions of the random vectors (X_{t_1}, …, X_{t_k}), k ≥ 1, t_1, …, t_k ∈ T, are called the finite-dimensional distributions of (X_t). If E = ℝ, equipped with its Borel σ-algebra ℬ_ℝ, then it may be shown that the finite-dimensional distributions determine P_X (this is a consequence of the Kolmogorov existence theorem). The random vectors of the form (X_{t_1}, …, X_{t_k}) are called the margins of (X_t).

9.2.2. Gaussian processes

Recall: A real random variable is said to be Gaussian if it may be written in the form aX_0 + b, where X_0 follows the distribution with density (2π)^{−1/2} exp(−x²/2), x ∈ ℝ, and where a and b are constants (a = 0 is not excluded). A random variable with values in ℝ^k is Gaussian if every linear combination of its components is a real Gaussian random variable.

A process (Xt) is said to be Gaussian if its margins are Gaussian.

The functions t ↦ EX_t and (s, t) ↦ Cov(X_s, X_t), called the mean and the covariance of (X_t), respectively, completely determine the distribution of the Gaussian process (X_t), as they determine its finite-dimensional distributions.
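
As an illustration, here is a minimal sketch, assuming (purely for the example) a zero mean and the covariance Cov(X_s, X_t) = exp(−|s − t|): since the margins are Gaussian vectors, the mean and covariance functions suffice to sample any finite-dimensional distribution.

import numpy as np

rng = np.random.default_rng(0)

# hypothetical mean and covariance functions, chosen only for the example
t = np.linspace(0.0, 5.0, 50)                   # observation times t_1, ..., t_k
mean = np.zeros_like(t)                         # t -> E X_t
cov = np.exp(-np.abs(t[:, None] - t[None, :]))  # (s, t) -> Cov(X_s, X_t)

# the margin (X_{t_1}, ..., X_{t_k}) is a Gaussian vector entirely
# determined by (mean, cov), so it can be sampled directly
sample = rng.multivariate_normal(mean, cov)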

9.2.3. Stationary processes

A process (X_t, t ∈ T) is said to be strictly stationary if:

(X_{t_1+h}, …, X_{t_k+h}) =_d (X_{t_1}, …, X_{t_k}),   k ≥ 1; t_1, …, t_k ∈ T; h such that t_i + h ∈ T, i = 1, …, k,

where =_d denotes equality in distribution.

A real, square-integrable process is said to be (weakly) stationary if its mean is constant and its covariance satisfies:

Cov(X_{s+h}, X_{t+h}) = Cov(X_s, X_t),   s, t, s + h, t + h ∈ T.

A real, square-integrable, strictly stationary process is weakly stationary. The converse is not true in general: for example, real, centered, independent random variables X_1, X_2, … with the same variance but different distributions form a weakly stationary but not strictly stationary process. The converse does hold, however, if the process is Gaussian.
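
A minimal simulation of this counter-example (the two alternating distributions chosen here, Gaussian and uniform with unit variance, are our illustrative choice):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# independent, centered, variance 1, but with two alternating distributions:
# N(0, 1) at even indices, uniform on [-sqrt(3), sqrt(3)] at odd indices
x = np.where(np.arange(n) % 2 == 0,
             rng.standard_normal(n),
             rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n))

# same mean and variance at every index: weak stationarity
print(x[0::2].var(), x[1::2].var())    # both close to 1

# but the distributions differ, so the process is not strictly stationary
print((np.abs(x[0::2]) > 2).mean())    # about 0.046 (Gaussian tails)
print((np.abs(x[1::2]) > 2).mean())    # exactly 0 (bounded support)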

EXAMPLE 9.2.–

1) A strong white noise is a sequence (ε_n, n ∈ ℤ) of real, independent, centered, random variables with the same distribution, which are such that:

0 < σ² = Eε_n² < ∞.

If we replace “independent” and “with the same distribution” by “orthogonal” (i.e. Eε_nε_m = 0, n ≠ m), then we obtain a weak white noise.

A strong white noise is strictly stationary, whereas a weak white noise is weakly stationary.

2) A linear process is defined by the relation:

[9.1]   X_n = ∑_{j ∈ ℤ} a_j ε_{n−j},   n ∈ ℤ,

where (ε_n) is a white noise and the a_j are constants such that ∑_{j ∈ ℤ} a_j² < ∞.

In the following, we will adopt the more restrictive condition ∑_j |a_j| < ∞ (see Chapter 10).

The series appearing in [9.1] converges in quadratic mean.

(X_n) is strictly or weakly stationary according to whether (ε_n) is a strong or a weak white noise.
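
The following sketch builds a truncated version of [9.1] from a Gaussian strong white noise, with the illustrative (and absolutely summable) choice a_j = 2^{−j}, j ≥ 0, and checks that the empirical autocovariance depends only on the lag:

import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100_000, 1.0

# strong white noise: i.i.d., centered, 0 < E eps_n^2 = sigma^2 < infinity
eps = rng.normal(0.0, sigma, n)

# one-sided linear process X_n = sum_{j >= 0} a_j eps_{n-j}, truncated at j = 30
a = 0.5 ** np.arange(30)                 # sum_j |a_j| < infinity
X = np.convolve(eps, a, mode="valid")

def acov(x, h):
    """Empirical autocovariance at lag h."""
    x = x - x.mean()
    return (x[: len(x) - h] * x[h:]).mean() if h else (x * x).mean()

# close to sigma^2 * sum_j a_j a_{j+h} = 0.5**h / 0.75, whatever the time origin
print(acov(X, 0), acov(X, 1), acov(X, 2))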

9.2.4. Markov processes

A real process (X_t, t ∈ T) is Markovian if, for every s, t ∈ T such that s < t, the conditional distribution of X_t given (X_u, u ≤ s) is the same as the conditional distribution of X_t given X_s.

For example, if B is a Borel set of ℝ, we have:

P(X_t ∈ B | X_u, u ≤ s) = P(X_t ∈ B | X_s)   a.s.

Many of the processes shown in the following are Markovian: strictly stationary first-order autoregressive processes, Poisson processes, Wiener processes, and diffusion processes.
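
For example, a strictly stationary first-order autoregressive process can be simulated as follows (a minimal sketch; the parameter values are arbitrary). The recursion makes the Markov property visible: given the whole past, the next state depends on the current one only.

import numpy as np

rng = np.random.default_rng(0)
n, rho, sigma = 1_000, 0.8, 1.0   # arbitrary illustrative values, |rho| < 1

x = np.empty(n)
# start from the stationary distribution N(0, sigma^2 / (1 - rho^2))
x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho ** 2))
for k in range(1, n):
    # the conditional distribution of x[k] given the whole past
    # depends on x[k - 1] only: this is the Markov property
    x[k] = rho * x[k - 1] + rng.normal(0.0, sigma)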

9.3. Statistics for stochastic processes

The study of a process that models observed variables may be divided into four steps.

1) Empirical analysis of the observations:

Let us make, for example, the hypothesis that the series is generated by a process of the form:

[9.2]   X_t = g(t) + s(t) + Y_t,   t ∈ S,

where g is a deterministic function that represents the trend, s is a periodic, deterministic function called the seasonality, and (Y_t) is a centered, stationary, stochastic process.

The first step then consists of estimating or eliminating the trend and the seasonality in such a way as to keep only the data of the stationary part (Y_t).

For this, we may suppose that g and s have a particular form. For example, if T = ℕ* and if S = {1, 2, …, n}, we may set:

g(t) = a_0 + a_1 t + ⋯ + a_p t^p,   s(t) = ∑_{j=1}^τ c_j δ_{jt},   t ∈ S,

where δ_{jt} = 1 if t ≡ j (mod τ) and δ_{jt} = 0 otherwise.

Thus, τ is the period of s and

s(t + τ) = s(t),   t, t + τ ∈ S.

The decomposition [9.2] is unique if the functions 1, t, …, t^p, δ_{1t}, …, δ_{τt} are linearly independent. This is not the case, since

δ_{1t} + δ_{2t} + ⋯ + δ_{τt} = 1,   t ∈ S.

We then introduce the additional condition:

c_1 + c_2 + ⋯ + c_τ = 0,

which means that the seasonal effects compensate for each other over one period.

This allows us to construct estimators of a_0, …, a_p; c_1, …, c_τ using the method of least squares, i.e. by minimizing

∑_{t=1}^n ( x_t − a_0 − a_1 t − ⋯ − a_p t^p − ∑_{j=1}^τ c_j δ_{jt} )²

under the constraint

c_1 + c_2 + ⋯ + c_τ = 0.

A numerical sketch of this fit is given after step 4 below.

2) Choice of a stationary model for (Yt):

After eliminating g and s, we may suppose the (modified) observations to be realizations of Y_1, …, Y_n.

Theoretical considerations often allow us to choose a stationary model which is well suited to (Yt). The linear process defined in section 9.2 is one possible choice.

In certain cases, we simply suppose that (Y_t) has stationary increments, i.e. the distribution of Y_{t+h} − Y_{s+h} (s, t, s + h, t + h ∈ T) does not depend on h. The Poisson process, the Wiener process, and the ARIMA process, which we will study later, are very important examples of stationary increment processes.

3) Statistical inference:

To completely identify the chosen process, we estimate the unknown parameters from the observed variables. Some tests allow us to verify that the identified model is well suited to the observations.

4) Use of the identified model:

The identified model may be used to solve problems of control, detection, interpolation or prediction of the future values of (Xt).
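
As announced in step 1, here is a minimal sketch of the least squares fit of model [9.2]; the degree p = 1, the period τ = 4, the coefficient values and the simulated data are all hypothetical, and the constraint c_1 + ⋯ + c_τ = 0 is enforced by substituting c_τ = −(c_1 + ⋯ + c_{τ−1}).

import numpy as np

rng = np.random.default_rng(0)
n, tau = 200, 4
t = np.arange(1, n + 1)

# hypothetical data from model [9.2]: linear trend + period-4 seasonality + noise
c_true = np.array([1.0, -0.5, 0.5, -1.0])     # seasonal effects, sum = 0
x = 2.0 + 0.05 * t + c_true[(t - 1) % tau] + rng.normal(0.0, 0.3, n)

# delta_{jt} dummies; substituting c_tau = -(c_1 + ... + c_{tau-1}) turns the
# constrained problem into an ordinary least squares problem
d = np.eye(tau)[(t - 1) % tau]                # shape (n, tau)
design = np.column_stack([np.ones(n), t, d[:, :-1] - d[:, -1:]])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)

a0_hat, a1_hat = coef[0], coef[1]             # trend estimates
c_hat = np.append(coef[2:], -coef[2:].sum())  # recover c_tau from the constraint
y_hat = x - design @ coef                     # residuals: estimates of (Y_t)

The residuals y_hat then play the role of the observations of the stationary part (Y_t) in step 2.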

9.4. Exercises

EXERCISE 9.1.– Consider the Buys Ballot model:

X_t = a + bt + c_1 δ_{1t} + c_2 δ_{2t} + ε_t,   t = 1, 2, …,

where (ε_t) is a sequence of i.i.d. real random variables with Eε_t = 0 and Eε_t² = σ² < ∞, δ_{1t} = 1 if t is odd and 0 otherwise, and δ_{2t} = 1 − δ_{1t}.

1) Show that the model is not identifiable, i.e. there exist several values of the parameters giving the same function t ↦ EX_t. Show that it is identifiable if one imposes c_1 + c_2 = 0. We will impose this condition in the following.

2) Supposing that we use T = 2N observations, corresponding to two half-yearly observations per year over N years, give the value of the least squares estimators obtained by minimizing:

∑_{t=1}^{2N} ( X_t − a − bt − c_1 δ_{1t} − c_2 δ_{2t} )².

Show that the estimators are unbiased.
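
As a complement (and not a substitute for the requested computation), a minimal Monte Carlo sketch can check the unbiasedness empirically; the true parameter values below are arbitrary, and the constraint c_1 + c_2 = 0 is enforced by the substitution c_2 = −c_1, so that the seasonal term becomes c_1(δ_{1t} − δ_{2t}).

import numpy as np

rng = np.random.default_rng(0)
N, reps = 50, 2_000
a, b, c1, sigma = 1.0, 0.2, 0.7, 1.0   # arbitrary true values; c2 = -c1
t = np.arange(1, 2 * N + 1)
d1 = (t % 2 == 1).astype(float)        # delta_{1t}
# columns: 1, t, delta_{1t} - delta_{2t}
design = np.column_stack([np.ones(2 * N), t, 2.0 * d1 - 1.0])

estimates = np.empty((reps, 3))
for r in range(reps):
    x = design @ np.array([a, b, c1]) + rng.normal(0.0, sigma, 2 * N)
    estimates[r] = np.linalg.lstsq(design, x, rcond=None)[0]

print(estimates.mean(axis=0))          # close to (a, b, c1): unbiasedness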
