Let (X, Y) be a pair of random variables defined on the probability space in which only X is observed. We wish to know what information X carries about Y: this is the filtering problem defined in Chapter 1.
This problem may be formalized in the following way: supposing $Y$ to be real and square integrable, construct a real random variable of the form $r(X)$ that gives the best possible approximation of $Y$ with respect to the quadratic error, i.e. such that $E[(Y - r(X))^2]$ is minimal.
If we identify the random variables with their $P$-equivalence classes, we deduce that $r(X)$ exists and is unique, since it is the orthogonal projection (in the Hilbert space $L^2(P)$) of $Y$ on the closed vector subspace of $L^2(P)$ constituted by the real random variables of the form $h(X)$ such that $E[(h(X))^2] < +\infty$.
From Doob's lemma, the real random variables of the form $h(X)$ are those that are measurable with respect to the σ-algebra generated by $X$. We say that $r(X)$ is the conditional expectation of $Y$ with respect to the σ-algebra generated by $X$ (or with respect to $X$), and that $r$ is the regression of $Y$ on $X$. We write:
$$r(X) = E(Y \mid X).$$
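This minimizing property can be checked directly on a small finite probability space. The following sketch (in Python; the outcomes and probabilities are purely illustrative) computes the regression $r$ by averaging and verifies that it beats other candidates $h(X)$ for the quadratic error:

```python
# Finite probability space: outcomes are pairs (x, y), each with probability p.
# Values below are illustrative.
outcomes = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 6.0)]
probs = [0.25, 0.25, 0.25, 0.25]

def E(f):
    """Expectation of f(X, Y) on this finite space."""
    return sum(p * f(x, y) for (x, y), p in zip(outcomes, probs))

def r(x0):
    """Regression r(x0) = E[Y 1_{X=x0}] / P(X = x0)."""
    return E(lambda x, y: y * (x == x0)) / E(lambda x, y: float(x == x0))

def err(h):
    """Quadratic error E[(Y - h(X))^2] of a candidate h."""
    return E(lambda x, y: (y - h(x)) ** 2)

best = err(r)
# Compare with a grid of other candidates h(0) = a, h(1) = b
worse = min(err(lambda x, a=a, b=b: a if x == 0 else b)
            for a in [0.0, 1.0, 3.0]
            for b in [2.0, 5.0, 6.0])
print(r(0), r(1), best, worse)
```

Here $r(0) = 2$, $r(1) = 4$, and every other candidate on the grid has a strictly larger quadratic error.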
The above equation leads us to the following definition.
DEFINITION 3.1.– Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{B}$ be a sub-σ-algebra of $\mathcal{A}$. We call the orthogonal projection of $L^2(\Omega, \mathcal{A}, P)$ onto $L^2(\Omega, \mathcal{B}, P)$ the conditional expectation with respect to $\mathcal{B}$, denoted by $E^{\mathcal{B}}$ or $E(\cdot \mid \mathcal{B})$.
CHARACTERIZATION: Following from the definition of an orthogonal projection, $E^{\mathcal{B}} Y$ is characterized by:
1) $E^{\mathcal{B}} Y \in L^2(\Omega, \mathcal{B}, P)$;
2) $E[(E^{\mathcal{B}} Y) Z] = E[YZ]$, $Z \in L^2(\Omega, \mathcal{B}, P)$.
We may replace (2) by
2′) $E[(E^{\mathcal{B}} Y) \mathbf{1}_B] = E[Y \mathbf{1}_B]$, $B \in \mathcal{B}$,
which is easily seen using the linearity and the monotone continuity of the integral.
PROPERTIES:
1) $E^{\mathcal{B}}$ is a contracting and idempotent linear map of $L^2(\Omega, \mathcal{A}, P)$ onto $L^2(\Omega, \mathcal{B}, P)$. Moreover, it is positive and it conserves constants.
The first three properties (contraction (i.e. $\|E^{\mathcal{B}} Y\|_2 \le \|Y\|_2$), idempotence (i.e. $E^{\mathcal{B}}(E^{\mathcal{B}} Y) = E^{\mathcal{B}} Y$), and linearity) are characteristic of orthogonal projections.
Its positivity (i.e. $Y \ge 0 \Rightarrow E^{\mathcal{B}} Y \ge 0$ a.s.) is established by noting that, for $Y \ge 0$,
$$E\big[(E^{\mathcal{B}} Y)\, \mathbf{1}_{\{E^{\mathcal{B}} Y < 0\}}\big] = E\big[Y\, \mathbf{1}_{\{E^{\mathcal{B}} Y < 0\}}\big] \ge 0,$$
which implies that $E^{\mathcal{B}} Y \ge 0$ a.s.
Finally, it is clear that $E^{\mathcal{B}} a = a$ for every constant $a$.
COMMENT 3.1.– We may show that the above five properties characterize the operators of $L^2(\Omega, \mathcal{A}, P)$ that are conditional expectations.
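When $\mathcal{B}$ is generated by a finite partition, $E^{\mathcal{B}}$ is simply block-by-block averaging, and the five properties above can be observed concretely. A minimal Python sketch (the partition, probabilities, and values are illustrative):

```python
import math

# Omega = {0,...,5} with uniform P; B is generated by the partition
# {0,1,2} | {3,4,5}. E^B Y averages Y over each block.
p = [1 / 6] * 6                      # uniform probability on Omega
blocks = [[0, 1, 2], [3, 4, 5]]      # the partition generating B

def cond_exp(Y):
    out = [0.0] * 6
    for b in blocks:
        m = sum(p[w] * Y[w] for w in b) / sum(p[w] for w in b)
        for w in b:
            out[w] = m               # E^B Y is constant on each block
    return out

def norm2(Y):
    return math.sqrt(sum(pw * y ** 2 for pw, y in zip(p, Y)))

Y = [1.0, -2.0, 4.0, 0.5, 3.0, -1.0]
EY = cond_exp(Y)
# Contraction, idempotence, positivity, conservation of constants can all be
# read off: norm2(EY) <= norm2(Y), cond_exp(EY) == EY, etc.
print(EY, norm2(EY) <= norm2(Y))
```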
2) $E^{\mathcal{B}}(UY) = U\, E^{\mathcal{B}} Y$ for $U$ $\mathcal{B}$-measurable and bounded.
In effect, $U E^{\mathcal{B}} Y \in L^2(\Omega, \mathcal{B}, P)$, and
$$E\big[(U E^{\mathcal{B}} Y) Z\big] = E\big[(E^{\mathcal{B}} Y)(UZ)\big] = E\big[Y (UZ)\big] = E\big[(UY) Z\big], \quad Z \in L^2(\Omega, \mathcal{B}, P),$$
therefore $U E^{\mathcal{B}} Y$ is indeed the orthogonal projection of $UY$ onto $L^2(\Omega, \mathcal{B}, P)$.
3) $Y_n \downarrow Y$ (with $Y_n, Y \in L^2$) $\Rightarrow E^{\mathcal{B}} Y_n \downarrow E^{\mathcal{B}} Y$. The linearity and positivity of $E^{\mathcal{B}}$ affirm that $\lim_n E^{\mathcal{B}} Y_n$ exists. Yet
$$E\big[(E^{\mathcal{B}} Y_n) Z\big] = E[Y_n Z], \quad Z \in L^2(\Omega, \mathcal{B}, P),$$
and since $|Y_n| \le |Y_1| + |Y|$ and $Y_n \to Y$, by twice applying the dominated convergence theorem, we obtain:
$$E\big[\big(\lim_n E^{\mathcal{B}} Y_n\big) Z\big] = E[YZ], \quad Z \in L^2(\Omega, \mathcal{B}, P).$$
Since $\lim_n E^{\mathcal{B}} Y_n$ is in $L^2(\Omega, \mathcal{B}, P)$, we have $\lim_n E^{\mathcal{B}} Y_n = E^{\mathcal{B}} Y$.
4) If $Y^{-1}(\mathcal{B}_{\mathbb{R}})$ and $\mathcal{B}$ are independent, $E^{\mathcal{B}} Y = E(Y)$. In effect:
$$E\big[(E(Y)) Z\big] = E(Y)\, E(Z) = E[YZ], \quad Z \in L^2(\Omega, \mathcal{B}, P).$$
5) If $\mathcal{B}_1$ and $\mathcal{B}_2$ are two sub-σ-algebras such that $\mathcal{B}_1 \subset \mathcal{B}_2$, then $E^{\mathcal{B}_1} E^{\mathcal{B}_2} = E^{\mathcal{B}_2} E^{\mathcal{B}_1} = E^{\mathcal{B}_1}$.
This is a known property of orthogonal projections.
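Property 5 can be illustrated when the two σ-algebras are generated by nested finite partitions: conditioning is then block-averaging, and averaging over the fine partition followed by the coarse one agrees with averaging over the coarse one directly. A Python sketch (probabilities, partitions, and values are illustrative):

```python
# Tower property E^{B1} E^{B2} = E^{B1} for B1 in B2, on a finite space
# where conditioning on a partition is block-averaging.
p = [0.1, 0.2, 0.3, 0.15, 0.15, 0.1]          # probabilities on Omega = {0..5}
coarse = [[0, 1, 2], [3, 4, 5]]               # generates B1
fine = [[0, 1], [2], [3, 4], [5]]             # refines it, generates B2

def cond_exp(Y, blocks):
    out = list(Y)
    for b in blocks:
        m = sum(p[w] * Y[w] for w in b) / sum(p[w] for w in b)
        for w in b:
            out[w] = m
    return out

Y = [2.0, -1.0, 0.5, 3.0, 1.0, -2.0]
lhs = cond_exp(cond_exp(Y, fine), coarse)     # E^{B1} E^{B2} Y
rhs = cond_exp(Y, coarse)                     # E^{B1} Y
print(lhs, rhs)
```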
6) Extension: We will now define $E^{\mathcal{B}} Y$ when $Y$ is only positive or integrable.
For $Y$ positive, we note that there exists a sequence $(Y_n)$ of positive bounded (and therefore square integrable) real random variables such that $Y_n \uparrow Y$. We then set $E^{\mathcal{B}} Y = \lim_n \uparrow E^{\mathcal{B}} Y_n$. It is straightforward to see that $E^{\mathcal{B}} Y$ is unique, and that it is characterized by:
1) $E^{\mathcal{B}} Y$ is $\mathcal{B}$-measurable and positive;
2 bis) $E[(E^{\mathcal{B}} Y) U] = E[YU]$ for all positive and $\mathcal{B}$-measurable $U$.
(2 bis) may be replaced by:
2′ bis) $E[(E^{\mathcal{B}} Y) \mathbf{1}_B] = E[Y \mathbf{1}_B]$, $B \in \mathcal{B}$.
Among the properties of $E^{\mathcal{B}}$, we may cite the following: for positive $Y$, and positive and $\mathcal{B}$-measurable $U$, we have:
$$E^{\mathcal{B}}(UY) = U\, E^{\mathcal{B}} Y.$$
Now, for $Y \in L^1(\Omega, \mathcal{A}, P)$, we note that $E^{\mathcal{B}} Y^+$ and $E^{\mathcal{B}} Y^-$ are integrable, and we set:
$$E^{\mathcal{B}} Y = E^{\mathcal{B}} Y^+ - E^{\mathcal{B}} Y^-.$$
Again, we have uniqueness, and the characterizations (1)–(2) and (1)–(2′ bis), where it is necessary to replace $L^2$ and $L^2(\Omega, \mathcal{B}, P)$ with $L^1$ and $L^1(\Omega, \mathcal{B}, P)$, respectively. Furthermore, properties (1)–(5) are still valid, with slight modifications. In particular, we have the following important property:
$$E(E^{\mathcal{B}} Y) = E(Y).$$
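The decomposition into positive and negative parts, and the identity $E(E^{\mathcal{B}} Y) = E(Y)$ (obtained by taking $B = \Omega$ in (2′)), can be checked on a finite space where conditioning is block-averaging. A Python sketch with illustrative values:

```python
# Define E^B Y for integrable Y via Y = Y+ - Y-, then check E(E^B Y) = E(Y).
p = [0.25, 0.25, 0.25, 0.25]       # probabilities on Omega = {0,1,2,3}
blocks = [[0, 1], [2, 3]]          # partition generating B
Y = [3.0, -1.0, -2.0, 4.0]

def cond_exp(Z):
    out = [0.0] * len(Z)
    for b in blocks:
        m = sum(p[w] * Z[w] for w in b) / sum(p[w] for w in b)
        for w in b:
            out[w] = m
    return out

def mean(Z):
    return sum(pw * z for pw, z in zip(p, Z))

Yp = [max(y, 0.0) for y in Y]      # Y+
Ym = [max(-y, 0.0) for y in Y]     # Y-
EY = [a - b for a, b in zip(cond_exp(Yp), cond_exp(Ym))]   # E^B Y
print(EY, mean(EY), mean(Y))       # mean(EY) equals mean(Y)
```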
The proofs in this section are left to the reader, as are the extension and the properties of $E^{\mathcal{B}}$ for random variables with values in $\mathbb{R}^d$.
DEFINITION 3.2.– Let $\mathcal{B}$ be a sub-σ-algebra of $\mathcal{A}$ and let $A \in \mathcal{A}$. $E^{\mathcal{B}} \mathbf{1}_A$ is called the conditional probability of $A$ with respect to $\mathcal{B}$ and is written as $P^{\mathcal{B}}(A)$ or $P(A \mid \mathcal{B})$. The mapping $A \mapsto P^{\mathcal{B}}(A)$ is called the conditional probability with respect to $\mathcal{B}$ and is written as $P^{\mathcal{B}}$ or $P(\cdot \mid \mathcal{B})$.
CHARACTERIZATION: Following from the above definition, $P^{\mathcal{B}}(A)$ is characterized by its $\mathcal{B}$-measurability and the formula:
$$E\big[P^{\mathcal{B}}(A)\, \mathbf{1}_B\big] = P(A \cap B), \quad B \in \mathcal{B}.$$
We say that a map $P(\cdot \mid \cdot)$ from $\mathcal{A} \times \Omega$ into $[0, 1]$ is a version of $P^{\mathcal{B}}$ if $P(A \mid \cdot) = P^{\mathcal{B}}(A)$ a.s. for all $A \in \mathcal{A}$.
Furthermore, given a sub-σ-algebra $\mathcal{A}'$ of $\mathcal{A}$, if $P(\cdot \mid \omega)$ is a probability on $\mathcal{A}'$ for almost all $\omega \in \Omega$, we say that $P(\cdot \mid \cdot)$ is a regular version of the conditional probability with respect to $\mathcal{B}$ on $\mathcal{A}'$. Such a version does not always exist.
If $P^{\mathcal{B}}$ is regular on $\mathcal{A}'$, we may write:
$$P^{\mathcal{B}}(A)(\omega) = \int \mathbf{1}_A(\omega')\, P(d\omega' \mid \omega), \quad A \in \mathcal{A}'.$$
By linearity and monotone continuity, it follows that, for positive or integrable and $\mathcal{A}'$-measurable $Y$:
$$E^{\mathcal{B}} Y(\omega) = \int Y(\omega')\, P(d\omega' \mid \omega) \quad \text{a.s.}$$
Let $(X, Y)$ be a pair of random variables with values in measurable spaces $(E_1, \mathcal{E}_1)$ and $(E_2, \mathcal{E}_2)$, respectively. A regular version of $P^{\sigma(X)}$ on $\sigma(Y)$ will be, for all $A$ fixed in $\sigma(Y)$, a function of $X$, which we will write $N(A, X)$¹. The image of $N(\cdot, x)$ by $Y$ is then called the conditional distribution of $Y$ knowing that $X = x$ and is written as $P_Y^{X=x}$ or $P_Y(\cdot \mid X = x)$. The mapping $(B, x) \mapsto P_Y(B \mid X = x)$ is then written as $P_Y^X$ or $P_Y(\cdot \mid X)$ and is called the conditional distribution of $Y$ with respect to $X$; it is defined by the formula:
$$P_Y(B \mid X = x) = N\big(Y^{-1}(B), x\big), \quad B \in \mathcal{E}_2.$$
Now, if $Y$ is a positive or integrable real random variable, the transfer theorem states that:
$$E(Y \mid X = x) = \int y\, P_Y(dy \mid X = x) \quad P_X \text{ a.s.}$$
THEOREM 3.1.– Let $\varphi$ be a $\mathcal{E}_1 \otimes \mathcal{E}_2$-measurable and positive or $P_{(X,Y)}$-integrable real function defined on $E_1 \times E_2$. Then
$$E[\varphi(X, Y)] = \int \left[ \int \varphi(x, y)\, P_Y(dy \mid X = x) \right] P_X(dx),$$
where the function $x \mapsto \int \varphi(x, y)\, P_Y(dy \mid X = x)$ is defined $P_X$ a.s.
This theorem is proved in the following way: the definition of a conditional distribution shows that it is true for $\varphi = \mathbf{1}_{A_1 \times A_2}$, $A_1 \in \mathcal{E}_1$, $A_2 \in \mathcal{E}_2$. We deduce from this that the theorem is true for $\varphi = \mathbf{1}_C$, $C \in \mathcal{E}_1 \otimes \mathcal{E}_2$, and we conclude the demonstration as in the usual Fubini theorem. Details are left to the reader.
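In the discrete case, Theorem 3.1 reduces to an exchange of finite sums, which can be checked directly. A Python sketch with an illustrative joint table and an illustrative $\varphi$:

```python
# E[phi(X, Y)] computed directly and as an iterated sum
# sum_x P(X = x) * sum_y phi(x, y) P(Y = y | X = x).
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}  # P(X = x, Y = y)

def phi(x, y):
    return (x + 1) * y ** 2

direct = sum(p * phi(x, y) for (x, y), p in joint.items())
pX = {x0: sum(p for (x, y), p in joint.items() if x == x0) for x0 in (0, 1)}
iterated = sum(
    pX[x0] * sum(p / pX[x0] * phi(x0, y)
                 for (x, y), p in joint.items() if x == x0)
    for x0 in (0, 1)
)
print(direct, iterated)
```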
1) If $X$ is a discrete random variable with values in a countable set, we may set, for example,
$$P(A \mid X = x) = \frac{P(A \cap \{X = x\})}{P(X = x)} \quad \text{if } P(X = x) > 0,$$
and $P(\cdot \mid X = x)$ arbitrary otherwise.
It is clear that we thus obtain a regular version of the conditional probability with respect to $\sigma(X)$ on $\mathcal{A}$.
For positive or integrable $Y$, we have:
$$E(Y \mid X = x) = \frac{E\big(Y\, \mathbf{1}_{\{X = x\}}\big)}{P(X = x)}, \quad P(X = x) > 0.$$
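These formulas translate directly into code for a finite joint table. A Python sketch (the table values are illustrative):

```python
from collections import defaultdict

# P(X = x, Y = y) for a discrete pair; values are illustrative.
joint = {
    (0, 1): 0.1, (0, 2): 0.3,
    (1, 1): 0.2, (1, 2): 0.4,
}
pX = defaultdict(float)
for (x, y), p in joint.items():
    pX[x] += p                                 # marginal P(X = x)

def p_y_given_x(y0, x0):
    """P(Y = y0 | X = x0) = P(X = x0, Y = y0) / P(X = x0)."""
    return joint.get((x0, y0), 0.0) / pX[x0]

def e_y_given_x(x0):
    """E(Y | X = x0) = E(Y 1_{X=x0}) / P(X = x0)."""
    return sum(y * p for (x, y), p in joint.items() if x == x0) / pX[x0]

print(p_y_given_x(1, 0), e_y_given_x(1))
```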
2) Let $(X, Y)$ be a pair of random variables with values in $\mathbb{R}^2$ and density $f(x, y)$ with respect to the Lebesgue measure $dx\,dy$ on $\mathbb{R}^2$. The density of $X$ is then $f_X(x) = \int f(x, y)\, dy$, and we may set:
$$f(y \mid x) = \frac{f(x, y)}{f_X(x)} \quad \text{if } f_X(x) > 0,$$
and $f(\cdot \mid x)$ an arbitrary density otherwise.
The function $f(\cdot \mid x)$ is a density on $\mathbb{R}$ called the density of $Y$ knowing that $X = x$, and this is the density of the conditional distribution of $Y$ knowing that $X = x$.
We therefore have, for positive or integrable $Y$:
$$E(Y \mid X = x) = \int y\, f(y \mid x)\, dy \quad P_X \text{ a.s.}$$
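As a numerical illustration, take the joint density $f(x, y) = x + y$ on $[0,1]^2$ (which integrates to 1); then $f_X(x) = x + 1/2$ and $E(Y \mid X = x) = (x/2 + 1/3)/(x + 1/2)$. A crude midpoint rule recovers this (Python sketch; the density is an illustrative choice):

```python
# Conditional expectation from a joint density, by numeric integration in y.
n = 2000
h = 1.0 / n
ys = [(k + 0.5) * h for k in range(n)]         # midpoints on [0, 1]

def f(x, y):
    return x + y                               # joint density on [0,1]^2

def e_y_given_x(x):
    fx = h * sum(f(x, y) for y in ys)          # numeric f_X(x)
    return h * sum(y * f(x, y) for y in ys) / fx   # numeric  E(Y | X = x)

x0 = 0.3
closed_form = (x0 / 2 + 1 / 3) / (x0 + 0.5)
print(e_y_given_x(x0), closed_form)
```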
EXAMPLE 3.1.– Let $(X, Y)$ be a non-degenerate two-dimensional Gaussian variable. The conditional distribution of $Y$ knowing that $X = x$ is a Gaussian distribution with expectation $E(Y) + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}\, X}(x - E(X))$ and standard deviation $\sigma_Y \sqrt{1 - \rho^2}$, where $\rho$ is the correlation coefficient of $X$ and $Y$. Consequently, the regression of $Y$ on $X$ is an affine function:
$$r(x) = E(Y) + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}\, X}\,(x - E(X)).$$
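This affine regression can be recovered by simulation: sample a bivariate Gaussian and compare the empirical least-squares line with the formula. A Python sketch (the parameters are illustrative):

```python
import random

# Simulate (X, Y) Gaussian with X ~ N(2, 1), Y ~ N(1, 1), correlation rho.
random.seed(0)
rho, n = 0.6, 200_000
xs, ys = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(2.0 + z1)
    ys.append(1.0 + rho * z1 + (1 - rho ** 2) ** 0.5 * z2)

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var = sum((x - mx) ** 2 for x in xs) / n
slope = cov / var                  # theory: Cov(X,Y)/Var X = rho = 0.6
intercept = my - slope * mx        # theory: E(Y) - 0.6 * E(X) = -0.2
print(slope, intercept)
```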
EXERCISE 3.1.– Give a proof of the properties indicated in section 3.2. We may, in particular, define $E^{\mathcal{B}} Y$ for $Y = (Y_1, \ldots, Y_d)$ with values in $\mathbb{R}^d$ by setting:
$$E^{\mathcal{B}} Y = \big(E^{\mathcal{B}} Y_1, \ldots, E^{\mathcal{B}} Y_d\big).$$
EXERCISE 3.2. (martingale).– Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(\mathcal{B}_n, n \ge 1)$ be a sequence of sub-σ-algebras of $\mathcal{A}$, increasing for inclusion. We consider a sequence $(X_n, n \ge 1)$ of integrable and $(\mathcal{B}_n)$-adapted (i.e. each $X_n$ is $\mathcal{B}_n$-measurable) real random variables. We say that $(X_n)$ is a martingale if:
$$E^{\mathcal{B}_n}(X_{n+1}) = X_n \quad \text{a.s.}, \quad n \ge 1.$$
1) Show that $(X_n)$ is a martingale if and only if there exists an integrable and $(\mathcal{B}_n)$-adapted sequence $(Y_n)$ such that:
$$X_n = Y_1 + \cdots + Y_n, \quad n \ge 1,$$
and
$$E^{\mathcal{B}_n}(Y_{n+1}) = 0, \quad n \ge 1.$$
2) Show that, if the $X_n$ are square integrable, the $Y_n$ are pairwise orthogonal (i.e. $E(Y_m Y_n) = 0$ for $m \ne n$).
3) Let $X$ be an integrable, real random variable. Show that $(E^{\mathcal{B}_n} X, n \ge 1)$ is a martingale.
4) Let $(\xi_n, n \ge 1)$ be a sequence of zero-mean, integrable, and independent real random variables with the same distribution. We set $X_n = \xi_1 + \cdots + \xi_n$, $n \ge 1$, and we denote the σ-algebra generated by $\xi_1, \ldots, \xi_n$ by $\mathcal{B}_n$, $n \ge 1$. Show that $(X_n)$ is a $(\mathcal{B}_n)$-adapted martingale.
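For question 4, the martingale property can be verified by exact enumeration in the illustrative special case where the $\xi_i$ are uniform on $\{-1, +1\}$ (a Python sketch, not a proof):

```python
import itertools

# All 2^3 equally likely sign paths (xi_i uniform on {-1, +1}, independent).
paths = list(itertools.product([-1, 1], repeat=3))
n = 2
ok = True
for prefix in itertools.product([-1, 1], repeat=n):
    cond = [p for p in paths if p[:n] == prefix]            # xi_1..xi_n fixed
    e_next = sum(sum(p[:n + 1]) for p in cond) / len(cond)  # E(X_{n+1} | xi_1..xi_n)
    ok = ok and e_next == sum(prefix)                       # should equal X_n
print(ok)
```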
EXERCISE 3.3.– Let $X, Y, Z$ be random variables taking values in countable sets. The probabilities (conditional or otherwise) of the events below are all assumed to be strictly positive. We make the following assumption:
$$P(Z = z \mid X = x, Y = y) = P(Z = z \mid Y = y).$$
Show that:
$$P(X = x, Z = z \mid Y = y) = P(X = x \mid Y = y)\, P(Z = z \mid Y = y),$$
that is, $X$ and $Z$ are independent, given $Y$.
EXERCISE 3.4. (Markov chain).– Let $(X_n, n \ge 1)$ be a sequence of random variables taking values in a countable set $D$. We say that this is a Markov chain if:
$$P(X_{n+1} = x_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n),$$
$x_1, \ldots, x_{n+1} \in D$, $n \ge 1$. Use Exercise 3.3 to show that $(X_1, \ldots, X_{n-1})$ and $(X_{n+1}, \ldots, X_{n+k})$ are independent given that $X_n = x_n$.
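The conditional-independence conclusion can be verified by exact computation on a small chain. A Python sketch with an illustrative two-state transition matrix, checking that $X_1$ and $X_3$ are independent given $X_2$:

```python
import itertools

states = [0, 1]
init = [0.5, 0.5]                     # distribution of X_1 (illustrative)
P = [[0.7, 0.3], [0.4, 0.6]]          # transition probabilities (illustrative)

def prob(path):
    """P(X_1 = path[0], ..., X_k = path[-1]) for this chain."""
    pr = init[path[0]]
    for a, b in zip(path, path[1:]):
        pr *= P[a][b]
    return pr

ok = True
for x1, x2, x3 in itertools.product(states, repeat=3):
    p2 = sum(prob((a, x2, c)) for a in states for c in states)  # P(X_2 = x2)
    joint = prob((x1, x2, x3)) / p2            # P(X_1 = x1, X_3 = x3 | X_2 = x2)
    left = sum(prob((x1, x2, c)) for c in states) / p2   # P(X_1 = x1 | X_2 = x2)
    right = sum(prob((a, x2, x3)) for a in states) / p2  # P(X_3 = x3 | X_2 = x2)
    ok = ok and abs(joint - left * right) < 1e-9
print(ok)
```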
EXERCISE 3.5. (Markov process).– Let $(X_t, t \in T)$ be a family of real random variables defined on the probability space $(\Omega, \mathcal{A}, P)$, where $T \subset \mathbb{R}$. We set $\tau_n = \{t_1, \ldots, t_n\}$ ($t_1 < \cdots < t_n$) and we denote by $\mathcal{B}_{\tau_n}$ the σ-algebra generated by $X_{t_1}, \ldots, X_{t_n}$ and by $\mathcal{B}_t$ the σ-algebra generated by $X_t$. Show that the following conditions are equivalent:
1) For all $t_1 < t_2 < \cdots < t_n < t_{n+1}$ ($t_j \in T$), $n \ge 1$,
$$P\big(X_{t_{n+1}} \in B \,\big|\, \mathcal{B}_{\tau_n}\big) = P\big(X_{t_{n+1}} \in B \,\big|\, \mathcal{B}_{t_n}\big), \quad B \in \mathcal{B}_{\mathbb{R}}.$$
2) For all $t \in T$, if we denote by $\mathcal{P}_t$ the σ-algebra generated by $X_s$, $s \le t$, $s \in T$, and by $\mathcal{F}_t$ the σ-algebra generated by $X_{s'}$, $s' \ge t$, $s' \in T$, then we have:
$$P\big(A_1 \cap A_2 \,\big|\, \mathcal{B}_t\big) = P\big(A_1 \,\big|\, \mathcal{B}_t\big)\, P\big(A_2 \,\big|\, \mathcal{B}_t\big), \quad A_1 \in \mathcal{P}_t,\ A_2 \in \mathcal{F}_t.$$
¹ With the above notation, we have $N(A, X) = P(A \mid X)$, $A \in \sigma(Y)$.