Let (X, Y) be a pair of random variables defined on the probability space in which only X is observed. We wish to know what information X carries about Y: this is the filtering problem defined in Chapter 1.
This problem may be formalized in the following way: supposing $Y$ to be real and square integrable, construct a real random variable of the form $r(X)$ that gives the best possible approximation of $Y$ with respect to the quadratic error, i.e. such that $E[(Y - r(X))^2]$ is minimal.
If we identify the random variables with their $P$-equivalence classes, we deduce that $r(X)$ exists and is unique, since it is the orthogonal projection (in the Hilbert space $L^2(P)$) of $Y$ on the closed vector subspace of $L^2(P)$ constituted by the real random variables of the form $h(X)$ such that $E[(h(X))^2] < +\infty$.
From Doob's lemma, the real random variables of the form $h(X)$ are those that are measurable with respect to the σ-algebra generated by $X$. We say that $r(X)$ is the conditional expectation of $Y$ with respect to the σ-algebra generated by $X$ (or with respect to $X$), and that $r$ is the regression of $Y$ on $X$. We write:
$$r(X) = E(Y \mid X).$$
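This minimizing property can be checked directly on a small finite probability space. The following sketch (in Python; the outcomes and probabilities are purely illustrative) computes the regression $r$ by averaging and verifies that it beats other candidates $h(X)$ for the quadratic error:

```python
# Finite probability space: outcomes are pairs (x, y), each with probability p.
# Values below are illustrative.
outcomes = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 6.0)]
probs = [0.25, 0.25, 0.25, 0.25]

def E(f):
    """Expectation of f(X, Y) on this finite space."""
    return sum(p * f(x, y) for (x, y), p in zip(outcomes, probs))

def r(x0):
    """Regression r(x0) = E[Y 1_{X=x0}] / P(X = x0)."""
    return E(lambda x, y: y * (x == x0)) / E(lambda x, y: float(x == x0))

def err(h):
    """Quadratic error E[(Y - h(X))^2] of a candidate h."""
    return E(lambda x, y: (y - h(x)) ** 2)

best = err(r)
# Compare with a grid of other candidates h(0) = a, h(1) = b
worse = min(err(lambda x, a=a, b=b: a if x == 0 else b)
            for a in [0.0, 1.0, 3.0]
            for b in [2.0, 5.0, 6.0])
print(r(0), r(1), best, worse)
```

Here $r(0) = 2$, $r(1) = 4$, and every other candidate on the grid has a strictly larger quadratic error.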
The above equation leads us to the following definition.
DEFINITION 3.1.– Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{B}$ be a sub-σ-algebra of $\mathcal{A}$. We call the orthogonal projection of $L^2(\Omega, \mathcal{A}, P)$ onto $L^2(\Omega, \mathcal{B}, P)$ the conditional expectation with respect to $\mathcal{B}$, denoted by $E^{\mathcal{B}}$ or $E(\cdot \mid \mathcal{B})$.
CHARACTERIZATION: Following from the definition of an orthogonal projection, $E^{\mathcal{B}} Y$ is characterized by:
1) $E^{\mathcal{B}} Y \in L^2(\Omega, \mathcal{B}, P)$;
2) $E[(E^{\mathcal{B}} Y) Z] = E[YZ]$, $Z \in L^2(\Omega, \mathcal{B}, P)$.
We may replace (2) by
2′) $E[(E^{\mathcal{B}} Y) \mathbf{1}_B] = E[Y \mathbf{1}_B]$, $B \in \mathcal{B}$,
which is easily seen using the linearity and the monotone continuity of the integral.
PROPERTIES:
1) $E^{\mathcal{B}}$ is a contracting and idempotent linear map of $L^2(\Omega, \mathcal{A}, P)$ onto $L^2(\Omega, \mathcal{B}, P)$. Moreover, it is positive and it conserves constants.
The first three properties (contraction (i.e. $\|E^{\mathcal{B}} Y\|_2 \le \|Y\|_2$), idempotence (i.e. $E^{\mathcal{B}}(E^{\mathcal{B}} Y) = E^{\mathcal{B}} Y$), and linearity) are characteristic of orthogonal projections.
Its positivity (i.e. $Y \ge 0 \Rightarrow E^{\mathcal{B}} Y \ge 0$ a.s.) is established by noting that, for $Y \ge 0$,
$$E\big[(E^{\mathcal{B}} Y)\, \mathbf{1}_{\{E^{\mathcal{B}} Y < 0\}}\big] = E\big[Y\, \mathbf{1}_{\{E^{\mathcal{B}} Y < 0\}}\big] \ge 0,$$
which implies that $E^{\mathcal{B}} Y \ge 0$ a.s.
Finally, it is clear that $E^{\mathcal{B}} a = a$ for every constant $a$.
COMMENT 3.1.– We may show that the above five properties characterize the operators of $L^2(\Omega, \mathcal{A}, P)$ that are conditional expectations.
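When $\mathcal{B}$ is generated by a finite partition, $E^{\mathcal{B}}$ is simply block-by-block averaging, and the five properties above can be observed concretely. A minimal Python sketch (the partition, probabilities, and values are illustrative):

```python
import math

# Omega = {0,...,5} with uniform P; B is generated by the partition
# {0,1,2} | {3,4,5}. E^B Y averages Y over each block.
p = [1 / 6] * 6                      # uniform probability on Omega
blocks = [[0, 1, 2], [3, 4, 5]]      # the partition generating B

def cond_exp(Y):
    out = [0.0] * 6
    for b in blocks:
        m = sum(p[w] * Y[w] for w in b) / sum(p[w] for w in b)
        for w in b:
            out[w] = m               # E^B Y is constant on each block
    return out

def norm2(Y):
    return math.sqrt(sum(pw * y ** 2 for pw, y in zip(p, Y)))

Y = [1.0, -2.0, 4.0, 0.5, 3.0, -1.0]
EY = cond_exp(Y)
# Contraction, idempotence, positivity, conservation of constants can all be
# read off: norm2(EY) <= norm2(Y), cond_exp(EY) == EY, etc.
print(EY, norm2(EY) <= norm2(Y))
```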
2) $E^{\mathcal{B}}(UY) = U\, E^{\mathcal{B}} Y$ for $U$ $\mathcal{B}$-measurable and bounded.
In effect, $U E^{\mathcal{B}} Y \in L^2(\Omega, \mathcal{B}, P)$, and
$$E\big[(U E^{\mathcal{B}} Y) Z\big] = E\big[(E^{\mathcal{B}} Y)(UZ)\big] = E\big[Y (UZ)\big] = E\big[(UY) Z\big], \quad Z \in L^2(\Omega, \mathcal{B}, P),$$
therefore $U E^{\mathcal{B}} Y$ is indeed the orthogonal projection of $UY$ onto $L^2(\Omega, \mathcal{B}, P)$.
3) $Y_n \downarrow Y$ (with $Y_n, Y \in L^2$) $\Rightarrow E^{\mathcal{B}} Y_n \downarrow E^{\mathcal{B}} Y$. The linearity and positivity of $E^{\mathcal{B}}$ affirm that $\lim_n E^{\mathcal{B}} Y_n$ exists. Yet
$$E\big[(E^{\mathcal{B}} Y_n) Z\big] = E[Y_n Z], \quad Z \in L^2(\Omega, \mathcal{B}, P),$$
and since $|Y_n| \le |Y_1| + |Y|$ and $Y_n \to Y$, by twice applying the dominated convergence theorem, we obtain:
$$E\big[\big(\lim_n E^{\mathcal{B}} Y_n\big) Z\big] = E[YZ], \quad Z \in L^2(\Omega, \mathcal{B}, P).$$
Since $\lim_n E^{\mathcal{B}} Y_n$ is in $L^2(\Omega, \mathcal{B}, P)$, we have $\lim_n E^{\mathcal{B}} Y_n = E^{\mathcal{B}} Y$.
4) If $Y^{-1}(\mathcal{B}_{\mathbb{R}})$ and $\mathcal{B}$ are independent, $E^{\mathcal{B}} Y = E(Y)$. In effect:
$$E\big[(E(Y)) Z\big] = E(Y)\, E(Z) = E[YZ], \quad Z \in L^2(\Omega, \mathcal{B}, P).$$
5) If $\mathcal{B}_1$ and $\mathcal{B}_2$ are two sub-σ-algebras such that $\mathcal{B}_1 \subset \mathcal{B}_2$, then $E^{\mathcal{B}_1} E^{\mathcal{B}_2} = E^{\mathcal{B}_2} E^{\mathcal{B}_1} = E^{\mathcal{B}_1}$.
This is a known property of orthogonal projections.
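Property 5 can be illustrated when the two σ-algebras are generated by nested finite partitions: conditioning is then block-averaging, and averaging over the fine partition followed by the coarse one agrees with averaging over the coarse one directly. A Python sketch (probabilities, partitions, and values are illustrative):

```python
# Tower property E^{B1} E^{B2} = E^{B1} for B1 in B2, on a finite space
# where conditioning on a partition is block-averaging.
p = [0.1, 0.2, 0.3, 0.15, 0.15, 0.1]          # probabilities on Omega = {0..5}
coarse = [[0, 1, 2], [3, 4, 5]]               # generates B1
fine = [[0, 1], [2], [3, 4], [5]]             # refines it, generates B2

def cond_exp(Y, blocks):
    out = list(Y)
    for b in blocks:
        m = sum(p[w] * Y[w] for w in b) / sum(p[w] for w in b)
        for w in b:
            out[w] = m
    return out

Y = [2.0, -1.0, 0.5, 3.0, 1.0, -2.0]
lhs = cond_exp(cond_exp(Y, fine), coarse)     # E^{B1} E^{B2} Y
rhs = cond_exp(Y, coarse)                     # E^{B1} Y
print(lhs, rhs)
```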
6) Extension: We will now define $E^{\mathcal{B}} Y$ when $Y$ is only positive or integrable.
For $Y$ positive, we note that there exists a sequence $(Y_n)$ of positive bounded (and therefore square integrable) real random variables such that $Y_n \uparrow Y$. We then set $E^{\mathcal{B}} Y = \lim_n \uparrow E^{\mathcal{B}} Y_n$. It is straightforward to see that $E^{\mathcal{B}} Y$ is unique, and that it is characterized by:
1) $E^{\mathcal{B}} Y$ is $\mathcal{B}$-measurable and positive;
2 bis) $E[(E^{\mathcal{B}} Y) U] = E[YU]$ for all positive and $\mathcal{B}$-measurable $U$.
(2 bis) may be replaced by:
2′ bis) $E[(E^{\mathcal{B}} Y) \mathbf{1}_B] = E[Y \mathbf{1}_B]$, $B \in \mathcal{B}$.
Among the properties of $E^{\mathcal{B}}$, we may cite the following: for positive $Y$, and positive and $\mathcal{B}$-measurable $U$, we have:
$$E^{\mathcal{B}}(UY) = U\, E^{\mathcal{B}} Y.$$
Now, for $Y \in L^1(\Omega, \mathcal{A}, P)$, we note that $E^{\mathcal{B}} Y^+$ and $E^{\mathcal{B}} Y^-$ are integrable, and we set:
$$E^{\mathcal{B}} Y = E^{\mathcal{B}} Y^+ - E^{\mathcal{B}} Y^-.$$
Again, we have uniqueness, and the characterizations (1)–(2) and (1)–(2′ bis), where it is necessary to replace $L^2$ and $L^2(\Omega, \mathcal{B}, P)$ with $L^1$ and $L^1(\Omega, \mathcal{B}, P)$, respectively. Furthermore, properties (1)–(5) are still valid, with slight modifications. In particular, we have the following important property:
$$E(E^{\mathcal{B}} Y) = E(Y).$$
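The decomposition into positive and negative parts, and the identity $E(E^{\mathcal{B}} Y) = E(Y)$ (obtained by taking $B = \Omega$ in (2′)), can be checked on a finite space where conditioning is block-averaging. A Python sketch with illustrative values:

```python
# Define E^B Y for integrable Y via Y = Y+ - Y-, then check E(E^B Y) = E(Y).
p = [0.25, 0.25, 0.25, 0.25]       # probabilities on Omega = {0,1,2,3}
blocks = [[0, 1], [2, 3]]          # partition generating B
Y = [3.0, -1.0, -2.0, 4.0]

def cond_exp(Z):
    out = [0.0] * len(Z)
    for b in blocks:
        m = sum(p[w] * Z[w] for w in b) / sum(p[w] for w in b)
        for w in b:
            out[w] = m
    return out

def mean(Z):
    return sum(pw * z for pw, z in zip(p, Z))

Yp = [max(y, 0.0) for y in Y]      # Y+
Ym = [max(-y, 0.0) for y in Y]     # Y-
EY = [a - b for a, b in zip(cond_exp(Yp), cond_exp(Ym))]   # E^B Y
print(EY, mean(EY), mean(Y))       # mean(EY) equals mean(Y)
```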
The proofs in this section are left to the reader, as are the extension and the properties of $E^{\mathcal{B}}$ for random variables with values in $\mathbb{R}^d$.
DEFINITION 3.2.– Let $\mathcal{B}$ be a sub-σ-algebra of $\mathcal{A}$ and let $A \in \mathcal{A}$. $E^{\mathcal{B}} \mathbf{1}_A$ is called the conditional probability of $A$ with respect to $\mathcal{B}$ and is written as $P^{\mathcal{B}}(A)$ or $P(A \mid \mathcal{B})$. The mapping $A \mapsto P^{\mathcal{B}}(A)$ is called the conditional probability with respect to $\mathcal{B}$ and is written as $P^{\mathcal{B}}$ or $P(\cdot \mid \mathcal{B})$.
CHARACTERIZATION: Following from the above definition, $P^{\mathcal{B}}(A)$ is characterized by its $\mathcal{B}$-measurability and the formula:
$$E\big[P^{\mathcal{B}}(A)\, \mathbf{1}_B\big] = P(A \cap B), \quad B \in \mathcal{B}.$$
We say that a map $P(\cdot \mid \cdot)$ from $\mathcal{A} \times \Omega$ into $[0, 1]$ is a version of $P^{\mathcal{B}}$ if $P(A \mid \cdot) = P^{\mathcal{B}}(A)$ a.s. for all $A \in \mathcal{A}$.
Furthermore, given a sub-σ-algebra $\mathcal{A}'$ of $\mathcal{A}$, if $P(\cdot \mid \omega)$ is a probability on $\mathcal{A}'$ for almost all $\omega \in \Omega$, we say that $P(\cdot \mid \cdot)$ is a regular version of the conditional probability with respect to $\mathcal{B}$ on $\mathcal{A}'$. Such a version does not always exist.
If $P^{\mathcal{B}}$ is regular on $\mathcal{A}'$, we may write:
$$P^{\mathcal{B}}(A)(\omega) = \int \mathbf{1}_A(\omega')\, P(d\omega' \mid \omega), \quad A \in \mathcal{A}'.$$
By linearity and monotone continuity, it follows that, for positive or integrable and $\mathcal{A}'$-measurable $Y$:
$$E^{\mathcal{B}} Y(\omega) = \int Y(\omega')\, P(d\omega' \mid \omega) \quad \text{a.s.}$$
Let $(X, Y)$ be a pair of random variables with values in measurable spaces $(E_1, \mathcal{E}_1)$ and $(E_2, \mathcal{E}_2)$, respectively. A regular version of $P^{\sigma(X)}$ on $\sigma(Y)$ will be, for all $A$ fixed in $\sigma(Y)$, a function of $X$, which we will write $N(A, X)$¹. The image of $N(\cdot, x)$ by $Y$ is then called the conditional distribution of $Y$ knowing that $X = x$ and is written as $P_Y^{X=x}$ or $P_Y(\cdot \mid X = x)$. The mapping $(B, x) \mapsto P_Y(B \mid X = x)$ is then written as $P_Y^X$ or $P_Y(\cdot \mid X)$ and is called the conditional distribution of $Y$ with respect to $X$; it is defined by the formula:
$$P_Y(B \mid X = x) = N\big(Y^{-1}(B), x\big), \quad B \in \mathcal{E}_2.$$
Now, if $Y$ is a positive or integrable real random variable, the transfer theorem states that:
$$E(Y \mid X = x) = \int y\, P_Y(dy \mid X = x) \quad P_X \text{ a.s.}$$
THEOREM 3.1.– Let $\varphi$ be a $\mathcal{E}_1 \otimes \mathcal{E}_2$-measurable and positive or $P_{(X,Y)}$-integrable real function defined on $E_1 \times E_2$. Then
$$E[\varphi(X, Y)] = \int \left[ \int \varphi(x, y)\, P_Y(dy \mid X = x) \right] P_X(dx),$$
where the function $x \mapsto \int \varphi(x, y)\, P_Y(dy \mid X = x)$ is defined $P_X$ a.s.
This theorem is proved in the following way: the definition of a conditional distribution shows that it is true for $\varphi = \mathbf{1}_{A_1 \times A_2}$, $A_1 \in \mathcal{E}_1$, $A_2 \in \mathcal{E}_2$. We deduce from this that the theorem is true for $\varphi = \mathbf{1}_C$, $C \in \mathcal{E}_1 \otimes \mathcal{E}_2$, and we conclude the demonstration as in the usual Fubini theorem. Details are left to the reader.
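In the discrete case, Theorem 3.1 reduces to an exchange of finite sums, which can be checked directly. A Python sketch with an illustrative joint table and an illustrative $\varphi$:

```python
# E[phi(X, Y)] computed directly and as an iterated sum
# sum_x P(X = x) * sum_y phi(x, y) P(Y = y | X = x).
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}  # P(X = x, Y = y)

def phi(x, y):
    return (x + 1) * y ** 2

direct = sum(p * phi(x, y) for (x, y), p in joint.items())
pX = {x0: sum(p for (x, y), p in joint.items() if x == x0) for x0 in (0, 1)}
iterated = sum(
    pX[x0] * sum(p / pX[x0] * phi(x0, y)
                 for (x, y), p in joint.items() if x == x0)
    for x0 in (0, 1)
)
print(direct, iterated)
```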
1) If $X$ is a discrete random variable with values in a countable set, we may set, for example,
$$P(A \mid X = x) = \frac{P(A \cap \{X = x\})}{P(X = x)} \quad \text{if } P(X = x) > 0,$$
and $P(\cdot \mid X = x)$ arbitrary otherwise.
It is clear that we thus obtain a regular version of the conditional probability with respect to $\sigma(X)$ on $\mathcal{A}$.
For positive or integrable $Y$, we have:
$$E(Y \mid X = x) = \frac{E\big(Y\, \mathbf{1}_{\{X = x\}}\big)}{P(X = x)}, \quad P(X = x) > 0.$$
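These formulas translate directly into code for a finite joint table. A Python sketch (the table values are illustrative):

```python
from collections import defaultdict

# P(X = x, Y = y) for a discrete pair; values are illustrative.
joint = {
    (0, 1): 0.1, (0, 2): 0.3,
    (1, 1): 0.2, (1, 2): 0.4,
}
pX = defaultdict(float)
for (x, y), p in joint.items():
    pX[x] += p                                 # marginal P(X = x)

def p_y_given_x(y0, x0):
    """P(Y = y0 | X = x0) = P(X = x0, Y = y0) / P(X = x0)."""
    return joint.get((x0, y0), 0.0) / pX[x0]

def e_y_given_x(x0):
    """E(Y | X = x0) = E(Y 1_{X=x0}) / P(X = x0)."""
    return sum(y * p for (x, y), p in joint.items() if x == x0) / pX[x0]

print(p_y_given_x(1, 0), e_y_given_x(1))
```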
2) Let $(X, Y)$ be a pair of random variables with values in $\mathbb{R}^2$ and density $f(x, y)$ with respect to the Lebesgue measure $dx\,dy$ on $\mathbb{R}^2$. The density of $X$ is then $f_X(x) = \int f(x, y)\, dy$, and we may set:
$$f(y \mid x) = \frac{f(x, y)}{f_X(x)} \quad \text{if } f_X(x) > 0,$$
and $f(\cdot \mid x)$ an arbitrary density otherwise.
The function $f(\cdot \mid x)$ is a density on $\mathbb{R}$ called the density of $Y$ knowing that $X = x$, and this is the density of the conditional distribution of $Y$ knowing that $X = x$.
We therefore have, for positive or integrable $Y$:
$$E(Y \mid X = x) = \int y\, f(y \mid x)\, dy \quad P_X \text{ a.s.}$$
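As a numerical illustration, take the joint density $f(x, y) = x + y$ on $[0,1]^2$ (which integrates to 1); then $f_X(x) = x + 1/2$ and $E(Y \mid X = x) = (x/2 + 1/3)/(x + 1/2)$. A crude midpoint rule recovers this (Python sketch; the density is an illustrative choice):

```python
# Conditional expectation from a joint density, by numeric integration in y.
n = 2000
h = 1.0 / n
ys = [(k + 0.5) * h for k in range(n)]         # midpoints on [0, 1]

def f(x, y):
    return x + y                               # joint density on [0,1]^2

def e_y_given_x(x):
    fx = h * sum(f(x, y) for y in ys)          # numeric f_X(x)
    return h * sum(y * f(x, y) for y in ys) / fx   # numeric  E(Y | X = x)

x0 = 0.3
closed_form = (x0 / 2 + 1 / 3) / (x0 + 0.5)
print(e_y_given_x(x0), closed_form)
```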
EXAMPLE 3.1.– Let $(X, Y)$ be a non-degenerate two-dimensional Gaussian variable. The conditional distribution of $Y$ knowing that $X = x$ is a Gaussian distribution with expectation $E(Y) + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}\, X}(x - E(X))$ and standard deviation $\sigma_Y \sqrt{1 - \rho^2}$, where $\rho$ is the correlation coefficient of $X$ and $Y$. Consequently, the regression of $Y$ on $X$ is an affine function:
$$r(x) = E(Y) + \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}\, X}\,(x - E(X)).$$
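This affine regression can be recovered by simulation: sample a bivariate Gaussian and compare the empirical least-squares line with the formula. A Python sketch (the parameters are illustrative):

```python
import random

# Simulate (X, Y) Gaussian with X ~ N(2, 1), Y ~ N(1, 1), correlation rho.
random.seed(0)
rho, n = 0.6, 200_000
xs, ys = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(2.0 + z1)
    ys.append(1.0 + rho * z1 + (1 - rho ** 2) ** 0.5 * z2)

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var = sum((x - mx) ** 2 for x in xs) / n
slope = cov / var                  # theory: Cov(X,Y)/Var X = rho = 0.6
intercept = my - slope * mx        # theory: E(Y) - 0.6 * E(X) = -0.2
print(slope, intercept)
```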
EXERCISE 3.1.– Give a proof of the properties indicated in section 3.2. We may, in particular, define $E^{\mathcal{B}} Y$ for $Y = (Y_1, \ldots, Y_d)$ with values in $\mathbb{R}^d$ by setting:
$$E^{\mathcal{B}} Y = \big(E^{\mathcal{B}} Y_1, \ldots, E^{\mathcal{B}} Y_d\big).$$
EXERCISE 3.2. (martingale).– Let $(\Omega, \mathcal{A}, P)$ be a probability space and $(\mathcal{B}_n, n \ge 1)$ be a sequence of sub-σ-algebras of $\mathcal{A}$, increasing for inclusion. We consider a sequence $(X_n, n \ge 1)$ of integrable and $(\mathcal{B}_n)$-adapted (i.e. each $X_n$ is $\mathcal{B}_n$-measurable) real random variables. We say that $(X_n)$ is a martingale if:
$$E^{\mathcal{B}_n}(X_{n+1}) = X_n \quad \text{a.s.}, \quad n \ge 1.$$
1) Show that $(X_n)$ is a martingale if and only if there exists an integrable and $(\mathcal{B}_n)$-adapted sequence $(Y_n)$ such that:
$$X_n = Y_1 + \cdots + Y_n, \quad n \ge 1,$$
and
$$E^{\mathcal{B}_n}(Y_{n+1}) = 0, \quad n \ge 1.$$
2) Show that, if the $X_n$ are square integrable, the $Y_n$ are pairwise orthogonal (i.e. $E(Y_m Y_n) = 0$ for $m \ne n$).
3) Let $X$ be an integrable, real random variable. Show that $(E^{\mathcal{B}_n} X, n \ge 1)$ is a martingale.
4) Let $(\xi_n, n \ge 1)$ be a sequence of zero-mean, integrable, and independent real random variables with the same distribution. We set $X_n = \xi_1 + \cdots + \xi_n$, $n \ge 1$, and we denote the σ-algebra generated by $\xi_1, \ldots, \xi_n$ by $\mathcal{B}_n$, $n \ge 1$. Show that $(X_n)$ is a $(\mathcal{B}_n)$-adapted martingale.
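For question 4, the martingale property can be verified by exact enumeration in the illustrative special case where the $\xi_i$ are uniform on $\{-1, +1\}$ (a Python sketch, not a proof):

```python
import itertools

# All 2^3 equally likely sign paths (xi_i uniform on {-1, +1}, independent).
paths = list(itertools.product([-1, 1], repeat=3))
n = 2
ok = True
for prefix in itertools.product([-1, 1], repeat=n):
    cond = [p for p in paths if p[:n] == prefix]            # xi_1..xi_n fixed
    e_next = sum(sum(p[:n + 1]) for p in cond) / len(cond)  # E(X_{n+1} | xi_1..xi_n)
    ok = ok and e_next == sum(prefix)                       # should equal X_n
print(ok)
```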
EXERCISE 3.3.– Let $X, Y, Z$ be random variables taking values in countable sets. The probabilities (conditional or otherwise) of the events below are all assumed to be strictly positive. We make the following assumption:
$$P(Z = z \mid X = x, Y = y) = P(Z = z \mid Y = y).$$
Show that:
$$P(X = x, Z = z \mid Y = y) = P(X = x \mid Y = y)\, P(Z = z \mid Y = y),$$
that is, $X$ and $Z$ are independent, given $Y$.
EXERCISE 3.4. (Markov chain).– Let $(X_n, n \ge 1)$ be a sequence of random variables taking values in a countable set $D$. We say that this is a Markov chain if:
$$P(X_{n+1} = x_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n),$$
$x_1, \ldots, x_{n+1} \in D$, $n \ge 1$. Use Exercise 3.3 to show that $(X_1, \ldots, X_{n-1})$ and $(X_{n+1}, \ldots, X_{n+k})$ are independent given that $X_n = x_n$.
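The conditional-independence conclusion can be verified by exact computation on a small chain. A Python sketch with an illustrative two-state transition matrix, checking that $X_1$ and $X_3$ are independent given $X_2$:

```python
import itertools

states = [0, 1]
init = [0.5, 0.5]                     # distribution of X_1 (illustrative)
P = [[0.7, 0.3], [0.4, 0.6]]          # transition probabilities (illustrative)

def prob(path):
    """P(X_1 = path[0], ..., X_k = path[-1]) for this chain."""
    pr = init[path[0]]
    for a, b in zip(path, path[1:]):
        pr *= P[a][b]
    return pr

ok = True
for x1, x2, x3 in itertools.product(states, repeat=3):
    p2 = sum(prob((a, x2, c)) for a in states for c in states)  # P(X_2 = x2)
    joint = prob((x1, x2, x3)) / p2            # P(X_1 = x1, X_3 = x3 | X_2 = x2)
    left = sum(prob((x1, x2, c)) for c in states) / p2   # P(X_1 = x1 | X_2 = x2)
    right = sum(prob((a, x2, x3)) for a in states) / p2  # P(X_3 = x3 | X_2 = x2)
    ok = ok and abs(joint - left * right) < 1e-9
print(ok)
```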
EXERCISE 3.5. (Markov process).– Let $(X_t, t \in T)$ be a family of real random variables defined on the probability space $(\Omega, \mathcal{A}, P)$, where $T \subset \mathbb{R}$. We set $\tau_n = \{t_1, \ldots, t_n\}$ ($t_1 < \cdots < t_n$) and we denote by $\mathcal{B}_{\tau_n}$ the σ-algebra generated by $X_{t_1}, \ldots, X_{t_n}$ and by $\mathcal{B}_t$ the σ-algebra generated by $X_t$. Show that the following conditions are equivalent:
1) For all $t_1 < t_2 < \cdots < t_n < t_{n+1}$ ($t_j \in T$), $n \ge 1$,
$$P\big(X_{t_{n+1}} \in B \,\big|\, \mathcal{B}_{\tau_n}\big) = P\big(X_{t_{n+1}} \in B \,\big|\, \mathcal{B}_{t_n}\big), \quad B \in \mathcal{B}_{\mathbb{R}}.$$
2) For all $t \in T$, if we denote by $\mathcal{P}_t$ the σ-algebra generated by $X_s$, $s \le t$, $s \in T$, and by $\mathcal{F}_t$ the σ-algebra generated by $X_{s'}$, $s' \ge t$, $s' \in T$, then we have:
$$P\big(A_1 \cap A_2 \,\big|\, \mathcal{B}_t\big) = P\big(A_1 \,\big|\, \mathcal{B}_t\big)\, P\big(A_2 \,\big|\, \mathcal{B}_t\big), \quad A_1 \in \mathcal{P}_t,\ A_2 \in \mathcal{F}_t.$$
¹ With the above notation, we have $N(A, X) = P(A \mid X)$, $A \in \sigma(Y)$.