Appendix A

Projections in Hilbert spaces

A.1 Introduction

In many situations we want information about variables that are not directly measured, but we do have information about other variables that are correlated with them. If this (cross-)correlation is known, or can be estimated, it can be used to estimate the values of the unmeasured variables.

Consider, for instance, interest rates. Short term interest rates are quoted on a daily basis in the money markets for maturities up to, say, one year, but longer term interest rates are traded only indirectly through the bond markets. Theoretically, options dependent on interest rates are priced according to a stochastic process describing the evolution in continuous time of the short term interest rate, even though this process is not directly observable. The observed (or measured) variables, when modelling interest rate processes, are the bond prices.

The Wiener filter (see Madsen [2007]) is an example where a known cross-correlation is used together with the projection theorem to estimate an unmeasured time series from a measured one. The Kalman filter, which will be introduced in this appendix, can be viewed as an online version of the Wiener filter.

The main goal of this appendix is to present the projection theorem and to illustrate the wide range of applications of this theorem. Finally, the theorem is used to formulate the (ordinary) Kalman filter. The contents of this appendix are based on Madsen [1992] and Brockwell and Davis [1991], and more information about the theory and applications of the projection theorem can be found in those references.

One of the advantages of considering the projection theorem as it is formulated in this appendix is that many of the well-known concepts from two- and three-dimensional Euclidean geometry, such as orthogonality, carry over to the more general Hilbert spaces considered in the following.

The projection theorem shows that a single set of equations can be used in many different contexts. Hence many of the methods used in time series analysis, such as prediction, filtering and estimation, can be seen in a unified framework.

A.2 Hilbert spaces

A Hilbert space is simply an inner-product space, i.e. a vector space supplied with an inner product, with an additional property of completeness. The inner product is a natural generalization of the inner (or scalar) product of two vectors in n-dimensional Euclidean space. Since many of the properties of Euclidean space carry over to the inner-product spaces, it will be helpful to keep Euclidean space in mind in all that follows.

Let us first consider a well-known inner-product space, namely the Euclidean space.

Example A.1 (Euclidean space).

The set of all column vectors

$x = (x_1, \ldots, x_k)^T \in \mathbb{R}^k$    (A.1)

is a real inner-product space if we define

$\langle x, y \rangle = \sum_{i=1}^{k} x_i y_i .$    (A.2)

It is a simple matter to check that the defining conditions of an inner product are all satisfied.

Definition A.1 (Norm).

The norm of an element x of an inner-product space is defined to be

$\|x\| = \sqrt{\langle x, x \rangle} .$    (A.3)

Note that $\|x\| > 0$ whenever $x \neq 0$.

In the Euclidean space ℝk the norm of the vector is simply its length.

Definition A.2 (The angle between elements).

The angle θ between two nonzero elements x and y belonging to any real inner-product space is defined as

$\theta = \arccos\!\left[ \frac{\langle x, y \rangle}{\|x\| \, \|y\|} \right] .$    (A.4)

In particular, x and y are said to be orthogonal if and only if $\langle x, y \rangle = 0$.
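As a quick numerical illustration of (A.2)–(A.4), the following sketch (using NumPy, purely as an illustrative choice) computes the inner product, the angle and an orthogonality check for two vectors in ℝ3.

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 1.0, -2.0])

inner = x @ y                                   # <x, y> as in (A.2)
norm_x, norm_y = np.linalg.norm(x), np.linalg.norm(y)
theta = np.arccos(inner / (norm_x * norm_y))    # angle as in (A.4)

print(f"<x,y> = {inner:.4f}, angle = {np.degrees(theta):.1f} degrees")
print("orthogonal:", np.isclose(inner, 0.0))    # True for these two vectors
```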

Now let us define the Hilbert space:

Definition A.3 (Hilbert space).

A Hilbert space ℋ is a vector space, equipped with an inner product, in which every Cauchy sequence {xn} converges in norm to some element x ∊ ℋ. The inner-product space is then said to be complete.

Example A.2 (Euclidean space).

The completeness of the inner-product space ℝk can be verified. Thus ℝk is a Hilbert space.

Example A.3 (The space L2(Ω, ℱ, ℙ)).

Consider a probability space (Ω, ℱ, ℙ) and the collection C of all random variables X defined on Ω and satisfying the condition E[X2] < ∞. It is rather easy to show that C is a vector space.

For any two elements X,YC we now define the inner product

$\langle X, Y \rangle = E[XY] .$    (A.5)

Norm convergence of a sequence Xn of elements of L2 to the limit X means

$\|X_n - X\|^2 = E[|X_n - X|^2] \to 0 \quad \text{as } n \to \infty .$    (A.6)

Norm convergence of Xn to X in an L2 space is called mean-square convergence and is written as $X_n \xrightarrow{\text{m.s.}} X$.

To complete the proof that L2 is a Hilbert space we need to establish completeness, i.e. that if $\|X_n - X_m\| \to 0$ as $m, n \to \infty$, then there exists $X \in L^2$ such that $X_n \to X$ in mean square (see Brockwell and Davis [1991]).
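To make mean-square convergence concrete, the following sketch estimates E[|X_n − X|^2] by simulation for X_n taken to be the average of n i.i.d. standard normal variables, so that X = 0 and the squared norm equals 1/n; the choice of sequence is an assumption made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_square_error(n, n_sim=20_000):
    """Monte Carlo estimate of E[|X_n - 0|^2] for X_n = mean of n N(0,1) variables."""
    samples = rng.standard_normal((n_sim, n)).mean(axis=1)
    return np.mean(samples**2)

for n in (1, 10, 100, 1000):
    # Decreases roughly like 1/n, illustrating X_n -> 0 in mean square
    print(n, mean_square_error(n))
```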

A.3 The projection theorem

Let us start by considering two simple applications which illustrate the projection theorem in the two types of Hilbert spaces.

Example A.4 (Linear approximation in ℝ3).

Suppose three vectors are given in ℝ3:

$y = (1/4, 1/4, 1)^T ,$    (A.7)
$x_1 = (1, 0, 1/4)^T ,$    (A.8)
$x_2 = (0, 1, 1/4)^T .$    (A.9)

Our problem is to find the linear combination $\hat{y} = \alpha_1 x_1 + \alpha_2 x_2$ which is closest to y in the sense that $S = \|y - \alpha_1 x_1 - \alpha_2 x_2\|^2$ is minimized.

One approach to this problem is to write S in the form $S = (1/4 - \alpha_1)^2 + (1/4 - \alpha_2)^2 + (1 - \alpha_1/4 - \alpha_2/4)^2$ and then to use calculus to minimize with respect to α1 and α2. In the alternative geometric approach we observe that the required vector $\hat{y} = \alpha_1 x_1 + \alpha_2 x_2$ is the vector in the plane determined by x1 and x2 such that $y - \alpha_1 x_1 - \alpha_2 x_2$ is orthogonal to the plane of x1 and x2 (see Figure A.1). The orthogonality condition may be stated as

$\langle y - \alpha_1 x_1 - \alpha_2 x_2 , x_i \rangle = 0 , \quad i = 1, 2$    (A.10)

or equivalently

$\alpha_1 \langle x_1, x_1 \rangle + \alpha_2 \langle x_2, x_1 \rangle = \langle y, x_1 \rangle ,$    (A.11)
$\alpha_1 \langle x_1, x_2 \rangle + \alpha_2 \langle x_2, x_2 \rangle = \langle y, x_2 \rangle .$    (A.12)

By solving these two equations for the particular values of x1, x2 and y specified, it is seen that α1 = α2 = 4/9, and $\hat{y} = (4/9, 4/9, 2/9)^T$.
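The 2 × 2 system (A.11)–(A.12) can of course also be solved numerically; a minimal NumPy sketch checking the values found above:

```python
import numpy as np

y  = np.array([0.25, 0.25, 1.0])
x1 = np.array([1.0, 0.0, 0.25])
x2 = np.array([0.0, 1.0, 0.25])

X = np.column_stack([x1, x2])
# Normal equations <x_i, x_j> alpha = <y, x_i>, cf. (A.11)-(A.12)
alpha = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ alpha

print(alpha)                              # [4/9, 4/9]
print(y_hat)                              # [4/9, 4/9, 2/9]
print(np.allclose(X.T @ (y - y_hat), 0))  # residual is orthogonal to x1 and x2
```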

Example A.5 (Linear approximation in L2(Ω, ℱ, ℙ)).

Now suppose that X1, X2 and Y are random variables in L2(Ω, ℱ, ℙ). Only X1 and X2 are observed and we wish to estimate the value of Y by using the linear combination $\hat{Y} = \alpha_1 X_1 + \alpha_2 X_2$ which minimizes the mean square error,

$S = E[|Y - \alpha_1 X_1 - \alpha_2 X_2|^2] = \|Y - \alpha_1 X_1 - \alpha_2 X_2\|^2 .$    (A.13)

As in the previous example there are at least two possible approaches to the problem. The first is to write

$S = E[Y^2] + \alpha_1^2 E[X_1^2] + \alpha_2^2 E[X_2^2] - 2\alpha_1 E[Y X_1] - 2\alpha_2 E[Y X_2] + 2\alpha_1 \alpha_2 E[X_1 X_2]$    (A.14)

and then to minimize with respect to α1 and α2.

The second method is to use the same geometric approach as in the previous example. Our aim is to find the element $\hat{Y}$ in the set

$\mathcal{M} = \{ X \in L^2(\Omega, \mathcal{F}, \mathbb{P}) : X = \alpha_1 X_1 + \alpha_2 X_2 , \; (\alpha_1, \alpha_2) \in \mathbb{R}^2 \}$    (A.15)

such that the mean square error $\|Y - \hat{Y}\|^2$ is as small as possible. By analogy with the previous example we might expect $\hat{Y}$ to have the property that $Y - \hat{Y}$ is orthogonal to all elements of ℳ. Applying this condition to our present problem, we can write

$\langle Y - \alpha_1 X_1 - \alpha_2 X_2 , X \rangle = 0 \quad \text{for all } X \in \mathcal{M}$    (A.16)

or, equivalently, by the linearity of the inner product,

$\langle Y - \alpha_1 X_1 - \alpha_2 X_2 , X_i \rangle = 0 , \quad i = 1, 2 .$    (A.17)

These are the same equations for α1 and α2 as in the previous example, although the inner product is of course defined differently in (A.17). In terms of expectations we can rewrite (A.17) in the form

$\alpha_1 E[X_1^2] + \alpha_2 E[X_2 X_1] = E[Y X_1] ,$    (A.18)
$\alpha_1 E[X_1 X_2] + \alpha_2 E[X_2^2] = E[Y X_2] ,$    (A.19)

from which α1 and α2 are easily found.
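In applications the moments in (A.18)–(A.19) are typically replaced by sample averages. The sketch below simulates correlated random variables (the particular model is an assumption made only for this illustration) and solves the two equations for α1 and α2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Illustrative model: Y depends linearly on X1 and X2 plus independent noise.
X1 = rng.standard_normal(n)
X2 = 0.5 * X1 + rng.standard_normal(n)
Y  = 1.0 * X1 - 2.0 * X2 + rng.standard_normal(n)

# Sample versions of E[X1^2], E[X2 X1], E[X1 X2], E[X2^2], E[Y X1], E[Y X2]
G = np.array([[np.mean(X1 * X1), np.mean(X2 * X1)],
              [np.mean(X1 * X2), np.mean(X2 * X2)]])
g = np.array([np.mean(Y * X1), np.mean(Y * X2)])

alpha = np.linalg.solve(G, g)   # solves (A.18)-(A.19) with sample moments
print(alpha)                    # close to [1.0, -2.0]
```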

Before establishing the projection theorem for a general Hilbert space we need to introduce some new terminology.

Definition A.4 (Closed subspace).

A linear subspace ℳ of a Hilbert space ℋ is said to be a closed subspace of ℋ if ℳ contains all of its limit points (i.e. if xn ∊ ℳ and ||xn − x|| → 0 imply that x ∊ ℳ).

Definition A.5 (Orthogonal complement).

The orthogonal complement ℳ⊥ of a subset ℳ of ℋ is defined to be the set of all elements of ℋ which are orthogonal to every element of ℳ. Thus

$x \in \mathcal{M}^{\perp}$ if and only if $\langle x, y \rangle = 0$ (written $x \perp y$)    (A.20)

for all y ∊ ℳ.

Theorem A.1.

If ℳ is any subset of a Hilbert space ℋ then ℳ⊥ is a closed subspace of ℋ.

Proof. Omitted.

Theorem A.2 (The projection theorem).

If ℳ is a closed subspace of the Hilbert space ℋ and x ∊ ℋ, then

1. there is a unique element $\hat{x} \in \mathcal{M}$ such that

$\|x - \hat{x}\| = \inf_{y \in \mathcal{M}} \|x - y\|$    (A.21)

and

2. $\hat{x} \in \mathcal{M}$ and $\|x - \hat{x}\| = \inf_{y \in \mathcal{M}} \|x - y\|$ if and only if $\hat{x} \in \mathcal{M}$ and $(x - \hat{x}) \in \mathcal{M}^{\perp}$. The element $\hat{x}$ is called the (orthogonal) projection of x onto ℳ.

Proof. Omitted – see Brockwell and Davis [1991].

Theorem A.3 (The projection mapping of ℋ onto ℳ).

If ℳ is a closed subspace of the Hilbert space ℋ and I is the identity mapping on ℋ, then there is a unique mapping $P_{\mathcal{M}}$ of ℋ onto ℳ such that $I - P_{\mathcal{M}}$ maps ℋ onto ℳ⊥. $P_{\mathcal{M}}$ is called the projection mapping of ℋ onto ℳ.

Proof. By the projection theorem, for each x ∊ ℋ there is a unique $\hat{x} \in \mathcal{M}$ such that $x - \hat{x} \in \mathcal{M}^{\perp}$. The required mapping is therefore

$P_{\mathcal{M}} x = \hat{x} , \quad x \in \mathcal{H} .$    (A.22)

Theorem A.4 (Properties of projection mappings).

Let ℋ be a Hilbert space and let $P_{\mathcal{M}}$ denote the projection mapping onto a closed subspace ℳ. Then

  1. $P_{\mathcal{M}}(\alpha x + \beta y) = \alpha P_{\mathcal{M}} x + \beta P_{\mathcal{M}} y$.
  2. $\|x\|^2 = \|P_{\mathcal{M}} x\|^2 + \|(I - P_{\mathcal{M}}) x\|^2$.
  3. each x ∊ ℋ has a unique representation as a sum of an element of ℳ and an element of ℳ⊥, i.e.

    $x = P_{\mathcal{M}} x + (I - P_{\mathcal{M}}) x .$    (A.23)

  4. $P_{\mathcal{M}} x_n \to P_{\mathcal{M}} x$ if $\|x_n - x\| \to 0$.
  5. x ∊ ℳ if and only if $P_{\mathcal{M}} x = x$.
  6. x ∊ ℳ⊥ if and only if $P_{\mathcal{M}} x = 0$.
  7. $\mathcal{M}_1 \subseteq \mathcal{M}_2$ if and only if $P_{\mathcal{M}_1} P_{\mathcal{M}_2} x = P_{\mathcal{M}_1} x$ for all x ∊ ℋ.

Proof. Omitted – but rather obvious from a geometrical point of view.

A.3.1 Prediction equations

In the following a set of equations, the so-called prediction equations, will be derived. These equations describe how to find the projection that gives the minimum mean square error (minimum MSE).

Given a Hilbert space ℋ, a closed subspace ℳ and an element x ∊ ℋ, the projection theorem shows that the element of ℳ closest to x is the unique element $\hat{x} \in \mathcal{M}$ such that

$\langle x - \hat{x} , y \rangle = 0 \quad \text{for all } y \in \mathcal{M} .$    (A.24)

Compare this general equation with the special cases in the examples prior to the projection theorem.

The quantity $\hat{x} = P_{\mathcal{M}} x$ is frequently called the best predictor of x in the subspace ℳ.

Remark A.1.

It is helpful to visualize the projection theorem in terms of Figure A.1, which depicts the special case in which ℋ = ℝ3, ℳ is the plane containing x1 and x2, and $\hat{y} = P_{\mathcal{M}} y$. The prediction equation (A.24) is simply the statement that $y - \hat{y}$ must be orthogonal to ℳ. The projection theorem tells us that $\hat{y} = P_{\mathcal{M}} y$ is uniquely determined by this condition for any Hilbert space ℋ and closed subspace ℳ.

Figure A.1: Projection in ℝ3.

The projection theorem and the prediction equations play fundamental roles in time series analysis, especially for estimation, approximation, filtering and prediction. Examples will be given.

Example A.6 (Minimum MSE linear prediction).

Let {Xt, t = 0, ±1,...} be a stationary process on (Ω, ℱ, ℙ) with mean zero and autocovariance function γ(·), and consider the problem of finding the linear combination

$\hat{X}_{n+1} = \sum_{j=1}^{n} \phi_{nj} X_{n+1-j}$    (A.25)

which best approximates $X_{n+1}$ in the sense that $E[(X_{n+1} - \sum_{j=1}^{n} \phi_{nj} X_{n+1-j})^2]$ is minimized. This problem is easily solved with the aid of the projection theorem by taking ℋ = L2(Ω, ℱ, ℙ) and $\mathcal{M} = \{ \sum_{j=1}^{n} \alpha_j X_{n+1-j} : \alpha_1, \ldots, \alpha_n \in \mathbb{R} \}$. Since minimization of $E[|X_{n+1} - \hat{X}_{n+1}|^2]$ is identical to minimization of the squared norm $\|X_{n+1} - \hat{X}_{n+1}\|^2$, we see at once that $\hat{X}_{n+1} = P_{\mathcal{M}} X_{n+1}$. The prediction equations are

$\left\langle X_{n+1} - \sum_{j=1}^{n} \phi_{nj} X_{n+1-j} , Y \right\rangle = 0 \quad \text{for all } Y \in \mathcal{M}$    (A.26)

which, by the linearity of the inner product, are equivalent to the n equations

$\left\langle X_{n+1} - \sum_{j=1}^{n} \phi_{nj} X_{n+1-j} , X_k \right\rangle = 0 , \quad k = n, n-1, \ldots, 1 .$    (A.27)

Recalling the definition 〈X, Y〉 = E[XY] of the inner product in L2(Ω, ℱ, ℙ), we see that the prediction equations can be written in the form

$\Gamma_n \phi_n = \gamma_n$    (A.28)

where $\phi_n = (\phi_{n1}, \ldots, \phi_{nn})^T$, $\gamma_n = (\gamma(1), \ldots, \gamma(n))^T$ and $\Gamma_n = [\gamma(i-j)]_{i,j=1}^{n}$. The projection theorem guarantees that there is at least one solution φn to the problem. If Γn is singular then there are infinitely many solutions, but the projection theorem guarantees that every solution will give the same (uniquely defined) predictor.
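As a numerical illustration, suppose, purely as an assumption for this sketch, that the autocovariance function is that of an AR(1) process, γ(h) = ρ^|h|; then (A.28) can be built and solved directly:

```python
import numpy as np

def best_linear_predictor_coeffs(gamma):
    """Solve Gamma_n phi_n = gamma_n, cf. (A.28).
    gamma: array [gamma(0), gamma(1), ..., gamma(n)]."""
    n = len(gamma) - 1
    Gamma = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
    return np.linalg.solve(Gamma, gamma[1:n + 1])

# Assumed AR(1) autocovariance gamma(h) = rho^{|h|} (with sigma^2 = 1)
rho = 0.6
gamma = np.array([rho**h for h in range(6)])

phi = best_linear_predictor_coeffs(gamma)
print(phi)   # approximately [0.6, 0, 0, 0, 0]: for an AR(1) only the most recent value matters
```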

A.4 Conditional expectation and linear projections

It is well known that the conditional expectation plays a central role in time series analysis, as the optimal prediction (under some mild assumptions) is found using the conditional expectation.

Consider the random variables Y and X from L2.

Definition A.6 (The conditional expectation).

The conditional expectation of X given Y = y is

$E[X \mid Y = y] = \int x \, f_{X|Y=y}(x) \, dx$    (A.29)

where fX|Y = y(x) is the conditional density function for X given Y = y.

Remember that E[X|Y = y] is a number, whereas E[X|Y] is a stochastic variable.

It can be shown that the operator E[X|Y] on L2 has all the properties of a projection operator, in particular

$E[cX + dZ \mid Y] = c E[X \mid Y] + d E[Z \mid Y] , \qquad E[1 \mid Y] = 1 .$

Theorem A.5 (Best mean square predictor).

The conditional expectation E[X|Y] is the best mean square predictor of X in ℳY, i.e. the best function of Y for predicting X, where ℳY denotes the closed subspace of L2 consisting of all random variables which are functions of Y.

Proof. Follows from the projection theorem.

However, the determination of the projection onto ℳY is usually very difficult. On the other hand it is relatively easy to calculate the projection of X onto span{1, Y} ⊆ ℳY, i.e. the linear projection

$E[X \mid Y] = a + bY$    (A.30)

which is the best linear function of Y (in the mean square sense) for predicting X.

The linear projection (A.30) is a projection of X onto a subspace of ℳY. Therefore it can never have a smaller mean square error than E[X|Y]. However, it is of great importance for the following reasons:

  • The linear projection (A.30) is easier to calculate.
  • It depends only on the first and second order moments, E[Y], E[X], E[Y2], E[X2] and E[XY], of the joint distribution of (Y,X).
  • If (Y,X)′ has a multivariate normal distribution then the conditional expectation is linear, i.e.

    $E[X \mid Y] \in \mathrm{span}\{1, Y\} \subseteq \mathcal{M}_Y .$    (A.31)

Let us now consider two multivariate stochastic variables X and Y and the corresponding second order representation (first and second order moments for (X,Y)′)

$\mu_X , \; \mu_Y , \; \Sigma_{XX} , \; \Sigma_{XY} , \; \Sigma_{YY} .$    (A.32)

Theorem A.6 (Linear projection in L2).

Given the second order representation for (X,Y)′ the linear projection is given by

$E[X \mid Y] = \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y)$    (A.33)

and the variance is

$\mathrm{Var}[X - E[X \mid Y]] = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} .$    (A.34)

Furthermore

$\mathrm{Cov}[X - E[X \mid Y] , Y] = 0 ,$    (A.35)

i.e. the error X − E[X|Y] is uncorrelated with Y.

Proof. From the prediction equations:

$\langle X - E[X \mid Y] , Y \rangle = 0 , \qquad \langle X - E[X \mid Y] , 1 \rangle = 0 ,$

or

$\langle X - a - bY , Y \rangle = 0 , \qquad \langle X - a - bY , 1 \rangle = 0 .$

Using the fact that in the multivariate case the inner product in L2 is $\langle X, Y \rangle = E[X Y^T]$ we get

$a E[Y]^T + b E[Y Y^T] = E[X Y^T] , \qquad a + b E[Y] = E[X] .$

By solving these equations and using the fact that $\Sigma_{XY} = E[X Y^T] - E[X] E[Y]^T$ we obtain

$b = \Sigma_{XY} \Sigma_{YY}^{-1} ,$    (A.36)
$a = \mu_X - \Sigma_{XY} \Sigma_{YY}^{-1} \mu_Y .$    (A.37)

Hence the linear projection is

$E[X \mid Y] = \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y) .$    (A.38)

The variance follows immediately

$\mathrm{Var}[X - E[X \mid Y]] = \mathrm{Var}[X - a - bY] = \Sigma_{XX} + b \Sigma_{YY} b^T - b \Sigma_{YX} - \Sigma_{XY} b^T = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} .$

The orthogonality between the error X − E[X|Y] and Y follows directly from the projection theorem.
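A compact sketch of (A.33)–(A.34), assuming the second order representation is given as mean vectors and covariance blocks; the numerical values below are illustrative assumptions, not taken from the text.

```python
import numpy as np

def linear_projection(mu_x, mu_y, S_xx, S_xy, S_yy, y):
    """Linear projection E[X|Y] and its error covariance, cf. (A.33)-(A.34)."""
    K = S_xy @ np.linalg.inv(S_yy)        # Sigma_XY Sigma_YY^{-1}
    x_hat = mu_x + K @ (y - mu_y)
    err_cov = S_xx - K @ S_xy.T           # Sigma_XX - Sigma_XY Sigma_YY^{-1} Sigma_YX
    return x_hat, err_cov

# Small illustrative (assumed) numbers
mu_x = np.array([0.0]); mu_y = np.array([0.0])
S_xx = np.array([[1.0]]); S_xy = np.array([[0.8]]); S_yy = np.array([[2.0]])

x_hat, P = linear_projection(mu_x, mu_y, S_xx, S_xy, S_yy, y=np.array([1.0]))
print(x_hat, P)   # E[X|Y=1] = 0.4, error variance = 1 - 0.8^2/2 = 0.68
```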

Theorem A.7.

If (X, Y)T has a normal distribution then X|Y is normally distributed with mean

$E[X \mid Y] = \mu_X + \Sigma_{XY} \Sigma_{YY}^{-1} (Y - \mu_Y)$    (A.39)

and variance

$\mathrm{Var}[X \mid Y] = \mathrm{Var}[X - E[X \mid Y]] = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} .$    (A.40)

The error X − E[X|Y] and Y are stochastically independent.

Proof. Omitted.

Let us illustrate the importance of the equations above by a couple of examples.

Example A.7 (Regression).

Let us consider the regression in L2 of Y on X

$E[Y \mid X] = X \theta$    (A.41)

and assume that E[X] = E[Y] = 0.

Note that, compared to the discussion above, we have interchanged X and Y. In order to compare the results directly with the ordinary LS estimator for the general linear model in ℝn, we have also interchanged the order of X and θ.

The best estimator is found by the prediction equations

$\langle Y - E[Y \mid X] , X \rangle = 0$    (A.42)

or

$\langle Y - X\theta , X \rangle = 0 .$    (A.43)

Then we get

$\Sigma_{YX} - \theta^T \Sigma_{XX} = 0$    (A.44)

or

$\hat{\theta} = \Sigma_{XX}^{-1} \Sigma_{XY} .$    (A.45)

Compare this result with the well-known LS estimator $\hat{\theta} = (X^T X)^{-1} X^T Y$ in ℝn.
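The correspondence can be checked numerically: the sketch below compares $\hat{\theta} = \Sigma_{XX}^{-1} \Sigma_{XY}$ computed from sample moments with the LS estimator $(X^T X)^{-1} X^T Y$ on simulated data (the simulated model is an assumption made only for this illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta_true = 50_000, np.array([2.0, -1.0])

# Zero-mean regressors and response (E[X] = E[Y] = 0 as assumed in the example)
X = rng.standard_normal((n, 2))
Y = X @ theta_true + rng.standard_normal(n)

S_xx = (X.T @ X) / n     # sample version of Sigma_XX
S_xy = (X.T @ Y) / n     # sample version of Sigma_XY

theta_proj = np.linalg.solve(S_xx, S_xy)        # theta_hat = Sigma_XX^{-1} Sigma_XY, cf. (A.45)
theta_ls   = np.linalg.solve(X.T @ X, X.T @ Y)  # ordinary LS estimator in R^n

print(theta_proj, theta_ls)   # numerically identical: the 1/n factors cancel
```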

Next we give an example where the formulation of the linear projection above is used directly. As this example is very important, it is given in a separate section.

A.5 Kalman filter

As mentioned in the introduction, the Kalman filter can be used for estimating variables that are not directly measured by using measured variables that are correlated with the unmeasured ones. In the case of the Kalman filter the correlation between the unmeasured variables X and the measured variables Y is described by a linear state space model.

Consider the linear stochastic state space model

$X_t = A_t X_{t-1} + B_t u_{t-1} + e_{1,t} ,$    (A.46)
$Y_t = C_t X_t + e_{2,t} ,$    (A.47)

where Xt is an m-dimensional state vector, ut is the input vector and Yt is the measured output vector. The matrices At, Bt and Ct are known and have appropriate dimensions.

The two white noise sequences {e1,t} and {e2,t} are mutually uncorrelated with variances Σ1,t and Σ2,t, respectively.

The matrices At, Bt, Ct, Σ1,t and Σ2,t might be time varying, as indicated by the notation. However, in the rest of this example we drop the index t, although all the given results remain valid in the time varying case.

Let us consider the problem of estimating Xt+k given the observations {Ys; s = t, t − 1,...} and input {us, s = t − 1,...}. In the case k = 0 the problem is called reconstruction or filtering. The solution to this problem is given by the linear projection theorem.

It is clear that the linear projection theorem is also valid for the conditioned stochastic variables (X, Y)′ | Z. If the stochastic variables have a normal distribution we get

$E[X \mid Y, Z] = E[X \mid Z] + \mathrm{Cov}[X, Y \mid Z] \, \mathrm{Var}^{-1}[Y \mid Z] \, (Y - E[Y \mid Z]) ,$    (A.48)
$\mathrm{Var}[X \mid Y, Z] = \mathrm{Var}[X \mid Z] - \mathrm{Cov}[X, Y \mid Z] \, \mathrm{Var}^{-1}[Y \mid Z] \, \mathrm{Cov}^T[X, Y \mid Z] .$    (A.49)

Let us now introduce

$\mathcal{Y}_t = (Y_1, \ldots, Y_t) ,$    (A.50)

which is a vector of all observations until time t. The input is assumed to be known.

Further introduce

$\hat{X}_{t+k|t} = E[X_{t+k} \mid \mathcal{Y}_t] ,$    (A.51)
$\hat{Y}_{t+k|t} = E[Y_{t+k} \mid \mathcal{Y}_t] ,$    (A.52)

and the variances

$\Sigma^{xx}_{t+k|t} = \mathrm{Var}[X_{t+k} \mid \mathcal{Y}_t] ,$    (A.53)
$\Sigma^{yy}_{t+k|t} = \mathrm{Var}[Y_{t+k} \mid \mathcal{Y}_t] ,$    (A.54)
$\Sigma^{xy}_{t+k|t} = \mathrm{Cov}[X_{t+k}, Y_{t+k} \mid \mathcal{Y}_t] ,$    (A.55)

then we have the Kalman filter

Theorem A.8 (Kalman filter — Optimal reconstruction).

The reconstruction $\hat{X}_{t|t}$ which has the smallest mean square error is given by

$\hat{X}_{t|t} = \hat{X}_{t|t-1} + \Sigma^{xy}_{t|t-1} \left( \Sigma^{yy}_{t|t-1} \right)^{-1} \left( Y_t - \hat{Y}_{t|t-1} \right)$    (A.56)

and the variance of the reconstruction error is

$\Sigma^{xx}_{t|t} = \Sigma^{xx}_{t|t-1} - \Sigma^{xy}_{t|t-1} \left( \Sigma^{yy}_{t|t-1} \right)^{-1} \left( \Sigma^{xy}_{t|t-1} \right)^T .$    (A.57)

Furthermore, the reconstruction error and the observations are orthogonal, i.e.

$\mathrm{Cov}[X_{t+k} - E[X_{t+k} \mid \mathcal{Y}_t] , \mathcal{Y}_t] = 0 .$    (A.58)

Proof. Let X = Xt, Y = Yt and Z = $\mathcal{Y}_{t-1}$ and use the linear projection theorem. See, e.g., Madsen [2007] for details.

Together with equations for making one-step predictions in the state space model the above equations give the Kalman filter. It is readily seen that the prediction equations are

$\hat{X}_{t+1|t} = A \hat{X}_{t|t} + B u_t ,$    (A.59)
$\Sigma^{xx}_{t+1|t} = A \Sigma^{xx}_{t|t} A^T + \Sigma_1 ,$    (A.60)
$\Sigma^{yy}_{t+1|t} = C \Sigma^{xx}_{t+1|t} C^T + \Sigma_2$    (A.61)

with initial values

$\hat{X}_{1|0} = E[X_1] = \mu_0 ,$    (A.62)
$\Sigma^{xx}_{1|0} = \mathrm{Var}[X_1] = V_0 .$    (A.63)
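A minimal sketch of the resulting recursion, combining the reconstruction step (A.56)–(A.57) with the prediction step (A.59)–(A.63); the system matrices and the simulated data below are assumptions chosen only for illustration.

```python
import numpy as np

def kalman_filter(Y, A, B, C, S1, S2, u, mu0, V0):
    """Kalman filter via the reconstruction (A.56)-(A.57) and prediction (A.59)-(A.63) steps."""
    x_pred, P_pred = mu0.copy(), V0.copy()        # X_hat_{1|0}, Sigma^xx_{1|0}
    x_filt = []
    for t, y in enumerate(Y):
        # Reconstruction: Sigma^xy = P_pred C^T and Sigma^yy = C P_pred C^T + S2
        S_yy = C @ P_pred @ C.T + S2
        K = P_pred @ C.T @ np.linalg.inv(S_yy)
        x_up = x_pred + K @ (y - C @ x_pred)       # (A.56)
        P_up = P_pred - K @ (P_pred @ C.T).T       # (A.57)
        x_filt.append(x_up)
        # Prediction: (A.59)-(A.60)
        x_pred = A @ x_up + B @ u[t]
        P_pred = A @ P_up @ A.T + S1
    return np.array(x_filt)

# Illustrative one-dimensional example (all numbers are assumptions)
rng = np.random.default_rng(3)
A = np.array([[0.9]]); B = np.array([[0.0]]); C = np.array([[1.0]])
S1 = np.array([[0.1]]); S2 = np.array([[1.0]])

x = np.zeros(100); y = np.zeros(100)
for t in range(1, 100):
    x[t] = 0.9 * x[t - 1] + np.sqrt(0.1) * rng.standard_normal()
    y[t] = x[t] + rng.standard_normal()

u = np.zeros((100, 1))
x_hat = kalman_filter(y.reshape(-1, 1), A, B, C, S1, S2, u, mu0=np.zeros(1), V0=np.eye(1))
print(x_hat[-5:].ravel())   # filtered state estimates X_hat_{t|t}
```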

We now leave the projections in L2 and continue by considering projections in ℝn.

A.6 Projections in ℝn

Previously we showed that ℝn is a Hilbert space with the inner product

$\langle x, y \rangle = x^T y .$    (A.64)

In many statistical applications it is convenient to consider the weighted inner product

$\langle x, y \rangle_{\Sigma^{-1}} = x^T \Sigma^{-1} y$    (A.65)

where Σ is a positive definite symmetric matrix.

For both definitions of the inner product we have the norm

$\|x\| = \sqrt{\langle x, x \rangle} .$    (A.66)

Consider a closed subspace ℳ of the Hilbert space ℝn. The following theorem enables us to compute $P_{\mathcal{M}} x$ directly from any specified set of vectors {x1,..., xm} (m < n) spanning ℳ.

Theorem A.9.

If xi ∊ ℝn, i = 1,..., m, and ℳ = span{x1,..., xm} then

$P_{\mathcal{M}} x = X \beta$    (A.67)

where X is the n × m matrix whose jth column is xj and

$X^T X \beta = X^T x .$    (A.68)

Equation (A.68) has at least one solution for β, but the prediction Xβ is the same for all solutions. There is exactly one solution of (A.68) if and only if $X^T X$ is non-singular, and in this case

$P_{\mathcal{M}} x = X (X^T X)^{-1} X^T x .$    (A.69)

Proof. Since $P_{\mathcal{M}} x \in \mathcal{M}$, we can write

$P_{\mathcal{M}} x = \sum_{i=1}^{m} \beta_i x_i = X \beta .$    (A.70)

The prediction equations (A.24) are equivalent in this case to

$\langle X \beta , x_j \rangle = \langle x , x_j \rangle , \quad j = 1, \ldots, m$    (A.71)

or in matrix form

$X^T X \beta = X^T x .$    (A.72)

The existence of at least one solution for β is guaranteed by the existence of the projection $P_{\mathcal{M}} x$. The fact that Xβ is the same for all solutions is guaranteed by the uniqueness of $P_{\mathcal{M}} x$ (see the projection theorem).

Remark A.2.

If {x1,..., xm} is a linearly independent set then there must be a unique vector β such that $P_{\mathcal{M}} x = X \beta$. This means that (A.68) must have a unique solution, which in turn implies that $X^T X$ is non-singular and

$P_{\mathcal{M}} x = X (X^T X)^{-1} X^T x \quad \text{for all } x \in \mathbb{R}^n .$    (A.73)

The matrix $X (X^T X)^{-1} X^T$ must be the same for all linearly independent sets {x1,..., xm} spanning ℳ, since $P_{\mathcal{M}}$ is uniquely defined.

Remark A.3.

Given a real n × n matrix M, how can we tell whether or not there is a subspace ℳ of ℝn such that $M x = P_{\mathcal{M}} x$ for all x ∊ ℝn? If there is such a subspace we say that M is a projection matrix. Such matrices are characterized by the next theorem.

Theorem A.10.

The n × n matrix M is a projection matrix if and only if

  • (a) MT = M, and
  • (b) M2 = M, i.e. the matrix M is idempotent.

Proof. Omitted — but it is easily verified that (a) and (b) are satisfied for the matrix $X (X^T X)^{-1} X^T$.
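Conditions (a) and (b) are easy to check numerically for the matrix $X (X^T X)^{-1} X^T$; a small sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 3))          # 3 (almost surely) linearly independent vectors in R^6

P = X @ np.linalg.inv(X.T @ X) @ X.T     # projection matrix onto span of the columns of X, cf. (A.73)

print(np.allclose(P, P.T))               # (a) symmetry: P^T = P
print(np.allclose(P @ P, P))             # (b) idempotency: P^2 = P
print(np.allclose(P @ X, X))             # columns of X lie in M and are left unchanged
```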
