Chapter 8

Stochastic differential equations

Having established stochastic calculus in the Itō sense in the last chapter, we are now prepared to consider stochastic differential equations. For ease of notation, we shall in general only state the important results for univariate SDEs, but a few results will be generalized to multivariate SDEs.

We repeat that the notion of stochastic differential equations (SDEs) is merely a shorthand notation for stochastic integral equations. The latter may be defined in several ways, but we restrict our discussion to stochastic integrals in the Itō sense. Unfortunately, this implies that the well-known chain rule for variable transformations must be replaced by the so-called Itō formula, which will be introduced in both univariate and multivariate versions. This formula may be used to obtain closed form solutions of some SDEs, but it also makes Itō stochastic calculus more tedious than classical calculus.

In the following exposition of stochastic differential equations, we shall only use the Wiener process as the driving noise process. We recall that the Wiener process is both a Markov process and a martingale, and that the mean of the stochastic integral (in the Itō sense) of any square integrable, adapted process with respect to a Wiener process is zero.

Stochastic differential equations driven by, e.g., a Poisson process (or jump processes, counting processes or marked point processes) are gaining ground in the financial literature (see Cont and Tankov [2004] for a gentle overview). However, a considerable extension of the measure-theoretical concepts of adaptedness and predictability is required, which is beyond the scope of this book. It is duly noted that the topics covered in this chapter may be generalized to cover the very general class of square integrable processes (see e.g. Björk [2009], Karatzas and Shreve [1996], Ikeda and Watanabe [1989] for details).

The remainder of this chapter is organized as follows: Section 8.1 introduces stochastic differential equations. Section 8.2 considers analytical solution methods. Section 8.3 considers a link between parabolic partial differential equations (PDEs) and SDEs, which we shall use later in order to avoid solving such PDEs. Section 8.4 introduces continuous measure transformations, which will be used in later chapters.

8.1 Stochastic Differential Equations

We assume the existence of a probability space (Ω, ℱ, ℙ), where ℱ is a σ-algebra on the sample space Ω of possible outcomes, (Ω, ℱ) is a measurable space and ℙ: ℱ ↦ [0,1] is a probability measure. Let the drift μ: ℝ × ℝ ↦ ℝ and the diffusion σ: ℝ × ℝ ↦ ℝ be Borel-measurable functions¹ and assume that Xt: Ω ↦ ℝ is a solution to the Itō stochastic differential equation

\[ dX(t)=\mu(t,X(t))\,dt+\sigma(t,X(t))\,dW(t),\qquad X(0)=x_0 \tag{8.1} \]

where {W(t), t ≥ 0} is a standard Wiener process defined on the probability space (Ω, ℱ, ℙ) equipped with the natural filtration {ℱ(t)} generated by W(t).

The standard Wiener process is defined in Definition 7.1; the concepts of filtration, martingales and adaptedness are defined in Definitions 7.2, 7.3 and 7.4. Please refer to Appendix A for a detailed discussion of these concepts.

Let us give a number of examples to illustrate the following discussion.

Example 8.1 (The Wiener process).

Consider the Wiener process

\[ dX(t)=\sigma\,dW(t),\qquad X(0)=x_0 \tag{8.2} \]

where σ is the standard deviation of the process and x0 is a deterministic initial condition, which is short for

\[ X(t)=x_0+\int_0^t\sigma\,dW(s). \]

From the definition of the Wiener process (Definition 7.1), it immediately follows that

\[ X(t)=x_0+\sigma\big(W(t)-W(0)\big)=x_0+\sigma W(t). \]

Next we compute the mean of X(t), i.e.

\[ \mathrm{E}[X(t)]=\mathrm{E}\Big[x_0+\int_0^t\sigma\,dW(s)\Big]=x_0 \]

which follows from (7.37). The variance is given by

\[ \mathrm{Var}[X(t)]=\mathrm{Var}\Big[x_0+\int_0^t\sigma\,dW(s)\Big]=\sigma^2\,\mathrm{E}\Big[\Big(\int_0^t dW(s)\Big)^2\Big]=\sigma^2\int_0^t \mathrm{E}[1^2]\,ds=\sigma^2 t, \]

where we have used the Itō isometry property (7.40). This shows that Var[X(t)] → ∞ as t → ∞. However, the process is still bounded in finite time.
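As a numerical sanity check (our own illustration, not part of the original example), the mean and variance above can be verified by simulating (8.2) directly. All parameter values and the seed below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x0, sigma, T, n, paths = 1.0, 0.5, 2.0, 1000, 50_000
dt = T / n

# X(T) = x0 + sigma * W(T); build W(T) from independent N(0, dt) increments
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
XT = x0 + sigma * dW.sum(axis=1)

print(XT.mean())   # ~ x0 = 1.0
print(XT.var())    # ~ sigma^2 * T = 0.5
```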

Example 8.2 (Wiener process with drift).

Let us compute the mean and variance of X(t), where X(t) is the solution to

\[ dX(t)=\mu\,dt+\sigma\,dW(t),\qquad X(0)=x_0 \]

where μ and σ are some constants. This SDE corresponds to

\[ X(t)=x_0+\int_0^t\mu\,ds+\int_0^t\sigma\,dW(s). \]

As in the previous example, we get

\[ \mathrm{E}[X(t)]=x_0+\mathrm{E}\Big[\int_0^t\mu\,ds\Big]+\mathrm{E}\Big[\int_0^t\sigma\,dW(s)\Big]=x_0+\mu t, \]
\[ \mathrm{Var}[X(t)]=\mathrm{Var}\Big[\int_0^t\sigma\,dW(s)\Big]=\sigma^2\,\mathrm{E}\Big[\Big(\int_0^t dW(s)\Big)^2\Big]=\sigma^2 t. \]

We see that the mean of X(t) has a linear trend (or drift).

Example 8.3 (Stochastic exponential growth).

Consider the SDE

\[ dX(t)=\mu X(t)\,dt+\sigma\,dW(t),\qquad X(0)=x_0 \tag{8.3} \]

where μ and σ are constants, which may describe unlimited growth in biological systems or a stochastic money market account.

If we take expectations in the corresponding stochastic integral equation, we get

\[ \mathrm{E}[X(t)]=x_0+\mathrm{E}\Big[\int_0^t\mu X(s)\,ds\Big]+\mathrm{E}\Big[\sigma\int_0^t dW(s)\Big]. \]

Of course, the last term equals zero. Using Fubini's theorem (which we neither state nor prove here), we may exchange the expectation and integration operators, i.e.,

\[ \mathrm{E}[X(t)]=x_0+\mathrm{E}\Big[\int_0^t\mu X(s)\,ds\Big]=x_0+\mu\int_0^t\mathrm{E}[X(s)]\,ds. \]

Compared to the last two examples, the problem is now that E[X(t)] appears on both sides of the equation. A standard trick is to introduce m(t) = E[X(t)] and then take the derivative with respect to time t on both sides, i.e.,

\[ \frac{dm(t)}{dt}=\dot m(t)=\mu m(t);\qquad m(0)=\mathrm{E}[X(0)] \]

which clearly has the solution

\[ \mathrm{E}[X(t)]=m(t)=m(0)e^{\mu t}. \]

We see that E[X(t)] grows exponentially as t → ∞.
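The exponential growth of the mean is easy to reproduce numerically. The sketch below is our own addition (with arbitrary parameter choices): it simulates (8.3) with the Euler-Maruyama scheme, a discretization not discussed in this chapter, and compares the Monte Carlo mean with m(0)e^{μt}.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, x0 = 0.8, 0.3, 1.0
T, n, paths = 1.0, 1000, 20_000
dt = T / n

X = np.full(paths, x0)
for _ in range(n):
    # Euler-Maruyama step for dX = mu*X dt + sigma dW
    X += mu * X * dt + sigma * rng.normal(0.0, np.sqrt(dt), paths)

print(X.mean())             # Monte Carlo estimate of E[X(T)]
print(x0 * np.exp(mu * T))  # m(T) = m(0) * exp(mu*T)
```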

Considering the slightly more complicated Geometric Brownian Motion (GBM)

\[ dX(t)=\alpha X(t)\,dt+\sigma X(t)\,dW(t), \tag{8.4} \]

where α and σ are positive constants, it is not clear if there is existence and uniqueness of the solution for all t ≥ 0 or if the solution might blow up with positive probability in finite time. Along the same lines we must examine whether it is possible to determine a closed form solution or not. In the former case, we may have to impose some restrictions on the functions μ and σ in (8.1) in order to obtain existence of the solution.

It is an interesting result that the answers to these questions only depend on the properties of the infinitesimal characteristics μ and σ in (8.1) (and possibly the initial condition X(0)).

8.1.1 Existence and uniqueness

As for ordinary differential equations (ODEs) Lipschitz and bounded growth conditions must be imposed on the drift and diffusion terms in order to obtain existence and uniqueness of solutions.

We must distinguish between weak and strong solutions to (8.1). A strong solution is obtained if the driving Wiener process is given in advance as a part of the problem such that the obtained solution to (8.1) is (t)-adapted, where (t) is the σ-algebra generated by the Wiener process. On the other hand, if we are just given the infinitesimal characteristics μ and σ in advance and the solution should apply for all possible Wiener processes, then the obtained solution is called a weak solution. It is clear that a strong solution is also a weak solution, because the particular Wiener process W(t) that resulted in the strong solution is just one of infinitely many Wiener processes that will give a weak solution. The converse is not true in general.

Theorem 8.1 (Strong uniqueness).

Suppose that the infinitesimal characteristics μ(x) and σ(x) are locally Lipschitz-continuous in the state variable; i.e., for every integer n ≥ 1 there exists a constant Cn such that for every t ≥ 0, |x| ≤ n and |y| ≤ n:

\[ |\mu(x)-\mu(y)|+|\sigma(x)-\sigma(y)|\le C_n|x-y|. \tag{8.5} \]

Then strong uniqueness holds for (8.1).

Proof. Omitted. See Karatzas and Shreve [1996].

Let us consider an example that does not satisfy the condition (8.5).

Example 8.4

It is easy to verify that the differential equation

\[ \frac{dx}{dt}=3x^{2/3} \]

has several solutions, for any a > 0,

\[ x(t)=\begin{cases} 0 & \text{for } t\le a,\\ (t-a)^3 & \text{for } t>a. \end{cases} \]

This ODE is excluded as μ(x) = 3x^{2/3} does not satisfy (8.5) for x = 0.

We need an additional assumption in order to obtain existence and uniqueness of the solutions of (8.1).

Assumption 8.1 (Linear growth).

The functions μ and σ satisfy the usual linear growth condition

\[ |\mu(x)|+|\sigma(x)|\le K(1+|x|),\qquad\forall x \tag{8.6} \]

where K is a positive, real constant.

Example 8.5

The differential equation

\[ \frac{dx}{dt}=x^2(t),\qquad x(0)=1 \]

corresponding to μ(x) = x² has the solution

\[ x(t)=\frac{1}{1-t};\qquad 0\le t<1. \]

Thus it is impossible to find a solution for all t. This is due to the fact that μ(x) = x² does not satisfy Assumption 8.1.

Next, consider an example of an SDE.

Example 8.6 (Trespassing in a minefield).

Consider, as an example of a process which satisfies (8.5) but not (8.6), the SDE

\[ dX(t)=-\frac12\exp(-2X(t))\,dt+\exp(-X(t))\,dW(t). \]

For X(t) < 0, we get exponential growth, which is faster than linear growth, and (8.6) is not satisfied. It may be shown that the solution is given by

\[ X(t)=\ln\big(W(t)+\exp(X(0))\big). \]

It can be seen that the solution blows up when W(t) < − exp(X(0)), as we would have to compute the natural logarithm of a negative number! If we define the (stopping) time τ(X(0), ω) by

\[ \tau(X(0),\omega)=\inf\{t\ge 0: W(t,\omega)=-\exp(X(0,\omega))\},\qquad\omega\in\Omega \]

it is clear that the solution only exists up to time τ(X(0), ω). This explosion time depends on the stochastic initial condition and the actual trajectory of the driving Wiener process.

Example 8.7 (Geometric Brownian motion).

Consider the process given in (8.4). In this case an explosion time e may be defined by

\[ e=\inf\{t\ge 0: X(t)\in\{0,\infty\}\} \tag{8.7} \]

which states that the explosion time e is the first (i.e., smallest) time where the process X(t) hits the boundary 0 or takes the value ∞. Note that it is also critical if X(t) attains the value 0, because the process X(s) will remain at zero for s ≥ t. The value of X(t) as t → ∞ depends on the parameters α and σ as follows (this is illustrated in Example 8.10):

  1. If α > ½σ², then X(t) → ∞ a.s. as t → ∞.
  2. If α < ½σ², then X(t) → 0 a.s. as t → ∞.
  3. If α = ½σ², then X(t) will fluctuate between arbitrarily large and arbitrarily small values a.s. as t → ∞,

where a.s. is an abbreviation of almost surely. It may, however, be shown that X(t) does not take either the value 0 or ∞ in finite time. Hence the geometric Brownian motion does not explode. This is also clear as the infinitesimal characteristics are linear in X(t) and thus fulfil the Lipschitz condition (8.5) and, in particular, the linear growth condition (8.6).
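The three asymptotic regimes can be illustrated by simulating log X(t) = log X₀ + (α − ½σ²)t + σW(t), which follows from the closed form solution derived in Example 8.10; only the sign of α − ½σ² matters. This is a rough single-path illustration of our own, not a proof, and the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, T, n = 1.0, 50.0, 50_000
dt = T / n
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
t = np.linspace(dt, T, n)

for alpha in (0.8, 0.5, 0.2):  # alpha >, =, < sigma^2/2 = 0.5
    logX = (alpha - 0.5 * sigma**2) * t + sigma * W
    print(alpha, logX[-1])     # tends to +inf, fluctuates, tends to -inf
```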

It may be shown that the conditions in Theorem 8.1 and Assumption 8.1 ensure the existence and uniqueness of solutions of (8.1). In particular (8.6) ensures that the solution does not explode in finite time. These assumptions may be generalized to the multivariate case (Karatzas and Shreve [1996]).

For one-dimensional processes (8.1), the assumptions (8.5) and (8.6) are not necessary to ensure nonexplosive solutions. The assumptions can be weakened to the following theorem.

Theorem 8.2 (The Yamada conditions).

Suppose that μ and σ are bounded. Assume further that the following conditions hold:

  1. There exists a strictly increasing function v(u): ℝ+ ↦ ℝ such that v(0) = 0, \( \int_{0+} v^{-2}(u)\,du=\infty \) and |σ(x) − σ(y)| ≤ v(|x − y|) for all x, y ∈ ℝ.
  2. There exists an increasing and concave function κ(u): ℝ+ ↦ ℝ such that κ(0) = 0, \( \int_{0+}\kappa^{-1}(u)\,du=\infty \) and |μ(x) − μ(y)| ≤ κ(|x − y|) for all x, y ∈ ℝ.

Then the pathwise uniqueness of solutions holds for (8.1) and hence it has a unique strong solution.

Proof. Omitted. See Ikeda and Watanabe [1989].

Remark 8.1.

The usual Lipschitz condition corresponds to v(u) = K₁u and κ(u) = K₂u, where K₁, K₂ ∈ ℝ+ are some constants, or even to a unified condition for μ and σ as shown in, e.g., Rydberg [1997].

There exist solutions to (8.1) which do not fulfil the linear growth condition (8.6). Thus we need to determine other conditions that ensure the nonexplosiveness of solutions, in particular conditions which are easier to check than those in Theorem 8.2.

Consider the scale function

\[ s(x)=\int_c^x\exp\Big(-\int_c^y\frac{2\mu(\xi)}{\sigma^2(\xi)}\,d\xi\Big)dy \tag{8.8} \]

for some fixed c ∈ ℝ+. This function may be used to establish sufficient conditions on the parameters θ ∈ Θ ⊂ ℝ^p so that the explosion will never occur.

Theorem 8.3 (Probability of an explosion).

Let X(t) be described by (8.1), the scale function s(x) by (8.8) and the explosion time e by (8.7).

  1. If s(0) = −∞ and s(∞) = ∞, then the probability for no explosion in finite time is one

    \[ \mathbb{P}(e=\infty)=1 \]

    for every X(0).

  2. If s(0) > −∞ and s(∞) = ∞, then \( \lim_{t\to e}X(t) \) exists almost surely and

    \[ \mathbb{P}\Big(\lim_{t\to e}X(t)=0\Big)=\mathbb{P}\Big(\sup_{t<e}X(t)<\infty\Big)=1 \]

    for every x. A similar assertion holds if the roles of 0 and ∞ are interchanged.

  3. If s(0) > −∞ and s(∞) < ∞, then \( \lim_{t\to e}X(t) \) exists almost surely and

    \[ \mathbb{P}\Big(\lim_{t\to e}X(t)=0\Big)=1-\mathbb{P}\Big(\lim_{t\to e}X(t)=\infty\Big)=\frac{s(\infty)-s(x)}{s(\infty)-s(0)}. \]

Proof. Omitted. See e.g. Ikeda and Watanabe [1989].

Thus, if case 1) in Theorem 8.3 can be verified, the SDE in (8.1) does not explode with probability 1 and the solution exists for all t. On the other hand, if case 1) is not fulfilled, (8.1) may explode with positive probability in finite time. A further generalization is required, and this is called Feller's test for explosions. We refer the interested reader to, e.g., Karatzas and Shreve [1996, Section 5.1] for details.

Remark 8.2.

For specific choices of μ and σ in (8.1) the integral (8.8) may be difficult to evaluate. However, the computations may be simplified considerably by a change of measure using Girsanov's theorem (see later) provided that a unique equivalent martingale measure exists under the new measure (see e.g. Rydberg [1997] for the appropriate conditions in the one-dimensional case²). Informally speaking, Girsanov's theorem simply introduces a measure that moves along with the deterministic drift and thus, under the equivalent martingale measure, the drift is removed.

The following example illustrates the use of the scale function.

Example 8.8.

For the process (8.2), the drift is zero, μ(X(t)) = 0, and the diffusion is simply σ(X(t)) = σ, i.e.,

\[ s(x)=\int_c^x\exp\Big(-\int_c^y 0\,d\xi\Big)dy=\int_c^x\exp(0)\,dy=x-c. \]

Thus we get s(0) = −c, which implies that

\[ \lim_{c\to\infty}s(0)=-\infty \]

and

\[ s(\infty)=\lim_{y\to\infty}(y-c)=+\infty. \]

Thus condition 1) in Theorem 8.3 is fulfilled and the Wiener process (8.2) does not explode. This may seem contradictory, but it is important to stress that the trajectories of the Wiener process remain finite despite the fact that Var[X (t)] → ∞ as t → ∞. Note that ∞ does not belong to the real line ℝ.
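The scale function (8.8) rarely has a closed form, but it can be evaluated numerically. The following crude quadrature sketch is our own (the helper name, the GBM test case μ(x) = αx, σ(x) = σx and the grid bounds are all assumptions) and illustrates how the boundary behaviour of s can be inspected in practice.

```python
import numpy as np

def scale_function(mu, sigma, c, xs, m=20_000):
    """Crude quadrature of s(x) = int_c^x exp(-int_c^y 2*mu(xi)/sigma(xi)^2 dxi) dy."""
    grid = np.linspace(min(xs.min(), c), max(xs.max(), c), m)
    dx = grid[1] - grid[0]
    inner = np.cumsum(2 * mu(grid) / sigma(grid) ** 2) * dx
    inner -= np.interp(c, grid, inner)   # inner integral measured from c
    dens = np.exp(-inner)                # scale density
    outer = np.cumsum(dens) * dx
    outer -= np.interp(c, grid, outer)   # enforce s(c) = 0
    return np.interp(xs, grid, outer)

alpha, sig = 0.8, 1.0                    # GBM test case with 2*alpha/sig^2 > 1
xs = np.array([0.01, 0.5, 1.0, 10.0, 100.0])
print(scale_function(lambda x: alpha * x, lambda x: sig * x, c=1.0, xs=xs))
# s(x) flattens for large x (s(inf) < inf) and decreases without bound as x -> 0+
```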

In the remainder of this book (and the problems), we simply assume that a unique solution exists. For brevity we shall not, in general, list the restrictions on the parameters that must be imposed to ensure nonexplosiveness.

8.1.2 Itō formula

An important feature of Itō stochastic differential equations is stated in the next theorem, but first we need a definition.

Definition 8.1 (The C1,2 space).

Let φ: ℝ² ↦ ℝ be a function of two variables. The function φ is said to belong to the space C^{1,2}(ℝ × ℝ) if φ is continuously differentiable w.r.t. the first variable and twice continuously differentiable w.r.t. the second variable.

Theorem 8.4 (The Itō formula).

Let X(t) be a solution to (8.1) and φ: ℝ² ↦ ℝ be a C^{1,2}(ℝ × ℝ)-function applied to X(t)

\[ Y(t)=\varphi(t,X(t)). \tag{8.9} \]

Then the following chain rule applies

\[ dY(t)=\Big[\frac{\partial\varphi}{\partial t}+\mu\frac{\partial\varphi}{\partial X(t)}+\frac12\sigma^2\frac{\partial^2\varphi}{\partial X(t)^2}\Big]dt+\sigma\frac{\partial\varphi}{\partial X(t)}\,dW(t) \tag{8.10} \]

where the functions μ and σ are as defined in (8.1).

Proof. For notational brevity, we will leave out the arguments of φ(t, X(t)), X(t) and W(t) in this ad hoc proof. A second order Taylor expansion gives

\[ d\varphi=\frac{\partial\varphi}{\partial t}\,dt+\frac{\partial\varphi}{\partial x}\,dX+\frac12\frac{\partial^2\varphi}{\partial x^2}(dX)^2+\frac12\frac{\partial^2\varphi}{\partial t^2}(dt)^2+\frac{\partial^2\varphi}{\partial t\,\partial x}\,dt\,dX. \]

From (8.1), we get

\[ (dX)^2=\mu^2(dt)^2+\sigma^2(dW)^2+2\mu\sigma\,(dt)(dW). \]

Compared to terms with dt and dW, the terms containing (dt)² and (dt)(dW) are insignificant, while (dW)² is of order dt. Thus we get

\[ \begin{aligned} d\varphi &= \frac{\partial\varphi}{\partial t}\,dt+\frac{\partial\varphi}{\partial x}(\mu\,dt+\sigma\,dW)+\frac12\sigma^2\frac{\partial^2\varphi}{\partial x^2}(dW)^2\\ &= \frac{\partial\varphi}{\partial t}\,dt+\mu\frac{\partial\varphi}{\partial x}\,dt+\sigma\frac{\partial\varphi}{\partial x}\,dW+\frac12\sigma^2\frac{\partial^2\varphi}{\partial x^2}\,dt\\ &= \Big[\frac{\partial\varphi}{\partial t}+\mu\frac{\partial\varphi}{\partial x}+\frac12\sigma^2\frac{\partial^2\varphi}{\partial x^2}\Big]dt+\sigma\frac{\partial\varphi}{\partial x}\,dW \end{aligned} \]

where we have also used Metatheorem 1.

Remark 8.3 (Short form of the Itō formula).

By introducing the notation φt = ∂φ/∂t, etc., (8.10) may be written as

\[ d\varphi=\Big(\varphi_t+\mu\,\varphi_x+\frac12\sigma^2\varphi_{xx}\Big)dt+\sigma\,\varphi_x\,dW, \tag{8.11} \]

where we stress that φt should not be confused with φ(t).

Remark 8.4 (Additional term in the Itō formula).

As opposed to classical calculus, (8.10) contains the additional term ½σ²∂²φ/∂x², which makes Itō calculus more complicated for theoretical considerations, although solutions to (8.1) remain Markov processes and the Itō integral retains its martingale property.

Remark 8.5.

It follows from the last remark that the diffusion term from (8.1) enters the drift of (8.10). Another remarkable observation from (8.10) is that the transformed variable Y(t) is also described by an Itō diffusion process.

Example 8.9.

Consider the integral

\[ I(t)=\int_0^t W(s)\,dW(s). \]

Choose X(t) = W(t), which implies that dX(t) = dW(t), i.e., μ = 0 and σ = 1. In addition choose the transformation \( \varphi(t,x)=\frac12 x^2 \). Then

\[ Y(t)=\varphi(t,W(t))=\frac12 W(t)^2. \]

Using (8.10), we get

\[ \begin{aligned} dY(t) &= \frac{\partial\varphi}{\partial t}\,dt+\frac{\partial\varphi}{\partial x}\,dW(t)+\frac12\frac{\partial^2\varphi}{\partial x^2}(dW(t))^2\\ &= 0+W(t)\,dW(t)+\frac12(dW(t))^2=W(t)\,dW(t)+\frac12\,dt. \end{aligned} \]

This implies that

\[ d\Big(\frac12 W(t)^2\Big)=W(t)\,dW(t)+\frac12\,dt \]

or in integral form

\[ \frac12 W(t)^2=\int_0^t W(s)\,dW(s)+\frac12 t \]

or

\[ I(t)=\int_0^t W(s)\,dW(s)=\frac12 W(t)^2-\frac12 t. \]
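The identity just derived can be checked by Monte Carlo: approximate the Itō integral by left-endpoint sums and compare with ½W(T)² − ½T. The sketch below is our own; the step and path counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n, paths = 1.0, 2000, 20_000
dt = T / n

dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.cumsum(dW, axis=1)
W_left = np.hstack([np.zeros((paths, 1)), W[:, :-1]])  # W at left endpoints

I = (W_left * dW).sum(axis=1)          # Ito sums approximating int W dW
rhs = 0.5 * W[:, -1]**2 - 0.5 * T      # (1/2)W(T)^2 - (1/2)T

print(np.abs(I - rhs).mean())          # pathwise error, shrinks as n grows
```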

Example 8.10 (Geometric Brownian motion).

We wish to solve the SDE given by

\[ dX(t)=\mu X(t)\,dt+\sigma X(t)\,dW(t),\qquad X_0>0. \tag{8.12} \]

This SDE is called the geometric Brownian motion and is considered extensively in mathematical finance as a model for interest rates and stock prices. This is mainly due to the fact that the solution X(t) is lognormally distributed and thus excludes negative interest rates (or populations in biology or concentrations in chemistry).

By introducing the transformation Y(t) = φ(t, X (t)) = ln(X (t)), we get

\[ \frac{\partial\varphi}{\partial t}=0,\qquad \frac{\partial\varphi}{\partial X(t)}=\frac{1}{X(t)},\qquad \frac{\partial^2\varphi}{\partial X(t)^2}=-\frac{1}{X(t)^2}. \]

Inserting these in (8.10) we get

\[ dY(t)=\Big[\mu X(t)\frac{1}{X(t)}+\frac12\sigma^2X(t)^2\Big(-\frac{1}{X(t)^2}\Big)\Big]dt+\sigma X(t)\frac{1}{X(t)}\,dW(t) \]

or

\[ d(\ln X(t))=\Big(\mu-\frac12\sigma^2\Big)dt+\sigma\,dW(t) \]

and, finally,

\[ X(t)=X_0\exp\Big(\Big(\mu-\frac12\sigma^2\Big)t+\sigma W(t)\Big). \tag{8.13} \]
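Since (8.13) expresses X(t) through W(t) alone, the GBM can be sampled exactly, without discretizing the SDE. A minimal sketch of our own (arbitrary parameters) that also checks E[X(T)] = X₀e^{μT}:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, x0, T, paths = 0.1, 0.2, 100.0, 1.0, 100_000

WT = rng.normal(0.0, np.sqrt(T), size=paths)              # W(T) ~ N(0, T)
XT = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * WT)  # exact solution (8.13)

print(XT.mean(), x0 * np.exp(mu * T))  # lognormal mean matches x0*exp(mu*T)
```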

8.1.3 Multivariate SDEs

Let the state variable X(t) ∈ ℝn be described by the multivariate SDE

\[ dX(t)=\mu(t,X(t))\,dt+\sigma(t,X(t))\,dW(t) \tag{8.14} \]

where μ(t, X(t)): ℝ × ℝⁿ → ℝⁿ, σ(t, X(t)): ℝ × ℝⁿ → ℝ^{n×m} and W(t) is an m-dimensional standard Wiener process. Note that n need not equal m.

Alternatively, Eq. (8.14) may be written as

\[ dX_i(t)=\mu_i(t,X(t))\,dt+\sum_{j=1}^m\sigma_{ij}(t,X(t))\,dW_j(t);\qquad i=1,\dots,n. \tag{8.15} \]

For this process, we define the instantaneous covariances as

\[ \Sigma(t,X(t))=\sigma(t,X(t))\,\sigma^T(t,X(t)). \tag{8.16} \]

Consider the following generalization of Theorem 8.4.

Theorem 8.5 (The multivariate Itō formula).

Let X(t) be a solution to (8.14) and φ: ℝ × ℝⁿ ↦ ℝᵏ be a C^{1,2}-function applied to X(t)

\[ Y(t)=\varphi(t,X(t)). \tag{8.17} \]

Then the following chain rule applies

\[ d\varphi=\Big[\frac{\partial\varphi}{\partial t}+\Big(\frac{\partial\varphi}{\partial X}\Big)^{T}\mu+\frac12\,\mathrm{trace}\Big(\sigma\sigma^{T}\frac{\partial^2\varphi}{\partial X\,\partial X^{T}}\Big)\Big]dt+\Big(\frac{\partial\varphi}{\partial X}\Big)^{T}\sigma\,dW(t) \tag{8.18} \]

where φ = φ(t, X(t)), μ = μ(t, X(t)), etc.

Proof. Omitted, but it is similar to the proof of Theorem 8.4.

Remark 8.6.

The multivariate Itō formula may also be written as

\[ d\varphi=\frac{\partial\varphi}{\partial t}\,dt+\sum_{i=1}^n\frac{\partial\varphi}{\partial X_i}\,dX_i+\frac12\sum_{i=1}^n\sum_{j=1}^n\frac{\partial^2\varphi}{\partial X_i\,\partial X_j}(dX_i)(dX_j) \tag{8.19} \]

where \( (dW_i)(dW_j)=\delta_{ij}\,dt \) (Kronecker's delta), i.e.,

\[ (dW_i)(dW_j) = 0,\quad i\ne j, \tag{8.20} \]
\[ (dW_i)(dW_i) = dt, \tag{8.21} \]
\[ (dW_i)(dt) = (dt)(dW_i) = 0. \tag{8.22} \]

The following example illustrates the use of Itō 's formula.

Example 8.11.

Consider the two-dimensional SDE

\[ dS_1=\alpha_1 S_1\,dt+\sigma_1 S_1\,dW_1,\qquad S_1(0)=S_{10}, \tag{8.23} \]
\[ dS_2=\alpha_2 S_2\,dt+\sigma_2 S_2\,dW_2,\qquad S_2(0)=S_{20}, \tag{8.24} \]

where α1, α2, σ1 and σ2 are constants, and W1, W2 are two uncorrelated, standard Wiener processes. (We have left out the time argument t for brevity.)

By introducing the transformation

\[ \varphi=\varphi(S_1,S_2)=\frac{S_1}{S_2} \]

in (8.19), we get

\[ \begin{aligned} d\varphi &= 0\cdot dt+\frac{1}{S_2}(\alpha_1 S_1\,dt+\sigma_1 S_1\,dW_1)-\frac{S_1}{S_2^2}(\alpha_2 S_2\,dt+\sigma_2 S_2\,dW_2)\\ &\quad+\frac12\cdot 0\cdot dS_1\,dS_1+\frac12\,S_1\Big(\frac{2}{S_2^3}\Big)dS_2\,dS_2+\frac12\Big(-\frac{1}{S_2^2}\Big)dS_1\,dS_2+\frac12\Big(-\frac{1}{S_2^2}\Big)dS_2\,dS_1\\ &= (\alpha_1-\alpha_2+\sigma_2^2)\frac{S_1}{S_2}\,dt+(\sigma_1\,dW_1-\sigma_2\,dW_2)\frac{S_1}{S_2}. \end{aligned} \]

The difference between two uncorrelated Wiener processes W₁ and W₂ with standard deviations σ₁ and σ₂, respectively, may be expressed as one Wiener process W with the standard deviation \( \sigma=\sqrt{\sigma_1^2+\sigma_2^2} \) (as for normally distributed random variables). Thus

\[ d\Big(\frac{S_1}{S_2}\Big)=(\alpha_1-\alpha_2+\sigma_2^2)\frac{S_1}{S_2}\,dt+\sigma\frac{S_1}{S_2}\,dW. \]

Note that (8.23) and (8.24) may be solved independently and the solutions are given on the form (8.13). Thus

\[ \frac{S_1(t)}{S_2(t)}=\frac{S_{10}}{S_{20}}\exp\Big[\Big((\alpha_1-\alpha_2)+\frac{\sigma_2^2}{2}-\frac{\sigma_1^2}{2}\Big)t+\sqrt{\sigma_1^2+\sigma_2^2}\,W(t)\Big]. \]

Remark 8.7 (The sum of two Wiener processes).

From the example, it follows that the scaled sum of two independent standard Wiener processes W₁(t) and W₂(t) may be written as one Wiener process

\[ \sigma_1 W_1(t)+\sigma_2 W_2(t)=\sqrt{\sigma_1^2+\sigma_2^2}\,W(t). \tag{8.25} \]

This important result, which we state here without a formal proof, also applies to the increments of the Wiener process, i.e.,

\[ \sigma_1 S(t)\,dW_1(t)+\sigma_2 S(t)\,dW_2(t)=\sqrt{\sigma_1^2+\sigma_2^2}\,S(t)\,dW(t). \tag{8.26} \]

These results will be very useful in some problems and applications.
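The distributional identity (8.25) is easy to verify by simulation: the variance of σ₁W₁(t) + σ₂W₂(t) should equal (σ₁² + σ₂²)t. A small check of our own, with arbitrary values:

```python
import numpy as np

rng = np.random.default_rng(5)
s1, s2, T, paths = 0.3, 0.4, 1.0, 200_000

W1 = rng.normal(0.0, np.sqrt(T), size=paths)
W2 = rng.normal(0.0, np.sqrt(T), size=paths)   # independent of W1
combo = s1 * W1 + s2 * W2

print(combo.var())           # ~ (s1^2 + s2^2) * T
print((s1**2 + s2**2) * T)
```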

8.1.4 Stratonovitch SDE

An alternative definition of SDEs that adhere to the classical calculus (e.g., the chain rule) is given by the Stratonovitch SDE

\[ dX(t)=\tilde\mu(X(t))\,dt+\tilde\sigma(X(t))\circ dW(t) \tag{8.27} \]

where μ̃: ℝ ↦ ℝ and σ̃: ℝ ↦ ℝ are Borel-measurable functions and the ∘-symbol is used to distinguish the Stratonovitch SDE from the Itō SDE (8.1). As (8.27) defines neither a Markov process nor a martingale (due to the definition of the Stratonovitch integral), it is unsuitable for, e.g., prediction and estimation purposes, but it is more appropriate for theoretical work, such as existence and uniqueness theorems, stability analysis, bifurcation analysis (Baxendale [1994]) or Taylor series expansions (Kloeden and Platen [1995]).

Fortunately there is a link between the stochastic integrals in the Itō and Stratonovitch senses, namely

\[ \tilde\mu(X(t))=\mu(X(t))-\frac12\,\sigma(X(t))\,\frac{\partial\sigma(X(t))}{\partial X(t)} \]

where μ, σ and μ̃ are defined by (8.1) and (8.27), respectively. See, e.g., Kloeden and Platen [1995], Pugachev and Sinitsyn [1987], Øksendal [2010] for further mathematical details, and Wang [1994], Nielsen [1996] for a discussion of the appropriate application of SDEs (Itō or Stratonovitch) in mathematical modelling.

Remark 8.8.

Note that (8.1) and (8.27) coincide provided that σ(X(t)) = σ̃(X(t)) is independent of X(t), because ∂σ(X(t))/∂X(t) = 0 in this special, but important, case.
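The difference between the two integral concepts can be made concrete by approximating ∫₀ᵀ W dW with left-endpoint sums (Itō) versus midpoint sums (Stratonovitch). The sketch is our own; step count and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 1.0, 200_000
dt = T / n

dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate([[0.0], np.cumsum(dW)])

ito = np.sum(W[:-1] * dW)                    # left-endpoint evaluation
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)  # midpoint evaluation

print(ito, 0.5 * W[-1]**2 - 0.5 * T)   # Ito:           (1/2)W(T)^2 - T/2
print(strat, 0.5 * W[-1]**2)           # Stratonovitch: (1/2)W(T)^2
```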

8.2 Analytical solution methods

Generally, it is difficult to obtain closed form solutions to stochastic differential equations. However, the Itō formula, which in all other respects complicates analytical calculations considerably, may be valuable as an intermediary step in obtaining closed form solutions to (8.1). Some examples along these lines will be given. As with linear ordinary differential equations, the general solution of a linear stochastic differential equation can be found explicitly.

Closed form solutions for a number of SDEs (linear and nonlinear) are listed in Kloeden and Platen [1995], where a very elaborate discussion of numerical solutions may be found as well.

8.2.1 Linear, univariate SDEs

The general form of a univariate linear stochastic differential equation is

\[ dX(t)=(\mu_1(t)X(t)+\mu_2(t))\,dt+(\sigma_1(t)X(t)+\sigma_2(t))\,dW(t) \tag{8.28} \]
\[ X(t_0)=X_0 \tag{8.29} \]

where the coefficients μ1, μ2, σ1 and σ2 are given functions of time t or constants. We assume that these functions are measurable and bounded on an interval 0 ≤ tT such that the existence and uniqueness theorem from the preceding section applies and ensures the existence of a strong solution X(t) on t0tT for each 0 ≤ t0 < T.

When all the coefficient functions are constant, the SDE is said to be autonomous and its solutions are homogeneous Markov processes. Otherwise, the SDE is said to be nonautonomous. When μ₂(t) ≡ 0 and σ₂(t) ≡ 0, Equation (8.28) reduces to the homogeneous linear SDE

\[ dX(t)=\mu_1(t)X(t)\,dt+\sigma_1(t)X(t)\,dW(t);\qquad X(t_0)=X_0 \tag{8.30} \]

which clearly has the trivial solution X(t) ≡ 0 (for X₀ = 0). The so-called fundamental solution Φ_{t,t₀}, which satisfies the initial condition Φ_{t₀,t₀} = 1, is much more important, because any other solution may be expressed in terms of the fundamental solution. To determine Φ_{t,t₀}, we consider the simple case where σ₁(t) ≡ 0, i.e.,

\[ dX(t)=(\mu_1(t)X(t)+\mu_2(t))\,dt+\sigma_2(t)\,dW(t);\qquad X(t_0)=X_0 \tag{8.31} \]

where the Wiener process appears additively. In this case we say that the SDE is linear in the narrow sense.

Theorem 8.6 (Solution to a linear SDE in the narrow sense).

The solution of (8.31) is given by

\[ X(t)=\Phi_{t,t_0}\Big(X_{t_0}+\int_{t_0}^t\mu_2(s)\,\Phi_{s,t_0}^{-1}\,ds+\int_{t_0}^t\sigma_2(s)\,\Phi_{s,t_0}^{-1}\,dW(s)\Big) \tag{8.32} \]

where

\[ \Phi_{t,t_0}=\exp\Big(\int_{t_0}^t\mu_1(s)\,ds\Big). \tag{8.33} \]

Proof. The homogeneous version (μ₂(t) ≡ 0, σ₂(t) ≡ 0) of (8.31) is an ordinary differential equation

\[ \dot X(t)=\mu_1(t)X(t) \tag{8.34} \]

with the fundamental solution

\[ \Phi_{t,t_0}=\exp\Big(\int_{t_0}^t\mu_1(s)\,ds\Big). \]

Applying the Itō formula (8.10) to the transformation \( \varphi(t,x)=x/\Phi_{t,t_0}=\Phi_{t,t_0}^{-1}x \) and the solution X(t) of (8.31), we get

\[ \begin{aligned} d(\Phi_{t,t_0}^{-1}X(t)) &= \Big(\frac{d\Phi_{t,t_0}^{-1}}{dt}X(t)+(\mu_1(t)X(t)+\mu_2(t))\Phi_{t,t_0}^{-1}\Big)dt+\sigma_2(t)\Phi_{t,t_0}^{-1}\,dW(t)\\ &= \mu_2(t)\Phi_{t,t_0}^{-1}\,dt+\sigma_2(t)\Phi_{t,t_0}^{-1}\,dW(t) \end{aligned} \tag{8.35} \]

as

\[ \frac{d\Phi_{t,t_0}^{-1}}{dt}=-\Phi_{t,t_0}^{-1}\mu_1(t). \]

The right hand side of (8.35) can be integrated giving

\[ \Phi_{t,t_0}^{-1}X(t)=\Phi_{t_0,t_0}^{-1}X(t_0)+\int_{t_0}^t\mu_2(s)\,\Phi_{s,t_0}^{-1}\,ds+\int_{t_0}^t\sigma_2(s)\,\Phi_{s,t_0}^{-1}\,dW(s). \]

We have thus obtained the solution (8.32), as \( \Phi_{t_0,t_0}^{-1}=1 \).

Remark 8.9.

Notice again that \( \Phi_{t,t_0}^{-1} \) means \( 1/\Phi_{t,t_0} \) and not the inverse function.

Theorem 8.7 (Solution to a linear SDE in the wide sense).

The solution to (8.28) is given by

\[ X(t)=\Phi_{t,t_0}\Big(X_{t_0}+\int_{t_0}^t(\mu_2(s)-\sigma_1(s)\sigma_2(s))\,\Phi_{s,t_0}^{-1}\,ds+\int_{t_0}^t\sigma_2(s)\,\Phi_{s,t_0}^{-1}\,dW(s)\Big) \tag{8.36} \]

where Φt,t0 is given as the solution to the SDE

\[ d\Phi_{t,t_0}=\mu_1(t)\,\Phi_{t,t_0}\,dt+\sigma_1(t)\,\Phi_{t,t_0}\,dW(t);\qquad \Phi_{t_0,t_0}=1. \tag{8.37} \]

Proof. Omitted. See Kloeden and Platen [1995, Section 4.3].

Theorem 8.8 (Moments of a linear SDE in the wide sense).

The mean m(t) = E [X(t)] of (8.28) satisfies the ordinary differential equation

\[ \dot m(t)=\mu_1(t)m(t)+\mu_2(t);\qquad m(0)=m_0 \tag{8.38} \]

and the second order moment P(t) = E[X2(t)] satisfies

\[ \dot P(t) = (2\mu_1(t)+\sigma_1^2(t))P(t)+2m(t)(\mu_2(t)+\sigma_1(t)\sigma_2(t))+\sigma_2^2(t), \tag{8.39} \]
\[ P(0) = P_0. \tag{8.40} \]

Proof. By proceeding as in Example 8.3, Equation (8.38) is readily seen. In order to show (8.39), we apply the Itō formula (8.10) to the transformation φ(t, x) = x2, i.e.,

\[ \begin{aligned} d\varphi &= \Big(0+(\mu_1X+\mu_2)\,2X+\frac12(\sigma_1X+\sigma_2)^2\cdot 2\Big)dt+(\sigma_1X+\sigma_2)\,2X\,dW\\ &= \big(2(\mu_1X^2+\mu_2X)+\sigma_1^2X^2+\sigma_2^2+2\sigma_1\sigma_2X\big)dt+2(\sigma_1X^2+\sigma_2X)\,dW \end{aligned} \]

where the arguments have been left out for brevity as in the following equivalent stochastic integral formulation

\[ X^2(t)=X^2(t_0)+\int_{t_0}^t\big(2(\mu_1X^2+\mu_2X)+\sigma_1^2X^2+\sigma_2^2+2\sigma_1\sigma_2X\big)ds+\int_{t_0}^t 2(\sigma_1X^2+\sigma_2X)\,dW. \]

By taking expectations the last term drops out; cf. (7.37). If we define P(t) = E[X²(t)] and take derivatives, we obtain

\[ \frac{dP(t)}{dt}=2\mu_1(t)P(t)+2\mu_2(t)m(t)+\sigma_1^2(t)P(t)+\sigma_2^2(t)+2\sigma_1(t)\sigma_2(t)m(t) \]

which equals (8.39).

Remark 8.10.

Recall that the variance Var[X (t)] may be determined from

\[ \mathrm{Var}[X(t)]=P(t)-(m(t))^2. \tag{8.41} \]

In order to solve (8.39) the following result from calculus may be useful.

Remark 8.11 (A formula for solution of ODEs).

The solution to the ODE

\[ \dot x(t)+\psi(t)x(t)=\vartheta(t),\qquad t\in\mathcal{I}, \tag{8.42} \]

where ψ, ϑ: ℐ ↦ ℝ are continuous on the interval ℐ, is given by

\[ x(t)=\exp(-\Psi(t))\Big(\int\exp(\Psi(t))\,\vartheta(t)\,dt+c\Big),\qquad t\in\mathcal{I},\ c\in\mathbb{R} \tag{8.43} \]

where

\[ \Psi(t)=\int\psi(t)\,dt. \]

As an example consider the SDE from Example 8.3 again.

Example 8.12.

Consider the Langevin equation

\[ dX(t)=-\mu X(t)\,dt+\sigma\,dW(t);\qquad X(0)=X_0. \tag{8.44} \]

Without loss of generality, we assume that t0 = 0. From (8.33), we immediately get

\[ \Phi_{t,0}=\exp\Big(\int_0^t(-\mu)\,ds\Big)=\exp(-\mu t) \]

and thus (8.32) yields the solution

\[ X(t)=\exp(-\mu t)\Big(X_0+\sigma\int_0^t\exp(\mu s)\,dW(s)\Big) \]

which is called the Ornstein-Uhlenbeck process.

The mean m(t) = E[X (t)] is obtained from (8.38), i.e.,

\[ m(t)=m_0\exp(-\mu t). \]

The second moment P(t) should fulfill

\[ \dot P(t)+2\mu P(t)=\sigma^2. \]

Using Remark 8.11, we get

\[ \Psi(t)=\int_0^t 2\mu\,ds=2\mu t \]

and insertion into (8.43) yields

\[ P(t)=\exp(-2\mu t)\Big(\int_0^t\exp(2\mu s)\,\sigma^2\,ds+P_0\Big)=\underbrace{P_0\exp(-2\mu t)}_{\text{impact of the initial variance}}+\underbrace{\frac{\sigma^2}{2\mu}\big(1-\exp(-2\mu t)\big)}_{\text{response of the system}}. \]

The variance may be found as stated in Remark 8.10, i.e.,

\[ \mathrm{Var}[X(t)]=P_0\exp(-2\mu t)+\frac{\sigma^2}{2\mu}\big(1-\exp(-2\mu t)\big)-m_0^2\exp(-2\mu t) \]

and the stationary value is

\[ \lim_{t\to\infty}\mathrm{Var}[X(t)]=\frac{\sigma^2}{2\mu}. \]

Note that it is not just σ².
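The moment formulas above can be confirmed by simulating the Langevin equation (8.44) and inspecting the empirical mean and variance at a time t large enough for stationarity. The sketch below is our own; all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, x0 = 2.0, 1.0, 5.0
T, n, paths = 5.0, 5000, 20_000
dt = T / n

X = np.full(paths, x0)
for _ in range(n):
    # Euler step for dX = -mu*X dt + sigma dW, cf. (8.44)
    X += -mu * X * dt + sigma * rng.normal(0.0, np.sqrt(dt), paths)

print(X.mean(), x0 * np.exp(-mu * T))  # mean m0*exp(-mu*t), essentially 0 here
print(X.var(), sigma**2 / (2 * mu))    # variance close to sigma^2/(2*mu)
```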

8.3 Feynman-Kac representation

In this section we shall describe a close relationship between stochastic differential equations and parabolic partial differential equations (PDEs).

Consider the following Cauchy problem

\[ \frac{\partial F}{\partial t}(t,x)+\mu(t,x)\frac{\partial F}{\partial x}(t,x)+\frac12\sigma^2(t,x)\frac{\partial^2F}{\partial x^2}(t,x)=0 \tag{8.45} \]
\[ F(T,x)=\Phi(x) \tag{8.46} \]

where the functions μ(t, x), σ(t, x) and Φ(x) are given, and we wish to determine the function F(t, x).

As opposed to solving (8.45) analytically, we shall consider a representation formula for the solution F (t, x) in terms of an associated stochastic differential equation.

Assume that there exists a solution to (8.45). Fix the time t and the state x. Let the stochastic process X(t) be a solution to the SDE

\[ dX(s)=\mu(s,X(s))\,ds+\sigma(s,X(s))\,dW(s),\qquad X(t)=x \tag{8.47} \]

where s is now the running time.

Remark 8.12

(Same μ(·) and σ(·)). The functions μ(t, X(t)) and σ(t, X (t)) in (8.45) and (8.47) are the same — except for the fact that the running time variable in (8.47) is s.

If we apply the Itō formula (8.10) to the process F(s, X(s)) and write the result in stochastic integral form, we get

\[ \begin{aligned} F(T,X(T)) ={}& F(t,X(t))\\ &+\int_t^T\Big(\frac{\partial F}{\partial t}+\mu\frac{\partial F}{\partial x}+\frac12\sigma^2\frac{\partial^2F}{\partial x^2}\Big)(s,X(s))\,ds\\ &+\int_t^T\sigma(s,X(s))\frac{\partial F}{\partial x}(s,X(s))\,dW(s). \end{aligned} \tag{8.48} \]

Let us further assume that the process

\[ \sigma(s,X(s))\frac{\partial F}{\partial x}(s,X(s)) \]

belongs to the space ℒ²[t, T]; see Definition 7.5. If we use that F(t, x) solves (8.45), then the ds-integral drops out of (8.48). If we apply the boundary condition F(T, x) = Φ(x) and the initial condition X(t) = x, and take the expected value of the remaining parts of (8.48), then the last term also drops out; cf. (7.37). The only remaining term is

\[ F(t,x)=\mathrm{E}_{t,x}[\Phi(X(T))] \tag{8.49} \]

where the subscript t,x on the expectation operator is used to emphasize the fixed initial condition X(t) = x.

We state this important result in a theorem.

Theorem 8.9 (The Feynman–Kac representation).

Assume that F solves the boundary problem (8.45) and that the process

\[ \sigma(s,X(s))\frac{\partial F}{\partial x}(s,X(s))\in\mathcal{L}^2\qquad\text{for } t\le s\le T \tag{8.50} \]

where X(t) is defined by (8.47). Then F has the stochastic Feynman–Kac representation

\[ F(t,x)=\mathrm{E}_{t,x}[\Phi(X(T))]. \tag{8.51} \]

Proof. Follows from the preceding derivation.

Note that the theorem simply states that the solution to (8.45) is obtained as the expected value of the boundary condition.

Remark 8.13.

A major problem with this approach is that it is impossible to check the assumption (8.50) in advance as it requires some a priori information about the solution F to do so. At least two things can go wrong:

  1. Eq. (8.45) does not have a "sufficiently integrable" solution, i.e., the process (8.50) does not belong to the class ℒ². If the latter is the case, the solution offered by the Feynman-Kac representation is pure nonsense.
  2. The solution of (8.45) is not unique. If there are several solutions, the Feynman-Kac approach just supplies the "sufficiently integrable" solution. The remaining solutions must be found using another technique.

In this book, we shall assume that all the functions in question are “sufficiently integrable.” We shall not go into all the technical details (see e.g. Björk[2009], Øksendal [2010]).

Let us consider an example of this remarkable approach.

Example 8.13.

We wish to solve the following boundary problem in the domain [0, T] × ℝ:

\[ \frac{\partial F}{\partial t}+\mu x\frac{\partial F}{\partial x}+\frac12\sigma^2x^2\frac{\partial^2F}{\partial x^2}=0 \]
\[ F(T,x)=\ln(x^2) \]

where μ and σ are assumed to be constants.

It is readily seen that the associated SDE is given by

\[ dX(s)=\mu X(s)\,ds+\sigma X(s)\,dW(s);\qquad X(t)=x. \]

We recognize this as the geometric Brownian motion from Example 8.10 on page 148, where the solution was found to be

\[ X(T)=\exp\Big(\ln(x)+\Big(\mu-\frac12\sigma^2\Big)(T-t)+\sigma[W(T)-W(t)]\Big). \]

Using Theorem 8.9, we get the result

\[ F(t,x)=\mathrm{E}_{t,x}[2\ln(X(T))]=2\ln(x)+2\Big(\mu-\frac12\sigma^2\Big)(T-t) \]

as the expected value of the Wiener increment W(T) − W(t) is zero.
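The representation can be verified numerically: simulate X(T) under (8.47), average the boundary function and compare with the analytical F(t, x). This Monte Carlo sketch is our own addition, with arbitrary parameter values.

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, x, t, T, paths = 0.05, 0.3, 2.0, 0.0, 1.0, 200_000

# X(T) started at X(t) = x, sampled exactly via the GBM closed form (8.13)
WT = rng.normal(0.0, np.sqrt(T - t), size=paths)
XT = x * np.exp((mu - 0.5 * sigma**2) * (T - t) + sigma * WT)

mc = np.mean(np.log(XT**2))                                  # E_{t,x}[ln(X(T)^2)]
exact = 2 * np.log(x) + 2 * (mu - 0.5 * sigma**2) * (T - t)  # PDE solution
print(mc, exact)
```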

We shall now consider a more general case.

Theorem 8.10 (The Feynman-Kac representation with discounting).

Let the functions μ, σ and Φ be given as above, and let r be a constant. The solution to

\[ \frac{\partial F}{\partial t}(t,x)+\mu(t,x)\frac{\partial F}{\partial x}(t,x)+\frac12\sigma^2(t,x)\frac{\partial^2F}{\partial x^2}(t,x)-rF(t,x)=0 \]
\[ F(T,x)=\Phi(x) \]

is given by

\[ F(t,x)=\exp(-r(T-t))\,\mathrm{E}_{t,x}[\Phi(X(T))] \tag{8.52} \]

where the process X(t) is given by (8.47).

Proof. Omitted. See e.g. Björk [2009].

The Feynman-Kac representation theorems will be used extensively in the following chapters. Further generalizations and examples are to be found in the problems.

These theorems may be used to solve for the transition probabilities for SDEs in order to obtain the conditional and unconditional probability density functions (pdf) of X(t), where X(t) is the solution of (8.47). This is outside the scope of this book.

8.4 Girsanov measure transformation

In this section we introduce the concepts of (probability) measures, the Radon–Nikodym derivative and the Girsanov theorem, which enables us to change (probability) measures in continuous-time models. The theory is much more complicated than in the discrete time case (as described in Chapter 3), so this exposition does not pretend to be complete. Whenever possible, mathematical rigour will be substituted by intuitive arguments.

Note that a measure transformation is an inherently mathematical concept, which greatly simplifies the pricing of financial derivatives, but it is very difficult to fully comprehend the concept.

The objective of this section is to provide the reader with an elementary understanding of the concept of absolute continuous measure transformations, which will be used extensively later to determine arbitrage-free prices of a large class of financial derivatives. This is due to the fact that there exists an intimate relation between arbitrage-free markets and absolute continuous measure transformations. A particularly interesting problem is the existence of equivalent martingale measures (EMM), because it may be shown that the existence of an EMM yields arbitrage-free markets and vice versa.

8.4.1 Measure theory

Intuitively, a measure is a notion that generalizes those of the length, the area of figures and the volume of bodies, and that corresponds to the mass of a set for some mass distribution throughout the space. Please refer to Appendix B for details.

Example 8.14 (Does 2 equal 1?).

Consider two independent, normally distributed stochastic variables X, Y with zero mean and variance 1. If we interpret (X, Y) as a point in ℝ², then we can introduce polar coordinates (R, ϕ), which are also independent.

Consider the conditional mean

\[ \mathrm{E}[R^2\,|\,X=Y]=\mathrm{E}\Big[R^2\,\Big|\,\phi=\tfrac{\pi}{4}\text{ or }\tfrac{5\pi}{4}\Big]=\mathrm{E}[R^2]=\mathrm{E}[X^2+Y^2]=1+1=2. \]

Now introduce the new variables \( Z=\frac{X+Y}{\sqrt{2}} \) and \( W=\frac{X-Y}{\sqrt{2}} \), which are both N(0, 1)-distributed. It is clear that Z² + W² = X² + Y² = R² such that

\[ \mathrm{E}[R^2\,|\,X=Y]=\mathrm{E}[Z^2+W^2\,|\,W=0]=\mathrm{E}[Z^2]=1. \]

Obviously 2 ≠ 1, so there must be something wrong! The problem is that in both conditional expectations we condition on a null set, W = 0, which does not make any sense, whereas the expectation E[R²|X − Y = ν] makes sense for almost all ν, i.e., we should consider the expectation as an integral with respect to dν (as usual). Thus we need to consider conditional expectations in a wider sense, and this is exactly what measure theory and the Radon-Nikodym derivative enable us to do.

Let (X, ℱ, μ) be a measure space and let f: X ↦ ℝ be a positive ℱ-measurable function such that

\[ \int_X f(x)\,d\mu(x)<\infty. \tag{8.53} \]

As an example consider a continuous stochastic variable X with the probability density function f(x). We may define the Lebesgue measure by dμ(x) = f(x)dx such that (8.53) takes the form

\[ \int_X f(x)\,dx<\infty. \]

We may now define a new function v: ℱ → ℝ by

\[ \nu(E)=\int_E f(x)\,d\mu(x)\qquad\text{for all } E\in\mathcal{F} \tag{8.54} \]

which is also a measure on (X, ℱ).

It follows directly that the measure v has the property

\[ \text{if } E\in\mathcal{F}\text{ and }\mu(E)=0\text{ then also }\nu(E)=0 \tag{8.55} \]

which means that ν has at least the same null sets as μ.

Definition 8.2 (Equivalent measures).

Let (X, ℱ) be a measurable space, and let μ and ν be measures on (X, ℱ). The measure ν is said to be absolute continuous with respect to μ if (8.55) is fulfilled. In short, we write ν ≪ μ. If both ν ≪ μ and μ ≪ ν are true, the measures are said to be equivalent and we write ν ~ μ.

Remark 8.14.

That two measures are equivalent simply means that they have the same null sets. Beside that there need not be any similarities.

Example 8.15.

As an example consider an oil tanker that has run aground and starts to leak oil. Let the space X be some limited area of the ocean (a subset of ℝ²). At each location x ∈ X, we define f(x) as the density of the oil and the measure μ(x) as the depth of the oil. Then the measure ν(x) defined by (8.54) measures the amount of oil at location x. These measures are equivalent, because if there is not any oil at any depth at location x, expressed by μ(x) = 0, then there is indeed no oil at location x, which means that ν(x) = 0, and vice versa. As there is a limited amount of oil in the tanker, (8.53) is obviously fulfilled.

8.4.2 Radon-Nikodym theorem

Assuming that μ is given and we define the new measure ν by (8.54), then ν is absolute continuous with respect to μ. A very important result attributable to Radon-Nikodym states that the converse is also true, namely that any measure ν with ν ≪ μ can be written on the form (8.54). We state this as a theorem without proof.

Theorem 8.11 (Radon—Nikodym).

Let (X, ℱ, μ) be a finite measure space and let ν be a finite measure on (X, ℱ) such that ν ≪ μ. Then there exists a positive function f: X → ℝ which satisfies

\[ f \text{ is measurable} \tag{8.56} \]

\[ \int_X f(x)\,d\mu(x)<\infty \tag{8.57} \]

\[ \nu(E)=\int_E f(x)\,d\mu(x)\qquad\text{for all Borel sets } E. \tag{8.58} \]

The function f is called the Radon-Nikodym derivative of ν with respect to μ (on the σ-algebra ℱ). It is uniquely determined almost everywhere and we write

\[ f=\frac{d\nu}{d\mu}\qquad\text{or}\qquad d\nu(x)=f(x)\,d\mu(x). \tag{8.59} \]

Example 8.16.

A simple example of absolute continuity is obtained if we let X be a finite set, i.e., X = {1,..., N}, and define the σ-algebra by ℱ = 2^X, i.e., the family of all subsets of X. Let the measure μ on (X, ℱ) be given by the point masses μ(n) = μ({n}), n = 1,..., N. The relation ν ≪ μ means that ν(n) = 0 for all n where μ(n) = 0. If we assume that ν and μ are given and that ν ≪ μ, then the Radon-Nikodym derivative is simply found from

\[ \nu(n)=f(n)\,\mu(n),\qquad n=1,\dots,N \]

or

\[ f(n)=\frac{\nu(n)}{\mu(n)}. \]

Note that the special case μ(n) = 0 and ν(n) ≠ 0 is excluded by ν ≪ μ. If, however, both μ(n) = ν(n) = 0, then we may define f(n) by

\[ f(n)=\begin{cases} \dfrac{\nu(n)}{\mu(n)} & \text{for } \mu(n)\ne 0,\\ \text{not defined} & \text{for } \mu(n)=0. \end{cases} \]

The function f(n) is not uniquely defined for the n where μ(n) = 0, but the set of these null points has measure 0. We say that f(n) is uniquely determined almost everywhere (with respect to μ).

It is important to note that the concept of absolute continuity is linked to the specific σ-algebra that we are considering. If, for example, ν and μ are defined on (X, ℱ) and ℱ ⊇ 𝒢, then it is possible that ν ≪ μ holds on (X, 𝒢), while it does not hold on (X, ℱ).

Example 8.17.

Consider the set X = {1, 2, 3} and the measures

\[ \mu(1)=2,\qquad\mu(2)=0,\qquad\mu(3)=2 \]
\[ \nu(1)=8,\qquad\nu(2)=5,\qquad\nu(3)=13 \]

and the σ-algebras ℱ = 2^X and 𝒢 = {X, ∅, {1}, {2, 3}}. It is clear that ν ≪ μ is not true on ℱ because ν(2) ≠ 0 while μ(2) = 0. On the other hand, we have ν ≪ μ on 𝒢 with the Radon-Nikodym derivative

\[ f(n)=\begin{cases} 8/2=4 & \text{for } n=1,\\ (5+13)/(0+2)=9 & \text{for } n=2,3. \end{cases} \]

By comparing ℱ and 𝒢, it is clear that the absolute continuity property may be lost if we consider a finer σ-algebra. The σ-algebra 𝒢 cannot distinguish between {2} and {3}.
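In the finite case the Radon-Nikodym derivative is literally a ratio of point masses on the atoms of the σ-algebra, which a few lines of code make explicit. This small sketch of our own reproduces Example 8.17:

```python
mu = {1: 2.0, 2: 0.0, 3: 2.0}
nu = {1: 8.0, 2: 5.0, 3: 13.0}

# On G = {X, {}, {1}, {2,3}} the derivative is constant on each atom
for atom in ({1}, {2, 3}):
    f = sum(nu[n] for n in atom) / sum(mu[n] for n in atom)
    print(sorted(atom), f)   # {1}: 4.0, {2, 3}: 9.0
```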

We shall now consider measure transformations on filtered probability spaces, and we assume that the probability space (Ω, ℱ, ℙ) augmented by the filtration ℱ(t) is given on the time interval [0, T], where T is some fixed time. Assuming that we have a non-negative ℱ(T)-measurable stochastic variable L_T, we may construct a new measure ℚ on (Ω, ℱ(T)) by

\[ d\mathbb{Q}=L_T\,d\mathbb{P} \tag{8.60} \]

and if we further have that

\[ \mathrm{E}^{\mathbb{P}}[L_T]=1 \tag{8.61} \]

then ℚ is a new probability measure on (Ω, ℱ(T)).

Measure transformations of this kind are closely related to martingale theory. Let ℙt and ℚt denote the restrictions of ℙ and ℚ to ℱ(t), which implies that knowledge about the probability measures is only based on information up to and including time t. Then ℚt is absolute continuous with respect to ℙt for all t, and the Radon-Nikodym Theorem 8.11 guarantees the existence of a stochastic process {L_t; 0 ≤ t ≤ T} defined by

\[ L_t=\frac{d\mathbb{Q}_t}{d\mathbb{P}_t}\qquad\text{or}\qquad d\mathbb{Q}_t=L_t\,d\mathbb{P}_t. \tag{8.62} \]

It also follows that Lt is adapted. Furthermore, we shall now show that Lt is also a martingale with respect to (ℱ(t), ℙ).

Theorem 8.12.

The stochastic process Lt is a (ℱ(t), ℙ)-martingale.

Proof. We need to show that

\[ L_t=\mathrm{E}^{\mathbb{P}}[L_T\,|\,\mathcal{F}(t)],\qquad t\le T \tag{8.63} \]

which is indeed the martingale property; namely that the expected value at time t of the stochastic variable L_T at some future time T, t ≤ T, is simply the current value L_t, based on the information up to time t.

In other words we need to show that for all F ∈ ℱ(t), we have

\[ \int_F L_t\,d\mathbb{P}=\int_F L_T\,d\mathbb{P} \tag{8.64} \]

which follows from the following argument: As F ∈ ℱ (t) it follows from (8.62) that

\[ \int_F L_t\,d\mathbb{P}=\mathbb{Q}_t(F)=\mathbb{Q}_T(F) \]

where the latter is due to ℚ_t = ℚ_T on ℱ(t). This simply states that our information about the probability measure ℚ_T given the information set ℱ(t) is limited to the restricted probability measure ℚ_t. As the filtration is increasing, F ∈ ℱ(t) ⊆ ℱ(T), and since ℚ_T(F) = ∫_F L_T dℙ, we finally get (8.64).

Remark 8.15 (Restricted probability measures).

Think of a restricted probability measure ℙt in the following way: Assume that we gather information about, say, stock prices in time. Each time we observe a price we obtain more information about the probability density function (pdf) of stock prices (by e.g., drawing a histogram). As t → ∞, we obtain complete information about the pdf and our knowledge is no longer restricted to ℙt.

It is sometimes convenient to exchange probability measures as we did in Chapter 3 to compute arbitrage-free prices. We recall that the price of any financial derivative may be expressed as the expected value of a (properly discounted) payoff function under an equivalent martingale measure ℚ. Thus we need to establish a relation between expectations under different measures. We will need an important term before we can state the main result.

Definition 8.3 (The L1-space).

Let an integrable stochastic variable X be defined on the probability space (Ω, ℱ, ℙ). If

\[ \mathrm{E}^{\mathbb{P}}[|X|]<\infty \tag{8.65} \]

then X is said to belong to the class L¹. We write X ∈ L¹(Ω, ℱ, ℙ).

Theorem 8.13 (Expectation under the ℚ-measure).

Let the probability space (Ω, ℱ, ℙ) and a stochastic variable X ∈ L¹(Ω, ℱ, ℙ) be given. Let ℚ be another probability measure on (Ω, ℱ) where ℚ ≪ ℙ with the Radon-Nikodym derivative given by

\[ L=\frac{d\mathbb{Q}}{d\mathbb{P}}. \]

Assume that X also belongs to L¹(Ω, ℱ, ℚ) and that 𝒢 is a σ-algebra such that 𝒢 ⊆ ℱ. Then

\[ \mathrm{E}^{\mathbb{Q}}[X\,|\,\mathcal{G}]=\frac{\mathrm{E}^{\mathbb{P}}[LX\,|\,\mathcal{G}]}{\mathrm{E}^{\mathbb{P}}[L\,|\,\mathcal{G}]}\qquad\text{almost surely.} \tag{8.66} \]

Proof. Omitted. See Björk [2009].

We may apply this theorem to characterize martingales under the ℚ-measure in terms of the characteristics under the ℙ-measure.

Theorem 8.14.

Consider the probability space (Ω, ℱ, ℙ) augmented by the filtration ℱ(t) on the time interval [0, T]. Let ℚ be another probability measure such that ℚ_T ≪ ℙ_T and define the process L as in (8.62). Assume that M is an ℱ(t)-adapted process with E^ℚ[|M(t)|] < ∞ for all t ∈ [0, T] such that

\[ L\,M\ \text{is a}\ (\mathbb{P},\mathcal{F}(t))\text{-martingale.} \tag{8.67} \]

Then M is a (ℚ, ℱ(t))-martingale.

Proof. Omitted. See Björk [2009].

Remark 8.16.

The theorem simply states (under some additional conditions) that if we apply the Radon—Nikodym derivative to a ℙ-martingale M then we get a ℚ-martingale. Thus (under some conditions) the martingale property is preserved. This is a very important result.

8.4.3 Girsanov transformation

So far we have shown that it is possible to introduce absolute continuous measure transformations from the objective probability measure ℙ (the real-world measure) to an equivalent martingale measure ℚ such that we can obtain arbitrage-free prices of financial derivatives. We now show that such measure transformations affect the properties of the driving Wiener process and the infinitesimal characteristics of a stochastic differential equation.

As the mathematics is fairly complicated, one should at all times keep in mind that the objective is to choose a particular new measure ℚ such that we can obtain arbitrage-free prices.

The mathematical framework is as follows: We consider a Wiener process X(t) defined on the probability space (Ω, ℱ, ℙ) augmented by the natural filtration ℱ(t) for 0 ≤ t ≤ T, where T is some fixed time (e.g., the maturity time of a bond or the exercise date of a call option on a stock). We introduce a non-negative ℱ(T)-measurable stochastic variable L_T with E^ℙ[L_T] = 1. We wish to exchange measures by

\[ d\mathbb{Q}=L_T\,d\mathbb{P}\qquad(\text{on }\mathcal{F}(T)) \]

and consider the problem how this change of measure affects the ℙ-Wiener process.

Let us consider a univariate stochastic differential equation (defined on some probability space)

\[ dY(t)=\mu(t)\,dt+\sigma(t)\,dX(t) \tag{8.68} \]

where X(t) is a ℙ-Wiener process.

Heuristically, the functions μ and σ may be interpreted as

\[ \mu(t)\,dt = \mathrm{E}[dY(t)\,|\,\mathcal{F}(t)]\qquad\text{(drift)} \tag{8.69} \]
\[ \sigma^2(t)\,dt = \mathrm{E}[(dY(t))^2\,|\,\mathcal{F}(t)]\qquad\text{(diffusion)} \tag{8.70} \]

where dY(t) is short for Y (t + dt) − Y(t). In particular for the ℙ-Wiener process we have

\[ \mathrm{E}^{\mathbb{P}}[dX(t)\,|\,\mathcal{F}(t)] = 0\cdot dt \tag{8.71} \]
\[ \mathrm{E}^{\mathbb{P}}[(dX(t))^2\,|\,\mathcal{F}(t)] = 1\cdot dt \tag{8.72} \]

under the ℙ-measure. We wish to determine

\[ \mathrm{E}^{\mathbb{Q}}[dX(t)\,|\,\mathcal{F}(t)] \tag{8.73} \]
\[ \mathrm{E}^{\mathbb{Q}}[(dX(t))^2\,|\,\mathcal{F}(t)] \tag{8.74} \]

under the ℚ-measure. To this end, we may use (8.66) from Theorem 8.13

\[ \mathrm{E}^{\mathbb{Q}}[dX(t)\,|\,\mathcal{F}(t)]=\frac{\mathrm{E}^{\mathbb{P}}[L(t+dt)\,dX(t)\,|\,\mathcal{F}(t)]}{\mathrm{E}^{\mathbb{P}}[L(t+dt)\,|\,\mathcal{F}(t)]} \tag{8.75} \]

where we must evaluate L at time t + dt, because we have defined dX (t) = X(t + dt) − X(t). From Theorem 8.12, we know that L is a ℙ-martingale such that the denominator in (8.75) is simply L(t). For the numerator we get

\[ \begin{aligned} \mathrm{E}^{\mathbb{P}}[L(t+dt)\,dX(t)\,|\,\mathcal{F}(t)] &= \mathrm{E}^{\mathbb{P}}[(L(t)+dL(t))\,dX(t)\,|\,\mathcal{F}(t)]\\ &= \mathrm{E}^{\mathbb{P}}[L(t)\,dX(t)\,|\,\mathcal{F}(t)]+\mathrm{E}^{\mathbb{P}}[dL(t)\,dX(t)\,|\,\mathcal{F}(t)]. \end{aligned} \]

As L(t) is ℱ(t)-measurable, L(t) can move out of the first expectation, i.e.,

\[ \mathrm{E}^{\mathbb{P}}[L(t+dt)\,dX(t)\,|\,\mathcal{F}(t)]=L(t)\,\mathrm{E}^{\mathbb{P}}[dX(t)\,|\,\mathcal{F}(t)]+\mathrm{E}^{\mathbb{P}}[dL(t)\,dX(t)\,|\,\mathcal{F}(t)]. \]

As dX(t) is a Wiener-increment with zero mean, we finally get

\[ \mathrm{E}^{\mathbb{P}}[L(t+dt)\,dX(t)\,|\,\mathcal{F}(t)]=\mathrm{E}^{\mathbb{P}}[dL(t)\,dX(t)\,|\,\mathcal{F}(t)]. \]

Thus (8.75) may be written as

\[ \mathrm{E}^{\mathbb{Q}}[dX(t)\,|\,\mathcal{F}(t)]=\frac{\mathrm{E}^{\mathbb{P}}[dL(t)\,dX(t)\,|\,\mathcal{F}(t)]}{L(t)}. \tag{8.76} \]

This is as far as we can get in general, but for very particular choices of the likelihood process L(t), Equation (8.76) may be simplified considerably. We recall that L(t) is a ℙ-martingale and that we know the properties of the ℙ-Wiener process X. It is to be expected that (8.76) may be simplified if the likelihood process takes the form

\[ dL(t)=f(t)\,dX(t),\qquad L(0)=1. \tag{8.77} \]

It is by no means clear if there exist likelihood processes of the form (8.77). The process L(t) does indeed become a martingale if f ∈ ℒ2, but we have no a priori guarantee that L(t) remains non-negative for some choice of f. For now we shall just assume that an f(t) process exists and that L(t) remains non-negative. If we use (8.77) in (8.76), we get

\[ \mathrm{E}^{\mathbb{P}}[dL(t)\,dX(t)\,|\,\mathcal{F}(t)]=\mathrm{E}^{\mathbb{P}}[f(t)(dX(t))^2\,|\,\mathcal{F}(t)]=f(t)\,\mathrm{E}^{\mathbb{P}}[(dX(t))^2\,|\,\mathcal{F}(t)]=f(t)\,dt \]

where we have used that f(t) is ℱ(t)-measurable and that

\[ \mathrm{E}^{\mathbb{P}}[(dX(t))^2\,|\,\mathcal{F}(t)]=dt \]

because X(t) is a ℙ-Wiener process. If we now choose f(t) of the form

\[ f(t)=g(t)L(t) \]

and insert this into (8.76), we get

\[ \mathrm{E}^{\mathbb{Q}}[dX(t)\,|\,\mathcal{F}(t)]=\frac{\mathrm{E}^{\mathbb{P}}[dL(t)\,dX(t)\,|\,\mathcal{F}(t)]}{L(t)}=\frac{\mathrm{E}^{\mathbb{P}}[g(t)L(t)(dX(t))^2\,|\,\mathcal{F}(t)]}{L(t)}=\frac{g(t)L(t)\,\mathrm{E}^{\mathbb{P}}[(dX(t))^2\,|\,\mathcal{F}(t)]}{L(t)}=g(t)\,dt \]

as g(t) is also ℱ(t)-measurable.

Using a similar argument, it may be shown that

\[ \mathrm{E}^{\mathbb{Q}}[(dX(t))^2\,|\,\mathcal{F}(t)]=dt. \]

By comparing these last results with (8.69), we see that the process X has the infinitesimal characteristics μ(t) = g(t) and σ(t) = 1 under the ℚ-measure. Thus under the ℚ-measure, X(t) may be described by

\[ dX(t)=g(t)\,dt+dW(t) \tag{8.78} \]

where W(t) is a ℚ-Wiener process.

It is seen that under ℚ the ℙ-Wiener process acquires a drift term g(t)dt, while the remaining diffusion term dW(t) is driven by a ℚ-Wiener process. The function g(t) is called the Girsanov kernel. It plays a very important role in mathematical finance as we shall see.

We shall now formalize these results. We start with a small lemma.

Lemma 8.1.

Let g(t) be an ℱ(t)-adapted process that satisfies

\[ \mathbb{P}\Big[\int_0^T g^2(t)\,dt<\infty\Big]=1. \tag{8.79} \]

Then the equation

\[ dL(t)=g(t)L(t)\,dX(t),\qquad L(0)=1 \tag{8.80} \]

has the unique and strictly positive solution

\[ L(t)=\exp\Big(\int_0^t g(s)\,dX(s)-\frac12\int_0^t g^2(s)\,ds\Big). \tag{8.81} \]

Proof. Omitted. We leave it as an exercise for the reader.

Recall that it is important that E^ℙ[L(T)] = 1 for ℚ to be a probability measure. It is also important to note that it is not guaranteed that L(t) defined by (8.80)-(8.81) may be applied as a Radon-Nikodym derivative, because we do not know if L(t) satisfies the condition E^ℙ[L(T)] = 1. If L(t) were a martingale (i.e., if we knew a priori that g(t)L(t) ∈ ℒ²), then the initial condition L(0) = 1 would ensure that E^ℙ[L(T)] = 1. Unfortunately, we can only state that L(t) is a supermartingale, i.e., E^ℙ[L(T)] ≤ 1, for functions satisfying Lemma 8.1. We now state the main result in this section.

Theorem 8.15 (The Girsanov theorem).

Let X(t) be a (ℙ, ℱ(t))-Wiener process and let g(t) and L(t) be as defined in Lemma 8.1. Assume that

\[ \mathrm{E}^{\mathbb{P}}[L(T)]=1 \tag{8.82} \]

and define the probability measure ℚ by dℚ = L(T)dℙ on ℱ(T). Then the process W(t) defined by

\[ W(t)=X(t)-\int_0^t g(s)\,ds \tag{8.83} \]

becomes a (ℚ, ℱ(t))-Wiener process.

Proof. Omitted. See e.g. Björk [2009].

Remark 8.17.

Note that (8.83) on differential form is

\[ dW(t)=dX(t)-g(t)\,dt \]

or

\[ dX(t)=g(t)\,dt+dW(t), \]

which is similar to (8.78).

The assumption (8.82) is obviously very important and we now state a theorem (without a very difficult proof) that establishes necessary and sufficient conditions for g(t) such that (8.82) is satisfied.

Theorem 8.16 (The Novikov condition).

Assume that g(t) satisfies

\[ \mathrm{E}^{\mathbb{P}}\Big[\exp\Big(\frac12\int_0^T g^2(t)\,dt\Big)\Big]<\infty \tag{8.84} \]

then L(t) becomes a ℙ-martingale and, in particular, we have

\[ \mathrm{E}^{\mathbb{P}}[L(T)]=1. \tag{8.85} \]

8.4.4 Maximum likelihood estimation for continuously observed diffusions

In this section, we introduce the Girsanov measure transformation as the theoretical foundation of a modern application of the well-known Maximum Likelihood method.

The intuition behind the classical maximum likelihood approach is that

  • there is one measure (or one probability density function) parametrized by a parameter θ, and we wish to estimate this θ,
  • there is one Wiener process or driving noise process and
  • there are many processes X(t) and that we have only observed one of these.

The modern view is that

  • there are several measures (one measure for each admissible parameter θ),
  • there are equally many Wiener processes and
  • there is just one X(t) process, namely the one that we have observed.

In the modern view the problem is thus to determine the measure given only one set of observations X(t). To be specific, we fix the probability space (Ω, ℱ, ℙ), where the process X(t) is a Wiener process under the ℙ-measure. For each θ ∈ Θ ⊆ ℝ, where Θ is the admissible parameter set (for, e.g., the exponential distribution only positive parameters are allowed, Θ = ℝ+), we define the measure transformation

\[ dL^\theta(t) = \theta L^\theta(t)\,dX(t), \tag{8.86} \]
\[ L^\theta(0) = 1. \tag{8.87} \]

Next we define the measure ℙ^θ by the Radon-Nikodym derivative

\[ d\mathbb{P}^\theta=L^\theta(t)\,d\mathbb{P}\qquad\text{on }\mathcal{F}^X(t), \tag{8.88} \]

where ℱ^X(t) is the natural filtration generated by the process X(t) up to and including time t. This is essentially the likelihood ratio as given in the Neyman-Pearson lemma. We see that the likelihood ratio should be evaluated using our observations ℱ^X(t). To be specific, the quantity L^θ(t) should be maximized with respect to θ, and we interpret the solution θ̂ as the most probable or the most likely parameter given the observations.³

Our process X(t) is a Wiener process under the ℙ-measure, but under the new ℙ^θ-measure it is no longer a Wiener process; it acquires the drift θ, and the compensated process W^θ(t) in (8.89) is a ℙ^θ-Wiener process.

The Girsanov theorem with the Girsanov kernel θ = g states that these two measures are connected by

\[ dX(t)=\theta\,dt+dW^\theta(t);\qquad W^\theta(t)=X(t)-\int_0^t\theta\,ds \tag{8.89} \]

where Wθ (t) is a ℙθ-Wiener process, and that

\[ d\mathbb{P}^\theta(t)=L^\theta(t)\,d\mathbb{P}. \]

Thus there is one measure associated with each ℙθ-Wiener process for θ ∈ Θ, but there is only one observed process X(t), 0 ≤ tT, where T is some finite time.

Example 8.18 (Maximum likelihood estimation 1).

Assume that we wish to estimate the parameter θ in the process

\[ dX(t)=\theta\,dt+dW(t). \]

The L(t)-process is given by

\[ dL^\theta(t)=\theta L^\theta(t)\,dX(t) \]

which according to (8.81) has the solution

\[ L^\theta(t)=\exp\Big(\int_0^t\theta\,dX(s)-\frac12\int_0^t\theta^2\,ds\Big)=\exp\Big(\theta X(t)-\frac12\theta^2 t\Big). \]

As usual, we compute

\[ l^\theta(t)=\ln L^\theta(t)=\theta X(t)-\frac{\theta^2}{2}t \]

and solve

\[ \frac{\partial l^\theta(t)}{\partial\theta}\Big|_{\theta=\hat\theta}=0\iff X(t)-\hat\theta t=0; \]

thus the maximum likelihood estimate of θ is

\[ \hat\theta(t)=\frac{X(t)}{t} \]

where the notation θ̂(t) emphasizes that the estimate of θ is based on ℱ(t).
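The estimator θ̂(t) = X(t)/t is easily tried out on simulated data. The sketch below is our own (the true parameter, horizon and seed are arbitrary): it simulates dX = θdt + dW and recovers θ.

```python
import numpy as np

rng = np.random.default_rng(9)
theta_true, T, n = 1.5, 100.0, 100_000
dt = T / n

dX = theta_true * dt + rng.normal(0.0, np.sqrt(dt), n)  # increments of X
X_T = dX.sum()

print(X_T / T)   # theta_hat = X(T)/T, close to 1.5 for large T
```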

Consider a slightly more complicated example.

Example 8.19 (Maximum likelihood estimation 2).

Consider the Langevin equation

\[ dX(t)=\theta X(t)\,dt+dW(t) \tag{8.90} \]

where W(t) is a ℙ-Wiener process. Assuming that we have observations of X(t) for 0 ≤ t ≤ T, we wish to estimate the parameter θ.

The associated likelihood process (g(t) = θX(t)) is

\[ dL^\theta(t)=\theta X(t)L^\theta(t)\,dX(t),\qquad L^\theta(0)=1. \tag{8.91} \]

Another way of posing the estimation problem is to state that we wish to determine the measure Pθ that maximizes the likelihood ratio

\[ L^\theta(T)=\frac{d\mathbb{P}^\theta}{d\mathbb{P}}. \]

The likelihood process Lθ (t) should fulfill the condition (8.82), i.e.,

\[ \mathrm{E}^{\mathbb{P}}[L^\theta(T)]=1 \]

in order for L^θ(T) to define a probability measure. In addition the process should fulfil the Novikov condition (8.84), i.e.,

\[ \mathrm{E}^{\mathbb{P}}\Big[\exp\Big(\frac12\int_0^T\theta^2X^2(t)\,dt\Big)\Big]<\infty. \]

This condition is fulfilled under the assumption that X(t) ∈ ℒ2.

The Girsanov Theorem 8.15 states that

\[ dX(t)=\theta X(t)\,dt+dW^\theta(t) \]

where Wθ(t) is a ℙθ-Wiener process.

The solution to (8.91) is, cf. (8.81),

\[ L^\theta(t)=\exp\Big(\int_0^t\theta X(s)\,dX(s)-\frac12\int_0^t\theta^2X^2(s)\,ds\Big)=\exp\Big(\theta\int_0^t X(s)\,dX(s)-\frac12\theta^2\int_0^t X^2(s)\,ds\Big). \]

Using the standard ML-approach, we get

\[ \frac{\partial\ln L^\theta(t)}{\partial\theta}=\int_0^t X(s)\,dX(s)-\theta\int_0^t X^2(s)\,ds=0 \]

which has the solution

\[ \hat\theta(t)=\frac{\int_0^t X(s)\,dX(s)}{\int_0^t X^2(s)\,ds}=\frac{\frac{X^2(t)}{2}-\frac{t}{2}}{\int_0^t X^2(s)\,ds}=\frac{X^2(t)-t}{2\int_0^t X^2(s)\,ds} \]

where we have used the result from Example 8.9. The notation θ̂(t) is used to emphasize that the estimate of θ is based on information up to time t.
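A discretized version of this estimator replaces the two integrals by sums over a simulated path. The following sketch is our own; θ < 0 is chosen (arbitrarily) so that the simulated path is stable.

```python
import numpy as np

rng = np.random.default_rng(10)
theta_true, T, n = -0.5, 200.0, 200_000
dt = T / n

X = np.empty(n + 1); X[0] = 0.0
for k in range(n):
    X[k + 1] = X[k] + theta_true * X[k] * dt + rng.normal(0.0, np.sqrt(dt))

num = np.sum(X[:-1] * np.diff(X))   # int X dX (Ito, left endpoints)
den = np.sum(X[:-1]**2) * dt        # int X^2 ds
print(num / den)                    # theta_hat, close to theta_true
```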

These examples should just illustrate an application of the Radon—Nikodym derivative and the Girsanov theorem. From these examples, it should also be clear that this approach is not immediately applicable for empirical work when dealing with more complicated models, although the approach in Beskos et al. [2006] is building on this idea. More general methods will be introduced in Chapter 13.

8.5 Notes

Some of the material in this chapter is inspired by the very readable Björk [2009]. A more thorough treatment is given by, e.g. Arnold [1974], Kloeden and Platen [1995], Øksendal [2010]. In particular the monograph by Kloeden and Platen [1995] covers a large number of interesting topics — also of practical interest. The often referenced books by Karatzas and Shreve [1996], Ikeda and Watanabe [1989], Doob [1990] are also recommended, although they require some understanding of measure theory and other rather technical subjects. It should, however, be clear from the preceding section that absolute continuous measure transformations have some interesting applications, albeit the transformations are a purely abstract mathematical concept. Thus, measure theory is inherently important if one wishes to obtain a deeper understanding of the theory of modern mathematical finance.

8.6 Problems

Problem 8.1

Compute the stochastic differential dX in the following cases:

  1. X(t) = exp(αt).
  2. \( X(t)=\int_0^t g(s)\,dW(s) \), where g is an adapted stochastic process.
  3. X(t) = exp(αW(t)).

Problem 8.2

Use the Itō formula (8.10) to show that

\[ \int_0^t W^2(s)\,dW(s)=\frac13 W^3(t)-\int_0^t W(s)\,ds. \tag{8.92} \]

Problem 8.3

Let X(t) be a solution of (8.1).

  1. Assuming that σ²(x) > 0 for all x, determine a transformation φ(X(t)) using Itō's formula such that the diffusion term in the SDE for dY(t) = dφ(X(t)) is constant.

Problem 8.4

Consider the two SDEs

\[ dX(t) = \alpha X(t)\,dt+\sigma X(t)\,dW^{(1)}(t), \tag{8.93} \]
\[ dY(t) = \beta Y(t)\,dt+\delta Y(t)\,dW^{(2)}(t). \tag{8.94} \]

Compute the SDE for dφ (X, Y) in the following case:

\[ \varphi(X,Y)=\frac{X}{Y}. \]

Problem 8.5

Let W(t) = (W₁(t), W₂(t)) be a two-dimensional Wiener process and define the distance from the origin

\[ R(t)=|W(t)|=\big(W_1^2(t)+W_2^2(t)\big)^{1/2}. \]

Assuming that W(0) = 0, show that

\[ dR(t)=\frac{W_1(t)\,dW_1(t)+W_2(t)\,dW_2(t)}{R(t)}+\frac{1}{2R(t)}\,dt. \]

This process is called a Bessel process of order 2.

Problem 8.6

Consider the geometric Brownian motion

\[ dX_t=\mu X_t\,dt+\sigma X_t\,dW_t,\qquad X_0>0. \tag{8.95} \]

  1. Determine the solution to (8.95).
  2. Determine the mean.
  3. Determine the variance.

Problem 8.7

Consider the one-dimensional SDE

\[ dX_t=(\theta+\eta X_t)\,dt+\rho\,dW_t. \tag{8.96} \]

  1. Solve this SDE.
  2. Determine the mean.
  3. Determine the variance.

Problem 8.8

Consider the nonautonomous SDE

\[ dX(t)=\Big(\frac{2}{1+t}X(t)+\sigma(1+t)^2\Big)dt+\sigma(1+t)^2\,dW(t);\qquad X(t_0)=a. \tag{8.97} \]

  1. Show that the fundamental solution to (8.97) is

    \[ \Phi_{t,t_0}=\Big(\frac{1+t}{1+t_0}\Big)^2. \]

  2. Determine the general solution to (8.97).

Problem 8.9

Consider the SDE on t ∈ [0, T] defined by

\[ dX(t)=\mu(t,X(t))\,dt+\sigma(t,X(t))\,dW(t) \tag{8.98} \]

with starting value X(0) = u. Show that the dynamics of the Bridge Diffusion Process when X(T) = v is given by

\[ dX(t)=\Big(\mu(t,X(t))+[\sigma\sigma^T](t,X(t))\,\partial_x\log p_{t,T}(X(T)=v\,|\,x)\Big)dt \tag{8.99} \]

\[ \qquad+\sigma(t,X(t))\,dW(t). \tag{8.100} \]

Bridge processes are very useful when deriving Monte Carlo-based estimators for parameters; cf. Section 13.5.1.

¹The functions μ and σ will, in general, depend on a p-dimensional parameter vector θ ∈ Θ ⊆ ℝ^p, where Θ may be some constrained subset of ℝ^p. For notational convenience this parameter dependency will be suppressed in this chapter.

²These conditions do not immediately generalize to higher dimensions.

³In compact form our statistical model can be expressed as ⟨{ℙ^θ}_{θ∈Θ⊆ℝ}, Ω, ℱ, X⟩.
