Chapter 8
Having established stochastic calculus in the Itō sense in the last chapter, we are now prepared to consider stochastic differential equations. For ease of notation, we shall in general only state the important results for univariate SDEs, but a few results will be generalized to multivariate SDEs.
We repeat that the notion of stochastic differential equations (SDEs) is merely a shorthand notation for stochastic integral equations. The latter may be defined in several ways, but we restrict our discussion to stochastic integrals in the Itō sense. Unfortunately, this implies that the well-known chain rule for variable transformations must be replaced by the so-called Itō formula, which will be introduced in the multivariate case. This formula may be used to obtain closed form solutions of some SDEs. Besides, it just makes Itō stochastic calculus more tedious.
In the following exposition to stochastic differential equations, we shall only use the Wiener process as the driving noise process. We recall that the Wiener process is both a Markov process and a martingale, and that the mean of the stochastic integral (in the Itō sense) of any square integrable, adapted process with respect to a Wiener process, is zero.
Stochastic differential equations driven by, e.g., a Poisson process (or jump processes, counting processes or marked point processes) are gaining ground in the financial literature (see Cont and Tankov [2004] for a gentle overview). However, a considerable extension of the measure-theoretical concepts of adaptedness and predictability is required, which is beyond the scope of this book. It is duly noted that the topics covered in this chapter may be generalized to cover the very general class of square integrable processes (see e.g. Björk [2009], Karatzas and Shreve [1996], Ikeda and Watanabe [1989] for details).
The remainder of this chapter is organized as follows: Section 8.1 introduces stochastic differential equations. Section 8.2 considers analytical solution methods. Section 8.3 considers a link between parabolic partial differential equations (PDEs) and SDEs, which we shall use later in order to avoid solving such PDEs. Section 8.4 introduces continuous measure transformations, which will be used in later chapters.
We assume the existence of a probability space (Ω, ℱ, ℙ), where ℱ is a σ-algebra on the sample space Ω of possible outcomes, (Ω, ℱ) is a measurable space and ℙ: ℱ → [0, 1] is a probability measure. Let the drift μ: ℝ → ℝ and the diffusion σ: ℝ → ℝ be Borel-measurable functions and assume that X(t): Ω → ℝ is a solution to the time-homogeneous Itō stochastic differential equation
dX(t) = μ(t, X(t))dt + σ(t, X(t))dW(t),   X(0) = x₀   (8.1)
where {W(t), t ≥ 0} is a standard Wiener process defined on the probability space (Ω, ℱ, ℙ) equipped with the natural filtration {ℱ(t)} generated by W(t).
The standard Wiener process is defined in Definition 7.1; the concepts of filtration, martingales and adaptedness are defined in Definitions 7.2, 7.3 and 7.4. Please refer to Appendix A for a detailed discussion of these concepts.
Let us give a number of examples to illustrate the following discussion.
Example 8.1 (The Wiener process).
Consider the Wiener process
dX(t) = σdW(t),   X(0) = x₀   (8.2)
where σ is the standard deviation of the process and x0 is a deterministic initial condition, which is short for
X(t) = x₀ + ∫₀ᵗ σ dW(s).
From the definition of the Wiener process (Definition 7.1), it immediately follows that
X(t)=x0+σ(W(t)−W(0))=x0+σW(t).
Next we compute the mean of X(t), i.e.
E[X(t)] = E[x₀ + ∫₀ᵗ σ dW(s)] = x₀
which follows from (7.37). The variance is given by
Var[X(t)] = Var[x₀ + ∫₀ᵗ σ dW(s)] = σ²E[(∫₀ᵗ dW(s))²] = σ² ∫₀ᵗ 1² ds = σ²t,
where we have used the Itō isometry property (7.40). This shows that Var[X(t)] → ∞ as t → ∞. However, the trajectories of the process remain finite in finite time.
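These moment calculations are easy to verify by simulation. A minimal sketch (assuming NumPy is available; the parameter values x₀ = 1, σ = 0.5, t = 2 are arbitrary choices for the illustration):

```python
import numpy as np

# Monte Carlo check of Example 8.1: for dX = sigma dW, X(0) = x0,
# we expect E[X(t)] = x0 and Var[X(t)] = sigma^2 * t.
rng = np.random.default_rng(0)
x0, sigma, t = 1.0, 0.5, 2.0
n_steps, n_paths = 100, 50_000
dt = t / n_steps

# W(t) is the sum of independent N(0, dt) increments
W_t = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)).sum(axis=1)
X_t = x0 + sigma * W_t

mean_est = X_t.mean()   # close to x0 = 1.0
var_est = X_t.var()     # close to sigma^2 * t = 0.5
```

The sample moments agree with the theoretical values up to the usual Monte Carlo error of order 1/√n_paths.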
Example 8.2 (Wiener process with drift).
Let us compute the mean and variance of X(t), where X(t) is the solution to
dX(t)=μdt+σdW(t),X(0)=x0
where μ and σ are some constants. This SDE corresponds to
X(t) = x₀ + ∫₀ᵗ μ ds + ∫₀ᵗ σ dW(s).
As in the previous example, we get
E[X(t)] = x₀ + E[∫₀ᵗ μ ds] + E[∫₀ᵗ σ dW(s)] = x₀ + μt,
Var[X(t)] = Var[∫₀ᵗ σ dW(s)] = σ²E[(∫₀ᵗ dW(s))²] = σ²t.
We see that the mean of X(t) has a linear trend (or drift).
Example 8.3 (Stochastic exponential growth).
Consider the SDE
dX(t) = μX(t)dt + σdW(t),   X(0) = x₀   (8.3)
where μ and σ are constants, which may describe unlimited growth in biological systems or a stochastic money market account.
If we take expectations in the corresponding stochastic integral equation, we get
E[X(t)] = x₀ + E[∫₀ᵗ μX(s) ds] + E[σ ∫₀ᵗ dW(s)].
Of course, the last term equals zero. Using Fubini's theorem (which we neither state nor prove here), we may exchange the expectation and integration operators, i.e.,
E[X(t)] = x₀ + E[∫₀ᵗ μX(s) ds] = x₀ + μ ∫₀ᵗ E[X(s)] ds.
Compared to the last two examples, the problem is now that E[X(t)] appears on both sides of the equation. A standard trick is to introduce m(t) = E[X(t)] and take the derivative with respect to time t on both sides, i.e.,
dm(t)/dt = ṁ(t) = μm(t);   m(0) = E[X(0)]
which clearly has the solution
E[X(t)] = m(t) = m(0)exp(μt).
We see that E[X(t)] grows exponentially as t → ∞.
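The exponential growth of the mean can be checked numerically. A sketch using the Euler–Maruyama scheme (NumPy assumed; the parameter values are arbitrary choices for the illustration):

```python
import numpy as np

# Euler-Maruyama check of Example 8.3: for dX = mu*X dt + sigma dW the mean
# solves dm/dt = mu*m, so E[X(t)] = x0 * exp(mu*t).
rng = np.random.default_rng(1)
x0, mu, sigma, T = 1.0, 0.3, 0.2, 1.0
n_steps, n_paths = 500, 50_000
dt = T / n_steps

X = np.full(n_paths, x0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X += mu * X * dt + sigma * dW     # Euler-Maruyama step

mean_est = X.mean()
mean_theory = x0 * np.exp(mu * T)     # exp(0.3), approx. 1.35
```

Note that the additive noise term averages out, exactly as the computation with (7.37) predicts.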
Considering the slightly more complicated Geometric Brownian Motion (GBM)
dX(t) = αX(t)dt + σX(t)dW(t)   (8.4)
where α and σ are positive constants, it is not clear if there is existence and uniqueness of the solution for all t ≥ 0 or if the solution might blow up with positive probability in finite time. Along the same lines we must examine whether it is possible to determine a closed form solution or not. In the former case, we may have to impose some restrictions on the functions μ and σ in (8.1) in order to obtain existence of the solution.
It is an interesting result that the answers to these questions only depend on the properties of the infinitesimal characteristics μ and σ in (8.1) (and possibly the initial condition X(0)).
As for ordinary differential equations (ODEs), Lipschitz and bounded-growth conditions must be imposed on the drift and diffusion terms in order to obtain existence and uniqueness of solutions.
We must distinguish between weak and strong solutions to (8.1). A strong solution is obtained if the driving Wiener process is given in advance as a part of the problem such that the obtained solution to (8.1) is ℱ(t)-adapted, where ℱ(t) is the σ-algebra generated by the Wiener process. On the other hand, if we are just given the infinitesimal characteristics μ and σ in advance and the solution should apply for all possible Wiener processes, then the obtained solution is called a weak solution. It is clear that a strong solution is also a weak solution, because the particular Wiener process W(t) that resulted in the strong solution is just one of infinitely many Wiener processes that will give a weak solution. The converse is not true in general.
Theorem 8.1 (Strong uniqueness).
Suppose that the infinitesimal characteristics μ(x) and σ(x) are locally Lipschitz-continuous in the state variable; i.e., for every integer n ≥ 1 there exists a constant Cn such that for every t ≥ 0, |x| ≤ n and |y| ≤ n:
|μ(x) − μ(y)| + |σ(x) − σ(y)| ≤ Cₙ|x − y|.   (8.5)
Then strong uniqueness holds for (8.1).
Proof. Omitted. See Karatzas and Shreve [1996].
Let us consider an example that does not satisfy the condition (8.5).
Example 8.4
It is easy to verify that the differential equation
dx/dt = 3x^(2/3)
has several solutions; for any a > 0,
x(t) = 0 for t ≤ a,   x(t) = (t − a)³ for t > a.
This ODE is excluded as μ(x) = 3x^(2/3) does not satisfy (8.5) at x = 0.
We need an additional assumption in order to obtain existence and uniqueness of the solutions of (8.1).
Assumption 8.1 (Linear growth).
The functions μ and σ satisfy the usual linear growth condition
|μ(x)| + |σ(x)| ≤ K(1 + |x|),   ∀x ∈ ℝ   (8.6)
where K is a positive, real constant.
Example 8.5
The differential equation
dx/dt = x²(t),   x(0) = 1
corresponding to μ(x) = x2 has the solution
x(t) = 1/(1 − t),   0 ≤ t < 1.
Thus it is impossible to find a solution for all t. This is due to the fact that μ(x) = x2 does not satisfy Assumption 8.1.
Next we consider an example of an SDE.
Example 8.6 (Trespassing in a minefield).
Consider, as an example of a process which satisfies (8.5) but not (8.6), the SDE
dX(t) = −(1/2)exp(−2X(t))dt + exp(−X(t))dW(t).
For X(t) < 0, we get exponential growth, which is faster than linear growth, and (8.6) is not satisfied. It may be shown that the solution is given by
X(t)=ln (W(t)+exp (X(0))).
It can be seen that the solution blows up when W(t) < − exp(X(0)), as we would have to compute the natural logarithm of a negative number! If we define the (stopping) time τ(X(0), ω) by
τ(X(0), ω) = inf{t ≥ 0 : W(t, ω) = −exp(X(0, ω))},   ω ∈ Ω
it is clear that the solution only exists up to time τ(X(0), ω). This explosion time depends on the stochastic initial condition and the actual trajectory of the driving Wiener process.
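The explosion mechanism can be illustrated numerically: the solution blows up exactly when W hits the barrier −exp(X(0)), and for a fixed horizon the probability of this event is given in closed form by the reflection principle. A sketch (NumPy assumed; the choices X(0) = 0 and T = 1 are arbitrary):

```python
import numpy as np
from math import erf, sqrt

# Example 8.6: X(t) = ln(W(t) + exp(X(0))) explodes at the first time tau
# where W hits -exp(X(0)).  With X(0) = 0 the barrier is -1, and by the
# reflection principle P(tau <= T) = 2 * P(W(T) <= -1) for T = 1.
rng = np.random.default_rng(2)
T, n_steps, n_paths = 1.0, 500, 10_000
dt = T / n_steps

W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
p_est = (W.min(axis=1) <= -1.0).mean()          # fraction of exploded paths
p_exact = 2 * 0.5 * (1 + erf(-1 / sqrt(2)))     # 2 * Phi(-1), approx. 0.317
```

The discretely monitored minimum slightly underestimates the continuous barrier-hitting probability, so the agreement is only up to a small discretization bias.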
Example 8.7 (Geometric Brownian motion).
Consider the process given in (8.4). In this case an explosion time e may be defined by
e = inf{t ≥ 0 : X(t) ∈ {0, ∞}}   (8.7)
which states that the explosion time e is the first (i.e., smallest) time where the process X(t) hits the boundary 0 or takes the value ∞. Note that it is also critical if X(t) attains the value 0, because the process X(s) will then remain at zero for s ≥ t. The value of X(t) as t → ∞ depends on the parameters α and σ as follows (this is illustrated in Example 8.10):
X(t) → 0 a.s. if α < σ²/2,
X(t) → ∞ a.s. if α > σ²/2,
lim sup X(t) = ∞ and lim inf X(t) = 0 a.s. if α = σ²/2,
where a.s. is an abbreviation of almost surely. It may, however, be shown that X(t) does not take either the value 0 or ∞ in finite time. Hence the geometric Brownian motion does not explode. This is also clear as the infinitesimal characteristics are linear in X(t) and thus fulfil the Lipschitz condition (8.5) and, in particular, the linear growth condition (8.6).
It may be shown that the conditions in Theorem 8.1 and Assumption 8.1 ensure the existence and uniqueness of solutions of (8.1). In particular (8.6) ensures that the solution does not explode in finite time. These assumptions may be generalized to the multivariate case (Karatzas and Shreve [1996]).
For one-dimensional processes (8.1), the assumptions (8.5) and (8.6) are not necessary to ensure nonexplosive solutions; they can be weakened as in the following theorem.
Theorem 8.2 (The Yamada conditions).
Suppose that μ and σ are bounded. Assume further that the following conditions hold:
1) there exists a strictly increasing function v(u) with v(0) = 0 and ∫₀^ε v⁻²(u)du = ∞ for every ε > 0 such that
|σ(x) − σ(y)| ≤ v(|x − y|);
2) there exists a strictly increasing, concave function K(u) with K(0) = 0 and ∫₀^ε K⁻¹(u)du = ∞ for every ε > 0 such that
|μ(x) − μ(y)| ≤ K(|x − y|).
Then the pathwise uniqueness of solutions holds for (8.1) and hence it has a unique strong solution.
Proof. Omitted. See Ikeda and Watanabe [1989].
Remark 8.1.
The usual Lipschitz condition requires that v(u) = K₁u and K(u) = K₂u, where K₁, K₂ ∈ ℝ₊ are some constants, or even a unified condition for μ and σ as shown in, e.g., Rydberg [1997].
There exist solutions to (8.1) which do not fulfil the linear growth condition (8.6). Thus we need to determine other conditions that ensure the nonexplosiveness of solutions, in particular conditions which are easier to check than those in Theorem 8.2.
Consider the scale function
s(x) = ∫_c^x exp(−∫_c^y (2μ(ξ)/σ²(ξ)) dξ) dy   (8.8)
for some fixed c ∈ ℝ+. This function may be used to establish sufficient conditions on the parameters θ ∈ Θ ⊂ ℝp so that the explosion will never occur.
Theorem 8.3 (Probability of an explosion).
Let X(t) be described by (8.1), the scale function s(x) by (8.8) and the explosion time e by (8.7).
1) If s(0) = −∞ and s(∞) = ∞, then the probability of no explosion in finite time is one, i.e.,
ℙ(e = ∞) = 1
for every initial value X(0).
2) If s(0) > −∞ and s(∞) = ∞, then
ℙ(lim_{t↑e} X(t) = 0) = ℙ(sup_{t<e} X(t) < ∞) = 1
for every x. A similar assertion holds if the roles of 0 and ∞ are interchanged.
3) If s(0) > −∞ and s(∞) < ∞, then
ℙ(lim_{t↑e} X(t) = 0) = 1 − ℙ(lim_{t↑e} X(t) = ∞) = (s(∞) − s(x))/(s(∞) − s(0)).
Proof. Omitted. See e.g. Ikeda and Watanabe [1989].
Thus, if case 1) in Theorem 8.3 can be verified, the SDE in (8.1) does not explode with probability 1 and the solution exists for all t. On the other hand, if case 1) is not fulfilled, (8.1) may explode with positive probability in finite time. A further generalization is required, and this is called Feller's test for explosions. We refer the interested reader to, e.g., Karatzas and Shreve [1996, Section 5.1] for details.
Remark 8.2.
For specific choices of μ and σ in (8.1) the integral (8.8) may be difficult to evaluate. However, the computations may be simplified considerably by a change of measure using Girsanov's theorem (see later), provided that a unique equivalent martingale measure exists under the new measure (see e.g. Rydberg [1997] for the appropriate conditions in the one-dimensional case). Informally speaking, Girsanov's theorem simply introduces a measure that moves along with the deterministic drift and thus, under the equivalent martingale measure, the drift is removed.
The following example illustrates the use of the scale function.
Example 8.8.
For the process (8.2), the drift is zero, μ(X(t)) = 0, and the diffusion is simply σ(X(t)) = σ, i.e.,
s(x) = ∫_c^x exp(−∫_c^y 0 dξ) dy = ∫_c^x exp(0) dy = x − c.
Thus we get s(0) = −c, which implies that
lim_{c→∞} s(0) = −∞
and
s(∞) = ∞ − c = ∞   ∀c ∈ ℝ₊.
Thus condition 1) in Theorem 8.3 is fulfilled and the Wiener process (8.2) does not explode. This may seem contradictory, but it is important to stress that the trajectories of the Wiener process remain finite despite the fact that Var[X (t)] → ∞ as t → ∞. Note that ∞ does not belong to the real line ℝ.
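The scale function (8.8) is also straightforward to evaluate numerically. A sketch for geometric Brownian motion, a case where the inner integral is available in closed form so that the numerical evaluation can be checked (NumPy assumed; the values of α, σ and c are arbitrary):

```python
import numpy as np

# Scale function (8.8) for geometric Brownian motion, mu(x) = alpha*x and
# sigma(x) = sigma*x.  The inner integral is nu * ln(y/c) with
# nu = 2*alpha/sigma^2, so s(x) = int_c^x (y/c)^(-nu) dy, which has a
# simple closed form to compare against.
alpha, sigma, c = 0.1, 0.3, 1.0
nu = 2 * alpha / sigma**2

def s_numeric(x, n=20_001):
    y = np.linspace(c, x, n)
    vals = (y / c) ** (-nu)          # integrand exp(-inner integral)
    h = y[1] - y[0]
    # trapezoidal rule (avoids depending on np.trapz/np.trapezoid naming)
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def s_closed(x):
    return c**nu * (x**(1 - nu) - c**(1 - nu)) / (1 - nu)

x = 3.0
err = abs(s_numeric(x) - s_closed(x))
```

For less tractable drift and diffusion functions the same quadrature applies, with the inner integral also evaluated numerically.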
In the remainder of this book (and the problems), we simply assume that a unique solution exists. For brevity we shall not, in general, list the restrictions on the parameters that must be imposed to ensure nonexplosiveness.
An important feature of Itō stochastic differential equations is stated in the next theorem, but first we need a definition.
Definition 8.1 (The C^{1,2} space).
Let φ: ℝ² → ℝ be a function of two variables. The function φ is said to belong to the space C^{1,2}(ℝ × ℝ) if φ is continuously differentiable w.r.t. the first variable and twice continuously differentiable w.r.t. the second variable.
Theorem 8.4 (The Itō formula).
Let X(t) be a solution to (8.1) and let φ: ℝ² → ℝ be a C^{1,2}(ℝ × ℝ) function applied to X(t)
Y(t)=φ(t,X(t)).(8.9)
Then the following chain rule applies
dY(t) = [∂φ/∂t + μ ∂φ/∂x + ½σ² ∂²φ/∂x²]dt + σ ∂φ/∂x dW(t)   (8.10)
where the partial derivatives are evaluated at (t, X(t)) and the functions μ and σ are as defined in (8.1).
Proof. For notational brevity, we will leave out the argument in φ(t, X(t)), X(t) and W(t) in this ad hoc proof. A second order Taylor expansion of dφ gives
dφ = (∂φ/∂t)dt + (∂φ/∂x)dX + ½(∂²φ/∂x²)(dX)² + ½(∂²φ/∂t²)(dt)² + (∂²φ/∂t∂x)(dt)(dX).
From (8.1), we get
(dX)² = μ²(dt)² + 2μσ(dt)(dW) + σ²(dW)².
Compared to terms with dt and dW, the terms containing (dt)² and (dt)(dW) are insignificant, while (dW)² ∼ O(dt). Substituting (dW)² = dt and collecting the remaining terms yields (8.10),
where we have also used Metatheorem 1.
Remark 8.3 (Short form of the Itō formula).
By introducing the notation φₜ = ∂φ/∂t, etc., (8.10) may be written as
dY = (φₜ + μφₓ + ½σ²φₓₓ)dt + σφₓ dW
where we stress that φt should not be confused with φ(t).
Remark 8.4 (Additional term in the Itō formula).
As opposed to classical calculus, (8.10) contains the additional term ½σ²(∂²φ/∂x²)dt, which makes Itō calculus more complicated for theoretical considerations, although solutions to (8.1) are Markov processes and Itō integrals are martingales.
Remark 8.5.
It follows from the last remark that the diffusion term from (8.1) enters the drift of (8.10). Another remarkable observation from (8.10) is that the transformed variable Y(t) is also described by an Itō diffusion process.
Example 8.9.
Consider the integral
∫₀ᵗ W(s)dW(s).
Choose X(t) = W(t), which implies that dX(t) = dW(t), i.e., μ = 0 and σ = 1. In addition choose the transformation Y(t) = φ(t, X(t)) = ½X²(t). Then
∂φ/∂t = 0,   ∂φ/∂x = X(t),   ∂²φ/∂x² = 1.
Using (8.10), we get
dY(t) = ½dt + W(t)dW(t).
This implies that
d(½W²(t)) = ½dt + W(t)dW(t)
or in integral form
½W²(t) = ½t + ∫₀ᵗ W(s)dW(s)
or
∫₀ᵗ W(s)dW(s) = ½W²(t) − ½t.
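The closed form just obtained can be verified against the defining left-point Riemann sums of the Itō integral. A sketch (NumPy assumed):

```python
import numpy as np

# Example 8.9: the Ito integral int_0^T W dW equals W(T)^2/2 - T/2.
# We approximate it by the left-point (Ito) Riemann sum on a fine grid.
rng = np.random.default_rng(3)
T, n_steps = 1.0, 200_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.concatenate(([0.0], np.cumsum(dW)))      # W[0] = 0

ito_sum = np.sum(W[:-1] * dW)                   # left endpoints: Ito choice
closed_form = 0.5 * W[-1]**2 - 0.5 * T
```

The residual is exactly ½(T − Σ(dW)²), which vanishes as the grid is refined, illustrating why the correction term −½t appears.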
Example 8.10 (Geometric Brownian motion).
We wish to solve the SDE given by
dX(t) = αX(t)dt + σX(t)dW(t),   X(0) = x₀,
cf. (8.4). This SDE is called the geometric Brownian motion and is considered extensively in mathematical finance as a model for interest rates and stock prices. This is mainly due to the fact that the solution X(t) is lognormally distributed and thus excludes negative interest rates (or populations in biology or concentrations in chemistry).
By introducing the transformation Y(t) = φ(t, X(t)) = ln(X(t)), we get
∂φ/∂t = 0,   ∂φ/∂x = 1/X(t),   ∂²φ/∂x² = −1/X²(t).
Inserting these in (8.10), we get
dY(t) = (α − ½σ²)dt + σdW(t)
or
Y(t) = Y(0) + (α − ½σ²)t + σW(t)
and, finally,
X(t) = X(0) exp((α − ½σ²)t + σW(t)).   (8.13)
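The closed-form solution can be compared with a direct Euler–Maruyama discretization of (8.4) driven by the same Brownian increments. A sketch (NumPy assumed; parameter values arbitrary):

```python
import numpy as np

# Check the closed-form GBM solution X(t) = X(0)*exp((alpha - sigma^2/2)*t
# + sigma*W(t)) against an Euler-Maruyama path built from the SAME
# Brownian increments; the two should agree up to discretization error.
rng = np.random.default_rng(4)
x0, alpha, sigma, T = 1.0, 0.05, 0.2, 1.0
n_steps = 20_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
X = x0
for dw in dW:                                   # Euler-Maruyama
    X += alpha * X * dt + sigma * X * dw

exact = x0 * np.exp((alpha - 0.5 * sigma**2) * T + sigma * dW.sum())
rel_err = abs(X - exact) / exact
```

Refining the grid shrinks the pathwise discrepancy, in line with the strong convergence of the Euler–Maruyama scheme.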
Let the state variable X(t) ∈ ℝⁿ be described by the multivariate SDE
dX(t) = μ(t, X(t))dt + σ(t, X(t))dW(t)   (8.14)
where μ: ℝ × ℝⁿ → ℝⁿ, σ: ℝ × ℝⁿ → ℝⁿˣᵐ and W(t) is an m-dimensional standard Wiener process. Note that n need not equal m.
Alternatively, Eq. (8.14) may be written componentwise as
dXᵢ(t) = μᵢ(t, X(t))dt + ∑_{j=1}^m σᵢⱼ(t, X(t))dWⱼ(t),   i = 1, …, n.
For this process, we define the instantaneous covariance matrix as
Σ(t, X(t)) = σ(t, X(t))σᵀ(t, X(t)).
Consider the following generalization of Theorem 8.4.
Theorem 8.5 (The multivariate Itō formula).
Let X(t) be a solution to (8.14) and let φ(t, x): ℝ × ℝⁿ → ℝᵏ be a C^{1,2} function applied to X(t)
Y(t) = φ(t, X(t)).
Then the following chain rule applies
dY(t) = [∂φ/∂t + (∇ₓφ)ᵀμ + ½ tr(σᵀ(∇ₓₓφ)σ)]dt + (∇ₓφ)ᵀσ dW(t)
where φ = φ(t, X(t)), μ = μ(t, X(t)), etc., and ∇ₓφ and ∇ₓₓφ denote the gradient and the Hessian of φ with respect to x (applied componentwise for k > 1).
Proof. Omitted, but it is similar to the proof of Theorem 8.4.
Remark 8.6.
The multivariate Itō formula may also be written as
dY(t) = (∂φ/∂t)dt + ∑ᵢ (∂φ/∂xᵢ)dXᵢ(t) + ½ ∑ᵢ ∑ⱼ (∂²φ/∂xᵢ∂xⱼ)(dXᵢ)(dXⱼ)
where the products of differentials are evaluated by the rules
(dt)² = 0,   (dt)(dWᵢ) = 0,   (dWᵢ)(dWⱼ) = δᵢⱼdt (Kronecker's delta).
The following example illustrates the use of Itō's formula.
Example 8.11.
Consider the two-dimensional SDE
where α1, α2, σ1 and σ2 are constants, and W1, W2 are two uncorrelated, standard Wiener processes. (We have left out the time argument t for brevity.)
By introducing the transformation
in (8.19), we get
The difference between two uncorrelated Wiener processes W₁ and W₂ with standard deviations σ₁ and σ₂, respectively, may be expressed as one Wiener process W with the standard deviation √(σ₁² + σ₂²) (as for normally distributed random variables). Thus
Note that (8.23) may be solved independently and the solutions are given on the form (8.13). Thus
Remark 8.7 (The sum of two Wiener processes).
From the example, it follows that the sum of two standard Wiener processes W₁(t) and W₂(t) may be written as one standard Wiener process W(t) scaled by √2, i.e.,
W₁(t) + W₂(t) = √2 W(t) (in distribution).
This important result, which we state here without a formal proof, also applies to the increments of the Wiener process, i.e.,
dW₁(t) + dW₂(t) = √2 dW(t).
These results will be very useful in some problems and applications.
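The remark is easy to check by sampling W₁(t) and W₂(t) directly from their N(0, t) distributions (NumPy assumed; the value of t is arbitrary):

```python
import numpy as np

# Remark 8.7: W1(t) + W2(t) behaves as sqrt(2) * W(t) for a single
# standard Wiener process W; in particular Var[W1(t) + W2(t)] = 2t.
rng = np.random.default_rng(5)
t, n_paths = 3.0, 200_000

W1 = rng.normal(0.0, np.sqrt(t), size=n_paths)   # W1(t) ~ N(0, t)
W2 = rng.normal(0.0, np.sqrt(t), size=n_paths)   # independent of W1
S = W1 + W2

mean_est = S.mean()   # close to 0
var_est = S.var()     # close to 2*t = 6
```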
An alternative definition of SDEs that adheres to the classical calculus (e.g., the chain rule) is given by the Stratonovitch SDE
dX(t) = μ̃(X(t))dt + σ̃(X(t)) ∘ dW(t)   (8.27)
where μ̃ and σ̃ are Borel-measurable functions and the ∘-symbol is used to distinguish the Stratonovitch SDE from the Itō SDE (8.1). As (8.27) defines neither a Markov process nor a martingale (due to the definition of the Stratonovitch integral), it is unsuitable for, e.g., prediction and estimation purposes; it is more appropriate for theoretical work, such as existence and uniqueness theorems, stability analysis, bifurcation analysis (Baxendale [1994]) or Taylor series expansions (Kloeden and Platen [1995]).
Fortunately there is a link between the stochastic integrals in the Itō and Stratonovitch senses, namely
μ̃(X(t)) = μ(X(t)) − ½σ(X(t)) ∂σ(X(t))/∂X(t),   σ̃(X(t)) = σ(X(t))
where μ, σ and μ̃, σ̃ are defined by (8.1) and (8.27), respectively. See, e.g., Kloeden and Platen [1995], Pugachev and Sinitsyn [1987], Øksendal [2010] for further mathematical details, and Wang [1994], Nielsen [1996] for a discussion of the appropriate application of SDEs (Itō or Stratonovitch) in mathematical modelling.
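The difference between the two integral definitions is visible directly in the Riemann sums: evaluating the integrand at the left endpoint gives the Itō integral, at the midpoint the Stratonovitch integral. A sketch for the integrand W (NumPy assumed):

```python
import numpy as np

# Ito vs Stratonovitch on the same Brownian path: for the integrand W,
# the left-point (Ito) sum converges to W(T)^2/2 - T/2, whereas the
# midpoint (Stratonovitch) sum gives W(T)^2/2, the chain-rule answer.
rng = np.random.default_rng(6)
T, n_steps = 1.0, 200_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.concatenate(([0.0], np.cumsum(dW)))

ito = np.sum(W[:-1] * dW)                       # left endpoint
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)     # midpoint

chain_rule = 0.5 * W[-1]**2                     # Stratonovitch answer
```

The midpoint sum telescopes exactly to ½W(T)², so the −½T correction of Example 8.9 is an Itō-specific effect.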
Remark 8.8.
Note that (8.1) and (8.27) coincide provided that σ is independent of X(t), because ∂σ(X(t))/∂X(t) = 0 in this special, but important, case.
Generally, it is difficult to obtain closed form solutions to stochastic differential equations. However, the Itō formula, that in all other aspects complicates analytical calculations considerably, may be valuable as an intermediary step in obtaining closed form solutions to (8.1). Some examples along these lines will be given. As with linear ordinary differential equations, the general solution of a linear stochastic differential equation can be found explicitly.
Closed form solutions for a number of SDEs (linear and nonlinear) are listed in Kloeden and Platen [1995], where a very elaborate discussion of numerical solutions may be found as well.
The general form of a univariate linear stochastic differential equation is
dX(t) = (μ₁(t)X(t) + μ₂(t))dt + (σ₁(t)X(t) + σ₂(t))dW(t)   (8.28)
where the coefficients μ₁, μ₂, σ₁ and σ₂ are given functions of time t or constants. We assume that these functions are measurable and bounded on an interval 0 ≤ t ≤ T such that the existence and uniqueness theorem from the preceding section applies and ensures the existence of a strong solution X(t) on t₀ ≤ t ≤ T for each 0 ≤ t₀ < T.
When all the functions are constant the SDE is said to be autonomous and its solutions are homogeneous Markov processes. Otherwise, the SDE is said to be nonautonomous. When μ₂(t) ≡ 0 and σ₂(t) ≡ 0, Equation (8.28) reduces to the homogeneous linear SDE
dX(t) = μ₁(t)X(t)dt + σ₁(t)X(t)dW(t)
which clearly has the solution X(t) ≡ 0 for X(0) = 0. The so-called fundamental solution Φ_{t,t₀}, which satisfies the initial condition Φ_{t₀,t₀} = 1, is much more important, because any other solution may be expressed in terms of the fundamental solution. To determine Φ_{t,t₀}, we consider the simple case where σ₁(t) ≡ 0, i.e.,
dX(t) = μ₁(t)X(t)dt + σ₂(t)dW(t)   (8.31)
where the Wiener process appears additively. In this case we say that the SDE is linear in the narrow sense.
Theorem 8.6 (Solution to a linear SDE in the narrow sense).
The solution of (8.31) is given by
X(t) = Φ_{t,t₀}(X(t₀) + ∫_{t₀}ᵗ Φ_{s,t₀}⁻¹ σ₂(s)dW(s))   (8.32)
where the fundamental solution is
Φ_{t,t₀} = exp(∫_{t₀}ᵗ μ₁(s)ds).   (8.33)
Proof. The homogeneous version (σ₂(t) ≡ 0) of (8.31) is an ordinary differential equation
dΦ_{t,t₀}/dt = μ₁(t)Φ_{t,t₀},   Φ_{t₀,t₀} = 1
with the fundamental solution
Φ_{t,t₀} = exp(∫_{t₀}ᵗ μ₁(s)ds).
Applying the Itō formula (8.10) to the transformation Z(t) = Φ_{t,t₀}⁻¹ X(t) and the solution X(t) of (8.31), we get
dZ(t) = Φ_{t,t₀}⁻¹ dX(t) − Φ_{t,t₀}⁻¹ μ₁(t)X(t)dt = Φ_{t,t₀}⁻¹ σ₂(t)dW(t)
as the drift terms cancel. The right hand side can be integrated, giving
Z(t) = Z(t₀) + ∫_{t₀}ᵗ Φ_{s,t₀}⁻¹ σ₂(s)dW(s).
We have thus obtained the solution (8.32) as X(t) = Φ_{t,t₀}Z(t).
Remark 8.9.
Notice again that Φ_{t,t₀}⁻¹ means 1/Φ_{t,t₀} and not the inverse function.
Theorem 8.7 (Solution to a linear SDE in the wide sense).
The solution to (8.28) is given by
X(t) = Φ_{t,t₀}(X(t₀) + ∫_{t₀}ᵗ Φ_{s,t₀}⁻¹(μ₂(s) − σ₁(s)σ₂(s))ds + ∫_{t₀}ᵗ Φ_{s,t₀}⁻¹ σ₂(s)dW(s))
where Φ_{t,t₀} is given as the solution to the SDE
dΦ_{t,t₀} = μ₁(t)Φ_{t,t₀}dt + σ₁(t)Φ_{t,t₀}dW(t),   Φ_{t₀,t₀} = 1.
Proof. Omitted. See Kloeden and Platen [1995, Section 4.3].
Theorem 8.8 (Moments of a linear SDE in the wide sense).
The mean m(t) = E[X(t)] of (8.28) satisfies the ordinary differential equation
dm(t)/dt = μ₁(t)m(t) + μ₂(t)   (8.38)
and the second order moment P(t) = E[X²(t)] satisfies
dP(t)/dt = (2μ₁(t) + σ₁²(t))P(t) + 2(μ₂(t) + σ₁(t)σ₂(t))m(t) + σ₂²(t).   (8.39)
Proof. By proceeding as in Example 8.3, Equation (8.38) is readily seen. In order to show (8.39), we apply the Itō formula (8.10) to the transformation φ(t, x) = x², i.e.,
d(X²(t)) = [2X(μ₁X + μ₂) + (σ₁X + σ₂)²]dt + 2X(σ₁X + σ₂)dW(t)
where the arguments have been left out for brevity, as in the following equivalent stochastic integral formulation
X²(t) = X²(t₀) + ∫_{t₀}ᵗ [2X(μ₁X + μ₂) + (σ₁X + σ₂)²]ds + ∫_{t₀}ᵗ 2X(σ₁X + σ₂)dW(s).
By taking expectations the last term drops out; cf. (7.37). If we define P(t) = E[X²(t)] and take derivatives, we obtain
dP(t)/dt = (2μ₁ + σ₁²)P(t) + 2(μ₂ + σ₁σ₂)m(t) + σ₂²
which equals (8.39).
Remark 8.10.
Recall that the variance Var[X(t)] may be determined from
Var[X(t)] = P(t) − m²(t).
In order to solve (8.39) the following result from calculus may be useful.
Remark 8.11 (A formula for solution of ODEs).
The solution to the ODE
dx(t)/dt = Ψ(t)x(t) + ϑ(t)
where Ψ, ϑ: ℐ → ℝ are continuous in the interval ℐ, is given by
x(t) = exp(Λ(t))(x(t₀) + ∫_{t₀}ᵗ exp(−Λ(s))ϑ(s)ds)
where
Λ(t) = ∫_{t₀}ᵗ Ψ(s)ds.
As an example consider the SDE from Example 8.3 again.
Example 8.12.
Consider the Langevin equation
dX(t) = −μX(t)dt + σdW(t),   μ > 0.
Without loss of generality, we assume that t₀ = 0. From (8.33), we immediately get
Φ_{t,0} = exp(−μt)
and thus (8.32) yields the solution
X(t) = e^{−μt}X(0) + σ ∫₀ᵗ e^{−μ(t−s)}dW(s)
which is called the Ornstein–Uhlenbeck process.
The mean m(t) = E[X(t)] is obtained from (8.38), i.e.,
m(t) = m(0)e^{−μt}.
The second moment P(t) should fulfill, cf. (8.39),
dP(t)/dt = −2μP(t) + σ².   (8.43)
Using Remark 8.11 with Ψ(t) = −2μ and ϑ(t) = σ², we get
P(t) = e^{−2μt}P(0) + σ² ∫₀ᵗ e^{−2μ(t−s)}ds
and evaluating the integral yields
P(t) = e^{−2μt}P(0) + (σ²/(2μ))(1 − e^{−2μt}).
The variance may be found as stated in Remark 8.10, i.e.,
Var[X(t)] = P(t) − m²(t) = e^{−2μt}Var[X(0)] + (σ²/(2μ))(1 − e^{−2μt})
and the stationary value is
Var[X(∞)] = σ²/(2μ).
Note that it is not just σ².
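The stationary variance σ²/(2μ) can be confirmed by simulation; since the Ornstein–Uhlenbeck transition density is Gaussian with known mean and variance, the process can even be simulated exactly. A sketch (NumPy assumed; parameter values arbitrary):

```python
import numpy as np

# Example 8.12: the Ornstein-Uhlenbeck process dX = -mu*X dt + sigma dW
# has stationary variance sigma^2/(2*mu) -- not sigma^2.  The one-step
# transition is exactly Gaussian, which gives an exact simulation scheme.
rng = np.random.default_rng(7)
mu, sigma = 2.0, 0.5
dt, n_steps, n_paths = 0.01, 2_000, 20_000

decay = np.exp(-mu * dt)
step_std = np.sqrt(sigma**2 / (2 * mu) * (1 - decay**2))  # exact one-step std

X = np.zeros(n_paths)                      # start in X(0) = 0
for _ in range(n_steps):
    X = decay * X + step_std * rng.normal(size=n_paths)

var_est = X.var()
var_stationary = sigma**2 / (2 * mu)       # 0.0625 here
```

After a time span of many mean-reversion times 1/μ, the transient e^{−2μt} term is negligible and the sample variance settles at σ²/(2μ).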
In this section we shall describe a close relationship between stochastic differential equations and parabolic partial differential equations (PDEs).
Consider the following Cauchy problem
∂F/∂t + μ(t, x)∂F/∂x + ½σ²(t, x)∂²F/∂x² = 0,   F(T, x) = Φ(x)   (8.45)
where the functions μ(t, x), σ(t, x) and Φ(x) are given and we wish to determine the function F(t, x).
As opposed to solving (8.45) analytically, we shall consider a representation formula for the solution F (t, x) in terms of an associated stochastic differential equation.
Assume that there exists a solution to (8.45). Fix the time t and the state x. Let the stochastic process X(s) be a solution to the SDE
dX(s) = μ(s, X(s))ds + σ(s, X(s))dW(s),   X(t) = x   (8.47)
where s ≥ t is now the running time.
Remark 8.12 (Same μ(·) and σ(·)).
The functions μ and σ in (8.45) and (8.47) are the same, except for the fact that the running time variable in (8.47) is s.
If we apply the Itō formula (8.10) to the process F(s, X(s)) and write the result in stochastic integral form, we get
F(T, X(T)) = F(t, X(t)) + ∫ₜᵀ (∂F/∂s + μ∂F/∂x + ½σ²∂²F/∂x²)ds + ∫ₜᵀ σ(∂F/∂x)dW(s)   (8.48)
where the integrands are evaluated at (s, X(s)).
Let us further assume that the process
σ(s, X(s)) ∂F/∂x(s, X(s)),   t ≤ s ≤ T
belongs to the space ℒ²[t, T]; see Definition 7.5. If we use that F(t, x) solves (8.45), then the ds integral drops out of (8.48). If we apply the boundary condition F(T, x) = Φ(x) and the initial condition X(t) = x, and take the expected value of the remaining parts of (8.48), then the last term also drops out; cf. (7.37). The only remaining term is
F(t, x) = E_{t,x}[Φ(X(T))]
where the subscript t, x on the expectation operator is used to emphasize the fixed initial condition X(t) = x.
We state this important result in a theorem.
Theorem 8.9 (The Feynman–Kac representation).
Assume that F solves the boundary problem (8.45) and that the process
σ(s, X(s)) ∂F/∂x(s, X(s))   (8.50)
belongs to ℒ²[t, T], where X(s) is defined by (8.47). Then F has the stochastic Feynman–Kac representation
F(t, x) = E_{t,x}[Φ(X(T))].
Proof. Follows from the preceding derivation.
Note that the theorem simply states that the solution to (8.45) is obtained as the expected value of the boundary condition.
Remark 8.13.
A major problem with this approach is that it is impossible to check the assumption (8.50) in advance, as it requires some a priori information about the solution F to do so. At least two things can go wrong: the process in (8.50) may fail to belong to ℒ²[t, T], in which case the expectation of the stochastic integral need not vanish, and the expected value of Φ(X(T)) may fail to exist.
In this book, we shall assume that all the functions in question are “sufficiently integrable.” We shall not go into all the technical details (see e.g. Björk[2009], Øksendal [2010]).
Let us consider an example of this remarkable approach.
Example 8.13.
We wish to solve the following boundary problem in the domain [0, T] × ℝ:
∂F/∂t + μx ∂F/∂x + ½σ²x² ∂²F/∂x² = 0,   F(T, x) = ln(x)
where μ and σ are assumed to be constants.
It is readily seen that the associated SDE is given by
dX(s) = μX(s)ds + σX(s)dW(s),   X(t) = x.
We recognize this as the geometric Brownian motion from Example 8.10 on page 148, where the solution was found to be
X(T) = x exp((μ − ½σ²)(T − t) + σ(W(T) − W(t))).
Using Theorem 8.9, we get the result
F(t, x) = E_{t,x}[ln(X(T))] = ln(x) + (μ − ½σ²)(T − t)
as the expected value of the Wiener increment W(T) − W(t) is zero.
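The Feynman–Kac representation is also easy to check by Monte Carlo. The sketch below uses a choice of our own, not the book's example: constant μ and σ, so X(T) = x + μ(T − t) + σ(W(T) − W(t)), together with the hypothetical boundary function Φ(x) = x², for which the expectation is known in closed form (NumPy assumed):

```python
import numpy as np

# Monte Carlo illustration of Theorem 8.9: with constant mu, sigma we have
# X(T) = x + mu*(T-t) + sigma*(W(T)-W(t)), and for Phi(x) = x^2 the PDE
# solution F(t,x) = E[Phi(X(T))] = (x + mu*(T-t))^2 + sigma^2*(T-t)
# indeed satisfies F_t + mu*F_x + 0.5*sigma^2*F_xx = 0, F(T,x) = x^2.
rng = np.random.default_rng(8)
x, t, T, mu, sigma = 1.0, 0.0, 2.0, 0.3, 0.4
n_paths = 400_000
tau = T - t

X_T = x + mu * tau + sigma * np.sqrt(tau) * rng.normal(size=n_paths)
F_mc = np.mean(X_T**2)                         # E_{t,x}[Phi(X(T))]
F_exact = (x + mu * tau)**2 + sigma**2 * tau   # closed-form solution
```

The same recipe applies whenever (8.47) can be simulated: sample X(T), average Φ(X(T)), and the PDE never has to be solved directly.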
We shall now consider a more general case.
Theorem 8.10 (The Feynman-Kac representation with discounting).
Let the functions μ, σ and Φ be given as above, and let r be a constant. The solution to
∂F/∂t + μ(t, x)∂F/∂x + ½σ²(t, x)∂²F/∂x² − rF = 0,   F(T, x) = Φ(x)
is given by
F(t, x) = e^{−r(T−t)} E_{t,x}[Φ(X(T))]
where the process X(t) is given by (8.47).
Proof. Omitted. See e.g. Björk [2009].
The Feynman-Kac representation theorems will be used extensively in the following chapters. Further generalizations and examples are to be found in the problems.
These theorems may be used to solve for the transition probabilities for SDEs in order to obtain the conditional and unconditional probability density functions (pdf) of X(t), where X(t) is the solution of (8.47). This is outside the scope of this book.
In this section we introduce the concepts of (probability) measures, the Radon–Nikodym derivative and the Girsanov theorem, which enables us to change (probability) measures in continuous-time models. The theory is much more complicated than in the discrete time case (as described in Chapter 3), so this exposition does not pretend to be complete. Whenever possible, mathematical rigour will be substituted by intuitive arguments.
Note that a measure transformation is an inherently mathematical concept, which greatly simplifies the pricing of financial derivatives, but it is very difficult to fully comprehend the concept.
The objective of this section is to provide the reader with an elementary understanding of the concept of absolute continuous measure transformations, which will be used extensively later to determine arbitrage-free prices of a large class of financial derivatives. This is due to the fact that there exists an intimate relation between arbitrage-free markets and absolute continuous measure transformations. A particularly interesting problem is the existence of equivalent martingale measures (EMM), because it may be shown that the existence of an EMM yields arbitrage-free markets and vice versa.
Intuitively, a measure is a notion that generalizes those of the length, the area of figures and the volume of bodies, and that corresponds to the mass of a set for some mass distribution throughout the space. Please refer to Appendix B for details.
Example 8.14 (Does 2 equal 1?).
Consider two independent, normally distributed stochastic variables X, Y with zero mean and variance 1. If we interpret (X,Y) as a point in ℝ2 then we can introduce polar coordinates (R, ϕ), which are also independent.
Now introduce the new variables Z = (X + Y)/√2 and W = (X − Y)/√2, which are both N(0, 1)-distributed. It is clear that Z² + W² = X² + Y² = R² such that
2 = E[R²] = E[R²|X = Y] = E[R²|W = 0] = E[Z²|W = 0] = E[Z²] = 1.
Obviously 2 ≠ 1 so there must be something wrong! The problem is that in both conditional expectations we condition on a null set, W = 0, which does not make any sense, whereas the expectation E[R2|X − Y = ν] makes sense for almost all ν, i.e., we should consider the expectation as an integral with respect to dν (as usual). Thus we need to consider conditional expectations in a wider sense, and this is exactly what measure theory and the Radon-Nikodym derivative enable us to do.
Let (X, ℱ, μ) be a measure space and let f: X → ℝ be a positive ℱ-measurable function such that
∫_X f dμ < ∞.   (8.53)
As an example consider a continuous stochastic variable X with the probability density function f(x). With μ the Lebesgue measure, dμ(x) = dx, (8.53) takes the form
∫ f(x)dx = 1 < ∞.
We may now define a new function ν: ℱ → ℝ by
ν(A) = ∫_A f dμ,   A ∈ ℱ   (8.54)
which is also a measure on (X, ℱ).
It follows directly that the measure ν has the property
μ(A) = 0 ⟹ ν(A) = 0,   A ∈ ℱ   (8.55)
which means that ν has at least the same null sets as μ.
Definition 8.2 (Equivalent measures).
Let (X, ℱ) be a measurable space, and let μ and ν be measures on (X, ℱ). The measure ν is said to be absolute continuous with respect to μ if (8.55) is fulfilled. In short, we write ν ≪ μ. If both ν ≪ μ and μ ≪ ν are true, the measures are said to be equivalent and we write ν ~ μ.
Remark 8.14.
That two measures are equivalent simply means that they have the same null sets. Besides that, there need not be any similarities.
Example 8.15.
As an example consider an oil tanker that has run aground and starts to leak oil. Let the space X be some limited area of the ocean (a subset of ℝ²). At each location x ∈ X, we define f(x) as the density of the oil and the measure μ(x) as the depth of the oil. Then the measure ν(x) defined by (8.54) measures the amount of oil at location x. These measures are equivalent, because if there is not any oil at any depth at location x, expressed by μ(x) = 0, then there is indeed no oil at location x, which means that ν(x) = 0, and vice versa. As there is a limited amount of oil in the tanker, (8.53) is obviously fulfilled.
If μ is given and we define the new measure ν by (8.54), then ν is absolute continuous with respect to μ. A very important result attributable to Radon–Nikodym states that the converse is also true, namely that any measure ν with ν ≪ μ can be written on the form (8.54). We state this as a theorem without proof.
Theorem 8.11 (Radon–Nikodym).
Let (X, ℱ, μ) be a finite measure space and let ν be a finite measure on (X, ℱ) such that ν ≪ μ. Then there exists a positive function f: X → ℝ which satisfies
ν(A) = ∫_A f dμ,   ∀A ∈ ℱ.
The function f is called the Radon–Nikodym derivative of ν with respect to μ (on the σ-algebra ℱ). It is uniquely determined almost everywhere and we write
f = dν/dμ.
Example 8.16.
A simple example of absolute continuity is obtained if we let X be a finite set, i.e., X = {1, …, N}, and define the σ-algebra by ℱ = 2ˣ, i.e., the family of all subsets of X. Let the measure μ on (X, ℱ) be given by the point masses μ(n) = μ({n}), n = 1, …, N. The relation ν ≪ μ means that ν(n) = 0 for all n where μ(n) = 0. If we assume that ν and μ are given and that ν ≪ μ, then the Radon–Nikodym derivative is simply found from
f(n) = ν(n)/μ(n)   for μ(n) > 0.
Note that the special case μ(n) = 0 and ν(n) ≠ 0 is excluded by ν ≪ μ. If, however, both μ(n) = ν(n) = 0, then we may define f(n) arbitrarily, e.g., f(n) = 0. The function f(n) is not uniquely defined for the n where μ(n) = 0, but the set of these null points has measure 0. We say that f(n) is uniquely determined almost everywhere (with respect to μ).
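Example 8.16 can be made concrete with a small computation; the masses below are arbitrary illustrative numbers:

```python
# Example 8.16 in miniature: point-mass measures on X = {1, 2, 3} with
# nu << mu.  The Radon-Nikodym derivative is f(n) = nu(n)/mu(n) wherever
# mu(n) > 0, and may be set to, e.g., 0 on the mu-null points.
mu = {1: 0.2, 2: 0.0, 3: 0.8}
nu = {1: 0.5, 2: 0.0, 3: 0.5}   # nu(n) = 0 whenever mu(n) = 0, so nu << mu

f = {n: (nu[n] / mu[n] if mu[n] > 0 else 0.0) for n in mu}

# Reconstruct nu from f and mu:  nu(A) = sum over n in A of f(n) * mu(n)
nu_rebuilt = {n: f[n] * mu[n] for n in mu}
```

Changing f on the μ-null point n = 2 leaves nu_rebuilt unchanged, which is precisely the "uniquely determined almost everywhere" statement.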
It is important to note that the concept of absolute continuity is linked to the specific σ-algebra that we are considering. If, for example, μ is defined on (X, ℱ) and 𝒢 ⊆ ℱ, then it is possible that ν ≪ μ on (X, 𝒢) is true, while it is not true that ν ≪ μ on (X, ℱ).
Example 8.17.
Consider the set X = {1, 2, 3} and the measure
and the σ-algebras ℱ = 2^X and 𝒢 = {X, ∅, {1}, {2, 3}}. It is clear that ν ≪ μ does not hold on ℱ because ν({2}) ≠ 0 while μ({2}) = 0. On the other hand, we have ν ≪ μ on 𝒢 with the Radon–Nikodym derivative
By comparing ℱ and 𝒢, it is clear that the absolute continuity property may be lost if we consider a finer σ-algebra. The σ-algebra 𝒢 cannot distinguish between {2} and {3}.
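As the display defining the measures is not reproduced here, the sketch below uses hypothetical masses consistent with the text (μ({2}) = 0 while ν({2}) ≠ 0). On the coarse σ-algebra 𝒢 the Radon–Nikodym derivative is constant on each atom:

```python
# Hypothetical masses consistent with the example: mu({2}) = 0 while nu({2}) != 0.
mu = {1: 1.0, 2: 0.0, 3: 1.0}
nu = {1: 1.0, 2: 1.0, 3: 1.0}

# On F = 2^X absolute continuity fails: mu({2}) = 0 but nu({2}) != 0.
print(mu[2] == 0.0 and nu[2] > 0.0)        # True, so nu << mu fails on F

# On G the atoms are {1} and {2, 3}; the derivative is the ratio of atom masses:
atoms = [{1}, {2, 3}]
f = {frozenset(a): sum(nu[x] for x in a) / sum(mu[x] for x in a) for a in atoms}
print(f[frozenset({1})], f[frozenset({2, 3})])
```

The coarse σ-algebra cannot "see" point 2 on its own, so the offending null point is absorbed into the atom {2, 3}, which has positive μ-mass.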
We shall now consider measure transformations on filtered probability spaces, and we assume that the probability space (Ω, ℱ, ℙ) augmented by the filtration ℱ(t) is given on the time interval [0, T], where T is some fixed time. Assuming that we have a non-negative ℱ(T)-measurable stochastic variable LT, we may construct a new measure ℚ on (Ω, ℱ(T)) by
and if we further have that
then ℚ is a new probability measure on (Ω, ℱ(T)).
Measure transformations of this kind are closely related to martingale theory. Let ℙt and ℚt denote the restrictions of ℙ and ℚ to ℱ(t), which implies that knowledge about the probability measures is only based on information up to and including time t. Then ℚt is absolutely continuous with respect to ℙt for all t, and the Radon–Nikodym Theorem 8.11 guarantees the existence of a stochastic process {Lt; 0 ≤ t ≤ T} defined by
It also follows that Lt is adapted. Furthermore, we shall now show that Lt is also a martingale with respect to (ℱ(t), ℙ).
Theorem 8.12.
The stochastic process Lt is a (ℱ(t), ℙ)-martingale.
Proof. We need to show that
which is indeed the martingale property; namely that the expected value at time t of a stochastic variable L at some future time T, t ≤ T, is simply the expected value of L based on the information up to time t.
In other words, we need to show that for all F ∈ ℱ(t), we have
which follows from the following argument: As F ∈ ℱ(t), it follows from (8.62) that
where the latter is due to ℚt = ℚT on ℱ(t). This simply states that our information about the probability measure ℚT given the information set ℱ(t) is limited to the restricted probability measure ℚt. As the filtration is increasing, F ∈ ℱ(t) ⊆ ℱ(T), and we finally get (8.64).
Remark 8.15 (Restricted probability measures).
Think of a restricted probability measure ℙt in the following way: Assume that we gather information about, say, stock prices in time. Each time we observe a price we obtain more information about the probability density function (pdf) of stock prices (by e.g., drawing a histogram). As t → ∞, we obtain complete information about the pdf and our knowledge is no longer restricted to ℙt.
It is sometimes convenient to exchange probability measures as we did in Chapter 3 to compute arbitrage-free prices. We recall that the price of any financial derivative may be expressed as the expected value of a (properly discounted) payoff function under an equivalent martingale measure ℚ. Thus we need to establish a relation between expectations under different measures. We need an important definition before we can state the main result.
Definition 8.3 (The L1-space).
Let an integrable stochastic variable X be defined on the probability space (Ω, ℱ, ℙ). If
then X is said to belong to the class L1. We write X ∈ L1 (Ω, ℱ, ℙ).
Theorem 8.13 (Expectation under the ℚ-measure).
Let the probability space (Ω, ℱ, ℙ) and a stochastic variable X ∈ L1(Ω, ℱ, ℙ) be given. Let ℚ be another probability measure on (Ω, ℱ) with ℚ ≪ ℙ and the Radon–Nikodym derivative given by
Assume that X also belongs to L1(Ω, ℱ, ℚ) and that 𝒢 is a σ-algebra such that 𝒢 ⊆ ℱ. Then
Proof. Omitted. See Björk [2009].
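In the special case where 𝒢 is the trivial σ-algebra, (8.66) reduces to Eℚ[X] = Eℙ[LX], which is easy to check by Monte Carlo. The sketch below assumes ℙ is the standard normal law and ℚ = N(m, 1), for which the Radon–Nikodym derivative L = exp(mX − m²/2) is standard; the shift m is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 0.7                                  # hypothetical shift: Q = N(m, 1), P = N(0, 1)
z = rng.standard_normal(500_000)         # samples of X drawn under P
L = np.exp(m * z - 0.5 * m**2)           # dQ/dP evaluated at the samples

# Theorem 8.13 with G trivial: E^Q[h(X)] = E^P[L h(X)]
print(np.mean(L))            # ~1, as L is a Radon-Nikodym derivative of a probability measure
print(np.mean(L * z**2))     # ~1 + m^2, the second moment of N(m, 1)
```

This is precisely the importance-sampling identity: expectations under ℚ are computed as likelihood-weighted averages of ℙ-samples.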
We may apply this theorem to characterize martingales under the ℚ-measure in terms of the characteristics under the ℙ-measure.
Theorem 8.14.
Consider the probability space (Ω, ℱ, ℙ) augmented by the filtration ℱ(t) on the time interval [0, T]. Let ℚ be another probability measure such that ℚT ≪ ℙT and define the process L as in (8.62). Assume that M is an ℱ(t)-adapted process with Eℚ[M(t)] < ∞ for all t ∈ [0, T] such that
Then M is a (ℚ, ℱ(t))-martingale.
Proof. Omitted. See Björk [2009].
Remark 8.16.
The theorem simply states (under some additional conditions) that if we apply the Radon–Nikodym derivative to a ℙ-martingale M, then we get a ℚ-martingale. Thus (under some conditions) the martingale property is preserved. This is a very important result.
So far we have shown that it is possible to introduce absolutely continuous measure transformations from the objective probability measure ℙ (the real-world measure) to an equivalent martingale measure ℚ such that we can obtain arbitrage-free prices of financial derivatives. We now show that such measure transformations affect the properties of the driving Wiener process and the infinitesimal characteristics of a stochastic differential equation.
As the mathematics is fairly complicated, one should at all times keep in mind that the objective is to choose a particular new measure ℚ such that we can obtain arbitrage-free prices.
The mathematical framework is as follows: We consider a Wiener process X(t) defined on the probability space (Ω, ℱ, ℙ) augmented by the natural filtration ℱ(t) for 0 ≤ t ≤ T, where T is some fixed time (e.g., the maturity time of a bond or the exercise date of a call option on a stock). We introduce a non-negative ℱ(T)-measurable stochastic variable LT with E[LT] = 1. We wish to exchange measures by
and consider how this change of measure affects the ℙ-Wiener process.
Let us consider a univariate stochastic differential equation (defined on some probability space)
where X(t) is a ℙ-Wiener process.
Heuristically, the functions μ and σ may be interpreted as
where dY(t) is short for Y(t + dt) − Y(t). In particular, for the ℙ-Wiener process we have
under the ℙ-measure. We wish to determine
under the ℚ-measure. To this end, we may use (8.66) from Theorem 8.13
where we must evaluate L at time t + dt, because we have defined dX(t) = X(t + dt) − X(t). From Theorem 8.12, we know that L is a ℙ-martingale such that the denominator in (8.75) is simply L(t). For the numerator we get
As L(t) is ℱ(t)-measurable, L(t) can move out of the first expectation, i.e.,
As dX(t) is a Wiener increment with zero mean, we finally get
Thus (8.75) may be written as
This is as far as we can get in general, but for very particular choices of the likelihood process L(t), Equation (8.76) may be simplified considerably. We recall that L(t) is a ℙ-martingale and that we know the properties of the ℙ-Wiener process X. It is to be expected that (8.76) may be simplified if the likelihood process takes the form
It is by no means clear if there exist likelihood processes of the form (8.77). The process L(t) does indeed become a martingale if f ∈ ℒ2, but we have no a priori guarantee that L(t) remains non-negative for some choice of f. For now we shall just assume that an f(t) process exists and that L(t) remains non-negative. If we use (8.77) in (8.76), we get
where we have used that f(t) is ℱ(t)-measurable and that
because X(t) is a ℙ-Wiener process. If we now choose f(t) of the form
and insert this into (8.76), we get
as g(t) is also ℱ(t)-measurable.
Using a similar argument, it may be shown that
By comparing these last results with (8.69), we see that the process X has the infinitesimal characteristics μ(t) = g(t) and σ(t) = 1 under the ℚ-measure. Thus under the ℚ-measure, X(t) may be described by
where W(t) is a ℚ-Wiener process.
It is seen that the ℙ-Wiener process acquires a drift term g(t)dt and a diffusion term dW(t), where W(t) is a ℚ-Wiener process. The function g(t) is called the Girsanov kernel. It plays a very important role in mathematical finance, as we shall see.
We shall now formalize these results. We start with a small lemma.
Lemma 8.1.
Let g(t) be an ℱ(t)-adapted process that satisfies
Then the equation
has the unique and strictly positive solution
Proof. Omitted. We leave it as an exercise for the reader.
Recall that it is important that Eℙ[L(T)] = 1 for ℚ to be a probability measure. It is also important to note that it is not guaranteed that L(t) defined by (8.80)–(8.81) may be applied as a Radon–Nikodym derivative, because we do not know if L(t) satisfies the condition Eℙ[L(T)] = 1. If L(t) were a martingale (i.e., if we knew a priori that g(t)L(t) ∈ ℒ2), then the initial condition L(0) = 1 would ensure that Eℙ[L(T)] = L(0) = 1. Unfortunately, we can only state that L(t) is a supermartingale, i.e., Eℙ[L(T)] ≤ 1, for functions satisfying Lemma 8.1. We now state the main result in this section.
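For a constant, bounded kernel g the solution (8.81) reduces to L(T) = exp(gX(T) − g²T/2), and Eℙ[L(T)] = 1 holds exactly (so the supermartingale inequality is an equality in this case). A quick simulation, with a hypothetical constant kernel, confirms both the strict positivity asserted in Lemma 8.1 and the unit mean:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_paths = 1.0, 200_000
g = 0.5                                  # hypothetical bounded (here constant) Girsanov kernel

# For constant g, int_0^T g dX(s) = g X(T), and X(T) ~ N(0, T) under P.
X_T = rng.standard_normal(n_paths) * np.sqrt(T)
L_T = np.exp(g * X_T - 0.5 * g**2 * T)   # cf. the solution (8.81)

print(L_T.min() > 0)                     # strictly positive, as Lemma 8.1 states
print(np.mean(L_T))                      # ~1: E[L(T)] = 1, so Q is a probability measure
```

For unbounded or state-dependent kernels this unit-mean property can fail, which is exactly why the Novikov condition below is needed.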
Theorem 8.15 (The Girsanov theorem).
Let X(t) be a (ℙ, ℱ(t))-Wiener process and let g(t) and L(t) be as defined in Lemma 8.1. Assume that
and define the probability measure ℚ by dℚ = L(T)dℙ on ℱ(T). Then the process W(t) defined by
becomes a (ℚ, ℱ(t))-Wiener process.
Proof. Omitted. See e.g. Björk [2009].
Remark 8.17.
Note that (8.83) on differential form is
or
which is similar to (8.78).
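The effect of the measure change on the drift can be checked numerically. Assuming a hypothetical constant Girsanov kernel g, weighting ℙ-samples of X(T) by L(T) should reproduce Eℚ[X(T)] = gT, since under ℚ the process X has drift g, cf. (8.78):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_paths = 1.0, 400_000
g = 0.8                                              # hypothetical constant Girsanov kernel

X_T = rng.standard_normal(n_paths) * np.sqrt(T)      # X(T) ~ N(0, T) under P
L_T = np.exp(g * X_T - 0.5 * g**2 * T)               # L(T), cf. (8.81)

# Under Q the P-Wiener process acquires drift g: E^Q[X(T)] = E^P[L(T) X(T)] = g T
print(np.mean(L_T * X_T))                            # ~g*T
```

Equivalently, W(T) = X(T) − gT has a likelihood-weighted mean of zero, as the Girsanov theorem asserts.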
The assumption (8.82) is obviously very important, and we now state a theorem (omitting its very difficult proof) that establishes sufficient conditions on g(t) such that (8.82) is satisfied.
Theorem 8.16 (The Novikov condition).
Assume that g(t) satisfies
then L(t) becomes a ℙ-martingale and, in particular, we have
In this section, we introduce the Girsanov measure transformation as the theoretical foundation of a modern application of the well-known Maximum Likelihood method.
The intuition behind the classical maximum likelihood approach is that
The modern view is that
In the modern view the problem is thus to determine the measure given only one set of observations X(t). To be specific, we fix the probability space (Ω, ℱ, ℙ), where the process X(t) is a Wiener process under the ℙ-measure. For each θ ∈ Θ ⊆ ℝ, where Θ is the admissible parameter set (e.g., for the exponential distribution only positive parameters are allowed, Θ = ℝ+), we define the measure transformation
Next we define the measure ℙθ by the Radon–Nikodym derivative
where ℱX(t) is the natural filtration generated by the process X(t) up to and including time t. This is essentially the likelihood ratio as given in the Neyman–Pearson lemma. We see that the likelihood ratio should be evaluated using our observations ℱX(t). To be specific, the quantity Lθ(t) should be maximized with respect to θ, and we interpret the solution as the most probable or most likely parameter given the observations.3
Under the new ℙθ-measure, our process X(t) is no longer a Wiener process; instead, the transformed process Wθ(t) below is a Wiener process under ℙθ. We say that Wθ(t) is a ℙθ-Wiener process.
The Girsanov theorem with the constant Girsanov kernel g(t) = θ states that these two measures are connected by
where Wθ(t) is a ℙθ-Wiener process, and that
Thus there is one measure associated with each ℙθ-Wiener process for θ ∈ Θ, but there is only one observed process X(t), 0 ≤ t ≤ T, where T is some finite time.
Example 8.18 (Maximum likelihood estimation 1).
Assume that we wish to estimate the parameter θ in the process
The L(t)-process is given by
which according to (8.81) has the solution
As usual, we compute
and solve
thus the maximum likelihood estimate of θ is
where the notation emphasizes that the estimate of θ is based on ℱ(t).
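The estimate can be checked by simulation. The sketch below assumes the model of this example is dX(t) = θ dt + dW(t), consistent with the constant kernel g(t) = θ, for which maximizing log Lθ(t) = θX(t) − θ²t/2 yields the estimate X(t)/t; the true parameter value is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, T, n = 1.5, 100.0, 100_000   # hypothetical true parameter and horizon
dt = T / n

# Simulate dX(t) = theta dt + dW(t) under the model measure (Euler scheme is exact here)
dW = rng.standard_normal(n) * np.sqrt(dt)
X_T = theta_true * T + dW.sum()

theta_hat = X_T / T                      # maximizer of theta X(t) - 0.5 theta^2 t
print(theta_hat)                         # close to theta_true; Var(theta_hat) = 1/T
```

Note that the estimator only uses the terminal value X(T): for a constant drift, X(T) is a sufficient statistic.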
Consider a slightly more complicated example.
Example 8.19 (Maximum likelihood estimation 2).
Consider the Langevin equation
where W(t) is a ℙ-Wiener process. Assuming that we have observations of X(t) for 0 ≤ t ≤ T, we wish to estimate the parameter θ.
The associated likelihood process (g(t) = θX(t)) is
Another way of posing the estimation problem is to state that we wish to determine the measure ℙθ that maximizes the likelihood ratio
The likelihood process Lθ(t) should fulfill the condition (8.82), i.e.,
in order for Lθ(T) to define a probability measure. In addition, the process should fulfill the square integrability condition
This condition is fulfilled under the assumption that X(t) ∈ ℒ2.
The Girsanov Theorem 8.15 states that
where Wθ(t) is a ℙθ-Wiener process.
The solution to (8.91) is, cf. (8.81),
Using the standard ML-approach, we get
which has the solution
where we have used the result from Example 8.9. The notation is used to emphasize that the estimate of θ is based on information up to time t.
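A simulation sketch for this example, assuming the Langevin dynamics dX(t) = θX(t) dt + dW(t) with a hypothetical (mean-reverting) true parameter, and the estimator θ̂(t) = ∫₀ᵗ X dX / ∫₀ᵗ X² ds obtained by maximizing the likelihood with kernel g(t) = θX(t):

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true, T, n = -1.0, 200.0, 200_000  # hypothetical true parameter (stationary case)
dt = T / n

# Euler-Maruyama simulation of dX = theta X dt + dW
X = np.empty(n + 1)
X[0] = 0.0
dW = rng.standard_normal(n) * np.sqrt(dt)
for i in range(n):
    X[i + 1] = X[i] + theta_true * X[i] * dt + dW[i]

# Discretized ML estimator: int X dX / int X^2 ds
dX = np.diff(X)
theta_hat = np.sum(X[:-1] * dX) / np.sum(X[:-1] ** 2 * dt)
print(theta_hat)                         # close to theta_true for long horizons T
```

The stochastic integral ∫X dX is discretized with the left-endpoint (Itō) convention, which matters: a midpoint (Stratonovich) discretization would bias the estimate.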
These examples merely illustrate an application of the Radon–Nikodym derivative and the Girsanov theorem. From these examples, it should also be clear that this approach is not immediately applicable for empirical work when dealing with more complicated models, although the approach in Beskos et al. [2006] builds on this idea. More general methods will be introduced in Chapter 13.
Some of the material in this chapter is inspired by the very readable Björk [2009]. A more thorough treatment is given by, e.g., Arnold [1974], Kloeden and Platen [1995], Øksendal [2010]. In particular, the monograph by Kloeden and Platen [1995] covers a large number of interesting topics, also of practical interest. The often referenced books by Karatzas and Shreve [1996], Ikeda and Watanabe [1989], Doob [1990] are also recommended, although they require some understanding of measure theory and other rather technical subjects. It should, however, be clear from the preceding section that absolutely continuous measure transformations have some interesting applications, even though the transformations are a purely abstract mathematical concept. Thus, measure theory is inherently important if one wishes to obtain a deeper understanding of the theory of modern mathematical finance.
Problem 8.1
Compute the stochastic differential dX in the following cases:
Problem 8.2
Use the Itō formula (8.10) to show that
Problem 8.3
Let X(t) be a solution of (8.1).
Problem 8.4
Consider the two SDEs
Compute the SDE for dφ (X, Y) in the following case:
Problem 8.5
Let W(t) = (W1(t), W2(t)) be a two-dimensional Wiener process and define
Assuming that W(0) = 0, show that
This process is called a Bessel process of order 2.
Problem 8.6
Consider the geometric Brownian motion
Problem 8.7
Consider the one-dimensional SDE
Problem 8.8
Consider the nonautonomous SDE
Show that the fundamental solution to (8.97) is
Problem 8.9
Consider the SDE on t ∈ [0, T] defined by
with starting value X(0) = u. Show that the dynamics of the Bridge Diffusion Process when X(T) = v is given by
Bridge processes are very useful when deriving Monte Carlo-based estimators for parameters; cf. Section 13.5.1.
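As the displayed bridge dynamics are not reproduced here, the following sketch illustrates only the classical special case dX(t) = dW(t), for which the bridge pinned at X(T) = v has drift (v − X(t))/(T − t); the endpoint values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
T, n, u, v = 1.0, 1000, 0.0, 2.0         # hypothetical horizon and endpoints X(0)=u, X(T)=v
dt = T / n

# Euler-Maruyama for the Brownian bridge: dX = (v - X)/(T - t) dt + dW
X = np.empty(n + 1)
X[0] = u
for i in range(n):
    t = i * dt
    drift = (v - X[i]) / (T - t)         # drift term pulls the path toward v as t -> T
    X[i + 1] = X[i] + drift * dt + np.sqrt(dt) * rng.standard_normal()

print(abs(X[-1] - v))                    # small: the path is pinned (numerically) at v
```

The pinning drift blows up as t → T, which is what forces the simulated path to hit v; numerically the last step absorbs the remaining gap.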
1The functions μ and σ will, in general, depend on a p-dimensional parameter vector θ ∈ Θ ⊆ ℝp, where Θ may be some constrained subset of ℝp. For notational convenience this parameter dependency will be suppressed in this chapter.
2These conditions do not immediately generalize to higher dimensions.
3In compact form our statistical model can be expressed as ⟨{ℙθ}θ∈Θ⊆ℝ, Ω, ℱ, X⟩.