Chapter 7

Stochastic calculus

In this and the following chapter, stochastic differential equations will be formally introduced. This exposition of stochastic calculus does not pretend to be complete. The presentation will be guided by intuition, and important topics and results from a practitioner's point of view will be covered at a reasonable mathematical level. General measure theory and other technicalities of purely mathematical interest will be kept at a minimum, but the reader is referred to Arnold [1974], Karatzas and Shreve [1996], Ikeda and Watanabe [1989] and Øksendal [2010] for a detailed account. It should be emphasized that the material in this chapter is not only of interest in mathematical finance. To stress the broad applicability, this chapter does not contain new financial concepts or ideas. A detailed account of these is deferred to the following chapters.

As the successful application of stochastic differential equations in mathematical modelling requires quite a substantial mathematical and statistical setup, we shall now argue why we should bother to consider them.

Application of the nonparametric methods (introduced in Chapter 6) to financial time series revealed some characteristics (e.g., heteroscedasticity) which linear time series models cannot explain, because their conditional mean functions are linear and their conditional variance functions are constant. This is clearly at odds with the small scale empirical studies reported in these notes (and the adjacent exercises) and the large scale studies reported in the open literature. A large number of nonlinear time series models were introduced (in Chapter 5) to model heteroscedasticity. In particular, the GARCH-type models and their numerous extensions performed reasonably well. However, there are a number of important reasons for using differential equations augmented by some kind of randomness or stochasticity.

  • It is difficult to interpret the parameters of, say, an ARCH(3) model, whereas the embedded parameters in a stochastic differential equation model may have some physical or financial interpretation. A formal relationship between some SDEs and GARCH models may be derived, but that is outside the scope of this book.
  • Numerous financial products (stocks, foreign exchange rates, etc.) are traded very often or very irregularly on the markets. Thus a reasonable approximation is to use continuous-time models, and stochastic differential equations provide a framework for describing heteroscedasticity.
  • Besides being continuous in time, stochastic differential equations are also continuous in state, e.g., a stock price may be any positive, real number. As opposed to the finite number of states ω_i considered in Chapter 3, the uncertainty associated with a future stock price is modelled by considering a continuous distribution. Although stock prices are often quoted in ticks or units of, say, $1/8, we shall consider the number of possible prices as being practically infinite. See Epps [1996] for a discussion of the discrete state case.

Stochastic differential equations offer the best of two worlds, i.e., a combination of physical knowledge (laws of motion, preservation of energy, etc.) that may be used to develop a deterministic model of the system and statistical methods for parameter estimation and model validation. This allows the modeller to model causality as well as correlation, where causality may be considered superior to the correlation functions used in traditional time series analysis. There are a number of disadvantages associated with the use of SDEs; one major disadvantage is the advanced probability theory involved. From an empirical point of view, it is by no means trivial to estimate parameters in SDEs, but we shall get back to that in later chapters.

The remainder of this chapter is organized as follows: Section 7.1 briefly considers adding stochasticity to dynamical systems. Section 7.2 informally introduces stochastic calculus, while Section 7.3 considers stochastic integrals. Section 7.4 introduces concepts from stochastic processes and probability theory, and formally introduces Itō calculus. Finally, Section 7.5 provides a brief overview of jump processes and some convenient related mathematical tools.

7.1 Dynamical systems

Assume that we wish to model a general physical, chemical or technical system. Mathematical modelling of such systems often leads to the formulation of a system of coupled (nonlinear) differential equations, which may, in general, be written on the form

dX(t)/dt = Ẋ(t) = f(t, X(t)),  (7.1)

where f(t, X(t)) describes the time-directed evolution of the so-called state variables X(t) ∈ ℝⁿ. The state variables describe the state of the system at time t in the state space.

The derivation of these equations is often based on a number of conceptual, mathematical and numerical approximations and the validity of these are difficult to evaluate per se.

By adding a stochastic term to (7.1) to account for these approximations, random differential equations are obtained, as illustrated in the following examples.

Example 7.1 (Money market account).

Consider the simple money market account introduced in Definition 2.2 on page 27, i.e.,

dB(t) = r(t)B(t)dt,  (7.2)
B(0) = 1,  (7.3)

where B(t) is the value of the money account at time t, and r(t) denotes the relevant (there are many different ones!) interest rate.

It is very likely that the interest rate evolves randomly over time, i.e., we have

r(t) = r̃(t) + σ·“noise”(t),  (7.4)

where r̃(t) is assumed to be deterministic. If we insert this in (7.2), we get

dB(t) = (r̃(t) + σ·“noise”(t))B(t)dt,  B(0) = 1,  (7.5)

where σ denotes the standard deviation of the noise. The question is now: how do we formalize the concept of “noise” such that (7.5) makes sense, and how do we solve (7.5)?

Example 7.2 (Stock prices).

We have previously argued that the volatility of stock prices, foreign exchange rates and interest rates depends on the current level, i.e.,

dS(t) = αS(t)dt + “noise”(t)·S(t)dt,  S(0) = s,  (7.6)

which is essentially similar to (7.5).

Example 7.3 (Simple Black-Scholes).

Consider a simple financial market with two assets:

  1. A risky asset, where the price of the asset S(t) at time t is described by (7.6), and
  2. a safe asset, namely the money market account (7.2).

We propose the model

dS(t) = αS(t)dt + “noise”(t)·S(t)dt,  S(0) = s,  (7.7)
dB(t) = r(t)B(t)dt,  B(0) = 1.  (7.8)

We get the celebrated Black-Scholes model, when we choose the so-called Brownian motion for the noise process in (7.7). This model will be described in detail later.

The discussion above raises a number of questions about the mathematical and statistical nature of the added stochastic term. This chapter is devoted to answering these questions.

7.2 The Wiener process

The point of departure in our search for a formal definition of the noise terms in the previous examples will be the random difference equation (7.9) with ΔW (t) = W(t + Δt) − W(t).

X(t + Δt) − X(t) = μ(t, X(t))Δt + σ(t, X(t))ΔW(t),  (7.9)

where ΔW(t) is a normally distributed random variable with zero mean and a variance that is proportional to Δt. Furthermore, ΔW(t) is assumed to be independent of all prior values of the process W(s), s ≤ t, and μ(·, ·) and σ(·, ·) are a priori known functions.
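As an aside, the recursion (7.9) is easy to simulate directly by drawing ΔW(t) ∈ N(0, Δt) in each step. The following sketch shows one way to do this; the particular drift and diffusion functions (μ(t, x) = 0.05x, σ(t, x) = 0.2x) are illustrative choices for the demonstration, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(mu, sigma, x0, T=1.0, n=1000):
    """Simulate the random difference equation (7.9) on [0, T] using
    n equidistant steps and increments Delta W(t) ~ N(0, Delta t)."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt))  # Delta W(t)
        x[k + 1] = x[k] + mu(t, x[k]) * dt + sigma(t, x[k]) * dW
    return x

# Illustrative (hypothetical) choice: mu(t, x) = 0.05*x, sigma(t, x) = 0.2*x
path = simulate(lambda t, x: 0.05 * x, lambda t, x: 0.2 * x, x0=1.0)
print(path[0], path[-1])
```

Refining Δt in such a scheme is precisely the limiting operation discussed in the remainder of this section.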

Remark 7.1 (Other driving processes).

The driving noise process W(t) in the random difference equation (7.9) need not be a normally distributed random variable. It could easily be, say, a Poisson process or a compound Poisson process, which could account for completely unpredictable phenomena, such as attacks on some currency in the foreign exchange markets or the effects of earthquakes. We will present a brief introduction to jump processes in Section 7.5.

In order to obtain a more mathematical description of (7.9), a more formal definition of the noise process W(t) is required. In particular, we need a process that generates mutually independent and identically distributed normal random variables with zero mean and a variance that is proportional to Δt; the definition should also make sense when we consider the limiting behaviour of (7.9) as Δt tends to 0.

One possibility is to consider a Brownian motion, named after the Scottish botanist Robert Brown, who used the process to describe the irregular movements of pollen suspended in water. This random movement, usually attributed to the buffeting of the pollen by water molecules, results in a diffusion of the pollen in the water. Brownian motion is thus a physical example of a random and continuous stochastic process.

A standard Wiener process is an abstract mathematical description of the physical process of Brownian motion. The mathematical properties defining a Wiener process, {W(t), t ≥ 0}, are given in

Definition 7.1 (The Wiener process).

A stochastic process {W(t), t ≥ 0} is said to be a Wiener process if it satisfies the following conditions:

  1. W(0) = 0 with probability 1 (w.p.1).
  2. The increments W(t_1) − W(t_0), W(t_2) − W(t_1), ..., W(t_n) − W(t_{n−1}) of the process for any partitioning of the time interval 0 ≤ t_0 < t_1 < ... < t_n < ∞ are mutually independent.
  3. The increments W(t) − W(s) for any 0 ≤ s < t are normally distributed with mean and variance, respectively,

    E[W(t) − W(s)] = 0,  (7.10)
    Var[W(t) − W(s)] = t − s,  (7.11)

    i.e., W(t) − W(s) ∈ N(0, t − s).

  4. W(t) has continuous trajectories.

It follows from (7.10) and (7.11) that the mean of the increments is zero over any time interval, whereas the variance grows unboundedly as the length of the time interval t − s is increased.
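The conditions in Definition 7.1 translate directly into a simulation recipe: cumulate independent N(0, Δt) increments starting from W(0) = 0. A minimal sketch (the grid and sample sizes are arbitrary choices) that also checks Var[W(t)] = t empirically at t = 1:

```python
import numpy as np

rng = np.random.default_rng(1)

def wiener_paths(n_paths, T=1.0, n_steps=500):
    """Simulate Wiener paths per Definition 7.1: W(0) = 0 and
    independent Gaussian increments with variance dt."""
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W = np.concatenate([np.zeros((n_paths, 1)), dW.cumsum(axis=1)], axis=1)
    return W

W = wiener_paths(20000)
# By (7.11), Var[W(1) - W(0)] = 1; compare with the sample variance
print(W[:, -1].var())
```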

Using this definition of the Wiener process, we can write (7.9) as

X(t + Δt) − X(t) = μ(t, X(t))Δt + σ(t, X(t))ΔW(t),  (7.12)

where

ΔW(t) = W(t + Δt) − W(t).  (7.13)

Let us now try to formalize (7.9) slightly by dividing through by Δt and then letting Δt tend to 0. Formally we should obtain

Ẋ(t) = μ(t, X(t)) + σ(t, X(t))V(t),  X(0) = x,  (7.14)

where we have added an initial value x and introduced V(t) as the formal time derivative of the Wiener process.

Assuming that V(t) is a well defined process, it should now be possible to solve (7.14) for every realization or trajectory of V(t). It can be shown that the process V(t) is unfortunately not well defined, as the Wiener process is nowhere differentiable, although it is continuous. For illustration consider the limit

lim_{h→0} (E[(W(t + h))²] − E[(W(t))²])/h = ((t + h) − t)/h = 1.

Thus, in a mean square sense, the formal time derivative V(t) = Ẇ(t) introduced above is not a well defined process.

The Wiener process is a Markov process as well as a martingale as we shall see later. The sample paths (realizations) of the process are continuous with probability one, but they are nowhere differentiable with probability 1 due to the (independent) increments (see e.g. Øksendal [2010] for a rigorous proof).

Another approach is to let Δt tend to zero in (7.12) without dividing through by Δt. Formally we get

dX(t) = μ(t, X(t))dt + σ(t, X(t))dW(t),  X(0) = x,  (7.15)

and it is natural to interpret (7.15) as a shorthand notation for the following integral equation

X(t) = x + ∫_0^t μ(s, X(s))ds + ∫_0^t σ(s, X(s))dW(s).  (7.16)

The ds integral may be interpreted as an ordinary Riemann integral, whereas the natural interpretation of the dW(s) integral is as a Riemann–Stieltjes integral for every trajectory of W. Unfortunately this is not reasonable, as it can be shown that the process W(t) is of unbounded variation, i.e., the dW(s) integral in (7.16) cannot be defined trajectory-wise as a Riemann–Stieltjes integral.
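The unbounded variation can be illustrated numerically: refining the partition makes the first-order variation ∑|ΔW_k| blow up (roughly like √n), while the quadratic variation ∑(ΔW_k)² stabilizes. A sketch, with arbitrary partition sizes, on a single simulated trajectory:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1.0

# One fine Wiener trajectory; coarser partitions are sub-sampled from it
n_fine = 2 ** 16
dW = rng.normal(0.0, np.sqrt(T / n_fine), size=n_fine)
W = np.concatenate([[0.0], dW.cumsum()])

for n in [2 ** 8, 2 ** 12, 2 ** 16]:
    step = n_fine // n
    incr = np.diff(W[::step])
    # first-order variation vs. quadratic variation on this partition
    print(n, np.abs(incr).sum(), (incr ** 2).sum())
```

The first column of sums grows without bound as the partition is refined, while the second stays close to T = 1.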

Strictly speaking, the notation in (7.15) does not make any sense as it describes the infinitesimal evolution of X(t), which is driven by a Wiener process with unbounded variation. We shall, however, use the notation (7.15) for convenience repeatedly in the following, but it should be remembered that it is only shorthand for (7.16).

The remaining questions are now

  • how do we formalize the stochastic integral in (7.16),
  • how do we define the adjacent stochastic calculus and
  • how do we analyze (7.15) in this framework?

7.3 Stochastic Integrals

Although the Wiener process has some simple probabilistic properties, it is by no means simple to define stochastic integration with respect to a Wiener process, because the trajectory of a Wiener process is very odd. Let us list some of its peculiar properties:

  • As a Wiener process is of unbounded variation, it will eventually hit every real value no matter how large or how negative.
  • Once a Wiener process hits a value, it immediately hits it again infinitely often, and then again from time to time in the future.
  • It does not matter what scale you examine a Wiener process on — it looks just the same. Thus a Wiener process or Brownian motion pertains to the same self-similarity property as fractals.

Nevertheless, we intend to introduce the stochastic integral

I(t,ω)=t0g(s,ω)dW(s),(7.17)

where g(t, ω) is some suitably smooth (possibly random) function, using the following scheme, which mirrors the definition of the Riemann integral:

  1. Partition the time interval [0, t] into n subintervals of equal length, i.e. define the time instants 0 = t0 < t1 < ... < tn = t.
  2. Define for each trajectory ω an approximate integral In (ω) by

I_n(t, ω) = ∑_{k=0}^{n−1} g(τ_k, ω)[W(t_{k+1}, ω) − W(t_k, ω)],  (7.18)

    where τk is some arbitrarily chosen time in the interval [tk, tk+1).

  3. Finally, we let n tend to infinity and hope that I_n(t, ω) converges to some limit I, which we shall use to define the integral (7.17).

The objective of the following discussion is to show that it is important where in the time interval [t_k, t_{k+1}) the function g(τ_k, ω) is evaluated. Recall that any choice of τ_k ∈ [t_k, t_{k+1}) yields the same result in ordinary calculus. We shall now show that this does not hold for stochastic calculus.

As an example, let us consider the case g(t) = W(t), i.e. we wish to compute the stochastic integral

I(t) = ∫_0^t W(s)dW(s),  (7.19)

where we choose to compute the integral from 0 instead of a more general lower limit t_0, because we may use that W(0) = 0 to obtain a shorter formula.

As a preparation it is convenient first to consider the quadratic variation of W(t) on the interval [0, t], i.e. we commence by considering the integral

∫_0^t (dW(s))².  (7.20)

Thus we introduce the notation ΔW_k = W(t_{k+1}) − W(t_k) and define the stochastic variable

S_n = ∑_{k=0}^{n−1} (ΔW_k)².  (7.21)

If the Wiener process were differentiable, we would expect that S_n would converge to zero as n tends to infinity, because the time interval [0, t] is finite. Let us introduce equidistant subintervals Δt_k = t_{k+1} − t_k, i.e., Δt = t/n. From Definition 7.1, it immediately follows that E[(ΔW_k)²] = Δt_k and thus

E[S_n] = ∑_{k=0}^{n−1} E[(ΔW_k)²] = ∑_{k=0}^{n−1} Δt_k = t.

The variance of Sn is found by direct calculation

Var[S_n] = ∑_{k=0}^{n−1} Var[(ΔW_k)²] = 2∑_{k=0}^{n−1} (Δt_k)² = 2n(t/n)² = 2t²/n,

where it is used that (ΔW_k)² ∈ Δt_k·χ²(1). It is well known that a sum of N χ²(1) distributed random variables is a χ²(N) distributed variable with mean N and variance 2N. In other words, we have

Var[S_n] = E[(S_n − E[S_n])²] = E[(S_n − t)²] = 2t²/n,

and thus

lim_{n→∞} E[(S_n − t)²] = 0.

In this case, we say that Sn converges towards t in a mean square sense or in the space L2(dℙ × dt). This result is the foundation of the so-called Itō formula, which plays a fundamental role in stochastic calculus as the stochastic counterpart of the well-known chain rule from ordinary calculus.

The main result may be restated in differential form as

(dW(t))2=dt.(7.22)

Formally this metatheorem does not make any sense, but it is worth noticing that it states that the square of a stochastic increment yields a purely deterministic quantity. Please remember this result.
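The mean square convergence S_n → t is easy to check by simulation. The sketch below, with illustrative choices of t and the Monte Carlo sample size, estimates E[S_n] and E[(S_n − t)²] for increasing n:

```python
import numpy as np

rng = np.random.default_rng(3)
t, n_mc = 2.0, 5000

for n in [10, 100, 1000]:
    # n_mc independent realizations of the n increments on [0, t]
    dW = rng.normal(0.0, np.sqrt(t / n), size=(n_mc, n))
    Sn = (dW ** 2).sum(axis=1)               # the sums (7.21)
    print(n, Sn.mean(), ((Sn - t) ** 2).mean())
```

The sample mean of S_n stays at t, while the mean square error shrinks like 2t²/n, in agreement with the derivation above.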

Let us return to the evaluation of (7.19). We proceed in a similar fashion as above by constructing sums of the form (7.21). We consider two different sums, which evaluate the W(t) part at either the left hand side of the interval [t_k, t_{k+1}), τ_k = t_k, or the right hand side, τ_k = t_{k+1}, i.e.,

A_n = ∑_{k=0}^{n−1} W(t_k)(W(t_{k+1}) − W(t_k))  (τ_k = t_k),  (7.23)
B_n = ∑_{k=0}^{n−1} W(t_{k+1})(W(t_{k+1}) − W(t_k))  (τ_k = t_{k+1}).  (7.24)

We immediately get the identities

A_n + B_n = W²(t),  (7.25)
B_n − A_n = ∑_{k=0}^{n−1} (ΔW_k)² = S_n,  (7.26)

where S_n is given by (7.21); both identities hold for every n. It immediately follows that B_n − A_n → t in L² as n → ∞. We therefore get the limits

A_n → A,  B_n → B,

where

A = W²(t)/2 − t/2,  (7.27)
B = W²(t)/2 + t/2.  (7.28)

These results show that the value of the stochastic integral (7.19) depends critically on the placement of τ_k, i.e., on where the integrand is evaluated in the interval [t_k, t_{k+1}). Needless to say, this is not the case in ordinary calculus.

By choosing τk = tk, we get the enormously important Itō integral, which yields

∫_0^t W(s)dW(s) = W²(t)/2 − t/2.  (7.29)

By choosing τk = tk+1, we get

∫_0^t W(s)dW(s) = W²(t)/2 + t/2.  (7.30)

Note that in both cases, we get an additional term ±t/2 compared to ordinary calculus. Finally, choosing τ_k = (t_k + t_{k+1})/2 yields the Stratonovich integral

∫_0^t W(s)dW(s) = W²(t)/2,  (7.31)

which is similar to classical calculus. However, there is a consensus that the Itō integral is the only appropriate integral for financial modelling.
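The left- and right-endpoint sums (7.23) and (7.24) are easy to compare numerically against the limits (7.29) and (7.30). A sketch on a single fine partition, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
t, n = 1.0, 100000
dW = rng.normal(0.0, np.sqrt(t / n), size=n)
W = np.concatenate([[0.0], dW.cumsum()])   # W(t_k) on a fine grid

A = (W[:-1] * dW).sum()   # left endpoints, the Ito sum (7.23)
B = (W[1:] * dW).sum()    # right endpoints, the sum (7.24)
WT = W[-1]

print(A, WT ** 2 / 2 - t / 2)   # compare with (7.29)
print(B, WT ** 2 / 2 + t / 2)   # compare with (7.30)
```

Note that A + B = W²(t) holds exactly for every partition, while B − A is the quadratic variation sum S_n, close to t for a fine partition.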

7.4 Itō stochastic calculus

In this section we formally introduce the Itō stochastic integral. First, some concepts from probability theory will be repeated for convenience.

We assume the existence of a filtered probability space (Ω, ℱ, ℙ), where ℱ is a σ-algebra on the sample space Ω of possible outcomes, (Ω, ℱ) is a measurable space and ℙ: ℱ ↦ [0, 1] is some probability measure.

Definition 7.2 (Filtration).

A filtration on (Ω,ℱ) is a family {ℱ(t)}t≥0 of σ-algebras ℱ(t) ⊂ ℱ such that

ℱ(s) ⊆ ℱ(t) for 0 ≤ s < t.

Generally speaking, ℱ(s) denotes the set of events (or the information set) up to time s. The natural filtration {ℱ(t)}_{t≥0} is increasing and right continuous, i.e., at time t, 0 ≤ s < t, more information is available (or, at least, information is not lost), ℱ(s) ⊂ ℱ(t), than at time s, and in the limit complete information is obtained, ℱ(∞) = ℱ. Application of the natural filtration {ℱ(t)}_{t≥0} implies that information about X(t) in (7.15) must be deduced from observations of X(t) itself, as opposed to, e.g., observations of Y(t) = f(X(t)), where f: ℝ → ℝ is some nontrivial (possibly nonlinear) function.

Example 7.4.

Consider the function Y(t) = |X(t)|. Here, the value of Y(t) is known when X(t) is known, but the converse does not hold.

Remark 7.2.

Consider a stochastic variable X(t) as a function X(t): Ω → ℝ that maps the sample space Ω into ℝ. If {ω ∈ Ω: X(t, ω) ≤ x} ∈ ℱ(t) for each x ∈ ℝ, then X(t) is said to be ℱ(t)-measurable.

Definition 7.3 (Martingale).

A stochastic process {X(t), t ≥ 0} on the probability space (Ω, ℱ, ℙ) is called a martingale with respect to a filtration {ℱ(t)}t≥0 if

  1. X(t) is ℱ(t)-measurable for all t,
  2. E[|X(t)|] < ∞ for all t, and
  3. E[X(t) | ℱ(s)] = X(s) for all s < t.

Definition 7.4 (Adapted process).

The stochastic process X(t) is adapted to the filtration ℱ(t) if X(t) is an ℱ(t)-measurable random variable for each t ≥ 0.

Remark 7.3 (Adaptedness).

It is instructive to think of measurability and adaptedness in the sense that if a function g(t) is said to be ℱ(t)-measurable, then it essentially means that g(t) is known at time t.

Example 7.5.

A Wiener process W(t) that is adapted to a given filtration ℱ(t) possesses the property that

W(t) − W(s) is independent of ℱ(s) for 0 ≤ s < t.  (7.32)

The process W(t) is then said to be an ℱ(t)-Wiener process.

Please refer to the Appendix for a more detailed exposition of these concepts, or consult the references given in the introduction to this chapter.

Definition 7.5 (The class ℒ²).

Let ℒ²[a, b] denote the class of processes g(s, ω) that satisfy the following conditions:

  • The function g(s, ω) is ℱ(s)-adapted.
  • The integral

∫_a^b E[(g(s, ω))²]ds < ∞  (7.33)

    is finite.

For a ≤ b we now define the stochastic integral

∫_a^b g(s, ω)dW(s)  (7.34)

for all g ∈ ℒ²[a, b]. We shall only consider simple functions (to be defined below) and leave the generalization to the interested reader.

Assume that g is simple, i.e. there exist deterministic time instants a = t0 < t1 < ... < tn = b such that

g(s, ω) = g(t_k, ω) for s ∈ [t_k, t_{k+1}),

where

g(t_k, ω) ∈ ℱ(t_k),  k = 0, ..., n.

In other words, g(t_k, ω) is ℱ(t_k)-measurable, i.e., g(t_k) is known at time t_k.

For a simple process g we define the stochastic integral by a sum similar to (7.23)

∫_a^b g(s, ω)dW(s) = ∑_{k=0}^{n−1} g(t_k, ω)(W(t_{k+1}) − W(t_k)).  (7.35)

It is essential that the incremental Wiener process is defined in terms of the forward differences W(t_{k+1}) − W(t_k).

Theorem 7.1 (Stochastic integration rules).

Let g and h be simple processes that satisfy (7.33) and let α, β be real numbers. The following rules apply

  • Stochastic integrals are linear operators

∫_a^b (αg(s) + βh(s))dW(s) = α∫_a^b g(s)dW(s) + β∫_a^b h(s)dW(s).  (7.36)

  • The unconditional expectation of a stochastic integral when g ∈ ℒ²[a, b] is zero

E[∫_a^b g(s)dW(s)] = 0.  (7.37)

  • Stochastic integrals are measurable with respect to the filtration generated by the Wiener process, i.e.

∫_a^b g(s)dW(s) is ℱ(b)-measurable.  (7.38)

  • Stochastic integrals when g ∈ ℒ²[a, b] are martingales

E[∫_a^b g(s)dW(s) | ℱ(a)] = 0.  (7.39)

  • The Itō isometry is a convenient way of computing variances when g ∈ ℒ²[a, b]

E[(∫_a^b g(s)dW(s))²] = ∫_a^b E[g²(s)]ds  (Itō isometry).  (7.40)

  • It also applies to covariance

E[(∫_a^b g(s)dW(s))(∫_a^b h(s)dW(s))] = ∫_a^b E[g(s)h(s)]ds.  (7.41)

Proof. That the Itō integral is a linear operator is trivial and is left as an exercise for the reader.

To make the notation less cumbersome, we introduce the entities

g_k = g(t_k),  ΔW_k = W(t_{k+1}) − W(t_k),  Δt_k = t_{k+1} − t_k,  ℱ_k = ℱ(t_k).  (7.42)

We get

E[∫_a^b g(s)dW(s)] = ∑_{k=0}^{n−1} E[g_k ΔW_k].  (7.43)

If we use the fact that the process gk is adapted to the filtration (tk), we get

E[g_k ΔW_k] = E[E[g_k ΔW_k | ℱ(t_k)]] = E[g_k E[ΔW_k | ℱ(t_k)]],  (7.44)

where we have used the standard trick (iterated expectations) of introducing a conditioning argument and taken the expectation with respect to that argument. As the Wiener process has independent increments, we get

E[g_k E[ΔW_k | ℱ(t_k)]] = 0,

and we have proved (7.37).

Next we shall prove (7.40). By expanding the square of the defining sum, we get

E[(∫_a^b g(s)dW(s))²] = ∑_{i,j} E[g_i g_j ΔW_i ΔW_j],

where we need to consider two cases:

  1. For i = j, we get

E[g_i²(ΔW_i)²] = E[E[g_i²(ΔW_i)² | ℱ_i]] = E[g_i² E[(ΔW_i)² | ℱ_i]] = E[g_i² Δt_i] = E[g_i²]Δt_i.

  2. For ij with, say i < j, we get

E[g_i g_j ΔW_i ΔW_j] = E[E[g_i g_j ΔW_i ΔW_j | ℱ_j]] = E[g_i g_j ΔW_i E[ΔW_j | ℱ_j]] = 0,

    as the Wiener increment has the conditional mean 0.

Thus we have

E[(∫_a^b g(s)dW(s))²] = ∑_{i=0}^{n−1} E[g_i²]Δt_i = ∫_a^b E[g²(s)]ds.  (7.45)

Equation (7.41) may be shown in a similar fashion. Eq. (7.38) follows immediately from the definition of the stochastic integral, and (7.39) is shown as (7.37).

Remark 7.4 (Itō isometry).

Note that (7.40) establishes an isometry between stochastic integrals and deterministic integrals. This is very useful for the calculation of variances.
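The isometry (7.40) can be verified by Monte Carlo for, e.g., the adapted integrand g(s) = W(s), for which ∫_0^t E[W²(s)]ds = ∫_0^t s ds = t²/2. A sketch with arbitrary discretization and sample sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
t, n, n_mc = 1.0, 500, 20000
dW = rng.normal(0.0, np.sqrt(t / n), size=(n_mc, n))
W = np.concatenate([np.zeros((n_mc, 1)), dW.cumsum(axis=1)], axis=1)

# Ito sums (left-endpoint evaluation) for the integrand g(s) = W(s)
I = (W[:, :-1] * dW).sum(axis=1)

print(I.mean())                      # (7.37): close to 0
print((I ** 2).mean(), t ** 2 / 2)   # (7.40): both close to t^2/2
```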

Remark 7.5.

The rules in Theorem 7.1 may be extended to cover a larger class of functions than the simple functions considered above by considering Cauchy sequences in ℒ2 of simple functions, but we will not go into the details here.

Remark 7.6.

It is possible to extend stochastic integration to all adapted processes g which satisfy the condition

ℙ[∫_0^t g²(s)ds < ∞] = 1.

For all such g it is not guaranteed that (7.37), (7.40) and (7.39) are valid, but the properties (7.38) and (7.36) still hold. These stochastic integrals are known as local martingales.

It is easy to show that the Wiener process is in itself a ℙ-martingale, and it is a very important consequence of Theorem 7.1 that the martingale property is preserved under integration of ℒ²-processes.

Theorem 7.2 (Continuous trajectories).

Assume that g ∈ ℒ²[0, t] for all t ≥ 0. Define the process X by

X(t) = ∫_0^t g(s)dW(s).  (7.46)

Then X(t) is a martingale with continuous trajectories.

Proof. By direct calculation we get

X(t) = ∫_0^t g(u)dW(u) = ∫_0^s g(u)dW(u) + ∫_s^t g(u)dW(u) = X(s) + ∫_s^t g(u)dW(u).

Using (7.39) we get

E[X(t) | ℱ(s)] = X(s) + E[∫_s^t g(u)dW(u) | ℱ(s)] = X(s).

The continuity of the trajectories is difficult to prove, but it should be intuitively clear as the Wiener process lacks jumps.

7.5 Extensions to jump processes

It is possible to extend the theory on stochastic integration to discontinuous processes, Cont and Tankov [2004] being a good start. The simplest example of a discontinuous process with iid increments is the Poisson process.

Definition 7.6 (Poisson process).

A Poisson process is an integer-valued stochastic process {N (t), t ≥ 0} satisfying the following conditions:

  • N(0) = 0 with probability 1 (w.p.1).
  • The increment N(t) − N(u) is independent of N(s) − N(0) for t > u ≥ s > 0.
  • The distribution of N(t) − N(s) ∈ Po(λ(t − s)), where Po is the Poisson distribution and λ is the so-called intensity of the process.
  • The process is continuous in probability.

There are obvious similarities (and differences) between the Wiener process (Definition 7.1) and the Poisson process.

Jump processes are easier to analyse if we introduce some well-known transform methods (Fourier transforms, etc.).

Definition 7.7 (Characteristic function).

The Fourier transform of a random variable or process is called the characteristic function

ψ_X(u) = E[e^{iuX}].  (7.47)

Characteristic functions are incredibly useful in probability, as, e.g., the distribution of sums of iid random variables is computed using convolution of the densities. A simpler alternative is to use Fourier methods. This can be seen by computing the characteristic function for the sum

ψ_{X_1+X_2}(u) = E[e^{iu(X_1+X_2)}] = E[e^{iuX_1}]E[e^{iuX_2}] = ψ_{X_1}(u)ψ_{X_2}(u),  (7.48)

where we use the independence of the random variables to factor the expectation.

Example 7.6 (Gaussian).

The characteristic function for a Gaussian random variable X with mean μ and covariance Σ is given by

ψ(u) = E[e^{iu^T X}] = e^{iμ^T u − u^T Σu/2}.  (7.49)

Example 7.7 (Poisson).

The characteristic function for a Poisson random variable with parameter λ is given by

ψ(u) = e^{λ(e^{iu} − 1)}.  (7.50)

Example 7.8 (Compound Poisson process).

A compound Poisson process is defined as

S(t) = ∑_{n=1}^{N(t)} Y_n,  (7.51)

where N(t) is a Poisson process and {Y_n, n ∈ ℕ} are iid random variables independent of N. The convention is that no terms are included in the sum before N(t) reaches one,

∑_{n=1}^{0} Y_n = 0.  (7.52)

The compound Poisson process is a nice model for large, unexpected, rare events such as government interventions, earthquakes, etc.
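Sampling S(t) in (7.51) only requires a Poisson draw for N(t) followed by N(t) jump draws. A sketch with illustrative parameters (Gaussian jumps, λ = 3) that are not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(6)

def compound_poisson(t, lam, jump_sampler, n_mc):
    """Draw S(t) = sum_{n=1}^{N(t)} Y_n from (7.51): first N(t) ~ Po(lam*t),
    then N(t) iid jumps; the empty sum (7.52) gives S(t) = 0 when N(t) = 0."""
    N = rng.poisson(lam * t, size=n_mc)
    return np.array([jump_sampler(n).sum() for n in N])

# Illustrative (hypothetical) parameters: lam = 3 jumps per unit time,
# Gaussian jump sizes with mean 0.1 and standard deviation 0.2
S = compound_poisson(t=1.0, lam=3.0,
                     jump_sampler=lambda n: rng.normal(0.1, 0.2, n), n_mc=20000)
print(S.mean())   # E[S(t)] = lam * t * E[Y] = 0.3 here
```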

Theorem 7.3.

The characteristic function for a compound Poisson process is given by

ψ_{S(t)}(u) = e^{λt(ψ_Y(u) − 1)},  (7.53)

where λ is the jump intensity and ψY (·) is the characteristic function for the jumps Y.

Proof. The characteristic function is computed, using iterated expectations as

ψ_{S(t)}(u) = E[e^{iuS(t)}]  (7.54)
= E[E[e^{iuS(t)} | N(t)]] = E[E[e^{iu(Y_1 + ... + Y_{N(t)})} | N(t)]]  (7.55)
= E[(ψ_Y(u))^{N(t)}].  (7.56)

Here, we recognize that this is in fact the probability generating function, g(z) = E[z^{N(t)}] = e^{λt(z−1)}, for a Poisson random variable, evaluated at ψ_Y(u), concluding the proof.
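Formula (7.53) can be checked against the empirical characteristic function of simulated values of S(t). The sketch below uses Gaussian jumps, so that ψ_Y is the scalar case of (7.49); all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, t, mu, delta = 3.0, 1.0, 0.1, 0.2   # illustrative parameters
n_mc = 100000

# Sample S(t): given N(t) = n, a sum of n iid N(mu, delta^2) jumps
# is N(n*mu, n*delta^2)
N = rng.poisson(lam * t, size=n_mc)
S = rng.normal(mu * N, delta * np.sqrt(N))

u = 1.5
psi_Y = np.exp(1j * u * mu - 0.5 * delta ** 2 * u ** 2)  # scalar case of (7.49)
theory = np.exp(lam * t * (psi_Y - 1))                   # (7.53)
empirical = np.exp(1j * u * S).mean()
print(theory, empirical)
```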

Compound Poisson processes, as well as Wiener processes, are special cases of a more general class of processes, namely Lévy processes.

Definition 7.8 (Lévy process).

A càdlàg¹ process {X(t), t ≥ 0} is called a Lévy process if it satisfies the following conditions:

  • X(0) = 0 with probability 1.
  • The increment X(t) − X(u) is independent of X(s) − X (0) for t > u ≥ s > 0.
  • The increments are strictly stationary, i.e., X(t + δt) − X(t) and X(t) − X(t − δt) are equal in distribution.
  • The paths are continuous in probability,

lim_{h→0} ℙ(|X(t + h) − X(t)| ≥ ε) = 0.  (7.57)

Theorem 7.4 (Lévy-Khinchin representation).

Let {X(t)} be a Lévy process with a characteristic triplet (b, Σ, ν). Then

E[e^{iuX(t)}] = e^{tφ(u)},  (7.58)

with the characteristic exponent

φ(u) = ib^T u − u^T Σu/2 + ∫(e^{iu^T x} − 1 − iu^T x·1_{|x|<1})ν(dx),  (7.59)

where u, b ∈ ℝ^d, Σ is a non-negative definite d × d matrix and ν is a measure on ℝ^d with ν({0}) = 0 and ∫ min(‖x‖², 1)ν(dx) < ∞.

The first two parameters in the characteristic triplet (b, Σ, ν) can be identified as the drift and diffusion in a Brownian motion with drift; cf. (7.49). The measure ν is called the Lévy measure and controls the jumps. It is defined, for some Borel set A ∈ ℬ(ℝ^d), as

ν(A) = E[#{t ∈ [0, 1]: ΔX(t) ≠ 0, ΔX(t) ∈ A}].  (7.60)

We will see in Section 9.6 how characteristic functions can be used to value a large class of options, under rather general models.

Definition 7.9 (Merton).

The Merton model (Merton [1976]) is a simple jump process. The log spot price is modelled as a Brownian motion with drift plus a compound Poisson process with Gaussian N(μ, δ²) jumps arriving with intensity λ,

log S(t) = X(t) = log S(0) + γt + σW(t) + ∑_{n=1}^{N(t)} Y_n.  (7.61)

The conditional distribution generated by the Merton model is a mixture of Gaussians. Option prices computed using the Merton model will therefore be a mixture of Black–Scholes prices.

It follows from Equation (7.49) and Equation (7.53) that the characteristic function (assuming S(0) = 1) is given by

E[e^{iuX(t)}] = e^{iγut − σ²u²t/2 + λt(e^{iμu − δ²u²/2} − 1)}  (7.62)
= e^{t(iγu − σ²u²/2 + λ(e^{iμu − δ²u²/2} − 1))},  (7.63)

where the second line presents the characteristic exponent.

We can easily find how to choose the parameter γ such that the discounted process becomes a martingale. Evaluating the characteristic function in u = −i yields

ψ(−i) = E[e^{iuX(t)}]|_{u=−i} = E[e^{X(t)}] = E[S(t)].  (7.64)

Doing this for the Merton model gives

E[S(t)] = E[e^{X(t)}] = exp[t(γ + σ²/2 + λ(e^{μ + δ²/2} − 1))],  (7.65)

implying that

γ = r̃ = r − σ²/2 − λ(e^{μ + δ²/2} − 1)  (7.66)

transforms the discounted price process into a martingale.

Definition 7.10 (Variance Gamma process).

The Variance Gamma (VG) process (Madan and Seneta [1990]) is a time-shifted Wiener process, where the time shift is controlled by a Gamma process Γ(t; 1, ν). The Variance Gamma process is then defined as

X(t) = θΓ(t; 1, ν) + σW(Γ(t; 1, ν)).  (7.67)

This definition is very useful for Monte Carlo simulations.
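Indeed, (7.67) gives a two-line Monte Carlo recipe: draw the gamma time change, then the conditionally Gaussian Wiener part. A sketch with illustrative (hypothetical) parameter values, for which E[X(t)] = θt and Var[X(t)] = (σ² + θ²ν)t:

```python
import numpy as np

rng = np.random.default_rng(8)

def vg_sample(t, theta, sigma, nu, n_mc):
    """Sample X(t) in (7.67): draw the gamma time change G with mean t and
    variance nu*t (shape t/nu, scale nu), then the Wiener part given G."""
    G = rng.gamma(t / nu, nu, size=n_mc)
    return theta * G + sigma * np.sqrt(G) * rng.normal(size=n_mc)

# Illustrative (hypothetical) parameters
X = vg_sample(t=1.0, theta=-0.1, sigma=0.2, nu=0.5, n_mc=50000)
print(X.mean(), X.var())   # compare with theta*t and (sigma^2 + theta^2*nu)*t
```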

The characteristic function for a Variance Gamma process (Cont and Tankov [2004], Hirsa [2013]) is given by

E[e^{iuX(t)}] = (1/(1 − iuθν + σ²u²ν/2))^{t/ν}.  (7.68)

Lévy processes that are defined as time-shifted Brownian motions are commonly referred to as subordinated Brownian motions.

Definition 7.11 (NIG process).

The Normal Inverse Gaussian (NIG) process (Barndorff-Nielsen [1997]) is similar to the VG process, the difference being that the time shift process is an Inverse Gaussian (IG) process rather than a Gamma process. The corresponding characteristic function is given by

E[e^{iuX(t)}] = e^{(κ/σ − √(κ²/σ² + θ²/σ⁴ − (θ/σ² + iu)²))t}.  (7.69)

Definition 7.12 (Time-shifted Lévy processes).

The processes defined in Definitions 7.9–7.11 all have iid increments, while it is well known that real world data typically exhibit time varying volatility. This can be achieved by another time shift, this time using an integrated, positive process. One of the most popular time shifts is obtained by using an integrated Cox–Ingersoll–Ross (CIR) model (Cox et al. [1985]) (stochastic differential equations will be introduced in Chapter 8). The Cox–Ingersoll–Ross model is given by the stochastic differential equation

dy(t) = κ(η − y(t))dt + λ√(y(t))dW(t).  (7.70)

It is well known that this process is positive. Integrating this process

Y(t) = ∫_0^t y(s)ds  (7.71)

generates a time shift process.

A time-shifted Variance Gamma or NIG process would then be defined as

Z_{VG-CIR}(t) = X_{VG}(Y(t)).  (7.72)

The characteristic function can be derived (see Hirsa [2013]), arriving at

E[e^{iuZ_{VG-CIR}(t)}] = ψ_{CIR}(−i log ψ_{VG}(u)),  (7.73)

which is rather similar to Equation (7.53). Finally, the characteristic function for the integrated CIR process is given by

ψ_{CIR}(u) = E[e^{iuY(t)}] = A(t, u)e^{B(t,u)y(0)},  (7.74)

where

A(t, u) = e^{κ²ηt/λ²} / (cosh(γt/2) + (κ/γ)sinh(γt/2))^{2κη/λ²},  (7.75)
B(t, u) = 2iu / (κ + γ·coth(γt/2)),  (7.76)

with

γ = √(κ² − 2λ²iu).  (7.77)

Time-shifted Lévy processes provide a very good fit to market data (Lindström et al. [2008]).

The characteristic function can also be derived for some stochastic volatility models, most notably the Heston model (Heston [1993]).

Definition 7.13.

The risk-neutral version of the Heston stochastic volatility model is given by

dS(t) = rS(t)dt + √(V(t))S(t)dW^{(S)}(t),  (7.78)
dV(t) = κ(θ − V(t))dt + σ_v√(V(t))dW^{(V)}(t),  (7.79)

where the driving Wiener processes are allowed to be correlated on an infinitesimal scale, dW^{(S)}(t)dW^{(V)}(t) = ρdt.

It can be shown that the characteristic function for the logarithmic stock price, X(t) = log(S(t)), is given by

ψ_{Heston}(u) = exp(iu(log S(0) + rt) + C(u) + D(u)V(0)),  (7.80)

where

C(u) = (κθ/σ_v²)[(κ − ρσ_v ui − d)t − 2 log(((κ − ρσ_v ui)(1 − e^{dt}) + d(e^{dt} + 1))/(2d))],  (7.81)
D(u) = (1 − e^{dt})(iu − u²) / ((κ − ρσ_v ui)(1 − e^{dt}) + d(e^{dt} + 1)),  (7.82)
d = √((ρσ_v ui − κ)² + σ_v²(ui + u²)).  (7.83)

Extending the Heston characteristic function to the Bates model (Bates [1996]) is rather straightforward.

Definition 7.14.

The Bates model is a Heston model, with independent jumps in the S component, formally defined as

dS(t) = γS(t)dt + √(V(t))S(t)dW^{(S)}(t) + S(t−)dJ(t),  (7.84)
dV(t) = κ(θ − V(t))dt + σ_v√(V(t))dW^{(V)}(t),  (7.85)

where the driving Wiener processes once again are allowed to be correlated, dW^{(S)}(t)dW^{(V)}(t) = ρdt, while J(t) is a compound Poisson process with intensity λ and lognormally distributed jumps of size k such that log(1 + k) ∈ N(μ, δ²). The jumps are independent of the diffusion part, although it is still possible to derive the joint characteristic function when the jump intensity is a linear function of the state variables (Duffie et al. [2003]).

Computing the logarithm of the stock price X(t) = log(S(t)) leads to the dynamics

dX(t) = (γ − λ(e^{μ + δ²/2} − 1) − V(t)/2)dt + √(V(t))dW^{(S)}(t) + dJ(t).  (7.86)

The discounted price process will therefore be a risk-neutral martingale if the risk-free rate in the Heston model is replaced by

r̃ = r − λ(e^{μ + δ²/2} − 1).  (7.87)

The characteristic function for the Bates model is, due to the independence between the jumps and the Wiener processes, given by a multiplication of the Heston characteristic function, replacing r with r − λ(e^{μ + δ²/2} − 1), while the jump term, given by

φ_{jumps}(u) = e^{λt(e^{iμu − δ²u²/2} − 1)},  (7.88)

leads to the joint expression

ψ_{Bates}(u) = ψ_{Heston}(u)·φ_{jumps}(u).  (7.89)

7.6 Problems

Problem 7.1

  1. Show (7.25).
  2. Show (7.27).

Problem 7.2

Referring to (7.18), the important Stratonovich integrals are obtained by introducing

ξ_k = (t_k + t_{k+1})/2,

i.e., the integrand is evaluated at the midpoint of the interval [t_k, t_{k+1}).

  1. Compute the integral

    ∫_0^t W(s)dW(s)

    in the Stratonovich sense.

Although it may be shown that Stratonovich integrals are neither Markov processes nor martingales, they are important for theoretical work because the ordinary chain rule applies for variable transformations.

Problem 7.3

Let B(t) denote a standard Brownian motion (a Wiener process) on the probability space (Ω, ℱ, ℙ) and let ℱ(t) be the natural filtration generated by B(t).

  1. Show that B(t) is a martingale.
  2. Show that only one of the following is a martingale

    M(t) = B(t)²,  M̃(t) = B(t)² − t.

  3. Use this result to give an intuitive explanation of the martingale property. (Hint: Sketch a realization of the two processes.)
  4. Show that N(t) = B(t)³ − 3tB(t) is a martingale.

¹Right continuous with left limits.
