Appendix B
This appendix gives a brief overview of probability theory. Some of the concepts introduced are used widely in the lecture notes. It is not necessary to understand all the technical details, but an intuitive understanding of the concepts introduced is important.
Let Ω denote a finite sample space which contains all the elementary outcomes ωi for i = 1, 2, ..., N. In a two-period binomial model the elementary outcome is the state of the world at time t = 2, which determines the stock price at that time.
Definition B.1 (σ-algebra).
Let Ω be a set of points ω. A family ℱ of subsets of Ω is called a σ-algebra if
1. ∅ ∈ ℱ.
2. If A ∈ ℱ, then the complement Aᶜ ∈ ℱ.
3. If A1, A2, ... ∈ ℱ, then ⋃_{i=1}^∞ Ai ∈ ℱ.
The definition says that (1) the empty set is an element of ℱ. (2) If A ∈ ℱ, then the complement of A is in ℱ as well; in particular the entire set Ω ∈ ℱ, since the empty set is in ℱ. (3) Countable unions of elements of ℱ are elements of ℱ as well.
Example B.1.
The family of all subsets of Ω is an example of a σ-algebra, and it is denoted by 2^Ω. In the two-period binomial model with Ω = {ω1, ω2, ω3, ω4} we have
2^Ω = {∅, {ω1}, {ω2}, {ω3}, {ω4}, {ω1, ω2}, {ω1, ω3}, {ω1, ω4}, {ω2, ω3}, {ω2, ω4}, {ω3, ω4}, {ω1, ω2, ω3}, {ω1, ω2, ω4}, {ω1, ω3, ω4}, {ω2, ω3, ω4}, Ω}.    (B.1)
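The 16 subsets in (B.1) can be checked mechanically. The following Python sketch (the string labels "w1", ..., "w4" are illustrative stand-ins for ω1, ..., ω4) enumerates the power set of a four-point sample space:

```python
from itertools import combinations

# Sample space of the two-period binomial model (Example B.1).
omega = ["w1", "w2", "w3", "w4"]

# Enumerate every subset of omega: this is the power set 2^Omega.
power_set = [frozenset(c) for r in range(len(omega) + 1)
             for c in combinations(omega, r)]

print(len(power_set))  # 2^4 = 16 subsets, from the empty set up to Omega
```

The count 2^4 = 16 agrees with (B.1): one empty set, four singletons, six pairs, four triples, and Ω itself.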
Definition B.2 (Measurable space).
A pair (Ω, ℱ), where Ω is a set and ℱ is a σ-algebra on Ω, is called a measurable space, and the subsets of Ω which are in ℱ are called ℱ-measurable sets.
Definition B.3.
A probability measure ℙ on a measurable space (Ω, ℱ) is a function ℙ: ℱ → [0, 1] such that
1. ℙ(∅) = 0 and ℙ(Ω) = 1.
2. If A1, A2, ... ∈ ℱ and {Ai}_{i=1}^∞ are disjoint, then

ℙ(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ ℙ(Ai).    (B.2)
The triple (Ω, ℱ, ℙ) is called a probability space.
Definition B.4 (Partition).
A partition 𝒫 of a set Ω is a finite family {Ai, i = 1, 2, ..., K} of subsets of Ω, such that
1. Ai ∩ Aj = ∅ for i ≠ j.
2. ⋃_{i=1}^K Ai = Ω.
Consider a sample space Ω and a given partition 𝒫 = {Ai; i = 1, 2, ..., K} of Ω. We can then interpret 𝒫 intuitively in terms of “information” in the following way.
Example B.2.
Let Ω = {ω1, ω2, ω3, ω4} denote the sample space, and define two partitions 𝒫1 = {{ω1, ω2}, {ω3, ω4}} and 𝒫2 = {{ω1, ω2}, {ω3}, {ω4}}. Then, intuitively speaking, the partition 𝒫2 contains more information than partition 𝒫1, since one of the elements of 𝒫1, namely {ω3, ω4}, is partitioned into “smaller” elements in partition 𝒫2.
This leads to the following definition:
Definition B.5.
A partition 𝒫 is said to be “richer” than a partition 𝒬 if 𝒫 and 𝒬 are partitions of the same sample space Ω, and each component of 𝒬 is a union of components of 𝒫.
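For finite sample spaces, Definition B.5 is easy to test directly. A minimal Python sketch (the helper name is_richer and the outcome labels are illustrative, not from the notes) checks that every component of the coarser partition is a union of components of the finer one:

```python
def is_richer(P, Q):
    """Return True if partition P is richer (finer) than partition Q:
    every component of Q must be a union of components of P."""
    return all(B == set().union(*[A for A in P if A <= B]) for B in Q)

P1 = [{"w1", "w2"}, {"w3", "w4"}]            # partitions from Example B.2
P2 = [{"w1", "w2"}, {"w3"}, {"w4"}]

print(is_richer(P2, P1))  # True: P2 splits {w3, w4} further
print(is_richer(P1, P2))  # False: P1 cannot rebuild the singleton {w3}
```

This reproduces Example B.2: 𝒫2 is richer than 𝒫1, but not the other way around.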
Although the more general concept of σ-algebras is used to denote the “information set,” it might help to think of it as a partition. In the next section we need the following definition:
Definition B.6 (Generated σ-algebra).
A σ-algebra G is said to be generated by a partition 𝒫 if G is the smallest σ-algebra that contains every component of 𝒫. The generated σ-algebra is denoted G = σ(𝒫).
Example B.3.
Consider a sample space Ω = {ω1, ω2, ω3, ω4} and a partition 𝒫 = {{ω1, ω2}, {ω3, ω4}}. The σ-algebra generated by that partition is then given by
G = {∅, {ω1, ω2}, {ω3, ω4}, Ω}.    (B.3)
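For a finite partition, the generated σ-algebra is exactly the family of all unions of its cells (including the empty union). A Python sketch of this construction (the function name sigma_from_partition is an illustrative choice):

```python
from itertools import combinations

def sigma_from_partition(partition):
    """Sigma-algebra generated by a finite partition: all unions of cells."""
    cells = [frozenset(c) for c in partition]
    return {frozenset().union(*comb) for r in range(len(cells) + 1)
            for comb in combinations(cells, r)}

P = [{"w1", "w2"}, {"w3", "w4"}]             # partition from Example B.3
G = sigma_from_partition(P)
print(len(G))                                # 4 sets, matching (B.3)
print(frozenset({"w1", "w2"}) in G)          # True
```

With n cells this yields 2^n sets, which anticipates the counting argument used for Z-atoms later in this appendix.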
Example B.4.
Let the sample space Ω consist of the real numbers in the interval [0, 1]. Define the partitions

𝒫1 = {A1, A2, A3, A4},  𝒫2 = {B1, B2, B3}

where

A1 = [0, 1/3[, A2 = [1/3, 1/2[, A3 = [1/2, 3/4[, A4 = [3/4, 1],
B1 = [0, 1/3[, B2 = [1/3, 3/4[, B3 = [3/4, 1].
It is intuitively appealing to state that 𝒫1 contains more information than 𝒫2, because 𝒫1 divides Ω into smaller parts.
The objective of this section is to define the conditional expectation E[X|G] where G is a σ-algebra, which should be interpreted as the expectation of X given the information represented by the σ-algebra. However, we begin with the elementary definition of conditional expectation, given the probability space (Ω, ℱ, ℙ) and two stochastic variables X and Z.
Definition B.7 (Conditional probability).
The probability of X conditioned on Z is given by
ℙ(X = xi | Z = zj) = ℙ(X = xi ∩ Z = zj) / ℙ(Z = zj).    (B.4)
The intuition behind this definition is as follows.
The probability of a given event xi is the fraction of the total probability mass that is assigned to that event, i.e.,

ℙ(xi) = ℙ(xi) / ℙ(Ω),    (B.5)

since ℙ(Ω) = 1. Conditioning on Z = zj means that the mass ℙ(Z = zj) plays the role of the total mass in the denominator.
The definition of the conditional expectation for discrete stochastic variables is

E[X | Z = zj] = ∑_i xi ℙ(X = xi | Z = zj).    (B.6)
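Definitions (B.4) and (B.6) can be combined into a few lines of Python. In the sketch below the joint probability mass function of (X, Z) is a hypothetical example, stored as a dictionary mapping pairs (x, z) to probabilities:

```python
def cond_expectation(joint, z):
    """E[X | Z = z] from a joint pmf given as {(x, z): probability},
    following (B.4) and (B.6): sum over i of x_i * P(X = x_i | Z = z)."""
    p_z = sum(p for (x, zz), p in joint.items() if zz == z)   # P(Z = z)
    return sum(x * p / p_z for (x, zz), p in joint.items() if zz == z)

# Hypothetical joint pmf of (X, Z) on a four-point sample space.
joint = {(1, 1): 0.25, (2, 2): 0.25, (3, 1): 0.25, (4, 2): 0.25}
print(cond_expectation(joint, 1))  # (1 + 3) / 2 = 2.0
```

Dividing each joint probability by p_z is exactly the conditioning step (B.4), and the weighted sum is (B.6).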
The (unconditional) expectation of a stochastic variable X is given by
E[X] = ∫_Ω X(ω) dℙ(ω)    (B.7)
where the integration is taken over the entire sample space, with respect to the measure (distribution) ℙ. This covers the case where no prior knowledge of the outcome ω is available. Now assume that we know that ω ∈ B, where B ∈ ℱ and ℙ(B) > 0. As a preliminary definition of conditional expectation we have the following:
Definition B.8 (Conditional expectation given a single event).
Given a probability space (Ω, ℱ, ℙ) assume that B ∊ ℱ with ℙ(B) > 0. The conditional expectation of X given B is defined by
E[X | B] = (1 / ℙ(B)) ∫_B X(ω) dℙ(ω).    (B.8)
Note that this definition is very similar to the definition of conditional probability given in (B.4), and it has a similar interpretation. This definition is now generalized to the case where the conditioning argument is a partition. Let 𝒫 = {A1, ..., AK} be a partition of Ω with ℙ(Ai) > 0 for all i. From Section B.2 we know that conditioning on 𝒫 can be interpreted as knowing in which set Ai the true ω lies. This leads to the following preliminary definition of conditional expectation:
Definition B.9.
Let 𝒫 = {A1, ..., AK} be a partition of Ω with ℙ(Ai) > 0 for all i. Then the conditional expectation of X given 𝒫 is

E[X | 𝒫] = ∑_{n=1}^K I{ω ∈ An} E[X | An],    (B.9)
where I{·} denotes the indicator function.
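Definition B.9 is readily implemented on a finite sample space: locate the cell containing ω and return the cell average from (B.8). The following Python sketch does this (the function name, outcome labels and uniform probabilities are illustrative assumptions):

```python
def cond_exp_partition(X, P, prob, omega):
    """E[X | P](omega) following (B.9): find the cell A_n containing
    omega and return the cell average E[X | A_n] from (B.8)."""
    cell = next(A for A in P if omega in A)
    p_cell = sum(prob[w] for w in cell)          # P(A_n), assumed > 0
    return sum(X[w] * prob[w] for w in cell) / p_cell

X = {"w1": 1, "w2": 2, "w3": 3, "w4": 4}
prob = {w: 0.25 for w in X}                      # uniform probabilities
P = [{"w1", "w3"}, {"w2", "w4"}]
print(cond_exp_partition(X, P, prob, "w1"))      # (1 + 3) / 2 = 2.0
```

The indicator functions in (B.9) correspond to the lookup of the unique cell containing ω.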
The problem with this definition is that it assumes that each set must have positive probability, which is an unnecessary restriction, as we shall see. To give an idea of the interpretation of the final definition of conditional expectation, based on σ-algebras, consider the following.
Let 𝒫 be a partition of Ω into Z-atoms,1 i.e., sets on which the random variable Z is constant. The σ-algebra G = σ(𝒫) generated by this partition consists of exactly 2^n possible unions of the n Z-atoms. It is clear from the elementary definition of conditional expectation that the conditional expectation Y = E[X | 𝒫] is constant on the Z-atoms, or, to be more precise,
Y is G-measurable.(B.10)
Since Y takes the constant value yj on the Z-atom {Z = zj}, we have

∫_{Z=zj} Y dℙ = yj ℙ(Z = zj).    (B.11)
Applying the elementary definitions of conditional probability and expectation, (B.4) and (B.6), we get

∫_{Z=zj} Y dℙ = ∑_i xi ℙ(X = xi | Z = zj) ℙ(Z = zj) = ∑_i xi ℙ(X = xi ∩ Z = zj) = ∫_{Z=zj} X dℙ.    (B.12)
If we write Gj = {Z = zj}, this says that E[Y I_{Gj}] = E[X I_{Gj}], where I denotes the indicator function. Since every G ∈ G is a union of Z-atoms, I_G is a sum of the corresponding I_{Gj}, and therefore E[Y I_G] = E[X I_G], or
∫_G Y dℙ = ∫_G X dℙ  for all G ∈ G.    (B.13)
This leads us to the final definition of conditional expectation.
Definition B.10 (Conditional expectation).
Let (Ω, ℱ, ℙ) be a probability space, X a stochastic variable on this space, and let G ⊆ ℱ be a σ-algebra on Ω. If Y is a stochastic variable such that
1. Y is G-measurable,
2. ∫_G Y(ω) dℙ(ω) = ∫_G X(ω) dℙ(ω)  for all G ∈ G,    (B.14)
then Y = E[X|G] is the conditional expectation of X given G.
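On a finite sample space condition 2 of Definition B.10 can be verified by brute force: compute both integrals for every set in G. The sketch below does this for a hypothetical candidate Y (the numbers are chosen for illustration; the σ-algebra is generated by the partition {{ω1, ω3}, {ω2, ω4}}):

```python
from itertools import combinations

# Finite sample space with uniform probabilities.
prob = {"w1": 0.25, "w2": 0.25, "w3": 0.25, "w4": 0.25}
X = {"w1": 1, "w2": 2, "w3": 3, "w4": 4}
Y = {"w1": 2, "w2": 3, "w3": 2, "w4": 3}     # candidate for E[X | G]

# G: sigma-algebra generated by the partition {{w1, w3}, {w2, w4}},
# obtained as all unions of the two cells.
cells = [frozenset({"w1", "w3"}), frozenset({"w2", "w4"})]
G = [frozenset().union(*c) for r in range(len(cells) + 1)
     for c in combinations(cells, r)]

def integral(f, A):
    """Integral of f over the event A with respect to the measure prob."""
    return sum(f[w] * prob[w] for w in A)

# Condition 2 of Definition B.10: the integrals agree on every set in G.
print(all(abs(integral(Y, A) - integral(X, A)) < 1e-12 for A in G))  # True
```

Condition 1 holds as well, since Y is constant on each cell of the generating partition.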
To give an intuitive understanding of conditional expectation given a σ-algebra consider the following example.
Example B.5.
Suppose we have a finite sample space Ω = {ω1, ω2, ω3, ω4} with four possible outcomes. Define three stochastic variables X, Y1 and Y2: Ω → ℝ with the following values
      ω1    ω2    ω3    ω4
X      1     2     3     4
Y1     1     2     1     2
Y2    1.5   10    1.5   10
Since the stochastic variable X takes different values for all outcomes ωi, the σ-algebra generated by that variable is given by
σ{X} = {∅, {ω1}, {ω2}, {ω3}, {ω4}, {ω1, ω2}, {ω1, ω3}, {ω1, ω4}, {ω2, ω3}, {ω2, ω4}, {ω3, ω4}, {ω1, ω2, ω3}, {ω1, ω2, ω4}, {ω1, ω3, ω4}, {ω2, ω3, ω4}, Ω}    (B.15)
which corresponds to full information. The σ-algebras generated by Y1 and by Y2 contain less “information,” since these variables take the same value for ω1 and ω3, and the same value for ω2 and ω4. The two generated σ-algebras
σ{Y1} = σ{Y2} = {∅, {ω1, ω3}, {ω2, ω4}, Ω}    (B.16)
contain the same information about X despite the fact that Y1 and Y2 take different values. Assume that each outcome has probability 1/4. By the elementary definition of conditional expectation we have
E[X | Y1 = 1] = (1/2)·1 + (1/2)·3 = 2    (B.17)
E[X | Y1 = 2] = (1/2)·2 + (1/2)·4 = 3    (B.18)
which can be summarized as
E[X | Y1](ω) = { 2  if ω ∈ {ω1, ω3}
               { 3  if ω ∈ {ω2, ω4}.    (B.19)
We shall now check whether the two conditions stated in Definition B.10 are fulfilled. Since E[X|Y1](ω) is constant on the two subsets {ω1, ω3} and {ω2, ω4} the conditional expectation (B.19) is measurable with respect to σ{Y1}.
The other condition says that
∫_{ω1,ω3} E[X | Y1](ω) dℙ(ω) = ∫_{ω1,ω3} X(ω) dℙ(ω)    (B.20)
∫_{ω2,ω4} E[X | Y1](ω) dℙ(ω) = ∫_{ω2,ω4} X(ω) dℙ(ω)    (B.21)
which are also fulfilled. It is easy to show that E[X|σ{Y1}] = E[X|σ{Y2}], since the generated σ-algebras are the same.
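Example B.5 can be reproduced in a few lines of Python. The helper below (its name and the outcome labels are illustrative) averages X over each level set {Z = z}, which is exactly a Z-atom of the generated partition:

```python
def cond_exp_given_variable(X, Z, prob):
    """E[X | sigma(Z)] as a function of omega: average X over each
    level set {Z = z}, i.e., over each Z-atom."""
    out = {}
    for w0 in Z:
        atom = [w for w in Z if Z[w] == Z[w0]]
        p = sum(prob[w] for w in atom)
        out[w0] = sum(X[w] * prob[w] for w in atom) / p
    return out

prob = {"w1": 0.25, "w2": 0.25, "w3": 0.25, "w4": 0.25}
X = {"w1": 1, "w2": 2, "w3": 3, "w4": 4}
Y1 = {"w1": 1, "w2": 2, "w3": 1, "w4": 2}
Y2 = {"w1": 1.5, "w2": 10, "w3": 1.5, "w4": 10}

print(cond_exp_given_variable(X, Y1, prob))
# {'w1': 2.0, 'w2': 3.0, 'w3': 2.0, 'w4': 3.0}, matching (B.19)
print(cond_exp_given_variable(X, Y1, prob)
      == cond_exp_given_variable(X, Y2, prob))  # True
```

The equality in the last line confirms that E[X|σ{Y1}] = E[X|σ{Y2}]: only the generated σ-algebra matters, not the particular values Y1 or Y2 takes.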
Some of the most important properties of conditional expectation are given in the following list, where G and ℋ denote sub-σ-algebras of ℱ:
1. If X is G-measurable, then

E[X | G] = X  a.s.

2. For constants a, b and stochastic variables X1, X2,

E[aX1 + bX2 | G] = aE[X1 | G] + bE[X2 | G]  a.s.

3. If ℋ is a sub-σ-algebra of G, then

E[E[X | G] | ℋ] = E[X | ℋ]  a.s.    (B.22)

4. If Z is G-measurable and bounded, then

E[ZX | G] = Z E[X | G]  a.s.    (B.23)
Remark B.1.
Intuitively, the statement that X is G-measurable simply means that X is known, and thus E[X|G] = X a.s. Item 2 simply states that the expectation operator is linear. Eq. (B.22) is often called the Tower Property; it states that the coarser sub-σ-algebra ℋ overrules the finer sub-σ-algebra G. Eq. (B.23) states that we can take out what is known (namely Z) from the expectation operator.
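The Tower Property (B.22) is easy to verify numerically on a finite sample space. The sketch below (helper name, partitions and probabilities are illustrative assumptions) conditions first on a finer partition G and then on a coarser partition ℋ, and checks that the result equals conditioning on ℋ directly:

```python
def cond_exp(X, P, prob):
    """E[X | sigma(P)] for a finite partition P: constant cell averages."""
    out = {}
    for A in P:
        pA = sum(prob[w] for w in A)
        avg = sum(X[w] * prob[w] for w in A) / pA
        for w in A:
            out[w] = avg
    return out

prob = {"w1": 0.25, "w2": 0.25, "w3": 0.25, "w4": 0.25}
X = {"w1": 1, "w2": 2, "w3": 3, "w4": 4}
G_part = [{"w1"}, {"w2"}, {"w3", "w4"}]     # finer partition (G)
H_part = [{"w1", "w2"}, {"w3", "w4"}]       # coarser partition (H)

lhs = cond_exp(cond_exp(X, G_part, prob), H_part, prob)   # E[E[X|G] | H]
rhs = cond_exp(X, H_part, prob)                           # E[X | H]
print(lhs == rhs)  # True: the coarser sigma-algebra H overrules G
```

Note that every cell of H_part is a union of cells of G_part, so σ(ℋ) ⊆ σ(G) as the Tower Property requires.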
Should you wish to pursue these (purely) mathematical topics, a number of books are available (Grimmett and Stirzaker [1992], Karatzas and Shreve [1996], Williams [1995], Royden [1988]). The first reference provides an excellent and readable introduction to stochastic processes and probability theory in general. The other references are given in an increasing order of difficulty and the topics considered herein are outside the scope and aim of these lecture notes.
1If the sample space is finite, an atom is a set which consists of only one element.