Note that, by Jensen's inequality, this correlation is positive.
which gives , and the result follows.
Let U be a random variable, uniformly distributed on {0, 1}. We define the process (Y t ) t = 0, 1, … by
Y_t(ω) = (−1)^{t + U(ω)}
for any ω ∈ Ω and any t ≥ 0. The process (Y t ) is stationary. We have in particular EY t = 0 and Cov(Y t , Y t + h ) = (−1)^h . With probability 1/2, the realisation of the stationary process (Y t ) will be the sequence {(−1)^t} (and with probability 1/2, it will be {(−1)^{t + 1}}).
This example leads us to think that it is virtually impossible to determine whether a process is stationary or not, from the observation of only one trajectory, even of infinite length. However, practitioners do not consider {(−1) t } as a potential realisation of the stationary process (Y t ). It is more natural, and simpler, to suppose that {(−1) t } is generated by the non‐stationary process (X t ).
Let Ω* = {ω ∣ X 2t = 1, X 2t + 1 = 0, ∀ t}. If (X t ) is ergodic and stationary, the empirical means n^{−1} ∑_{t=1}^{n} X_{2t} and n^{−1} ∑_{t=1}^{n} X_{2t+1} both converge to the same limit P[X t = 1] with probability 1, by the ergodic theorem. For all ω ∈ Ω* these means are, respectively, equal to 1 and 0. Thus P(Ω*) = 0. The probability of such a trajectory is thus equal to zero for any ergodic and stationary process.
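This can be checked by a small simulation. The following R sketch (using the definition Y_t = (−1)^{t+U} from above; the sample size is arbitrary) draws one trajectory and compares the empirical means over even and odd indices:
# one trajectory of Y_t = (-1)^(t+U), with U uniform on {0, 1}
set.seed(123)
t <- 0:999
U <- sample(0:1, 1)
Y <- (-1)^(t + U)
mean(Y[t %% 2 == 0])  # mean over even indices: +1 or -1
mean(Y[t %% 2 == 1])  # mean over odd indices: the opposite sign
mean(Y)               # overall mean close to 0
Each simulated trajectory is either {(−1)^t} or {(−1)^{t+1}}, and the means over even and odd indices never agree, in line with the argument above.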
This value can be arbitrarily larger than 1, which is the value of the asymptotic variance of the empirical autocorrelations of a strong white noise.
where ∣b∣ < 1 and (u t ) is a white noise of variance σ 2 . The coefficients b and σ 2 are determined by
which gives and σ 2 = 2/b .
Since , for k ≠ h the asymptotic variance can be arbitrarily smaller than 1, which corresponds to the asymptotic variance of the empirical autocorrelations of a strong white noise.
exists in ℝ ∪ {+∞}. Using Beppo Levi's theorem,
which shows that the limit is finite almost surely. Thus, as n → ∞, u t (n) converges, both almost surely and in quadratic mean, to . Since
we obtain, taking the limit as n → ∞ of both sides of the equality, u t = au t − 1 + η t . This shows that (X t ) = (u t ) is a stationary solution of the AR(1) equation.
Finally, assume the existence of two stationary solutions to the equation X t = aX t − 1 + η t and u t = au t − 1 + η t . If , then
which entails
This is in contradiction to the assumption that the two sequences are stationary, which shows the uniqueness of the stationary solution.
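As a numerical illustration of the existence part, the following R sketch (hypothetical value a = 0.5, Gaussian noise) simulates the stationary solution recursively and compares its variance with the theoretical value 1/(1 − a^2):
# stationary AR(1) solution u_t = a u_{t-1} + eta_t, after a burn-in
set.seed(123)
a <- 0.5; n <- 1000; burn <- 500
eta <- rnorm(n + burn)
u <- filter(eta, a, method = "recursive")[-(1:burn)]
c(empirical = var(u), theoretical = 1/(1 - a^2))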
as k → ∞. If (X t ) were stationary,
and we would have
This is impossible, because by the Cauchy–Schwarz inequality,
We thus have Eε t = 0 and, for all h > 0,
which confirms that ε t is a white noise.
With the change of index h = i − ℓ, we obtain
which gives (B.14), using the parity of the autocovariance functions.
for (i, j) ≠ (0, 0) and
Thus
In formula (B.15), the (i, j)th term equals 0 when i ≠ j and the (i, i)th term equals 1. We also have when i ≠ j and for all i ≠ 0. Since , we obtain
For significance intervals C h of asymptotic level 1 − α , such that , we have
By definition of C h ,
Moreover,
We have used the convergence in law of to a vector of independent variables. When the observed process is not a noise, this asymptotic independence does not hold in general.
For m = 20 and α = 5%, this limit is equal to (1 − 0.05)^{20} ≈ 0.36. The probability of not rejecting the right model is thus low.
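The numerical value is obtained immediately in R:
# probability that none of m = 20 independent tests of level 5% rejects
alpha <- 0.05; m <- 20
(1 - alpha)^m  # approximately 0.358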
Then, step (B.9) yields
Finally, step (B.8) yields
> # reading the SP500 data set
> sp500data <- read.table("sp500.csv",header=TRUE,sep=",")
> sp500<-rev(sp500data$Close) # closing price
> n<-length(sp500)
> rend<-log(sp500[2:n]/sp500[1:(n-1)]); rend2<-rend^2
> op <- par(mfrow = c(2, 2)) # 2 × 2 figures per page
> plot(ts(sp500),main="SP 500 from 1/3/50 to 7/24/09",
+ ylab="SP500 Prices",xlab="")
> plot(ts(rend),main="SP500 Returns",ylab="SP500 Returns",
+ xlab="")
> acf(rend, main="Autocorrelations of the returns",xlab="",
+ ylim=c(-0.05,0.2))
> acf(rend2, main="ACF of the squared returns",xlab="",
+ ylim=c(-0.05,0.2))
> par(op)
For the multiplicative norm ‖A‖ = ∑ ∣ a ij ∣, we have , and the result follows immediately.
When A is any square matrix, the Jordan representation can be used. Let n i be the multiplicity of the eigenvalue λ i . We have the Jordan canonical form A = P −1 JP , where P is invertible, and J is the block‐diagonal matrix with a diagonal of m matrices J i (λ i ), of size n i × n i , with λ i on the diagonal, 1 on the superdiagonal, and 0 elsewhere. It follows that A t = P −1 J t P , where J t is the block‐diagonal matrix whose blocks are the matrices . We have , where N i is such that . It can be assumed that ∣λ 1 ∣ > ∣ λ 2 ∣ > ⋯ > ∣ λ m ∣ . It follows that
as t → ∞, and the proof easily follows.
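The role of the spectral radius in the growth of A^t can be checked numerically, for instance with the following R sketch (the 2 × 2 matrix is arbitrary):
# ||A^t||^(1/t) approaches the spectral radius rho(A) as t grows
A <- matrix(c(0.5, 0.3, 0.2, 0.4), 2, 2)
rho <- max(abs(eigen(A)$values))  # spectral radius
At <- diag(2)
for (t in 1:50) At <- At %*% A
c(sum(abs(At))^(1/50), rho)       # the two values are close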
and thus
Using Eq. (2.21) and the ergodic theorem, we obtain
Consequently, γ < 0 if and only if ρ(A) < exp(−E log ∣ z t ∣).
which completes the proof of 1.
We have shown that, for any , the stationary sequences and have the same top Lyapunov exponent, i.e.
The convergence follows by showing that .
To show that the norm N 1 is not multiplicative, consider the matrix A whose elements are all equal to 1: we then have N 1 (A) = 1 but N 1 (A^2) > 1.
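A direct numerical check, assuming that N 1 denotes the maximum of the absolute values of the entries (the definition is elided above; this assumption is consistent with N 1 (A) = 1 for the matrix of ones):
# assumed definition: N1(A) = max |a_ij|
N1 <- function(A) max(abs(A))
A <- matrix(1, 2, 2)
c(N1(A), N1(A %*% A))  # 1 and 2: N1(A^2) > N1(A)^2, so N1 is not multiplicative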
and
(see Theorem 2.5 and Remark 2.6(1)). The strictly stationary solution satisfies
in ℝ ∪ {+∞}. Moreover,
which gives
Using this relation in the previous expression for , we obtain
If , then the term in brackets on the left‐hand side of the equality must be strictly positive, which gives the condition for the existence of the fourth‐order moment. Note that the condition is not symmetric in α 1 and α 2 . In Figure E.2, the points (α 1, α 2) under the curve correspond to ARCH(2) models with a fourth‐order moment. For these models,
where is a (weak) white noise. The autocorrelation of thus satisfies
Using the MA(∞) representation
we obtain
and
It follows that the lag 1 autocorrelation is
The other autocorrelations are obtained from (E.1) and . To determine the autocovariances, all that remains is to compute
which is given by
We have
The eigenvalues of A (2) are 0, 0, 0 and 3α^2 + 2αβ + β^2 , thus I 4 − A (2) is invertible (0 is an eigenvalue of I 4 − A (2) if and only if 1 is an eigenvalue of A (2) ), and the system (2.63) admits a unique solution. We have
The solution to Eq. (2.63) is
As first component of this vector, we recognise , and the other three components are equal to . Equation (2.64) yields
which gives , but with tedious computations, compared to the direct method utilised in Exercise 2.8.
and the result follows.
The convergence follows from the Borel–Cantelli lemma.
Now, let (X n ) be an iid sequence of random variables with density f(x) = x^{−2} 𝟙_{x ≥ 1}(x). For all K > 0, we have
The events {n −1 X n > K} being independent, we can use the counterpart of the Borel–Cantelli lemma: the event {n −1 X n > K for an infinite number of n} has probability 1. Thus, with probability 1, the sequence (n −1 X n ) does not tend to 0.
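This can be visualised by simulation; the following R sketch draws from the density f(x) = x^{−2} 𝟙_{x ≥ 1}(x) by inverse transform (X = 1/V with V uniform on (0, 1)):
# X_n iid with P(X > x) = 1/x for x >= 1
set.seed(123)
n <- 1:10000
X <- 1 / runif(length(n))   # inverse transform sampling
plot(n, X / n, type = "l")  # spikes keep recurring: X_n/n does not tend to 0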
the first row of EB t + 1 B t …B 1 is thus the product of the first row of EB t + 1 and of EB t …B 1 . The conclusion follows.
The first inequality uses (a + b) s ≤ a s + b s for a, b ≥ 0 and s ∈ (0, 1]. The second inequality is a consequence of . The second convergence then follows from the dominated convergence theorem.
for any i ′ = 1, …, ℓ, j ′ = 1, …, m . In view of the independence between X n and Y , it follows that almost surely as n → ∞. Since is a strictly positive number, we obtain almost surely, for all i ′, j ′ . Using (a + b) s ≤ a s + b s once again, it follows that
where is independent of A k A k − 1…A N + 1. The general term a i, j of A N …A 1 is the (i, j)th term of the matrix A N multiplied by a product of variables. The assumption A N > 0 entails a i, j > 0 almost surely for all i and j . It follows that the i th component of Y satisfies Y i > 0 almost surely for all i . Thus . Now the previous question allows us to affirm that E(‖A k A k − 1…A N + 1‖ s ) → 0 and, by strict stationarity, that E(‖A k − N A k − N − 1…A 1‖ s ) → 0 as k → ∞. It follows that there exists k 0 such that
The last inequalities imply β 1 ≥ 0. Finally, the positivity constraints are
If q = 2, these constraints reduce to
Thus, we can have α 2 < 0.
If the last equality is true, it remains true when h is replaced by h + 1 because . Since , it follows that for all h ≥ 0. Moreover,
Since , if then we have, for all h ≥ 1, We have thus shown that the sequence is decreasing when . If , it can be seen that for h large enough, say h ≥ h 0 , we have , again because of . Thus, the sequence is decreasing.
Since in probability, there exist K 0 ∈ ℝ and n 0 ∈ ℕ such that P(X n < K 0/2) ≤ ς < 1 for all n ≥ n 0 . Consequently,
as n → ∞, for all K ≤ K 0 , which entails the result.
as n → ∞. If γ < 0, the Cauchy rule entails that
converges almost surely, and the process (ε t ), defined by , is a strictly stationary solution of model (2.7). As in the proof of Theorem 2.1, it can be shown that this solution is unique, non‐anticipative and ergodic. The converse is proved by contradiction, assuming that there exists a strictly stationary solution . For all n > 0, we have
It follows that a(η −1)…a(η −n )ω(η −n − 1) converges to zero, almost surely, as n → ∞ , or, equivalently, that
We first assume that E log {a(η t )} > 0. Then the strong law of large numbers entails almost surely. For (E.2) to hold true, it is then necessary that log ω(η −n − 1) → − ∞ almost surely, which is precluded since (η t ) is iid and ω(η 0) > 0 almost surely. Assume now that E log {a(η t )} = 0. By the Chung–Fuchs theorem, we have with probability 1 and, using Exercise 2.17, the convergence (E.2) entails log ω(η −n − 1) → − ∞ in probability, which, as in the previous case, entails a contradiction.
Regardless of the value of , fixed or even random, we have almost surely
using the law of large numbers and Jensen's inequality. It follows that almost surely as t → ∞.
which is equivalent to showing that
with X = |η 0|^{2q} , . This inequality holds true by Hölder's inequality. The same argument is used to show the convexity of . It follows that f is convex, as a sum of convex functions. We have f(1) = 0 and f(p) < 0, thus the left derivative of f at 1 is negative, which gives the result.
since the condition for the existence of is . Note that when the GARCH effect is weak (that is, α 1 is small), the part of the variance that is explained by this regression is small, which is not surprising. In all cases, the ratio of the variances is bounded by 1/κ η , which is well below 1 for most distributions (1/3 for the Gaussian distribution). Thus, it is not surprising to observe disappointing R 2 values when estimating such a regression on real series.
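A small simulation makes the bound concrete. The following R sketch (hypothetical ARCH(1) parameters, Gaussian innovations) regresses ε_t^2 on the conditional variance and reports the R^2:
# R^2 of the regression of eps_t^2 on h_t is bounded by 1/kappa_eta = 1/3
set.seed(123)
n <- 10000; omega <- 1; alpha1 <- 0.3
eps <- h <- numeric(n); h[1] <- omega; eps[1] <- sqrt(h[1]) * rnorm(1)
for (t in 2:n) {
  h[t] <- omega + alpha1 * eps[t - 1]^2  # ARCH(1) conditional variance
  eps[t] <- sqrt(h[t]) * rnorm(1)
}
summary(lm(eps^2 ~ h))$r.squared         # well below 1/3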
Moreover, λ is a maximal measure of irreducibility.
Thus μ is an invariant probability measure.
Conversely, suppose that μ is invariant. Using the Chapman–Kolmogorov relation, by which ∀t ∈ ℕ, ∀ s, 0 ≤ s ≤ t, ∀ x ∈ E, ∀ B ∈ ℰ,
we obtain
Thus, by induction, for all t , ℙ[X t ∈ B] = μ(B) (∀B ∈ ℰ ). Using the Markov property, this is equivalent to the strict stationarity of the chain: the distribution of the process (X t , X t + 1, …, X t + k ) is independent of t , for any integer k .
Thus π is invariant. The third equality is an immediate consequence of the Fubini and Lebesgue theorems.
Now let B ∈ ℰ . Then for all x ∈ C ,
The measure ν is non‐trivial since ν(E) = δλ(C) = 2δc > 0.
Thus if K 1 < 1, we have, for K 1 < K < 1 and for g(x) > (K 2 + 1 − K 1)/(K − K 1),
If we put A = {x; g(x) = 1 + ∣ x ∣ ≤ (K 2 + 1 − K 1)/(K − K 1)}, the set A is compact and the conditions of Theorem 3.1 are satisfied, with 1 − δ = K .
It follows that
because V ≥ 1. Thus, there exists κ > 0 such that
Note that the positivity of δ is crucial for the conclusion.
The inequality is justified by (i) and the fact that P f is a continuous positive function. It follows that for f = C , where C is a compact set, we obtain
which shows that,
(that is, π is subinvariant) using (ii). If there existed B such that the previous inequality were strict, we would have
and since π(E) < ∞ we arrive at a contradiction. Thus
which signifies that π is invariant.
where
Inequality (A.8) shows that d 7 is bounded by
By an argument used to deal with d 6 , we obtain
and the conclusion follows.
is continuous at x when g is continuous.
To show that the irreducibility condition (ii) is not satisfied, consider the set of numbers in [0,1] such that the sequence of decimals is periodic after a certain lag:
For all h ≥ 0, if and only if . We thus have,
and,
This shows that there is no non‐trivial irreducibility measure.
The drift condition (iii) is satisfied with, for instance, a measure φ such that φ([−1, 1]) > 0, the energy V(x) = 1 + ∣ x∣ and the compact set A = [−1, 1]. Indeed,
provided
Using the independence between and the other variables of , we have, for all ,
when the distribution is symmetric.
and, by continuity of the exponential,
is finite if and only if the series of general term converges. Using the inequalities , we obtain
Since the tend to 0 at an exponential rate and , the series of general term converges absolutely, and we finally obtain
which is finite under condition (4.12).
with probability 1. The integral of a positive measurable function being always defined in [0, +∞], using Beppo Levi's theorem and then the independence of the , we obtain
which is of course finite under condition (4.12). Applying the dominated convergence theorem, and bounding the variables by the integrable variable , we then obtain the desired expression for .
and . With the notation , it follows that
It then suffices to use the fact that is equivalent to , and that is thus equivalent to , in a neighborhood of 0.
where is a white noise with variance . Using and
the coefficients and are such that
and
When, for instance, , , and , we obtain
If the volatility is a positive function of that possesses a moment of order 2, then
under conditions (4.34). Thus, condition (4.38) is necessarily satisfied. Conversely, under (4.38) the strict stationarity condition is satisfied because
and, as in the proof of Theorem 2.2, it is shown that the strictly stationary solution possesses a moment of order 2.
and
Using , we obtain
We then obtain the autocovariances
and the autocorrelations . Note that for all , which shows that is a weak ARMA process. In the standard GARCH case, the calculation of these autocorrelations would be much more complicated because is not a linear function of .
and this solution possesses a moment of order 2 when
which is the case, in particular, for . In the Gaussian case, we have
and
using the calculations of Exercise 4.5 and
Since is an increasing function, provided , we observe the leverage effect .
and the conclusion follows from the Borel–Cantelli lemma.
and
as h → 0. The conclusion follows.
Using the indication, one can check that and
We thus conclude by noting that
The sequence (ε t ε t + h , ℱ t + h ) t is thus a stationary sequence of square integrable martingale increments. We thus have
where . To conclude, it suffices to note that
in probability (and even in L 2 ).
Its fourth‐order moment is
Thus,
Moreover,
Using Exercise 5.1, we thus obtain
By the ergodic theorem, the denominator converges in probability (and even a.s.) to γ ε(0) = ω/(1 − α) ≠ 0. In view of Exercise 5.2, the numerator converges in law to . Cramér's theorem then entails
The asymptotic variance is equal to 1 when α = 0 (that is, when ε t is a strong white noise). Figure E.3 shows that the asymptotic distribution of the empirical autocorrelations of a GARCH can be very different from those of a strong white noise.
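The phenomenon can be reproduced by simulation; a minimal R sketch (hypothetical ARCH(1) parameters) is:
# empirical autocorrelations of an ARCH(1): the +-1.96/sqrt(n) bands shown
# by acf(), valid for a strong white noise, are too narrow here
set.seed(123)
n <- 5000; omega <- 1; alpha <- 0.5
eps <- numeric(n)
for (t in 2:n) eps[t] <- sqrt(omega + alpha * eps[t - 1]^2) * rnorm(1)
acf(eps, lag.max = 20)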
In view of Exercises 5.1 and 5.3,
for any h ≠ 0.
Similarly, Eε t ε t + 1ε s ε s + 2 = 0 when t + 1 > s + 2. When t + 1 = s + 2, we have
because ε t − 1 σ t ∈ ℱ t − 1 , , E(η t ∣ ℱ t − 1) = Eη t = 0 and . Using (7.24), the result can be extended to show that Eε t ε t + h ε s ε s + k = 0 when k ≠ h and (ε t ) follows a GARCH(p, q), with a symmetric distribution for η t .
with ω = 1, α = 0.3 and β = 0.55. Thus γ ε(0) = 6.667, , . Thus
for i = 1, …, 5. Finally, using Theorem 5.1,
Since , γ ε(0) = ω/(1 − α) and
(see, for instance, Exercise 2.8), we have
Note that as i → ∞.
where ε t (θ) = Y t − F θ (W t ). We thus have
and, when σ 2 does not depend on θ ,
up to a constant. The constrained estimator is , with . The constrained score and the Lagrange multiplier are related by
On the other hand, the exact laws of the estimators under H 0 are given by
and
with
For the case , we can estimate I 22 by
The test statistic is then equal to
with
and where R 2 is the coefficient of determination (centred if X 1 contains a constant column) in the regression of on the columns of X 2 . For the first equality of (E.3), we use the fact that in a regression model of the form Y = Xβ + U , with obvious notation, Pythagoras's theorem yields
In the general case, we have
Since the residuals of the regression of Y on the columns of X 1 and X 2 are also the residuals of the regression of on the columns of X 1 and X 2 , we obtain LM n by:
Since T is invertible, we have , where Col(Z) denotes the vector subspace spanned by the columns of the matrix Z , and
If e ∈ Col(X) then and
Noting that , we conclude that
for h = 0, …, q . We then put
and then, for k = 2, …, q (when q > 1),
With standard notation, the OLS estimators are then
and
with equality if and only if , and we are done.
for all t , and consequently , which is not possible.
Using the data, we obtain , and thus . Therefore, the constrained estimate must coincide with one of the following three constrained estimates: that constrained by α 2 = 0, that constrained by α 1 = 0, or that constrained by α 1 = α 2 = 0. The estimate constrained by α 2 = 0 is , and is thus not suitable. The estimate constrained by α 1 = 0 yields the desired estimate .
and this estimator satisfies
Under the assumptions of the exercise, the ergodic theorem entails the almost sure convergence
and thus the almost sure convergence of to φ 0 . For the consistency, the assumption suffices.
If , the sequence (ε t X t − 1, ℱ t ) is a stationary and ergodic square integrable martingale difference, with variance
We can see that this expectation exists by expanding the product
The CLT of Corollary A.1 then implies that
and thus
When , the condition suffices for asymptotic normality.
We then have
Similarly,
It follows that C = A −1 BA −1 is of the form
Assume that there exist two solutions of the minimisation problem in C , and . Using the convexity of C , it is then easy to see that satisfies
This is possible only if (once again using the parallelogram identity).
and, dividing by λ ,
Taking the limit as λ tends to 0, we obtain inequality (6.17).
Let z be such that, for all y ∈ C , 〈z − x, z − y〉 ≤ 0. We have
the last inequality being simply the Cauchy–Schwarz inequality. It follows that ‖x − z‖ ≤ ‖x − y‖, ∀ y ∈ C . Since this property characterises x * in view of part 1, it follows that z = x * .
Since and , it follows that , and thus that X n converges to the zero vector of ℝ k .
using Markov's inequality, strict stationarity and the existence of a moment of order s > 0 for .
When κ → ∞, the variable X 1 ∧ κ increases to X 1 . Thus, by Beppo Levi's theorem, E(X 1 ∧ κ) converges to E(X 1) = + ∞. It follows that tends almost surely to infinity.
Note that and (7.29) entail that . Since the limit superior (E.5) is smaller than any positive number, it is null.
For all c > 0, there exists such that for all t ≥ 0. Note that if and only if c ≠ 1. For instance, for a GARCH(1, 1) model, if we have . Let . The minimum of f is obtained at the unique point
If , we have . It follows that c 0 = 1 with probability 1, which proves the result.
In view of (7.41), (7.42), (7.79) and (7.24), we have
and
It follows that
and ℐ is block‐diagonal. It is easy to see that 𝒥 has the form given in the theorem. The expressions for J 1 and J 2 follow directly from (7.39) and (7.75). The block‐diagonal form follows from (7.76) and (E.6).
Letting ℐ = (ℐ ij ), and , we then obtain
The asymptotic variance of the ARCH parameter estimator is thus equal to : it does not depend on a 0 and is the same as that of the QMLE of a pure ARCH(1) (using computations similar to those used to obtain (7.1.2)).
We note that the estimation of too complicated a model (since the true process is AR(1) without ARCH effect) does not entail any asymptotic loss of accuracy for the estimation of the parameter a 0 : the asymptotic variance of the estimator is the same, , as if the AR(1) model were directly estimated. This calculation also allows us to verify the ‘ α 0 = 0’ column in Table 7.3: for the 𝒩(0, 1) law we have μ 3 = 0 and κ η = 3; for the normalized χ 2(1) distribution we find and κ η = 15.
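The two kurtosis values quoted for Table 7.3 can be checked by a quick Monte Carlo experiment in R:
# kappa_eta = 3 for the Gaussian law, 15 for the normalized chi-squared(1)
set.seed(123)
mean(rnorm(1e6)^4)                       # close to 3
z <- (rchisq(1e6, df = 1) - 1) / sqrt(2) # centred, unit-variance chi2(1)
mean(z^4)                                # close to 15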
It follows that
and, since ε can be chosen arbitrarily small, we have the desired result.
In order to give an example where (7.95) is not satisfied, let us consider the autoregressive model X t = θ 0 X t − 1 + η t where θ 0 = 1 and (η t ) is an iid sequence with mean 0 and variance 1. Let J t (θ) = X t − θX t − 1 . Then J t (θ 0) = η t and the first convergence of the exercise holds true, with J = 0. Moreover, for all neighbourhoods of θ 0 ,
almost surely because the sum in brackets converges to +∞, X t being a random walk and the supremum being strictly positive. Thus (7.95) is not satisfied. Nevertheless, we have
Indeed, converges in law to a non‐degenerate random variable (see, for instance, Hamilton 1994, p. 406) whereas in probability since has a non‐degenerate limit distribution.
Therefore is positive semi‐definite. Thus
Setting x = Jy , we then have
which proves the result.
Since d t → 0 almost surely as t → ∞, the convergence in law of part 2 always holds true. Moreover,
with
which implies that the result obtained in Part 3 does not change. The same is true for Part 4 because
Finally, it is easy to see that the asymptotic behaviour of is the same as that of , regardless of the value that is fixed for ω .
In view of the inequality x ≥ 1 + log x for all x > 0, it follows that
For all M > 0, there exists an integer t M such that for all t > t M . This entails that
Since M is arbitrarily large,
provided that . If is chosen so that the constraint is satisfied, the inequalities
and (E.7) show that
We will define a criterion O n asymptotically equivalent to the criterion Q n . Since a.s. as t → ∞, we have for α ≠ 0,
where
On the other hand, we have
when α 0/α ≠ 1. We will now show that Q n (α) − O n (α) converges to zero uniformly in . We have
Thus for all M > 0 and any ε > 0, almost surely
provided n is large enough. In addition to the previous constraints, assume that . We have for any , and
for any α ≥ α 0 . We then have
Since M can be chosen arbitrarily large and ε arbitrarily small, we have almost surely
For the last step of the proof, let and be two constants such that . It can always be assumed that . With the notation , the solution of
is . This solution belongs to the interval when n is large enough. In this case
is one of the two extremities of the interval , and thus
This result, (E.9), the fact that min α Q n (α) ≤ Q n (α 0) = 0 and (E.8) show that
Since is an arbitrarily small interval that contains α 0 and , the conclusion follows.
Since at the optimum
the solution is such that . Since , we obtain , and then the solution is
Instead of the Lagrange multiplier method, a direct substitution method can also be used.
The constraints can be written as
where H is n × (n − p), of full column rank, and x * is (n − p) × 1 (the vector of the non‐zero components of x ). For instance: (i) if n = 3, x 2 = x 3 = 0 then x * = x 1 and ; (ii) if n = 3, x 3 = 0 then x * = (x 1, x 2)′ and .
If we denote by Col(H) the space generated by the columns of H , we thus have to find
where ‖.‖ J is the norm .
This norm defines the scalar product 〈z, y〉 J = z ′ Jy . The solution is thus the orthogonal (with respect to this scalar product) projection of x 0 on Col(H). The matrix of such a projection is
Indeed, we have P 2 = P , PHz = Hz , thus Col(H) is P ‐invariant, and 〈Hy, (I − P)z〉 J = y ′ H ′ J(I − P)z = y ′ H ′ Jz − y ′ H ′ JH(H ′ JH)−1 H ′ Jz = 0, thus z − Pz is orthogonal to Col(H).
It follows that the solution is
This last expression seems preferable to (E.10) because it only requires the inversion of the matrix H ′ JH of size n − p , whereas in (E.10) the inverse of J , which is of size n , is required.
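The properties of this projection can be verified numerically; in the following R sketch, J and H are arbitrary (J symmetric positive definite, H of full column rank):
# P = H (H'JH)^{-1} H'J is the J-orthogonal projection on Col(H)
set.seed(123)
J <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)  # positive definite
H <- matrix(rnorm(6), 3, 2)                       # full column rank
P <- H %*% solve(t(H) %*% J %*% H) %*% t(H) %*% J
max(abs(P %*% P - P))                   # ~ 0: P is idempotent
max(abs(P %*% H - H))                   # ~ 0: Col(H) is P-invariant
max(abs(t(H) %*% J %*% (diag(3) - P)))  # ~ 0: z - Pz is J-orthogonal to Col(H)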
and then
and, using (E.11),
which gives a constrained minimum at
In case (b), we have
and, using (E.11), a calculation, which is simpler than the previous one (we do not have to invert any matrix since H ′ JH is scalar), shows that the constrained minimum is at
The same results can be obtained with formula (E.10), but the computations are longer, in particular because we have to compute
It follows that the solution will be found among (a) λ = Z ,
The value of Q(λ) is 0 in case (a), in case (b), in case (c) and in case (d).
To find the solution of the constrained minimisation problem, it thus suffices to take the value λ which minimizes Q(λ) among the subset of the four vectors defined in (a)–(d) which satisfy the positivity constraints of the two last components.
We thus find the minimum at λ Λ = Z = (−2, 1, 2)′ in case (i), at
and
It follows that
The coefficient of the regression of Z 1 on Z 2 is −ω 0 . The components of the vector (Z 1 + ω 0 Z 2, Z 2) are thus uncorrelated and, this vector being Gaussian, they are independent. In particular , which gives . We thus have
Finally,
It can be seen that
is a positive semi‐definite matrix.
and the information matrix (written for simplicity in the ARCH(3) case) is equal to
This matrix is invertible (which is not the case for a general GARCH(p, q)). We finally obtain
and thus
In view of Theorem 8.1 and (8.15), the asymptotic distribution of is that of the vector λ Λ defined by
We have , thus
Since the components of the Gaussian vector (Z 1 + ω 0 Z 2, Z 2) are uncorrelated, they are independent, and it follows that
We then obtain
Let f(z 1, z 2) be the density of Z , that is, the density of a centred normal with variance (κ η − 1)J −1 . It is easy to show that the distribution of admits the density and to check that this density is asymmetric.
A simple calculation yields . From , we then obtain . And from we obtain . Finally, we obtain
The p ‐value of C * is . Since log[2{1 − Φ(x)}] ∼ −x^2/2 in the neighbourhood of +∞, the asymptotic slope of C * is also c *(θ) = θ^2 for θ > 0. Since the tests C and C * have the same asymptotic slope, they cannot be distinguished by the Bahadur approach.
We know that C is uniformly more powerful than C * . The local power of C is thus also greater than that of C * for all τ > 0. It is also true asymptotically as n → ∞, even if the sample is not Gaussian. Indeed, under the local alternatives , and for a regular statistical model, the statistic is asymptotically 𝒩(τ, 1) distributed. The local asymptotic power of C is thus γ(τ) = 1 − Φ(c − τ) with c = Φ−1(1 − α). The local asymptotic power of C * is γ *(τ) = 1 − Φ(c * − τ) + Φ(−c * − τ), with c * = Φ−1(1 − α/2). The difference between the two asymptotic powers is
and, denoting the 𝒩(0, 1) density by φ(x), we have
where
Since 0 < c < c * , we have
Thus, g(τ) is decreasing on [0, ∞). Note that g(0) > 0 and . The sign of g(τ), which is also the sign of D ′(τ), is positive when τ ∈ [0, a] and negative when τ ∈ [a, ∞), for some a > 0. The function D thus increases on [0, a] and decreases on [a, ∞). Since D(0) = 0 and , we have D(τ) > 0 for all τ > 0. This shows that, in Pitman's sense, the test C is, as expected, locally more powerful than C * in the Gaussian case, and locally asymptotically more powerful than C * in a much more general framework.
To justify the score test, we remark that the log‐likelihood constrained by H 0 is
which gives as constrained estimator of σ 2 . The derivative of the log‐likelihood satisfies
at . The first component of this score vector is asymptotically 𝒩(0, 1) distributed under H 0 . The third test is of course the likelihood ratio test, because the unconstrained log‐likelihood at the optimum is equal to whereas the maximal value of the constrained log‐likelihood is . Note also that under H 0 .
The asymptotic level of the three tests is of course α , but using the inequality for x > 0, we have
with almost surely strict inequalities in finite samples, and also asymptotically under H 1 . This leads us to think that the Wald test will reject more often under H 1 .
Since is invariant under translation of the X i , tends almost surely to σ 2 both under H 0 and under H 1 , as well as under the local alternatives . The behaviour of under H n (τ) is the same as that of under H 0 , and because
under H 0 , we have both under H 0 and under H n (τ). Similarly, it can be shown that under H 0 and under H n (τ). Using these two results and x/(1 + x)∼ log(1 + x) in the neighbourhood of 0, it can be seen that the statistics L n , R n and W n are equivalent under H n (τ). Therefore, the Pitman approach cannot distinguish the three tests.
Using for x in the neighbourhood of +∞, the asymptotic Bahadur slopes of the tests C 1 , C 2 and C 3 are, respectively
Clearly
Thus the ranking of the tests, in increasing order of relative efficiency in the Bahadur sense, is
All the foregoing remains valid for a regular non‐Gaussian model.
Note that Var(Z d )c corresponds to the last column of VarZ = (κ η − 1)J −1 . Thus c is the last column of J −1 divided by the (d, d)th element of this matrix. In view of Exercise 6.7, this element is . It follows that and . By (8.24), we thus have
This shows that the statistic 2/(κ η − 1)L n has the same asymptotic distribution as the Wald statistic W n , that is, the distribution in the case d 2 = 1.
The result then follows from (8.30).
Since and belong to the σ ‐field ℱ t − 1 generated by {ε u : u < t}, and since the distribution of ε t given ℱ t − 1 has the density , we have
and the result follows. We can also appeal to the general result that a score vector is centred.
Thus
when X∼풩(θ 0, σ 2), and
when X∼풩(θ, σ 2). Note that
as in Le Cam's third lemma.
and
Using the ergodic theorem, the fact that (1 − |η t | λ ) is centred and independent of the past, as well as elementary calculations of derivatives and integrals, we obtain
and
almost surely.
where the inequality is strict if σf(ησ)/f(η) is non‐constant. If this ratio of densities were almost surely constant, it would be almost surely equal to 1, and we would have
which is possible only when σ = 1.
Thus and . We then show that κ η ≔ ∫ x^4 f p (x)dx = (3 + p)(2 + p)/{p(p + 1)}. It follows that .
To compare the ML and Laplace QML, it is necessary to normalise in such a way that E ∣ η t ∣ = 1, that is, to take the double Γ(p, p) as density f . We then obtain 1 + xf ′(x)/f(x) = p − p ∣ x∣. We always have , and we have . It follows that , which was already known from Exercise 9.6. This allows us to construct a table similar to Table 9.5.
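For such a table, the kurtosis κ η = (3 + p)(2 + p)/{p(p + 1)} of the double Γ(p, p) density can be tabulated as follows:
# kurtosis of the double Gamma(p, p) density for a few values of p
p <- c(0.5, 1, 2, 5, 10)
kappa_eta <- (3 + p) * (2 + p) / (p * (p + 1))
round(cbind(p, kappa_eta), 3)  # equals 6 for the Laplace case p = 1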
Denoting by c any constant whose value can be ignored, we have
and thus
Now consider the second density,
We have
which gives
Consider the last instrumental density,
We have
and thus
In each case, does not depend on the parameter λ of h . We conclude that the estimators exhibit the same asymptotic behaviour, regardless of the parameter λ . It can even be easily shown that the estimators themselves do not depend on λ .
It follows that, using obvious notation,
where ϱ = ∫ ∣ x ∣ f(x)dx . Thus, using Exercise 9.9, we obtain
with .
that of the vectorial model is
that of the CCC model is
that of the BEKK model is
For and we obtain Table E.1.
and
Conversely, it is easy to check that (10.101) implies (10.100).
where has the same norm as . Assuming, for instance, that we have
Moreover, this maximum is reached at .
An alternative proof is obtained by noting that solves the maximization problem of the function under the constraint . Introduce the Lagrangian
The first‐order conditions yield the constraint and
This shows that the constrained optimum is located at a normalized eigenvector associated with an eigenvalue of , . Since , we of course have .
The first inequality of (10.69) is a simple application of the Cauchy–Schwarz inequality. The second inequality of (10.69) is obtained by twice applying the second inequality of (10.68).
We now give a second‐order stationarity condition. If exists, then this matrix is symmetric positive semi‐definite and satisfies
that is,
If is positive definite, it is then necessary to have
For the reverse we use Theorem 10.5. Since the matrices are of the form with , the condition is equivalent to (E.14). This condition is thus sufficient to obtain the stationarity, under technical condition (ii) of Theorem 10.5 (which can perhaps be relaxed). Let us also mention that, by analogy with the univariate case, it is certainly possible to obtain the strict stationarity under a condition weaker than (E.14).
when . To show the almost sure convergence, let us begin by noting that, using Hölder's inequality,
with and . Let , for , and which is defined in , a priori. Since
it follows that is almost surely defined in and is almost surely defined in . Since , we have almost surely.
and it suffices to take
The conditional covariance between the factors and , for , is
which is a nonzero constant in general.
Denoting by the th vector of the canonical basis of , we have
and we obtain the BEKK representation with ,
which shows that is an eigenvector associated with an eigenvalue of . Left‐multiplying the previous equation by , we obtain
which shows that must be the largest eigenvalue of . The vector is unique, up to its sign, provided that the largest eigenvalue has multiplicity 1.
An alternative way to obtain the result is based on the spectral decomposition of the symmetric positive definite matrices
Let , that is, . Maximizing is equivalent to maximizing . The constraint is equivalent to the constraint . Denoting by the components of , the function is maximized at under the constraint, which shows that is the first column of , up to the sign. We also see that other solutions exist when . It is now clear that the vector contains the principal components of the variance matrix .
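The characterisation of the optimum can be checked numerically; in this R sketch, S is an arbitrary symmetric positive semi-definite matrix standing for the variance matrix:
# the unit vector maximizing u'Su is an eigenvector of the largest eigenvalue
set.seed(123)
S <- crossprod(matrix(rnorm(16), 4, 4))
e <- eigen(S)
u <- e$vectors[, 1]               # normalized first eigenvector
c(t(u) %*% S %*% u, e$values[1])  # the quadratic form attains lambda_max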
element by element. This shows that is diagonal, and the conclusion easily follows.
The proof is completed by induction on .
On the right‐hand side of the equality, the term in parentheses is nonnegative and the last term is positive, unless . But in this case and the term in parentheses becomes .
By Itô's formula, the SDE satisfied by is
Using the hint, we then have
It follows that
The positivity follows.
this inequality being uniform on any ball of radius . The assumptions of Theorem 11.1 are thus satisfied.
It is then easy to check that the limits in (11.23) and (11.25) are null. The limiting diffusion is thus
The solution of the equation is, using Exercise 11.1 with ,
where is the initial value. It is assumed that and , in order to guarantee the positivity. We have
The result is obtained by multiplying this equality by and taking the expectation with respect to the probability .
with, in particular, . In view of Exercise 11.5, we thus have
The maximum likelihood estimator of is then
It is easy to verify that . It follows that
The option buyer wishes to be covered against the risk: he thus agrees to pay more if the asset is more risky.
The property immediately follows.
We are in the framework of model (11.44), with . Thus the risk‐neutral model is given by (11.47), with and
It can easily be seen that if we have, for and for all ,
Writing
we thus obtain
and writing
we have
It follows that
Thus
There are infinitely many possible choices for and . For instance, if , one can take and with . Then follows. The risk‐neutral probability is obtained by calculating
Under the risk‐neutral probability, we thus have the model
Note that the volatilities of the two models (under historical and risk‐neutral probability) do not coincide unless for all .
Thus, introducing the notation ,
The conditional law of is thus the distribution, and (11.58) follows.
At horizon 2, the conditional distribution of is not Gaussian if , because its kurtosis coefficient is equal to
There is no explicit formula for when .
Using (11.63), the desired equality follows.
Note that
because the two bracketed terms have the same sign. It follows that
The property is thus shown.
It follows that
We have , the inequality being strict because the distribution of is nondegenerate. In view of Theorem 2.1, this implies that a.s., and thus that a.s., when tends to infinity.
is not normal. Indeed, and , but . Similarly, the variable
is centered with variance 2, but is not normally distributed because
Note that the distribution is much more leptokurtic when is close to 0.
We have, using the independence assumptions on the sequences (η t ) and ( ),
provided that the expectation of the term in braces exists and is finite. To show this, write
We have
Thus EZ t, ∞ < ∞. The same arguments show that
Moreover, for all k > 0,
using again the independence between (η t ) and ( ).
where the last equality holds because η t and (h t ) are independent, and because the law of η t is symmetric. The same arguments show that for x ≥ 0. Thus, there is a one‐to‐one relation between the law of ε t and that of . In addition the law of ε t is symmetric. Similarly, for any n ≥ 1, one can show that there is a one‐to‐one relation between the law of (ε t , …, ε t + n ) and that of . When the distribution of η t is not symmetric, the fourth equality of the previous computation fails.
From the independence between (Y t ) and (Z t ), (X t ) is a second‐order process whose mean and autocovariance function are obtained as follows:
Since γ X (k) = βγ X (k − 1), ∀ k > 1, the process (X t ) admits an ARMA(1,1) representation of the form (12.7). The constant α is deduced from the first two autocovariances of (X t ). Denoting by the variance of the noise in this representation, we have, by Eq. (12.7),
Hence, if , the coefficient α is a solution of
and the solution of modulus less than 1 is given by
Moreover, the variance of the noise in model (12.7) is if β ≠ 0 (and if β = 0). Finally, if the relation γ X (k) = βγ X (k − 1) also holds for k = 1 and (X t ) is an AR(1) (i.e. α = 0 in model (12.7)).
Now, when β ≠ 0 and σ ≠ 0, we get , using (E.16). It follows that either 0 < α < β < 1/α or 0 > α > β > 1/α . In particular ∣α ∣ < ∣ β∣, which shows that the orders of the ARMA(1,1) representation for X t are exact.
We also have,
Thus
for ρ ≠ 0 and ∣β ∣ < 1.
Denote by θ (1) = (0.098, 0.087, 0.84)′ and θ (2) = (0.012, 0.075, 0.919)′ the parameters of the two models. The estimated values of ω and β seem quite different. Denote by and the estimated standard deviations of the estimators of ω and β of Model Mi . It turns out that the confidence intervals
and
have empty intersection. The same holds true for the confidence intervals
and
The third graph of Figure E.4 displays the boxplot of the distribution of over 100 independent simulations of Model M1. The difference θ (2) − θ (1) between the parameters of M1 and M2 is marked by a diamond shape. This difference is an outlier for the distribution of , meaning that the GARCH models estimated on the two periods are significantly distinct.
The number of balls in the urn alternates between odd and even along the steps. For instance . Thus the chain is irreducible but periodic.
Using the formula , it can be seen that is an invariant law. It follows that for all .
When the initial distribution is the Dirac mass at 0 we have when is odd, and when is even. Thus does not exist.
# one iteration of the EM algorithm
EM <- function(omega,pi0,p,y){
d<-length(omega)
n <- length(y) # y contains the n observations
vrais<-0
pit.t<-matrix(0,nrow=d,ncol=n)
pit.tm1<-matrix(0,nrow=d,ncol=n+1)
vecphi<-rep(0,d)
pit.tm1[,1]<-pi0
for (t in 1:n) {
for (j in 1:d) vecphi[j]<-{dnorm(y[t],
mean=0,sd=sqrt(abs(omega[j])))}
den<-sum(pit.tm1[,t]*vecphi)
if(den<=0)return(Inf)
pit.t[,t]<-(pit.tm1[,t]*vecphi)/den
pit.tm1[,t+1]<-t(p)%*%pit.t[,t]
vrais<-vrais+log(den)
}
pit.n<-matrix(0,nrow=d,ncol=n)
pit.n[,n]<-pit.t[,n]
for (t in n:2) {
for (i in 1:d) {
pit.n[i,t-1]<- {pit.t[i,t-1]*sum(p[i,1:d]*
pit.n[1:d,t]/pit.tm1[1:d,t])}
} }
pitm1et.n<-array(0,dim=c(d,d,n))
for (t in 2:n) {
for (i in 1:d) {
for (j in 1:d) {
pitm1et.n[i,j,t]<-p[i,j]*pit.t[i,t-1]*pit.n[j,t]/pit.tm1[j,t]
} } }
omega.final<-omega
pi0.final<-pi0
p.final<-p
for (i in 1:d) {
omega.final[i]<-sum((y[1:n]^2)*pit.n[i,1:n])/sum(pit.n[i,1:n])
pi0.final[i]<-pit.n[i,1]
for (j in 1:d) {
p.final[i,j]<-sum(pitm1et.n[i,j,2:n])/sum(pit.n[i,1:(n-1)])
} }
liss<-{list(probaliss=pit.n,probatransliss=pitm1et.n,
vrais=vrais,omega.final=omega.final,pi0.final=pi0.final,
p.final=p.final)}
liss # return the smoothed probabilities, the log-likelihood and the updated parameters
}
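A hypothetical usage sketch (the data-generating mechanism below is chosen only for illustration): simulate a two-regime Gaussian hidden Markov chain and run one EM iteration.
# one EM iteration on simulated data with d = 2 regimes
set.seed(123)
n <- 200
p0 <- matrix(c(0.9, 0.1, 0.1, 0.9), 2, 2, byrow = TRUE) # transition matrix
reg <- numeric(n); reg[1] <- 1
for (t in 2:n) reg[t] <- sample(1:2, 1, prob = p0[reg[t - 1], ])
y <- rnorm(n, sd = sqrt(c(0.5, 4)[reg])) # regime-dependent variances
res <- EM(omega = c(1, 2), pi0 = c(0.5, 0.5), p = p0, y = y)
res$omega.final # updated regime variances
res$p.final     # updated transition probabilities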
The proof of Theorem 2.4 in Chapter 2 applies directly with this sequence (A t ), showing that there exists a strictly stationary solution if and only if the top Lyapunov exponent of (A t ) is strictly negative. The solution is then unique, non‐anticipative, and ergodic, and takes the form (2.18).
In the ARCH(1) case with d regimes, we obtain the necessary and sufficient condition
For the existence of a positive solution to this equation, it is necessary to have
Conversely, under this condition, the process
is a strictly stationary and non‐anticipative solution which satisfies
and the result follows.
The EM algorithm cannot be generalised trivially because the maximisation of Eq. (12.32) is replaced by that of
which does not admit an explicit form like (12.35) but requires the use of an optimisation algorithm.
we have
under conditions entailing the existence of the series. For the alternative model, ε t = σ t (Δ t )η t with
we have
Let ℱ t be the sigma‐field generated by the past observations {ε u , u < t}, and by the past and present value of the chain {Δ u , u ≤ t}. We have
but, given the past observations, only depends on Δ t , whereas h t depends also on {Δ u , u < t}.
This entails differences between the two models in terms of probabilistic properties (the stationarity conditions are easier to obtain for the standard MS‐GARCH model, but they have also been obtained by Liu (2006) for the alternative model), of statistical inference (the fact that only depends on Δ t renders the alternative model much easier to estimate), and also in terms of dynamic behaviour and of the interpretation of the parameters.
For instance, for the MS‐GARCH, β(i) can be interpreted as a parameter of inertia of the volatility in regime i : if the volatility h t − 1 is high and β(i) is close to 1, the next volatility h t will remain high in regime i . This interpretation is no longer valid for the alternative model, since may not be equal to .