Note that, by Jensen's inequality, this correlation is positive.
which gives , and the result follows.
Let U be a random variable, uniformly distributed on {0, 1}. We define the process (Y t ) t = 0, 1, … by
Y_t(ω) = (−1)^{t + U(ω)}
for any ω ∈ Ω and any t ≥ 0. The process (Y t ) is stationary. We have in particular EY t = 0 and Cov(Y t , Y t + h ) = (−1)^h . With probability 1/2, the realisation of the stationary process (Y t ) will be the sequence {(−1)^t} (and with probability 1/2, it will be {(−1)^{t + 1}}).
This example leads us to think that it is virtually impossible to determine whether a process is stationary or not, from the observation of only one trajectory, even of infinite length. However, practitioners do not consider {(−1) t } as a potential realisation of the stationary process (Y t ). It is more natural, and simpler, to suppose that {(−1) t } is generated by the non‐stationary process (X t ).
Let Ω* = {ω ∣ X 2t = 1, X 2t + 1 = 0, ∀ t}. If (X t ) is ergodic and stationary, the empirical means n^{−1} ∑_{t=1}^{n} X_{2t} and n^{−1} ∑_{t=1}^{n} X_{2t+1} both converge to the same limit P[X t = 1] with probability 1, by the ergodic theorem. For all ω ∈ Ω* these means are, respectively, equal to 1 and 0. Thus P(Ω*) = 0. The probability of such a trajectory is thus equal to zero for any ergodic and stationary process.
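This can be checked by a small simulation. The following R sketch (using the definition Y_t = (−1)^{t+U} from above; the sample size is arbitrary) draws one trajectory and compares the empirical means over even and odd indices:
# one trajectory of Y_t = (-1)^(t+U), with U uniform on {0, 1}
set.seed(123)
t <- 0:999
U <- sample(0:1, 1)
Y <- (-1)^(t + U)
mean(Y[t %% 2 == 0])  # mean over even indices: +1 or -1
mean(Y[t %% 2 == 1])  # mean over odd indices: the opposite sign
mean(Y)               # overall mean close to 0
Each simulated trajectory is either {(−1)^t} or {(−1)^{t+1}}, and the means over even and odd indices never agree, in line with the argument above.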
This value can be arbitrarily larger than 1, which is the value of the asymptotic variance of the empirical autocorrelations of a strong white noise.
where ∣b∣ < 1 and (u t ) is a white noise of variance σ 2 . The coefficients b and σ 2 are determined by
which gives and σ 2 = 2/b .
Since , for k ≠ h the asymptotic variance can be arbitrarily smaller than 1, which corresponds to the asymptotic variance of the empirical autocorrelations of a strong white noise.
exists in ℝ ∪ {+∞}. Using Beppo Levi's theorem,
which shows that the limit is finite almost surely. Thus, as n → ∞, u t (n) converges, both almost surely and in quadratic mean, to . Since
we obtain, taking the limit as n → ∞ of both sides of the equality, u t = au t − 1 + η t . This shows that (X t ) = (u t ) is a stationary solution of the AR(1) equation.
Finally, assume the existence of two stationary solutions to the equation X t = aX t − 1 + η t and u t = au t − 1 + η t . If , then
which entails
This is in contradiction to the assumption that the two sequences are stationary, which shows the uniqueness of the stationary solution.
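As a numerical illustration of the existence part, the following R sketch (hypothetical value a = 0.5, Gaussian noise) simulates the stationary solution recursively and compares its variance with the theoretical value 1/(1 − a^2):
# stationary AR(1) solution u_t = a u_{t-1} + eta_t, after a burn-in
set.seed(123)
a <- 0.5; n <- 1000; burn <- 500
eta <- rnorm(n + burn)
u <- filter(eta, a, method = "recursive")[-(1:burn)]
c(empirical = var(u), theoretical = 1/(1 - a^2))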
as k → ∞. If (X t ) were stationary,
and we would have
This is impossible, because by the Cauchy–Schwarz inequality,
We thus have Eε t = 0 and, for all h > 0,
which confirms that ε t is a white noise.
With the change of index h = i − ℓ, we obtain
which gives (B.14), using the parity of the autocovariance functions.
for (i, j) ≠ (0, 0) and
Thus
In formula (B.15), the (i, j)th term equals 0 when i ≠ j and the (i, i)th term equals 1. We also have when i ≠ j and for all i ≠ 0. Since , we obtain
For significance intervals C h of asymptotic level 1 − α , such that , we have
By definition of C h ,
Moreover,
We have used the convergence in law of to a vector of independent variables. When the observed process is not a noise, this asymptotic independence does not hold in general.
For m = 20 and α = 5%, this limit is equal to (1 − 0.05)^{20} ≈ 0.36. The probability of not rejecting the right model is thus low.
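The numerical value is obtained immediately in R:
# probability that none of m = 20 independent tests of level 5% rejects
alpha <- 0.05; m <- 20
(1 - alpha)^m  # approximately 0.358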
Then, step (B.9) yields
Finally, step (B.8) yields
> # reading the SP500 data set
> sp500data <- read.table("sp500.csv",header=TRUE,sep=",")
> sp500<-rev(sp500data$Close) # closing price
> n<-length(sp500)
> rend<-log(sp500[2:n]/sp500[1:(n-1)]); rend2<-rend^2
> op <- par(mfrow = c(2, 2)) # 2 × 2 figures per page
> plot(ts(sp500),main="SP 500 from 1/3/50 to 7/24/09",
+ ylab="SP500 Prices",xlab="")
> plot(ts(rend),main="SP500 Returns",ylab="SP500 Returns",
+ xlab="")
> acf(rend, main="Autocorrelations of the returns",xlab="",
+ ylim=c(-0.05,0.2))
> acf(rend2, main="ACF of the squared returns",xlab="",
+ ylim=c(-0.05,0.2))
> par(op)
For the multiplicative norm ‖A‖ = ∑ ∣ a ij ∣, we have , and the result follows immediately.
When A is any square matrix, the Jordan representation can be used. Let n i be the multiplicity of the eigenvalue λ i . We have the Jordan canonical form A = P −1 JP , where P is invertible, and J is the block‐diagonal matrix with a diagonal of m matrices J i (λ i ), of size n i × n i , with λ i on the diagonal, 1 on the superdiagonal, and 0 elsewhere. It follows that A t = P −1 J t P , where J t is the block‐diagonal matrix whose blocks are the matrices . We have , where N i is such that . It can be assumed that ∣λ 1 ∣ > ∣ λ 2 ∣ > ⋯ > ∣ λ m ∣ . It follows that
as t → ∞, and the proof easily follows.
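The role of the spectral radius in the growth of A^t can be checked numerically, for instance with the following R sketch (the 2 × 2 matrix is arbitrary):
# ||A^t||^(1/t) approaches the spectral radius rho(A) as t grows
A <- matrix(c(0.5, 0.3, 0.2, 0.4), 2, 2)
rho <- max(abs(eigen(A)$values))  # spectral radius
At <- diag(2)
for (t in 1:50) At <- At %*% A
c(sum(abs(At))^(1/50), rho)       # the two values are close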
and thus
Using Eq. (2.21) and the ergodic theorem, we obtain
Consequently, γ < 0 if and only if ρ(A) < exp(−E log ∣ z t ∣).
which completes the proof of 1.
We have shown that, for any , the stationary sequences and have the same top Lyapunov exponent, i.e.
The convergence follows by showing that .
To show that the norm N 1 is not multiplicative, consider the matrix A whose elements are all equal to 1: we then have N 1 (A) = 1 but N 1 (A^2) > 1.
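A direct numerical check, assuming that N 1 denotes the maximum of the absolute values of the entries (the definition is elided above; this assumption is consistent with N 1 (A) = 1 for the matrix of ones):
# assumed definition: N1(A) = max |a_ij|
N1 <- function(A) max(abs(A))
A <- matrix(1, 2, 2)
c(N1(A), N1(A %*% A))  # 1 and 2: N1(A^2) > N1(A)^2, so N1 is not multiplicative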
and
(see Theorem 2.5 and Remark 2.6(1)). The strictly stationary solution satisfies
in ℝ ∪ {+∞}. Moreover,
which gives
Using this relation in the previous expression for , we obtain
If , then the term in brackets on the left‐hand side of the equality must be strictly positive, which gives the condition for the existence of the fourth‐order moment. Note that the condition is not symmetric in α 1 and α 2 . In Figure E.2, the points (α 1, α 2) under the curve correspond to ARCH(2) models with a fourth‐order moment. For these models,
where is a (weak) white noise. The autocorrelation of thus satisfies
Using the MA(∞) representation
we obtain
and
It follows that the lag 1 autocorrelation is
The other autocorrelations are obtained from (E.1) and . To determine the autocovariances, all that remains is to compute
which is given by
We have
The eigenvalues of A (2) are 0, 0, 0 and 3α^2 + 2αβ + β^2 , thus I 4 − A (2) is invertible (0 is an eigenvalue of I 4 − A (2) if and only if 1 is an eigenvalue of A (2) ), and the system (2.63) admits a unique solution. We have
The solution to Eq. (2.63) is
As first component of this vector, we recognise , and the other three components are equal to . Equation (2.64) yields
which gives , but with tedious computations, compared to the direct method utilised in Exercise 2.8.
and the result follows.
The convergence follows from the Borel–Cantelli lemma.
Now, let (X n ) be an iid sequence of random variables with density f(x) = x^{−2} 𝟙_{x ≥ 1}(x). For all K > 0, we have
The events {n −1 X n > K} being independent, we can use the counterpart of the Borel–Cantelli lemma: the event {n −1 X n > K for an infinite number of n} has probability 1. Thus, with probability 1, the sequence (n −1 X n ) does not tend to 0.
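This can be visualised by simulation; the following R sketch draws from the density f(x) = x^{−2} 𝟙_{x ≥ 1}(x) by inverse transform (X = 1/V with V uniform on (0, 1)):
# X_n iid with P(X > x) = 1/x for x >= 1
set.seed(123)
n <- 1:10000
X <- 1 / runif(length(n))   # inverse transform sampling
plot(n, X / n, type = "l")  # spikes keep recurring: X_n/n does not tend to 0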
the first row of EB t + 1 B t …B 1 is thus the product of the first row of EB t + 1 and of EB t …B 1 . The conclusion follows.
The first inequality uses (a + b) s ≤ a s + b s for a, b ≥ 0 and s ∈ (0, 1]. The second inequality is a consequence of . The second convergence then follows from the dominated convergence theorem.
for any i ′ = 1, …, ℓ, j ′ = 1, …, m . In view of the independence between X n and Y , it follows that almost surely as n → ∞. Since is a strictly positive number, we obtain almost surely, for all i ′, j ′ . Using (a + b) s ≤ a s + b s once again, it follows that
where is independent of A k A k − 1…A N + 1. The general term a i, j of A N …A 1 is the (i, j)th term of the matrix A N multiplied by a product of variables. The assumption A N > 0 entails a i, j > 0 almost surely for all i and j . It follows that the i th component of Y satisfies Y i > 0 almost surely for all i . Thus . Now the previous question allows us to affirm that E(‖A k A k − 1…A N + 1‖ s ) → 0 and, by strict stationarity, that E(‖A k − N A k − N − 1…A 1‖ s ) → 0 as k → ∞. It follows that there exists k 0 such that
The last inequalities imply β 1 ≥ 0. Finally, the positivity constraints are
If q = 2, these constraints reduce to
Thus, we can have α 2 < 0.
If the last equality is true, it remains true when h is replaced by h + 1 because . Since , it follows that for all h ≥ 0. Moreover,
Since , if then we have, for all h ≥ 1, We have thus shown that the sequence is decreasing when . If , it can be seen that for h large enough, say h ≥ h 0 , we have , again because of . Thus, the sequence is decreasing.
Since in probability, there exist K 0 ∈ ℝ and n 0 ∈ ℕ such that P(X n < K 0/2) ≤ ς < 1 for all n ≥ n 0 . Consequently,
as n → ∞, for all K ≤ K 0 , which entails the result.
as n → ∞. If γ < 0, the Cauchy rule entails that
converges almost surely, and the process (ε t ), defined by , is a strictly stationary solution of model (2.7). As in the proof of Theorem 2.1, it can be shown that this solution is unique, non‐anticipative and ergodic. The converse is proved by contradiction, assuming that there exists a strictly stationary solution . For all n > 0, we have
It follows that a(η −1)…a(η −n )ω(η −n − 1) converges to zero, almost surely, as n → ∞ , or, equivalently, that
We first assume that E log {a(η t )} > 0. Then the strong law of large numbers entails almost surely. For (E.2) to hold true, it is then necessary that log ω(η −n − 1) → − ∞ almost surely, which is precluded since (η t ) is iid and ω(η 0) > 0 almost surely. Assume now that E log {a(η t )} = 0. By the Chung–Fuchs theorem, we have with probability 1 and, using Exercise 2.17, the convergence (E.2) entails log ω(η −n − 1) → − ∞ in probability, which, as in the previous case, entails a contradiction.
Regardless of the value of , fixed or even random, we have almost surely
using the law of large numbers and Jensen's inequality. It follows that almost surely as t → ∞.
which is equivalent to showing that
with X = |η 0|^{2q} , . This inequality holds true by Hölder's inequality. The same argument is used to show the convexity of . It follows that f is convex, as a sum of convex functions. We have f(1) = 0 and f(p) < 0, thus the left derivative of f at 1 is negative, which gives the result.
since the condition for the existence of is . Note that when the GARCH effect is weak (that is, α 1 is small), the part of the variance that is explained by this regression is small, which is not surprising. In all cases, the ratio of the variances is bounded by 1/κ η , which is well below 1 for most distributions (1/3 for the Gaussian distribution). Thus, it is not surprising to observe disappointing R 2 values when estimating such a regression on real series.
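A small simulation makes the bound concrete. The following R sketch (hypothetical ARCH(1) parameters, Gaussian innovations) regresses ε_t^2 on the conditional variance and reports the R^2:
# R^2 of the regression of eps_t^2 on h_t is bounded by 1/kappa_eta = 1/3
set.seed(123)
n <- 10000; omega <- 1; alpha1 <- 0.3
eps <- h <- numeric(n); h[1] <- omega; eps[1] <- sqrt(h[1]) * rnorm(1)
for (t in 2:n) {
  h[t] <- omega + alpha1 * eps[t - 1]^2  # ARCH(1) conditional variance
  eps[t] <- sqrt(h[t]) * rnorm(1)
}
summary(lm(eps^2 ~ h))$r.squared         # well below 1/3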
Moreover, λ is a maximal measure of irreducibility.
Thus μ is an invariant probability measure.
Conversely, suppose that μ is invariant. Using the Chapman–Kolmogorov relation, by which ∀t ∈ ℕ, ∀ s, 0 ≤ s ≤ t, ∀ x ∈ E, ∀ B ∈ ℰ,
we obtain
Thus, by induction, for all t , ℙ[X t ∈ B] = μ(B) (∀B ∈ ℰ ). Using the Markov property, this is equivalent to the strict stationarity of the chain: the distribution of the process (X t , X t + 1, …, X t + k ) is independent of t , for any integer k .
Thus π is invariant. The third equality is an immediate consequence of the Fubini and Lebesgue theorems.
Now let B ∈ ℰ . Then for all x ∈ C ,
The measure ν is non‐trivial since ν(E) = δλ(C) = 2δc > 0.
Thus if K 1 < 1, we have, for K 1 < K < 1 and for g(x) > (K 2 + 1 − K 1)/(K − K 1),
If we put A = {x; g(x) = 1 + ∣ x ∣ ≤ (K 2 + 1 − K 1)/(K − K 1)}, the set A is compact and the conditions of Theorem 3.1 are satisfied, with 1 − δ = K .
It follows that
because V ≥ 1. Thus, there exists κ > 0 such that
Note that the positivity of δ is crucial for the conclusion.
The inequality is justified by (i) and the fact that P f is a continuous positive function. It follows that for f = C , where C is a compact set, we obtain
which shows that,
(that is, π is subinvariant) using (ii). If there existed B such that the previous inequality were strict, we would have
and since π(E) < ∞ we arrive at a contradiction. Thus
which signifies that π is invariant.
where
Inequality (A.8) shows that d 7 is bounded by
By an argument used to deal with d 6 , we obtain
and the conclusion follows.
is continuous at x when g is continuous.
To show that the irreducibility condition (ii) is not satisfied, consider the set of numbers in [0,1] such that the sequence of decimals is periodic after a certain lag:
For all h ≥ 0, if and only if . We thus have,
and,
This shows that there is no non‐trivial irreducibility measure.
The drift condition (iii) is satisfied with, for instance, a measure φ such that φ([−1, 1]) > 0, the energy V(x) = 1 + ∣ x∣ and the compact set A = [−1, 1]. Indeed,
provided
Using the independence between and the other variables of , we have, for all ,
when the distribution is symmetric.
and, by continuity of the exponential,
is finite if and only if the series of general term converges. Using the inequalities , we obtain
Since the tend to 0 at an exponential rate and , the series of general term converges absolutely, and we finally obtain
which is finite under condition (4.12).
with probability 1. The integral of a positive measurable function being always defined in [0, +∞], using Beppo Levi's theorem and then the independence of the , we obtain
which is of course finite under condition (4.12). Applying the dominated convergence theorem, and bounding the variables by the integrable variable , we then obtain the desired expression for .
and . With the notation , it follows that
It then suffices to use the fact that is equivalent to , and that is thus equivalent to , in a neighborhood of 0.
where is a white noise with variance . Using and
the coefficients and are such that
and
When, for instance, , , and , we obtain
If the volatility is a positive function of that possesses a moment of order 2, then
under conditions (4.34). Thus, condition (4.38) is necessarily satisfied. Conversely, under (4.38) the strict stationarity condition is satisfied because
and, as in the proof of Theorem 2.2, it is shown that the strictly stationary solution possesses a moment of order 2.
and
Using , we obtain
We then obtain the autocovariances
and the autocorrelations . Note that for all , which shows that is a weak ARMA process. In the standard GARCH case, the calculation of these autocorrelations would be much more complicated because is not a linear function of .
and this solution possesses a moment of order 2 when
which is the case, in particular, for . In the Gaussian case, we have
and
using the calculations of Exercise 4.5 and
Since is an increasing function, provided , we observe the leverage effect .
and the conclusion follows from the Borel–Cantelli lemma.
and
as h → 0. The conclusion follows.
Using the indication, one can check that and
We thus conclude by noting that
The sequence (ε t ε t + h , ℱ t + h ) t is thus a stationary sequence of square integrable martingale increments. We thus have
where . To conclude, it suffices to note that
in probability (and even in L 2 ).
Its fourth‐order moment is
Thus,
Moreover,
Using Exercise 5.1, we thus obtain
By the ergodic theorem, the denominator converges in probability (and even a.s.) to γ ε(0) = ω/(1 − α) ≠ 0. In view of Exercise 5.2, the numerator converges in law to . Cramér's theorem then entails
The asymptotic variance is equal to 1 when α = 0 (that is, when ε t is a strong white noise). Figure E.3 shows that the asymptotic distribution of the empirical autocorrelations of a GARCH can be very different from those of a strong white noise.
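The phenomenon can be reproduced by simulation; a minimal R sketch (hypothetical ARCH(1) parameters) is:
# empirical autocorrelations of an ARCH(1): the +-1.96/sqrt(n) bands shown
# by acf(), valid for a strong white noise, are too narrow here
set.seed(123)
n <- 5000; omega <- 1; alpha <- 0.5
eps <- numeric(n)
for (t in 2:n) eps[t] <- sqrt(omega + alpha * eps[t - 1]^2) * rnorm(1)
acf(eps, lag.max = 20)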
In view of Exercises 5.1 and 5.3,
for any h ≠ 0.
Similarly, Eε t ε t + 1ε s ε s + 2 = 0 when t + 1 > s + 2. When t + 1 = s + 2, we have
because ε t − 1 σ t ∈ ℱ t − 1 , , E(η t ∣ ℱ t − 1) = Eη t = 0 and . Using (7.24), the result can be extended to show that Eε t ε t + h ε s ε s + k = 0 when k ≠ h and (ε t ) follows a GARCH(p, q), with a symmetric distribution for η t .
with ω = 1, α = 0.3 and β = 0.55. Thus γ ε(0) = 6.667, , . Thus
for i = 1, …, 5. Finally, using Theorem 5.1,
Since , γ ε(0) = ω/(1 − α) and
(see, for instance, Exercise 2.8), we have
Note that as i → ∞.
where ε t (θ) = Y t − F θ (W t ). We thus have
and, when σ 2 does not depend on θ ,
up to a constant. The constrained estimator is , with . The constrained score and the Lagrange multiplier are related by
On the other hand, the exact laws of the estimators under H 0 are given by
and
with
For the case , we can estimate I 22 by
The test statistic is then equal to
with
and where R 2 is the coefficient of determination (centred if X 1 contains a constant column) in the regression of on the columns of X 2 . For the first equality of (E.3), we use the fact that in a regression model of the form Y = Xβ + U , with obvious notation, Pythagoras's theorem yields
In the general case, we have
Since the residuals of the regression of Y on the columns of X 1 and X 2 are also the residuals of the regression of on the columns of X 1 and X 2 , we obtain LM n by:
Since T is invertible, we have , where Col(Z) denotes the vector subspace spanned by the columns of the matrix Z , and
If e ∈ Col(X) then and
Noting that , we conclude that
for h = 0, …, q . We then put
and then, for k = 2, …, q (when q > 1),
With standard notation, the OLS estimators are then
and
with equality if and only if , and we are done.
for all t , and consequently , which is not possible.
Using the data, we obtain , and thus . Therefore, the constrained estimate must coincide with one of the following three constrained estimates: that constrained by α 2 = 0, that constrained by α 1 = 0, or that constrained by α 1 = α 2 = 0. The estimate constrained by α 2 = 0 is , and is thus not suitable. The estimate constrained by α 1 = 0 yields the desired estimate .
and this estimator satisfies
Under the assumptions of the exercise, the ergodic theorem entails the almost sure convergence
and thus the almost sure convergence of to φ 0 . For the consistency, the assumption suffices.
If , the sequence (ε t X t − 1, ℱ t ) is a stationary and ergodic square integrable martingale difference, with variance
We can see that this expectation exists by expanding the product
The CLT of Corollary A.1 then implies that
and thus
When , the condition suffices for asymptotic normality.
We then have
Similarly,
It follows that C = A −1 BA −1 is of the form
Assume that there exist two solutions of the minimisation problem in C , and . Using the convexity of C , it is then easy to see that satisfies
This is possible only if (once again using the parallelogram identity).
and, dividing by λ ,
Taking the limit as λ tends to 0, we obtain inequality (6.17).
Let z be such that, for all y ∈ C , 〈z − x, z − y〉 ≤ 0. We have
the last inequality being simply the Cauchy–Schwarz inequality. It follows that ‖x − z‖ ≤ ‖x − y‖, ∀ y ∈ C . Since this property characterises x * in view of part 1, it follows that z = x * .
Since and , it follows that , and thus that X n converges to the zero vector of ℝ k .
using Markov's inequality, strict stationarity and the existence of a moment of order s > 0 for .
When κ → ∞, the variable X 1 ∧ κ increases to X 1 . Thus, by Beppo Levi's theorem, E(X 1 ∧ κ) converges to E(X 1) = + ∞. It follows that tends almost surely to infinity.
Note that and (7.29) entail that . Since the limit superior (E.5) is smaller than any positive number, it is null.
For all c > 0, there exists such that for all t ≥ 0. Note that if and only if c ≠ 1. For instance, for a GARCH(1, 1) model, if we have . Let . The minimum of f is obtained at the unique point
If , we have . It follows that c 0 = 1 with probability 1, which proves the result.
In view of (7.41), (7.42), (7.79) and (7.24), we have
and
It follows that
and ℐ is block‐diagonal. It is easy to see that 𝒥 has the form given in the theorem. The expressions for J 1 and J 2 follow directly from (7.39) and (7.75). The block‐diagonal form follows from (7.76) and (E.6).
Letting ℐ = (ℐ ij ), and , we then obtain
The asymptotic variance of the ARCH parameter estimator is thus equal to : it does not depend on a 0 and is the same as that of the QMLE of a pure ARCH(1) (using computations similar to those used to obtain (7.1.2)).
We note that the estimation of too complicated a model (since the true process is AR(1) without ARCH effect) does not entail any asymptotic loss of accuracy for the estimation of the parameter a 0 : the asymptotic variance of the estimator is the same, , as if the AR(1) model were directly estimated. This calculation also allows us to verify the ‘ α 0 = 0’ column in Table 7.3: for the 𝒩(0, 1) law we have μ 3 = 0 and κ η = 3; for the normalized χ 2(1) distribution we find and κ η = 15.
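The two kurtosis values quoted for Table 7.3 can be checked by a quick Monte Carlo experiment in R:
# kappa_eta = 3 for the Gaussian law, 15 for the normalized chi-squared(1)
set.seed(123)
mean(rnorm(1e6)^4)                       # close to 3
z <- (rchisq(1e6, df = 1) - 1) / sqrt(2) # centred, unit-variance chi2(1)
mean(z^4)                                # close to 15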
It follows that
and, since ε can be chosen arbitrarily small, we have the desired result.
In order to give an example where (7.95) is not satisfied, let us consider the autoregressive model X t = θ 0 X t − 1 + η t where θ 0 = 1 and (η t ) is an iid sequence with mean 0 and variance 1. Let J t (θ) = X t − θX t − 1 . Then J t (θ 0) = η t and the first convergence of the exercise holds true, with J = 0. Moreover, for all neighbourhoods of θ 0 ,
almost surely because the sum in brackets converges to +∞, X t being a random walk and the supremum being strictly positive. Thus (7.95) is not satisfied. Nevertheless, we have
Indeed, converges in law to a non‐degenerate random variable (see, for instance, Hamilton 1994, p. 406) whereas in probability since has a non‐degenerate limit distribution.
Therefore is positive semi‐definite. Thus
Setting x = Jy , we then have
which proves the result.
Since d t → 0 almost surely as t → ∞, the convergence in law of part 2 always holds true. Moreover,
with
which implies that the result obtained in Part 3 does not change. The same is true for Part 4 because
Finally, it is easy to see that the asymptotic behaviour of is the same as that of , regardless of the value that is fixed for ω .
In view of the inequality x ≥ 1 + log x for all x > 0, it follows that
For all M > 0, there exists an integer t M such that for all t > t M . This entails that
Since M is arbitrarily large,
provided that . If is chosen so that the constraint is satisfied, the inequalities
and (E.7) show that
We will define a criterion O n asymptotically equivalent to the criterion Q n . Since a.s. as t → ∞, we have for α ≠ 0,
where
On the other hand, we have
when α 0/α ≠ 1. We will now show that Q n (α) − O n (α) converges to zero uniformly in . We have
Thus for all M > 0 and any ε > 0, almost surely
provided n is large enough. In addition to the previous constraints, assume that . We have for any , and
for any α ≥ α 0 . We then have
Since M can be chosen arbitrarily large and ε arbitrarily small, we have almost surely
For the last step of the proof, let and be two constants such that . It can always be assumed that . With the notation , the solution of
is . This solution belongs to the interval when n is large enough. In this case
is one of the two extremities of the interval , and thus
This result, (E.9), the fact that min α Q n (α) ≤ Q n (α 0) = 0 and (E.8) show that
Since is an arbitrarily small interval that contains α 0 and , the conclusion follows.
Since at the optimum
the solution is such that . Since , we obtain , and then the solution is
Instead of the Lagrange multiplier method, a direct substitution method can also be used.
The constraints can be written as
where H is n × (n − p), of full column rank, and x * is (n − p) × 1 (the vector of the non‐zero components of x ). For instance: (i) if n = 3, x 2 = x 3 = 0 then x * = x 1 and ; (ii) if n = 3, x 3 = 0 then x * = (x 1, x 2)′ and .
If we denote by Col(H) the space generated by the columns of H , we thus have to find
where ‖.‖ J is the norm .
This norm defines the scalar product 〈z, y〉 J = z ′ Jy . The solution is thus the orthogonal (with respect to this scalar product) projection of x 0 on Col(H). The matrix of such a projection is
Indeed, we have P 2 = P , PHz = Hz , thus Col(H) is P ‐invariant, and 〈Hy, (I − P)z〉 J = y ′ H ′ J(I − P)z = y ′ H ′ Jz − y ′ H ′ JH(H ′ JH)−1 H ′ Jz = 0, thus z − Pz is orthogonal to Col(H).
It follows that the solution is
This last expression seems preferable to (E.10) because it only requires the inversion of the matrix H ′ JH of size n − p , whereas in (E.10) the inverse of J , which is of size n , is required.
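The properties of this projection can be verified numerically; in the following R sketch, J and H are arbitrary (J symmetric positive definite, H of full column rank):
# P = H (H'JH)^{-1} H'J is the J-orthogonal projection on Col(H)
set.seed(123)
J <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)  # positive definite
H <- matrix(rnorm(6), 3, 2)                       # full column rank
P <- H %*% solve(t(H) %*% J %*% H) %*% t(H) %*% J
max(abs(P %*% P - P))                   # ~ 0: P is idempotent
max(abs(P %*% H - H))                   # ~ 0: Col(H) is P-invariant
max(abs(t(H) %*% J %*% (diag(3) - P)))  # ~ 0: z - Pz is J-orthogonal to Col(H)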
and then
and, using (E.11),
which gives a constrained minimum at
In case (b), we have
and, using (E.11), a calculation, which is simpler than the previous one (we do not have to invert any matrix since H ′ JH is scalar), shows that the constrained minimum is at
The same results can be obtained with formula (E.10), but the computations are longer, in particular because we have to compute
It follows that the solution will be found among (a) λ = Z ,
The value of Q(λ) is 0 in case (a), in case (b), in case (c) and in case (d).
To find the solution of the constrained minimisation problem, it thus suffices to take the value λ which minimizes Q(λ) among the subset of the four vectors defined in (a)–(d) which satisfy the positivity constraints of the two last components.
We thus find the minimum at λ Λ = Z = (−2, 1, 2)′ in case (i), at
and
It follows that
The coefficient of the regression of Z 1 on Z 2 is −ω 0 . The components of the vector (Z 1 + ω 0 Z 2, Z 2) are thus uncorrelated and, this vector being Gaussian, they are independent. In particular , which gives . We thus have
Finally,
It can be seen that
is a positive semi‐definite matrix.
and the information matrix (written for simplicity in the ARCH(3) case) is equal to
This matrix is invertible (which is not the case for a general GARCH(p, q)). We finally obtain
and thus
In view of Theorem 8.1 and (8.15), the asymptotic distribution of is that of the vector λ Λ defined by
We have , thus
Since the components of the Gaussian vector (Z 1 + ω 0 Z 2, Z 2) are uncorrelated, they are independent, and it follows that
We then obtain
Let f(z 1, z 2) be the density of Z , that is, the density of a centred normal with variance (κ η − 1)J −1 . It is easy to show that the distribution of admits the density and to check that this density is asymmetric.
A simple calculation yields . From , we then obtain . And from we obtain . Finally, we obtain
The p ‐value of C * is . Since log[2{1 − Φ(x)}] ∼ −x^2/2 in the neighbourhood of +∞, the asymptotic slope of C * is also c *(θ) = θ^2 for θ > 0. Since the tests C and C * have the same asymptotic slope, they cannot be distinguished by the Bahadur approach.
We know that C is uniformly more powerful than C * . The local power of C is thus also greater than that of C * for all τ > 0. It is also true asymptotically as n → ∞, even if the sample is not Gaussian. Indeed, under the local alternatives , and for a regular statistical model, the statistic is asymptotically 𝒩(τ, 1) distributed. The local asymptotic power of C is thus γ(τ) = 1 − Φ(c − τ) with c = Φ−1(1 − α). The local asymptotic power of C * is γ *(τ) = 1 − Φ(c * − τ) + Φ(−c * − τ), with c * = Φ−1(1 − α/2). The difference between the two asymptotic powers is
and, denoting the 𝒩(0, 1) density by φ(x), we have
where
Since 0 < c < c * , we have
Thus, g(τ) is decreasing on [0, ∞). Note that g(0) > 0 and . The sign of g(τ), which is also the sign of D ′(τ), is positive when τ ∈ [0, a] and negative when τ ∈ [a, ∞), for some a > 0. The function D thus increases on [0, a] and decreases on [a, ∞). Since D(0) = 0 and , we have D(τ) > 0 for all τ > 0. This shows that, in Pitman's sense, the test C is, as expected, locally more powerful than C * in the Gaussian case, and locally asymptotically more powerful than C * in a much more general framework.
To justify the score test, we remark that the log‐likelihood constrained by H 0 is
which gives as constrained estimator of σ 2 . The derivative of the log‐likelihood satisfies
at . The first component of this score vector is asymptotically 𝒩(0, 1) distributed under H 0 . The third test is of course the likelihood ratio test, because the unconstrained log‐likelihood at the optimum is equal to whereas the maximal value of the constrained log‐likelihood is . Note also that under H 0 .
The asymptotic level of the three tests is of course α , but using the inequality for x > 0, we have
with almost surely strict inequalities in finite samples, and also asymptotically under H 1 . This leads us to think that the Wald test will reject more often under H 1 .
Since is invariant under translation of the X i , tends almost surely to σ 2 both under H 0 and under H 1 , as well as under the local alternatives . The behaviour of under H n (τ) is the same as that of under H 0 , and because
under H 0 , we have both under H 0 and under H n (τ). Similarly, it can be shown that under H 0 and under H n (τ). Using these two results and x/(1 + x)∼ log(1 + x) in the neighbourhood of 0, it can be seen that the statistics L n , R n and W n are equivalent under H n (τ). Therefore, the Pitman approach cannot distinguish the three tests.
Using for x in the neighbourhood of +∞, the asymptotic Bahadur slopes of the tests C 1 , C 2 and C 3 are, respectively
Clearly
Thus the ranking of the tests, in increasing order of relative efficiency in the Bahadur sense, is
All the foregoing remains valid for a regular non‐Gaussian model.
Note that Var(Z d )c corresponds to the last column of VarZ = (κ η − 1)J −1 . Thus c is the last column of J −1 divided by the (d, d)th element of this matrix. In view of Exercise 6.7, this element is . It follows that and . By (8.24), we thus have
This shows that the statistic 2/(κ η − 1)L n has the same asymptotic distribution as the Wald statistic W n , that is, the distribution in the case d 2 = 1.
The result then follows from (8.30).
Since and belong to the σ ‐field ℱ t − 1 generated by {ε u : u < t}, and since the distribution of ε t given ℱ t − 1 has the density , we have
and the result follows. We can also appeal to the general result that a score vector is centred.
Thus
when X∼풩(θ 0, σ 2), and
when X∼풩(θ, σ 2). Note that
as in Le Cam's third lemma.
and
Using the ergodic theorem, the fact that (1 − |η t | λ ) is centred and independent of the past, as well as elementary calculations of derivatives and integrals, we obtain
and
almost surely.
where the inequality is strict if σf(ησ)/f(η) is non‐constant. If this ratio of densities were almost surely constant, it would be almost surely equal to 1, and we would have
which is possible only when σ = 1.
Thus and . We then show that κ η ≔ ∫ x^4 f p (x)dx = (3 + p)(2 + p)/{p(p + 1)}. It follows that .
To compare the ML and Laplace QML, it is necessary to normalise in such a way that E ∣ η t ∣ = 1, that is, to take the double Γ(p, p) as density f . We then obtain 1 + xf ′(x)/f(x) = p − p ∣ x∣. We always have , and we have . It follows that , which was already known from Exercise 9.6. This allows us to construct a table similar to Table 9.5.
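For such a table, the kurtosis κ η = (3 + p)(2 + p)/{p(p + 1)} of the double Γ(p, p) density can be tabulated as follows:
# kurtosis of the double Gamma(p, p) density for a few values of p
p <- c(0.5, 1, 2, 5, 10)
kappa_eta <- (3 + p) * (2 + p) / (p * (p + 1))
round(cbind(p, kappa_eta), 3)  # equals 6 for the Laplace case p = 1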
Denoting by c any constant whose value can be ignored, we have
and thus
Now consider the second density,
We have
which gives
Consider the last instrumental density,
We have
and thus
In each case, does not depend on the parameter λ of h . We conclude that the estimators exhibit the same asymptotic behaviour, regardless of the parameter λ . It can even be easily shown that the estimators themselves do not depend on λ .
It follows that, using obvious notation,
where ϱ = ∫ ∣ x ∣ f(x)dx . Thus, using Exercise 9.9, we obtain
with .
that of the vectorial model is
that of the CCC model is
that of the BEKK model is
For and we obtain Table E.1.
and
Conversely, it is easy to check that (10.101) implies (10.100).
where has the same norm as . Assuming, for instance, that we have
Moreover, this maximum is reached at .
An alternative proof is obtained by noting that solves the maximization problem of the function under the constraint . Introduce the Lagrangian
The first‐order conditions yield the constraint and
This shows that the constrained optimum is located at a normalized eigenvector associated with an eigenvalue of , . Since , we of course have .
The first inequality of (10.69) is a simple application of the Cauchy–Schwarz inequality. The second inequality of (10.69) is obtained by twice applying the second inequality of (10.68).
We now give a second‐order stationarity condition. If exists, then this matrix is symmetric positive semi‐definite and satisfies
that is,
If is positive definite, it is then necessary to have
For the reverse we use Theorem 10.5. Since the matrices are of the form with , the condition is equivalent to (E.14). This condition is thus sufficient to obtain the stationarity, under technical condition (ii) of Theorem 10.5 (which can perhaps be relaxed). Let us also mention that, by analogy with the univariate case, it is certainly possible to obtain the strict stationarity under a condition weaker than (E.14).
when . To show the almost sure convergence, let us begin by noting that, using Hölder's inequality,
with and . Let , for , and which is defined in , a priori. Since
it follows that is almost surely defined in and is almost surely defined in . Since , we have almost surely.
and it suffices to take
The conditional covariance between the factors and , for , is
which is a nonzero constant in general.
Denoting by the th vector of the canonical basis of , we have
and we obtain the BEKK representation with ,
which shows that is an eigenvector associated with an eigenvalue of . Left‐multiplying the previous equation by , we obtain
which shows that must be the largest eigenvalue of . The vector is unique, up to its sign, provided that the largest eigenvalue has multiplicity 1.
An alternative way to obtain the result is based on the spectral decomposition of the symmetric positive definite matrices
Let , that is, . Maximizing is equivalent to maximizing . The constraint is equivalent to the constraint . Denoting by the components of , the function is maximized at under the constraint, which shows that is the first column of , up to the sign. We also see that other solutions exist when . It is now clear that the vector contains the principal components of the variance matrix .
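The characterisation of the optimum can be checked numerically; in this R sketch, S is an arbitrary symmetric positive semi-definite matrix standing for the variance matrix:
# the unit vector maximizing u'Su is an eigenvector of the largest eigenvalue
set.seed(123)
S <- crossprod(matrix(rnorm(16), 4, 4))
e <- eigen(S)
u <- e$vectors[, 1]               # normalized first eigenvector
c(t(u) %*% S %*% u, e$values[1])  # the quadratic form attains lambda_max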
element by element. This shows that is diagonal, and the conclusion easily follows.
The proof is completed by induction on .
On the right‐hand side of the equality, the term in parentheses is nonnegative and the last term is positive, unless . But in this case and the term in parentheses becomes .
By Itô's formula, the SDE satisfied by is
Using the hint, we then have
It follows that
The positivity follows.
this inequality being uniform on any ball of radius . The assumptions of Theorem 11.1 are thus satisfied.
It is then easy to check that the limits in (11.23) and (11.25) are null. The limiting diffusion is thus
The solution of the equation is, using Exercise 11.1 with ,
where is the initial value. It is assumed that and , in order to guarantee the positivity. We have
The result is obtained by multiplying this equality by and taking the expectation with respect to the probability .
with, in particular, . In view of Exercise 11.5, we thus have
The maximum likelihood estimator of is then
It is easy to verify that . It follows that
The option buyer wishes to be covered against the risk: he thus agrees to pay more if the asset is more risky.
The property immediately follows.
We are in the framework of model (11.44), with . Thus the risk‐neutral model is given by (11.47), with and
It can easily be seen that if we have, for and for all ,
Writing
we thus obtain
and writing
we have
It follows that
Thus
There are infinitely many possible choices for and . For instance, if , one can take and with . Then follows. The risk‐neutral probability is obtained by calculating
Under the risk‐neutral probability, we thus have the model
Note that the volatilities of the two models (under historical and risk‐neutral probability) do not coincide unless for all .
Thus, introducing the notation ,
The conditional law of is thus the distribution, and (11.58) follows.
At horizon 2, the conditional distribution of is not Gaussian if , because its kurtosis coefficient is equal to
There is no explicit formula for when .
Using (11.63), the desired equality follows.
Note that
because the two bracketed terms have the same sign. It follows that
The property is thus shown.
It follows that
We have , the inequality being strict because the distribution of is nondegenerate. In view of Theorem 2.1, this implies that a.s., and thus that a.s., when tends to infinity.
is not normal. Indeed, and , but . Similarly, the variable
is centered with variance 2, but is not normally distributed because
Note that the distribution is much more leptokurtic when is close to 0.
We have, using the independence assumptions on the sequences (η t ) and ( ),
provided that the expectation of the term in braces exists and is finite. To show this, write
We have
Thus EZ t, ∞ < ∞. The same arguments show that
Moreover, for all k > 0,
using again the independence between (η t ) and ( ).
where the last equality holds because η t and (h t ) are independent, and because the law of η t is symmetric. The same arguments show that for x ≥ 0. Thus, there is a one‐to‐one relation between the law of ε t and that of . In addition the law of ε t is symmetric. Similarly, for any n ≥ 1, one can show that there is a one‐to‐one relation between the law of (ε t , …, ε t + n ) and that of . When the distribution of η t is not symmetric, the fourth equality of the previous computation fails.
From the independence between (Y t ) and (Z t ), (X t ) is a second‐order process whose mean and autocovariance function are obtained as follows:
Since γ X (k) = βγ X (k − 1), ∀ k > 1, the process (X t ) admits an ARMA(1,1) representation of the form (12.7). The constant α is deduced from the first two autocovariances of (X t ). Denoting by the variance of the noise in this representation, we have, by Eq. (12.7),
Hence, if , the coefficient α is a solution of
and the solution of modulus less than 1 is given by
Moreover, the variance of the noise in model (12.7) is if β ≠ 0 (and if β = 0). Finally, if the relation γ X (k) = βγ X (k − 1) also holds for k = 1 and (X t ) is an AR(1) (i.e. α = 0 in model (12.7)).
Now, when β ≠ 0 and σ ≠ 0, we get , using (E.16). It follows that either 0 < α < β < 1/α or 0 > α > β > 1/α . In particular ∣α ∣ < ∣ β∣, which shows that the orders of the ARMA(1,1) representation for X t are exact.
We also have,
Thus
for ρ ≠ 0 and ∣β ∣ < 1.
Denote by θ (1) = (0.098, 0.087, 0.84)′ and θ (2) = (0.012, 0.075, 0.919)′ the parameters of the two models. The estimated values of ω and β seem quite different. Denote by and the estimated standard deviations of the estimators of ω and β of Model Mi . It turns out that the confidence intervals
and
have empty intersection. The same holds true for the confidence intervals
and
The third graph of Figure E.4 displays the boxplot of the distribution of over 100 independent simulations of Model M1. The difference θ (2) − θ (1) between the parameters of M1 and M2 is marked by a diamond shape. This difference is an outlier for the distribution of , meaning that the GARCH models estimated on the two periods are significantly distinct.
The number of balls in the urn alternates between odd and even along the steps. For instance . Thus the chain is irreducible but periodic.
Using the formula , it can be seen that is an invariant law. It follows that for all .
When the initial distribution is the Dirac mass at 0 we have when is odd, and when is even. Thus does not exist.
# one iteration of the EM algorithm
EM <- function(omega,pi0,p,y){
d<-length(omega)
n <- length(y) # y contains the n observations
vrais<-0
pit.t<-matrix(0,nrow=d,ncol=n)
pit.tm1<-matrix(0,nrow=d,ncol=n+1)
vecphi<-rep(0,d)
pit.tm1[,1]<-pi0
for (t in 1:n) {
for (j in 1:d) vecphi[j]<-{dnorm(y[t],
mean=0,sd=sqrt(abs(omega[j])))}
den<-sum(pit.tm1[,t]*vecphi)
if(den<=0)return(Inf)
pit.t[,t]<-(pit.tm1[,t]*vecphi)/den
pit.tm1[,t+1]<-t(p)%*%pit.t[,t]
vrais<-vrais+log(den)
}
pit.n<-matrix(0,nrow=d,ncol=n)
pit.n[,n]<-pit.t[,n]
for (t in n:2) {
for (i in 1:d) {
pit.n[i,t-1]<- {pit.t[i,t-1]*sum(p[i,1:d]*
pit.n[1:d,t]/pit.tm1[1:d,t])}
} }
pitm1et.n<-array(0,dim=c(d,d,n))
for (t in 2:n) {
for (i in 1:d) {
for (j in 1:d) {
pitm1et.n[i,j,t]<-p[i,j]*pit.t[i,t-1]*pit.n[j,t]/pit.tm1[j,t]
} } }
omega.final<-omega
pi0.final<-pi0
p.final<-p
for (i in 1:d) {
omega.final[i]<-sum((y[1:n]^2)*pit.n[i,1:n])/sum(pit.n[i,1:n])
pi0.final[i]<-pit.n[i,1]
for (j in 1:d) {
p.final[i,j]<-sum(pitm1et.n[i,j,2:n])/sum(pit.n[i,1:(n-1)])
} }
liss<-{list(probaliss=pit.n,probatransliss=pitm1et.n,
vrais=vrais,omega.final=omega.final,pi0.final=pi0.final,
p.final=p.final)}
liss # return the smoothed probabilities, the log-likelihood and the updated parameters
}
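A hypothetical usage sketch (the data-generating mechanism below is chosen only for illustration): simulate a two-regime Gaussian hidden Markov chain and run one EM iteration.
# one EM iteration on simulated data with d = 2 regimes
set.seed(123)
n <- 200
p0 <- matrix(c(0.9, 0.1, 0.1, 0.9), 2, 2, byrow = TRUE) # transition matrix
reg <- numeric(n); reg[1] <- 1
for (t in 2:n) reg[t] <- sample(1:2, 1, prob = p0[reg[t - 1], ])
y <- rnorm(n, sd = sqrt(c(0.5, 4)[reg])) # regime-dependent variances
res <- EM(omega = c(1, 2), pi0 = c(0.5, 0.5), p = p0, y = y)
res$omega.final # updated regime variances
res$p.final     # updated transition probabilities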
The proof of Theorem 2.4 in Chapter 2 applies directly with this sequence (A t ), showing that there exists a strictly stationary solution if and only if the top Lyapunov exponent of (A t ) is strictly negative. The solution is then unique, non‐anticipative, and ergodic, and takes the form (2.18).
In the ARCH(1) case with d regimes, we obtain the necessary and sufficient condition
For the existence of a positive solution to this equation, it is necessary to have
Conversely, under this condition, the process
is a strictly stationary and non‐anticipative solution which satisfies
and the result follows.
The EM algorithm cannot be generalised trivially because the maximisation of Eq. (12.32) is replaced by that of
which does not admit an explicit form like (12.35) but requires the use of an optimisation algorithm.
we have
under conditions entailing the existence of the series. For the alternative model, ε t = σ t (Δ t )η t with
we have
Let ℱ t be the sigma‐field generated by the past observations {ε u , u < t}, and by the past and present value of the chain {Δ u , u ≤ t}. We have
but, given the past observations, only depends on Δ t , whereas h t depends also on {Δ u , u < t}.
This entails differences between the two models in terms of probabilistic properties (the stationarity conditions are easier to obtain for the standard MS‐GARCH model, but they have also been obtained by Liu (2006) for the alternative model), of statistical inference (the fact that only depends on Δ t renders the alternative model much easier to estimate), and also in terms of dynamic behaviour and of the interpretation of the parameters.
For instance, for the MS‐GARCH, β(i) can be interpreted as a parameter of inertia of the volatility in regime i : if the volatility h t − 1 is high and β(i) is close to 1, the next volatility h t will remain high in regime i . This interpretation is no longer valid for the alternative model, since may not be equal to .