It will be shown that, under mild conditions, GARCH processes are geometrically ergodic and β ‐mixing. These properties entail the existence of laws of large numbers and of central limit theorems (see Appendix A), and thus play an important role in the statistical analysis of GARCH processes. This chapter relies on the Markov chain techniques set out, for example, by Meyn and Tweedie (1996).
Recall that for a Markov chain only the most recent past is of use in obtaining the conditional distribution. More precisely, (X_t) is said to be a homogeneous Markov chain, evolving on a space E (called the state space) equipped with a σ‐field ℰ, if for all x ∈ E, and for all B ∈ ℰ,
\[ \mathbb{P}(X_{s+t} \in B \mid X_r, r < s;\ X_s = x) = P^t(x, B), \qquad \forall s, t \in \mathbb{N}. \]
In this equation, P^t(x, B) corresponds to the transition probability of moving from the state x to the set B in t steps. The Markov property refers to the fact that P^t(x, B) does not depend on X_r, r < s. The fact that this probability does not depend on s is referred to as time homogeneity. For simplicity, we write P(x, B) = P^1(x, B). The function P : E × ℰ → [0, 1] is called a transition kernel and satisfies:
(i) for all B ∈ ℰ, the function P(·, B) is measurable;
(ii) for all x ∈ E, the function P(x, ·) is a probability measure on (E, ℰ).
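The law of the process (X_t) is characterised by an initial probability measure μ and a transition kernel P. For all integers t and all (t + 1)‐tuples (B_0, …, B_t) of elements of ℰ, we set
\[ \mathbb{P}_\mu(X_0 \in B_0, \ldots, X_t \in B_t) = \int_{x_0 \in B_0} \cdots \int_{x_{t-1} \in B_{t-1}} \mu(dx_0)\, P(x_0, dx_1) \cdots P(x_{t-1}, B_t). \tag{3.2} \]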
In what follows, (X t ) denotes a Markov chain on E = ℝ d and ℰ is the Borel σ ‐field.
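The Markov chain (X_t) is said to be φ‐irreducible for a non‐trivial (that is, not identically equal to zero) measure φ on (E, ℰ), if
\[ \varphi(B) > 0 \;\Longrightarrow\; \forall x \in E,\ \exists t > 0:\ P^t(x, B) > 0, \qquad \forall B \in \mathcal{E}. \]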
If (X_t) is φ‐irreducible, it can be shown that there exists a maximal irreducibility measure, that is, an irreducibility measure M such that all the other irreducibility measures are absolutely continuous with respect to M. If M(B) = 0, then the set of points from which B is accessible is also of zero measure (see Meyn and Tweedie 1996, Proposition 4.2.2). Such a measure M is not unique, but the set
\[ \mathcal{E}^+ = \{B \in \mathcal{E} : M(B) > 0\} \]
does not depend on the maximal irreducibility measure M . For a particular model, finding a measure that makes the chain irreducible may be a non‐trivial problem (but see Exercise 3.1 for an example of a time series model for which the determination of such a measure is very simple).
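A φ‐irreducible chain is called recurrent if
\[ U(x, B) := \sum_{t=1}^{\infty} P^t(x, B) = +\infty, \qquad \forall x \in E,\ \forall B \in \mathcal{E}^+, \]
and is called transient if
\[ \exists (B_j)_j,\quad E = \bigcup_j B_j, \qquad U(x, B_j) \le M_j < \infty, \quad \forall x \in E. \]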
Note that the sum U(x, B) = Σ_{t ≥ 1} P^t(x, B) can be interpreted as the average time that the chain spends in B when it starts at x. It can be shown that a φ‐irreducible chain (X_t) is either recurrent or transient (see Meyn and Tweedie 1996, Theorem 8.3.4). It is said that (X_t) is positive recurrent if
\[ \limsup_{t \to \infty} P^t(x, B) > 0, \qquad \forall x \in E,\ \forall B \in \mathcal{E}^+. \]
If a φ‐irreducible chain is not positive recurrent, it is called null recurrent. For a φ‐irreducible chain, positive recurrence is equivalent to the existence of a (unique) invariant probability measure (see Meyn and Tweedie 1996, Theorem 18.2.2), that is, a probability π such that
\[ \pi(B) = \int_E P(x, B)\, \pi(dx), \qquad \forall B \in \mathcal{E}. \]
An important consequence of this equivalence is that, for Markov time series, the issue of finding strict stationarity conditions reduces to that of finding conditions for positive recurrence. Indeed, it can be shown (see Exercise 3.2) that for any chain (X_t) with initial measure μ,
\[ (X_t) \text{ is stationary} \;\Longleftrightarrow\; \mu \text{ is invariant.} \]
For this reason, the invariant probability is also called the stationary probability.
For a φ‐irreducible chain, there exists a class of sets enjoying properties that are similar to those of the elementary states of a finite state space Markov chain. A set C ∈ ℰ is called a small set if there exists an integer m ≥ 1 and a nontrivial measure ν on ℰ such that
\[ P^m(x, B) \ge \nu(B), \qquad \forall x \in C,\ \forall B \in \mathcal{E}. \]
In the AR(1) case, for instance, it is easy to find small sets (see Exercise 3.4). For more sophisticated models, the definition is not sufficient and more explicit criteria are needed. For the so‐called Feller chains, we will see below that it is very easy to find small sets. For a general chain, we have the following criterion (see Nummelin 1984, Proposition 2.11): C ∈ ℰ⁺ is a small set if there exists A ∈ ℰ⁺ such that, for all B ⊂ A, B ∈ ℰ⁺, there exists T > 0 such that
\[ \inf_{x \in C} \sum_{t=0}^{T} P^t(x, B) > 0. \]
If the chain is φ ‐irreducible, it can be shown that there exists a countable cover of E by small sets. Moreover, each set B ∈ ℰ + contains a small set C ∈ ℰ + . The existence of small sets allows us to define cycles for φ ‐irreducible Markov chains with general state space, as in the case of countable space chains. More precisely, the period is the greatest common divisor (gcd) of the set
where C ∈ ℰ + is any small set (the gcd is independent of the choice of C ). When d = 1, the chain is said to be aperiodic. Moreover, it can be shown (see Meyn and Tweedie 1996, Theorem 5.4.4) that there exist disjoint sets D 1, …, D d ∈ ℰ such that (with the convention D d + 1 = D 1 ):
A necessary and sufficient condition for the aperiodicity of (X t ) is that there exists A ∈ ℰ + such that for all B ⊂ A, B ∈ ℰ + , there exists t > 0 such that
(see Chan 1990, Proposition A1.2).
In this section, we study the convergence of the probability ℙ μ (X t ∈ ⋅) to a probability π(⋅) independent of the initial probability μ , as t → ∞.
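It is easy to see that if there exists a probability measure π such that, for an initial measure μ,
\[ \lim_{t \to \infty} \mathbb{P}_\mu(X_t \in B) = \pi(B), \qquad \forall B \in \mathcal{E}, \tag{3.5} \]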
where ℙ_μ(X_t ∈ B) is defined in (3.2) (for (B_0, …, B_t) = (E, …, E, B)), then the probability π is invariant (see Exercise 3.3). Note also that (3.5) holds for any measure μ if and only if
\[ \lim_{t \to \infty} P^t(x, B) = \pi(B), \qquad \forall B \in \mathcal{E},\ \forall x \in E. \]
On the other hand, if the chain is irreducible, aperiodic, and admits an invariant probability π, then for π‐almost all x ∈ E,
\[ \| P^t(x, \cdot) - \pi \| \to 0 \quad \text{as } t \to \infty, \tag{3.6} \]
where ∥ ⋅ ∥ denotes the total variation norm (see Meyn and Tweedie 1996, Theorem 14.0.1). A chain (X_t) such that the convergence (3.6) holds for all x is said to be ergodic. However, this convergence is not sufficient for mixing. We will define a stronger notion of ergodicity.
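As a numerical illustration of convergence in total variation, the following sketch considers a Gaussian AR(1) chain X_t = θX_{t−1} + ε_t with ε_t ~ N(0, 1) and |θ| < 1. This model, the parameter values and the function names are assumptions chosen for illustration only. In that case P^t(x, ·) is the N(θ^t x, (1 − θ^{2t})/(1 − θ²)) law and the invariant law is N(0, 1/(1 − θ²)), so the distance can be evaluated by numerical integration. The total variation norm is taken here to be the L1 distance between the two densities, which is one common convention.

```python
import numpy as np

def normal_pdf(z, mean, var):
    """Density of the N(mean, var) law evaluated at z."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def tv_to_invariant(theta, x, t, grid):
    """L1 distance between the t-step law P^t(x, .) and the invariant law
    of a Gaussian AR(1) chain with unit innovation variance (assumed model)."""
    var_inf = 1.0 / (1.0 - theta ** 2)            # variance of the invariant law
    var_t = (1.0 - theta ** (2 * t)) * var_inf    # variance of X_t given X_0 = x
    p_t = normal_pdf(grid, theta ** t * x, var_t)
    pi = normal_pdf(grid, 0.0, var_inf)
    dz = grid[1] - grid[0]
    return float(np.sum(np.abs(p_t - pi)) * dz)   # Riemann sum on a fine grid

theta, x0 = 0.8, 5.0                              # illustrative values
grid = np.linspace(-30.0, 30.0, 60001)
distances = [tv_to_invariant(theta, x0, t, grid) for t in range(1, 41)]
for t in (1, 5, 10, 20, 40):
    print(t, distances[t - 1])
```

In this example the printed distances shrink roughly like a constant times θ^t, which is the kind of geometric rate that the stronger notion of ergodicity defined next formalises.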
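The chain (X_t) is called geometrically ergodic if there exists ρ ∈ (0, 1) such that
\[ \forall x \in E, \qquad \rho^{-t} \| P^t(x, \cdot) - \pi \| \to 0 \quad \text{as } t \to \infty. \tag{3.7} \]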
Geometric ergodicity entails the so‐called α ‐ and β ‐mixing. The general definition of the α ‐ and β ‐mixing coefficients is given in Appendix A.3.1. For a stationary Markov process, the definition of the α ‐mixing coefficient reduces to
where the first supremum is taken over the set of the measurable functions f and g such that ∣f∣ ≤ 1, ∣g∣ ≤ 1 (see Bradley 1986, 2005). A general process X = (X_t) is said to be α‐mixing (respectively β‐mixing) if α_X(k) (respectively β_X(k)) converges to 0 as k → ∞. Intuitively, these mixing properties characterise the decrease in dependence when past and future become sufficiently far apart. α‐mixing is sometimes called strong mixing; β‐mixing is nevertheless the stronger property, because α_X(k) ≤ β_X(k) (see Appendix A.3.1).
Davydov (1973) showed that for an ergodic Markov chain (X_t) with invariant probability measure π,
\[ \beta_X(k) = \int \| P^k(x, \cdot) - \pi \| \, \pi(dx). \]
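It follows that β_X(k) = O(ρ^k) if the convergence (3.7) holds. Thus
\[ (X_t) \text{ is stationary and geometrically ergodic} \;\Longrightarrow\; (X_t) \text{ is geometrically } \beta\text{-mixing.} \tag{3.8} \]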
For particular models, it is generally not easy to directly verify the properties of recurrence, existence of an invariant probability law, and geometric ergodicity. Fortunately, there exist simple criteria on the transition kernel.
We begin by defining the notion of Feller chain. The Markov chain (X_t) is said to be a Feller chain if, for all bounded continuous functions g defined on E, the function of x defined by E(g(X_t) ∣ X_{t−1} = x) is continuous. For instance, for an AR(1) we have, with obvious notation,
\[ E\{g(X_t) \mid X_{t-1} = x\} = E\{g(\theta x + \varepsilon_t)\} = \int g(\theta x + y)\, dP_\varepsilon(y). \]
The continuity of the function x → g(θx + y) for all y , and its boundedness, ensure, by the Lebesgue dominated convergence theorem, that (X t ) is a Feller chain. For a Feller chain, the compact sets C ∈ ℰ + are small sets (see Feigin and Tweedie 1985).
The following theorem provides an effective way to show the geometric ergodicity (and thus the β ‐mixing) of numerous Markov processes.
This theorem will be applied to GARCH processes in the next section (see also Exercise 3.5 for a bilinear example). In Eq. (3.10), V can be interpreted as an energy function. When the chain is outside the centre A of the state space, the energy dissipates, on average. When the chain lies inside A , the energy is bounded, by the compactness of A and the continuity of V . Sometimes V is called a test function and (iii) is said to be a drift criterion.
Let us explain why these assumptions imply the existence of an invariant probability measure. For simplicity, assume that the test function V takes its values in [1, +∞), which will be the case for the applications to GARCH models we will present in the next section. Denote by P the operator which, to a measurable function f on E, associates the function Pf defined by
\[ Pf(x) = \int_E f(y)\, P(x, dy) = E\{f(X_t) \mid X_{t-1} = x\}, \qquad x \in E. \]
Let P^t be the t-th iterate of P, obtained by replacing P(x, dy) by P^t(x, dy) in the previous integral. By convention P^0 f = f and P^0(x, A) = 𝟙_A(x). Equations (3.9) and (3.10) and the boundedness of V by some M > 0 on A yield an inequality of the form
\[ PV(x) \le (1 - \delta) V(x) + b\, \mathbf{1}_A(x), \]
where b = M − (1 − δ). Iterating this relation t times, we obtain, for x 0 ∈ A
It follows (see Exercise 3.6) that there exists a constant κ > 0 such that for n large enough,
The sequence Q n (x 0, ·) being a sequence of probabilities on (E, ℰ), it admits an accumulation point for vague convergence: there exists a measure π of mass less than 1 and a subsequence (n k ) such that for all continuous functions f with compact support,
In particular, if we take f = 𝟙_A in this equality, we obtain π(A) ≥ κ, thus π is not equal to zero. Finally, it can be shown that π is a probability and that (3.13) entails that π is an invariant probability for the chain (X_t) (see Exercise 3.7).
For some models, the drift criterion (iii) is too restrictive because it relies on transitions in only one step. The following criterion, adapted from Meyn and Tweedie (1996, Theorems 19.1.3, 6.2.9, and 6.2.5), is an interesting alternative relying on the transitions in n steps.
The compact C of condition (iii) can be replaced by a small set, but the function V must be bounded on C . When (X t ) is not a Feller chain, a similar criterion exists, for which it is necessary to consider such small sets (see Meyn and Tweedie 1996, Theorem 19.1.3).
We begin with the ARCH(1) process because this is the only case where the process (ε t ) is Markovian.
Consider the model
\[ \varepsilon_t = (\omega + \alpha \varepsilon_{t-1}^2)^{1/2}\, \eta_t, \]
where ω > 0, α ≥ 0 and (η t ) is a sequence of iid (0, 1) variables. The following theorem establishes the mixing property of the ARCH(1) process under the necessary and sufficient strict stationarity condition (see Theorem 2.1 and (2.10)). An extra assumption on the distribution of η t is required, but this assumption is mild:
Note that this assumption includes, in particular, the standard case where f is positive over a neighbourhood of 0, possibly over all ℝ. We then have η 0 = 0. Equality (3.17) implies some (local) symmetry of the law of (η t ). This symmetry facilitates the proof of the following theorem, but it can be omitted (see Exercise 3.8).
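The strict stationarity condition can be made concrete in a special case. The sketch below assumes Gaussian innovations η_t ~ N(0, 1), an assumption made only for illustration, for which E log η_t² = −γ_E − log 2 with γ_E Euler's constant. The condition E log(αη_t²) < 0 then reads α < 2e^{γ_E} ≈ 3.56. The code also simulates a path of ε_t = (ω + αε²_{t−1})^{1/2} η_t; the function name and the parameter values are hypothetical.

```python
import numpy as np

# For eta ~ N(0, 1): E log(eta^2) = -euler_gamma - log(2).
E_LOG_ETA2 = -np.euler_gamma - np.log(2.0)
# Strict stationarity of the Gaussian ARCH(1) requires alpha < ALPHA_MAX.
ALPHA_MAX = float(np.exp(-E_LOG_ETA2))

def simulate_arch1(omega, alpha, n, seed=0, burn_in=500):
    """Simulate n values of an ARCH(1) path with N(0, 1) innovations
    (assumed law), discarding the first burn_in values."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(n + burn_in)
    eps = np.empty(n + burn_in)
    prev = 0.0                                    # arbitrary starting value
    for t in range(n + burn_in):
        prev = np.sqrt(omega + alpha * prev ** 2) * eta[t]
        eps[t] = prev
    return eps[burn_in:]

path = simulate_arch1(omega=1.0, alpha=0.5, n=50_000)
print("alpha threshold:", ALPHA_MAX)
print("sample variance:", path.var(), "omega / (1 - alpha):", 1.0 / (1.0 - 0.5))
```

With α = 0.5 the process is also second-order stationary, so the sample variance can be compared with ω/(1 − α). Values of α between 1 and the threshold give a strictly stationary process with infinite variance.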
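Step (i) We have
\[ E\{g(\varepsilon_t) \mid \varepsilon_{t-1} = x\} = E[g\{\psi(x)\eta_t\}], \qquad \psi(x) = (\omega + \alpha x^2)^{1/2}. \]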
If g is continuous and bounded, the same is true for the function x → g{ψ(x)y}, for all y . By the Lebesgue theorem, it follows that (ε t ) is a Feller chain.
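Step (ii) To show the φ‐irreducibility of the chain, for some measure φ, assume for the moment that η^0 = 0 in Assumption A. Suppose, for instance, that f is positive on [0, τ). Let φ be the restriction of the Lebesgue measure to the interval [0, √ω τ). Since ψ(x) = (ω + αx²)^{1/2} ≥ √ω, it can be seen that
\[ \varphi(B) > 0 \;\Longrightarrow\; \lambda\Big\{ \tfrac{1}{\psi(x)} B \cap [0, \tau) \Big\} > 0 \;\Longrightarrow\; P(x, B) > 0, \qquad \forall x \in \mathbb{R}. \]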
It follows that the chain (ε t ) is φ ‐irreducible. In particular, φ = λ if η t has a positive density over ℝ.
The proof of the irreducibility in the case η 0 > 0 is more difficult. First note that
Now by (3.18). Thus we have
Let τ ′ ∈ (0, τ) be small enough such that
Iterating the model, we obtain that, for ε0 = x fixed,
It follows that the function
is a diffeomorphism between open subsets of ℝ t . Moreover, in view of Assumption A, the vector Y t has a density on ℝ t . The same is thus true for Z t , and it follows that, given ε0 = x ,
We now introduce the event
Assumption A implies that ℙ(Ξ t ) > 0. Conditional on Ξ t , we have
Since the bounds of the interval I_t are reached, the intermediate value theorem and (3.19) entail that, given ε_0 = x, ε_t² has, conditionally on Ξ_t, a positive density on I_t. It follows that
where J t = {x ∈ ℝ ∣ x 2 ∈ I t }. Let
and let λ J be the restriction of the Lebesgue measure to J . We have
The chain (ε t ) is thus φ ‐irreducible with φ = λ J .
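Step (iii) We shall use Lemma 2.2. The variable αη_t² is almost surely positive and satisfies E(αη_t²) = α < ∞ and E log(αη_t²) < 0, in view of assumption (3.18). Thus, there exists s > 0 such that
\[ c := \alpha^s \mu_{2s} < 1, \]
where μ_{2s} = E|η_t|^{2s}. The proof of Lemma 2.2 shows that we can assume s ≤ 1. Let V(x) = 1 + |x|^{2s}. Condition (3.9) is obviously satisfied for all x. Let 0 < δ < 1 − c and define the compact set
\[ A = \{x \in \mathbb{R} : \omega^s \mu_{2s} + \delta + (c - 1 + \delta)\,|x|^{2s} \ge 0\}. \]
Since A is a nonempty closed interval with centre 0, we have φ(A) > 0. Moreover, by the inequality (a + b)^s ≤ a^s + b^s for a, b ≥ 0 and s ∈ [0, 1] (see the proof of Corollary 2.3), we have, for x ∉ A,
\[ E\{V(\varepsilon_t) \mid \varepsilon_{t-1} = x\} \le 1 + (\omega^s + \alpha^s |x|^{2s})\,\mu_{2s} = 1 + \omega^s \mu_{2s} + c\,|x|^{2s} < (1 - \delta)V(x), \]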
which proves condition (3.10). It follows that the chain (ε_t) is geometrically ergodic. Therefore, in view of property (3.8), the chain obtained with the invariant law as initial measure is geometrically β‐mixing. The proof of the theorem is complete.
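The drift argument of Step (iii) can be checked numerically in a special case. The sketch below assumes η_t ~ N(0, 1), for which E|η_t|^{2s} = 2^s Γ(s + 1/2)/√π, takes the test function V(x) = 1 + |x|^{2s}, and uses the interval A = [−r, r] with r^{2s} = (ω^s μ_{2s} + δ)/(1 − c − δ), where c = α^s μ_{2s}. The Gaussian law, the parameter values and the function names are assumptions introduced for this illustration. It evaluates E{V(ε_t) ∣ ε_{t−1} = x} = 1 + (ω + αx²)^s μ_{2s} at a few points outside A and compares it with (1 − δ)V(x).

```python
import math

def abs_moment_gaussian(p):
    """E|eta|^p for eta ~ N(0, 1) (assumed innovation law)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

def drift_check(omega, alpha, s, delta, factors=(1.001, 2.0, 10.0, 1e3)):
    """Check E{V(eps_t) | eps_{t-1} = x} <= (1 - delta) V(x) at points x
    outside A = [-r, r], with V(x) = 1 + |x|^(2s)."""
    mu = abs_moment_gaussian(2.0 * s)      # E|eta|^{2s}
    c = alpha ** s * mu                    # E(alpha * eta^2)^s
    if not (c < 1.0 and 0.0 < delta < 1.0 - c):
        raise ValueError("need c < 1 and 0 < delta < 1 - c")
    # Half-width r of A, from r^{2s} = (omega^s mu + delta) / (1 - c - delta).
    r = ((omega ** s * mu + delta) / (1.0 - c - delta)) ** (1.0 / (2.0 * s))
    results = []
    for k in factors:
        x = k * r                          # a point outside A since k > 1
        lhs = 1.0 + (omega + alpha * x * x) ** s * mu
        rhs = (1.0 - delta) * (1.0 + abs(x) ** (2.0 * s))
        results.append((x, lhs, rhs, lhs <= rhs))
    return c, r, results

# alpha = 2 exceeds 1 (infinite variance) but is below the Gaussian
# strict stationarity threshold, and a small exponent s gives c < 1.
c, r, results = drift_check(omega=1.0, alpha=2.0, s=0.1, delta=0.01)
print("c =", c, " r =", r)
for x, lhs, rhs, ok in results:
    print(x, lhs, rhs, ok)
```

The example uses α = 2 to show that a small power s can be needed: with s = 1 the quantity c equals α, which is larger than 1, whereas a small enough s gives c < 1. The price is a very large set A.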
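Let us consider the GARCH(1, 1) model
\[ \varepsilon_t = \sigma_t \eta_t, \qquad \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2, \]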
where ω > 0, α ≥ 0, β ≥ 0 and the sequence (η t ) is as in the previous section. In this case (σ t ) is Markovian, but (ε t ) is not Markovian when β > 0. The following result extends Theorem 3.3.
Theorem 3.4 is of interest because it provides a proof of strict stationarity which is completely different from that of Theorem 2.8. A slightly more restrictive assumption on the law of η t has been required, but the result obtained in Theorem 3.4 is stronger.
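For the GARCH(1, 1) model, the volatility satisfies the Markov recursion σ_t² = ω + (αη²_{t−1} + β)σ²_{t−1}, and the usual strict stationarity condition is E log(αη_t² + β) < 0. The sketch below estimates this expectation by Monte Carlo and simulates the chain (σ_t²). It assumes N(0, 1) innovations, and the parameter values and function names are hypothetical choices made for illustration.

```python
import numpy as np

def garch11_log_moment(alpha, beta, n=200_000, seed=0):
    """Monte Carlo estimate of E log(alpha * eta^2 + beta), eta ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eta2 = rng.standard_normal(n) ** 2
    return float(np.mean(np.log(alpha * eta2 + beta)))

def simulate_sigma2(omega, alpha, beta, n, seed=0):
    """Simulate the Markov chain sigma_t^2 = omega + (alpha*eta^2 + beta)*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eta2 = rng.standard_normal(n) ** 2
    sigma2 = np.empty(n)
    sigma2[0] = omega                      # arbitrary starting value
    for t in range(1, n):
        sigma2[t] = omega + (alpha * eta2[t - 1] + beta) * sigma2[t - 1]
    return sigma2

gamma_hat = garch11_log_moment(alpha=0.1, beta=0.85)
sigma2 = simulate_sigma2(omega=1.0, alpha=0.1, beta=0.85, n=5000)
print("estimated E log(alpha*eta^2 + beta):", gamma_hat)
print("Jensen upper bound log(alpha + beta):", np.log(0.95))
```

By Jensen's inequality the expectation is bounded above by log(α + β), so the condition holds whenever α + β < 1, but it can also hold for some parameter values with α + β ≥ 1.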
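The approach developed in the case q = 1 does not extend trivially to the general case because (ε_t) and (σ_t) lose their Markov property when p > 1 or q > 1. Consider the model
\[ \varepsilon_t = \sigma_t \eta_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2, \]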
where ω > 0, α i ≥ 0, i = 1, …, q , and (η t ) is defined as in the previous section. We will once again use the Markov representation
where
Recall that γ denotes the top Lyapunov exponent of the sequence {A t , t ∈ ℤ}.
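The top Lyapunov exponent rarely has a closed form, but it can be approximated by simulation. The sketch below assumes the companion form for an ARCH(q) model, in which z_t = (ε_t², …, ε²_{t−q+1})′ and A_t has first row (α_1η_t², …, α_qη_t²) with an identity block below it. This matrix, the N(0, 1) innovations and the function name are assumptions made for the illustration, since the matrices are not reproduced in this passage. The estimate is the average log growth of ‖A_t ⋯ A_1 v‖ for a positive starting vector v, renormalised at each step to avoid underflow.

```python
import numpy as np

def top_lyapunov_arch(alphas, n=20_000, seed=0):
    """Simulation estimate of the top Lyapunov exponent of the assumed
    ARCH(q) companion matrices A_t with N(0, 1) innovations."""
    alphas = np.asarray(alphas, dtype=float)
    q = alphas.size
    rng = np.random.default_rng(seed)
    v = np.ones(q) / np.sqrt(q)            # positive unit starting vector
    log_growth = 0.0
    for _ in range(n):
        eta2 = rng.standard_normal() ** 2
        a_t = np.zeros((q, q))
        a_t[0, :] = alphas * eta2          # first row: alpha_i * eta_t^2
        a_t[1:, :-1] = np.eye(q - 1)       # shift block below the first row
        v = a_t @ v
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)         # accumulate the log growth rate
        v /= norm                          # renormalise to unit length
    return log_growth / n

gamma_arch1 = top_lyapunov_arch([1.0])
gamma_arch2 = top_lyapunov_arch([0.2, 0.1])
print("ARCH(1), alpha = 1:", gamma_arch1,
      "exact value:", -np.euler_gamma - np.log(2.0))
print("ARCH(2), alphas = (0.2, 0.1):", gamma_arch2)
```

For q = 1 the matrix is the scalar αη_t², so the estimate reduces to a sample mean of log(αη_t²) and can be compared with the exact Gaussian value log α − γ_E − log 2.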
A major reference on ergodicity and mixing of general Markov chains is Meyn and Tweedie (1996). For a more succinct presentation, see Chan (1990), Tjøstheim (1990), and Tweedie (2001). For survey papers on mixing conditions, see Bradley (1986, 2005). We also mention the book by Doukhan (1994) which proposes definitions and examples of other types of mixing, as well as numerous limit theorems.
For vectorial representations of the form (3.26), the Feller, aperiodicity and irreducibility properties were established by Cline and Pu (1998, Theorem 2.2), under assumptions on the error distribution and on the regularity of the transitions.
The geometric ergodicity and mixing properties of the GARCH(p, q) processes were established in the Ph.D. thesis of Boussama (1998), using results of Mokkadem (1990) on polynomial processes. The proofs use concepts of algebraic geometry to determine a subspace of the states on which the chain is irreducible. For the GARCH(1, 1) and ARCH(q) models we did not need such sophisticated notions. The proofs given here are close to those given in Francq and Zakoïan (2006a), which considers more general GARCH(1, 1) models. Mixing properties were obtained by Carrasco and Chen (2002) for various GARCH‐type models under stronger conditions than the strict stationarity (for example, α + β < 1 for a standard GARCH(1, 1); see their Table 1). Meitz and Saikkonen (2008a,b) showed mixing properties under mild moment assumptions for a general class of first‐order Markov models, and applied their results to the GARCH(1, 1).
The mixing properties of ARCH(∞) models are studied by Fryzlewicz and Rao (2011). They develop a method for establishing geometric ergodicity which, contrary to the approach of this chapter, does not rely on the Markov chain theory. Other approaches, for instance developed by Ango Nze and Doukhan (2004) and Hörmann (2008), aim to establish probability properties (different from mixing) of GARCH‐type sequences, which can be used to establish central limit theorems.
Given a sequence (ℰ t ) t ∈ ℕ of iid centred variables of law P ℰ which is absolutely continuous with respect to the Lebesgue measure λ on ℝ, let (X t ) t ∈ ℕ be the AR(1) process defined by
where θ ∈ ℝ.
where (ℰ t ) is as in Exercise 3.1(a), show that if
then there exists a unique strictly stationary solution and this solution is geometrically ergodic.
Hints: (i) For a function g which is continuous and positive (but not necessarily with compact support), this equality becomes
(see Meyn and Tweedie 1996, Lemma D.5.5).
(ii) For all σ ‐finite measures μ on (ℝ, ℬ(ℝ)) we have
(see Meyn and Tweedie 1996, Theorem D.3.2).
The law P η is absolutely continuous, with density f , with respect to λ . There exists τ > 0 such that
where and .