10
Multivariate GARCH Processes

While the volatility of univariate series has been the focus of the previous chapters, modelling the co‐movements of several series is of great practical importance. When several series displaying temporal or contemporaneous dependencies are available, it is useful to analyse them jointly, by viewing them as the components of a vector‐valued (multivariate) process. The standard linear modelling of real‐valued time series has a natural multivariate extension through the framework of the vector ARMA (VARMA) models. In particular, the subclass of vector autoregressive (VAR) models has been widely studied in the econometric literature. This extension entails numerous specific problems and has given rise to new research areas (such as co‐integration).

Similarly, it is important to introduce the concept of a multivariate GARCH (MGARCH) model. For instance, asset pricing and risk management crucially depend on the conditional covariance structure of the assets of a portfolio. Unlike the ARMA models, however, the GARCH model specification does not suggest a natural extension to the multivariate framework. Indeed, the (conditional) expectation of a vector of size m is a vector of size m, but the (conditional) variance is an m × m matrix. A general extension of the univariate GARCH processes would involve specifying each of the m(m + 1)/2 entries of this matrix as a function of its past values and the past values of the other entries. Given the excessive number of parameters that this approach would entail, it is not feasible from a statistical point of view. An alternative approach is to introduce some specification constraints which, while preserving a certain generality, make these models operational.

We start by reviewing the main concepts for the analysis of multivariate time series.

10.1 Multivariate Stationary Processes

In this section, we consider a vector process (X _t)_{t ∈ ℤ} of dimension m, X _t = (X _1t,…,X _mt)^′. The definition of strict stationarity (see Chapter 1, Definition 1.1) remains valid for vector processes, while second‐order stationarity is defined as follows.

Obviously, Γ_X(h) = Γ_X(−h)^′ . In particular, Γ_X(0) = Var(X _t) is a symmetric matrix.

The simplest example of a multivariate stationary process is white noise, defined as a sequence of centred and uncorrelated variables whose covariance matrix is time‐independent.

The following property can be used to construct a stationary process by linear transformation of another stationary process.

The proof of an analogous result is given by Brockwell and Davis (1991, pp. 83–84) and the arguments used extend straightforwardly to the multivariate setting. When, in this theorem, (Z _t) is a white noise and C _k = 0 for all k < 0, (X _t) is called a vector moving average process of infinite order, VMA(∞). A multivariate extension of Wold's representation theorem (see Hannan 1970, pp. 157–158) states that if (X _t) is a stationary and purely non‐deterministic process, it can be represented as an infinite‐order moving average,

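$$X_t = \sum_{k=0}^{\infty} C_k \varepsilon_{t-k} = C(B)\varepsilon_t, \qquad (10.1)$$

where (ε_t) is an (m × 1) white noise, B is the lag operator, $C(z) = \sum_{k=0}^{\infty} C_k z^k$, and the matrices C_k are not necessarily absolutely summable but satisfy the (weaker) condition $\sum_{k=0}^{\infty} \|C_k\|^2 < \infty$, for any matrix norm ‖ · ‖. The following definition generalises the notion of a scalar ARMA process to the multivariate case.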

Denote by det(A), or more simply ∣A∣ when there is no ambiguity, the determinant of a square matrix A . A sufficient condition for the existence of a stationary and invertible solution to the preceding equation is

(see Brockwell and Davis 1991, Theorems 11.3.1 and 11.3.2).

When p = 0, the process is called vector moving average of order q (VMA(q)); when q = 0, the process is called VAR of order p (VAR(p)).

Note that the determinant ∣Φ(z)∣ is a polynomial admitting a finite number of roots z ₁,…, z _mp . Let δ = min_i ∣ z _i ∣ > 1. The power series expansion

$$\Phi(z)^{-1} := |\Phi(z)|^{-1}\,\Phi^{*}(z) := \sum_{k=0}^{\infty} C_k z^k, \qquad (10.3)$$

where A ^* denotes the adjoint of the matrix A (that is, the transpose of the matrix of the cofactors of A ), is well defined for ∣z ∣ < δ , and is such that Φ(z)⁻¹Φ(z) = I . The matrices C _k are recursively obtained by

10.4
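The recursion for the matrices C_k can be checked numerically. The sketch below assumes the usual convention Φ(z) = I − Φ₁z − ⋯ − Φ_p z^p (an assumption, since the defining equation is not reproduced above). Identifying the coefficients of z^k in Φ(z)⁻¹Φ(z) = I then gives C₀ = I and C_k = Σ_{ℓ=1}^{min(k,p)} C_{k−ℓ}Φ_ℓ for k ≥ 1. The function names are illustrative only.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(m):
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


def vma_coefficients(phi, n_terms):
    """Return [C_0, ..., C_{n_terms-1}], assuming Phi(z) = I - sum_l phi[l-1] z^l."""
    m, p = len(phi[0]), len(phi)
    coeffs = [identity(m)]
    for k in range(1, n_terms):
        c_k = [[0.0] * m for _ in range(m)]
        # C_k = sum over lags l = 1..min(k, p) of C_{k-l} Phi_l
        for lag in range(1, min(k, p) + 1):
            c_k = mat_add(c_k, mat_mul(coeffs[k - lag], phi[lag - 1]))
        coeffs.append(c_k)
    return coeffs


# VAR(1) illustration: the expansion reduces to C_k = Phi_1^k.
phi_1 = [[0.5, 0.1], [0.0, 0.3]]
C = vma_coefficients([phi_1], 4)
```

For a VAR(1), C[2] coincides with the square of Φ₁, as expected from the geometric expansion of (I − Φ₁z)⁻¹.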

10.2 Multivariate GARCH Models

As in the univariate case, we can define MGARCH models by specifying their first two conditional moments. An ℝ^m ‐valued GARCH process (ε_t), with ε_t = (ε_1t,…, ε_mt)^′ , must then satisfy, for all t ∈ ℤ,

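$$E(\varepsilon_t \mid \varepsilon_u, u<t) = 0, \qquad \mathrm{Var}(\varepsilon_t \mid \varepsilon_u, u<t) = E(\varepsilon_t\varepsilon_t' \mid \varepsilon_u, u<t) = H_t. \qquad (10.5)$$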

The multivariate extension of the notion of the strong GARCH process is based on an equation of the form

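$$\varepsilon_t = H_t^{1/2}\eta_t, \qquad (10.6)$$

where (η_t) is a sequence of iid ℝ^m‐valued variables with zero mean and identity covariance matrix. The square root has to be understood in the sense of the Cholesky factorization, that is, $H_t^{1/2}(H_t^{1/2})' = H_t$. The matrix $H_t^{1/2}$ can be chosen to be symmetric and positive definite,¹ but it can also be chosen to be triangular, with positive diagonal elements (see, for instance, Harville 1997, Theorem 14.5.11). The latter choice may be of interest because if, for instance, $H_t^{1/2}$ is chosen to be lower triangular, the first component of ε_t only depends on the first component of η_t. When m = 2, we can thus set

$$\varepsilon_{1t} = h_{11,t}^{1/2}\,\eta_{1t}, \qquad \varepsilon_{2t} = \frac{h_{12,t}}{h_{11,t}^{1/2}}\,\eta_{1t} + \left(\frac{h_{11,t}h_{22,t} - h_{12,t}^2}{h_{11,t}}\right)^{1/2}\eta_{2t}, \qquad (10.7)$$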

where η _it and h _{ij, t} denote the generic elements of η _t and H _t .
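For m = 2, the lower triangular square root can be written in closed form, and it is easy to verify that it reproduces H_t. The following is a minimal sketch; the function names are hypothetical and the numerical values are arbitrary.

```python
import math


def chol_2x2(h11, h12, h22):
    """Lower triangular square root of the matrix [[h11, h12], [h12, h22]]."""
    det = h11 * h22 - h12 ** 2
    if h11 <= 0 or det <= 0:
        raise ValueError("matrix is not positive definite")
    a = math.sqrt(h11)
    return [[a, 0.0], [h12 / a, math.sqrt(det / h11)]]


def returns_from_shocks(h11, h12, h22, eta1, eta2):
    """Compute eps_t = H_t^{1/2} eta_t with the triangular square root."""
    root = chol_2x2(h11, h12, h22)
    eps1 = root[0][0] * eta1                      # depends on eta1 only
    eps2 = root[1][0] * eta1 + root[1][1] * eta2  # depends on both shocks
    return eps1, eps2


root = chol_2x2(4.0, 1.2, 1.0)
```

With these values the square root has rows (2, 0) and (0.6, 0.8), and multiplying it by its transpose gives back h₁₁ = 4, h₁₂ = 1.2 and h₂₂ = 1.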

Note that any square integrable solution (ε_t) of (10.6) is a martingale difference satisfying (10.5).

Choosing a specification for H_t is obviously more delicate than in the univariate framework because (i) H_t should be (almost surely) symmetric, and positive definite for all t; (ii) the specification should be simple enough to be amenable to probabilistic study (existence of solutions, stationarity, etc.), while being of sufficient generality; (iii) the specification should be parsimonious enough to enable feasible estimation. However, the model should not be so simple that it cannot capture the (possibly sophisticated) dynamics in the covariance structure.

Moreover, it may be useful to have the so‐called stability by aggregation property. If ε_t satisfies (10.5), the process $(\tilde\varepsilon_t)$ defined by $\tilde\varepsilon_t = P\varepsilon_t$, where P is an invertible square matrix, is such that

$$E(\tilde\varepsilon_t \mid \tilde\varepsilon_u, u<t) = 0, \qquad \mathrm{Var}(\tilde\varepsilon_t \mid \tilde\varepsilon_u, u<t) = \tilde H_t = P H_t P'. \qquad (10.8)$$

The stability by aggregation of a class of specifications for H_t requires that the conditional variance matrices $\tilde H_t$ belong to the same class for any choice of P. This property is particularly relevant in finance because if the components of the vector ε_t are asset returns, $\tilde\varepsilon_t$ is a vector of portfolios of the same assets, each of its components consisting of amounts (coefficients of the corresponding row of P) of the initial assets.

10.2.1 Diagonal Model

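A popular specification, known as the diagonal representation, is obtained by assuming that each element h_{kℓ,t} of the covariance matrix H_t is formulated in terms only of the products of the past k‐th and ℓ‐th returns (and, in the GARCH case, of its own past values). Specifically,

$$h_{k\ell,t} = \omega_{k\ell} + \sum_{i=1}^{q} a^{(i)}_{k\ell}\,\varepsilon_{k,t-i}\varepsilon_{\ell,t-i} + \sum_{j=1}^{p} b^{(j)}_{k\ell}\,h_{k\ell,t-j},$$

with ω_kℓ = ω_ℓk, $a^{(i)}_{k\ell} = a^{(i)}_{\ell k}$, $b^{(j)}_{k\ell} = b^{(j)}_{\ell k}$ for all (k, ℓ). For m = 1, this model coincides with the usual univariate formulation. When m > 1, the model obviously has a large number of parameters and will not in general produce positive definite covariance matrices H_t. We have

$$H_t = \Omega + \sum_{i=1}^{q} A^{(i)} \odot \left(\varepsilon_{t-i}\varepsilon_{t-i}'\right) + \sum_{j=1}^{p} B^{(j)} \odot H_{t-j},$$

where $\Omega = (\omega_{k\ell})$, $A^{(i)} = (a^{(i)}_{k\ell})$, $B^{(j)} = (b^{(j)}_{k\ell})$, and ⊙ denotes the Hadamard product, that is, the element by element product.² Thus, in the ARCH case (p = 0), sufficient positivity conditions are that Ω is positive definite and the A⁽ⁱ⁾ are positive semi‐definite, but these constraints do not easily generalise to the GARCH case. We shall give further positivity conditions obtained by expressing the model in a different way, viewing it as a particular case of a more general class.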

It is easy to see that the model is not stable by aggregation: for instance, the conditional variance of ε_{1,t} + ε_{2,t} can in general be expressed as a function of the $\varepsilon_{1,t-i}^2$, $\varepsilon_{2,t-i}^2$ and $\varepsilon_{1,t-i}\varepsilon_{2,t-i}$, but not of the (ε_{1,t−i} + ε_{2,t−i})². A final drawback of this model is that there is no interaction between the different components of the conditional covariance, which appears unrealistic for applications to financial series.

In what follows, we present the main specifications introduced in the literature, before turning to the existence of solutions. Let η denote a probability distribution on ℝ^m , with zero mean and unit covariance matrix.

10.2.2 Vector GARCH Model

The vector GARCH (VEC‐GARCH) model is the most direct generalisation of univariate GARCH: every conditional covariance is a function of lagged conditional variances as well as lagged cross‐products of all components. In some sense, everything is explained by everything, which makes this model very general but far from parsimonious.

Denote by vech(·) the operator that stacks the columns of the lower triangular part of its argument square matrix (if A = (a _ij), then vech(A) = (a ₁₁, a ₂₁,…, a _m1, a ₂₂,…, a _m2,…, a _mm)^′ ). The next definition is a natural extension of the standard GARCH( p, q ) specification.
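The vech operator is simple to implement. The sketch below follows the ordering given above (columns of the lower triangular part, stacked from left to right); the helper unvech, which rebuilds the symmetric matrix, is a hypothetical name added for illustration.

```python
def vech(a):
    """Stack the columns of the lower triangular part of a square matrix."""
    m = len(a)
    return [a[i][j] for j in range(m) for i in range(j, m)]


def unvech(v, m):
    """Rebuild the symmetric m x m matrix whose vech is v."""
    a = [[0.0] * m for _ in range(m)]
    entries = iter(v)
    for j in range(m):
        for i in range(j, m):
            a[i][j] = a[j][i] = next(entries)
    return a


A = [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]]
v = vech(A)  # (a11, a21, a31, a22, a32, a33)
```

For a symmetric m × m matrix the vector has m(m + 1)/2 entries, which is the number of distinct conditional variances and covariances that a VEC‐type specification has to model.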

Positivity Conditions

To ensure the positive semi-definiteness of H_t , the initial values and Ω (where ω = vech(Ω)) have to be positive semi‐definite, and the matrices A⁽ⁱ⁾ and B^(j) need to map the vectorized positive semi‐definite matrices into themselves. To be more specific, a generic element of

is denoted by h _{kℓ, t} ( k ≥ ℓ), and we will denote by ( ) the entry of A ⁽ⁱ⁾ ( B ^(j) ) located on the same row as h _{kℓ, t} and belonging to the same column as the element of . We thus have an expression of the form

Denoting by the m × m symmetric matrix with (k ^′, ℓ^′)th entry , for k ^′ ≠ ℓ^′ , and the elements on the diagonal, the preceding equality is written as

10.12

In order to obtain a more compact form for the last part of this expression, let us introduce the spectral decomposition of the symmetric matrices H _t , assumed to be positive semi‐definite. We have , where is an orthogonal matrix of eigenvectors associated with the (positive) eigenvalues of H _t . Defining the matrices by analogy with the , we get

10.13

Finally, consider the m ² × m ² matrix admitting the block form , and let . The preceding expressions are equivalent to

10.14

where Ω is the symmetric matrix such that vech(Ω) = ω .

In this form, it is evident that the assumption

10.15

ensures that if the H_{t−j} are almost surely positive definite, then so is H_t.

10.2.3 Constant Conditional Correlations Models

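Suppose that, for an MGARCH process of the form (10.6), all the past information on ε_kt, involving all the variables ε_{ℓ,t−i}, is summarised in the variable h_{kk,t}, with $E h_{kk,t} = E\varepsilon_{kt}^2$. Then, letting $\tilde\eta_{kt} = h_{kk,t}^{-1/2}\varepsilon_{kt}$, we define for all k a sequence of iid variables with zero mean and unit variance. The variables $\tilde\eta_{kt}$ are generally correlated, so let $R = \mathrm{Var}(\tilde\eta_t) = (\rho_{k\ell})$, where $\tilde\eta_t = (\tilde\eta_{1t},\ldots,\tilde\eta_{mt})'$. The conditional variance of

$$\varepsilon_t = \mathrm{diag}\left(h_{11,t}^{1/2},\ldots,h_{mm,t}^{1/2}\right)\tilde\eta_t$$

is then written as

$$H_t = \mathrm{diag}\left(h_{11,t}^{1/2},\ldots,h_{mm,t}^{1/2}\right) R\; \mathrm{diag}\left(h_{11,t}^{1/2},\ldots,h_{mm,t}^{1/2}\right). \qquad (10.16)$$

By construction, the conditional correlations between the components of ε_t are time‐invariant:

$$\frac{h_{k\ell,t}}{h_{kk,t}^{1/2}\,h_{\ell\ell,t}^{1/2}} = \rho_{k\ell}, \qquad k,\ell = 1,\ldots,m.$$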

To complete the specification, the dynamics of the conditional variances h _{kk, t} has to be defined. The simplest constant conditional correlations (CCC) model relies on the following univariate GARCH specifications:

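$$h_{kk,t} = \omega_k + \sum_{i=1}^{q} a_{k,i}\,\varepsilon_{k,t-i}^2 + \sum_{j=1}^{p} b_{k,j}\,h_{kk,t-j}, \qquad k = 1,\ldots,m, \qquad (10.17)$$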

where ω _k > 0, a _{k, i} ≥ 0, b _{k, j} ≥ 0, −1 ≤ ρ _kℓ ≤ 1, ρ _kk = 1, and R is symmetric and positive semi‐definite. Observe that the conditional variances are specified as in the diagonal model. The conditional covariances clearly are not linear in the squares and cross products of the returns.

In a multivariate framework, it seems natural to extend the specification (10.17) by allowing h _{kk, t} to depend not only on its own past, but also on the past of all the variables ε_{ℓ, t} . Set

We have where is a centred vector with covariance matrix R . The components of ε_t thus have the usual expression, , but the conditional variance h _{kk, t} depends on the past of all the components of ε_t .

Note that the conditional covariances are generally non‐linear functions of the components of and of past values of the components of H _t . Model (10.18) is thus not a VEC‐GARCH model, defined by ( 10.9), except when R is the identity matrix.

One advantage of this specification is that a simple condition ensuring the positive definiteness of H_t is obtained by taking matrices A_i and B_j with positive coefficients and a positive definite matrix R. Conrad and Karanasos (2010) showed that less restrictive assumptions ensuring the positive definiteness of H_t can be found. Moreover, there exists a representation of the CCC model in which the matrices B_j are diagonal. Another advantage of the CCC specification is that the study of stationarity is remarkably simple.

Two limitations of the CCC model are, however, (i) its non‐stability by aggregation and (ii) the arbitrary nature of the assumption of constant conditional correlations.

10.2.4 Dynamic Conditional Correlations Models

Dynamic conditional correlations GARCH (DCC‐GARCH) models are an extension of CCC‐GARCH, obtained by introducing dynamics for the conditional correlation. Hence, the constant matrix R in Definition 10.4 is replaced by a matrix R_t which is measurable with respect to the past variables {ε_u, u < t}. For reasons of parsimony, it seems reasonable to choose diagonal matrices A_i and B_j in (10.18), corresponding to univariate GARCH models for each component as in (10.17). Different DCC models are obtained depending on the specification of R_t. A simple formulation is

$$R_t = \theta_1 R + \theta_2 \Psi_{t-1} + \theta_3 R_{t-1}, \qquad (10.19)$$

where the θ_i are positive weights summing to 1, R is a constant correlation matrix, and Ψ_{t−1} is the empirical correlation matrix of ε_{t−1},…, ε_{t−M}. The matrix R_t is thus a correlation matrix (see Exercise 10.9). Equation (10.19) is reminiscent of the GARCH(1, 1) specification, θ₁R playing the role of the parameter ω, θ₂ that of α, and θ₃ that of β.
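One step of a recursion of the form R_t = θ₁R + θ₂Ψ_{t−1} + θ₃R_{t−1} can be sketched as follows. Two choices here are assumptions rather than part of the text: the empirical correlation matrix is computed without centring, and the function and variable names are hypothetical. Since R_t is a convex combination of correlation matrices, its diagonal stays equal to 1 and its off‐diagonal entries stay in [−1, 1].

```python
import math


def empirical_correlation(window):
    """Uncentred correlation matrix of the vectors in `window` (an assumption)."""
    m = len(window[0])
    s = [[sum(e[k] * e[l] for e in window) for l in range(m)] for k in range(m)]
    return [[s[k][l] / math.sqrt(s[k][k] * s[l][l]) for l in range(m)]
            for k in range(m)]


def dcc_update(r_bar, psi_prev, r_prev, theta):
    """Return theta1 * R + theta2 * Psi_{t-1} + theta3 * R_{t-1}."""
    t1, t2, t3 = theta
    if min(theta) <= 0 or abs(t1 + t2 + t3 - 1.0) > 1e-12:
        raise ValueError("weights must be positive and sum to 1")
    m = len(r_bar)
    return [[t1 * r_bar[k][l] + t2 * psi_prev[k][l] + t3 * r_prev[k][l]
             for l in range(m)] for k in range(m)]


window = [(1.0, 0.5), (-0.5, 1.0), (2.0, 1.5)]  # eps_{t-1}, ..., eps_{t-M}, M = 3
r_t = dcc_update([[1.0, 0.2], [0.2, 1.0]], empirical_correlation(window),
                 [[1.0, 0.4], [0.4, 1.0]], (0.1, 0.3, 0.6))
```

The window length M must be at least m for Ψ_{t−1} to have a chance of being positive definite; the example above uses M = 3 with m = 2.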

Another way of specifying the dynamics of R_t is by setting

$$R_t = (\mathrm{diag}\,Q_t)^{-1/2}\, Q_t\, (\mathrm{diag}\,Q_t)^{-1/2},$$

where diag Q_t is the diagonal matrix constructed with the diagonal elements of Q_t, and Q_t is a sequence of covariance matrices which is measurable with respect to σ(ε_u, u < t). In the original DCC model of Engle (2002), the dynamics of Q_t is given by

where denotes the vector of standardized returns, , and is a positive‐definite matrix.

Matrix turns out to be difficult to interpret, and Aielli (2013) pointed out that in general. Thus, the commonly used estimator of defined as the sample second moment of the standardized returns is not consistent in this formulation.

In the so‐called corrected DCC (cDCC) model of Aielli (2013), the dynamics of is reformulated as

under the same constraints on the coefficients. In this model, under stationarity conditions, .

Multiplying the left‐hand side and the right‐hand side of by an arbitrary positive‐definite matrix yields the same conditional correlation matrix . It is thus necessary to introduce an identifiability condition, as for instance imposing that be a correlation matrix.

10.2.5 BEKK‐GARCH Model

The BEKK acronym refers to a specific parameterisation of the MGARCH model developed by Baba, Engle, Kraft, and Kroner, in a preliminary version of Engle and Kroner (1995).

The specification obviously ensures that if the matrices H_{t−i}, i = 1,…, p, are almost surely positive definite, then so is H_t.
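The preservation of positive definiteness can be illustrated on a single recursion step. The sketch below assumes the simplest first‐order BEKK form with one term per lag, H_t = Ω + Aε_{t−1}ε′_{t−1}A′ + BH_{t−1}B′; this form and all names are assumptions made for illustration, since the general definition is not reproduced above.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def bekk_step(omega, a, b, eps_prev, h_prev):
    """One step of the assumed recursion H_t = Omega + A e e' A' + B H_{t-1} B'."""
    outer = [[x * y for y in eps_prev] for x in eps_prev]
    arch = mat_mul(mat_mul(a, outer), transpose(a))   # positive semi-definite
    garch = mat_mul(mat_mul(b, h_prev), transpose(b))  # positive semi-definite
    m = len(omega)
    return [[omega[i][j] + arch[i][j] + garch[i][j] for j in range(m)]
            for i in range(m)]


def is_positive_definite_2x2(h):
    """Sylvester's criterion for a symmetric 2 x 2 matrix."""
    return h[0][0] > 0 and h[0][0] * h[1][1] - h[0][1] * h[1][0] > 0


H = bekk_step([[1.0, 0.0], [0.0, 1.0]],
              [[0.3, 0.1], [0.0, 0.2]],
              [[0.9, 0.0], [0.1, 0.8]],
              (1.0, -2.0),
              [[1.0, 0.5], [0.5, 2.0]])
```

Each of the two quadratic forms is positive semi‐definite whatever the coefficient matrices, so a positive definite Ω is enough to keep H_t positive definite, with no sign constraint on A or B.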

To compare this model with the representation ( 10.9), let us derive the vector form of the equation for H _t . Using the relations (10.10) and (10.11), we get

The model can thus be written in the VEC‐GARCH(p,q) form ( 10.9), with

10.21

for i = 1,…, q and j = 1,…, p. In particular, it can be seen that the number of coefficients of a matrix A⁽ⁱ⁾ in (10.9) is [m(m + 1)/2]², whereas it is Km² in this particular case. The converse inclusion does not hold, however: Stelzer (2008) showed that, for m ≥ 3, there exist VEC‐GARCH models that cannot be represented in the BEKK form.

The BEKK class contains (Exercise 10.13) the diagonal models obtained by choosing diagonal matrices A _ik and B _jk . The following theorem establishes a converse to this property.

Example 10.3 A general and identifiable BEKK representation

Consider the case m = 2, q = 1, and p = 0. Suppose that the distribution η is non‐degenerate, so that there exists no non‐trivial constant linear combination of a finite number of the ε_{k, t − i}ε_{ℓ, t − i} . Let

where Ω is a symmetric positive definite matrix,

with a _{11, 1} ≥ 0, a _{12, 3} ≥ 0, a _{21, 2} ≥ 0 and a _{22, 4} ≥ 0.

Let us show that this BEKK representation is both identifiable and quite general. Easy, but tedious, computation shows that an expression of the form ( 10.9) holds with

In view of the sign constraint, the (1, 1)th element of A ⁽¹⁾ allows us to identify a _{11, 1} . The (1, 2)th and (2, 1)th elements then allow us to find a _{12, 1} and a _{21, 1} , whence the (2, 2)th element yields a _{22, 1} . The two elements of A ₃ are deduced from the (1, 3)th and (2, 3)th elements of A ⁽¹⁾ , and from the constraint a _{12, 3} ≥ 0 (which could be replaced by a constraint on the sign of a _{22, 3} ). A ₂ is identified similarly, and the non‐zero element of A ₄ is finally identified by considering the (3, 3)th element of A ⁽¹⁾ .

In this example, the BEKK representation contains the same number of parameters as the corresponding VEC representation, but has the advantage of automatically providing a positive definite solution H _t .

It is interesting to consider the stability by aggregation of the BEKK class.

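As in the univariate case, the ‘square’ of the (ε_t) process is the solution of an ARMA model. Indeed, define the innovation of the process $(\mathrm{vec}(\varepsilon_t\varepsilon_t'))$:

$$\nu_t = \mathrm{vec}(\varepsilon_t\varepsilon_t') - \mathrm{vec}(H_t). \qquad (10.22)$$

Applying the vec operator, and substituting the variables vec(H_{t−j}) in the model of Definition 10.5 by $\mathrm{vec}(\varepsilon_{t-j}\varepsilon_{t-j}') - \nu_{t-j}$, we get the representation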

10.23

where r = max(p, q), with the convention A _ik = 0 ( B _jk = 0) if i > q ( j > p ). This representation cannot be used to obtain stationarity conditions because the process (ν _t) is not iid in general. However, it can be used to derive the second‐order moment, when it exists, of the process ε_t as

that is,

provided that the matrix in braces is non‐singular.
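The second‐order moment can also be approximated numerically. The sketch below again assumes a first‐order BEKK form with a single term per lag, H_t = Ω + Aε_{t−1}ε′_{t−1}A′ + BH_{t−1}B′ (an assumption, as are all names). Taking expectations, H = E(ε_tε′_t) solves H = Ω + AHA′ + BHB′. Iterating this map from Ω sums the Neumann series of {I − A ⊗ A − B ⊗ B}⁻¹ vec(Ω), which converges when the spectral radius of A ⊗ A + B ⊗ B is less than 1.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def bekk_unconditional_cov(omega, a, b, tol=1e-12, max_iter=10000):
    """Iterate H <- Omega + A H A' + B H B' until the update is below tol."""
    m = len(omega)
    h = [row[:] for row in omega]
    for _ in range(max_iter):
        aha = mat_mul(mat_mul(a, h), transpose(a))
        bhb = mat_mul(mat_mul(b, h), transpose(b))
        new = [[omega[i][j] + aha[i][j] + bhb[i][j] for j in range(m)]
               for i in range(m)]
        gap = max(abs(new[i][j] - h[i][j]) for i in range(m) for j in range(m))
        h = new
        if gap < tol:
            return h
    raise RuntimeError("no convergence: the second-order moment may not exist")


omega = [[1.0, 0.2], [0.2, 1.0]]
H = bekk_unconditional_cov(omega, [[0.3, 0.0], [0.0, 0.3]], [[0.8, 0.0], [0.0, 0.8]])
```

In this scalar example, A = 0.3 I and B = 0.8 I, so the fixed point is Ω/(1 − 0.3² − 0.8²) = Ω/0.27, which gives a direct check of the iteration.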

10.2.6 Factor GARCH Models

In these models, it is assumed that a non‐singular linear combination f _t of the m components of ε_t , or an exogenous variable summarising the co‐movements of the components, has a GARCH structure.

Factor Models with Idiosyncratic Noise

A very popular factor model links individual returns ε_it to the market return f _t through a regression model

$$\varepsilon_{it} = \beta_i f_t + \eta_{it}. \qquad (10.24)$$

The parameter β _i can be interpreted as a sensitivity to the factor, and the noise η _it as a specific risk (often called idiosyncratic risk) which is conditionally uncorrelated with f _t . It follows that H _t = Ω + λ _t β β ^′ , where β = (β₁,…, β_m)^′ is the vector of sensitivities, λ _t is the conditional variance of f _t , and Ω is the covariance matrix of the idiosyncratic terms. More generally, assuming the existence of r conditionally uncorrelated factors, we obtain the decomposition

$$H_t = \Omega + \sum_{j=1}^{r} \lambda_{jt}\,\beta_j\beta_j'. \qquad (10.25)$$

It is not restrictive to assume that the factors are linear combinations of the components of ε_t (Exercise 10.10). If, in addition, the conditional variances λ _jt are specified as univariate GARCH, the model remains parsimonious in terms of unknown parameters and (10.25) reduces to a particular BEKK model (Exercise 10.11). If Ω is chosen to be positive definite and if the univariate series (λ _jt)_t , j = 1,…, r are independent, strictly and second‐order stationary, then it is clear that ( 10.25) defines a sequence of positive definite matrices (H _t) that are strictly and second‐order stationary.

Principal Components GARCH Model

The concept of factor is central to principal components analysis (PCA) and to other methods of exploratory data analysis. PCA relies on decomposing the covariance matrix V of m quantitative variables as V = PΛP ^′ , where Λ is a diagonal matrix whose elements are the eigenvalues λ ₁ ≥ λ ₂ ≥ ⋯ ≥ λ _m of V , and where P is the orthonormal matrix of the corresponding eigenvectors. The first principal component is the linear combination of the m variables, with weights given by the first column of P, which, in some sense, is the factor which best summarises the set of m variables (Exercise 10.12). There exist m principal components, which are uncorrelated and whose variances λ ₁,…, λ _m (and hence whose explanatory powers) are in decreasing order. It is natural to consider this method for extracting the key factors of the volatilities of the m components of ε_t .

We obtain a principal component GARCH (PC‐GARCH) or orthogonal GARCH (O‐GARCH) model by assuming that

$$H_t = P\Lambda_t P', \qquad (10.26)$$

where P is an orthogonal matrix ( P ^′ = P ⁻¹ ) and Λ_t = diag(λ _1t,…, λ _mt), where the λ _it are the volatilities, which can be obtained from univariate GARCH‐type models. This is equivalent to assuming

10.27

where f _t = P ^′ε_t is the principal component vector, whose components are orthogonal factors. If univariate GARCH(1, 1) models are used for the factors , then

10.28
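The construction can be sketched in a few lines: project the returns on the columns of P, filter each factor with a univariate GARCH(1, 1), and recombine. The parameterisation λ_{j,t} = ω_j + α_j f²_{j,t−1} + β_j λ_{j,t−1}, the initialisation at the unconditional variance, and all names below are assumptions made for illustration; in practice P would be estimated by PCA rather than fixed in advance.

```python
def garch11_filter(x, omega, alpha, beta):
    """Conditional variances of an assumed GARCH(1,1) along the series x."""
    lam = [omega / (1.0 - alpha - beta)]  # start at the unconditional variance
    for value in x[:-1]:
        lam.append(omega + alpha * value ** 2 + beta * lam[-1])
    return lam


def ogarch_covariances(eps, p, params):
    """Return the list of H_t = P diag(lambda_t) P', with factors f_t = P' eps_t."""
    m = len(p)
    factors = [[sum(p[i][j] * e[i] for i in range(m)) for j in range(m)]
               for e in eps]
    lams = [garch11_filter([f[j] for f in factors], *params[j]) for j in range(m)]
    return [[[sum(p[a][j] * lams[j][t] * p[b][j] for j in range(m))
              for b in range(m)] for a in range(m)] for t in range(len(eps))]


c = 0.5 ** 0.5
P = [[c, -c], [c, c]]                      # orthogonal matrix (rotation by 45 degrees)
eps = [(1.0, 1.0), (0.5, -0.5), (0.0, 2.0)]
params = [(0.1, 0.1, 0.8), (0.2, 0.05, 0.9)]  # (omega, alpha, beta) for each factor
H = ogarch_covariances(eps, P, params)
```

Because P is orthogonal, the trace of each H_t equals the sum of the factor variances λ_{1t} + ⋯ + λ_{mt}, and only m univariate recursions have to be run.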

Remark 10.4 Interpretation, factor estimation, and extensions

Model (10.26) can also be interpreted as a full‐factor GARCH (FF‐GARCH) model, that is, a model with as many factors as components and no idiosyncratic terms. Let P(⋅, j) be the j th column of P (an eigenvector of H_t associated with the eigenvalue λ_jt). We get a spectral expression for the conditional variance,

$$H_t = \sum_{j=1}^{m} \lambda_{jt}\,P(\cdot,j)\,P(\cdot,j)',$$

which is of the form (10.25) with an idiosyncratic variance Ω = 0.
A PCA of the conditional variance H _t should, in full generality, give with factors (that is, principal components) . Model ( 10.26) thus assumes that all factors are linear combinations, with fixed coefficients, of the same returns ε_it . For instance, the first factor f _1t is the conditionally most risky factor (with the largest conditional variance λ _1t , see Exercise 10.12). But since it is assumed that the direction of f _1t is fixed, in the subspace of ℝ^m generated by the components of ε_it , the first factor is also the most risky unconditionally. This can be seen through the PCA of the unconditional variance H = EH _t = PΛP ^′ , which is assumed to exist.
It is easy to estimate P by applying PCA to the empirical variance , where . The components of are specified as GARCH‐type univariate models. Estimation of the conditional variance thus reduces to estimating m univariate models.
It is common practice to apply PCA on centred and standardised data, in order to remove the influence of the units of the various variables. For returns ε_it , standardisation does not seem appropriate if one wishes to retain a size effect, that is, if one expects an asset with a relatively large variance to have more weight in the riskier factors.
In the spirit of the standard PCA, it is possible to only consider the first r principal components, which are the key factors of the system. The variance H _t is thus approximated by
10.29

where the is estimated from simple univariate models, such as GARCH(1, 1) models of the form (10.28), the matrix is obtained from PCA of the empirical covariance matrix , and the factors are approximated by . Instead of the approximation (10.29), one can use

10.30

The approximation in (10.30) is as simple as ( 10.29) and does not require additional computations (in particular, the r GARCH equations are retained) but has the advantage of providing an almost surely invertible estimation of H _t (for fixed n ), which is required in the computation of certain statistics (such as the AIC‐type information criteria based on the Gaussian log‐likelihood).
Note that the assumption that P is orthogonal can be restrictive. The class of generalised orthogonal GARCH (GO‐GARCH) processes assumes only that P is any non‐singular matrix.

10.2.7 Cholesky GARCH

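Suppose that the conditional covariance matrix H_t of ε_t is positive‐definite, i.e. that the components of ε_t are not multicollinear. Given the information ℱ_{t−1} generated by the past values of ε_t, let ℓ_{21,t} be the conditional beta in the regression of ε_2t on υ_1t ≔ ε_1t. One can write

$$\varepsilon_{2t} = \ell_{21,t}\,\upsilon_{1t} + \upsilon_{2t} = \beta_{21,t}\,\varepsilon_{1t} + \upsilon_{2t},$$

with β_{21,t} = ℓ_{21,t} ∈ ℱ_{t−1}, and υ_2t is orthogonal to ε_1t conditionally on ℱ_{t−1}. More generally, we have

$$\varepsilon_{it} = \sum_{j=1}^{i-1} \ell_{ij,t}\,\upsilon_{jt} + \upsilon_{it} = \sum_{j=1}^{i-1} \beta_{ij,t}\,\varepsilon_{jt} + \upsilon_{it}, \qquad i = 2,\ldots,m, \qquad (10.31)$$

where υ_it is uncorrelated with υ_1t,…, υ_{i−1,t}, and thus uncorrelated with ε_1t,…, ε_{i−1,t}, conditionally on ℱ_{t−1}. In matrix form, Eq. (10.31) is written as

$$\varepsilon_t = L_t\upsilon_t \qquad \text{and} \qquad \upsilon_t = B_t\varepsilon_t,$$

where L_t and $B_t = L_t^{-1}$ are lower unitriangular (i.e. triangular with 1 on the diagonal) matrices, with ℓ_{ij,t} (respectively −β_{ij,t}) at row i and column j of L_t (respectively B_t) for i > j. For m = 3, we have

$$L_t = \begin{pmatrix} 1 & 0 & 0 \\ \ell_{21,t} & 1 & 0 \\ \ell_{31,t} & \ell_{32,t} & 1 \end{pmatrix}, \qquad B_t = \begin{pmatrix} 1 & 0 & 0 \\ -\beta_{21,t} & 1 & 0 \\ -\beta_{31,t} & -\beta_{32,t} & 1 \end{pmatrix}.$$

The vector υ_t of the error terms in the linear regressions (10.31) can be interpreted as a vector of orthogonal factors, whose covariance matrix is G_t = diag(g_1t,…, g_mt) with g_it > 0 for i = 1,…, m. We then obtain the so‐called Cholesky decomposition of the covariance matrix of ε_t:

$$H_t = L_t G_t L_t'. \qquad (10.32)$$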

Note that the Cholesky decomposition also extends to positive semi‐definite matrices.

Taking , we obtain . A simple parametric form of the volatility of the i th factor υ _it could be of the GARCH(1,1)‐type

10.33

In view of the results of Chapter 2, a necessary and sufficient condition for the existence of a strictly stationary process (υ_t) in this parameterisation is

10.34

Of course, Eq. (10.33) could also include explanatory variables with j ≠ i , or other variables belonging .

To obtain a complete parametric specification for H _t , it now suffices to specify a time series model for the conditional betas, i.e. for the elements of L _t (or alternatively for those of B _t ). Note that this model can be quite general because, to ensure the positive‐definiteness of H _t , the conditional betas are not a priori subject to any constraint.
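The absence of constraints can be seen on a small example: for any real entries below the diagonal of a unit lower triangular L_t and any positive g_it, the product L_tG_tL′_t is symmetric and positive definite, with determinant g_1t ⋯ g_mt. The sketch below builds H_t from such inputs and recovers the orthogonal factors from the returns by forward substitution; the function names and the numerical values are hypothetical.

```python
def cholesky_garch_cov(l_lower, g):
    """H = L G L' for a unit lower triangular L and G = diag(g), with g > 0."""
    if min(g) <= 0:
        raise ValueError("factor variances must be positive")
    m = len(g)
    return [[sum(l_lower[i][k] * g[k] * l_lower[j][k] for k in range(m))
             for j in range(m)] for i in range(m)]


def factors_from_returns(l_lower, eps):
    """Solve eps = L v for v by forward substitution, i.e. v = L^{-1} eps."""
    v = []
    for i, e in enumerate(eps):
        v.append(e - sum(l_lower[i][j] * v[j] for j in range(i)))
    return v


L = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [-2.0, 3.0, 1.0]]   # arbitrary entries below the diagonal
g = [1.0, 2.0, 0.5]      # positive factor variances
H = cholesky_garch_cov(L, g)
v = factors_from_returns(L, [1.0, -0.5, -3.0])
```

Here the returns (1, −0.5, −3) correspond to the factors (1, −1, 2), and the determinant of H equals the product of the g_i, which is 1.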

For instance, one can assume a dynamics of the form

10.35

where f _ij is a real‐valued function, depending on υ _t − 1 and on some parameter θ , and c _ij is a real coefficient. Under the conditions ∣c _ij ∣ < 1, and the condition (10.34) ensuring the existence of the stationary sequence (υ _t), there exists a stationary solution to Eq. (10.35). The Cholesky decomposition (10.32) then provides a model for the conditional covariance matrix H _t .

The absence of strong constraints on the coefficients constitutes an attractive feature of the Cholesky decomposition (10.32), compared in particular to the DCC decomposition, for which the non‐explicit positive definiteness constraints on R_t make the determination of stationarity conditions challenging. Another obvious advantage of the Cholesky approach is that it provides a direct way to predict the conditional betas, which are of primary interest in some financial applications (see Engle 2016).

Now, let us briefly explore the relationships between the Cholesky decomposition and the factor models. Suppose that the first k < m columns of the matrix L_t are constant so that L_t = [P : P_t] with P a non‐random matrix of size m × k and P_t of size m × (m − k). The Cholesky GARCH can then be interpreted as a factor model:

$$\varepsilon_t = P f_t + u_t,$$

where f _t = (υ _1t,…, υ _kt)^′ is a vector of k orthogonal factors, P is called the loading matrix and u_t = P _t(υ _{k + 1, t},…, υ _mt)^′ is a so‐called idiosyncratic disturbance (independent of f _t ). If ε_t is a vector of excess returns (i.e. each component is the return of a risky asset minus the return of a risk free asset), and if the first component is the excess return of the market portfolio, in the one‐factor case ( k = 1) the previous factor model corresponds to the Capital Asset Pricing Model (CAPM) of Sharpe (1964), Lintner (1965), and Merton (1973). More precisely, we have P = (1, β ^′)^′ , where β represents the ‘sensitivity of returns to market returns’, and . Denoting by r _t = (ε_2t,…, ε_mt)^′ the vector of excess returns, the last (m − 1) lines of the one‐factor model gives the CAPM equation

where υ _1t is the market excess return.

10.3 Stationarity

In this section, we will first discuss the difficulty of establishing stationarity conditions, or the existence of moments, for MGARCH models. For the general vector model ( 10.9), and in particular for the BEKK model, there exist sufficient stationarity conditions. The stationary solution being non‐explicit, we propose an algorithm that converges, under certain assumptions, to the stationary solution. We will then see that the problem is much simpler for the CCC model ( 10.18).

10.3.1 Stationarity of VEC and BEKK Models

It is not possible to provide stationary solutions, in explicit form, for the general VEC model ( 10.9). To illustrate the difficulty, recall that a univariate ARCH(1) model admits a solution ε_t = σ _t η _t with σ _t explicitly given as a function of {η _t − u, u > 0} as the square root of

provided that the series converges almost surely. Now consider a bivariate model of the form ( 10.6) with where α is assumed, for the sake of simplicity, to be scalar and positive. Also choose to be a lower triangular so as to have Eq. (10.7). Then

It can be seen that given η _t − 1 , the relationship between h _{11, t} and h _{11, t − 1} is linear, and can be iterated to yield

under the constraint . In contrast, the relationships between h _{12, t} , or h _{22, t} , and the components of H _t − 1 are not linear, which makes it impossible to express h _{12, t} and h _{22, t} as a simple function of α , {η _t − 1, η _t − 2,…, η _t − k} and H _t − k for k ≥ 1. This constitutes a major obstacle for determining sufficient stationarity conditions.

Remark 10.5 Stationarity does not follow from the ARMA model

Similar to (10.22), letting , we obtain the ARMA representation

by setting C ⁽ⁱ⁾ = A ⁽ⁱ⁾ + B ⁽ⁱ⁾ and by using the usual notation and conventions. In the literature, one may encounter the argument that the model is weakly stationary if the polynomial has all its roots outside the unit circle ( s = m(m + 1)/2). Although the result is certainly true with additional assumptions on the noise density (see Theorem 10.5 and the subsequent discussion), the argument is not correct since

constitutes a solution only if can be expressed as a function of {η _t − u, u > 0}.

Boussama, Fuchs and Stelzer (2011) (see also Boussama (2006)) obtained the following stationarity condition. Recall that ρ(A) denotes the spectral radius of a square matrix A .

In the particular case of the BEKK model of Definition 10.5, condition (iii) takes the form

The proof of Theorem 10.5 relies on sophisticated algebraic tools. Assumption (ii) is a standard technical condition for showing the β ‐mixing property (but is of no use for stationarity). Note that condition (iii), written as in the univariate case, is generally not necessary for the strict stationarity.

This theorem does not provide explicit stationary solutions, that is, a relationship between ε_t and the η _t − i . However, it is possible to construct an algorithm which, when it converges, allows a stationary solution to the vector GARCH model ( 10.9) to be defined.

Construction of a Stationary Solution

For any t, k ∈ ℤ, we define

and, recursively on k ≥ 0,

10.36

with .

Observe that, for k ≥ 1,

where f _k is a measurable function and H ^(k) is a square matrix. The processes and are thus stationary with components in the Banach space L ² of the (equivalence classes of) square integrable random variables. It is then clear that Eq. ( 10.9) admits a strictly stationary solution, which is non‐anticipative and ergodic, if, for all t ,

10.37

Indeed, letting and , and taking the limit of each side of (10.36), we note that ( 10.9) is satisfied. Moreover, (ε_t) constitutes a strictly stationary and non‐anticipative solution, because ε_t is a measurable function of {η _u, u ≤ t}. In view of Theorem A.1, such a process is also ergodic. Note also that if H _t exists, it is symmetric and positive definite because the matrices are symmetric and satisfy

This solution (ε_t) is also second‐order stationary if

10.38

Let

From Exercise 10.8 and its proof, we obtain (10.37), and hence the existence of a strictly stationary solution to the vector GARCH model (10.9), if there exists ρ ∈ (0, 1) such that almost surely as k → ∞, which is equivalent to

10.39

Similarly, we obtain (10.38) if . The criterion in (10.39) is not very explicit but the left‐hand side of the inequality can be evaluated by simulation, just as for a Lyapunov coefficient.

10.3.2 Stationarity of the CCC Model

In model ( 10.18), letting , we get

Multiplying by ϒ_t the equation for , we thus have

which can be written

10.40

where

and

10.41

is a (p + q)m × (p + q)m matrix.

We obtain a vector representation, analogous to (2.16) obtained in the univariate case. This allows us to state the following result.

Proof.

The proof is similar to that of Theorem 2.4. The variables η _t admitting a variance, the condition E log⁺‖A _t‖ < ∞ is satisfied.

It follows that when γ < 0, the series

10.42

converges almost surely for all t . A strictly stationary solution to model ( 10.18) is obtained as , where denotes the (q + 1)th sub‐vector of size m of . This solution is thus non‐anticipative and ergodic. The proof of the uniqueness is exactly the same as in the univariate case.

The proof of the necessary part can also be easily adapted. From Lemma 2.1, it is sufficient to prove that . It suffices to show that, for 1 ≤ i ≤ p + q ,

10.43

where and e _i is the i th element of the canonical basis of ℝ^{p + q} , since any vector x of ℝ^{m(p + q)} can be uniquely decomposed as , where x _i ∈ ℝ^m . As in the univariate case, the existence of a strictly stationary solution implies that tends to 0, almost surely, as k → ∞. It follows that, using the relation we have

10.44

Since the components of are strictly positive, condition (10.43) thus holds for i = q + 1. Using

10.45

with the convention that , for i = 1 we obtain

where the inequalities are taken componentwise. Therefore, condition (10.43) holds true for i = q + 2, and by induction, for i = q + j, j = 1,…, p in view of Eq. (10.45). Since with positive probability, the first convergence in (10.44) implies that (10.43) holds for . Since for , we can show by an ascending recursion that (10.43) holds for . Finally, since , (10.43) also holds for .
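The condition γ < 0 is rarely available in closed form, but γ can be approximated by simulation. The following is a minimal sketch for the CCC‐GARCH(1, 1) case only. It assumes Gaussian η _t and assumes that, for p = q = 1, the matrix in (10.41) reduces to the 2m × 2m block matrix with block rows (ϒ_t A, ϒ_t B) and (A, B), where ϒ_t is the diagonal matrix of the squared components of R^{1/2} η _t . The function name, the number of iterations and the seed are illustrative choices, not part of the text.

```python
import numpy as np

def top_lyapunov_ccc11(A, B, R, n_iter=20000, seed=0):
    """Monte Carlo estimate of the top Lyapunov exponent for a CCC-GARCH(1,1).

    Assumed block form (illustrative reading of the model, p = q = 1):
        A_t = [[Upsilon_t A, Upsilon_t B],
               [A,           B          ]]
    with Upsilon_t = diag of squared components of R^{1/2} eta_t, eta_t ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m = A.shape[0]

    # Symmetric positive definite square root of the correlation matrix R.
    eigval, eigvec = np.linalg.eigh(np.asarray(R, dtype=float))
    R_half = (eigvec * np.sqrt(eigval)) @ eigvec.T

    bottom = np.hstack([A, B])               # block row (A, B)
    x = np.ones(2 * m) / np.sqrt(2 * m)      # positive unit starting vector
    log_growth = 0.0
    for _ in range(n_iter):
        eta_tilde = R_half @ rng.standard_normal(m)
        upsilon = eta_tilde ** 2             # diagonal of Upsilon_t
        A_t = np.vstack([upsilon[:, None] * bottom, bottom])
        x = A_t @ x
        norm = np.linalg.norm(x)
        log_growth += np.log(norm)           # accumulate log of the growth factor
        x /= norm                            # renormalise to avoid under/overflow
    return log_growth / n_iter

gamma_hat = top_lyapunov_ccc11(0.05 * np.eye(2), 0.9 * np.eye(2), np.eye(2))
print(gamma_hat)  # a negative value supports strict stationarity
```

For m = 1 the matrices A_t have rank one and the estimate should be close to E log(a η _t ² + b), the familiar univariate GARCH(1, 1) coefficient, which gives a simple sanity check.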

The following result provides a necessary strict stationarity condition which is simple to check.

10.3.3 Stationarity of DCC models

Stationarity conditions for the DCC model, with the specification (10.19) for R _t, have been established by Fermanian and Malongo (2016). However, except in very specific cases, such conditions are not explicit. The corrected DCC (cDCC) model allows for more tractable conditions. Sufficient stationarity conditions, based on the results by Boussama, Fuchs and Stelzer (2011), are provided for the cDCC model by Aielli (2013).

10.4 QML Estimation of General MGARCH

In this section, we define the quasi‐maximum likelihood estimator (QMLE) of a general MGARCH model, and we provide high‐level assumptions entailing its strong consistency and asymptotic normality (CAN). In the next section, these assumptions will be made more explicit for the particular case of the CCC‐GARCH model.

Assume a general parametric conditional variance H _t = H _t(θ ₀), with an unknown d ‐dimensional parameter θ ₀ . Let Θ be a compact parameter space which contains θ ₀ . For all θ = (θ ₁,…, θ _d)^′ ∈ Θ, assume that

10.46

For particular MGARCH models, it is possible to write H _t as a measurable function of {η _u, u < t}, which entails that (ε_t) is stationary and ergodic (see the ergodic theorem, Theorem A.1). In (10.46), the matrix H _t(θ) is written as a function of the past observations. A model satisfying this requirement is said to be invertible at θ . For prediction purposes, it is obviously necessary that a model of parameter θ ₀ be invertible at θ ₀ . For estimating θ ₀ , it seems also crucial that the model possesses this property at any θ ∈ Θ, since the estimation methods are typically based on comparisons between and H _t(θ), for all observations ε_t and all values of θ ∈ Θ.

Given observations ε₁,…, ε_n , and arbitrary fixed initial values for i ≤ 0, let the statistics

A QMLE of θ ₀ is defined as any measurable solution of

10.47

Note that the QMLE would simply be the maximum likelihood estimator if the conditional distribution of ε_t were Gaussian with mean zero and variance H _t .
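To make the definition concrete, here is a minimal sketch of the Gaussian quasi‐likelihood criterion. It assumes the criterion has the usual form n⁻¹ Σ_t {ε_t ′ H _t ⁻¹ ε_t + log det H _t }, which is the form implied by the Gaussian interpretation just given (up to constants and a factor −2). The function name and the array layout are illustrative, and the conditional variance matrices are taken as given rather than computed from a particular model.

```python
import numpy as np

def gaussian_qml_criterion(eps, H):
    """Average of eps_t' H_t^{-1} eps_t + log det H_t over t.

    eps : array of shape (n, m), the observations
    H   : array of shape (n, m, m), positive definite conditional variances
    """
    eps = np.asarray(eps, dtype=float)
    H = np.asarray(H, dtype=float)
    total = 0.0
    for e_t, H_t in zip(eps, H):
        sign, logdet = np.linalg.slogdet(H_t)
        if sign <= 0:
            raise ValueError("each H_t must be positive definite")
        # Solve H_t x = eps_t instead of inverting H_t explicitly.
        total += e_t @ np.linalg.solve(H_t, e_t) + logdet
    return total / len(eps)

# One observation with H_t = I: the criterion is simply the squared norm.
print(gaussian_qml_criterion([[1.0, 2.0]], [np.eye(2)]))
```

A QMLE is then obtained by minimising this quantity over θ, with H _t replaced by the statistic computed from the observations and the initial values.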

10.4.1 Asymptotic Properties of the QMLE

Let ,

Recall that ρ denotes a generic constant belonging to [0, 1), and K denotes a positive constant or a positive random variable measurable with respect to {ε_u, u < 0} (and thus which does not depend on n ). The regularity conditions (which do not depend on the choice of the norms) are the following.

A1: a.s.
A2: a.s.
A3: E‖ε_t‖^s < ∞ and E‖H _t(θ ₀)‖^s < ∞ for some s > 0.
A4: For θ ∈ Θ, H _t(θ) = H _t(θ ₀) a.s. implies θ = θ ₀ .
A5: For any sequence x ₁, x ₂,… of vectors of ℝ^m , the function θ ↦ H(x ₁, x ₂,…; θ) is continuous on Θ.
A6: θ ₀ belongs to the interior of Θ.
A7: For any sequence x ₁, x ₂,… of vectors of ℝ^m , the function θ ↦ H(x ₁, x ₂,…; θ) admits continuous second‐order derivatives.
A8: For some neighbourhood V(θ ₀) of θ ₀ ,
A9: For some neighbourhood V(θ ₀) of θ ₀ , for all i, j ∈ {1,…, m} and p > 1, q > 2 and r > 2 such that 2q ⁻¹ + 2r ⁻¹ = 1 and p ⁻¹ + 2r ⁻¹ = 1, we have ⁵
A10: E ∥ η _t∥⁴ < ∞.
A11: The matrices {∂ H _t(θ ₀)/∂θ _i, i = 1,…, d} are linearly independent with non‐zero probability.

In the case m = 1, Assumption A1 requires that the volatility be bounded away from zero uniformly on Θ. For a standard GARCH satisfying ω > 0 and Θ compact, the assumption is satisfied. Assumption A2 is related to the invertibility and entails that H _t(θ) is well estimated by the statistic when t is large. For some models it will be useful to replace A2 by the weaker, but more complicated, assumption

A2': where ρ _t is a random variable satisfying , for some p > 1 and s ∈ (0, 1) satisfying A3.

This assumption, as well as A8, will be used to show that the choice of the initial values is asymptotically unimportant. Corollary 10.2 shows that, for some particular models, the strict stationarity implies the existence of marginal moments, as required in A3. Assumption A4 is an identifiability condition. As shown in Section 8.2, A6 is necessary to obtain the asymptotic normality (AN) of the QMLE. Assumptions A9 and A10 are used to show the existence of the information matrices I and J involved in the sandwich form J ⁻¹ IJ ⁻¹ of the asymptotic variance of the QMLE. Assumption A11 is used to show the invertibility of J .
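As an illustration of the sandwich form, the sketch below combines estimates of I and J into J ⁻¹ IJ ⁻¹ and derives standard errors. It assumes that consistent estimates of the two matrices are already available and that the asymptotic normality is stated for √n times the estimation error; the function names are hypothetical.

```python
import numpy as np

def sandwich_variance(I_hat, J_hat):
    """Return J^{-1} I J^{-1} from estimates of I and J."""
    J_inv = np.linalg.inv(np.asarray(J_hat, dtype=float))
    return J_inv @ np.asarray(I_hat, dtype=float) @ J_inv

def standard_errors(I_hat, J_hat, n):
    """Standard errors of the d parameter estimates for a sample of size n."""
    return np.sqrt(np.diag(sandwich_variance(I_hat, J_hat)) / n)

print(standard_errors(np.eye(2), 2.0 * np.eye(2), n=100))
```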

Remark 10.6 Relative usefulness of (too) general CAN results

The interest of the previous theorem is to highlight that, under a set of regularity conditions that seem reasonable, the QMLE of a GARCH model is CAN. The section devoted to the CCC model will reveal, however, that the regularity conditions are not always straightforwardly verifiable on particular specifications. Moreover, one has to be aware that the MLE/QMLE is not always consistent. The statistical literature provides numerous examples of inconsistent MLE (see e.g. Le Cam 1990). In the framework of GARCH models, checking the irrelevance of the initial value is an important point that should not be neglected. The point is related to the invertibility. As seen in Chapter 4, for some stationary EGARCH processes, Assumption A2 (or even A2^′) may not hold and the MLE may fail.

10.5 Estimation of the CCC Model

We now turn to the estimation of the m ‐dimensional CCC‐GARCH(p, q) model by the quasi‐maximum likelihood method. Recall that (ε_t) is called a CCC‐GARCH(p, q) if it satisfies

10.50

where R is a correlation matrix, is a vector of size m × 1 with strictly positive coefficients, the A _i and B _j are matrices of size m × m with positive coefficients, and (η _t) is a sequence of iid centred variables in ℝ^m with identity covariance matrix.

As in the univariate case, the criterion is written as if the iid process were Gaussian.

The parameters are the coefficients of the matrices , A _i and B _j , and the coefficients of the lower triangular part (excluding the diagonal) of the correlation matrix R = (ρ _ij). The number of unknown parameters is thus s ₀ = m + (p + q)m ² + m(m − 1)/2.

The parameter vector is denoted by

where ρ ^′ = (ρ ₂₁,…, ρ _m1, ρ ₃₂,…, ρ _m2,…, ρ _{m, m − 1}), α _i = vec(A _i), i = 1,…, q, and β _j = vec(B _j), j = 1,…, p . The parameter space is a subspace Θ of

The true parameter value is denoted by
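The layout of the parameter vector can be sketched in code. The block below assumes the ordering (intercept vector, then α ₁,…, α _q , then β ₁,…, β _p , then ρ); this ordering is an assumption made for illustration, whereas the column‐by‐column ordering of ρ and the use of vec (column stacking) follow the description above. The function name is hypothetical.

```python
import numpy as np

def pack_ccc_parameters(omega, A_list, B_list, R):
    """Stack the CCC-GARCH(p, q) parameters into one vector.

    Assumed order: omega, vec(A_1), ..., vec(A_q), vec(B_1), ..., vec(B_p), rho,
    where rho = (rho_21, ..., rho_m1, rho_32, ..., rho_m2, ..., rho_{m,m-1}).
    """
    omega = np.asarray(omega, dtype=float)
    m = omega.size
    # triu_indices lists pairs (j, i) with j < i, ordered by j and then i;
    # reading R at (i, j) gives the sub-diagonal entries column by column.
    cols, rows = np.triu_indices(m, k=1)
    rho = np.asarray(R, dtype=float)[rows, cols]
    parts = [omega]
    parts += [np.asarray(A, dtype=float).flatten(order="F") for A in A_list]
    parts += [np.asarray(B, dtype=float).flatten(order="F") for B in B_list]
    parts.append(rho)
    return np.concatenate(parts)

m, p, q = 3, 1, 1
R = np.array([[1.0, 0.1, 0.2], [0.1, 1.0, 0.3], [0.2, 0.3, 1.0]])
theta = pack_ccc_parameters(np.ones(m), [0.05 * np.eye(m)], [0.9 * np.eye(m)], R)
print(theta.size)  # equals m + (p + q) * m**2 + m * (m - 1) // 2
```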

Before detailing the estimation procedure and its properties, we discuss the conditions that need to be imposed on the matrices A _i and B _j in order to ensure the uniqueness of the parameterisation.

10.5.1 Identifiability Conditions

Let By convention, if q = 0 and ℬ_θ(z) = I _m if p = 0.

If ℬ_θ(z) is non‐singular, that is, if the roots of det(ℬ_θ(z)) = 0 are outside the unit disk, we deduce from the representation

10.51

In the vector case, assuming that the polynomials and have no common root is insufficient to ensure that there exists no other pair , with the same degrees (p, q), such that

10.52

This condition is equivalent to the existence of an operator U(B) such that

10.53

this common factor vanishing in ℬ _θ(B)⁻¹ 풜 _θ(B) (Exercise 10.2).

The polynomial U(B) is called unimodular if det{U(B)} is a non‐zero constant. When the only common factors of the polynomials P(B) and Q(B) are unimodular, that is, when

then P(B) and Q(B) are called left coprime.

The following example shows that, in the vector case, assuming that and are left coprime is insufficient to ensure that condition (10.52) has no solution θ ≠ θ ₀ (in the univariate case this is sufficient because the condition imposes U(B) = U(0) = 1).

Example 10.4 Non‐identifiable bivariate model

For m = 2, let

with

and

The polynomial has the same degree q as , and is a polynomial of the same degree p as . On the other hand, U(B) has a nonzero determinant which is independent of B , hence it is unimodular. Moreover, and It is thus possible to find θ such that ℬ(B) = ℬ _θ(B), 풜(B) = 풜 _θ(B) and . The model is thus non‐identifiable, θ and θ ₀ corresponding to the same representation (10.51).

Identifiability can be ensured by several types of conditions; see Reinsel (1997, pp. 37–40), Hannan (1976) or Hannan and Deistler (1988, Section 2.7). To obtain a mild condition define, for any column i of the matrix operators 풜 _θ(B) and ℬ _θ(B), the maximal degrees q _i(θ) and p _i(θ), respectively. Suppose that maximal values are imposed for these orders, that is,

10.54

where q _i ≤ q and p _i ≤ p are fixed integers. Denote by (resp. ) the column vector of the coefficients of (resp. ) in the i th column of (resp. ).

10.5.2 Asymptotic Properties of the QMLE of the CCC‐GARCH model

For particular classes of MGARCH models, the assumptions of the previous section can be made more explicit. Let us consider the CCC‐GARCH model. Let (ε₁,…, ε_n) be an observation of length n of the unique non‐anticipative and strictly stationary solution (ε_t) of model (10.50). Conditionally on non‐negative initial values , the Gaussian quasi‐likelihood is written as

where the are recursively defined, for t ≥ 1, by

A QMLE of θ is defined as in ( 10.47) by:

10.56

Remark 10.7 Choice of initial values

It will be shown later that, as in the univariate case, the initial values have no influence on the asymptotic properties of the estimator. These initial values can be fixed, for instance, so that

They can also be chosen as functions of θ , such as

or as random variable functions of the observations, such as

where the first r = max {p, q} observations are denoted by ε_{1 − r},…, ε₀ .

Let γ(A ₀) denote the top Lyapunov coefficient of the sequence of matrices A ₀ = ( A _0t) defined as in ( 10.41), at θ = θ ₀ . The following assumptions will be used to establish the strong consistency of the QMLE.

CC1: θ ₀ ∈ Θ and Θ is compact.
CC2: γ(A ₀) < 0 and, for all θ ∈ Θ , det ℬ(z) = 0 ⇒ ∣ z ∣ > 1.
CC3: The components of η _t are independent and their squares have non‐degenerate distributions.
CC4: If p > 0, then and are left coprime and has full rank m .
CC5: R is a positive definite correlation matrix for all θ ∈ Θ .

If the space Θ is constrained by ( 10.54), that is, if maximal orders are imposed for each component of and in each equation, then Assumption CC4 can be replaced by the following more general condition:

CC4^′: If p > 0, then and are left coprime and has full rank m .

It will be useful to approximate the sequence by an ergodic and stationary sequence. Assumption CC2 implies that, for all θ ∈ Θ , the roots of ℬ_θ(z) are outside the unit disk. Denote by the strictly stationary, non‐anticipative and ergodic solution of

10.57

Now, letting and H _t = D _t RD _t , we define

Let and , where R ^1/2 is the symmetric positive definite square root of R . We also set and We are now in a position to state the following consistency theorem.

To establish the AN we require the following additional assumptions:

CC6: , where is the interior of Θ .
CC7:

10.6 Looking for Numerically Feasible Estimation Methods

Despite the wide range of applicability of the QML estimation method, this approach may entail formidable numerical difficulties when the dimension of the vector of financial returns under study is large. In asset pricing applications or portfolio management, practitioners may have to handle cross sections of hundreds – even thousands – of stocks. Whatever the class of MGARCH used, the number of unknown parameters increases dramatically as the dimension of the cross section increases. This is particularly problematic when the Gaussian QML estimation method is used, because the high‐dimensional conditional variance matrix has to be inverted at every step of the optimisation procedure.

In this section, we present two methods aiming at alleviating the dimensionality curse.

10.6.1 Variance Targeting Estimation

Variance targeting (VT) estimation is a two‐step procedure in which the unconditional variance–covariance matrix of the returns vector process is estimated by a moment estimator in a first step. VT is based on a reparameterisation of the MGARCH model, in which the matrix of intercepts in the volatility equation is replaced by the unconditional covariance matrix.

To be more specific, consider the CCC‐GARCH( p, q ) model

10.58

where and R ₀ is a correlation matrix, A _0i and B _0j are m × m matrices with positive coefficients. Model (10.58) admits a strict and second‐order non‐anticipative stationary solution (ε_t) when

A: the spectral radius of is strictly less than 1.

Moreover, under this assumption, we have that

10.59

It follows that the last equation in model ( 10.58) can be equivalently written

10.60

With this new formulation, the generic parameter value consists of the coefficients of the vector h and the matrices A _i and B _j (corresponding to the true values h ₀, A _0i and B _0j , respectively), and the coefficients of the lower triangular part (excluding the diagonal) of the correlation matrix R = (ρ _ij). One advantage of this parameterisation over the initial one is that the parameter h ₀ can be straightforwardly estimated by the empirical mean

In the VT estimation method, the components of h are thus estimated empirically in a first step, and the other parameters are estimated in a second step, via a QML optimisation. Under appropriate assumptions, the resulting estimator is shown to be consistent and asymptotically normal (see Francq, Horváth, and Zakoian 2016).
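The first step of the VT method can be sketched as follows. The code assumes that (10.59) identifies h ₀ with the vector of second moments E(ε_kt ²), k = 1,…, m, and that the intercept is recovered as (I _m − Σ A _i − Σ B _j) h, as suggested by the reparameterisation (10.60). The function name is hypothetical and the second (QML) step is not shown.

```python
import numpy as np

def variance_targeting_intercept(eps, A_list, B_list):
    """First VT step: estimate h by the empirical mean of the squared returns,
    then return the intercept implied by given matrices A_i and B_j."""
    eps = np.asarray(eps, dtype=float)
    h_hat = np.mean(eps ** 2, axis=0)            # componentwise second moments
    m = h_hat.size
    persistence = np.zeros((m, m))
    for M in list(A_list) + list(B_list):
        persistence += np.asarray(M, dtype=float)
    omega_implied = (np.eye(m) - persistence) @ h_hat
    return h_hat, omega_implied

eps = np.array([[1.0, 2.0], [3.0, 4.0]])
h_hat, omega = variance_targeting_intercept(eps, [0.05 * np.eye(2)], [0.9 * np.eye(2)])
print(h_hat, omega)
```

In the second step only the A _i , B _j and the correlations remain to be optimised, which removes m parameters from the numerical optimisation.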

One interest of this procedure is computational. Table 10.1 shows the reduction of computational time compared to the full quasi‐maximum likelihood (FQML), when both the dimension m of the series and the sample size n vary. For both methods, the computation time increases rapidly with m , but the relative time‐computation gain does not depend much on m , nor on n .

Table 10.1 Seconds of CPU time for computing the VTE and QMLE (average of 100 replications).

Source: Francq, Horváth, and Zakoian (2016).

	n = 500			n = 5000
	m = 2	m = 3	m = 4	m = 2	m = 3	m = 4
VTE	3.44	7.98	17.29	41.40	94.43	197.53
QMLE	5.48	13.82	25.17	65.22	145.41	284.85
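The claim that the relative gain is stable can be checked directly from the figures of Table 10.1. The short script below only recomputes ratios from the reported CPU times; the variable and function names are illustrative.

```python
# CPU seconds from Table 10.1, keyed by (n, m) and stored as (VTE, QMLE).
cpu_seconds = {
    (500, 2): (3.44, 5.48),
    (500, 3): (7.98, 13.82),
    (500, 4): (17.29, 25.17),
    (5000, 2): (41.40, 65.22),
    (5000, 3): (94.43, 145.41),
    (5000, 4): (197.53, 284.85),
}

def time_ratios(table):
    """Ratio of the QMLE time to the VTE time for each design."""
    return {key: qmle / vte for key, (vte, qmle) in table.items()}

ratios = time_ratios(cpu_seconds)
for (n, m), r in sorted(ratios.items()):
    print(f"n={n:5d}  m={m}  QMLE/VTE = {r:.2f}")
```

All six ratios lie between roughly 1.4 and 1.7, whereas the absolute times grow by more than a factor of 50 across the designs.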

One issue is whether the gain in computation time is paid for in terms of accuracy of the estimator. In the univariate case, m = 1 (that is, for a standard GARCH( p, q ) model), it can be shown that the VTE is never asymptotically more efficient than the QMLE, regardless of the values of the GARCH parameters and the distribution of the iid process (see Francq, Horváth, and Zakoian 2011). In the multivariate setting, the asymptotic distributions of the two estimators are difficult to compare but, in simulation experiments, the accuracy loss entailed by the two‐step VT procedure is often barely visible. In finite samples, the VT estimator may even perform much better than the QML estimator.

A nice feature of the VT estimator is that it ensures robust estimation of the marginal variance, provided that it exists. Indeed, the variance of a model estimated by VT converges to the theoretical variance, even if the model is misspecified. For the convergence to hold true, it suffices that the observed process be stationary and ergodic with a finite second‐order moment. This is generally not the case when the misspecified model is estimated by QML. For some specific purposes such as long‐horizon prediction or long‐term value‐at‐risk evaluation, the essential point is to estimate the marginal distribution well, in particular the marginal moments. The fact that the VTE guarantees a consistent estimation of the marginal variance may then be a crucial advantage over the QMLE.

10.6.2 Equation‐by‐Equation Estimation

Another approach for alleviating the dimensionality curse in the estimation of MGARCH models consists in estimating the individual volatilities in a first step, ‘equation‐by‐equation’ (EbE), and the remaining parameters in a second step.

More precisely, any ℝ^m ‐valued process (ε_t) with zero‐conditional mean and positive‐definite conditional variance matrix H _t can be represented as

10.61

where is the σ ‐field generated by {ε_u, u < t}, D _t = {diag( H _t)}^1/2 and Let the k th diagonal element of H _t , that is the variance of ε _kt conditional on . Assuming that is parameterised by some parameter , belonging to a compact parameter space Θ _k , we can write

10.62

where σ _k is a positive function. The vector satisfies and Because and , can be called the vector of EbE innovations of (ε_t).

It is important to note that model (10.62), satisfied by the components of (ε_t), is not a standard univariate GARCH, for two reasons. First, the volatility σ _kt depends on the past of all components of ε_t , not only on the past of ε _kt . Second, the innovation sequence is not iid (except when the conditional correlation matrix is constant, R _t = R ). For instance, a parametric specification of σ _kt mimicking the classical GARCH(1,1) is given by

10.63

Of course, other formulations – for instance including leverage effects – could be considered as well.

Having parameterised the individual conditional variances of (ε_t), the model can be completed by specifying the dynamics of the conditional correlation matrix R _t . For instance, in the DCC model of Engle (2002a), the conditional correlation matrix is modelled as a function of the past standardized returns, as follows

where α, β ≥ 0, α + β < 1, S is a positive‐definite matrix, and is the diagonal matrix with the same diagonal elements as Q _t .
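The DCC recursion can be sketched as follows. The code assumes the usual form of Engle's specification, Q _t = (1 − α − β)S + α η*_{t − 1} η*_{t − 1}′ + β Q _{t − 1} , with R _t obtained by rescaling Q _t so that its diagonal equals one, which is consistent with the description of the diagonal matrix above. Starting the recursion at Q = S is an arbitrary choice of initial value, and the function name is hypothetical.

```python
import numpy as np

def dcc_correlations(eta_star, S, alpha, beta):
    """Conditional correlation matrices R_1, ..., R_n of a DCC model.

    eta_star : array (n, m) of standardized returns
    S        : positive definite (m, m) matrix
    alpha, beta : non-negative scalars with alpha + beta < 1
    """
    eta_star = np.asarray(eta_star, dtype=float)
    S = np.asarray(S, dtype=float)
    n, m = eta_star.shape
    Q = S.copy()                                   # arbitrary initial value
    R = np.empty((n, m, m))
    for t in range(n):
        d = 1.0 / np.sqrt(np.diag(Q))
        R[t] = Q * np.outer(d, d)                  # rescale Q_t to unit diagonal
        # Update with the standardized return observed at time t.
        Q = (1.0 - alpha - beta) * S + alpha * np.outer(eta_star[t], eta_star[t]) + beta * Q
    return R

eta = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 0.3]])
S = np.array([[1.0, 0.5], [0.5, 1.0]])
R_path = dcc_correlations(eta, S, alpha=0.1, beta=0.8)
print(R_path[:, 0, 1])  # conditional correlation between the two components
```

Each R _t is a correlation matrix by construction, since Q _t is a convex combination of positive semi‐definite matrices with a positive definite component.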

More generally, suppose that matrix R _t is parameterised by some parameter ρ ₀ ∈ ℝ^r , together with the volatility parameter θ ₀ , as

Given observations ε₁,…, ε_n , and arbitrary initial values for i ≤ 0, we define for k = 1,…, m . A two‐step estimation procedure can be developed as follows.

First step: EbE estimation of the volatility parameters , by

and extraction of the vectors of residuals where ;
Second step: QML estimation of the conditional correlation parameter ρ ₀ , as a solution of
10.64

where Λ ⊂ ℝ^r is a compact set, , and the 's are initial values.

Asymptotic properties of the procedure can be established for particular models (BEKK, CCC, etc.).
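The first step of the procedure can be sketched for one equation. The code assumes that specification (10.63) has the form σ _kt ² = ω _k + Σ_ℓ α _kℓ ε_{ℓ, t − 1}² + β _k σ_{k, t − 1}² , and that the first‐step criterion is the univariate Gaussian quasi‐likelihood n⁻¹ Σ_t {ε _kt ²/σ _kt ² + log σ _kt ²}. Both forms, the function names and the initial value are assumptions made for illustration.

```python
import numpy as np

def ebe_volatility(eps, omega, alpha, beta, sigma2_init=1.0):
    """Conditional variance of one component, driven by the squared past
    values of ALL components (assumed GARCH(1,1)-type specification)."""
    eps2 = np.asarray(eps, dtype=float) ** 2
    alpha = np.asarray(alpha, dtype=float)
    n = eps2.shape[0]
    sigma2 = np.empty(n)
    sigma2[0] = sigma2_init                       # arbitrary initial value
    for t in range(1, n):
        sigma2[t] = omega + alpha @ eps2[t - 1] + beta * sigma2[t - 1]
    return sigma2

def ebe_criterion(eps, k, omega, alpha, beta, sigma2_init=1.0):
    """Gaussian quasi-likelihood criterion for the k-th equation (0-based k)."""
    sigma2 = ebe_volatility(eps, omega, alpha, beta, sigma2_init)
    eps_k2 = np.asarray(eps, dtype=float)[:, k] ** 2
    return np.mean(eps_k2 / sigma2 + np.log(sigma2))

eps = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
print(ebe_criterion(eps, 0, omega=0.1, alpha=[0.2, 0.3], beta=0.5))
```

Minimising `ebe_criterion` over (ω _k , α _k1,…, α _km , β _k) with any numerical optimiser gives the first‐step estimate for equation k; the m optimisations are independent of each other and can be run in parallel.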

Example 10.8 Estimating semi‐diagonal BEKK models

Full BEKK models are generally impossible to estimate for large cross‐sectional dimensions (see for instance Laurent, Rombouts, and Violante 2012) and practitioners generally only consider diagonal or even scalar models. The EbE approach can be applied to estimate a BEKK‐GARCH( p, q ) model given by

10.65

where ( η _t) is an iid ℝ^m ‐valued centred sequence with , A _0i = (a _ikℓ)_{1 ≤ k, ℓ ≤ m} , B _0j = diag(b _j1,…, b _jm), and Ω ₀ = (ω _kℓ)_{1 ≤ k, ℓ ≤ m} is a positive definite m × m matrix. In this model, the conditional variance h _{kk, t} of the k th return may depend on the past of all returns; however, the equation of h _{kk, t} does not involve past values of the other conditional variances h _{jj, t} . The model can thus be called semi‐diagonal (as opposed to the diagonal BEKK in which both the B _0j and A _0i are diagonal matrices).

The dynamics of the k th diagonal entry of H _t is given by

10.66

Let for k = 1,…, m , where denotes the k th row of the matrix A _0i , and Note that h _{kk, t} is invariant to a change of sign of the k th row of any matrix A _i . For identifiability, we therefore impose a _ik1 > 0 for i = 1,…, q . Let denote a generic parameter value. The parameter space Θ _k is any compact subset of

Under the strict stationarity condition (see Section 10.3.1), the existence of a positive density around 0 for η ₁ and the moment condition E|η _kt|^{4(1 + δ)} < ∞, the equation‐by‐equation estimator (EbEE) of is shown to be strongly consistent and asymptotically normal (provided that belongs to the interior of Θ _k ).

The diagonal elements of Ω ₀ and the matrices A _0i and B _0j can thus be consistently estimated by successively applying the EbEE to each equation. Note that this is possible because each parameter of the model appears in one, and only one, equation.

Can the semi‐diagonal BEKK‐GARCH( p, q ) model (10.65) be fully estimated by this approach? The answer is positive using a moment estimator. More precisely, let

Under the second‐order stationarity assumption, we have

It follows that, letting and denote the EbE estimators of A ₀ and B ₀ , respectively, a consistent estimator of Ω ₀ is obtained from

In this example, a complex MGARCH model is consistently estimated without relying on a cumbersome full‐likelihood optimisation. When the number m of assets increases, the number of optimisations involved in Step 1 increases accordingly. However, the estimation may remain feasible provided that the number of parameters involved in each volatility equation does not explode.
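The moment estimator of Ω ₀ can be sketched as follows. The code assumes the BEKK form H _t = Ω + Σ A _i ε_{t − i} ε_{t − i}′ A _i ′ + Σ B _j H _{t − j} B _j ′, so that, under second‐order stationarity, the unconditional variance Σ satisfies Σ = Ω + Σ A _i Σ A _i ′ + Σ B _j Σ B _j ′. The function name is hypothetical, and the matrices passed to it stand for first‐step EbE estimates.

```python
import numpy as np

def bekk_intercept_from_moments(eps, A_list, B_list):
    """Moment estimator of the BEKK intercept matrix, given estimates of
    the A_i and B_j and a sample of centred returns eps of shape (n, m)."""
    eps = np.asarray(eps, dtype=float)
    Sigma_hat = eps.T @ eps / eps.shape[0]        # empirical second moment
    Omega_hat = Sigma_hat.copy()
    for M in list(A_list) + list(B_list):
        M = np.asarray(M, dtype=float)
        Omega_hat -= M @ Sigma_hat @ M.T
    return Omega_hat

eps = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
Omega_hat = bekk_intercept_from_moments(eps, [0.2 * np.eye(2)], [0.9 * np.eye(2)])
print(Omega_hat)
```

In finite samples the resulting matrix is not guaranteed to be positive definite, so a check of its eigenvalues is advisable before using it.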

Numerical experiments confirm that the gains in computation time can be huge compared with the FQML estimator in which all the parameters are estimated in one step. Table 10.2 compares the effective computation times required by the two estimators as a function of the dimension m , for the CCC‐GARCH(1, 1) model with A ₁ = 0.05 I _m , B ₁ = 0.9 I _m , R = I _m , and . The m(m − 1)/2 sub‐diagonal terms of R were estimated, together with the 3m other parameters of the model. The two estimators were fitted on simulations of length n = 2000. The comparison of the CPU times is clearly in favour of the EbEE, the FQML estimator failing for m ≥ 10. When m increases, the computation time of the FQML estimator becomes prohibitive, and more importantly, the optimisation fails to provide a reasonable value for . Table 10.2 also compares the relative efficiencies (RE) of the two approaches. To this aim, we first computed the approximated information matrix . The quadratic form is used as a measure of accuracy of an estimator (the Euclidean distance, obtained by replacing J _n by the identity matrix, has the drawback of being scale dependent). The relative efficiency displayed in Table 10.2 is defined by

Table 10.2 Computation time (CPU time in seconds) and relative efficiency (RE) of the EbE with respect to the FQMLE (NA = not available, because the FQMLE could not be computed), for m ‐dimensional CCC‐GARCH(1,1) models.

Source: Francq and Zakoïan (2016).

Dim. m	2	4	6	8	10
No. of parameters	7	18	33	52	75
CPU for EbEE	0.57	1.18	1.52	2.04	2.82
CPU for FQMLE	32.49	123.33	317.85	876.52	1 292.34
Ratio of CPU	57.00	104.52	209.11	429.67	458.28
RE	0.96	0.99	0.99	0.97	102.42
Dim. m	50	100	200	400	800
No. of parameters	1 375	5 250	20 500	81 000	322 000
CPU for EbEE	13.67	27.89	56.58	110.00	226.32
CPU for FQMLE	NA	NA	NA	NA	NA
Ratio of CPU	NA	NA	NA	NA	NA
RE	NA	NA	NA	NA	NA

where and denote, respectively, the EbE and FQML estimators. Since the computation time of the FQML estimator is huge when m is large, the RE and CPU times are computed on a single simulation only, but they are representative of what is generally observed. When m ≤ 9, the accuracies are very similar, with a slight advantage for the FQML (which corresponds here to the ML). When the number of parameters becomes too large, the optimisation fails to give a reasonable value of , and the RE clearly indicates the superiority of the EbEE over the QMLE for m ≥ 10.
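The parameter counts reported in Table 10.2 can be checked against the description of the experiment, in which 3m volatility parameters are estimated together with the m(m − 1)/2 sub‐diagonal terms of R . The function name below is illustrative.

```python
def ccc11_parameter_count(m):
    """Number of estimated parameters in the experiment of Table 10.2:
    3m volatility parameters plus m(m - 1)/2 correlations."""
    return 3 * m + m * (m - 1) // 2

# Values of the row "No. of parameters" in Table 10.2.
table_10_2 = {2: 7, 4: 18, 6: 33, 8: 52, 10: 75,
              50: 1375, 100: 5250, 200: 20500, 400: 81000, 800: 322000}

mismatches = {m: (ccc11_parameter_count(m), reported)
              for m, reported in table_10_2.items()
              if ccc11_parameter_count(m) != reported}
print(mismatches)  # an empty dict means every entry agrees with the formula
```

The quadratic growth in m comes entirely from the correlation matrix; the EbE first step only ever handles three parameters per equation in this design.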

10.7 Proofs of the Asymptotic Results

10.7.1 Proof of the CAN in Theorem 10.7

We shall use the multiplicative norm (see Exercises 10.5 and 10.6) defined by

10.67

where A is a d ₁ × d ₂ matrix, ∥x∥ is the Euclidean norm of vector , and ρ(·) denotes the spectral radius. This norm satisfies, for any d ₂ × d ₁ matrix B ,

10.68

10.69

When A is a square d × d matrix, the last inequality of (10.68) yields

10.70

We also often use the elementary relation

10.71

when A is a d ₁ × d ₂ matrix and B is d ₂ × d ₁ matrix.
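The norm (10.67) is the spectral norm, and a few of its properties can be verified numerically. The sketch below assumes the definition ‖A‖ = ρ^{1/2}(A ′ A ) and checks it against NumPy's built‐in matrix 2‐norm, together with submultiplicativity and the bound |det(A)| ≤ ‖A‖^d for a square d × d matrix. These are standard facts; which of them corresponds to which numbered display is not asserted here.

```python
import numpy as np

def spectral_norm(A):
    """Square root of the largest eigenvalue of A'A."""
    A = np.asarray(A, dtype=float)
    largest = np.max(np.linalg.eigvalsh(A.T @ A))
    return np.sqrt(max(largest, 0.0))             # guard against tiny negatives

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 3))

print(spectral_norm(A), np.linalg.norm(A, 2))                     # same value
print(spectral_norm(A @ B) <= spectral_norm(A) * spectral_norm(B))
print(abs(np.linalg.det(C)) <= spectral_norm(C) ** 3)
```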

To show the AN, we will use the following elementary results on the differentiation of expressions involving matrices. If f(A) is a real‐valued function of a matrix A whose entries a _ij are functions of some variable x , the chain rule for differentiation of compositions of functions states that

10.72

Moreover, for A invertible we have

10.73
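Differentiation rules of this kind are easy to check by finite differences. The sketch below tests two standard identities for an invertible matrix A(x): the derivative of log|det A| equals Tr(A⁻¹ ∂A/∂x), and the derivative of A⁻¹ equals −A⁻¹ (∂A/∂x) A⁻¹. It is assumed, not confirmed, that these are among the rules used in the proof; the matrix function is an arbitrary example.

```python
import numpy as np

def A_of_x(x):
    """An arbitrary smooth, invertible matrix-valued function of x."""
    return np.array([[2.0 + x, 0.5 * x], [0.5 * x, 1.0 + x ** 2]])

def dA_dx(x):
    """Entrywise derivative of A_of_x."""
    return np.array([[1.0, 0.5], [0.5, 2.0 * x]])

def check_rules(x=0.3, h=1e-6):
    """Return the absolute errors between central finite differences and
    the analytic formulas for d log|det A| and d A^{-1}."""
    A_inv = np.linalg.inv(A_of_x(x))
    dA = dA_dx(x)
    fd_logdet = (np.linalg.slogdet(A_of_x(x + h))[1]
                 - np.linalg.slogdet(A_of_x(x - h))[1]) / (2 * h)
    fd_inv = (np.linalg.inv(A_of_x(x + h)) - np.linalg.inv(A_of_x(x - h))) / (2 * h)
    err_logdet = abs(fd_logdet - np.trace(A_inv @ dA))
    err_inv = np.max(np.abs(fd_inv + A_inv @ dA @ A_inv))
    return err_logdet, err_inv

print(check_rules())
```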

Proof of the Consistency

We shall establish the intermediate results (a), (c), and (d) which are stated as in the univariate case (see the proof of Theorem 7.1 in Section 7.4), the result (b) being satisfied by Assumption A4.

Proof of (a): initial values are forgotten asymptotically. We have

10.79

Using (10.71), (10.69), A1–A2, and omitting the subscript ‘( θ )’, the first sum of (10.79) satisfies

as n → ∞. Indeed is finite a.s. since, for some s < 1,

by A3. If A2 is replaced by A2^′, the previous result follows from the Hölder inequality. Now, consider the second term of the right‐hand side of (10.79). The relation (10.70), the Minkowski inequality, the elementary inequality log(x) ≤ x − 1 and the multiplicativity of the spectral matrix norm imply

and, by symmetry,

Using again A1, A2, or A2^′, we thus have shown that

10.80

Proof of (c): the limit criterion is minimised at the true value. As in the univariate case, we first show that Eℓ_t( θ ) is well defined in ℝ ∪ {+∞} for all θ , and in ℝ for θ = θ ₀ . By A1 and ( 10.70) we have and thus

At θ ₀ , Jensen's inequality and A3 entail

Now we show that Eℓ_t( θ ) is minimised at θ ₀ . Without loss of generality, assume that . Let λ _{1, t},…, λ _{m, t} be the eigenvalues of , which are positive (see Exercise 10.15). We have

where the inequality is strict unless λ _it = 1 a.s. for all i , that is, iff H _t( θ ) = H _t( θ ₀) a.s., which is equivalent to θ = θ ₀ under A4.
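The argument can be summarised in one display. The sketch below assumes that ℓ_t( θ ) = ε_t ′ H _t ⁻¹( θ ) ε_t + log det H _t( θ ) and that the λ _it are the eigenvalues of H _t( θ ₀) H _t ⁻¹( θ ); it relies only on the conditional variance of ε_t being H _t( θ ₀) and on the elementary inequality x − 1 − log x ≥ 0 for x > 0, with equality iff x = 1.

```latex
\begin{align*}
E\ell_t(\theta) - E\ell_t(\theta_0)
  &= E\log\frac{\det H_t(\theta)}{\det H_t(\theta_0)}
     + E\,\mathrm{Tr}\{H_t^{-1}(\theta)H_t(\theta_0)\} - m \\
  &= E\sum_{i=1}^{m}\left(\lambda_{it} - 1 - \log\lambda_{it}\right) \;\ge\; 0 .
\end{align*}
```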

Proof of (d). The previous results and the ergodic theorem then entail that

Similarly, (10.80) and the ergodic theorem applied to the stationary process (X _t) with show that

where V _m( θ ) denotes the ball of centre θ and radius 1/m . By A1 we have . Subtracting the constant K ₀ from ℓ_t( θ ^*) if necessary, one can always assume that is positive. If E ∣ ℓ_t( θ ) ∣ < ∞, by Fatou's lemma and A5, for any ε > 0 there exists m sufficiently large such that

If , then the left‐hand side of the previous inequality can be made arbitrarily large. The consistency follows.

Proof of the Asymptotic Normality (AN)

The matrix derivative rules (10.72), (10.77), and (10.75) yield

10.81

We thus have

10.82

Under A10, the matrix K is well defined, and the second moment condition in A9 entails E ∥ C _{i, t}∥² < ∞. The existence of the matrix I is thus guaranteed by A9 and A10. It is then clear from (10.82) that is a square integrable stationary and ergodic martingale difference. The central limit theorem in Corollary A.1 and the Cramér–Wold device then entail that

10.83

Differentiating (10.81), we also have

with

Under A9, noting that Ec ₁( θ ₀) = − Ec ₄( θ ₀) and Ec ₃( θ ₀) = − Ec ₅( θ ₀), we thus have

using elementary properties of the vec and Kronecker operators. By the consistency and A6, we have , and thus almost surely for n large enough. Taylor expansions and A7–A8 thus show that almost surely

where the 's are between and θ ₀ component‐wise. To show that the previous matrix in brackets converges almost surely to J , it suffices to use the ergodic theorem, the continuity of the derivatives, and to show that

for some neighbourhood V( θ ₀) of θ ₀ , which follows from A7 and A9.

If J were singular, there would exist some non‐zero vector λ ∈ ℝ^d such that λ ^′ Jλ = 0. Since is almost surely positive definite, this entails that Δ _t λ = 0 with probability one, which is excluded by A11. The AN, as well as the Bahadur linearisation (10.49), easily follow from (10.83).

10.7.2 Proof of the CAN in Theorems 10.8 and 10.9

The proof consists in showing that the regularity conditions of Theorem 10.7 are satisfied.

Proof of Theorem 10.8

Note that CC2 implies that the invertibility condition ( 10.46) holds for all θ ∈ Θ , and that H _t = H _t( θ ₀) is a measurable function of { η _u, u < t}. Corollary 10.2 shows that, under the stationarity condition CC2, the moment conditions A3 are satisfied. The smoothness conditions A5 and A7 are obviously satisfied for the CCC model.

Proof that A1 and A2^′ are satisfied: Rewrite Eq. (10.57) in matrix form as

10.84

where is defined in Corollary 10.1 and

10.85

In view of assumption CC2 and Corollary 10.1, we have By the compactness of Θ , we even have

10.86

Corollary 10.2 and (10.86) thus entail that

10.87

for some s > 0. Using Eq. (10.84) iteratively, as in the univariate case, we deduce that almost surely

where denotes the vector obtained by replacing the variables by in H _t . Observe that K is a random variable that depends on the past values {ε_t, t ≤ 0}. Since K does not depend on n , it can be considered as a constant, like ρ . We deduce that

10.88

where the second inequality is obtained by arguing that the elements of are strictly positive uniformly in Θ . We then have

10.89

with ρ _t = ρ ^t sup_{θ ∈ Θ} ∥ D _t∥. Therefore, the second moment condition of (10.87) shows that A2^′ holds true, for any p and for s sufficiently small.

Noting that ∥ R ⁻¹∥ is the inverse of the eigenvalue of smallest modulus of R , and that , we have

10.90

using CC5, the compactness of Θ and the strict positivity of the components of . Similarly, we have

10.91

We thus have shown that A1 and A2^′ hold true, meaning that the initial values are asymptotically irrelevant, in the sense of ( 10.80).

Proof of A4: Suppose that, for some θ ≠ θ ₀ ,

Then it readily follows that ρ = ρ ₀ and, using the invertibility of the polynomial ℬ_θ(B) under assumption CC2, by ( 10.87),

that is,

Let . Noting that and isolating the terms that are functions of η_{t − 1} ,

where Z_{t − 2} belongs to the σ ‐field generated by {η_{t − 2}, η_{t − 3},…}. Since η_{t − 1} is independent of this σ ‐field, Exercise 10.3 shows that the latter equality contradicts CC3 unless p _ij h _{jj, t} = 0 almost surely, where the p _ij are the entries of for i, j = 1,…, m . Because h _{jj, t} > 0 for all j , we thus have . Similarly, we show that by successively considering the past values of η_{t − 1} . Therefore, in view of CC4 (or CC4 ^′ ), we have α = α ₀ and β = β ₀ by arguments already used. It readily follows that . Hence θ = θ ₀ . We have thus established the identifiability condition A4, and the proof of Theorem 10.8 follows from Theorem 10.7.□

Proof of Theorem 10.9

It remains to show that A8, A9 and A11 hold.

Proof of A9: Denoting by the i ₁ th component of ,

where c ₀ is a strictly positive constant and, by the usual convention, the index 0 corresponds to quantities evaluated at θ = θ ₀ . Under CC6, for a sufficiently small neighbourhood of θ ₀ , we have

for all i ₁, j ₁, j ₂ ∈ {1,…, m} and all δ > 0. Moreover, in , the coefficient of is bounded below by a constant c > 0 uniformly on . We thus have

for some ρ ∈ [0, 1), all δ > 0 and all s ∈ [0, 1]. Corollary 10.2 then implies that, for all r ₀ ≥ 0, there exists such that

Since

where R ₀ denotes the true value of R , the last moment condition of A9 is satisfied (for all r ).

By ( 10.84)–( 10.86),

and, setting s ₂ = m + qm ² ,

Setting s ₁ = m + (p + q)m ² , we have

where is a matrix whose entries are all 0, apart from a 1 located at the same place as θ _i in . By abuse of notation, we denote by H _t(i ₁) and the i ₁ th components of H _t and . Note that, by the arguments used to show ( 10.87), the previous expressions show that

10.92

for some s > 0. With arguments similar to those used in the univariate case, that is, the inequality x/(1 + x) ≤ x ^s for all x ≥ 0 and s ∈ [0, 1], and the inequalities

and, setting ,

we obtain

where the constants (which also depend on i ₁ , s and r ₀ ) belong to the interval [0, 1). Noting that these inequalities are uniform on a neighbourhood of , that they can be extended to higher‐order derivatives, and that Corollary 10.2 implies that , we can show, as in the univariate case, that for all i ₁ = 1,…, m , all i, j, k = 1,…, s ₁ and all r ₀ ≥ 0, there exists a neighbourhood of θ ₀ such that

10.93

and

10.94

Omitting ‘( θ )’ to lighten the notation, note that

for i = 1,…, s ₁ , and

for i = s ₁ + 1,…, s ₀ . It follows that (10.93) entails the second moment condition of A9 (for any q ). Similarly, (10.94) entails the first moment condition of A9 (for any p ).
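The elementary inequality x/(1 + x) ≤ x^s used in the proof of A9 can be verified by separating two cases. The short check below is a supplementary sketch, not part of the original argument; it uses the convention 0⁰ = 1 when x = 0 and s = 0.

```latex
\[
\text{For } s\in[0,1]:\qquad
\frac{x}{1+x}\;\le\; x\;\le\; x^{s}\quad (0\le x\le 1),
\qquad
\frac{x}{1+x}\;\le\; 1\;\le\; x^{s}\quad (x\ge 1).
\]
```

In the first case x ≤ x^s because s ≤ 1 and x ≤ 1; in the second case x^s ≥ 1 because x ≥ 1 and s ≥ 0.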

Proof of A8: First note that by

and (10.88)–(10.91), we have

10.95

From Eq. ( 10.84), we have

where r = max {p, q} and the tilde means that initial values are taken into account. Since for all t > r , we have and

Thus condition ( 10.86) entails that

10.96

Because

the second inequality of ( 10.88) and (10.96) imply that

10.97

Note that ( 10.81) continues to hold when ℓ_t( θ ) and H _t( θ ) are replaced by and . Therefore,

where

Using conditions ( 10.87), (10.90), ( 10.91), (10.92), (10.95), and (10.97), it can be shown that Tr( C ₁ + C ₂) ≤ Kρ ^tu_t , where u_t is a random variable such that sup_t E|u_t|^s < ∞ for some small s ∈ (0, 1). Arguing that is almost surely finite because its moment of order s is finite, the conclusion follows.

Proof of A11: Recall that we applied Theorem 10.7 with d = s₀. We thus have to show that there cannot exist a non‐zero vector c ∈ ℝ^{s₀} such that with probability 1. Decompose c into c = (c₁′, c₂′)′ with c₁ ∈ ℝ^{s₁} and c₂ ∈ ℝ^{s₃}, where s₃ = s₀ − s₁ = m(m − 1)/2. Rows 1, m + 1,…, m² of the equations

10.98

give

10.99

Differentiating Eq. ( 10.57) yields

where

Because (10.99) is satisfied for all t , we have

where quantities evaluated at θ = θ ₀ are indexed by 0. This entails that

and finally, introducing a vector θ ₁ whose s ₁ first components are

we have

by choosing c ₁ small enough so that θ ₁ ∈ Θ . If c ₁ ≠ 0 then θ ₁ ≠ θ ₀ . This is in contradiction to the identifiability of the parameter (see the proof of A4), hence c ₁ = 0. Equations (10.98) thus become

Therefore,

Because the vectors ∂vec R/∂θ_i, i = s₁ + 1,…, s₀, are linearly independent, the vector c₂ is null, and thus c = 0, which completes the proof.□

10.8 Bibliographical Notes

Multivariate ARCH models were first considered by Engle, Granger, and Kraft (1984) in the guise of the diagonal model. This model was extended and studied by Bollerslev, Engle, and Wooldridge (1988). The reader may refer to Hafner and Preminger (2009a), Lanne and Saikkonen (2007), van der Weide (2002), and Vrontos, Dellaportas, and Politis (2003) for the definition and study of FF‐GARCH models of the form (10.26) where P is not assumed to be orthonormal. The CCC‐GARCH model based on (10.17) was introduced by Bollerslev (1990) and extended to (10.18) by Jeantheau (1998). A sufficient condition for strict stationarity and the existence of fourth‐order moments of the CCC‐GARCH(p, q) is established by Aue et al. (2009). The DCC formulations based on (10.19) and (10.20) were proposed, respectively, by Tse and Tsui (2002), and Engle (2002a). The cDCC model (10.1) is extended by allowing for a clustering structure of the univariate GARCH parameters in Aielli and Caporin (2014). The single‐factor model (10.24), which can be viewed as a dynamic version of the capital asset pricing model of Sharpe (1964), was proposed by Engle, Ng, and Rothschild (1990). The main references on the O‐GARCH or PC‐GARCH models are Alexander (2002) and Ding and Engle (2001). See van der Weide (2002) and Boswijk and van der Weide (2006) for references on the GO‐GARCH model. More details on BEKK models can be found in McAleer et al. (2009). Caporin and McAleer (2012) compared the BEKK and DCC specifications. Hafner (2003) and He and Teräsvirta (2004) studied the fourth‐order moments of MGARCH models. Dynamic conditional correlations models were introduced by Engle (2002a) and Tse and Tsui (2002).
Pourahmadi (1999) showed that an unconstrained parametrisation of a covariance matrix can be conveniently obtained from its Cholesky decomposition (see Chapter 10 in Tsay 2010 and Dellaportas and Pourahmadi 2012 for modelling the Cholesky decomposition of a conditional variance and for applications in finance). These references, and those given in the text, can be complemented by the surveys by Bauwens, Laurent, and Rombouts (2006), Silvennoinen and Teräsvirta (2008), Bauwens, Hafner, and Laurent (2012), Almeida, Hotta, and Ruiz (2015), and by the book by Engle (2009). Stationarity conditions for DCC models have been established by Fermanian and Malongo (2016).

Jeantheau (1998) gave general conditions for the strong consistency of the QMLE for MGARCH models. Comte and Lieberman (2003) showed the CAN of the QMLE for the BEKK formulation under some high‐level assumptions, in particular the existence of eighth‐order moments for the returns process. On the other hand, Avarucci, Beutner, and Zaffaroni (2013) showed that for the BEKK, the finiteness of the variance of the scores requires at least the existence of second‐order moments of the observable process. Asymptotic results were established by Ling and McAleer (2003a) for the CCC formulation of an ARMA‐GARCH, and by Hafner and Preminger (2009a) for a factor GARCH model of the FF‐GARCH form. Theorem 10.1 provides high‐level assumptions implying the CAN of the QMLE of a general MGARCH model. Theorems 10.2 and 10.3 are concerned with the CCC formulation, and allow us to study a subclass of the models considered by Ling and McAleer (2003a), but do not cover the models studied by Comte and Lieberman (2003) or those studied by Hafner and Preminger (2009b). Theorems 10.2 and 10.3 are mainly of interest because they do not require any moment on the observed process and do not use high‐level assumptions. An extension of Theorems 10.2 and 10.3 to asymmetric CCC‐GARCH can be found in Francq and Zakoian (2012a). For additional information on identifiability, in particular on the echelon form, one may for instance refer to Hannan (1976), Hannan and Deistler (1988), Lütkepohl (1991), and Reinsel (1997). Pedersen (2017) recently studied inference and testing in extended CCC GARCH models in the case where the true parameter vector is a boundary point of the parameter space.

Portmanteau tests on the normalised residuals of MGARCH processes were proposed, in particular, by Tse (2002) and Duchesne and Lalancette (2003).

Bardet and Wintenberger (2009) gave regularity conditions for the strong CAN of the QMLE, for a general class of multidimensional causal processes. Their framework is more general than that of Theorem 10.1, since it also incorporates a conditional mean.

Among models not studied in this book are the spline GARCH models in which the volatility is written as a product of a slowly varying deterministic component and a GARCH‐type component. These models were introduced by Engle and Rangel (2008), and their multivariate generalization is due to Hafner and Linton (2010).

The VT approach is not limited to the standard CCC model. Pedersen and Rahbek (2014) showed that the BEKK‐GARCH model can be reparameterized in such a way that the variance of the observed process appears explicitly in the conditional variance equation. They established that the VT estimator is consistent and asymptotically normal when the process has finite sixth‐order moments. See Hill and Renault (2012), Vaynman and Beare (2014), and Pedersen (2016) for results with infinite fourth moments. Bauwens, Braione, and Storti (2016) use the VT approach for a class of dynamic models for realized covariance matrices. Models incorporating leverage effects can also be estimated in a similar way: instead of targeting the variances, moments related to the signs of the returns are targeted (see Francq, Horváth, and Zakoian 2016). The idea of variance targeting has been extended to covariance targeting by Noureldin, Shephard and Sheppard (2014).

The EbE approach was initially proposed by Engle and Sheppard (2001) and Engle (2002a) in the context of DCC models. It was also suggested by Pelletier (2006) for regime‐switching dynamic correlation models, by Aielli (2013) for DCC models, and it was used in several empirical studies (see for instance Hafner and Reznikova 2012; Sucarrat, Grønneberg, and Escribano 2013). The statistical properties of such two‐step estimators have been established by Francq and Zakoïan (2016).

10.9 Exercises

10.1 (More or less parsimonious representations)

Compare the number of parameters of the various GARCH(p, q) representations, as a function of the dimension m .
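As a starting point for this comparison, the following sketch tabulates parameter counts for three representations. The CCC count uses s₀ = s₁ + s₃ with s₁ = m + (p + q)m² and s₃ = m(m − 1)/2, as in the proof of Theorem 10.9. The VEC and diagonal counts are assumptions based on the usual unrestricted forms of these models (an intercept in vech form plus p + q full, respectively diagonal, coefficient matrices) and should be checked against the definitions in Section 10.2. The function names are hypothetical.

```python
def n_params_vec(m, p, q):
    """Unrestricted VEC-GARCH(p, q): assumed intercept of size N and
    p + q full N x N coefficient matrices, with N = m(m + 1)/2."""
    N = m * (m + 1) // 2
    return N + (p + q) * N * N

def n_params_diagonal(m, p, q):
    """Diagonal model: assumed N intercepts and p + q diagonal N x N matrices."""
    N = m * (m + 1) // 2
    return N * (1 + p + q)

def n_params_ccc(m, p, q):
    """CCC-GARCH(p, q): s0 = s1 + s3 with s1 = m + (p + q) m^2 volatility
    parameters and s3 = m(m - 1)/2 correlations."""
    s1 = m + (p + q) * m * m
    s3 = m * (m - 1) // 2
    return s1 + s3

# Growth of the parameter count with the dimension m for GARCH(1, 1) models.
for m in (2, 3, 5, 10):
    print(m, n_params_vec(m, 1, 1), n_params_diagonal(m, 1, 1), n_params_ccc(m, 1, 1))
```

The loop illustrates how quickly the unrestricted VEC count, of order m⁴, outgrows the other two, of order m².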

10.2 (Identifiability of a matrix rational fraction) Let 풜 _θ(z), ℬ _θ(z), , and denote square matrices of polynomials. Show that
10.100

for all z such that if and only if there exists an operator U(z) such that

10.101
10.3 (Two independent non‐degenerate random variables cannot be equal) Let X and Y be two independent real random variables such that Y = X almost surely. We aim to prove that X and Y are almost surely constant.
1. Suppose that Var(X) exists. Compute Var(X) and show the stated result in this case.
2. Suppose that X is discrete and P(X = x ₁)P(X = x ₂) ≠ 0. Show that necessarily x ₁ = x ₂ and show the result in this case.
3. Prove the result in the general case.
10.4 (Duplication and elimination) Consider the duplication matrix D_m and the elimination matrix defined by

where A is any symmetric m × m matrix. Show that
10.5 (Norm and spectral radius) Show that
10.6 (Elementary results on matrix norms) Show the equalities and inequalities of ( 10.68) and ( 10.69).
10.7 (Scalar GARCH) The scalar GARCH model has a volatility of the form

where the α _i and β _j are positive numbers. Give the positivity and second‐order stationarity conditions.
10.8 (Condition for the L ^p and almost sure convergence) Let p ∈ [1, ∞ ) and let (u _n) be a sequence of real random variables of L ^p such that

for some positive constant C , and some constant ρ in (0, 1). Prove that

to some random variable u of L ^p .
10.9 (An average of correlation matrices is a correlation matrix) Let R and Q be two correlation matrices of the same size and let p ∈ [0, 1]. Show that pR + (1 − p)Q is a correlation matrix.
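Before attempting the proof, the claim of Exercise 10.9 can be explored numerically. The sketch below is only a sanity check on randomly generated matrices, not a proof; the helper names and the way the random correlation matrices are generated are illustrative assumptions.

```python
import numpy as np

def is_correlation_matrix(M, tol=1e-10):
    """Check symmetry, unit diagonal and positive semi-definiteness."""
    M = np.asarray(M, dtype=float)
    symmetric = np.allclose(M, M.T, atol=tol)
    unit_diagonal = np.allclose(np.diag(M), 1.0, atol=tol)
    psd = np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol
    return bool(symmetric and unit_diagonal and psd)

def random_correlation_matrix(m, rng):
    """Rescale a random Gram matrix G G' so that its diagonal equals 1."""
    G = rng.standard_normal((m, m))
    S = G @ G.T
    d = 1.0 / np.sqrt(np.diag(S))
    return S * np.outer(d, d)

rng = np.random.default_rng(0)
R = random_correlation_matrix(4, rng)
Q = random_correlation_matrix(4, rng)

# Convex combinations pR + (1 - p)Q on a grid of weights p in [0, 1].
checks = [is_correlation_matrix(p * R + (1 - p) * Q)
          for p in np.linspace(0.0, 1.0, 11)]
print(all(checks))
```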
10.10 (Factors as linear combinations of individual returns) Consider the factor model

where the β _j are linearly independent. Show there exist vectors α _j such that

where the are conditional variances of the portfolios . Compute the conditional covariance between these factors.
10.11 (BEKK representation of factor models) Consider the factor model

where the β _j are linearly independent, ω _j > 0, a _j ≥ 0, and 0 ≤ b _j < 1 for j = 1,…, r . Show that a BEKK representation holds, of the form
10.12 (PCA of a covariance matrix) Let X be a random vector of ℝ^m with variance matrix Σ.
1. Find the (or a) first principal component of X , that is a random variable of maximal variance, where . Is C ¹ unique?
2. Find the second principal component, that is, a random variable of maximal variance, where and Cov(C ¹, C ²) = 0.
3. Find all the principal components.
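The first question of Exercise 10.12 can be explored numerically. The sketch below assumes the usual normalisation, in which a principal component is a linear combination a′X with ∥a∥ = 1; this normalisation and the example matrix Σ are assumptions made for illustration. It compares the variance obtained along the leading eigenvector of Σ with the variances obtained along randomly drawn unit directions.

```python
import numpy as np

def first_principal_direction(sigma):
    """Return the largest eigenvalue of sigma and an associated unit eigenvector."""
    sigma = np.asarray(sigma, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(sigma)  # ascending order
    return eigenvalues[-1], eigenvectors[:, -1]

# Illustrative variance matrix (hypothetical values).
sigma = np.array([[4.0, 1.2, 0.5],
                  [1.2, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

lam_max, a1 = first_principal_direction(sigma)
var_c1 = a1 @ sigma @ a1  # Var(a1'X) = a1' Sigma a1

# Variances a' Sigma a along 1000 random unit vectors a.
rng = np.random.default_rng(2)
directions = rng.standard_normal((1000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
random_variances = np.einsum("ij,jk,ik->i", directions, sigma, directions)

print(var_c1, lam_max, random_variances.max())
```

The eigenvector returned by the routine is only one possible choice, since its sign is arbitrary.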
10.13 (BEKK‐GARCH models with a diagonal representation) Show that the matrices A ⁽ⁱ⁾ and B ^(j) defined in ( 10.21) are diagonal when the matrices A _ik and B _jk are diagonal.
10.14 (Determinant of a block companion matrix) If A and D are square matrices, with D invertible, we have

Use this property to show that matrix in Corollary 10.1 satisfies
10.15 (Eigenvalues of a product of positive definite matrices) Let A and B denote symmetric positive definite matrices of the same size. Show that AB is diagonalizable and that its eigenvalues are positive.
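A numerical illustration of Exercise 10.15 may help before writing the proof. The sketch below relies on the observation that AB = A^{1/2}(A^{1/2} B A^{1/2})A^{−1/2}, so that AB is similar to a symmetric positive definite matrix; the random matrices and helper names are illustrative assumptions, and the check is not a substitute for the proof.

```python
import numpy as np

def random_spd(m, rng):
    """Random symmetric positive definite matrix (Gram matrix plus a ridge)."""
    G = rng.standard_normal((m, m))
    return G @ G.T + m * np.eye(m)

rng = np.random.default_rng(1)
A = random_spd(4, rng)
B = random_spd(4, rng)

# Eigenvalues of the (generally non-symmetric) product AB.
eig_AB = np.linalg.eigvals(A @ B)

# Symmetric square root of A from its spectral decomposition.
w, V = np.linalg.eigh(A)
A_half = V @ np.diag(np.sqrt(w)) @ V.T

# Eigenvalues of the symmetric matrix A^{1/2} B A^{1/2}, in ascending order.
eig_sym = np.linalg.eigvalsh(A_half @ B @ A_half)

print(np.sort(eig_AB.real))
print(eig_sym)
```

The two printed spectra should agree up to rounding error, and all values should be strictly positive.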
10.16 (Positive definiteness of a sum of positive semi‐definite matrices) Consider two matrices of the same size, symmetric, and positive semi‐definite, of the form

where A ₁₁ and B ₁₁ are also square matrices of the same size. Show that if A ₂₂ and B ₁₁ are positive definite, then so is A + B .
10.17 (Positive definite matrix and almost surely positive definite matrix) Let A be a symmetric random matrix such that for all real vectors c ≠ 0,

Show that this does not entail that A is almost surely positive definite.
