Chapter 4
Multivariate Data Analysis

Multivariate data analysis studies several time series simultaneously, but their time series properties are ignored; thus the analysis can be called cross-sectional.

The copula is an important concept of multivariate data analysis. Copula models are a convenient way to separate multivariate analysis into a purely univariate part and a purely multivariate part. We decompose a multivariate distribution into the part that describes the dependence and the parts that describe the marginal distributions. The marginal distributions can be estimated efficiently using nonparametric methods, but for a high-dimensional distribution it can be useful to apply parametric models to estimate the dependence. Combining nonparametric estimators of the marginals with a parametric estimator of the copula leads to a semiparametric estimator of the distribution.

Multivariate data can be described using such statistics as linear correlation, Spearman's rank correlation, and Kendall's rank correlation. Linear correlation is used in Markowitz portfolio selection. Rank correlations are more natural concepts for describing dependence, because they are determined by the copula, whereas linear correlation is affected by the marginal distributions. Coefficients of tail dependence can capture whether the dependence of asset returns is stronger during periods of high volatility.

Multivariate graphical tools include scatter plots, which can be combined with multidimensional scaling and other dimension reduction methods.

Section 4.1 studies measures of dependence. Section 4.2 considers multivariate graphical tools. Section 4.3 defines multivariate parametric distributions such as multivariate normal, multivariate Student, and elliptical distributions. Section 4.4 defines copulas and models for copulas.

4.1 Measures of Dependence

Random vectors $X \in \mathbf{R}^d$ and $Y \in \mathbf{R}^p$ are said to be independent if

$$P(X \in A,\, Y \in B) = P(X \in A)\, P(Y \in B)$$

for all measurable $A \subseteq \mathbf{R}^d$ and $B \subseteq \mathbf{R}^p$. This is equivalent to

$$P(X \in A \mid Y \in B) = P(X \in A)$$

for all measurable $A$ and $B$ with $P(Y \in B) > 0$, so knowledge of $Y$ does not affect the probability evaluations of $X$. Complete dependence between random vectors $X$ and $Y$ occurs when there is a bijection $g$ so that

$$Y = g(X) \tag{4.1}$$

holds almost everywhere. When the random vectors are neither independent nor completely dependent, we may try to quantify the dependence between them. We may say that two random vectors have the same dependence when they have the same copula; the copula is defined in Section 4.4.

Correlation coefficients are defined between two real valued random variables. We define three correlation coefficients: linear correlation $\rho$, Spearman's rank correlation $\rho_S$, and Kendall's rank correlation $\rho_\tau$. All of these correlation coefficients satisfy

$$-1 \le \rho_\bullet(X, Y) \le 1,$$

where $X$ and $Y$ are real valued random variables and $\rho_\bullet$ is any of the three coefficients. Furthermore, if $X$ and $Y$ are independent, then $\rho_\bullet(X, Y) = 0$ for any of the correlation coefficients. The converse does not hold, so correlation zero does not imply independence.

Complete dependence was defined by (4.1). Both for Spearman's rank correlation and for Kendall's rank correlation we have that

$$|\rho_\bullet(X, Y)| = 1 \quad \text{if and only if $X$ and $Y$ are completely dependent}, \tag{4.2}$$

where $\rho_\bullet = \rho_S$ or $\rho_\bullet = \rho_\tau$. In the case of real valued random variables, complete dependence can be divided into comonotonicity and countermonotonicity. Real valued random variables $X$ and $Y$ are said to be comonotonic if there is a strictly increasing function $g$ so that $Y = g(X)$ almost everywhere. Real valued random variables $X$ and $Y$ are said to be countermonotonic if there is a strictly decreasing function $g$ so that $Y = g(X)$ almost everywhere. Both for Spearman's rank correlation and for Kendall's rank correlation we have that $\rho_\bullet(X, Y) = 1$ if and only if $X$ and $Y$ are comonotonic, and $\rho_\bullet(X, Y) = -1$ if and only if $X$ and $Y$ are countermonotonic, where $\rho_\bullet = \rho_S$ or $\rho_\bullet = \rho_\tau$.

The linear correlation coefficient $\rho$ does not satisfy (4.2). However, we have that

$$|\rho(X, Y)| = 1 \quad \text{if and only if} \quad Y = aX + b \ \text{almost surely, for some $a \ne 0$ and $b \in \mathbf{R}$.} \tag{4.3}$$

If $a > 0$, then $\rho(X, Y) = 1$. If $a < 0$, then $\rho(X, Y) = -1$.

4.1.1 Correlation Coefficients

We define the linear correlation $\rho$, Spearman's rank correlation $\rho_S$, and Kendall's rank correlation $\rho_\tau$.

4.1.1.1 Linear Correlation

The linear correlation coefficient between real valued random variables $X$ and $Y$ is defined as

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y}, \tag{4.4}$$

where the covariance is

$$\mathrm{Cov}(X, Y) = E\big[(X - EX)(Y - EY)\big]$$

and the standard deviation is $\sigma_X = \sqrt{E(X - EX)^2}$.

We noted in (4.3) that the linear correlation coefficient characterizes linear dependence. However, (4.2) does not hold for the linear correlation coefficient. Even when $X$ and $Y$ are completely dependent, it can happen that $\rho(X, Y) \approx 0$. For example, let $Z \sim N(0, 1)$, $X = e^Z$, and $Y = e^{\sigma Z}$, where $\sigma > 0$. Then,

$$\rho(X, Y) = \frac{e^\sigma - 1}{\sqrt{(e - 1)\big(e^{\sigma^2} - 1\big)}},$$

and $\rho(X, Y) = 1$ only for $\sigma = 1$, otherwise $\rho(X, Y) < 1$; in fact, $\rho(X, Y) \to 0$ as $\sigma \to \infty$, although $X$ and $Y$ are comonotonic. The example is from McNeil et al. (2005, p. 205).

Let us assume that $X$ and $Y$ have continuous distributions and let us denote with $F$ the distribution function of $(X, Y)$ and with $F_X$ and $F_Y$ the marginal distribution functions. Then,

$$\mathrm{Cov}(X, Y) = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \Big[C\big(F_X(x), F_Y(y)\big) - F_X(x)\, F_Y(y)\Big]\, dx\, dy, \tag{4.5}$$

where $C : [0,1]^2 \to [0,1]$ is the copula of the distribution of $(X, Y)$, as defined in (4.29). Equation (4.5) is called Höffding's formula, and its proof can be found in McNeil et al. (2005, p. 203). Thus, the linear correlation is not solely a function of the copula; it depends also on the marginal distributions $F_X$ and $F_Y$.

The linear correlation coefficient can be estimated with the sample correlation. Let $X_1, \dots, X_n$ be a sample from the distribution of $X$ and $Y_1, \dots, Y_n$ be a sample from the distribution of $Y$. The sample correlation coefficient is defined as

$$\hat\rho = \frac{\sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)}{\sqrt{\sum_{i=1}^n (X_i - \bar X)^2}\, \sqrt{\sum_{i=1}^n (Y_i - \bar Y)^2}}, \tag{4.6}$$

where $\bar X = n^{-1}\sum_{i=1}^n X_i$ and $\bar Y = n^{-1}\sum_{i=1}^n Y_i$. An alternative estimator is defined in (4.10).

4.1.1.2 Spearman's Rank Correlation

Spearman's rank correlation (Spearman's rho) is defined by

$$\rho_S(X, Y) = \rho\big(F_X(X), F_Y(Y)\big),$$

where $F_X$ and $F_Y$ are the distribution functions of $X$ and $Y$. If $X$ and $Y$ have continuous distributions, then

$$\rho_S(X, Y) = 12 \int_0^1\!\!\int_0^1 \big[C(u, v) - uv\big]\, du\, dv,$$

where $C : [0,1]^2 \to [0,1]$ is the copula as defined in Section 4.4 (see McNeil et al., 2005, p. 207). Thus, Spearman's correlation coefficient is defined solely in terms of the copula.

We have still another way of writing Spearman's rank correlation. Let $(X_1, Y_1)$, $(X_2, Y_2)$, and $(X_3, Y_3)$ be random vectors, let them have the same distribution, and let them be independent. Then,

$$\rho_S(X_1, Y_1) = 3\Big(P\big[(X_1 - X_2)(Y_1 - Y_3) > 0\big] - P\big[(X_1 - X_2)(Y_1 - Y_3) < 0\big]\Big).$$

The sample Spearman's rank correlation can be defined as the sample linear correlation coefficient between the ranks. Let $X_1, \dots, X_n$ be a sample from the distribution of $X$ and $Y_1, \dots, Y_n$ be a sample from the distribution of $Y$. The rank of observation $X_i$, $i = 1, \dots, n$, is

$$R_i^X = \#\big\{j \in \{1, \dots, n\} : X_j \le X_i\big\}.$$

That is, $R_i^X$ is the number of observations of the first variable smaller than or equal to $X_i$; the ranks $R_i^Y$ are defined analogously. Let us use the shorthand notation

$$U_i = \frac{R_i^X}{n}, \qquad V_i = \frac{R_i^Y}{n},$$

so that $U_i = \hat F_X(X_i)$ and $V_i = \hat F_Y(Y_i)$, where $\hat F_X$ and $\hat F_Y$ are the empirical distribution functions. Then the sample Spearman's rank correlation can be written as

$$\hat\rho_S = \hat\rho\big((U_1, \dots, U_n), (V_1, \dots, V_n)\big),$$

where $\hat\rho$ is the sample linear correlation coefficient, defined in (4.6). Since the ranks $R_1^X, \dots, R_n^X$ and $R_1^Y, \dots, R_n^Y$ are both permutations of $1, \dots, n$ (when there are no ties), we can write

$$\hat\rho_S = 1 - \frac{6}{n(n^2 - 1)} \sum_{i=1}^n \big(R_i^X - R_i^Y\big)^2.$$

4.1.1.3 Kendall's Rank Correlation

Let $(X_1, Y_1)$ and $(X_2, Y_2)$ be random vectors, let them have the same distribution, and let them be independent. Kendall's rank correlation (Kendall's tau) is defined by

$$\rho_\tau(X_1, Y_1) = P\big[(X_1 - X_2)(Y_1 - Y_2) > 0\big] - P\big[(X_1 - X_2)(Y_1 - Y_2) < 0\big]. \tag{4.7}$$

When $X_1$ and $Y_1$ have continuous distributions, we have

$$P\big[(X_1 - X_2)(Y_1 - Y_2) < 0\big] = 1 - P\big[(X_1 - X_2)(Y_1 - Y_2) > 0\big],$$

and we can write

$$\rho_\tau(X_1, Y_1) = 2\,P\big[(X_1 - X_2)(Y_1 - Y_2) > 0\big] - 1 = 4 \int_0^1\!\!\int_0^1 C(u, v)\, dC(u, v) - 1,$$

where $C : [0,1]^2 \to [0,1]$ is the copula as defined in Section 4.4 (see McNeil et al., 2005, p. 207).

Let us define an estimator for $\rho_\tau$. Let $X_1, \dots, X_n$ be a sample from the distribution of $X$ and $Y_1, \dots, Y_n$ be a sample from the distribution of $Y$. Kendall's rank correlation can be written as

$$\rho_\tau(X, Y) = E\, \mathrm{sign}\big[(X_1 - X_2)(Y_1 - Y_2)\big],$$

where $\mathrm{sign}(x) = 1$, if $x > 0$, and $\mathrm{sign}(x) = -1$, if $x < 0$. This leads to the sample version

$$\hat\rho_\tau = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \mathrm{sign}\big[(X_i - X_j)(Y_i - Y_j)\big]. \tag{4.8}$$

The computation takes longer than for the sample linear correlation and for the sample Spearman's correlation.
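In R, all three sample correlation coefficients can be computed with the function cor. A minimal sketch with simulated data; the variable names and parameter values are only for illustration:

    # Simulated data for illustration
    set.seed(1)
    n <- 1000
    x <- rnorm(n)
    y <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)

    cor(x, y)                       # sample linear correlation (4.6)
    cor(x, y, method = "spearman")  # sample Spearman's rank correlation
    cor(x, y, method = "kendall")   # sample Kendall's rank correlation

    # Spearman's rho is the linear correlation of the ranks:
    cor(rank(x), rank(y))

The last line illustrates that the sample Spearman's rank correlation is the sample linear correlation computed from the ranks.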

4.1.1.4 Relations between the Correlation Coefficients

We have a relation between the linear correlation and Kendall's rank correlation for the elliptical distributions. Let $(X, Y)$ be a bivariate random vector. For all elliptical distributions with continuous marginal distributions,

$$\rho_\tau(X, Y) = \frac{2}{\pi} \arcsin \rho(X, Y), \tag{4.9}$$

where $\rho_\tau$ is the Kendall's rank correlation, as defined in (4.7), and $\rho$ is the linear correlation, as defined in (4.4) (see McNeil et al., 2005, p. 217). This relationship can be applied to obtain an alternative and more robust estimator than the estimator (4.6) of linear correlation. Define the estimator as

$$\hat\rho = \sin\left(\frac{\pi}{2}\, \hat\rho_\tau\right), \tag{4.10}$$

where $\hat\rho_\tau$ is the estimator (4.8).
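A minimal sketch of the estimator (4.10) in R, assuming x and y are vectors of observations:

    # Robust estimate of linear correlation via Kendall's tau, as in (4.10)
    rho_from_tau <- function(x, y) {
      tau_hat <- cor(x, y, method = "kendall")  # estimator (4.8)
      sin(pi * tau_hat / 2)
    }

The estimate is less sensitive to outliers than (4.6), because it depends on the data only through the ranks.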

For distributions with a Gaussian copula, we also have a relation between Spearman's rank correlation and the linear correlation. Let $(X, Y)$ have a distribution with a Gaussian copula and continuous margins. Then,

$$\rho_S(X, Y) = \frac{6}{\pi} \arcsin\left(\frac{\rho}{2}\right),$$

where $\rho$ is the correlation parameter of the Gaussian copula, and (4.9) holds also (see McNeil et al., 2005, p. 215).

Figure 4.1 studies linear correlation and Spearman's rank correlation for the S&P 500 and Nasdaq-100 daily data, described in Section 2.4.2. Panel (a) shows a moving average estimate of linear correlation (blue) and Spearman's rank correlation (yellow). We use the one-sided moving average defined as

$$\hat\rho_t = \frac{\sum_{i=1}^t w_{ti}\, X_i Y_i}{\sqrt{\sum_{i=1}^t w_{ti}\, X_i^2}\, \sqrt{\sum_{i=1}^t w_{ti}\, Y_i^2}},$$

where $X_i$ are the S&P 500 centered returns and $Y_i$ are the Nasdaq-100 centered returns. The weights $w_{ti}$ are one for the last 500 observations, and zero for the other observations. See (6.5) for a more general moving average. The moving average estimate of Spearman's rank correlation is the Spearman's rho computed from the 500 previous observations. Panel (b) shows the correlation coefficients together with the moving average estimates of the standard deviation of S&P 500 returns (solid black line) and Nasdaq-100 returns (dashed black line). All time series are scaled to take values in the interval $[0, 1]$. We see some tendency for the inter-stock correlations to increase in volatile periods.
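A sketch of the moving window estimates in R, assuming x and y are vectors of centered returns; the window length 500 matches the figure:

    # One-sided moving window estimates of correlation
    rolling_cor <- function(x, y, width = 500, method = "pearson") {
      n <- length(x)
      out <- rep(NA_real_, n)
      for (t in width:n) {
        idx <- (t - width + 1):t  # the 'width' latest observations up to time t
        out[t] <- cor(x[idx], y[idx], method = method)
      }
      out
    }
    # rolling_cor(x, y)                       # moving linear correlation
    # rolling_cor(x, y, method = "spearman")  # moving Spearman's rho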


Figure 4.1 Linear and Spearman's correlation, together with volatility. (a) Time series of moving average estimates of correlation between S&P 500 and Nasdaq-100 returns, with linear correlation (blue) and Spearman's rho (yellow); (b) we have added moving average estimates of the standard deviation of S&P 500 (black solid) and Nasdaq-100 (black dashed).

4.1.2 Coefficients of Tail Dependence

The coefficient of upper tail dependence is defined for random variables $X$ and $Y$ with distribution functions $F_X$ and $F_Y$ as

$$\lambda_u = \lim_{q \to 1-} P\big(Y > F_Y^{-1}(q) \mid X > F_X^{-1}(q)\big),$$

where $F_X^{-1}$ and $F_Y^{-1}$ are the generalized inverses. Similarly, the coefficient of lower tail dependence is

$$\lambda_l = \lim_{q \to 0+} P\big(Y \le F_Y^{-1}(q) \mid X \le F_X^{-1}(q)\big).$$

See McNeil et al. (2005, p. 209).

4.1.2.1 Tail Coefficients in Terms of the Copula

The coefficients of upper and lower tail dependence can be defined in terms of the copula. Let $F_X$ and $F_Y$ be continuous. We have that

$$P\big(X > F_X^{-1}(q)\big) = 1 - q.$$

Also,

$$P\big(X > F_X^{-1}(q),\ Y > F_Y^{-1}(q)\big) = 1 - 2q + C(q, q).$$

Thus, the coefficient of upper tail dependence is

$$\lambda_u = \lim_{q \to 1-} \frac{1 - 2q + C(q, q)}{1 - q}. \tag{4.11}$$

We have that

$$P\big(X \le F_X^{-1}(q)\big) = q.$$

Also,

$$P\big(X \le F_X^{-1}(q),\ Y \le F_Y^{-1}(q)\big) = C(q, q).$$

Thus, the coefficient of lower tail dependence for continuous $F_X$ and $F_Y$ is equal to

$$\lambda_l = \lim_{q \to 0+} \frac{C(q, q)}{q}. \tag{4.12}$$

4.1.2.2 Estimation of Tail Coefficients

Equations (4.11) and (4.12) suggest estimators for the coefficients of tail dependence. We can estimate the upper tail coefficient nonparametrically, using

$$\hat\lambda_u = \frac{1 - 2q + \hat C(q, q)}{1 - q},$$

where $\hat C$ is the empirical copula, defined in (4.38), and $q$ is close to 1. We can take, for example, $q = 1 - k/n$, where $k$ is a small positive integer. The coefficient of lower tail dependence can be estimated by

$$\hat\lambda_l = \frac{\hat C(q, q)}{q},$$

where $q$ is close to zero. We can take, for example, $q = k/n$, where $k$ is a small positive integer. These estimators have been studied in Dobric and Schmid (2005), Frahm et al. (2005), and Schmidt and Stadtmüller (2006).
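A sketch of the nonparametric estimators in R: the empirical copula (4.38) is evaluated at $(q, q)$ from the normalized ranks, and the tail coefficients follow from (4.11) and (4.12); the levels q in the comment are illustrative choices:

    # Empirical copula evaluated at (q, q), computed from normalized ranks
    emp_copula_diag <- function(x, y, q) {
      u <- rank(x) / (length(x) + 1)
      v <- rank(y) / (length(y) + 1)
      mean(u <= q & v <= q)
    }

    # Estimators of the coefficients of tail dependence
    lambda_upper <- function(x, y, q) (1 - 2 * q + emp_copula_diag(x, y, q)) / (1 - q)
    lambda_lower <- function(x, y, q) emp_copula_diag(x, y, q) / q

    # lambda_lower(x, y, q = 0.05); lambda_upper(x, y, q = 0.95)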

Figure 4.2 studies tail coefficients for the S&P 500 and Nasdaq-100 daily data, described in Section 2.4.2. Panel (a) shows the tail coefficients as a function of the level $q$: lower tail coefficients (red) and upper tail coefficients (blue). Panel (b) shows a moving average estimate of the lower tail coefficient. The tail coefficient is estimated using the window of the latest 1000 observations, for a fixed level $q$ close to zero.


Figure 4.2 Tail coefficients for S&P 500 and Nasdaq-100 returns. (a) Tail coefficients as a function of the level $q$ for the lower tail coefficients (red) and for the upper tail coefficients (blue); (b) time series of moving average estimates of the lower tail coefficient.

4.1.2.3 Tail Coefficients for Parametric Families

The coefficients of lower and upper tail dependence for the Gaussian distributions are zero. The coefficients of lower and upper tail dependence for the Student distributions with degrees of freedom $\nu$ and correlation coefficient $\rho$ are

$$\lambda_l = \lambda_u = 2\, t_{\nu+1}\left(-\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}}\right),$$

where $t_{\nu+1}$ is the distribution function of the univariate $t$-distribution with $\nu + 1$ degrees of freedom, and we assume that $\rho > -1$; see McNeil et al. (2005, p. 211).
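The Student tail coefficient can be evaluated with the distribution function pt of the univariate $t$-distribution; a minimal sketch in R, with illustrative parameter values:

    # Coefficient of tail dependence of the bivariate t-distribution
    lambda_t <- function(nu, rho) {
      2 * pt(-sqrt((nu + 1) * (1 - rho) / (1 + rho)), df = nu + 1)
    }
    lambda_t(nu = 4, rho = 0.5)    # strong tail dependence for small nu
    lambda_t(nu = 100, rho = 0.5)  # close to zero: nearly the Gaussian case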

4.2 Multivariate Graphical Tools

First, we describe scatter plots and smooth scatter plots. Second, we describe visualization of correlation matrices with multidimensional scaling.

4.2.1 Scatter Plots

A two-dimensional scatter plot is a plot of points $(x_1, y_1), \dots, (x_n, y_n) \in \mathbf{R}^2$.

Figure 4.3 shows scatter plots of daily net returns of S&P 500 and Nasdaq-100. The data is described in Section 2.4.2. Panel (a) shows the original data and panel (b) shows the corresponding scatter plot after the copula preserving transform with standard normal marginals, as defined in (4.36).


Figure 4.3 Scatter plots. Scatter plots of the net returns of S&P 500 and Nasdaq-100. (a) Original data; (b) copula transformed data with marginals being standard normal.

When the sample size is large, the scatter plot is mostly black, so the density of the points in different regions is obscured. In this case it is possible to use histograms to obtain a smooth scatter plot. A multivariate histogram is defined in (3.42). First we take the square roots $\sqrt{f_k}$ of the bin counts $f_k$ and then we define $p_k = 1 - \sqrt{f_k} / \max_j \sqrt{f_j}$. Now $0 \le p_k \le 1$. Values $p_k$ close to one are shown in light gray, and values $p_k$ close to zero are shown in dark gray. See Carr et al. (1987) for a study of histogram plotting.
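A sketch of the construction in R; the number of bins and the gray levels are illustrative choices:

    # Smooth scatter plot from square-root-scaled histogram counts
    smooth_scatter <- function(x, y, bins = 50) {
      xb <- cut(x, breaks = bins, labels = FALSE)
      yb <- cut(y, breaks = bins, labels = FALSE)
      counts <- table(factor(xb, levels = 1:bins), factor(yb, levels = 1:bins))
      p <- 1 - sqrt(counts) / max(sqrt(counts))  # values in [0, 1]; dense bins near zero
      image(1:bins, 1:bins, p, col = gray(seq(0, 1, length.out = 256)),
            xlab = "", ylab = "")
    }

Dense bins get values of p close to zero and are drawn in dark gray; sparse bins are drawn in light gray.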

Figure 4.4 shows smooth scatter plots of daily net returns of S&P 500 and Nasdaq-100. The data is described in Section 2.4.2. Panel (a) shows a smooth scatter plot of the original data and panel (b) shows the corresponding scatter plot after the copula preserving transform when the marginals are standard Gaussian.


Figure 4.4 Smooth scatter plots. Scatter plots of the net returns of S&P 500 and Nasdaq-100. (a) Original data; (b) copula transformed data with marginals being standard normal.

4.2.2 Correlation Matrix: Multidimensional Scaling

First, we define the correlation matrix. Second, we show how the correlation matrix may be visualized using multidimensional scaling.

4.2.2.1 Correlation Matrix

The correlation matrix is the $d \times d$ matrix whose elements are the linear correlation coefficients $\rho(X_i, X_j)$ for $i, j = 1, \dots, d$. The sample correlation matrix is the matrix whose elements are the sample linear correlation coefficients.

The correlation matrix can be defined using matrix notation. The covariance matrix of random vector $X = (X_1, \dots, X_d)$ is defined by

$$\mathrm{Cov}(X) = E\big[(X - EX)(X - EX)'\big]. \tag{4.13}$$

The covariance matrix is the $d \times d$ matrix whose elements are $\mathrm{Cov}(X_i, X_j)$ for $i, j = 1, \dots, d$, where we denote $\mathrm{Cov}(X_i, X_i) = \mathrm{Var}(X_i) = \sigma_i^2$. Let

$$D = \mathrm{diag}\big(\sigma_1^{-1}, \dots, \sigma_d^{-1}\big)$$

be the diagonal matrix whose diagonal is the vector of the inverses of the standard deviations. Then the correlation matrix is

$$\mathrm{Cor}(X) = D\, \mathrm{Cov}(X)\, D.$$

The covariance matrix can be estimated by the sample covariance matrix

$$\widehat{\mathrm{Cov}} = \frac{1}{n} \sum_{i=1}^n \big(X_i - \bar X\big)\big(X_i - \bar X\big)', \tag{4.14}$$

where $X_1, \dots, X_n$ are identically distributed observations whose distribution is the same as the distribution of $X$, and $\bar X = n^{-1}\sum_{i=1}^n X_i$ is the arithmetic mean.

4.2.2.2 Multidimensional Scaling

Multidimensional scaling makes a nonlinear mapping of data $x_1, \dots, x_n \in \mathbf{R}^d$ to $\mathbf{R}^2$, or to any space $\mathbf{R}^m$ with $m < d$. We can define the mapping $x_i \mapsto z_i$ of multidimensional scaling in two steps:

  1. Compute the pairwise distances $d_{ij} = \|x_i - x_j\|$, $i, j = 1, \dots, n$.
  2. Find points $z_1, \dots, z_n \in \mathbf{R}^m$ so that $\|z_i - z_j\| \approx d_{ij}$ for $i, j = 1, \dots, n$.

In practice, we may not be able to find a mapping that preserves the distances exactly, but we find a mapping $x_i \mapsto z_i$ so that the stress functional

$$\sum_{i < j} \big(d_{ij} - \|z_i - z_j\|\big)^2$$

is minimized. Sammon's mapping uses the stress functional

$$\frac{1}{\sum_{i < j} d_{ij}} \sum_{i < j} \frac{\big(d_{ij} - \|z_i - z_j\|\big)^2}{d_{ij}}.$$

This stress functional emphasizes small distances. Numerical minimization is needed to solve the minimization problems.

Multidimensional scaling can be used to visualize correlations between time series. Let $R_i = (R_{i1}, \dots, R_{iT})$ be the time series of returns of company $i$, where $i = 1, \dots, d$. When we normalize the time series of returns so that the vector of returns has sample mean zero and sample variance one, then using the Euclidean distance is equivalent to using the correlation distance. Indeed, let

$$S_i = \left(\frac{R_{i1} - \bar R_i}{\hat\sigma_i}, \dots, \frac{R_{iT} - \bar R_i}{\hat\sigma_i}\right),$$

where $\bar R_i = T^{-1}\sum_{t=1}^T R_{it}$ and $\hat\sigma_i^2 = T^{-1}\sum_{t=1}^T (R_{it} - \bar R_i)^2$. Now

$$\|S_i - S_j\|^2 = \sum_{t=1}^T \big(S_{it} - S_{jt}\big)^2 = 2T\big(1 - \hat\rho_{ij}\big),$$

where $\hat\rho_{ij}$ is the sample linear correlation of $R_i$ and $R_j$. Thus, we apply the multidimensional scaling for the norm

$$d(i, j) = \sqrt{2\big(1 - \hat\rho_{ij}\big)},$$

which is obtained by dividing the Euclidean norm by $\sqrt{T}$. Since

$$-1 \le \hat\rho_{ij} \le 1,$$

we have that

$$0 \le d(i, j) \le 2.$$

Zero correlation gives $d(i, j) = \sqrt{2}$, positive correlations give $d(i, j) < \sqrt{2}$, and negative correlations give $d(i, j) > \sqrt{2}$.
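A sketch in R, assuming ret is a $T \times d$ matrix of returns with one column per company; cmdscale performs classical multidimensional scaling:

    # Correlation distances and multidimensional scaling for return series
    mds_correlation <- function(ret) {
      rho <- cor(ret)                   # d x d sample correlation matrix
      dist_mat <- sqrt(2 * (1 - rho))   # correlation distance, values in [0, 2]
      z <- cmdscale(as.dist(dist_mat), k = 2)
      plot(z, type = "n", xlab = "", ylab = "")
      text(z, labels = colnames(ret))
    }

Sammon's mapping is available, for example, as the function sammon in the R package MASS.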

Figure 4.5 studies correlations of the returns of the components of DAX 30. We have daily observations of the components of DAX 30 starting at January 2, 2003 and ending at May 20, 2014, which makes 2892 observations. Panel (a) shows the correlation matrix as an image, using the R function "image." Panel (b) shows the correlations with multidimensional scaling, using the R function "cmdscale." The image of the correlation matrix is not as helpful as the multidimensional scaling. For example, we see from panel (b) that the return time series of Volkswagen, with ticker symbol "VOW," is an outlier, and that the returns of Fresenius and Fresenius Medical Care ("FRE" and "FME") are highly correlated.


Figure 4.5 Correlations of DAX 30. (a) An image of the correlation matrix for DAX 30; (b) correlations for DAX 30 with multidimensional scaling.

4.3 Multivariate Parametric Models

We give examples of multivariate parametric models. The examples include the Gaussian and Student distributions ($t$-distributions). More general families are the normal variance mixture distributions and the elliptical distributions.

4.3.1 Multivariate Gaussian Distributions

A $d$-dimensional Gaussian distribution can be parametrized with the expectation vector $\mu \in \mathbf{R}^d$ and the $d \times d$ covariance matrix $\Sigma$. When random vector $X$ follows the Gaussian distribution with parameters $\mu$ and $\Sigma$, then we write $X \sim N(\mu, \Sigma)$ or $X \sim N_d(\mu, \Sigma)$. We say that a Gaussian distribution is the standard Gaussian distribution when $\mu = 0$ and $\Sigma = I_d$. The density function of the Gaussian distribution is

$$f(x) = \frac{1}{(2\pi)^{d/2}\, (\det \Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)' \Sigma^{-1} (x - \mu)\right), \tag{4.15}$$

where $x \in \mathbf{R}^d$ and $\det \Sigma$ is the determinant of $\Sigma$. The characteristic function of the Gaussian distribution is

$$\varphi(t) = E \exp(i t' X) = \exp\left(i t' \mu - \frac{1}{2}\, t' \Sigma t\right), \tag{4.16}$$

where $t \in \mathbf{R}^d$.

A linear transformation of a Gaussian random vector follows a Gaussian distribution: When $X \sim N_d(\mu, \Sigma)$, $A$ is a $k \times d$ matrix, and $b$ is a $k \times 1$ vector, then

$$AX + b \sim N_k\big(A\mu + b,\ A\Sigma A'\big). \tag{4.17}$$

Also, when $X \sim N_d(\mu_1, \Sigma_1)$ and $Y \sim N_d(\mu_2, \Sigma_2)$ are independent, then

$$X + Y \sim N_d\big(\mu_1 + \mu_2,\ \Sigma_1 + \Sigma_2\big).$$

Both of these facts can be proved using the characteristic function.

4.3.2 Multivariate Student Distributions

A $d$-dimensional Student distribution ($t$-distribution) is parametrized with the degrees of freedom $\nu > 0$, the expectation vector $\mu \in \mathbf{R}^d$, and the $d \times d$ positive definite symmetric matrix $\Sigma$. When random vector $X$ follows the $t$-distribution with parameters $\nu$, $\mu$, and $\Sigma$, then we write $X \sim t(\nu, \mu, \Sigma)$ or $X \sim t_d(\nu, \mu, \Sigma)$. The density function of the multivariate $t$-distribution is

$$f(x) = \frac{C_{d,\nu}}{(\det \Sigma)^{1/2}} \left(1 + \frac{(x - \mu)' \Sigma^{-1} (x - \mu)}{\nu}\right)^{-(\nu + d)/2}, \tag{4.18}$$

where

$$C_{d,\nu} = \frac{\Gamma\big((\nu + d)/2\big)}{\Gamma(\nu/2)\, (\nu\pi)^{d/2}}. \tag{4.19}$$

The multivariate Student distributed random vector has the covariance matrix

$$\mathrm{Cov}(X) = \frac{\nu}{\nu - 2}\, \Sigma,$$

when $\nu > 2$.

When $\nu \to \infty$, the Student density approaches a Gaussian density. Indeed, $(1 + y/\nu)^{-\nu} \to e^{-y}$, as $\nu \to \infty$, so the term $\big(1 + (x - \mu)'\Sigma^{-1}(x - \mu)/\nu\big)^{-(\nu + d)/2}$ approaches $\exp\big(-(x - \mu)'\Sigma^{-1}(x - \mu)/2\big)$, when $\nu \to \infty$. The Student density has tails $f(x) \sim \|x\|^{-(\nu + d)}$, as $\|x\| \to \infty$.

Figure 4.6 compares multivariate Gaussian and Student densities. Panel (a) shows the Gaussian density with marginal standard deviations equal to one and correlation 0.5. Panel (b) shows the density of the $t$-distribution with degrees of freedom 2 and correlation 0.5. The density contours are ellipses in both cases, but the Student density has heavier tails.


Figure 4.6 Gaussian and Student densities. (a) Contour plot of the Gaussian density with marginal standard deviations equal to one and correlation 0.5; (b) Student density with degrees of freedom 2 and correlation 0.5.

4.3.3 Normal Variance Mixture Distributions

Random vector $X \in \mathbf{R}^d$ follows a Gaussian distribution with parameters $\mu$ and $\Sigma$ when $\Sigma = AA'$ for a $d \times k$ matrix $A$ and

$$X = \mu + AZ,$$

where $Z \sim N_k(0, I_k)$ follows the standard Gaussian distribution. This leads to the definition of a normal variance mixture distribution. We say that $X$ follows a normal variance mixture distribution when

$$X = \mu + \sqrt{W}\, AZ,$$

where $Z \sim N_k(0, I_k)$ follows the standard Gaussian distribution, and $W \ge 0$ is a random variable independent of $Z$. It holds that

$$EX = \mu$$

and

$$\mathrm{Cov}(X) = E(W)\, \Sigma,$$

where $\Sigma = AA'$. When random vector $X$ follows the normal variance mixture distribution with parameters $\mu$, $\Sigma$, and $H$, where $H$ is the distribution function of $W$ on $[0, \infty)$, then we write $X \sim M_d(\mu, \Sigma, H)$.

The density function can be calculated as

$$f(x) = \int_0^\infty f_{X \mid W = w}(x)\, dH(w) = \frac{1}{(\det \Sigma)^{1/2}}\, g\big((x - \mu)' \Sigma^{-1} (x - \mu)\big), \tag{4.20}$$

where $f$ is the density of $X$, $f_{X \mid W = w}$ is the density of the $N_d(\mu, w\Sigma)$ distribution, that is, the density of $X$ conditional on $W = w$, and $g$ is defined by

$$g(t) = \int_0^\infty (2\pi w)^{-d/2} \exp\left(-\frac{t}{2w}\right) dH(w). \tag{4.21}$$

The characteristic function is obtained, using (4.16), as

$$\varphi_X(t) = E \exp(i t' X) = \exp(i t' \mu)\, \hat H\left(\frac{1}{2}\, t' \Sigma t\right),$$

where $\hat H(\theta) = \int_0^\infty e^{-\theta w}\, dH(w)$ is the Laplace transform of $H$.

The family of normal variance mixtures is closed under linear transformations: When $X \sim M_d(\mu, \Sigma, H)$, $A$ is a $k \times d$ matrix, and $b$ is a $k \times 1$ vector, then

$$AX + b \sim M_k\big(A\mu + b,\ A\Sigma A',\ H\big). \tag{4.22}$$

This can be seen using the characteristic function, similarly as in (4.17).

Let $W$ be such a random variable that $\nu/W$ follows the $\chi^2$-distribution with degrees of freedom $\nu$. Then the normal variance mixture distribution is the multivariate $t$-distribution $t_d(\nu, \mu, \Sigma)$, where $\Sigma = AA'$, as defined in Section 4.3.2.
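A sketch in R simulating from the multivariate $t$-distribution through the normal variance mixture representation; the parameter values are illustrative:

    # Simulate X = mu + sqrt(W) * A Z, where nu / W ~ chi-squared(nu),
    # so that X follows the t-distribution t_d(nu, mu, Sigma) with Sigma = A A'
    rmvt_mixture <- function(n, mu, Sigma, nu) {
      d <- length(mu)
      A <- chol(Sigma)                     # Sigma = t(A) %*% A for upper-triangular A
      Z <- matrix(rnorm(n * d), nrow = n)  # rows are standard Gaussian vectors
      W <- nu / rchisq(n, df = nu)         # mixing variable
      sweep(sqrt(W) * (Z %*% A), 2, mu, "+")
    }
    X <- rmvt_mixture(1000, mu = c(0, 0),
                      Sigma = matrix(c(1, 0.5, 0.5, 1), 2, 2), nu = 4)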

4.3.4 Elliptical Distributions

The density function of an elliptical distribution has the form

$$f(x) = \frac{1}{(\det \Sigma)^{1/2}}\, g\big((x - \mu)' \Sigma^{-1} (x - \mu)\big), \tag{4.23}$$

where $g : [0, \infty) \to [0, \infty)$ is called the density generator, $\Sigma$ is a symmetric positive definite $d \times d$ matrix, and $\mu \in \mathbf{R}^d$. Since $\Sigma$ is positive definite, it has an inverse $\Sigma^{-1}$ that is positive definite, which means that $x' \Sigma^{-1} x > 0$ for all $x \ne 0$. Thus, $g$ needs to be defined only on the nonnegative real axis. Let $g_0 : [0, \infty) \to [0, \infty)$ be such that $\int_0^\infty t^{d/2 - 1} g_0(t)\, dt < \infty$. Then $g = c\, g_0$ is a density generator when $c$ is chosen by

$$c = \frac{\Gamma(d/2)}{\pi^{d/2}} \left(\int_0^\infty t^{d/2 - 1} g_0(t)\, dt\right)^{-1}, \tag{4.24}$$

where $\Gamma$ is the gamma function. We give examples of density generators.

  1. From (4.15) we see that the Gaussian distributions are elliptical and the Gaussian density generator is

     $$g(t) = c\, e^{-t/2}, \tag{4.25}$$

     where $c = (2\pi)^{-d/2}$.

  2. From (4.20) we see that the normal variance mixture distributions are elliptical and the normal variance mixture density generator is given in (4.21).

  3. From (4.18) we see that the $t$-distributions are elliptical and the Student density generator is

     $$g(t) = C_{d,\nu} \left(1 + \frac{t}{\nu}\right)^{-(\nu + d)/2}, \tag{4.26}$$

     where $\nu$ is the degrees of freedom, and $C_{d,\nu}$ is defined in (4.19). The Student density generator has tails $g(t) \sim t^{-(\nu + d)/2}$, as $t \to \infty$, and thus the density function is integrable when $\nu > 0$, according to (4.24).

Let $\Sigma = AA'$, where $A$ is a nonsingular $d \times d$ matrix, and let

$$X = \mu + AZ,$$

where $Z$ follows a spherical distribution with density $g(z'z)$, $z \in \mathbf{R}^d$. Then $X$ follows an elliptical distribution with density (4.23). When random vector $X$ follows the elliptical distribution with parameters $\mu$, $\Sigma$, and density generator $g$, then we write $X \sim E_d(\mu, \Sigma, g)$. The family of elliptical distributions is closed under linear transformations: When $X \sim E_d(\mu, \Sigma, g)$, $A$ is a $k \times d$ matrix, and $b$ is a $k \times 1$ vector, then

$$AX + b \sim E_k\big(A\mu + b,\ A\Sigma A',\ g_k\big), \tag{4.27}$$

where the density generator $g_k$ is determined by $g$ and the dimension $k$. This can be seen using the characteristic function, similarly as in (4.17).

4.4 Copulas

We can decompose a multivariate distribution into a part that describes the dependence and into parts that describe the marginal distributions. This decomposition helps to estimate and analyze multivariate distributions, and it helps to construct new parametric and semiparametric models for multivariate distributions.

The distribution function $F : \mathbf{R}^d \to [0, 1]$ of random vector $X = (X_1, \dots, X_d)$ is defined by

$$F(x_1, \dots, x_d) = P(X_1 \le x_1, \dots, X_d \le x_d),$$

where $(x_1, \dots, x_d) \in \mathbf{R}^d$. The distribution functions $F_1, \dots, F_d$ of the marginal distributions are defined by

$$F_i(x) = P(X_i \le x),$$

where $x \in \mathbf{R}$ and $i = 1, \dots, d$.

A copula is a distribution function $C : [0, 1]^d \to [0, 1]$ whose marginal distributions are the uniform distributions on $[0, 1]$. Often it is convenient to define a copula as a distribution function $G : \mathbf{R}^d \to [0, 1]$ whose marginal distributions are the standard normal distributions. Any distribution function $F$ may be written as

$$F(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big), \tag{4.28}$$

where $F_1, \dots, F_d$ are the marginal distribution functions and $C$ is a copula. In this sense we can decompose a distribution into a part that describes only the dependence and parts that describe the marginal distributions.

We show in (4.29) how to construct the copula of a multivariate distribution and in (4.31) how to construct a multivariate distribution function from a copula and marginal distribution functions. We restrict ourselves to the case of continuous marginal distribution functions. These constructions were given in Sklar (1959), who considered also the case of noncontinuous margins. For notational convenience we give the formulas for the case $d = 2$. The generalization to the case $d > 2$ is straightforward.

4.4.1 Standard Copulas

We use the term "standard copula" when the marginals of the copula are the uniform distributions on $[0, 1]$. Otherwise, we use the term "nonstandard copula."

4.4.1.1 Finding the Copula of a Multivariate Distribution

Let $X$ and $Y$ be real valued random variables with distribution functions $F_X$ and $F_Y$. Let $F$ be the distribution function of $(X, Y)$, and assume that $F_X$ and $F_Y$ are continuous. Then,

$$F(x, y) = C\big(F_X(x), F_Y(y)\big),$$

where

$$C(u, v) = F\big(F_X^{-1}(u), F_Y^{-1}(v)\big) \tag{4.29}$$

and $(u, v) \in [0, 1]^2$. We call $C$ in (4.29) the copula of the joint distribution of $X$ and $Y$. Copula $C$ is the distribution function of the vector $\big(F_X(X), F_Y(Y)\big)$, and $F_X(X)$ and $F_Y(Y)$ are uniformly distributed random variables.

The copula density is

$$c(u, v) = \frac{\partial^2}{\partial u\, \partial v}\, C(u, v) = \frac{f\big(F_X^{-1}(u), F_Y^{-1}(v)\big)}{f_X\big(F_X^{-1}(u)\big)\, f_Y\big(F_Y^{-1}(v)\big)}, \tag{4.30}$$

because $f(x, y) = c\big(F_X(x), F_Y(y)\big)\, f_X(x)\, f_Y(y)$, where $f$ is the density of $(X, Y)$ and $f_X$ and $f_Y$ are the densities of $X$ and $Y$, respectively.

4.4.1.2 Constructing a Multivariate Distribution from a Copula

Let $C$ be a copula, that is, a distribution function whose marginal distributions are uniform on $[0, 1]$. Let $F_X$ and $F_Y$ be univariate distribution functions of continuous distributions. Define $F$ by

$$F(x, y) = C\big(F_X(x), F_Y(y)\big). \tag{4.31}$$

Then $F$ is a distribution function whose marginal distributions are given by the distribution functions $F_X$ and $F_Y$. Indeed, let $(X, Y)$ be a random vector with distribution function $F$. Then,

$$P(X \le x) = \lim_{y \to \infty} F(x, y) = C\big(F_X(x), 1\big) = F_X(x),$$

and similarly $P(Y \le y) = F_Y(y)$, because $C(u, 1) = u$ and $C(1, v) = v$ for $u, v \in [0, 1]$.

4.4.2 Nonstandard Copulas

Typically a copula is defined as a distribution function with uniform marginals. However, we can define a copula so that the marginal distributions of the copula are some other continuous distribution than the uniform distribution on $[0, 1]$. It turns out that we get simpler copulas by choosing the marginal distributions of a copula to be the standard Gaussian distribution.

As in (4.28) we can write distribution function $F$ as

$$F(x_1, \dots, x_d) = C_\Phi\Big(\Phi^{-1}\big(F_1(x_1)\big), \dots, \Phi^{-1}\big(F_d(x_d)\big)\Big),$$

where $\Phi$ is the distribution function of the standard Gaussian distribution and

$$C_\Phi(x_1, \dots, x_d) = C\big(\Phi(x_1), \dots, \Phi(x_d)\big), \tag{4.32}$$

where $(x_1, \dots, x_d) \in \mathbf{R}^d$. Now $C_\Phi$ is a distribution function whose marginals are standard Gaussians, because $F_i(X_i)$ follow the uniform distribution on $[0, 1]$ and thus $\Phi^{-1}\big(F_i(X_i)\big)$ follow the standard Gaussian distribution.

Conversely, given a distribution function $G$ with standard Gaussian marginals, and univariate distribution functions $F_X$ and $F_Y$, we can define a distribution function $F$ with marginals $F_X$ and $F_Y$ by the formula

$$F(x, y) = G\Big(\Phi^{-1}\big(F_X(x)\big), \Phi^{-1}\big(F_Y(y)\big)\Big).$$

The copula density is

$$c_\Phi(x, y) = \frac{f\Big(F_X^{-1}\big(\Phi(x)\big), F_Y^{-1}\big(\Phi(y)\big)\Big)}{f_X\Big(F_X^{-1}\big(\Phi(x)\big)\Big)\, f_Y\Big(F_Y^{-1}\big(\Phi(y)\big)\Big)}\, \phi(x)\, \phi(y), \tag{4.33}$$

where $f$ is the density of $F$, $f_X$ and $f_Y$ are the densities of $X$ and $Y$, respectively, and $\phi$ is the density of the standard Gaussian distribution.

4.4.3 Sampling from a Copula

We do not have observations directly from the distribution of the copula but we show how to transform the sample so that we get a pseudo sample from the copula. Scatter plots of the pseudo sample can be used to visualize the copula. The pseudo sample can also be used in the maximum likelihood estimation of the copula. Before defining the pseudo sample, we show how to generate random variables from a copula.

4.4.3.1 Simulation from a Copula

Let random vector $Y = (Y_1, \dots, Y_d)$ have a continuous distribution. Let $F_i$, $i = 1, \dots, d$, be the distribution functions of the margins of $Y$. Now

$$U = \big(F_1(Y_1), \dots, F_d(Y_d)\big) \tag{4.34}$$

is a random vector whose marginal distributions are uniform on $[0, 1]$. The distribution function of this random vector is the copula of the distribution of $Y$. Thus, if we can generate a random vector $Y$ with distribution function $F$, we can use the rule (4.34) to generate a random vector $U$ whose distribution is the copula of $F$. Often the copula with uniform marginals is inconvenient due to boundary effects. We may get a statistically more tractable distribution by defining

$$Z = \Big(\Phi^{-1}\big(F_1(Y_1)\big), \dots, \Phi^{-1}\big(F_d(Y_d)\big)\Big),$$

where $\Phi$ is the distribution function of the standard Gaussian distribution. The components of $Z$ have the standard Gaussian distribution.
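A sketch in R generating a sample from the Gaussian copula of a given correlation matrix P, using the rule (4.34):

    # Sample from the copula of the N_d(0, P) distribution
    rgauss_copula <- function(n, P) {
      d <- ncol(P)
      Y <- matrix(rnorm(n * d), nrow = n) %*% chol(P)  # Y ~ N_d(0, P)
      U <- pnorm(Y)  # rule (4.34): margins uniform on [0, 1]
      Z <- qnorm(U)  # standard Gaussian margins (here Z equals Y)
      list(U = U, Z = Z)
    }

Because the margins of Y are already standard Gaussian, the transform back with qnorm returns Y itself; for other marginal distributions the two steps do not cancel.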

4.4.3.2 Transforming the Sample

Let us have data $X_1, \dots, X_n \in \mathbf{R}^d$ and denote $X_i = (X_{i1}, \dots, X_{id})$. Let the rank of observation $X_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, d$, be

$$R_{ij} = \#\big\{k \in \{1, \dots, n\} : X_{kj} \le X_{ij}\big\}.$$

That is, $R_{ij}$ is the number of observations of the $j$th variable smaller than or equal to $X_{ij}$. We normalize the ranks to get observations on $(0, 1)$:

$$\hat U_{ij} = \frac{R_{ij}}{n + 1}, \tag{4.35}$$

for $i = 1, \dots, n$ and $j = 1, \dots, d$. Now $\hat U_{ij} \in (0, 1)$ for all $i$ and $j$. In this sense we can consider the observations $\hat U_i = (\hat U_{i1}, \dots, \hat U_{id})$ as a sample from a distribution whose margins are uniform distributions on $[0, 1]$. Often the standard Gaussian distribution is more convenient and we define

$$\hat Z_{ij} = \Phi^{-1}\big(\hat U_{ij}\big) = \Phi^{-1}\left(\frac{R_{ij}}{n + 1}\right), \tag{4.36}$$

for $i = 1, \dots, n$ and $j = 1, \dots, d$.
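A sketch of the transforms (4.35) and (4.36) in R, for a data matrix X with n rows and d columns:

    # Pseudo sample from the copula: normalized ranks (4.35),
    # then standard Gaussian margins (4.36)
    copula_transform <- function(X) {
      n <- nrow(X)
      U <- apply(X, 2, rank) / (n + 1)  # margins approximately uniform on (0, 1)
      Z <- qnorm(U)                     # margins approximately standard Gaussian
      list(U = U, Z = Z)
    }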

4.4.3.3 Transforming the Sample by Estimating the Margins

We can transform the data $X_1, \dots, X_n$ using estimates of the marginal distributions. Let $\hat F_1, \dots, \hat F_d$ be estimates of the marginal distribution functions $F_1, \dots, F_d$. We define the pseudo sample as

$$\hat U_i = \Big(\hat F_1(X_{i1}), \dots, \hat F_d(X_{id})\Big), \tag{4.37}$$

where $i = 1, \dots, n$. The estimates $\hat F_j$ can be parametric estimates. For example, assuming that the $j$th marginal distribution is a normal distribution, we would take $\hat F_j(x) = \Phi\big((x - \bar X_j)/s_j\big)$, where $\Phi$ is the distribution function of the standard normal distribution, $\bar X_j$ is the sample mean of $X_{1j}, \dots, X_{nj}$, and $s_j$ is the sample standard deviation. If $\hat F_j$ are the empirical distribution functions

$$\hat F_j(x) = \frac{1}{n} \sum_{i=1}^n I_{(-\infty, x]}(X_{ij}),$$

then we get almost the same transformation as (4.35), but $n + 1$ is now replaced by $n$:

$$\hat U_{ij} = \frac{R_{ij}}{n}.$$

4.4.3.4 Empirical Copula

The empirical distribution function $\hat F$ is calculated using a sample $X_1, \dots, X_n \in \mathbf{R}^d$ of identically distributed observations, and we define

$$\hat F(x_1, \dots, x_d) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d I_{(-\infty, x_j]}(X_{ij}),$$

where we denote $X_i = (X_{i1}, \dots, X_{id})$.

The empirical copula is defined similarly as the empirical distribution function. Now,

$$\hat C(u_1, \dots, u_d) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d I_{(-\infty, u_j]}\big(\hat U_{ij}\big), \tag{4.38}$$

where $\hat U_{ij}$ are defined in (4.37).

4.4.3.5 Maximum Likelihood Estimation

Pseudo samples are needed in maximum likelihood estimation. In maximum likelihood estimation we assume that the copula has a parametric form. For example, the copula of the normal distribution, given in (4.39), is parametrized with the correlation matrix, which contains $d(d - 1)/2$ parameters. Let $C_\theta$ be the copula with parameter $\theta \in \Theta$. The corresponding copula density is $c_\theta$, as given in (4.30). Let us have independent and identically distributed observations $X_1, \dots, X_n$ from the distribution of $X$. We calculate the pseudo sample $\hat U_1, \dots, \hat U_n$ using (4.35) or (4.37). A maximum likelihood estimate is a value $\hat\theta$ maximizing

$$\ell(\theta) = \sum_{i=1}^n \log c_\theta\big(\hat U_i\big)$$

over $\theta \in \Theta$.
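A sketch of the procedure in R for the bivariate Gaussian copula, whose density follows from (4.30) and (4.39); the log-likelihood is maximized over the correlation parameter with optimize, and the pseudo sample U is assumed to be an n x 2 matrix computed with (4.35) or (4.37):

    # Log-likelihood of the bivariate Gaussian copula with correlation r,
    # evaluated at a pseudo sample U with uniform margins
    gauss_copula_loglik <- function(r, U) {
      x <- qnorm(U[, 1])
      y <- qnorm(U[, 2])
      sum(-0.5 * log(1 - r^2)
          - (x^2 - 2 * r * x * y + y^2) / (2 * (1 - r^2))
          + (x^2 + y^2) / 2)
    }
    # rho_ml <- optimize(gauss_copula_loglik, interval = c(-0.99, 0.99),
    #                    U = U, maximum = TRUE)$maximum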

4.4.4 Examples of Copulas

We give examples of parametric families of copulas. The examples include the Gaussian copulas and the Student copulas.

4.4.4.1 The Gaussian Copulas

Let $X = (X_1, \dots, X_d)$ be a $d$-dimensional Gaussian random vector, as defined in Section 4.3.1. The copula of $X$ is

$$C(u_1, \dots, u_d) = \Phi_P\big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\big), \tag{4.39}$$

where $\Phi_P$ is the distribution function of the $N_d(0, P)$ distribution, $P$ is the correlation matrix of $X$, and $\Phi$ is the distribution function of the $N(0, 1)$ distribution.

Indeed, let us denote $Y_i = (X_i - \mu_i)/\sigma_i$, where $\sigma_i$ is the standard deviation of $X_i$. Then $Y = (Y_1, \dots, Y_d)$ follows the distribution $N_d(0, P)$. Let $F$ be the distribution function of $X$ and let $F_i$ be the distribution function of $X_i$. Then, using the notation $u = (u_1, \dots, u_d)$,

$$C(u) = F\big(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\big),$$

where $C$ is the copula of $X$, as in (4.29). Also,

$$F_i^{-1}(u_i) = \mu_i + \sigma_i\, \Phi^{-1}(u_i)$$

for $0 < u_i < 1$. Thus,

$$C(u) = P\big(Y_1 \le \Phi^{-1}(u_1), \dots, Y_d \le \Phi^{-1}(u_d)\big) = \Phi_P\big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d)\big).$$

Thus, (4.29) leads to (4.39).

Figure 4.7 shows perspective plots of the densities of the Gaussian copula. The margins are uniform on $[0, 1]$. Panels (a) and (b) show the density for two values of the correlation parameter. Figure 4.7 shows that the perspective plots of the copula densities are not intuitive, because the probability mass is concentrated near the corners of the square $[0, 1]^2$, especially when the correlation is high. From now on we will show only pictures of copulas with standard Gaussian margins, as defined in (4.32), because these give a more intuitive representation of the copula.


Figure 4.7 Gaussian copulas. Perspective plots of the densities of the Gaussian copula for two values of the correlation parameter, smaller in (a) and larger in (b). The margins are uniform on $[0, 1]$.

4.4.4.2 The Student Copulas

Let $X = (X_1, \dots, X_d)$ be a $d$-dimensional $t$-distributed random vector, as defined in Section 4.3.2. The copula of $X$ is

$$C(u_1, \dots, u_d) = t_{\nu, P}\big(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_d)\big),$$

where $t_{\nu, P}$ is the distribution function of the $t_d(\nu, 0, P)$ distribution, $P$ is the correlation matrix of $X$, and $t_\nu$ is the distribution function of the univariate $t$-distribution with degrees of freedom $\nu$.

Indeed, the claim follows similarly as in the Gaussian case for

$$Y = \left(\frac{X_1 - \mu_1}{\sigma_1}, \dots, \frac{X_d - \mu_d}{\sigma_d}\right),$$

where $\sigma_i = \Sigma_{ii}^{1/2}$ is the square root of the $i$th element in the diagonal of $\Sigma$. The matrix $P$ is indeed the correlation matrix, since

$$\mathrm{Cor}(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sigma_i\, \sigma_j} = \frac{c\, \Sigma_{ij}}{\sqrt{c\, \Sigma_{ii}}\, \sqrt{c\, \Sigma_{jj}}} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\, \Sigma_{jj}}},$$

where $c = \nu/(\nu - 2)$, $\nu > 2$.

Figure 4.8 shows contour plots of the densities of the Student copula when the margins are standard Gaussian and the correlation parameter is fixed. The degrees of freedom are two in panel (a) and four in panel (b). The Gaussian and Student copulas are similar in the main part of the distribution, but they differ in the tails (in the corners of the unit square). The Gaussian copula has independent extremes (asymptotic tail independence), but the Student copula generates concomitant extremes with a nonzero probability. The probability of concomitant extremes is larger when the degrees of freedom are smaller and the correlation coefficient is larger.


Figure 4.8 Student copula with standard Gaussian margins. Contour plots of the densities of the Student copula with degrees of freedom (a) 2 and (b) 4. The correlation parameter is the same in both panels.

4.4.4.3 Other Copulas

We define Gumbel and Clayton copulas. These are examples of Archimedean copulas. Gaussian and Student copulas are examples of elliptical copulas.

The Gumbel–Hougaard Copulas

The Gumbel–Hougaard or the Gumbel family of copulas is defined by

$$C_\theta(u_1, u_2) = \exp\Big(-\big[(-\log u_1)^\theta + (-\log u_2)^\theta\big]^{1/\theta}\Big),$$

where $\theta \in [1, \infty)$ is the parameter. When $\theta = 1$, then $C_1(u_1, u_2) = u_1 u_2$ (the independence copula), and when $\theta \to \infty$, then $C_\theta(u_1, u_2) \to \min\{u_1, u_2\}$ (the comonotonicity copula).

Figure 4.9 shows contour plots of the densities with the Gumbel copula for three values of the parameter $\theta$. The marginals are standard Gaussian.


Figure 4.9 Gumbel copula. Contour plots of the densities of the Gumbel copula for three values of the parameter $\theta$. The marginals are standard Gaussian.

The Clayton Copulas

Clayton's family of copulas is defined by

$$C_\theta(u_1, u_2) = \big(u_1^{-\theta} + u_2^{-\theta} - 1\big)^{-1/\theta}, \tag{4.40}$$

where $\theta > 0$. When $\theta = 0$, we define $C_0(u_1, u_2) = u_1 u_2$, which is the limit of $C_\theta$ as $\theta \to 0$. When the parameter $\theta$ increases, the dependence between the coordinate variables increases. The dependence is larger in the negative orthant. The Clayton family was discussed in Clayton (1978).

Figure 4.10 shows contour plots of the densities with the Clayton copula for three values of the parameter $\theta$. The marginals are standard Gaussian.


Figure 4.10 Clayton copula. Contour plots of the densities of the Clayton copula for three values of the parameter $\theta$. The marginals are standard Gaussian.

Elliptical Copulas

Elliptical distributions are defined in Section 4.3.4. An elliptical copula is obtained from an elliptical distribution by the construction (4.29). The Gaussian copula and the Student copula are elliptical copulas.

Archimedean Copulas

Archimedean copulas have the form

$$C(u_1, u_2) = \psi^{-1}\big(\psi(u_1) + \psi(u_2)\big),$$

where $\psi : (0, 1] \to [0, \infty)$ is strictly decreasing, continuous, convex, and satisfies $\psi(1) = 0$. For $C$ to be a copula, we need in addition that $\psi(u) \to \infty$, as $u \to 0+$. The function $\psi$ is called the generator. The product copula, Gumbel copula, Clayton copula, and Frank copula are all Archimedean copulas and we have:

  • product copula: $\psi(t) = -\log t$,
  • Gumbel copula: $\psi(t) = (-\log t)^\theta$, $\theta \ge 1$,
  • Clayton copula: $\psi(t) = (t^{-\theta} - 1)/\theta$, $\theta > 0$,
  • Frank copula: $\psi(t) = -\log\dfrac{e^{-\theta t} - 1}{e^{-\theta} - 1}$, $\theta \ne 0$.

The density of an Archimedean copula is

$$c(u_1, u_2) = \frac{\partial^2}{\partial u_1\, \partial u_2}\, C(u_1, u_2) = \big(\psi^{-1}\big)''\big(\psi(u_1) + \psi(u_2)\big)\, \psi'(u_1)\, \psi'(u_2),$$

where $(\psi^{-1})''$ is the second derivative of $\psi^{-1}$:

$$\big(\psi^{-1}\big)''(t) = -\frac{\psi''\big(\psi^{-1}(t)\big)}{\Big[\psi'\big(\psi^{-1}(t)\big)\Big]^3},$$

because $(\psi^{-1})'(t) = 1/\psi'\big(\psi^{-1}(t)\big)$. We have (see the sketch after this list):

  • Gumbel copula: $\psi(t) = (-\log t)^\theta$, $\psi'(t) = -\dfrac{\theta\, (-\log t)^{\theta - 1}}{t}$, $\psi''(t) = \dfrac{\theta\, (-\log t)^{\theta - 2}\, \big(\theta - 1 - \log t\big)}{t^2}$,
  • Clayton copula: $\psi(t) = (t^{-\theta} - 1)/\theta$, $\psi'(t) = -t^{-\theta - 1}$, $\psi''(t) = (\theta + 1)\, t^{-\theta - 2}$,
  • Frank copula: $\psi(t) = -\log\dfrac{e^{-\theta t} - 1}{e^{-\theta} - 1}$, $\psi'(t) = \dfrac{\theta\, e^{-\theta t}}{e^{-\theta t} - 1}$, $\psi''(t) = \dfrac{\theta^2\, e^{-\theta t}}{\big(e^{-\theta t} - 1\big)^2}$.
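A sketch in R evaluating the Clayton copula density through the Archimedean representation, using the generator and the derivatives listed above:

    # Clayton copula density via the generator psi(t) = (t^(-theta) - 1) / theta
    clayton_density <- function(u1, u2, theta) {
      psi_d1 <- function(t) -t^(-theta - 1)               # first derivative
      psi_d2 <- function(t) (theta + 1) * t^(-theta - 2)  # second derivative
      C <- (u1^(-theta) + u2^(-theta) - 1)^(-1 / theta)   # the copula (4.40)
      -psi_d2(C) * psi_d1(u1) * psi_d1(u2) / psi_d1(C)^3
    }

The same pattern gives the Gumbel and Frank densities by substituting the corresponding generator derivatives.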

4.4.4.4 Empirical Results

Research on testing the hypothesis of the Gaussian copula and other copulas on financial data has been done in Malevergne and Sornette (2003), and summarized in Malevergne and Sornette (2005). They found that the Student copula is a good model for foreign exchange rates, but for stock returns the situation is less clear.

Patton (2005) takes into account the volatility clustering phenomenon. He filters the marginal data with a GARCH process and shows that the conditional dependence structure between the Japanese yen and the euro is better described by Clayton's copula than by the Gaussian copula. Note, however, that the copula of the residuals is not the same as the copula of the raw returns, and many filters can be used (ARCH, GARCH, and the multifractal random walk). Using the multivariate multifractal filter of Muzy et al. (2001) leads to a nearly Gaussian copula.

Breymann et al. (2003) show that the daily returns of German mark/Japanese yen are best described by a Student copula with about six degrees of freedom, when the alternatives are the Gaussian, Clayton, Gumbel, and Frank copulas. The Student copula seems to provide an even better description for returns at smaller time scales, as long as the time scale is larger than 2 h. The best-fitting degrees of freedom is four at the 2-h scale.

Mashal and Zeevi (2002) claim that the dependence between stocks is better described by a Student copula with 11–12 degrees of freedom than by a Gaussian copula.
