Multivariate data analysis studies several time series simultaneously, but the time series properties are ignored; the analysis can thus be called cross-sectional.
The copula is an important concept of multivariate data analysis. Copula models are a convenient way to separate multivariate analysis into purely univariate and purely multivariate components. We decompose a multivariate distribution into a part that describes the dependence and into parts that describe the marginal distributions. The marginal distributions can be estimated efficiently using nonparametric methods, but for a high-dimensional distribution it can be useful to apply parametric models to estimate the dependence. Combining nonparametric estimators of the marginals with a parametric estimator of the copula leads to a semiparametric estimator of the distribution.
Multivariate data can be described using such statistics as linear correlation, Spearman's rank correlation, and Kendall's rank correlation. Linear correlation is used in Markowitz portfolio selection. Rank correlations are more natural concepts for describing dependence, because they are determined by the copula, whereas linear correlation is affected by the marginal distributions. Coefficients of tail dependence can capture whether the dependence of asset returns is stronger during periods of high volatility.
Multivariate graphical tools include scatter plots, which can be combined with multidimensional scaling and other dimension reduction methods.
Section 4.1 studies measures of dependence. Section 4.2 considers multivariate graphical tools. Section 4.3 defines multivariate parametric distributions such as multivariate normal, multivariate Student, and elliptical distributions. Section 4.4 defines copulas and models for copulas.
Random vectors $X \in \mathbb{R}^d$ and $Y \in \mathbb{R}^k$ are said to be independent if
$$
P(X \in A, \, Y \in B) = P(X \in A) \, P(Y \in B)
$$
for all measurable $A \subset \mathbb{R}^d$ and $B \subset \mathbb{R}^k$. This is equivalent to
$$
P(X \in A \mid Y \in B) = P(X \in A)
$$
for all measurable $A$ and $B$ with $P(Y \in B) > 0$, so knowledge of $Y$ does not affect the probability evaluations of $X$. Complete dependence between random vectors $X$ and $Y$ occurs when there is a bijection $g$ so that
$$
Y = g(X) \tag{4.1}
$$
holds almost everywhere. When the random vectors are not independent and not completely dependent, we may try to quantify the dependence between the two random vectors. We may say that two random vectors have the same dependence when they have the same copula; the copula is defined in Section 4.4.
Correlation coefficients are defined between two real valued random variables. We define three correlation coefficients: linear correlation $\rho$, Spearman's rank correlation $\rho_S$, and Kendall's rank correlation $\rho_\tau$. All of these correlation coefficients satisfy
$$
-1 \le \rho(X, Y) \le 1,
$$
where $X$ and $Y$ are real valued random variables. Furthermore, if $X$ and $Y$ are independent, then $\rho(X, Y) = 0$ for any of the correlation coefficients. The converse does not hold, so that correlation zero does not imply independence.
Complete dependence was defined by (4.1). Both for Spearman's rank correlation and for Kendall's rank correlation we have that
$$
|\rho(X, Y)| = 1 \quad \Longleftrightarrow \quad X \text{ and } Y \text{ are completely dependent}, \tag{4.2}
$$
where $\rho = \rho_S$ or $\rho = \rho_\tau$. In the case of real valued random variables the complete dependence can be divided into comonotonicity and countermonotonicity. Real valued random variables $X$ and $Y$ are said to be comonotonic if there is a strictly increasing function $g$ so that $Y = g(X)$ almost everywhere. Real valued random variables $X$ and $Y$ are said to be countermonotonic if there is a strictly decreasing function $g$ so that $Y = g(X)$ almost everywhere. Both for Spearman's rank correlation and for Kendall's rank correlation we have that $\rho(X, Y) = 1$ if and only if $X$ and $Y$ are comonotonic, and $\rho(X, Y) = -1$ if and only if $X$ and $Y$ are countermonotonic, where $\rho = \rho_S$ or $\rho = \rho_\tau$.
The linear correlation coefficient does not satisfy (4.2). However, we have that
$$
|\rho(X, Y)| = 1 \quad \Longleftrightarrow \quad Y = aX + b \ \text{almost surely, for some } a \neq 0 \text{ and } b \in \mathbb{R}. \tag{4.3}
$$
If $a > 0$, then $\rho(X, Y) = 1$. If $a < 0$, then $\rho(X, Y) = -1$.
We define linear correlation $\rho$, Spearman's rank correlation $\rho_S$, and Kendall's rank correlation $\rho_\tau$.
The linear correlation coefficient between real valued random variables $X$ and $Y$ is defined as
$$
\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\operatorname{sd}(X) \operatorname{sd}(Y)}, \tag{4.4}
$$
where the covariance is
$$
\operatorname{Cov}(X, Y) = E\big[ (X - EX)(Y - EY) \big],
$$
and the standard deviation is $\operatorname{sd}(X) = \big( E(X - EX)^2 \big)^{1/2}$.
We noted in (4.3) that the linear correlation coefficient characterizes linear dependence. However, (4.2) does not hold for the linear correlation coefficient. Even when $X$ and $Y$ are completely dependent, it can happen that $|\rho(X, Y)| < 1$. For example, let $X = e^Z$, $Y = e^{\sigma Z}$, and $Z \sim N(0, 1)$, where $\sigma > 0$. Then,
$$
\rho(X, Y) = \frac{e^{\sigma} - 1}{\sqrt{(e - 1)\big( e^{\sigma^2} - 1 \big)}},
$$
and $\rho(X, Y) = 1$ only for $\sigma = 1$; otherwise $\rho(X, Y) < 1$. The example is from McNeil et al. (2005, p. 205).
Let us assume that $X$ and $Y$ have continuous distributions, and let us denote with $F$ the distribution function of $(X, Y)$ and with $F_X$ and $F_Y$ the marginal distribution functions. Then,
$$
\rho(X, Y) = \frac{1}{\operatorname{sd}(X) \operatorname{sd}(Y)} \int_{\mathbb{R}^2} \Big( C\big( F_X(x), F_Y(y) \big) - F_X(x) F_Y(y) \Big) \, dx \, dy, \tag{4.5}
$$
where $C : [0, 1]^2 \to [0, 1]$ is the copula of the distribution of $(X, Y)$, as defined in (4.29). Equation (4.5) is called Höffding's formula, and its proof can be found in McNeil et al. (2005, p. 203). Thus, the linear correlation is not solely a function of the copula; it depends also on the marginal distributions $F_X$ and $F_Y$.
The linear correlation coefficient can be estimated with the sample correlation. Let $X_1, \dots, X_n$ be a sample from the distribution of $X$ and $Y_1, \dots, Y_n$ be a sample from the distribution of $Y$. The sample correlation coefficient is defined as
$$
\hat{\rho} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 \sum_{i=1}^n (Y_i - \bar{Y})^2}}, \tag{4.6}
$$
where $\bar{X} = n^{-1} \sum_{i=1}^n X_i$ and $\bar{Y} = n^{-1} \sum_{i=1}^n Y_i$. An alternative estimator is defined in (4.10).
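The following R sketch computes the sample correlation coefficient (4.6) directly from the definition and checks the result against the built-in function cor; the toy data are hypothetical.

```r
## Sample linear correlation (4.6) from the definition, checked against cor().
set.seed(1)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100)           # linearly dependent toy data

sample_cor <- function(x, y) {
  xc <- x - mean(x)
  yc <- y - mean(y)
  sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
}
sample_cor(x, y)
cor(x, y)                           # identical result
```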
Spearman's rank correlation (Spearman's rho) is defined by
$$
\rho_S(X, Y) = \rho\big( F_X(X), F_Y(Y) \big),
$$
where $F_X$ is the distribution function of $X$ and $F_Y$ is the distribution function of $Y$. If $X$ and $Y$ have continuous distributions, then
$$
\rho_S(X, Y) = 12 \int_0^1 \int_0^1 \big( C(u, v) - uv \big) \, du \, dv,
$$
where $C : [0, 1]^2 \to [0, 1]$ is the copula as defined in Section 4.4 (see McNeil et al., 2005, p. 207). Thus, Spearman's correlation coefficient is defined solely in terms of the copula.
We have still another way of writing Spearman's rank correlation. Let $(X_1, Y_1)$, $(X_2, Y_2)$, and $(X_3, Y_3)$ have the same distribution as $(X, Y)$, and let them be independent. Then,
$$
\rho_S(X, Y) = 3 \Big( P\big( (X_1 - X_2)(Y_1 - Y_3) > 0 \big) - P\big( (X_1 - X_2)(Y_1 - Y_3) < 0 \big) \Big).
$$
The sample Spearman's rank correlation can be defined as the sample linear correlation coefficient between the ranks. Let $X_1, \dots, X_n$ be a sample from the distribution of $X$ and $Y_1, \dots, Y_n$ be a sample from the distribution of $Y$. The rank of observation $X_i$, $i = 1, \dots, n$, is
$$
\operatorname{rank}(X_i) = \#\{ j \in \{1, \dots, n\} : X_j \le X_i \}.
$$
That is, $\operatorname{rank}(X_i)$ is the number of observations of the first variable smaller than or equal to $X_i$; the ranks $\operatorname{rank}(Y_i)$ of the second variable are defined analogously. Let us use the shorthand notation
$$
R_i = \operatorname{rank}(X_i), \qquad S_i = \operatorname{rank}(Y_i),
$$
so that $R_i, S_i \in \{1, \dots, n\}$, $i = 1, \dots, n$. Then the sample Spearman's rank correlation can be written as
$$
\hat{\rho}_S = \hat{\rho}\big( (R_1, S_1), \dots, (R_n, S_n) \big),
$$
where $\hat{\rho}$ is the sample linear correlation coefficient, defined in (4.6). Since $\bar{R} = \bar{S} = (n + 1)/2$ and $\sum_{i=1}^n (R_i - \bar{R})^2 = \sum_{i=1}^n (S_i - \bar{S})^2 = n(n^2 - 1)/12$, we can write
$$
\hat{\rho}_S = \frac{12}{n(n^2 - 1)} \sum_{i=1}^n \left( R_i - \frac{n + 1}{2} \right) \left( S_i - \frac{n + 1}{2} \right).
$$
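The rank-based formula is easy to implement; in the R sketch below (hypothetical toy data) it agrees with cor(..., method = "spearman") when there are no ties.

```r
## Sample Spearman's rank correlation as a function of the ranks.
set.seed(1)
x <- rnorm(100)
y <- 0.6 * x + rnorm(100)

spearman_rho <- function(x, y) {
  n <- length(x)
  R <- rank(x)                      # ranks of the first variable
  S <- rank(y)                      # ranks of the second variable
  12 / (n * (n^2 - 1)) * sum((R - (n + 1) / 2) * (S - (n + 1) / 2))
}
spearman_rho(x, y)
cor(x, y, method = "spearman")      # same value when there are no ties
```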
Let $(X_1, Y_1)$ and $(X_2, Y_2)$ have the same distribution as $(X, Y)$, and let $(X_1, Y_1)$ and $(X_2, Y_2)$ be independent. Kendall's rank correlation (Kendall's tau) is defined by
$$
\rho_\tau(X, Y) = P\big( (X_1 - X_2)(Y_1 - Y_2) > 0 \big) - P\big( (X_1 - X_2)(Y_1 - Y_2) < 0 \big). \tag{4.7}
$$
When $X$ and $Y$ have continuous distributions, we have
$$
\rho_\tau(X, Y) = 2 P\big( (X_1 - X_2)(Y_1 - Y_2) > 0 \big) - 1,
$$
and we can write
$$
\rho_\tau(X, Y) = 4 \int_0^1 \int_0^1 C(u, v) \, dC(u, v) - 1,
$$
where $C : [0, 1]^2 \to [0, 1]$ is the copula as defined in Section 4.4 (see McNeil et al., 2005, p. 207).
Let us define an estimator for $\rho_\tau$. Let $X_1, \dots, X_n$ be a sample from the distribution of $X$ and $Y_1, \dots, Y_n$ be a sample from the distribution of $Y$. Kendall's rank correlation can be written as
$$
\rho_\tau(X, Y) = E \operatorname{sign}\big( (X_1 - X_2)(Y_1 - Y_2) \big),
$$
where $\operatorname{sign}(x) = 1$, if $x > 0$, and $\operatorname{sign}(x) = -1$, if $x < 0$. This leads to the sample version
$$
\hat{\rho}_\tau = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} \operatorname{sign}\big( (X_i - X_j)(Y_i - Y_j) \big). \tag{4.8}
$$
The computation of (4.8) requires of the order $n^2$ operations, so it takes longer than the computation of the sample linear correlation and the sample Spearman's correlation.
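A minimal R implementation of (4.8) makes the quadratic cost explicit; cor(..., method = "kendall") computes the same quantity. The toy data are hypothetical.

```r
## The sign-based estimator (4.8) of Kendall's rank correlation.
set.seed(1)
x <- rnorm(200)
y <- 0.6 * x + rnorm(200)

kendall_tau <- function(x, y) {
  n <- length(x)
  s <- 0
  for (i in 1:(n - 1))              # the double loop makes the O(n^2) cost visible
    for (j in (i + 1):n)
      s <- s + sign((x[i] - x[j]) * (y[i] - y[j]))
  s / choose(n, 2)
}
kendall_tau(x, y)
cor(x, y, method = "kendall")
```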
We have a relation between the linear correlation and Kendall's rank correlation for the elliptical distributions. Let $(X, Y)$ be a bivariate random vector. For all elliptical distributions with continuous marginal distributions,
$$
\rho_\tau(X, Y) = \frac{2}{\pi} \arcsin \rho(X, Y), \tag{4.9}
$$
where $\rho_\tau$ is Kendall's rank correlation, as defined in (4.7), and $\rho$ is the linear correlation, as defined in (4.4) (see McNeil et al., 2005, p. 217). This relationship can be applied to get an alternative and more robust estimator than the estimator (4.6) of linear correlation. Define the estimator as
$$
\hat{\rho} = \sin\left( \frac{\pi}{2} \hat{\rho}_\tau \right), \tag{4.10}
$$
where $\hat{\rho}_\tau$ is the estimator (4.8).
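A sketch of the estimator (4.10) in R: the sample Kendall's tau is transformed with the sine function. The toy data are hypothetical.

```r
## Robust estimate (4.10) of linear correlation via Kendall's tau.
set.seed(1)
x <- rnorm(500)
y <- 0.6 * x + rnorm(500)

rho_hat <- sin(pi / 2 * cor(x, y, method = "kendall"))
rho_hat
cor(x, y)                           # the sample correlation (4.6), for comparison
```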
For the distributions with a Gaussian copula, we also have a relation between Spearman's rank correlation and the linear correlation. Let $(X, Y)$ have a distribution with a Gaussian copula and continuous margins. Then,
$$
\rho_S(X, Y) = \frac{6}{\pi} \arcsin\left( \frac{\rho(X, Y)}{2} \right),
$$
and (4.9) holds also (see McNeil et al., 2005, p. 215).
Figure 4.1 studies linear correlation and Spearman's rank correlation for the S&P 500 and Nasdaq-100 daily data, described in Section 2.4.2. Panel (a) shows a moving average estimate of linear correlation (blue) and Spearman's rank correlation (yellow). We use the one-sided moving average defined as
$$
\hat{\rho}_t = \frac{\sum_{i=1}^t w_{t-i} X_i Y_i}{\left( \sum_{i=1}^t w_{t-i} X_i^2 \right)^{1/2} \left( \sum_{i=1}^t w_{t-i} Y_i^2 \right)^{1/2}},
$$
where $X_i$ are the S&P 500 centered returns and $Y_i$ are the Nasdaq-100 centered returns. The weights $w_k$ are one for the last 500 observations, and zero for the other observations. See (6.5) for a more general moving average. The moving average estimator of Spearman's rho is the Spearman's rho computed from the 500 previous observations. Panel (b) shows the correlation coefficients together with the moving average estimates of the standard deviation of the S&P 500 returns (solid black line) and the Nasdaq-100 returns (dashed black line). All time series are scaled to take values in the interval $[0, 1]$. We see that there is some tendency for the inter-stock correlations to increase in volatile periods.
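A moving window estimate of this kind can be sketched in R as follows; here x and y stand in for the two centered return series, and the data are hypothetical.

```r
## Spearman's rho computed from the 500 previous observations at each time point.
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)             # stand-ins for the centered return series
width <- 500

rho_t <- rep(NA, n)
for (t in width:n) {
  win <- (t - width + 1):t          # the window of the 500 previous observations
  rho_t[t] <- cor(x[win], y[win], method = "spearman")
}
plot(rho_t, type = "l", ylab = "moving Spearman's rho")
```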
The coefficient of upper tail dependence is defined for random variables $X$ and $Y$ with distribution functions $F_X$ and $F_Y$ as
$$
\lambda_u(X, Y) = \lim_{q \to 1-} P\big( Y > F_Y^{-1}(q) \mid X > F_X^{-1}(q) \big),
$$
where $F_X^{-1}$ and $F_Y^{-1}$ are the generalized inverses. Similarly, the coefficient of lower tail dependence is
$$
\lambda_l(X, Y) = \lim_{q \to 0+} P\big( Y \le F_Y^{-1}(q) \mid X \le F_X^{-1}(q) \big).
$$
See McNeil et al. (2005, p. 209).
The coefficients of upper and lower tail dependence can be defined in terms of the copula. Let $F_X$ and $F_Y$ be continuous. We have that
$$
P\big( X > F_X^{-1}(q) \big) = 1 - q.
$$
Also,
$$
P\big( X > F_X^{-1}(q), \, Y > F_Y^{-1}(q) \big) = 1 - 2q + C(q, q).
$$
Thus, the coefficient of upper tail dependence is
$$
\lambda_u(X, Y) = \lim_{q \to 1-} \frac{1 - 2q + C(q, q)}{1 - q}. \tag{4.11}
$$
We have that
$$
P\big( X \le F_X^{-1}(q) \big) = q.
$$
Also,
$$
P\big( X \le F_X^{-1}(q), \, Y \le F_Y^{-1}(q) \big) = C(q, q).
$$
Thus, the coefficient of lower tail dependence for continuous $F_X$ and $F_Y$ is equal to
$$
\lambda_l(X, Y) = \lim_{q \to 0+} \frac{C(q, q)}{q}. \tag{4.12}
$$
Equations (4.11) and (4.12) suggest estimators for the coefficients of tail dependence. We can estimate the upper tail coefficient nonparametrically, using
$$
\hat{\lambda}_u = \frac{1 - 2q + C_n(q, q)}{1 - q},
$$
where $C_n$ is the empirical copula, defined in (4.38), and $q$ is close to 1. We can take, for example, $q = 1 - k/n$, where $k = \lfloor \sqrt{n} \rfloor$. The coefficient of lower tail dependence can be estimated by
$$
\hat{\lambda}_l = \frac{C_n(q, q)}{q},
$$
where $q$ is close to zero. We can take, for example, $q = k/n$, where $k = \lfloor \sqrt{n} \rfloor$. These estimators have been studied in Dobric and Schmid (2005), Frahm et al. (2005), and Schmidt and Stadtmüller (2006).
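The estimators can be sketched in R using the empirical copula evaluated at the normalized ranks; the heavy-tailed toy data and the choice of $k$ below are hypothetical.

```r
## Nonparametric estimates of the upper and lower tail dependence coefficients.
set.seed(1)
n <- 2000
x <- rt(n, df = 3)
y <- 0.6 * x + rt(n, df = 3)        # heavy-tailed, dependent toy returns

u <- rank(x) / (n + 1)              # normalized ranks, as in (4.35)
v <- rank(y) / (n + 1)
C_n <- function(p, q) mean(u <= p & v <= q)   # empirical copula, as in (4.38)

k <- floor(sqrt(n))
q_up  <- 1 - k / n                  # q close to one
q_low <- k / n                      # q close to zero
lambda_u <- (1 - 2 * q_up + C_n(q_up, q_up)) / (1 - q_up)
lambda_l <- C_n(q_low, q_low) / q_low
c(upper = lambda_u, lower = lambda_l)
```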
Figure 4.2 studies the tail coefficients for the S&P 500 and Nasdaq-100 daily data, described in Section 2.4.2. Panel (a) shows the estimates $\hat{\lambda}_l$ as a function of $q$ for the lower tail coefficients (red) and the estimates $\hat{\lambda}_u$ as a function of $q$ for the upper tail coefficients (blue). Panel (b) shows a moving average estimate of the lower tail coefficients. The tail coefficient is estimated using the window of the latest 1000 observations, for a fixed value of $q$.
The coefficients of lower and upper tail dependence for the Gaussian distributions are zero. The coefficients of lower and upper tail dependence for the Student distributions with degrees of freedom $\nu$ and correlation coefficient $\rho$ are
$$
\lambda_l = \lambda_u = 2 \, t_{\nu + 1}\left( -\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}} \right),
$$
where $t_{\nu + 1}$ is the distribution function of the univariate $t$-distribution with $\nu + 1$ degrees of freedom, and we assume that $\rho > -1$; see McNeil et al. (2005, p. 211).
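The Student tail dependence coefficient is easy to evaluate in R with the univariate $t$-distribution function pt:

```r
## Tail dependence of the Student distribution:
## lambda = 2 * t_{nu+1}( -sqrt((nu + 1) * (1 - rho) / (1 + rho)) ).
student_tail <- function(nu, rho)
  2 * pt(-sqrt((nu + 1) * (1 - rho) / (1 + rho)), df = nu + 1)

student_tail(nu = 4, rho = 0.5)
student_tail(nu = 2, rho = 0.5)     # smaller nu gives stronger tail dependence
```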
First, we describe scatter plots and smooth scatter plots. Second, we describe visualization of correlation matrices with multidimensional scaling.
A two-dimensional scatter plot is a plot of the points $(X_1, Y_1), \dots, (X_n, Y_n) \in \mathbb{R}^2$.
Figure 4.3 shows scatter plots of daily net returns of S&P 500 and Nasdaq-100. The data is described in Section 2.4.2. Panel (a) shows the original data and panel (b) shows the corresponding scatter plot after copula preserving transform with standard normal marginals, as defined in (4.36).
When the sample size is large, the scatter plot is mostly black, so the density of the points in the different regions cannot be seen. In this case it is possible to use histograms to obtain a smooth scatter plot. A multivariate histogram is defined in (3.42). First we take the square roots of the bin counts, and then we scale the square roots to the interval $[0, 1]$ by dividing them with the maximal square root. Values close to one are shown in light gray, and values close to zero are shown in dark gray. See Carr et al. (1987) for a study of histogram plotting.
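As an aside, R also provides the ready-made function smoothScatter, which shades by a two-dimensional kernel density estimate instead of the histogram construction described above; the toy data below are hypothetical.

```r
## A smooth scatter plot of a large sample with a built-in R function.
set.seed(1)
x <- rnorm(1e5)
y <- 0.7 * x + rnorm(1e5)
smoothScatter(x, y)                 # dark regions have a high density of points
```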
Figure 4.4 shows smooth scatter plots of daily net returns of S&P 500 and Nasdaq-100. The data is described in Section 2.4.2. Panel (a) shows a smooth scatter plot of the original data and panel (b) shows the corresponding scatter plot after copula preserving transform when the marginals are standard Gaussian.
First, we define the correlation matrix. Second, we show how the correlation matrix may be visualized using multidimensional scaling.
The correlation matrix is the $d \times d$ matrix whose elements are the linear correlation coefficients $\rho(X_i, X_j)$ for $i, j = 1, \dots, d$. The sample correlation matrix is the matrix whose elements are the sample linear correlation coefficients.
The correlation matrix can be defined using matrix notation. The covariance matrix of random vector $X$ is defined by
$$
\Sigma = \operatorname{Cov}(X) = E\big[ (X - EX)(X - EX)^T \big].
$$
The covariance matrix is the $d \times d$ matrix whose elements are $\Sigma_{ij} = \operatorname{Cov}(X_i, X_j)$ for $i, j = 1, \dots, d$, where we denote $X = (X_1, \dots, X_d)$. Let
$$
D = \operatorname{diag}\big( \operatorname{sd}(X_1)^{-1}, \dots, \operatorname{sd}(X_d)^{-1} \big)
$$
be the diagonal matrix whose diagonal is the vector of the inverses of the standard deviations. Then the correlation matrix is
$$
R = D \Sigma D.
$$
The covariance matrix can be estimated by the sample covariance matrix
$$
\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T,
$$
where $X_1, \dots, X_n \in \mathbb{R}^d$ are identically distributed observations whose distribution is the same as the distribution of $X$, and $\bar{X} = n^{-1} \sum_{i=1}^n X_i$ is the arithmetic mean.
Multidimensional scaling makes a nonlinear mapping of the data $x_1, \dots, x_n \in \mathbb{R}^d$ to $\mathbb{R}^2$, or to any space $\mathbb{R}^{d_0}$ with $d_0 < d$. We can define the mapping of multidimensional scaling in two steps:

1. Compute the matrix of pairwise distances $D_{ij} = \| x_i - x_j \|$, $i, j = 1, \dots, n$.
2. Find points $z_1, \dots, z_n \in \mathbb{R}^{d_0}$ whose pairwise distances $\| z_i - z_j \|$ are close to the distances $D_{ij}$.

In practice, we may not be able to find a mapping that preserves the distances exactly, but we find a mapping so that the stress functional
$$
S(z_1, \dots, z_n) = \sum_{i < j} \big( D_{ij} - \| z_i - z_j \| \big)^2
$$
is minimized. Sammon's mapping uses the stress functional
$$
S(z_1, \dots, z_n) = \frac{1}{\sum_{i < j} D_{ij}} \sum_{i < j} \frac{\big( D_{ij} - \| z_i - z_j \| \big)^2}{D_{ij}}.
$$
This stress functional emphasizes small distances. Numerical minimization is needed to solve the minimization problems.
Multidimensional scaling can be used to visualize correlations between time series. Let $x_i \in \mathbb{R}^T$ be the time series of returns of company $i$, where $i = 1, \dots, d$. When we normalize the time series of returns so that each vector of returns has sample mean zero and sample variance one, then using the Euclidean distance is equivalent to using the correlation distance. Indeed, let
$$
y_{it} = \frac{x_{it} - \bar{x}_i}{s_i}, \qquad t = 1, \dots, T,
$$
where $\bar{x}_i = T^{-1} \sum_{t=1}^T x_{it}$ and $s_i = \big( T^{-1} \sum_{t=1}^T (x_{it} - \bar{x}_i)^2 \big)^{1/2}$. Now
$$
\| y_i - y_j \|^2 = \sum_{t=1}^T (y_{it} - y_{jt})^2 = 2T \big( 1 - \hat{\rho}_{ij} \big),
$$
where $\hat{\rho}_{ij}$ is the sample linear correlation between the returns of companies $i$ and $j$. Thus, we apply the multidimensional scaling for the distance
$$
d_{ij} = \sqrt{2 \big( 1 - \hat{\rho}_{ij} \big)},
$$
which is obtained by dividing the Euclidean distance by $\sqrt{T}$. Since
$$
-1 \le \hat{\rho}_{ij} \le 1,
$$
we have that
$$
0 \le d_{ij} \le 2.
$$
Zero correlation gives $d_{ij} = \sqrt{2}$, positive correlations give $d_{ij} < \sqrt{2}$, and negative correlations give $d_{ij} > \sqrt{2}$.
Figure 4.5 studies the correlations of the returns of the components of DAX 30. We have daily observations of the components of DAX 30 starting at January 2, 2003 and ending at May 20, 2014, which makes 2892 observations. Panel (a) shows the correlation matrix as an image. We have used the R-function "image." Panel (b) shows the correlations with multidimensional scaling. We have used the R-function "cmdscale." The image of the correlation matrix is not as helpful as the multidimensional scaling. For example, we see from panel (b) that the return time series of Volkswagen, with the ticker symbol "VOW," is an outlier, and that the returns of Fresenius and Fresenius Medical Care ("FRE" and "FME") are highly correlated.
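The construction of panel (b) can be sketched in R as follows; the return matrix ret (observations in rows, stocks in columns) is hypothetical.

```r
## Multidimensional scaling of the correlation distance with cmdscale().
set.seed(1)
ret <- matrix(rnorm(500 * 10), ncol = 10,
              dimnames = list(NULL, paste0("stock", 1:10)))

rho <- cor(ret)                     # sample correlation matrix
d <- sqrt(2 * (1 - rho))            # correlation distance d_ij = sqrt(2 (1 - rho_ij))
z <- cmdscale(as.dist(d), k = 2)    # two-dimensional configuration

plot(z, type = "n", xlab = "", ylab = "")
text(z, labels = colnames(ret))     # similar stocks are plotted close to each other
```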
We give examples of multivariate parametric models. The examples include the Gaussian and Student distributions ($t$-distributions). More general families are the normal variance mixture distributions and the elliptical distributions.
A $d$-dimensional Gaussian distribution can be parametrized with the expectation vector $\mu \in \mathbb{R}^d$ and the covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$. When random vector $X$ follows the Gaussian distribution with parameters $\mu$ and $\Sigma$, then we write $X \sim N(\mu, \Sigma)$ or $X \sim N_d(\mu, \Sigma)$. We say that a Gaussian distribution is the standard Gaussian distribution when $\mu = 0$ and $\Sigma = I_d$, the identity matrix. The density function of the Gaussian distribution is
$$
f(x) = \frac{1}{(2\pi)^{d/2} (\det \Sigma)^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right),
$$
where $x \in \mathbb{R}^d$ and $\det \Sigma$ is the determinant of $\Sigma$. The characteristic function of the Gaussian distribution is
$$
\varphi(t) = E \exp\big( i t^T X \big) = \exp\left( i t^T \mu - \frac{1}{2} t^T \Sigma t \right),
$$
where $t \in \mathbb{R}^d$.
A linear transformation of a Gaussian random vector follows a Gaussian distribution: When $X \sim N_d(\mu, \Sigma)$, $A$ is a $k \times d$ matrix, and $b \in \mathbb{R}^k$ is a vector, then
$$
AX + b \sim N_k\big( A\mu + b, \, A \Sigma A^T \big).
$$
Also, when $X \sim N_d(\mu_X, \Sigma_X)$ and $Y \sim N_d(\mu_Y, \Sigma_Y)$ are independent, then
$$
X + Y \sim N_d\big( \mu_X + \mu_Y, \, \Sigma_X + \Sigma_Y \big).
$$
Both of these facts can be proved using the characteristic function.
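The first fact also gives a simulation method: a standard Gaussian vector is transformed with a matrix $A$ satisfying $AA^T = \Sigma$. A minimal R sketch, with a hypothetical covariance matrix:

```r
## Simulating N(mu, Sigma) via X = mu + A Z with A A^T = Sigma.
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5,
                  0.5, 1), 2, 2)
A <- t(chol(Sigma))                 # chol() returns the upper triangular factor

set.seed(1)
n <- 1000
Z <- matrix(rnorm(2 * n), nrow = 2) # columns are standard Gaussian vectors
X <- mu + A %*% Z                   # columns are N(mu, Sigma) draws
cov(t(X))                           # close to Sigma
```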
A $d$-dimensional Student distribution ($t$-distribution) is parametrized with the degrees of freedom $\nu > 0$, the expectation vector $\mu \in \mathbb{R}^d$, and the positive definite symmetric matrix $\Sigma \in \mathbb{R}^{d \times d}$. When random vector $X$ follows the $t$-distribution with parameters $\nu$, $\mu$, and $\Sigma$, then we write $X \sim t(\nu, \mu, \Sigma)$ or $X \sim t_d(\nu, \mu, \Sigma)$. The density function of the multivariate $t$-distribution is
$$
f(x) = C_{\nu, d} \left( 1 + \frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{\nu} \right)^{-(\nu + d)/2},
$$
where
$$
C_{\nu, d} = \frac{\Gamma\big( (\nu + d)/2 \big)}{\Gamma(\nu/2) \, (\nu \pi)^{d/2} (\det \Sigma)^{1/2}}.
$$
The multivariate Student distributed random vector has the covariance matrix
$$
\operatorname{Cov}(X) = \frac{\nu}{\nu - 2} \Sigma,
$$
when $\nu > 2$.
When $\nu \to \infty$, then the Student density approaches a Gaussian density. Indeed,
$$
\left( 1 + \frac{y}{\nu} \right)^{-(\nu + d)/2} \to e^{-y/2},
$$
as $\nu \to \infty$, since $(1 + y/\nu)^{\nu} \to e^{y}$, when $\nu \to \infty$. The Student density has tails of the order $\| x \|^{-(\nu + d)}$, as $\| x \| \to \infty$.
Figure 4.6 compares multivariate Gaussian and Student densities. Panel (a) shows the Gaussian density with marginal standard deviations equal to one and correlation 0.5. Panel (b) shows the density of the $t$-distribution with degrees of freedom 2 and correlation 0.5. The density contours are in both cases ellipses, but the Student density has heavier tails.
Random vector $X$ follows a Gaussian distribution with parameters $\mu$ and $\Sigma$ when $X = \mu + AZ$ for a $d \times k$ matrix $A$ with
$$
A A^T = \Sigma,
$$
where $Z \sim N_k(0, I_k)$ follows the standard Gaussian distribution. This leads to the definition of a normal variance mixture distribution. We say that $X$ follows a normal variance mixture distribution when
$$
X = \mu + \sqrt{W} \, A Z, \tag{4.16}
$$
where $Z \sim N_k(0, I_k)$ follows the standard Gaussian distribution, and $W \ge 0$ is a random variable independent of $Z$. It holds that
$$
EX = \mu
$$
and
$$
\operatorname{Cov}(X) = E(W) \, \Sigma,
$$
where $\Sigma = A A^T$. When random vector $X$ follows the normal variance mixture distribution with parameters $\mu$, $\Sigma$, and $F_W$, where $F_W$ is the distribution function of $W$ on $[0, \infty)$, then we write $X \sim M_d(\mu, \Sigma, F_W)$.
The density function can be calculated as
$$
f(x) = \int_0^\infty f_{X \mid W}(x \mid w) \, f_W(w) \, dw,
$$
where $f$ is the density of $X$, $f_W$ is the density of $W$, $f_{X \mid W}(\cdot \mid w)$ is the density of $X$ conditional on $W = w$, and $f_{X \mid W}(\cdot \mid w)$ is defined by
$$
f_{X \mid W}(x \mid w) = \frac{1}{(2\pi w)^{d/2} (\det \Sigma)^{1/2}} \exp\left( -\frac{(x - \mu)^T \Sigma^{-1} (x - \mu)}{2w} \right).
$$
The characteristic function is obtained, using (4.16), as
$$
\varphi(t) = E \exp\big( i t^T X \big) = \exp\big( i t^T \mu \big) \, \hat{F}_W\left( \frac{1}{2} t^T \Sigma t \right), \tag{4.17}
$$
where $\hat{F}_W(s) = \int_0^\infty e^{-sw} \, dF_W(w)$ is the Laplace transform of the distribution of $W$.
The family of normal variance mixtures is closed under linear transformations: When $X \sim M_d(\mu, \Sigma, F_W)$, $A$ is a $k \times d$ matrix, and $b \in \mathbb{R}^k$ is a vector, then
$$
AX + b \sim M_k\big( A\mu + b, \, A \Sigma A^T, \, F_W \big).
$$
This can be seen using the characteristic function, similarly as in (4.17).
Let $W$ be such a random variable that $\nu / W$ follows the $\chi^2$-distribution with degrees of freedom $\nu$. Then the normal variance mixture distribution $M_d(\mu, \Sigma, F_W)$ is the multivariate $t$-distribution $t_d(\nu, \mu, \Sigma)$, where $\Sigma = A A^T$, as defined in Section 4.3.2.
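The mixture representation (4.16) gives a direct way to simulate from the multivariate $t$-distribution, sketched in R below with hypothetical parameter values.

```r
## Multivariate t via the normal variance mixture (4.16):
## X = mu + sqrt(W) A Z with nu / W ~ chi-squared(nu).
nu <- 4
mu <- c(0, 0)
Sigma <- matrix(c(1, 0.5,
                  0.5, 1), 2, 2)
A <- t(chol(Sigma))

set.seed(1)
n <- 5000
Z <- matrix(rnorm(2 * n), nrow = 2)
W <- nu / rchisq(n, df = nu)        # mixing variable: nu / W is chi-squared(nu)
X <- mu + sqrt(rep(W, each = 2)) * (A %*% Z)
cov(t(X))                           # close to (nu / (nu - 2)) * Sigma
```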
The density function of an elliptical distribution has the form
$$
f(x) = (\det \Sigma)^{-1/2} \, g\big( (x - \mu)^T \Sigma^{-1} (x - \mu) \big), \qquad x \in \mathbb{R}^d, \tag{4.23}
$$
where $g$ is called the density generator, $\Sigma \in \mathbb{R}^{d \times d}$ is a symmetric positive definite matrix, and $\mu \in \mathbb{R}^d$. Since $\Sigma$ is positive definite, it has an inverse that is positive definite, which means that $(x - \mu)^T \Sigma^{-1} (x - \mu) \ge 0$ for all $x \in \mathbb{R}^d$. Thus, $g$ needs to be defined only on the nonnegative real axis. Let $h : [0, \infty) \to [0, \infty)$ be such that
$$
0 < \int_0^\infty r^{d-1} h(r^2) \, dr < \infty.
$$
Then $g$ is a density generator when $g$ is chosen by
$$
g = c \, h, \qquad c = \left( \frac{2 \pi^{d/2}}{\Gamma(d/2)} \int_0^\infty r^{d-1} h(r^2) \, dr \right)^{-1},
$$
where $\Gamma$ is the gamma function. We give examples of density generators: the choice $h(t) = e^{-t/2}$ leads to the Gaussian distribution, and the choice $h(t) = (1 + t/\nu)^{-(\nu + d)/2}$ leads to the multivariate $t$-distribution.
Let $\Sigma = A A^T$, where $A$ is a $d \times d$ matrix, and let
$$
X = \mu + A Y,
$$
where $Y$ follows a spherical distribution with density $g(y^T y)$. Then $X$ follows an elliptical distribution with density (4.23). When random vector $X$ follows the elliptical distribution with parameters $\mu$, $\Sigma$, and $g$, where $g$ is the density generator, then we write $X \sim E_d(\mu, \Sigma, g)$. The family of elliptical distributions is closed under linear transformations: When $X \sim E_d(\mu, \Sigma, g)$, $B$ is a $k \times d$ matrix, and $b \in \mathbb{R}^k$ is a vector, then $BX + b$ follows an elliptical distribution with location vector $B\mu + b$ and scatter matrix $B \Sigma B^T$. This can be seen using the characteristic function, similarly as in (4.17).
We can decompose a multivariate distribution into a part that describes the dependence and into parts that describe the marginal distributions. This decomposition helps to estimate and analyze multivariate distributions, and it helps to construct new parametric and semiparametric models for multivariate distributions.
The distribution function of random vector $X = (X_1, \dots, X_d)$ is defined by
$$
F(x_1, \dots, x_d) = P(X_1 \le x_1, \dots, X_d \le x_d),
$$
where $(x_1, \dots, x_d) \in \mathbb{R}^d$. The distribution functions $F_1, \dots, F_d$ of the marginal distributions are defined by
$$
F_i(x) = P(X_i \le x),
$$
where $x \in \mathbb{R}$ and $i = 1, \dots, d$.
A copula is a distribution function whose marginal distributions are the uniform distributions on $[0, 1]$. Often it is convenient to define a copula as a distribution function whose marginal distributions are the standard normal distributions. Any distribution function $F$ may be written as
$$
F(x_1, \dots, x_d) = C\big( F_1(x_1), \dots, F_d(x_d) \big), \tag{4.28}
$$
where $F_1, \dots, F_d$ are the marginal distribution functions and $C$ is a copula. In this sense we can decompose a distribution into a part that describes only the dependence and into parts that describe the marginal distributions.
We show in (4.29) how to construct the copula of a multivariate distribution and in (4.31) how to construct a multivariate distribution function from a copula and marginal distribution functions. We restrict ourselves to the case of continuous marginal distribution functions. These constructions were given in Sklar (1959), who considered also the case of noncontinuous margins. For notational convenience we give the formulas for the case $d = 2$. The generalization to the cases $d \ge 3$ is straightforward.
We use the term "standard copula" when the marginals of the copula have the uniform distributions on $[0, 1]$. Otherwise, we use the term "nonstandard copula."
Let $X$ and $Y$ be real valued random variables with distribution functions $F_X$ and $F_Y$. Let $F$ be the distribution function of $(X, Y)$, and assume that $F_X$ and $F_Y$ are continuous. Then,
$$
F(x, y) = C\big( F_X(x), F_Y(y) \big),
$$
where
$$
C(u, v) = F\big( F_X^{-1}(u), F_Y^{-1}(v) \big), \tag{4.29}
$$
and $(u, v) \in [0, 1]^2$. We call $C$ in (4.29) the copula of the joint distribution of $X$ and $Y$. Copula $C$ is the distribution function of the vector $\big( F_X(X), F_Y(Y) \big)$, and $F_X(X)$ and $F_Y(Y)$ are uniformly distributed random variables.
The copula density is
$$
c(u, v) = \frac{\partial^2}{\partial u \, \partial v} C(u, v) = \frac{f\big( F_X^{-1}(u), F_Y^{-1}(v) \big)}{f_X\big( F_X^{-1}(u) \big) \, f_Y\big( F_Y^{-1}(v) \big)}, \tag{4.30}
$$
because $\frac{d}{du} F_X^{-1}(u) = 1 / f_X\big( F_X^{-1}(u) \big)$, where $f$ is the density of $(X, Y)$ and $f_X$ and $f_Y$ are the densities of $X$ and $Y$, respectively.
Let $C$ be a copula, that is, a distribution function whose marginal distributions are uniform on $[0, 1]$. Let $F_X$ and $F_Y$ be univariate distribution functions of continuous distributions. Define $F$ by
$$
F(x, y) = C\big( F_X(x), F_Y(y) \big). \tag{4.31}
$$
Then $F$ is a distribution function whose marginal distributions are given by the distribution functions $F_X$ and $F_Y$. Indeed, let $(X, Y)$ be a random vector with distribution function $F$. Then,
$$
P(X \le x) = \lim_{y \to \infty} F(x, y) = C\big( F_X(x), 1 \big) = F_X(x),
$$
and $P(Y \le y) = F_Y(y)$ for $y \in \mathbb{R}$, because $C(u, 1) = u$ and $C(1, v) = v$.
Typically a copula is defined as a distribution function with uniform marginals. However, we can define a copula so that the marginal distributions of the copula are some other continuous distribution than the uniform distribution on $[0, 1]$. It turns out that we get simpler copulas by choosing the marginal distributions of a copula to be the standard Gaussian distribution.
As in (4.28), we can write distribution function $F$ as
$$
F(x, y) = C_\Phi\big( \Phi^{-1}(F_X(x)), \, \Phi^{-1}(F_Y(y)) \big),
$$
where $\Phi$ is the distribution function of the standard Gaussian distribution and
$$
C_\Phi(x, y) = F\big( F_X^{-1}(\Phi(x)), \, F_Y^{-1}(\Phi(y)) \big), \tag{4.32}
$$
where $(x, y) \in \mathbb{R}^2$. Now $C_\Phi$ is a distribution function whose marginals are standard Gaussian, because $F_X(X)$ and $F_Y(Y)$ follow the uniform distribution on $[0, 1]$, and thus $\Phi^{-1}(F_X(X))$ and $\Phi^{-1}(F_Y(Y))$ follow the standard Gaussian distribution.
Conversely, given a distribution function $G$ with the standard Gaussian marginals, and univariate distribution functions $F_X$ and $F_Y$, we can define a distribution function $F$ with marginals $F_X$ and $F_Y$ by the formula
$$
F(x, y) = G\big( \Phi^{-1}(F_X(x)), \, \Phi^{-1}(F_Y(y)) \big). \tag{4.33}
$$
The density of the nonstandard copula $C_\Phi$ in (4.32) is
$$
c_\Phi(x, y) = \frac{f\big( F_X^{-1}(\Phi(x)), F_Y^{-1}(\Phi(y)) \big) \, \phi(x) \, \phi(y)}{f_X\big( F_X^{-1}(\Phi(x)) \big) \, f_Y\big( F_Y^{-1}(\Phi(y)) \big)},
$$
where $f$ is the density of $(X, Y)$, $f_X$ and $f_Y$ are the densities of $X$ and $Y$, respectively, and $\phi$ is the density of the standard Gaussian distribution.
We do not have observations directly from the distribution of the copula, but we show how to transform the sample so that we get a pseudo sample from the copula. Scatter plots of the pseudo sample can be used to visualize the copula. The pseudo sample can also be used in the maximum likelihood estimation of the copula. Before defining the pseudo sample, we show how to generate random variables from a copula.
Let random vector $X = (X_1, \dots, X_d)$ have a continuous distribution. Let $F_1, \dots, F_d$ be the distribution functions of the margins of $X$. Now
$$
U = \big( F_1(X_1), \dots, F_d(X_d) \big) \tag{4.34}
$$
is a random vector whose marginal distributions are uniform on $[0, 1]$. The distribution function of this random vector is the copula of the distribution of $X$. Thus, if we can generate a random vector $X$ with distribution $F$, we can use the rule (4.34) to generate a random vector whose distribution is the copula of $F$. Often the copula with uniform marginals is inconvenient due to boundary effects. We may get a statistically more tractable distribution by defining
$$
Z = \big( \Phi^{-1}(F_1(X_1)), \dots, \Phi^{-1}(F_d(X_d)) \big),
$$
where $\Phi$ is the distribution function of the standard Gaussian distribution. The components of $Z$ have the standard Gaussian distribution.
Let us have data $X_1, \dots, X_n \in \mathbb{R}^d$ and denote $X_i = (X_{i1}, \dots, X_{id})$. Let the rank of observation $X_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, d$, be
$$
\operatorname{rank}(X_{ij}) = \#\{ k \in \{1, \dots, n\} : X_{kj} \le X_{ij} \}.
$$
That is, $\operatorname{rank}(X_{ij})$ is the number of observations of the $j$th variable smaller than or equal to $X_{ij}$. We normalize the ranks to get observations on $[0, 1]$:
$$
\hat{U}_{ij} = \frac{\operatorname{rank}(X_{ij})}{n + 1}, \tag{4.35}
$$
for $i = 1, \dots, n$, $j = 1, \dots, d$. Now $\hat{U}_{ij} \in (0, 1)$ for all $i$ and $j$. In this sense we can consider the observations $\hat{U}_i = (\hat{U}_{i1}, \dots, \hat{U}_{id})$ as a sample from a distribution whose margins are uniform distributions on $[0, 1]$. Often the standard Gaussian distribution is more convenient, and we define
$$
\hat{Z}_{ij} = \Phi^{-1}\big( \hat{U}_{ij} \big), \tag{4.36}
$$
for $i = 1, \dots, n$, $j = 1, \dots, d$.
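The transforms (4.35) and (4.36) are one-liners in R; the toy data with nonuniform margins are hypothetical.

```r
## Pseudo sample from the copula via normalized ranks.
set.seed(1)
n <- 1000
x <- rexp(n)
y <- x + rexp(n)                    # dependent data with nonnormal margins

U <- cbind(rank(x), rank(y)) / (n + 1)  # margins approximately uniform, as in (4.35)
Z <- qnorm(U)                           # margins approximately standard Gaussian, (4.36)

par(mfrow = c(1, 2))
plot(U, xlab = "u", ylab = "v")
plot(Z, xlab = "z1", ylab = "z2")
```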
We can also transform the data using estimates of the marginal distribution functions. Let $\hat{F}_1, \dots, \hat{F}_d$ be estimates of the marginal distribution functions $F_1, \dots, F_d$. We define the pseudo sample as
$$
\hat{U}_i = \big( \hat{F}_1(X_{i1}), \dots, \hat{F}_d(X_{id}) \big), \tag{4.37}
$$
where $i = 1, \dots, n$. The estimates $\hat{F}_1, \dots, \hat{F}_d$ can be parametric estimates. For example, assuming that the $j$th marginal distribution is a normal distribution, we would take $\hat{F}_j(t) = \Phi\big( (t - \hat{\mu}_j) / \hat{\sigma}_j \big)$, where $\Phi$ is the distribution function of the standard normal distribution, $\hat{\mu}_j$ is the sample mean of $X_{1j}, \dots, X_{nj}$, and $\hat{\sigma}_j$ is the sample standard deviation. If $\hat{F}_j$ are the empirical distribution functions
$$
\hat{F}_j(t) = \frac{1}{n} \sum_{i=1}^n I_{(-\infty, t]}(X_{ij}),
$$
then we get almost the same transformation as (4.35), but $n + 1$ is now replaced by $n$:
$$
\hat{U}_{ij} = \frac{\operatorname{rank}(X_{ij})}{n}.
$$
In general, the empirical distribution function is calculated using a sample $Z_1, \dots, Z_n$ of identically distributed observations, and we define
$$
F_n(t) = \frac{1}{n} \sum_{i=1}^n I_{(-\infty, t]}(Z_i),
$$
where we denote by $I_A$ the indicator function of the set $A$: $I_A(z) = 1$ when $z \in A$, and $I_A(z) = 0$ otherwise.
The empirical copula is defined similarly as the empirical distribution function. Now,
$$
C_n(u_1, \dots, u_d) = \frac{1}{n} \sum_{i=1}^n I\big( \hat{U}_{i1} \le u_1, \dots, \hat{U}_{id} \le u_d \big), \tag{4.38}
$$
where the $\hat{U}_{ij}$ are defined in (4.37).
Pseudo samples are needed in maximum likelihood estimation. In maximum likelihood estimation we assume that the copula has a parametric form. For example, the copula of the normal distribution, given in (4.39), is parametrized with the correlation matrix, which contains $d(d - 1)/2$ parameters. Let $C_\theta$ be the copula with parameter $\theta \in \Theta$. The corresponding copula density is $c_\theta$, as given in (4.30). Let us have independent and identically distributed observations $X_1, \dots, X_n$ from the distribution of $X$. We calculate the pseudo sample $\hat{U}_1, \dots, \hat{U}_n$ using (4.35) or (4.37). A maximum likelihood estimate is a value $\hat{\theta}$ maximizing
$$
\ell(\theta) = \sum_{i=1}^n \log c_\theta\big( \hat{U}_i \big)
$$
over $\theta \in \Theta$.
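A sketch of the procedure in R for the bivariate Gaussian copula, whose density can be written in closed form; the data generating process is hypothetical, with true copula parameter 0.6.

```r
## Pseudo maximum likelihood for the bivariate Gaussian copula.
set.seed(1)
n <- 1000
z1 <- rnorm(n)
z2 <- 0.6 * z1 + sqrt(1 - 0.6^2) * rnorm(n)   # Gaussian pair with rho = 0.6
x <- exp(z1)                        # transforming the margins does not
y <- exp(z2)                        # change the copula

u <- rank(x) / (n + 1)              # pseudo sample, as in (4.35)
v <- rank(y) / (n + 1)
a <- qnorm(u)
b <- qnorm(v)

## Log-likelihood of the Gaussian copula density at the pseudo observations.
loglik <- function(rho)
  sum(-0.5 * log(1 - rho^2) -
        (rho^2 * (a^2 + b^2) - 2 * rho * a * b) / (2 * (1 - rho^2)))

optimize(loglik, interval = c(-0.99, 0.99), maximum = TRUE)$maximum
```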
We give examples of parametric families of copulas. The examples include the Gaussian copulas and the Student copulas.
Let $X \sim N_d(\mu, \Sigma)$ be a $d$-dimensional Gaussian random vector, as defined in Section 4.3.1. The copula of $X$ is
$$
C(u_1, \dots, u_d) = \Phi_R\big( \Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d) \big), \tag{4.39}
$$
where $\Phi_R$ is the distribution function of the $N_d(0, R)$ distribution, $R$ is the correlation matrix of $X$, and $\Phi$ is the distribution function of the $N(0, 1)$ distribution.
Indeed, let us denote $Z_i = (X_i - \mu_i) / \sigma_i$, where $\sigma_i$ is the standard deviation of $X_i$. Then $Z = (Z_1, \dots, Z_d)$ follows the distribution $N_d(0, R)$. Let $F_i$ be the distribution function of $X_i$. Then, using the notation $u = (u_1, \dots, u_d)$,
$$
C(u) = P\big( F_1(X_1) \le u_1, \dots, F_d(X_d) \le u_d \big) = P\big( X_1 \le F_1^{-1}(u_1), \dots, X_d \le F_d^{-1}(u_d) \big),
$$
where $F_i^{-1}$ are the quantile functions. Also,
$$
F_i^{-1}(u_i) = \mu_i + \sigma_i \Phi^{-1}(u_i),
$$
for $i = 1, \dots, d$. Thus,
$$
C(u) = P\big( Z_1 \le \Phi^{-1}(u_1), \dots, Z_d \le \Phi^{-1}(u_d) \big) = \Phi_R\big( \Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d) \big).
$$
Figure 4.7 shows perspective plots of the densities of the Gaussian copula. The margins are uniform on $[0, 1]$. Panels (a) and (b) use two different values of the correlation parameter. Figure 4.7 shows that the perspective plots of the copula densities are not intuitive, because the probability mass is concentrated near the corners of the square $[0, 1]^2$, especially when the correlation is high. From now on we will show only pictures of copulas with standard Gaussian margins, as defined in (4.32), because these give a more intuitive representation of the copula.
Let $X \sim t_d(\nu, \mu, \Sigma)$ be a $d$-dimensional $t$-distributed random vector, as defined in Section 4.3.2. The copula of $X$ is
$$
C(u_1, \dots, u_d) = t_{\nu, R}\big( t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_d) \big),
$$
where $t_{\nu, R}$ is the distribution function of the $t_d(\nu, 0, R)$ distribution, $R$ is the correlation matrix of $X$, and $t_\nu$ is the distribution function of the univariate $t$-distribution with degrees of freedom $\nu$.
Indeed, the claim follows similarly as in the Gaussian case for
$$
Z = D^{-1}(X - \mu),
$$
where $D = \operatorname{diag}\big( \Sigma_{11}^{1/2}, \dots, \Sigma_{dd}^{1/2} \big)$ and $\Sigma_{jj}^{1/2}$ is the square root of the $j$th element in the diagonal of $\Sigma$. The matrix $R$ with elements $R_{ij} = \Sigma_{ij} / (\Sigma_{ii} \Sigma_{jj})^{1/2}$ is indeed the correlation matrix, since
$$
\operatorname{Cor}(X_i, X_j) = \frac{\frac{\nu}{\nu - 2} \Sigma_{ij}}{\left( \frac{\nu}{\nu - 2} \Sigma_{ii} \right)^{1/2} \left( \frac{\nu}{\nu - 2} \Sigma_{jj} \right)^{1/2}} = \frac{\Sigma_{ij}}{(\Sigma_{ii} \Sigma_{jj})^{1/2}} = R_{ij},
$$
where $i, j = 1, \dots, d$.
Figure 4.8 shows contour plots of the densities of the Student copula when the margins are standard Gaussian. The correlation parameter is the same in both panels. The degrees of freedom are two in panel (a) and four in panel (b). The Gaussian and Student copulas are similar in the main part of the distribution, but they differ in the tails (in the corners of the unit square). The Gaussian copula has independent extremes (asymptotic tail independence), but the Student copula generates concomitant extremes with a nonzero probability. The probability of concomitant extremes is larger when the number of degrees of freedom is smaller and the correlation coefficient is larger.
We define Gumbel and Clayton copulas. These are examples of Archimedean copulas. Gaussian and Student copulas are examples of elliptical copulas.
The Gumbel–Hougaard or Gumbel family of copulas is defined by
$$
C_\theta(u, v) = \exp\left( -\Big( (-\log u)^\theta + (-\log v)^\theta \Big)^{1/\theta} \right),
$$
where $\theta \in [1, \infty)$ is the parameter. When $\theta = 1$, then $C_1(u, v) = uv$ (the independence copula), and when $\theta \to \infty$, then $C_\theta(u, v) \to \min\{u, v\}$ (the comonotonicity copula).
Figure 4.9 shows contour plots of the densities with the Gumbel copula for three increasing values of the parameter $\theta$. The marginals are standard Gaussian.
Clayton's family of copulas is defined by
$$
C_\theta(u, v) = \big( u^{-\theta} + v^{-\theta} - 1 \big)^{-1/\theta},
$$
where $\theta > 0$. When $\theta = 0$, we define $C_0(u, v) = uv$, which is the limit of $C_\theta(u, v)$ as $\theta \to 0$. When the parameter $\theta$ increases, then the dependence between the coordinate variables increases. The dependence is larger in the negative orthant. The Clayton family was discussed in Clayton (1978).
Figure 4.10 shows contour plots of the densities with the Clayton copula for three increasing values of the parameter $\theta$. The marginals are standard Gaussian.
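The distribution functions of both families are simple enough to define directly in R; the sketch below checks the limiting behavior at a few hypothetical parameter values.

```r
## Gumbel and Clayton copula distribution functions.
gumbel <- function(u, v, theta)
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
clayton <- function(u, v, theta)
  (u^(-theta) + v^(-theta) - 1)^(-1 / theta)

gumbel(0.3, 0.7, theta = 1)         # equals 0.3 * 0.7: independence
gumbel(0.3, 0.7, theta = 50)        # close to min(0.3, 0.7): comonotonicity
clayton(0.3, 0.7, theta = 0.001)    # close to 0.3 * 0.7
clayton(0.3, 0.7, theta = 10)       # strong positive dependence
```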
Elliptical distributions are defined in Section 4.3.4. An elliptical copula is obtained from an elliptical distribution by the construction (4.29). The Gaussian copula and the Student copula are elliptical copulas.
Archimedean copulas have the form
$$
C(u, v) = \phi^{-1}\big( \phi(u) + \phi(v) \big),
$$
where $\phi : (0, 1] \to [0, \infty)$ is strictly decreasing, continuous, convex, and $\phi(1) = 0$. When additionally $\phi(0) = \infty$, the inverse $\phi^{-1}$ is defined on the whole interval $[0, \infty)$, and $C$ is a copula. The function $\phi$ is called the generator. The product copula, Gumbel copula, Clayton copula, and Frank copula are all Archimedean copulas, and we have the generators:

- product copula: $\phi(t) = -\log t$;
- Gumbel copula: $\phi(t) = (-\log t)^\theta$;
- Clayton copula: $\phi(t) = \big( t^{-\theta} - 1 \big) / \theta$;
- Frank copula: $\phi(t) = -\log \dfrac{e^{-\theta t} - 1}{e^{-\theta} - 1}$.

The density of an Archimedean copula is
$$
c(u, v) = \frac{\partial^2}{\partial u \, \partial v} C(u, v) = \big( \phi^{-1} \big)''\big( \phi(u) + \phi(v) \big) \, \phi'(u) \, \phi'(v),
$$
where $\big( \phi^{-1} \big)''$ is the second derivative of $\phi^{-1}$:
$$
\big( \phi^{-1} \big)''(s) = -\frac{\phi''\big( \phi^{-1}(s) \big)}{\Big( \phi'\big( \phi^{-1}(s) \big) \Big)^3},
$$
because $\big( \phi^{-1} \big)'(s) = 1 / \phi'\big( \phi^{-1}(s) \big)$.
Testing the hypothesis of the Gaussian copula and of other copulas on financial data has been studied in Malevergne and Sornette (2003) and summarized in Malevergne and Sornette (2005). They found that the Student copula is a good model for foreign exchange rates, but for stock returns the situation is less clear.
Patton (2005) takes into account the volatility clustering phenomenon. He filters the marginal data by a GARCH process and shows that the conditional dependence structure between the Japanese Yen and the Euro is better described by Clayton's copula than by the Gaussian copula. Note, however, that the copula of the residuals is not the same as the copula of the raw returns, and many filters can be used (ARCH, GARCH, and the multifractal random walk). Using the multivariate multifractal filter of Muzy et al. (2001) leads to a nearly Gaussian copula.
Breymann et al. (2003) show that the daily returns of German Mark/Japanese Yen are best described by a Student copula with about six degrees of freedom, when the alternatives are the Gaussian, Clayton, Gumbel, and Frank copulas. The Student copula seems to provide an even better description for returns at smaller time scales, as long as the time scale is larger than 2 hours. The best-fitting number of degrees of freedom is four at the 2-hour scale.
Mashal and Zeevi (2002) claim that the dependence between stocks is better described by a Student copula with 11–12 degrees of freedom than by a Gaussian copula.