CHAPTER 7
Statistics of Copulas

7.1 The Formula that Killed Wall Street

This is how copulas became infamous. Few mathematical formulas have ever achieved such wide popular recognition, albeit with clearly negative connotations, as the title of this section may suggest. Formulas rarely kill. Gaussian copulas (or, rather, applications of Gaussian copulas to pricing credit derivatives) have earned this distinction. As Felix Salmon [38] tells the story, it all started in the year 2000 when David X. Li, a former actuary turned financial mathematician, suggested a smart way to estimate default correlations using historical credit swap rates [29].

The approach suggested by Li was indeed very attractive. Being able to use correlation structure to define dependence, as was discussed in Chapter 5, really simplifies the problem of building a model for joint distribution. As we have mentioned, this is how it works for multivariate normal distribution: all dependence that may exist between the components of a Gaussian random vector can be expressed in terms of the correlation matrix. In the bivariate case the correlation matrix degenerates into just one number: the correlation coefficient.

By that time (the turn of the new millennium) it was already well known that the Gaussian approximation was not working very well for many financial variables. Empirical observations of the long-term behavior of daily stock price returns, market indexes, and currency exchange rates had exposed some strange patterns hardly consistent with the assumption of normality. Their distributions were characterized by fat tails and skewness, which gave evidence that large deviations of these variables, especially on the down side, were much more likely to occur in reality than would be predicted by a Gaussian model. By that time the Black–Scholes formula, the crowning achievement of mathematical finance (pricing European options based on the assumption of normality), had already ceased to satisfy financial practitioners, and alternative distribution models were being investigated making full use of the Monte Carlo approach.

However, the Gaussian copulas looked like a very different tool, distinct from regular multivariate normal models. They did not require any assumptions of normality. On the contrary, arbitrary random variables, X and Y, characterized by fat tails, skewness, and/or other indecent behavioral patterns, in short, variables behaving as wildly as they could with whatever exotic distribution functions F and G, would be transformed into uniform U = F(X) and V = G(Y) and then again individually transformed into normal variables S = Φ^{−1}(U) and T = Φ^{−1}(V) and sent to live and interact in the normal world, where their dependence patterns were to be studied. And as we know, in the world of Gaussian copulas C_ρ(u, v) = Φ_ρ(Φ^{−1}(u), Φ^{−1}(v)) all the dependence between the marginals may be expressed in terms of the correlation ρ. Then the backward transform would send our variables back into the wild, but with their dependence patterns well recorded. Temptation was obviously there and a nice modeling algorithm was open for the takers: The study of the marginal distributions was isolated from the study of the dependence structure, and these two could be done separately, with the latter carried out in a surgically clean world of Gaussian variables. That is how the default correlations were estimated and pricing of credit derivatives became technically possible. The result was a spectacular growth in the industry of credit derivatives, especially CDOs, which became extremely popular as evidenced by McKenzie et al. in [33].
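To make the two-step recipe concrete, here is a minimal sketch in Python; the simulated lognormal sample stands in for real data with "wild" margins, and the rank transform with denominator n + 1 anticipates the convention of Section 7.2:

```python
import numpy as np
from scipy import stats

# Simulated stand-in for two dependent, non-Gaussian variables
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=1000)
x, y = np.exp(z[:, 0]), np.exp(z[:, 1])   # fat-tailed, skewed margins

# Step 1: transform each margin to (approximately) Uniform(0, 1)
# using ranks, i.e., the empirical c.d.f. with denominator n + 1
n = len(x)
u = stats.rankdata(x) / (n + 1)
v = stats.rankdata(y) / (n + 1)

# Step 2: send the uniforms into the "normal world"
s = stats.norm.ppf(u)
t = stats.norm.ppf(v)

# Step 3: all remaining dependence is summarized by one number
rho = np.corrcoef(s, t)[0, 1]
print(f"estimated Gaussian-copula correlation: {rho:.3f}")
```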

CDOs, or collateralized debt obligations, are special credit derivative securities packaging and repackaging debt, buying and selling risks in exchange for positive cash flows. A simplified example of a CDO, a three-loan portfolio where we priced the separate tranches (for example, obligations to pay back specially allocated portions of the mortgage debt in case of defaults), was discussed in Section 5.7. Much more complex CDOs were created, priced, and sold in the real world, with their senior tranches deemed virtually riskless and thus assigned very high credit ratings based on Gaussian copula models. Credit agencies trusted the models and also added to the common illusion of safety.

At the peak of CDO popularity, this is what an analytic paper [8] listed as their main benefits (2004 Outlook):

  • Investment diversification
  • Customized risk–reward
  • Attractive yields
  • Leveraged returns (optional redemption)
  • Transparency

Evidently, things looked good in 2004. Then bad things started happening. With the US housing market going down in 2008, all the estimates of probabilities of joint defaults on mortgages went down in flames. The simplest reasons for that were suggested in Section 5.7: First, it was extremely hard to estimate correlation using historical data. Second, even if we knew everything about correlations, this still did not give us enough information to estimate probabilities of joint defaults.

One possible solution was to throw out all copula techniques as compromised by one of the worst crises of Wall Street and therefore one of the worst moments in the history of financial mathematics. The authors of this book cannot recommend this solution, and we have two motivations. First, we have indeed been heavily invested in copula modeling for the last fifteen years. This, however, would be an insufficient motivation to recommend the copula approach to anyone, if not for the other reason: in applications, we have encountered many examples when copulas do their job perfectly well, providing good estimation for actuarial, financial, and other problems. We certainly acknowledge the deep and serious critiques of copula models expertly done by Mikosch in [32] and also in many other sources. But we do not find enough evidence to totally deny copula methods their rightful place among the other statistical techniques.

Another possible solution, of which we can approve: always look for an adequate copula model. Do not fall for the first or the easiest choice. As we will see very soon, the attractive Gaussian copulas possess one property (or rather, lack one feature) which often makes them unfit for risk management purposes. This missing feature is known as tail dependence.

Sklar’s theorem tells us that for any joint distribution there always exists a copula representation, which means that the true model of dependence can be found inside the class of copula functions. Unfortunately, no one has yet provided an algorithm which can efficiently search for the true solution over the entire set of copula models. Mathematically speaking, we can only regret the absence of a general parametrization for the entire set of copula functions. Such a parametrization would reduce the search for a good copula model to a routine problem of parametric estimation. What statisticians have learned to do very well is to search for the best solution in a given specific parametric subclass of copulas such as any of those described in Chapter 6: Gaussian, FGM, Clayton, etc.

We can think of this problem as similar to the one we may encounter while looking for a treasure chest hidden in a huge office building. If we know that this treasure chest exists, but do not have keys to all the offices in the building, we might be restricted to the confines of the offices which either are already open, or to which the keys are readily available. Nobody guarantees the existence of true distribution or even a good approximation in these open offices. Actually, the optimal solution for a given office might be a quarter or even a penny we find in a dusty corner, and it is worth much less than the treasure chest which is still sitting behind a closed door in a different room.

Thus we can consider an entirely different problem: We wish to develop methods which will allow us to compare the relative wealth of the open offices, so that we can select the copula model which is at least the best available for our application. Following George Box’s idea that “essentially, all models are wrong, but some are useful” [5], we would like to suggest a useful model inside a subclass which we can properly analyze.

Section 7.2 will contain a general discussion of the most logical criteria of quality of statistical models, helpful in choosing one copula model over another. We will follow the empirical principle: the best model is the one that fits our data best. However, keeping the best fit in mind, we should not forget about the predictive quality of the model, which may often suffer as a result of overfitting. We will consider standard goodness-of-fit criteria, which typically provide some measure of the overall fit of a statistical model. Then we will review information criteria such as AIC and its later versions, which will equip us with certain measures against overfitting. Two criteria of fit which are more specific to copulas address tail dependence and such measures of overall dependence as Kendall’s concordance and Spearman’s rank correlation.

Given that we know which office we want to search, how do we find the wealth it contains? Which methods can we use to find the optimal solution within a given parametric subclass of copula functions? This is the problem of parametric estimation, which we will consider in Section 7.3. First of all, the structure of copula models allows for two options. We can estimate all parameters at once, including the parameters of the marginals and the parameters of association. If this one-step approach is too cumbersome, we may opt for a two-step approach: estimate the marginals first, and then estimate the parameters of association given the marginals. This can be done either parametrically at both steps or semiparametrically (using nonparametric estimates of the marginals and parametric estimates of the association). We will not consider fully nonparametric estimation, which stays outside the limits of this discussion.

We will also make use of the Bayesian approach. The difference between Bayesian and classical methods consists in more than just the difference between two point estimators: say, the maximum likelihood estimator versus the posterior mean incorporating some prior information. The most important aspect of Bayesian estimation is that the inference gives us the entire posterior distribution rather than one point estimate. To choose the best representative of a given parametric class, we will utilize some of the criteria discussed in Section 7.2 and will compare the results of parametric and semiparametric estimation with these criteria in mind. Among the methods we will consider, the most important role will belong to a version of the method of moments, the minimum distance (least squares) approach, the method of maximum likelihood, and, naturally, Bayesian estimation methods.

Section 7.4 will be dedicated to the selection of the office (sorry, the subclass of copulas) which is the best of those that are open. To make sure that a good enough choice exists, we will discuss goodness-of-fit tests which can address the availability of feasible solutions in these classes. One logical way would be to designate the best solution for each open office (subclass of copulas) and compare these solutions using the criteria discussed in Section 7.2. Another possible solution is to apply Bayesian model selection procedures which base the comparison between the subclasses not on the sole representatives of each, but rather on the integral performance of models from these subclasses.

Thus Bayesian statistics plays two separate roles in this chapter: We apply Bayesian estimation to find the best solutions in the subclasses of copulas, and also Bayesian hypothesis testing for the purpose of model selection between these subclasses.

In order to illustrate our point, we will consider two applications which utilize some of these approaches in actuarial and financial contexts mentioned in Chapter 5. Section 7.5 considers Bayesian copula estimation for a joint survival problem and Section 7.6 touches on Bayesian model selection for the related failures of vehicle components. Chapter 8 will contain a study of international financial markets which will combine most of the elements of statistical analysis considered in Chapter 7.

7.2 Criteria of Model Comparison

Suppose we use a finite sample X = (X1, …, Xn) to build a distribution model, which will both explain the data we obtain and provide a valid prediction for the future, hopefully supported by the data yet to come. If a distribution model F0(x) is suggested, what makes it a viable candidate? How can we choose the better model out of two candidates? Most of the criteria providing comparison of copula models are based on the closeness of the empirical distribution or its specific characteristics to those of the candidate distribution. In the parametric case a special role is played by the likelihood function and its modifications.

7.2.1 Goodness-of-Fit Tests

The traditional goodness-of-fit approach is about testing the null hypothesis H0: X ∼ F0(x) versus H1: otherwise. In a general setting, we define a nonnegative “distance” (which may or may not be a proper mathematical distance or metric satisfying the conditions of identity of indiscernibles, symmetry, and triangle inequality) d(F*n(x), F0(x)) between the theoretical c.d.f. F0(x) suggested by the model and the empirical c.d.f.

(7.1) F_n^*(x) = \frac{1}{n+1} \sum_{i=1}^{n} I_{(-\infty,\, x]}(X_i)

of the data sample X = (X1, …, Xn), where I_A(x) is the standard indicator function of the set A. Notice the slightly nonstandard use of the denominator n + 1 in (7.1) instead of the more common n, in which we follow Genest and Rivest [19] and others, who find it convenient that the empirical c.d.f. never takes the value 1. We will briefly discuss two common measures of goodness-of-fit, which are most popular in application to copula structures. These measures are the Kolmogorov–Smirnov distance

(7.2) d_{KS} = \sup_{x} \left| F_n^*(x) - F_0(x) \right|

and the Cramer–von Mises distance

(7.3) \omega^2 = n \int_{-\infty}^{\infty} \left( F_n^*(x) - F_0(x) \right)^2 dF_0(x),

defining two corresponding criteria of closeness.

The tests based on these two distances, with rejection regions of the form dKS > c and ω² > c, are convenient because the distributions of the test statistics, and thus the p-values of the tested data, are available under the assumption of H0 being true, at least in the one-dimensional case. In order to determine dKS, one has to compare the vertical distance between the two functions (empirical and theoretical) only at the sample points, which represent the jumps of the empirical c.d.f., so that the search for the maximum is simplified. Similarly, instead of the integral ω² the following test statistic can be calculated:

\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left( F_0(x_{(i)}) - \frac{2i-1}{2n} \right)^2,

where x(i) is the ith order statistic.

One way both of these measures can be applied beyond testing the null hypothesis is to compare several theoretical candidate models, selecting those with the lower values of dKS or ω2.
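For illustration, the following Python sketch computes both statistics for a sample tested against a hypothesized exponential c.d.f. (the choice of F0 here is just an assumption of the example) and checks them against SciPy's built-in tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=50)
F0 = stats.expon(scale=2.0).cdf           # hypothesized c.d.f.

xs = np.sort(sample)
n = len(xs)
F0x = F0(xs)
i = np.arange(1, n + 1)

# Kolmogorov-Smirnov: check the empirical c.d.f. just before and at each jump
d_ks = max(np.max(i / n - F0x), np.max(F0x - (i - 1) / n))

# Cramer-von Mises computational formula
omega2 = 1 / (12 * n) + np.sum((F0x - (2 * i - 1) / (2 * n)) ** 2)

print(d_ks, stats.kstest(sample, F0).statistic)            # should agree
print(omega2, stats.cramervonmises(sample, F0).statistic)  # should agree
```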

Let us notice that while goodness-of-fit tests are essentially non-Bayesian, both Kolmogorov–Smirnov and Cramer–von Mises distances may be used in a Bayesian context, as we will see later on. We should also be aware that in the case of copula models F0(x) is a joint distribution of a vector X with two or more components, which makes practical application of goodness-of-fit tests much more technically difficult. In particular, the test statistics are no longer distribution-free, and finding the maximum for the Kolmogorov–Smirnov statistic is not as trivial as in one dimension. That is why the determination of even approximate p-values of the tests requires a substantial amount of work, such as Monte Carlo generation of multiple samples from the theoretical distribution [18]. However, this is not an issue for model comparison purposes.

The real issue is that, using goodness-of-fit criteria for model selection, we do not put any barriers against overfitting, which means that a theoretical model with a c.d.f. close to the empirical c.d.f. is always preferable. It is always easy to imagine a theoretical model with way too many parameters (an overparameterized one), which will provide a really good fit to the empirical c.d.f. However, this model will take all observed features of the sample for granted, even those “features” which have a completely random origin. At the same time, it will not allow for the features not observed. It is like looking at a picture of a human face which has a wart on the nose, an ink stain in the middle of the forehead, and one ear cut off. Should we take all other human faces to have warts and ink stains in appropriate places, but just one ear? Such a model will provide a good description of a particular portrait, but a poor prediction of what most humans look like.

7.2.2 Posterior Predictive p-values

Before moving on to the measures of model fit which address overfitting, let us introduce a fully Bayesian concept which may serve the purpose of testing goodness-of-fit hypotheses without applying Bayes factors, as was done in Chapter 2. To be more precise, we will follow the idea of Bayesian predictive checking introduced by Rubin [37].

Let us review a parametric version of the general goodness-of-fit test and the frequentist concept of the p-value, which is consistently criticized by Bayesian statisticians and lately (along with confidence intervals) has become a sore spot of applied statistics as one of the most intuitively misunderstood and misused traditional statistical techniques. This critique was epitomized in an editorial by Trafimow and Marks [44] in the popular applied journal Basic and Applied Social Psychology and further developed by Wasserstein and Lazar [47].

For simplicity, let us consider only the continuous case, treating all data as one vector and the parameter as another. Let us denote by X the data (it could be a finite i.i.d. sample or a different object taking values x in the sample space S) with the distribution density (likelihood) f(x; θ) depending on the parameter θ. Consider a null hypothesis H0: θ = θ0 to be tested against the composite alternative H1: θ ≠ θ0. For a test statistic T(X) and a critical region of the form R_T = {x ∈ S : T(x) ≥ c}, define the test:

“Reject H0” if T(x) ≥ c or “do not reject H0” if T(x) < c.

The significance level approach suggests determining first the size or significance level of the test as the tolerable level of the Type I error, α = P(T(X) ≥ c | θ0), so that the critical value c(α) depends on α. However, a more popular approach is to determine the p-value of the data, p = P(T(Y) ≥ T(x) | x, θ0), which may be interpreted as the tail probability of the statistic T(Y) for the given x and θ0. We may also rewrite it as

(7.4) p(x, \theta_0) = \int_{S} I_{\{T(y) \ge T(x)\}}\, f(y; \theta_0)\, dy.

Notice that under the null hypothesis the p-value as a function of x is uniformly distributed on [0, 1], so that exactly 5% of all data generated from f(x; θ0) will have p ≤ 0.05. The general recommendation is to reject the null hypothesis for data with low p-values.

If the choice of the statistic T(X) is smart, this test will have high power, defined as the rejection probability under the alternative: k = P(T(X) ≥ c | H1). The famous Lindley paradox [30] guarantees that for large enough sample sizes virtually any point null hypothesis θ = θ0 will eventually be rejected. However, high rejection probability is not the goal of Bayesian hypothesis testing. If the model is false (and all models are false, according to George Box) but fits our data well, we do not want to reject it. This is the motivation to introduce a Bayesian concept which will measure the feasibility not of a single value of the parameter θ0, but rather of the entire posterior distribution; see also Meng [31].

In order to formulate a Bayesian alternative to the p-value approach, we need to introduce Bayesian predictive distributions, which we have ignored so far, being focused on estimation. In the estimation framework we have considered the prior π(θ) and posterior π(θ|x) densities. Now let us consider a new sample, y ∈ S. What is the distribution of y given our information on θ, provided both by the prior and by the data x? We can define the Bayesian predictive density as the integral over the posterior

(7.5) g(y \mid x) = \int f(y; \theta)\, \pi(\theta \mid x)\, d\theta.

It is also possible to define the prior predictive density as an extreme case with no data available (with the prior replacing the posterior):

g(y) = \int f(y; \theta)\, \pi(\theta)\, d\theta.

It is easy to see that the prior predictive density g(y) coincides with the marginal density m(y) in the denominator of the formula (2.26) playing the central role in Bayesian estimation.

Let us use the posterior predictive density to define the posterior predictive p-value, similar to (7.4), as

(7.6) p_p(x) = P(T(Y) \ge T(x) \mid x) = \int_{S} I_{\{T(y) \ge T(x)\}}\, g(y \mid x)\, dy.

It is important that the latter does not depend on a particular value of θ0, but integrates the information from the posterior. The role of pp(x) is similar to that of traditional p-values: reject the null when pp(x) is small; let us say, if it does not exceed the usual benchmarks of 0.1, 0.05, or 0.01.

In order to illustrate the difference between (7.4) and (7.6), let us consider a slightly paraphrased example introduced by Gelman [15].

Example 7.2.1 Let us test a hypothesis regarding the normal mean using a single data point: H0: θ = 0 versus a two-sided alternative H1: θ ≠ 0. Assume f(x; θ) ∼ N(θ, 1) and π(θ) ∼ N(0, τ²). It is easy to see (compare to Exercise 2.8) that

\pi(\theta \mid x) \sim N\left( \frac{\tau^2 x}{1 + \tau^2},\; \frac{\tau^2}{1 + \tau^2} \right).

Defining T(X) = |X| and taking into account that one tail of the test statistic translates into two symmetric tails of the standard normal distribution Φ(u), we obtain from (7.4) and (7.6) the following expressions:

p(x, \theta_0) = 2\left( 1 - \Phi(|x|) \right), \qquad p_p(x) = 2\left( 1 - \Phi\left( \frac{|x|}{\sqrt{(1 + \tau^2)(1 + 2\tau^2)}} \right) \right).

What can we say about particular data values which are extreme from the point of view of the likelihood? Consider, for instance, x = 3, three standard deviations away from the mean. It gives ample evidence against H0 from the frequentist point of view, bringing about a p-value of 0.0027. However, if we consider a vague prior with τ = 10, pp(3) = 0.9832, and H0 should hardly be rejected from a Bayesian standpoint. Even if x = 30, which is three standard deviations of the prior predictive distribution away from the mean, p(30, 0) ≈ 0, but pp(30) = 0.8333, and hypothesis H0 stays highly feasible and should not be rejected.

But why do we not want our hypothesis to be rejected when the observation x = 30 seriously contradicts the prior? Yes, in this case the prior is far away from the likelihood, but somehow the posterior is fine: π(θ|30) ∼ N(29.70, 0.99), which means that the Bayesian model works too well to be discarded. The difference between the classical decision to reject H0 and the Bayesian decision not to reject corresponds to the difference in the way we treat the null hypothesis: in the classical setting it characterizes the point θ0 of the parametric space, while in the Bayesian setting it corresponds to the entire posterior distribution.
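These numbers are easy to verify; a quick numerical check of the expressions above:

```python
from scipy.stats import norm

tau = 10.0

def p_classical(x):
    # two-sided frequentist p-value under H0: theta = 0
    return 2 * (1 - norm.cdf(abs(x)))

def p_posterior_predictive(x, tau=tau):
    # posterior predictive p-value for T(X) = |X| with an N(0, tau^2) prior
    return 2 * (1 - norm.cdf(abs(x) / ((1 + tau**2) * (1 + 2 * tau**2)) ** 0.5))

print(p_classical(3))              # 0.0027
print(p_posterior_predictive(3))   # 0.9832
print(p_posterior_predictive(30))  # 0.8333
```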

7.2.3 Information Criteria

In a parametric setting, when the theoretical c.d.f. is defined as F0(x) = F(x; θ), a special role is played by the likelihood function L(x; θ) defined in (2.13). Let us define the deviance of a parametric model M with the k-dimensional parameter value θ as

(7.7) D(\theta) = -2 \ln L(x; \theta).

The smaller values of the deviance correspond to the higher likelihood. The maximum likelihood estimator θ̂ will provide the smallest possible deviance. We will define the Akaike Information Criterion or AIC [1] as

(7.8) AIC(M) = D(\hat{\theta}) + 2k = -2 \ln L(x; \hat{\theta}) + 2k.

The information criterion represents the relative loss of information which occurs when we use model M instead of the true model [2]. Lower values of AIC correspond to better quality of the model. Most of it is determined by the likelihood, but the additional term 2k represents a penalty for overfitting: larger models with more parameters (a higher dimension of the vector θ) will be penalized, and smaller parsimonious models will be preferred.

Another information criterion was suggested by Schwarz [39]. It penalizes overparameterized models even more substantially. It is called Bayesian Information Criterion or BIC and is defined as

(7.9) BIC(M) = D(\hat{\theta}) + k \ln n,

where n is the sample size.

The names might be misleading: there is nothing particularly Bayesian in BIC. Both AIC and BIC are based on maximizing the likelihood function, and only the sizes of the penalty for the parametric dimension are different, with the BIC penalty increasing with the sample size. The main advantage of both AIC and BIC and their later versions [2] becomes evident when we compare two or more parametric models of different parametric dimensions, or even different parametrizations, with regard to overfitting. Let us say there exist two models, M1 with parameter θ1 of dimension k1 and M2 with parameter θ2 of dimension k2. Then we say that model M1 is better than M2 if AIC(M1) < AIC(M2) or BIC(M1) < BIC(M2), depending on your preference of AIC or BIC. Models M1 and M2 may be non-nested (e.g., Clayton copula with Weibull margins vs. Gumbel–Hougaard copula with Gompertz margins).

None of these criteria is especially convenient for comparison of models within the Bayesian framework. First, they are based on the value of the MLE, which might not be directly available in Bayesian estimation. Second, they do not utilize the entire form of the posterior distribution. The Deviance Information Criterion introduced by Spiegelhalter et al. [43] resolves these problems. For a model M with parameter θ it may be defined as

(7.10) DIC(M) = D(\bar{\theta}) + 2 p_D, \qquad p_D = \bar{D} - D(\bar{\theta}),

where D(θ̄) is the deviance at the posterior mean θ̄ = E(θ | x) and D̄ = E(D(θ) | x) is the posterior mean of the deviance D(θ). In this presentation the first term characterizes the fit of the model, and the second term implicitly penalizes overparameterization. Another version of DIC, suggested by Gelman et al. in Reference [16],

(7.11) DIC_V(M) = D(\bar{\theta}) + 2 p_V, \qquad p_V = \tfrac{1}{2} \mathrm{Var}(D(\theta) \mid x),

is based on the posterior variance. The most attractive feature of DIC is its straightforward application to MCMC procedures of Bayesian estimation, where all three characteristics D(θ̄), D̄, and Var(D(θ) | x) can be estimated directly from the generated sample. As with other information criteria, lower values of DIC indicate the better model.
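In an MCMC workflow, estimating these criteria reduces to a few lines. Here is a sketch, where loglik, theta_mle, and theta_draws are placeholders for the model's log-likelihood function, its MLE, and the posterior sample:

```python
import numpy as np

def information_criteria(loglik, theta_mle, theta_draws, k, n):
    """AIC/BIC from the MLE; DIC (both versions) from posterior draws."""
    aic = -2 * loglik(theta_mle) + 2 * k
    bic = -2 * loglik(theta_mle) + k * np.log(n)

    D = np.array([-2 * loglik(th) for th in theta_draws])  # deviance per draw
    D_at_mean = -2 * loglik(theta_draws.mean(axis=0))      # D(theta_bar)
    p_d = D.mean() - D_at_mean                             # effective dimension
    dic = D_at_mean + 2 * p_d                              # (7.10)
    dic_v = D_at_mean + D.var()                            # (7.11): 2 * Var(D)/2
    return aic, bic, dic, dic_v
```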

7.2.4 Concordance Measures

What is the most important feature of pair copula models? It is the way they make it possible to model the joint distribution of two variables, P(X ≤ x, Y ≤ y) = Cα(F(x), G(y)), and also to separate the estimation of dependence from the estimation of the marginal distributions F(x) and G(y). A better copula model should accurately estimate the strength of dependence whatever the marginals are. The strength of dependence suggested by a model Cα within a given copula family is determined by the association parameter(s). The strength of dependence between the components of the random vector (X, Y) can also be expressed in terms of Spearman’s coefficient of rank correlation ρ* or Kendall’s concordance τ defined in (5.4). It is important that both the rank correlation and the concordance are invariant with respect to the marginals F and G and are fully determined by the copula function Cα(u, v). Relationships between the copula function and its concordance measures are well established:

(7.12) \rho^* = 12 \int_0^1 \int_0^1 C_\alpha(u, v)\, du\, dv - 3

and

(7.13) \tau = 4 \int_0^1 \int_0^1 C_\alpha(u, v)\, dC_\alpha(u, v) - 1.

In particular, for Archimedean copula with generator ϕα(t) it holds that

(7.14) \tau = 1 + 4 \int_0^1 \frac{\phi_\alpha(t)}{\phi'_\alpha(t)}\, dt.

For instance, see [20], for Clayton’s family

(7.15) \tau = \frac{\alpha}{\alpha + 2},

and for the Gumbel–Hougaard family (also, its survival copula version)

(7.16) \tau = 1 - \frac{1}{\alpha}.

For Frank’s family the formula is not as simple, though we can still express the concordance

(7.17) \tau = 1 - \frac{4}{\alpha}\left( 1 - D_1(\alpha) \right)

through Debye’s integral

(7.18) D_1(\alpha) = \frac{1}{\alpha} \int_0^{\alpha} \frac{t}{e^t - 1}\, dt

and approximate it numerically. Also, for the FGM family

\tau = \frac{2\alpha}{9},

and for all elliptical copulas (including Gaussian and t-copulas, see [9])

(7.19) \tau = \frac{2}{\pi} \arcsin \rho.

For two-parametric BB1 copulas (6.19) with association parameters θ and δ, the concordance may be expressed through both association parameters:

\tau = 1 - \frac{2}{\delta(\theta + 2)}.

Using the fact that Kendall’s tau may be estimated by its empirical (sample) value τ̂ according to (5.5), we can suggest two options for one-parametric copulas. First, τ̂ may be used to directly obtain an association parameter estimate within a given family of copulas. Second, if several models with different values of association are considered, the difference between the values of τ induced by the models via the formulae above and the empirical value τ̂ may be treated as a concordance-based criterion of model comparison.

7.2.5 Tail Dependence

In applications of copula models to risk management, the attention is often concentrated on the tails of the joint distributions, indicating extreme co-movements of two or more variables of interest. These tails correspond to catastrophic events. That is why it makes sense to construct models with especially good fit at the tails. The coefficients of lower tail dependence

(7.20) \lambda_L = \lim_{u \to 0+} \frac{C(u, u)}{u}

and upper tail dependence

(7.21) \lambda_U = \lim_{u \to 0+} \frac{\hat{C}(u, u)}{u},

where \hat{C} is the survival copula defined in (6.4), characterize respectively the behavior of the lower left and upper right tails of the copula function. If a tail dependence coefficient is zero, we can talk about “no tail dependence.” Like the concordance measures from the previous subsection, tail dependence is invariant with respect to the marginal distributions.

For popular elliptical and Archimedean copulas tail dependence can be expressed in terms of the association parameter; namely, for Clayton’s family

(7.22) \lambda_L = 2^{-1/\alpha}, \qquad \lambda_U = 0;

for the Gumbel–Hougaard copula

(7.23) \lambda_U = 2 - 2^{1/\alpha}, \qquad \lambda_L = 0;

and for its survival version

(7.24) \lambda_L = 2 - 2^{1/\alpha}, \qquad \lambda_U = 0.

Frank and Gaussian copulas have no tail dependence, while the t-copula has both lower and upper ones:

(7.25) \lambda_L = \lambda_U = 2\, T_{\nu+1}\left( -\sqrt{\frac{(\nu + 1)(1 - \rho)}{1 + \rho}} \right),

where Tν(t) is the c.d.f. of the t-distribution with ν degrees of freedom [9]. Two-parametric BB1 copulas allow for two different tail dependence coefficients:

(7.26) \lambda_L = 2^{-1/(\theta\delta)}, \qquad \lambda_U = 2 - 2^{1/\delta}.

A better copula model should provide a good correspondence of the model-induced tail dependence with the empirical tail dependence. The only problem is caused by a certain ambiguity in estimating the empirical tail dependence.
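For reference, the model-induced coefficients above can be tabulated in a few lines. A sketch, with the parameter values in the usage lines taken from Table 7.4 below:

```python
import numpy as np
from scipy.stats import t as student_t

def tail_dependence(family, **p):
    """Model-induced (lambda_L, lambda_U) for several copula families."""
    if family == "clayton":
        return 2 ** (-1 / p["alpha"]), 0.0
    if family == "gumbel":            # the survival version swaps the two tails
        return 0.0, 2 - 2 ** (1 / p["alpha"])
    if family == "t":
        nu, rho = p["nu"], p["rho"]
        lam = 2 * student_t.cdf(-np.sqrt((nu + 1) * (1 - rho) / (1 + rho)),
                                df=nu + 1)
        return lam, lam
    return 0.0, 0.0                   # Frank, Gaussian: no tail dependence

print(tail_dependence("clayton", alpha=0.86))  # (0.45, 0.0)
print(tail_dependence("gumbel", alpha=1.69))   # (0.0, 0.49); 0.49 becomes
                                               # lambda_L for the survival copula
```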

7.3 Parametric Estimation

In this chapter we concentrate on pair copula analysis, restricting ourselves to two marginal distributions tied together with a pair copula. Suppose we have chosen a particular family of copulas (say, FGM or Clayton’s). How can we specify the values of the parameters (both the marginal parameters and the copula parameter, which is also called the association parameter)? Typically, we will have to work in a parametric set of three to six dimensions (one or two per marginal and one or two for the association).

First of all, we have to determine whether we want to estimate all parameters at once (one-step estimation) or whether we prefer to use an important property of copula models which allows us to estimate the marginal parameters first and then estimate the association given the marginals. This constitutes two-step estimation, also known as IFM: inference functions for margins. As shown by Joe [22], see also [23], both procedures are consistent. The first one seems to be more theoretically appealing, while the second is certainly more practical, reducing the number of parameters to be estimated at each step and therefore the dimensionality of the parametric set.

Example 7.3.1 For a numerical example, let us consider a sample of size n = 10 consisting of pairs (xi, yi), i = 1, …, n, where the variables X and Y are distributed exponentially with scale parameters θ1 and θ2, so that S1(x) = e^{−x/θ1} and S2(y) = e^{−y/θ2}. Suppose that their association can be described by the survival Gumbel–Hougaard copula with copula parameter α, so that their joint survival function is

S(x, y) = \exp\left\{ -\left[ \left( \frac{x}{\theta_1} \right)^{\alpha} + \left( \frac{y}{\theta_2} \right)^{\alpha} \right]^{1/\alpha} \right\}.

Actually, this is exactly how they are distributed, because for the construction of this mini-sample we used the simulation procedure from Example 6.6.3, which can be easily adjusted for exponential marginals. The choice of parameters was: θ1 = 3, θ2 = 5, and α = 2. This choice of the association parameter corresponds to τ = 0.5 and also simplifies simulation in the Marshall–Olkin construction, because in this case the positive stable distribution reduces to the Levy distribution and its values can be generated as W = 1/(2[Φ^{−1}(U)]²), where U ∼ Unif[0, 1].

Using simulated data for illustration purposes is a very common practice: it allows for some meaningful comparisons, even for such a small dataset. The sample and the respective ranks of the sample values are listed in Table 7.1. Further in this section we will use it to demonstrate the results of applying the most common techniques of parametric estimation and obtain numerical estimates of θ1, θ2, and α.

Table 7.1 Simulated exponential/Gumbel–Hougaard sample

i 1 2 3 4 5 6 7 8 9 10
x 1.95 5.53 0.65 0.19 2.34 3.22 0.48 2.59 5.73 4.60
y 1.86 5.81 1.53 3.57 4.32 1.13 0.16 6.23 6.86 8.14
Rank (x) 4 9 3 1 5 7 2 6 10 8
Rank (y) 4 7 3 5 6 2 1 8 9 10
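For completeness, here is a sketch of how such a sample can be generated via the Marshall–Olkin construction mentioned above (with α = 2, θ1 = 3, θ2 = 5; the random seed is arbitrary, so the output will not literally reproduce Table 7.1):

```python
import numpy as np

rng = np.random.default_rng(7)
n, alpha, theta1, theta2 = 10, 2.0, 3.0, 5.0

# Positive stable variable with exponent 1/alpha = 1/2 reduces to Levy:
z = rng.standard_normal(n)
w = 1.0 / (2.0 * z**2)

# Marshall-Olkin: U_j = exp(-(E_j / W)^(1/alpha)) have the GH copula
e1, e2 = rng.exponential(size=n), rng.exponential(size=n)
u = np.exp(-(e1 / w) ** (1 / alpha))
v = np.exp(-(e2 / w) ** (1 / alpha))

# Survival version with exponential margins: x = -theta * ln(S)
x = -theta1 * np.log(u)
y = -theta2 * np.log(v)
print(np.round(x, 2), np.round(y, 2))
```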

Aside from practicality, there are some additional justifications for the two-step approach. One of them is evident when we analyze multiple pair copulas connecting many possible pairs over the same set of marginal distributions. In this case it would be counterintuitive to allow different parametric models for the same marginal distribution when it is coupled into different pairs. This is the setting of Section 7.6, of all case studies in Chapter 8, and of many other popular applications.

7.3.1 Parametric, Semiparametric, or Nonparametric?

If we choose the two-step IFM approach, it allows for one more choice. In general, one may choose the nonparametric approach for both steps: estimating the marginals nonparametrically and building a nonparametric copula; see, for example, Genest and Rivest [19] and Nicoloutsopoulos [34]. In a book dedicated to parametric estimation we choose not to consider the fully nonparametric approach and concentrate on parametric copulas. However, this still allows for both parametric and nonparametric estimation of the marginals, thus bringing about either fully parametric estimation (for both marginal and association parameters) or semiparametric estimation (nonparametric marginals and a parametric copula model).

Example 7.3.2 In order to apply the IFM approach to the data in Table 7.1, we have to estimate the margins first. We will start with parametric estimation. Using the standard procedure for the exponential distribution, we can use either MLE or the method of moments, with the same results: θ̂1 = x̄ = 2.73 and θ̂2 = ȳ = 3.96. The corresponding values of the survival functions at the sample points are listed in the first and second rows of Table 7.2. Similarly, we can obtain empirical estimates of the same survival functions using the empirical c.d.f. defined in (7.1); see the third and fourth rows of Table 7.2.

Table 7.2 Parametric and nonparametric estimates of marginals

i 1 2 3 4 5 6 7 8 9 10
S1(x; 2.73) 0.52 0.16 0.81 0.94 0.46 0.34 0.85 0.42 0.15 0.22
S2(y; 3.96) 0.62 0.23 0.68 0.40 0.33 0.75 0.96 0.21 0.18 0.13
S*1(x) 0.64 0.18 0.73 0.91 0.55 0.36 0.82 0.45 0.09 0.27
S*2(y) 0.64 0.36 0.73 0.55 0.45 0.82 0.91 0.27 0.18 0.09

In general, fully parametric models will be less data dependent and will be expected to have higher predictive quality. On the other hand, semiparametric estimation is fully theoretically justifiable as discussed by Genest et al. [17] and may be more appropriate for the comparison of copula models and copula model selection, eliminating the effect of possible misspecification of the marginal distributions discussed by Kim et al. [24]. From this point on we will concentrate our attention on the copula parameter of association, estimating marginal distributions parametrically or nonparametrically, separately or simultaneously with the copula parameter as dictated by the context and convenience.

7.3.2 Method of Moments

Generalized method of moments or plug-in approach suggests (a) finding functional relationships between the parameters of interest and some distribution characteristics, such as moments; (b) using sample moments to estimate distribution moments; and (c) estimating the parameters by plugging in sample moments instead of distribution moments into the formulas established in (a). In case of copula parameters the role of moments may be played by concordance measures: Kendall’s and Spearman’s rank correlation. Formulas introduced in Subsection 7.2.4 provide functional relationships for (a) and can be used to estimate association for one-parameter Archimedean copulas and correlation for elliptical copulas. It is less obvious what moment-like measures should be used in case of two-parameter families.

Example 7.3.3 It is easy to estimate Kendall’s concordance τ from the marginal estimates in Table 7.2 or directly from the ranks in Table 7.1. Kendall’s concordance is based on ranks, so parametric (rows 1 and 2) and nonparametric (rows 3 and 4) estimates of the margins in Table 7.2 will bring about the same result, τ̂ = 0.51. Solving Equation (7.16) for α, we obtain the method of moments estimate

(7.27) \hat{\alpha} = \frac{1}{1 - \hat{\tau}} \approx 2.05,

which is pretty close to the true value α = 2. This degree of precision cannot be guaranteed for such a small sample, but it confirms that the method of moments provides a “quick and dirty” procedure for the estimation of copula parameters.
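This computation is easily reproduced from the data of Table 7.1; a sketch using scipy.stats.kendalltau:

```python
import numpy as np
from scipy.stats import kendalltau

x = np.array([1.95, 5.53, 0.65, 0.19, 2.34, 3.22, 0.48, 2.59, 5.73, 4.60])
y = np.array([1.86, 5.81, 1.53, 3.57, 4.32, 1.13, 0.16, 6.23, 6.86, 8.14])

tau_hat, _ = kendalltau(x, y)       # sample concordance, ~0.511
alpha_hat = 1 / (1 - tau_hat)       # invert (7.16): tau = 1 - 1/alpha
print(tau_hat, alpha_hat)           # ~0.511, ~2.05
```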

7.3.3 Minimum Distance

This method suggests minimizing a certain distance between the empirical distribution and the distributions in the chosen family of copulas. The distribution from the family minimizing the distance is supposed to be the best choice. Difficulties with this approach may be related both to the proper choice of a distance and to the technical task of finding its minimum. For certain classes of Archimedean copulas this approach may utilize the concept of Kendall’s distribution defined in Section 6.6, and it was proven to be very productive by Nicoloutsopoulos [34].

To illustrate this point, let us define pseudo-observations (the empirical copula) for i = 1, …, n as

(7.28) Z_i = \frac{1}{n-1} \sum_{j \ne i} I\left( X_j < X_i,\, Y_j < Y_i \right),

the proportion (out of n − 1) of sample points lying both strictly to the left of and strictly below (Xi, Yi). The slightly unusual normalizing factor n − 1 may be explained by the convenience of the following representation of the sample concordance in terms of pseudo-observations: \hat{\tau} = \frac{4}{n} \sum_{i=1}^{n} Z_i - 1. Then define for any t ∈ [0, 1] the empirical Kendall’s function

(7.29) K_n(t) = \frac{1}{n+1} \sum_{i=1}^{n} I\left( Z_i \le t \right),

which can serve as an empirical analog (and estimator) of the theoretical Kendall’s function Kα(t) = P(Cα(U, V) ≤ t) corresponding to the true model. The values of Zi and Kn(Zi) calculated from the ranks are summarized in Table 7.3.

Table 7.3 Ranks, pseudo-observations, and empirical Kendall’s function

i 1 2 3 4 5 6 7 8 9 10
Rank (x) 4 9 3 1 5 7 2 6 10 8
Rank (y) 4 7 3 5 6 2 1 8 9 10
Zi 2/9 6/9 1/9 0 4/9 1/9 0 5/9 8/9 7/9
Kn(Zi) 0.46 0.73 0.36 0.18 0.55 0.36 0.18 0.64 0.91 0.82

Then we can compare the distances

(7.30) d_K(\alpha) = \left[ \sum_{i=1}^{n} \left( K_n(Z_i) - K_\alpha(Z_i) \right)^2 \right]^{1/2}

for different values of association. The value of α minimizing dK may serve as an estimate of α.

Example 7.3.4 In the case of the Gumbel–Hougaard family, according to (6.31), Kendall’s function may be represented as

(7.31) K_\alpha(t) = t - \frac{t \ln t}{\alpha},

and the numerical minimization of dK brings about \hat{\alpha} \approx 2.03.
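A sketch of this minimization, starting from the pseudo-observations of Table 7.3 (the computed estimate should approximately reproduce the value quoted above):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Pseudo-observations Z_i and K_n(Z_i) from Table 7.3
z = np.array([2, 6, 1, 0, 4, 1, 0, 5, 8, 7]) / 9
kn = np.array([(z <= t).sum() for t in z]) / (len(z) + 1)

def d_k(alpha):
    # Gumbel-Hougaard Kendall's function K_alpha(t) = t - t*ln(t)/alpha,
    # with the convention t*ln(t) = 0 at t = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        k_alpha = np.where(z > 0, z - z * np.log(z) / alpha, 0.0)
    return np.sqrt(np.sum((kn - k_alpha) ** 2))

res = minimize_scalar(d_k, bounds=(1.0, 10.0), method="bounded")
print(res.x)   # minimum distance estimate of alpha, ~2.03
```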

7.3.4 MLE and MPLE

The most popular method used in applications is probably still maximum likelihood estimation (MLE). In the IFM setting [22], when at the second step we form the pseudo-likelihood function by plugging in estimates of the marginals instead of their true values and maximize this function to obtain the estimate of the association, it is more appropriate to talk about maximum pseudo-likelihood estimation (MPLE). In the case of empirical estimates of the marginals, this method was discussed and justified by Genest et al. [18] and is also known as canonical maximum likelihood (CMLE).

Example 7.3.5 We will use both MPLE in the IFM setting and full MLE to obtain copula parameter estimates for the data in Table 7.1. The pseudo-likelihood function for IFM can be obtained as

(7.32) PL(\alpha) = \prod_{i=1}^{n} c_\alpha(\hat{u}_i, \hat{v}_i),

where cα(u, v) is the copula density defined in (6.27); û_i = S1(xi; θ̂1) and v̂_i = S2(yi; θ̂2) for parametric estimation, and û_i = S*1(xi) and v̂_i = S*2(yi) for semiparametric estimation. Maximizing (7.32) in α brings about the estimates of the association parameter: α̂ = 1.69 for parametric and α̂ = 2.03 for semiparametric estimation (CMLE).

The full likelihood function for MLE can be expressed as

(7.33) L(\theta_1, \theta_2, \alpha) = \prod_{i=1}^{n} c_\alpha\left( S_1(x_i; \theta_1), S_2(y_i; \theta_2) \right) f_1(x_i; \theta_1)\, f_2(y_i; \theta_2)

and maximized with respect to all three parameters at a time. The numerical results yield θ̂1 = 2.64, θ̂2 = 3.70, and α̂ = 1.69.
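For the reader who wants to reproduce these values, here is a sketch of the MPLE step for the survival Gumbel–Hougaard copula (the closed-form copula log-density is standard; the optimizer bounds are an arbitrary assumption of the sketch):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1.95, 5.53, 0.65, 0.19, 2.34, 3.22, 0.48, 2.59, 5.73, 4.60])
y = np.array([1.86, 5.81, 1.53, 3.57, 4.32, 1.13, 0.16, 6.23, 6.86, 8.14])

# Parametric IFM: exponential margins with MLE scales (sample means)
u = np.exp(-x / x.mean())    # S1(x; 2.73), row 1 of Table 7.2
v = np.exp(-y / y.mean())    # S2(y; 3.96), row 2 of Table 7.2

def gh_log_density(u, v, a):
    """Log-density of the Gumbel-Hougaard copula c_a(u, v)."""
    xu, yv = -np.log(u), -np.log(v)
    w = (xu ** a + yv ** a) ** (1.0 / a)
    return (-w + (a - 1) * (np.log(xu) + np.log(yv)) + (2 - 2 * a) * np.log(w)
            + np.log(1 + (a - 1) / w) - np.log(u) - np.log(v))

res = minimize_scalar(lambda a: -gh_log_density(u, v, a).sum(),
                      bounds=(1.001, 10.0), method="bounded")
print(res.x)   # MPLE of alpha, ~1.69

# For semiparametric CMLE, replace u and v by the empirical survival values
# of rows 3-4 of Table 7.2: u = 1 - rank(x)/(n + 1), v = 1 - rank(y)/(n + 1)
```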

It would be presumptuous to make conclusions based on a small-sample numerical example. However, it seems to be true in general that the method of moments using Kendall’s tau is the simplest and fastest way to obtain a numerical value of association. The maximum likelihood approach is preferable for large samples because of its good large-sample properties (asymptotic optimality). Using either nonparametric or parametric margins in the IFM two-step context is possible, but the numerical results may differ. The role of marginal estimation and of possible marginal misspecification may be significant, as our examples demonstrate. Full MLE and MPLE in the parametric version typically bring about close numerical results, but it has to be decided whether the change of the marginal estimates from the one-dimensional to the two-dimensional setting is an advantage or a drawback, and whether it is appropriate at all in the context of complex research projects with many pairs of variables being analyzed.

7.3.5 Bayesian Estimation

In the Bayesian framework, we can certainly utilize both one-step and two-step approaches, and also use nonparametric estimates for the marginals. The difference between the Bayesian approach and such classical methods as MLE, MPLE, or the method of moments, as was stated in Chapter 2, is more than just the choice of the best fit for our data within the given family of copulas. The benefit, as usual, is the use of the entire posterior distribution instead of one representative of a family, providing more stable procedures and fully utilizing the benefits of integration versus optimization. The problem, as usual, is the choice of priors. In the presence of prior information we can elicit prior distributions using subjective considerations or use relevant data to implement the empirical Bayes approach. In the absence of prior information we use noninformative priors. Examples and case studies in Sections 7.5 and 7.6 and Chapter 8 demonstrate a variety of ways in which the Bayesian approach can be applied to copula estimation. Here we will provide one particular implementation of Bayesian estimation via the Metropolis algorithm, using the familiar 10-point example and assuming a lack of reliable prior information.

Example 7.3.6 Let us consider a two-step IFM scheme, where both marginals are estimated first. This approach will allow us to avoid the problem of defining a joint prior for all three parameters and to assign priors independently. Assume very weak priors for the inverses of the marginal parameters λi = 1/θi, i = 1, 2: π(λ1) ∼ Gamma(ϵ, ϵ) and π(λ2) ∼ Gamma(ϵ, ϵ), where ϵ is chosen as small as possible in order not to sacrifice computational stability. This way, the prior means are equal to 1, and the prior variances are both equal to 1/ϵ, which makes the priors as nonintrusive as possible. The resulting chains with 10,000 iterations produce the left and central trace plots in Figure 7.1, with the Bayes estimates given by the posterior means of λ1 and λ2.


Figure 7.1 Trace plot for two-step Bayesian estimation.

At the second stage, using the estimated marginals, we assume a conservative prior on α, penalizing higher values of association. The resulting trace plot is also shown in Figure 7.1, and the estimate of the posterior mean of α, as can be expected for weak priors, is similar to the MLE.

Using the same independent priors for a one-step analysis with a simultaneous block update of all three parameters, we can expect a lower acceptance rate. The corresponding calculations bring about numerical estimates of all three parameters, with α̂ = 1.747. The corresponding trace plots are demonstrated in Figure 7.2.


Figure 7.2 Trace plots for one-step Bayesian estimation.

In this example we cannot see any distinctive advantage of using the Bayesian approach. First of all, we use no prior information, and therefore the final numerical estimates are not too far from the MLE or the method of moments estimates. Second, there is no technical problem with applying MLE in such a simple setting, so the Bayesian approach is not necessary here. However, it is definitely applicable and leads to reasonable numerical results.
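For illustration, a minimal random-walk Metropolis sampler for the association parameter alone might look as follows (a sketch: the margins are held fixed at their estimates, and the prior π(α) ∝ 1/α on α ≥ 1 is one possible conservative choice, not necessarily the one used in the example):

```python
import numpy as np

def log_post(alpha, u, v):
    """Log posterior of alpha: GH copula pseudo-likelihood + prior ~ 1/alpha."""
    if alpha < 1:
        return -np.inf
    xu, yv = -np.log(u), -np.log(v)
    w = (xu ** alpha + yv ** alpha) ** (1 / alpha)
    loglik = np.sum(-w + (alpha - 1) * (np.log(xu) + np.log(yv))
                    + (2 - 2 * alpha) * np.log(w)
                    + np.log(1 + (alpha - 1) / w) - np.log(u) - np.log(v))
    return loglik - np.log(alpha)

def metropolis(u, v, n_iter=10_000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    alpha, lp = 2.0, log_post(2.0, u, v)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = alpha + step * rng.standard_normal()
        lp_prop = log_post(prop, u, v)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject
            alpha, lp = prop, lp_prop
        chain[i] = alpha
    return chain

# u, v: survival-function values of the margins (rows 1-2 of Table 7.2);
# chain = metropolis(u, v); chain[2000:].mean() estimates the posterior mean
```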

7.4 Model Selection

In the previous section we discussed parametric estimation within a chosen family of copulas. Here we will consider a different problem: how to make this choice of a family. Why should we prefer Gaussian to Frank’s, or Clayton’s to Gumbel–Hougaard’s? We will consider several principles which can help us make this choice.

Let us say we want to compare two families of copulas and determine which of the two provides a better fit to the data. We can use a method like MLE, MPLE, the method of moments, or Bayesian estimation to determine the best parametric fit in each family separately. Then we will select the family whose representative provides the better fit.

7.4.1 Hybrid Approach

This approach combines two methods of estimation of the copula parameter: say, the method of moments and MPLE. Suppose we consider two different families of copulas and want to choose the better model. To reduce the role of marginal specification, we will use the same estimates of the marginal distributions in both cases. Let us determine the MPLE of the parameter of association for each of the two families. Then we can use the idea of generalized moments or concordances: calculate the concordance values implied by the MPLEs via the formulas of Subsection 7.2.4 for each family; then compare these values with the empirical concordance, and the smaller deviation will determine the better fit. For instance, the model-induced values of τ obtained from the estimated parameter values via (7.15) and (7.16) can be compared with the empirical value τ̂, and a smaller discrepancy will indicate a better model fit.

We can also use a hybrid between the minimum distance method and MLE or MPLE. In this case we choose the two best representatives of the two different copula families and then compare them from the point of view of a distance from the empirical measure. The smaller distance wins. This approach is similar to the minimum distance method from the previous section, but now the problem is much simplified by the need to compare two numbers instead of running a minimization procedure over the entire family. For instance, we can calculate the values of the distances dK defined in (7.30) for the two best representatives of the two families and choose the one granting the smaller value. The role of such distances in the hybrid approach can also be played by the goodness-of-fit measures discussed in Subsection 7.2.1. However, there exists some danger of choosing an overfitted model if we have to choose between two families of different parametric complexity.

7.4.2 Information Criteria

Information criteria defined in Subsection 7.2.3 penalize model complexity and overfitting, and their application makes more sense for comparison of models with a different number of parameters. These criteria are based on the values of the likelihood function, so they cannot be used “across the board”: we have to compare separately MLE to MLE, IFM to IFM, or CMLE to CMLE estimates obtained for the two families. These criteria are not necessarily good for non-likelihood-based methods of estimation.

Example 7.4.1 Let us extend the example with the data from Table 7.1 to a different family of copulas. Clayton’s family is often applied in the same context as the survival Gumbel–Hougaard, when the modeling of the lower tails is of primary importance. Let us use the Clayton copula to obtain MPLE estimates of α in the parametric and nonparametric settings and also the full MLE. Table 7.4 summarizes all information on the numerical estimates obtained by full one-step MLE, parametric IFM (MPLE), and the semiparametric method CMLE. We also calculate the model-induced values of tau and Kendall’s distance dK to implement the hybrid approach to model selection. In order to calculate the distance in (7.30) for Clayton’s family we can use the following formula obtained from (6.31):

(7.34) K_\alpha(t) = t + \frac{t\left( 1 - t^{\alpha} \right)}{\alpha}.

Table 7.4 Comparison of survival Gumbel–Hougaard and Clayton copulas

Copula Gumbel–Hougaard Clayton
Method MLE IFM CMLE MLE IFM CMLE
θ̂1 2.64 2.73 * 2.72 2.73 *
θ̂2 3.70 3.96 * 3.85 3.96 *
α̂ 1.69 1.69 2.03 0.86 0.86 1.48
τ 0.41 0.41 0.51 0.30 0.30 0.43
dK 0.37 0.37 0.36 0.47 0.47 0.46
Log lik −42.06 1.72 2.45 −42.51 1.48 1.98
AIC 90.12 −1.44 −2.90 91.02 −0.96 −1.96
BIC 91.03 −1.14 −2.60 91.93 −0.66 −1.66
Tail 0.49 0.49 0.59 0.45 0.45 0.63

Table 7.4 also reports the values of the log-likelihood function, the information criteria, and the tail dependence according to Subsection 7.2.5. We will leave the derivation of Bayes estimates for Clayton’s family and the comparison of the two Bayes estimates for the end-of-the-chapter exercises.

It is clear that no parameter estimates are obtained for the marginals when semiparametric estimation is applied (the asterisks in Table 7.4). Also, we can see slight discrepancies between the values of the marginal parameters obtained by full one-step MLE and by MPLE with IFM. Comparing the values of α between the Gumbel–Hougaard and Clayton models is in general not practical, because the meanings of these parameters are quite different. In this particular case τ = 0.5 corresponds to α = 2 for both families, but this is a very rare occasion.

However, according to the hybrid approach to estimation, comparison of the corresponding values of τ is always informative. The model providing concordance estimates closer to the empirical value is more adequate. We should notice, though, that τ̂ is not a very reliable estimate of the theoretical τ, especially for small samples. For instance, in our example the data were generated using the Gumbel–Hougaard copula with association parameter α = 2, corresponding by (7.16) to the theoretical concordance value τ = 0.5. In this case the survival Gumbel–Hougaard model performs somewhat better. This is to be expected, because the data were generated using this model, and the Clayton copula should not necessarily provide a good fit.

From the row corresponding to dK, it also looks like the Gumbel–Hougaard family provides a better fit than Clayton: the values of dK are smaller for all pairs of estimates: MLE versus MLE, IFM versus IFM, and semiparametric CMLE versus CMLE.

Comparison of the two families using AIC and BIC based on the log-likelihood function should also be done pairwise. Full MLE brings about the largest values of the information criteria, which reflects the more complex structure of the model. The semiparametric model by definition provides a better fit than the parametric IFM. We can compare, though, the pairs provided by similar settings for different copula choices. In all three cases (MLE, IFM, CMLE) the Gumbel–Hougaard model ends up slightly on top. It would be interesting to compare the tail behavior of the two copula families, but it is much harder to do so for the lack of reliable empirical measures of tail dependence, especially for small samples.

Bayesian estimation for the Clayton family and the comparison of Bayesian estimates for the two families of copulas using the information criterion DIC, specific to Bayesian estimation, are left for the end-of-chapter exercises in Chapter 8.

7.4.3 Bayesian Model Selection

Following Bretthorst [7] and Huard et al. [20], we suggest comparing the data fit provided by several pair copula models not at a single value of the association parameter(s) obtained by MPLE, but rather over the entire range of possible association values. This can be accomplished by specifying a prior distribution for the association parameter(s) and integrating the likelihood with respect to the prior distribution. The problem is the difference in meaning and range of the association parameters for different copula classes. If we want to compare several classes of copulas in the Bayesian framework, we need to establish a common basis of comparison. For that purpose we need to suggest a universal parameter which can be evaluated for all classes of copulas under consideration.

One such universal parameter is Kendall’s concordance τ, which, as we have seen, can be expressed in terms of association for many copula families. Sample concordance τ̂ is a reasonable nonparametric estimator of τ [19]. Using formulas expressing concordance through association parameters, for example, (7.15), (7.16), and (7.19), we can calculate the values of τ induced by the MPLEs of the parameters of elliptical or Archimedean copulas and compare them to the sample value τ̂, which is done in Table 7.4. Proximity of the model-induced values of τ to the sample value may serve as a measure of the model fit and help to compare model performance, as in the hybrid approach to model selection discussed above in Example 7.4.1. However, this comparison is still using single values to represent entire families.

We will assume that the classes of copulas we choose correspond to exhaustive and mutually exclusive hypotheses H1, H2, …, Hm. Posterior probabilities of the hypotheses Hk, k = 1, …, m, for data D may be written as

(7.35) P(H_k \mid D) = \frac{P(D \mid H_k)\, P(H_k)}{\sum_{j=1}^{m} P(D \mid H_j)\, P(H_j)},

where we will consider all m hypotheses a priori equally likely. If the dependence between the variables is positive for all hypotheses, we can assume τ ≥ 0. In this case the natural choice of prior for τ is a beta distribution, and the choice of the parameters of the prior can be subjective or noninformative objective. However, in the presence of relevant additional data, similar in nature, the prior might be suggested by the sample concordance of those data, consistent with the empirical Bayes approach. If we have multiple pairs of variables included in the study, estimates of the parameters of the beta distribution for empirical Bayes can be obtained from all pairs of components.

We will not need to calculate the denominator of the posterior in (7.35). It suffices to calculate the weights

(7.36) W_k = P(D \mid H_k) = \int_0^1 P(D \mid H_k, \tau)\, \pi(\tau)\, d\tau

or, using the Monte Carlo approach and drawing samples τ(1), …, τ(N) from the beta prior, to evaluate

(7.37) \hat{W}_k = \frac{1}{N} \sum_{j=1}^{N} P\left( D \mid H_k, \tau^{(j)} \right),

where P(D | Hk, τ) is the likelihood of the kth copula model evaluated at the value of the association parameter corresponding to τ.

Then we select the class with the highest weight and obtain the Bayes estimate of the association parameter using MCMC.
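A sketch of this weighting scheme for a Gumbel–Hougaard versus Clayton comparison (the Beta(2, 2) prior on τ is an arbitrary choice for the illustration; u and v denote the observations transformed to the unit square):

```python
import numpy as np
from scipy.stats import beta

def log_c_clayton(u, v, a):
    """Log-density of the Clayton copula."""
    return (np.log(1 + a) - (1 + a) * (np.log(u) + np.log(v))
            - (2 + 1 / a) * np.log(u ** -a + v ** -a - 1))

def log_c_gumbel(u, v, a):
    """Log-density of the Gumbel-Hougaard copula."""
    xu, yv = -np.log(u), -np.log(v)
    w = (xu ** a + yv ** a) ** (1 / a)
    return (-w + (a - 1) * (np.log(xu) + np.log(yv)) + (2 - 2 * a) * np.log(w)
            + np.log(1 + (a - 1) / w) - np.log(u) - np.log(v))

def weight(log_c, alpha_of_tau, u, v, n_draws=10_000, seed=0):
    """Monte Carlo weight (7.37): prior-averaged copula likelihood."""
    rng = np.random.default_rng(seed)
    taus = beta(2, 2).rvs(size=n_draws, random_state=rng)
    return np.mean([np.exp(np.sum(log_c(u, v, alpha_of_tau(t)))) for t in taus])

# tau -> alpha conversions follow (7.15) and (7.16):
# w_gh = weight(log_c_gumbel, lambda t: 1 / (1 - t), u, v)
# w_cl = weight(log_c_clayton, lambda t: 2 * t / (1 - t), u, v)
# The family with the larger weight is selected.
```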

7.5 Copula Models of Joint Survival

This study has been mentioned in the introduction to Chapter 5, and some of its results were used to numerically illustrate copula models in Chapter 6. Now we will consider it in more detail. The data we use consist of 11,457 joint last survivor annuity contracts of a large Canadian insurer in payoff status over a 5-year observation period, 1989–1993. Most of the sample represented heterosexual married couples. We will construct copula models for the bivariate joint distributions of the couples. As discussed in Section 5.1, there exist multiple factors of dependence between two lives in a couple, such as common shock, common lifestyle, or the broken heart syndrome. To address all of these factors, it would be beneficial to model the entire joint distribution, and copulas provide convenient tools for that.

Frees et al. [14] were the first to suggest copula models of joint survival in this context, and they used this dataset in their research. Shemyakin and Youn applied the same dataset to illustrate different estimation techniques in [40], [41], and [42], and we will use some results of these papers. However, we will restrict ourselves to joint first life analysis (time to the first death in a couple) and choose not to discuss the more complicated joint last survivor models (time to the second death in a couple), which are built, for example, in [42]. The main objective of this example is to compare the results of maximum likelihood estimation with the results of Bayesian analysis with informative priors.

Estimating the joint first life survival function of a married couple requires observing a pair of associated lives until the first death. Each observation of a pair of associated lives (Li1, Li2) in a sample y = (y1, y2, ..., yn) is represented by a vector yi = (ai1, ai2, ti, ci), where aij, j = 1, 2, are the entry ages of the lives Lij, ti is the time till termination, and ci is the censoring indicator:

c_i = \begin{cases} 1, & t_i < T_i \text{ (the first death is observed)}, \\ 0, & t_i = T_i \text{ (the observation is censored)}, \end{cases}

where Ti is the duration of observation period for yi. This setting represents right censoring (not all lives terminate during the observation period) and left truncation (represented by entry ages).

Conventional copula modeling based on two marginal distribution for male and female lives requires a very careful treatment because of the necessity to address both censoring and truncation. Therefore in this setting the idea of using conditional copula is very attractive. We will apply a copula function to conditional survival functions instead of the marginals so that the joint first life survival function

$$
S(t \mid a_1, a_2) = P\big(\min(T_1, T_2) > t\big) = P(T_1 > t, \; T_2 > t),
$$

with T1 and T2 denoting the remaining lifetimes of the two lives after entry,

can be represented through conditional survival functions

$$
S(t \mid a_1, a_2) = C_\alpha\big(S_1(t \mid a_1),\, S_2(t \mid a_2)\big), \qquad S_j(t \mid a_j) = \frac{S_j(a_j + t)}{S_j(a_j)},
$$

where the marginal survival functions Sj and the copula function Cα are left to be specified. Using Weibull marginals with scale parameters θj and shape parameters τj together with the survival Gumbel–Hougaard copula brings about the expression

$$
S(t \mid a_1, a_2) = \exp\left\{-\big[m_1(t)^{\alpha} + m_2(t)^{\alpha}\big]^{1/\alpha}\right\},
$$

where for j = 1, 2,

$$
m_j(t) = \left(\frac{a_j + t}{\theta_j}\right)^{\tau_j} - \left(\frac{a_j}{\theta_j}\right)^{\tau_j},
$$

and the likelihood function

(7.38)
$$
L(\mathbf{y} \mid \theta_1, \tau_1, \theta_2, \tau_2, \alpha) = \prod_{i=1}^{n} \left[-\frac{\partial}{\partial t} S(t \mid a_{i1}, a_{i2})\Big|_{t = t_i}\right]^{c_i} \big[S(t_i \mid a_{i1}, a_{i2})\big]^{1 - c_i},
$$

with ci = 1 for i = 1, ..., r and ci = 0 for i = r + 1, ..., n.
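
A minimal Python sketch of this log-likelihood, consistent with the reconstruction of (7.38) above (the function name and the vectorized interface are illustrative, not part of the original study):

import numpy as np

def loglik_first_life(params, a1, a2, t, c):
    # params = (theta1, tau1, theta2, tau2, alpha); a1, a2, t, c are arrays
    # holding entry ages, times to termination, and censoring indicators.
    th1, ta1, th2, ta2, al = params
    # cumulative hazards of the conditional (left-truncated) Weibull survivals
    m1 = ((a1 + t) / th1) ** ta1 - (a1 / th1) ** ta1
    m2 = ((a2 + t) / th2) ** ta2 - (a2 / th2) ** ta2
    log_s = -(m1 ** al + m2 ** al) ** (1.0 / al)     # log S(t | a1, a2)
    # time derivatives of the cumulative hazards
    d1 = (ta1 / th1) * ((a1 + t) / th1) ** (ta1 - 1.0)
    d2 = (ta2 / th2) * ((a2 + t) / th2) ** (ta2 - 1.0)
    # log density of the first-death time, log(-dS/dt)
    log_f = (log_s + (1.0 / al - 1.0) * np.log(m1 ** al + m2 ** al)
             + np.log(m1 ** (al - 1.0) * d1 + m2 ** (al - 1.0) * d2))
    return np.sum(c * log_f + (1.0 - c) * log_s)     # censored terms use log S only

Maximizing this function over the five parameters corresponds to the one-step full MLE approach of [40]; the same likelihood enters the Bayesian model M3 discussed below.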

Frees et al. [14] used a different model specification, not requiring conditional copulas. They also considered Frank's copula along with the Gumbel–Hougaard family, and the Gompertz model for the marginals along with the Weibull model. Prior to their work, most related actuarial studies of joint life assumed independence of the marginal distributions and, in pricing joint life products, applied special corrections compensating for the underestimation of risks caused by this assumption. Following the guidelines of [14], Shemyakin and Youn applied one-step maximum likelihood estimation to Weibull marginals and the survival Gumbel–Hougaard copula in [40]. Some of the estimation results and standard errors are presented in Table 7.5. Model M1 assumes independence of the marginals, which can be treated as the Gumbel–Hougaard copula with α = 1. Parameters in Model M2 were estimated by one-step full MLE [40]. The estimates from M2 were used in the joint life examples in Chapter 6.

Table 7.5 Parameter estimates (with standard errors)

Parameters           M1 (Independence)   M2 (MLE)        M3 (Bayes)
Female    θ1         92.62 (0.59)        89.51 (0.48)    89.59 (0.6332)
          τ1         9.98 (0.38)         9.96 (0.34)     8.614 (0.655)
Male      θ2         86.32 (0.37)        85.98 (0.40)    87.06 (0.6324)
          τ2         7.94 (0.26)         7.65 (0.26)     7.202 (0.68)
Association  α       1                   1.64 (0.49)     1.812 (0.729)

Discrepancies between the marginal estimates for models M1 and M2 can be explained by the construction of the MLE, and the more general model M2 should be preferred. However, with the massive information available on male and female mortality (mortality tables), it seems reasonable to use a Bayesian approach. Bayesian models based on the conditional copula approach with Weibull marginals and the stable (Gumbel–Hougaard) copula were first applied to joint mortality in [41]. The structure of priors suggested normal or lognormal distributions for the shape and scale parameters, and diffuse or noninformative priors on the parameter of association. Hyperparameters of the shape and scale distributions can be chosen to fit the existing male and female ultimate mortality tables. This choice reflects the belief that a substantial amount of prior information on individual male and female mortality can be incorporated into a copula model, improving the estimation of the marginals.

Model M3 in Table 7.5 is based on applying the copula structure to conditional survival functions. It requires Bayesian estimation of the five parameters of the conditional copula model with the likelihood (7.38) and the following priors: normal or lognormal priors, as discussed above, on the scale and shape parameters θj and τj, j = 1, 2, and the noninformative prior π(α) ∝ α−1 on the association parameter. The hyperparameters (the prior means and standard deviations of the scale and shape parameters) are determined from the US male and female mortality tables.

Bayesian computation is performed using the independence Metropolis–Hastings algorithm in the software package WinBUGS [35] with a chain length of 10,000. In order to compare the performance of the models, we calculate the Bayesian information criterion (BIC) based on the likelihood (7.38), using Model M1 as the baseline. In order to take into account the effect of censoring, we use the methodology of [46], suggesting

$$
\mathrm{BIC}_k = -2\,(L_k - L_1) + m_k \ln r, \qquad k = 2, 3,
$$

where Lk and mk, k = 2, 3, are, respectively, the maximum log-likelihoods and the numbers of parameters for models Mk, and r is the number of noncensored observations. The lowest value of BIC is attained for Model M3.

Table 7.5 demonstrates that the results of Bayesian estimation with informative priors may substantially differ from the MLE, especially when a one-step full MLE procedure is applied. The Bayesian approach in this case allows for the use of mortality tables representing substantial prior information on the marginals.

7.6 Related Failures of Vehicle Components

We will revisit the example from Section 3.6, where Markov chains were applied to model related failures of vehicle components. In this section we will take a different approach: we will analyze time-to-failure (TTF) variables for different car parts using methods of survival analysis similar to those applied to time-to-default (TTD) variables in Section 4.5. We will use copulas to model the dependence between the TTF variables for different vehicle components, providing a valid alternative to the Markov chain analysis of related failures. In fact, the copula approach has the advantage of modeling the entire joint distribution of the components, directly evaluating all relevant conditional probabilities, and assessing the chances of a new failure given the history of vehicle repairs. In this example, based on the research paper [27], we will restrict ourselves to pair copulas and will not address data censoring or selection bias.

The main purpose of this example is to illustrate the principles of copula model selection considered in Section 7.4, choosing a family of copulas based on multiple pairs of variables. We will consider a hybrid approach, information criteria, and, finally, Bayesian model selection.

Let us recall the problem from Section 3.6. The five main components, chosen out of 60 detected in Hyundai Accent warranty claim records, are: the spark plug assembly (A), the ignition coil assembly (B), the computer assembly (C), the crankshaft position sensor (D), and the heated oxygen sensor (E). Data were recorded for the cars that had at least one of the five main components fail within the warranty period. We analyze the TTF of each component measured in days from the car sale (or the previous repair) to the next repair. The ultimate goal is to suggest a pair copula model for the joint distributions of all 10 possible pairs of TTF variables of the five engine assembly components. If one pair copula provides a better fit for most of the pairs, it can be used in further multidimensional analysis.

In the choice of copula families, the factor of tail dependence plays a very important role. Preliminary analysis indicated dependence in both tails of the bivariate distributions, corresponding to joint or related failures of two parts occurring either very early in the car history or very late (close to the end of the warranty period). Both tails are evidently of special interest for car manufacturers and insurers. Therefore we will consider six copula families: Clayton (H1) and survival Gumbel–Hougaard (H4), representing lower tail dependence; Gumbel–Hougaard (H2) and survival Clayton (H3), capturing upper tail behavior; and two two-parameter copulas allowing for both: the BB1 copulas (H5) defined in (6.29) and the t-copulas (H6) defined in (6.14).

For each of these families we can define the copula density as

$$
c_\alpha(u, v) = \frac{\partial^2 C_\alpha(u, v)}{\partial u\, \partial v}.
$$

Consider all 10 possible pairs (X, Y) of TTFs for the components (A)–(E). For a matched i.i.d. sample (xi, yi), i = 1, ..., n, we use copulas to model the joint distribution of the underlying TTF variables X and Y. First, we estimate the marginal distributions of X and Y as F̂ and Ĝ to obtain the sample (ûi, v̂i) = (F̂(xi), Ĝ(yi)), and then estimate the association parameter α for the copula Cα(u, v).

The IFM approach suggests using a sensible parametric model for the marginals and then estimating the association. Properties of the estimates obtained by this approach were fully investigated in [21]. The Weibull distribution often provides a good fit for the TTFs of individual parts; see Baik [3], Lawless [28], and also Wu [48]. However, here we concentrate on the identification of a good copula model of association. Therefore, in order to minimize the role of a possible misspecification of the marginal distributions, we apply the semiparametric CMLE approach: we use the corresponding empirical distribution functions as the estimates F̂ and Ĝ, as suggested by Genest and Rivest [19] and followed by many other authors.

7.6.1 Estimation of Association Parameters

The pseudo-likelihood function for the copula estimation can be written as

$$
L(\alpha) = \prod_{i=1}^{n} c_\alpha\big(\hat{F}_n(x_i), \hat{G}_n(y_i)\big),
$$

where F̂n and Ĝn are the marginal empirical distribution functions. Using a numerical implementation of the maximum likelihood method [6, 25], we obtain for each class of copulas H1–H6 the CMLE α̂ of the association parameter (including the second parameter β̂ for BB1, and the correlation ρ̂ and the degrees of freedom η̂ for the t-copula). We present these values for each of the 10 pairs of the five components in Table 7.6.
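
As an illustration, a semiparametric CMLE for the Clayton class H1 may be sketched in Python as follows; the cited R packages [6, 25] provide full implementations for all six classes, so this is only a minimal sketch:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import rankdata

def clayton_cmle(x, y):
    # pseudo-observations: rescaled empirical d.f.s, kept away from 0 and 1
    n = len(x)
    u = rankdata(x) / (n + 1.0)
    v = rankdata(y) / (n + 1.0)

    def neg_loglik(alpha):
        # Clayton density: c(u,v) = (1+a)(uv)^(-a-1)(u^-a + v^-a - 1)^(-1/a-2)
        s = u ** -alpha + v ** -alpha - 1.0
        return -np.sum(np.log1p(alpha) - (alpha + 1.0) * np.log(u * v)
                       - (1.0 / alpha + 2.0) * np.log(s))

    return minimize_scalar(neg_loglik, bounds=(1e-3, 20.0), method="bounded").x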

Table 7.6 Estimates of association for six classes of copulas

H1 (α̂)  H2 (α̂)  H3 (α̂)  H4 (α̂)  H5 (α̂)  H5 (β̂)  H6 (ρ̂)  H6 (η̂)
AB 3.02 3.12 2.95 3.14 0.62 2.48 0.92 2.00
AC 0.58 1.36 0.57 1.37 0.20 1.26 0.42 2.44
AD 1.05 1.61 0.93 1.64 0.52 1.33 0.54 2.00
AE 0.62 1.44 0.71 1.42 0.15 1.36 0.48 2.34
BC 0.74 1.41 0.58 1.44 0.44 1.19 0.46 2.37
BD 0.93 1.64 0.95 1.63 0.26 1.48 0.67 2.00
BE 0.81 1.57 0.87 1.54 0.27 1.41 0.57 3.46
CD 1.12 1.74 1.10 1.75 0.42 1.48 0.64 2.98
CE 0.62 1.46 0.73 1.42 0.15 1.37 0.49 2.51
DE 0.43 1.21 0.30 1.24 0.32 1.07 0.28 3.43

Notice that the number of degrees of freedom η was restricted to be at least 2, and this lower bound was attained for the pairs AB, AD, and BD.

The choice of a specific copula class from the set H1–H6 is important. Consider, for example, the early failure of both components A (spark plug assembly) and B (ignition coil assembly) in the first year of vehicle use. There was a total of K failures during the first 365 days out of N vehicles on the record, corresponding to a relative frequency of first-year failure of K/N. Using the parameter values from Table 7.6, Clayton's copula H1 suggests the probability of failure p1 = 0.365, copula H2 yields p2 = 0.371, the BB1 copula H5 yields p5 = 0.373, and the t-copula H6 gives p6 = 0.387.
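
These probabilities are the joint c.d.f.s of the fitted copulas evaluated at the empirical marginal probabilities of a first-year failure. A sketch of the calculation follows, with hypothetical marginal values u and v, since K and N are not reproduced here:

import numpy as np

def clayton_cdf(u, v, alpha):      # C(u,v) = (u^-a + v^-a - 1)^(-1/a)
    return (u ** -alpha + v ** -alpha - 1.0) ** (-1.0 / alpha)

def gumbel_cdf(u, v, alpha):       # C(u,v) = exp(-((-ln u)^a + (-ln v)^a)^(1/a))
    return np.exp(-(((-np.log(u)) ** alpha
                     + (-np.log(v)) ** alpha) ** (1.0 / alpha)))

u = v = 0.45                       # hypothetical marginal values F(365), G(365)
print(clayton_cdf(u, v, 3.02))     # H1 estimate of P(X <= 365, Y <= 365)
print(gumbel_cdf(u, v, 3.12))      # H2 estimate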

7.6.2 Comparison of Copula Classes

We will first use information criteria to determine which of the six copula classes H1–H6 provides the best fit. Applying the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) brings about the results in Tables 7.7 and 7.8. Lower (larger negative) values correspond to better models; the best values of the criteria are boldfaced. The two-parameter class H6 provides the best fit in most of the cases.

Table 7.7 AIC values for six classes of copulas

H1 H2 H3 H4 H5 H6
AB −521.8 −633.4 −518.9 −637.7 −661.9 −809.5
AC −57.8 −79.3 −64.6 −81.4 −82.1 −103.7
AD −50.5 −54.1 −40.8 −58.8 −60.2 −73.5
AE −46.6 −72.9 −57.0 −61.8 −72.7 −89.0
BC −55.6 −51.6 −34.3 −63.6 −61.6 −70.5
BD −41.3 −54.9 −39.9 −54.3 −55.0 −81.9
BE −65.0 −83.2 −62.4 −77.5 −85.4 −88.8
CD −106.4 −127.8 −102.0 −129.9 −135.6 −143.3
CE −114.6 −179.5 −141.1 −150.7 −182.2 −208.8
DE −24.4 −17.1 −10.8 −28.1 −23.9 −28.8

Table 7.8 BIC values for six classes of copulas

H1 H2 H3 H4 H5 H6
AB −517.7 −629.2 −514.7 −633.5 −653.6 −801.2
AC −53.7 −75.2 −60.5 −77.3 −74.0 −95.5
AD −47.5 −51.1 −37.8 −55.7 −54.2 −67.4
AE −42.8 −69.2 −53.3 −58.0 −65.2 −81.5
BC −52.1 −48.1 −30.7 −60.0 −54.5 −63.4
BD −38.3 −51.8 −36.8 −51.2 −48.8 −75.7
BE −61.5 −79.6 −58.9 −73.9 −78.3 −81.7
CD −102.8 −124.2 −98.4 −126.3 −128.4 −136.1
CE −110.1 −174.9 −136.5 −146.2 −173.2 −199.8
DE −20.8 −13.5 −7.2 −24.5 −16.7 −21.7

The Kolmogorov–Smirnov distance defined in (7.2) measures the maximum distance between a cumulative distribution function (c.d.f.) and the corresponding empirical cumulative distribution function (e.c.d.f.), and it is applicable to joint distributions as well. Table 7.9 summarizes the Kolmogorov–Smirnov distances between the joint e.c.d.f.s and the model c.d.f.s corresponding to the MPLEs of the parameters for each copula class.

Table 7.9 dKS distances for six classes of copulas

H1 H2 H3 H4 H5 H6
AB 0.245 0.167 0.179 0.161 0.0317 0.0301
AC 0.379 0.115 0.115 0.111 0.0283 0.0274
AD 0.215 0.106 0.112 0.102 0.0600 0.0533
AE 0.144 0.215 0.166 0.132 0.0347 0.0314
BC 0.242 0.146 0.141 0.146 0.0368 0.0312
BD 0.414 0.157 0.173 0.158 0.0304 0.0344
BE 0.266 0.144 0.133 0.164 0.0287 0.0310
CD 0.324 0.178 0.175 0.206 0.0382 0.0368
CE 0.176 0.083 0.065 0.096 0.0256 0.0225
DE 0.498 0.264 0.230 0.228 0.0392 0.0341

Judging by the computed Kolmogorov–Smirnov distances, H5 and H6 are by far the two best choices among the competing hypotheses. The distance values for H5 and H6 range from 0.0225 to 0.06, whereas those for the other four classes combined range from 0.065 to 0.498. Each of H1–H4 misrepresents the joint e.c.d.f. at some point, while H5 and H6 represent it accurately over the entire range of the data. Comparison of the Kolmogorov–Smirnov distances thus makes H5 and H6 the obvious selections.
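
The distances in Table 7.9 can be computed with a short routine. A sketch follows, approximating the supremum by the maximum over the sample points and assuming a vectorized copula c.d.f. such as clayton_cdf from the earlier sketch:

import numpy as np

def ks_distance(x, y, copula_cdf, alpha):
    n = len(x)
    # joint e.c.d.f. evaluated at each sample point
    ecdf = np.array([np.mean((x <= xi) & (y <= yi)) for xi, yi in zip(x, y)])
    # marginal e.c.d.f.s, rescaled to match the pseudo-observations
    u = np.array([np.mean(x <= xi) for xi in x]) * n / (n + 1.0)
    v = np.array([np.mean(y <= yi) for yi in y]) * n / (n + 1.0)
    # maximum deviation between the joint e.c.d.f. and C_alpha(F_n(x), G_n(y))
    return np.max(np.abs(ecdf - copula_cdf(u, v, alpha)))

# e.g., ks_distance(x, y, clayton_cdf, 3.02) for the pair AB under H1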

Tail dependence describes extreme co-movements of a pair of random variables in the tails of their distributions. The tail dependence coefficients of a copula C(u, v) were defined in (7.20) and (7.21). Certain copulas, such as the Gaussian copula, do not admit tail dependence (their tail dependence coefficients are equal to 0). Some copulas (e.g., H1–H4) exhibit only upper or only lower tail dependence, while the t-copula and the BB1 family exhibit both. Thus, when selecting an accurate copula model for the data, it is important to consider whether the data display upper and/or lower tail dependence. Formulas (7.22)–(7.26) from Section 7.2 can be used to calculate the lower and/or upper tail dependence coefficients of the six copula models: H1: λl = 2−1/α; H2: λu = 2 − 21/α; H3: λu = 2−1/α; H4: λl = 2 − 21/α; H5 admits both lower and upper tail dependence, which are allowed to be asymmetric: λl = 2−1/(αβ) and λu = 2 − 21/β; for H6 the two coefficients coincide:

$$
\lambda_l = \lambda_u = 2\, t_{\eta + 1}\left(-\sqrt{\frac{(\eta + 1)(1 - \rho)}{1 + \rho}}\right),
$$

where tη+1 is the c.d.f. of the standard Student t-distribution with η + 1 degrees of freedom.
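
These coefficients are straightforward to evaluate numerically; a sketch, checked against Table 7.10 for the pair AB:

import numpy as np
from scipy.stats import t as student_t

def lambda_clayton(alpha):            # H1 lower tail (= H3 upper tail)
    return 2.0 ** (-1.0 / alpha)

def lambda_gumbel(alpha):             # H2 upper tail (= H4 lower tail)
    return 2.0 - 2.0 ** (1.0 / alpha)

def lambda_bb1(alpha, beta):          # H5: (lower, upper)
    return 2.0 ** (-1.0 / (alpha * beta)), 2.0 - 2.0 ** (1.0 / beta)

def lambda_t(rho, eta):               # H6: lower = upper
    q = -np.sqrt((eta + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * student_t.cdf(q, df=eta + 1.0)

print(lambda_clayton(3.02))           # approx. 0.795, cf. Table 7.10
print(lambda_t(0.92, 2.0))            # approx. 0.74, cf. Table 7.10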

All 10 pairs in Table 7.10 exhibit both upper and lower tail dependence. Since H5 and H6 are the only two classes that allow for both, they demonstrate the best fit, although H6 forces the two tails to be symmetric. The lower values for H5, especially at the lower tail, may indicate some problems with this class.

Table 7.10 Tail dependence induced by the models

H1 (L) H2 (U) H3 (U) H4 (L) H5 (L) H5 (U) H6 (L=U)
AB 0.795 0.751 0.790 0.753 0.639 0.678 0.741
AC 0.300 0.334 0.297 0.340 0.063 0.268 0.313
AD 0.515 0.460 0.475 0.472 0.367 0.318 0.415
AE 0.324 0.384 0.376 0.365 0.032 0.336 0.351
BC 0.394 0.364 0.305 0.380 0.270 0.214 0.339
BD 0.475 0.473 0.483 0.471 0.161 0.404 0.501
BE 0.427 0.446 0.449 0.431 0.158 0.365 0.322
CD 0.540 0.511 0.532 0.515 0.325 0.405 0.404
CE 0.327 0.392 0.388 0.369 0.036 0.342 0.341
DE 0.200 0.228 0.099 0.250 0.136 0.094 0.184

All of these approaches (using AIC, BIC, Kolmogorov–Smirnov statistic, or tail dependence) can help us to determine the best class of pair copula models. However they share the same weakness: the analysis is restricted to the comparison of the single representatives of each class obtained by CMLE. The following section discusses a possibility to make conclusions based on multiple representatives of six classes H1–H6.

7.6.3 Bayesian Model Selection

We suggest Kendall's concordance τ as a universal parameter, which can be evaluated for all six classes H1–H6 defined earlier. Sample concordance is a reasonable nonparametric estimator of τ. Using the parametric estimates from Table 7.6 and the formulas expressing concordance through the association parameters (see the definitions of classes H1 and H2, and (7.15) and (7.16)), we can calculate the values of τ induced by the CMLE for the parameters of the copula classes H1–H6 and compare them to the sample values τ̂, which is done in Table 7.11.

Table 7.11 Sample and model-induced values of τ

Sample τ̂  H1  H2  H3  H4  H5  H6
AB 0.707 0.601 0.679 0.596 0.682 0.692 0.737
AC 0.269 0.224 0.264 0.222 0.269 0.278 0.278
AD 0.375 0.343 0.377 0.318 0.389 0.403 0.364
AE 0.314 0.235 0.307 0.262 0.291 0.316 0.320
BC 0.307 0.271 0.290 0.226 0.304 0.311 0.306
BD 0.400 0.318 0.389 0.323 0.387 0.402 0.471
BE 0.386 0.289 0.364 0.302 0.350 0.375 0.382
CD 0.452 0.360 0.425 0.354 0.429 0.442 0.443
CE 0.318 0.237 0.315 0.268 0.295 0.321 0.324
DE 0.188 0.177 0.174 0.130 0.193 0.194 0.182

The proximity of the model-induced values of τ to the sample value may serve as a measure of the model fit and help to compare model performance. However, this comparison still uses single values to represent entire families.

Following Bretthorst [7] and Huard [20], in Section 7.4 we suggested comparing the data fit provided by models H1–H6 not at a single value of the association parameter(s) obtained by MLE, but rather over the entire range of possible association values. This can be accomplished by specifying a prior distribution for the association parameter(s) and integrating the likelihood with respect to this prior. The problem is that the association parameters of different copula classes differ in meaning and range. This problem is resolved by specifying a prior on τ instead.

There exist convenient formulas expressing association as a function of τ for all four one-parametric copulas H1–H4, inverting (7.15) and (7.16):

$$
\alpha(\tau) = \frac{2\tau}{1 - \tau}
$$

for H1 and H3, and

$$
\alpha(\tau) = \frac{1}{1 - \tau}
$$

for H2 and H4.

In the case of the two-parameter class H5 we will use the conditional relationship between α and τ, treating β as a nuisance parameter fixed at its MPLE β̂:

$$
\alpha(\tau) = \frac{2}{\hat{\beta}\,(1 - \tau)} - 2.
$$

This approach can be justified by the higher risks of early related failures, corresponding to the lower tail of the joint distribution, which is controlled by α; thus α becomes the critical parameter of the class H5. Notice that the two-parameter class H6 contains an additional parameter η (the degrees of freedom), which is not directly related to τ. We will apply (7.19) to relate ρ to τ and use the CMLE η̂ in the further analysis.
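
In code, the inversions take one line each. A sketch: the H5 formula inverts τ = 1 − 2/(β(α + 2)) at β = β̂, and rho_h6 assumes that (7.19) is the standard elliptical relation τ = (2/π) arcsin ρ:

import numpy as np

def alpha_h1_h3(tau):            # Clayton and survival Clayton
    return 2.0 * tau / (1.0 - tau)

def alpha_h2_h4(tau):            # Gumbel-Hougaard and its survival version
    return 1.0 / (1.0 - tau)

def alpha_h5(tau, beta_hat):     # BB1 with beta fixed at its MPLE
    return 2.0 / (beta_hat * (1.0 - tau)) - 2.0

def rho_h6(tau):                 # t-copula correlation, rho = sin(pi * tau / 2)
    return np.sin(np.pi * tau / 2.0)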

We will assume that the six classes H1–H6 represent exhaustive and mutually exclusive hypotheses. The posterior probabilities of these hypotheses may be written as

(7.39)
$$
P(H_j \mid \mathbf{x}) = \frac{P(H_j)\int_0^1 L\big(\mathbf{x} \mid H_j, \alpha_j(\tau)\big)\,\pi(\tau)\,d\tau}{\sum_{k=1}^{6} P(H_k)\int_0^1 L\big(\mathbf{x} \mid H_k, \alpha_k(\tau)\big)\,\pi(\tau)\,d\tau},
$$

where we consider all six hypotheses a priori equally likely and the dependence between the variables positive, which suggests τ ≥ 0. In this case the natural choice of prior for τ is the beta distribution, and the parameters of the prior are suggested by the sample concordance for the entire dataset, consistently with the empirical Bayes approach. The estimates of the parameters of the beta distribution are obtained from the sample concordance values of all 10 pairs of components.

We will not need to calculate the denominator of the posterior in (7.39). It suffices to calculate the weights

(7.40)
$$
W_j = \int_0^1 L\big(\mathbf{x} \mid H_j, \alpha_j(\tau)\big)\,\pi(\tau)\,d\tau, \qquad j = 1, \ldots, 6,
$$

or, using the Monte Carlo approach and drawing samples τk from the beta prior, evaluate

(7.41)
$$
W_j \approx \frac{1}{N}\sum_{k=1}^{N} L\big(\mathbf{x} \mid H_j, \alpha_j(\tau_k)\big), \qquad \tau_k \sim \pi(\tau).
$$

Then we choose the class with the highest weight and obtain the Bayes estimate of the association parameter using MCMC.
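
To make the procedure concrete, here is a sketch for the classes H1 and H2, reusing the inversion functions from the previous sketch; the pseudo-observations and the beta prior parameters below are placeholders, not the study data:

import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
n = 200
# Placeholder pseudo-observations; in the study (ui, vi) come from the
# empirical marginals of a TTF pair such as (A, B).
u, v = rng.uniform(size=n), rng.uniform(size=n)

def clayton_loglik(alpha):                       # class H1
    s = u ** -alpha + v ** -alpha - 1.0
    return np.sum(np.log1p(alpha) - (alpha + 1.0) * np.log(u * v)
                  - (1.0 / alpha + 2.0) * np.log(s))

def gumbel_loglik(alpha):                        # class H2
    su, sv = -np.log(u), -np.log(v)
    a_sum = su ** alpha + sv ** alpha
    g = a_sum ** (1.0 / alpha)
    return np.sum(-g - np.log(u * v) + (alpha - 1.0) * np.log(su * sv)
                  + (2.0 / alpha - 2.0) * np.log(a_sum) + np.log(g + alpha - 1.0))

def log_weight(loglik, alpha_of_tau, a, b, n_draws=20000):
    # Monte Carlo log-weight (7.41), as in the Section 7.4 sketch
    taus = np.clip(rng.beta(a, b, size=n_draws), 0.01, 0.95)  # guard the extremes
    ll = np.array([loglik(alpha_of_tau(t)) for t in taus])
    return logsumexp(ll) - np.log(n_draws)

a_hat, b_hat = 2.0, 3.0                          # placeholder beta prior parameters
lw = [log_weight(clayton_loglik, alpha_h1_h3, a_hat, b_hat),
      log_weight(gumbel_loglik, alpha_h2_h4, a_hat, b_hat)]
print(lw, int(np.argmax(lw)))                    # class with the larger log-weight wins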

Table 7.12 gives the values of the natural logarithms of the weights calculated by N = 20,000 runs of Monte Carlo sampling from the beta prior. The highest value for each pair of components is boldfaced.

Table 7.12 Logs of posterior weights for H1–H6

H1 H2 H3 H4 H5 H6
AB 258.41 313.16 256.99 315.24 327.88 401.31
AC 27.84 38.83 31.18 39.88 41.14 52.22
AD 24.83 26.73 20.03 29.03 30.53 37.58
AE 22.40 35.87 27.71 30.29 36.60 45.09
BC 27.11 25.29 16.32 31.28 31.13 35.90
BD 20.29 27.12 19.59 26.82 27.98 41.48
BE 31.80 41.06 30.58 38.23 42.94 44.97
CD 52.49 63.13 50.29 64.16 67.75 71.96
CE 56.03 88.76 69.37 74.38 90.92 104.61
DE 10.96 7.46 3.61 13.12 11.88 14.52

7.6.4 Conclusions

As we can see from the warranty claim data, substantial dependence is observed between the TTFs of automotive components related to the engine subassembly of Hyundai Accent vehicles. This dependence cannot be ignored: the assumption of independence would lead to grossly underestimated related-failure risks and warranty costs. It can, however, be addressed using pair copula models. Open-source software packages in the R environment [6, 25] allow for a straightforward implementation of parametric and semiparametric estimation for different classes of copulas: direct and survival Archimedean copulas (the Clayton and Gumbel–Hougaard classes), two-parameter Archimedean copulas (BB1), and elliptical copulas (the Student t-class).

The problem of model selection, that is, of an adequate choice of a copula class, remains central, because misspecification of the copula class can be critical. For model comparison, one can use information criteria or the Kolmogorov–Smirnov statistic as measures of overall fit, or study tail dependence to assess model quality. A convenient tool of model selection is provided by the Bayesian framework using Kendall's τ as the common parameter for different classes of copulas.

For the TTFs of automotive components, most of the model selection tools indicate that t-copulas are superior to the four one-parameter Archimedean copulas and even to BB1. This can be explained by t-copulas exhibiting tail dependence in both the lower and upper tails of the joint TTF distribution, while direct or survival Archimedean copulas concentrate on one of the tails and BB1, as it seems, underestimates tail dependence. However, it is not clear whether the assumption of symmetric tails in t-copulas is plausible and not too restrictive. One possible direction for future work is to use more complex hybrid or mixed Archimedean copula models as an alternative to t-copulas (see also [26]). Another alternative would be to consider skewed or asymmetric t-copulas [9] or other, more complex extensions of elliptical copula models.

References

  1. Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
2. Akaike, H. (1985). Prediction and entropy. In: A. C. Atkinson and S. E. Fienberg (editors), A Celebration of Statistics, 1–24. Springer.
  3. Baik, J. (2010). Warranty analysis on engine: case study. Technical Report, Korea Open National University, Seoul, 1–25.
  4. Berg, D. (2007). Copula goodness-of-fit testing: an overview and power comparison. Statistical Research Report No. 5, University of Oslo, Oslo.
  5. Box, G. E. P., and Draper, N. R. (1987). Empirical Model-Building and Response Surfaces, John Wiley & Sons, Inc.
6. Brechmann, E. C., and Schepsmeier, U. (2013). Modeling dependence with C- and D-vine copulas: the R package CDVine. Journal of Statistical Software, 52(3), 1–27.
7. Bretthorst, G. L. (1996). An introduction to model selection using probability theory as logic. In: G. Heidbreder (editor), Maximum Entropy and Bayesian Methods, 1–42. Kluwer Academic Publishers.
  8. CDO Primer: The Bond Market Association (2004).
9. Demarta, S., and McNeil, A. (2005). The t copula and related copulas. International Statistical Review, 73, 111–129.
  10. Embrechts, P., and Hofert, M. (2014). Statistics and quantitative risk management for banking and insurance. Annual Review of Statistics and its Application, 1, 492–514.
  11. Embrechts, P., McNeil, A., and Straumann, D. (2003). Correlation and dependency in risk management: properties and pitfalls. In: M. Dempster, and H. K. Moffat (editors) Risk Management: Value at Risk and Beyond. Cambridge University Press, 176–223.
  12. Fermanian, J-D., and Scaillet, O. (2005). Some statistical pitfalls in copula modelling for financial applications. In: E. Klein (editor), Capital Formation, Governance and Banking, 59–74. New York: Nova Science Publication.
  13. Fermanian, J-D., Charpentier, A., and Scaillet, O. (2006). The estimation of copulas: theory and practice. In: J. Rank (editor), Copulas, from Theory to Application in Finance., New York: Risk Books.
14. Frees, E. W., Carriere, J. F., and Valdez, E. (1996). Annuity valuation with dependent mortality. Journal of Risk and Insurance, 63(2), 229–261.
15. Gelman, A. (2013). Two simple examples for understanding posterior p-values whose distributions are far from uniform. Electronic Journal of Statistics, 7, 2595–2602.
  16. Gelman, A., Carlin, J. B., Stern, H.S., and Rubin, D. B. (2004). Bayesian Data Analysis, 2nd ed. Texts in Statistical Science. CRC Press.
  17. Genest, C., Ghoudi, K., and Rivest, L. P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82(3), 543–552.
  18. Genest, C., Remillard, B., and Beaudoin, D. (2009). Goodness-of-fit tests for copulas: a review and a power study. Insurance: Mathematics and Economics, 44, 199–213.
  19. Genest, C., and Rivest, L.-P. (1993). Statistical inference procedures for bivariate Archimedean copulas. Journal of the American Statistical Association, 88, 1034–1043.
  20. Huard, D., Evin, G., and Favre, A.-C. (2006). Bayesian copula selection. Computational Statistics and Data Analysis, 51(2), 809–822.
  21. Joe, H. (1997). Multivariate Models and Dependence Concepts. London: Chapman & Hall.
  22. Joe, H. (2005). Asymptotic efficiency of the two-stage estimation method for copula-based models. Journal of Multivariate Analysis, 94, 401–419.
  23. Joe, H. (2014). Dependence Modeling with Copulas. London: Chapman & Hall/CRC.
  24. Kim, G., Silvapulle, M., and Silvapulle, P. (2007). Comparison of semiparametric and parametric methods for estimating copulas. Computational Statistics and Data Analysis, 51(6), 2836–2850.
  25. Kojadinovic, I., and Yan, J. (2010). Modeling multivariate distributions with continuous margins using the copula R package. Journal of Statistical Software, 34(9), 1–20.
26. Komornikova, M., and Komornik, J. (2010). A copula based approach to the analysis of returns of exchange rates to EUR of the Visegrad countries. Acta Polytechnica Hungarica, 7(3), 79–91.
  27. Kumerow, J., Lenz, N., Sargent, K., Shemyakin, A., and Wifvat, K. (2014). Modelling related failures of vehicle components via Bayesian copulas, ISBA-2014, Cancun, Mexico, § 307, 195.
  28. Lawless, J. F., Hu, J., and Cao, J. (1995). Methods for estimation of failure distribution and rates from automobile warranty data. Lifetime Data Analysis, 1, 227–240.
  29. Li, D.X. (2000) On default correlation: a copula function approach. Journal of Fixed Income, 9, 43–52.
  30. Lindley, D.V. (1957). A statistical paradox. Biometrika, 44, 187–192.
  31. Meng, X. L. (1994). Posterior predictive p-values. Annals of Statistics, 22(1), 1142–1160.
  32. McKenzie, D., and Spears, T. (2014). The formula that killed Wall Street: the Gaussian copula and modelling practices in investment banking. Social Studies of Science, 44, 393–417.
  33. Mikosch, T. (2006). Copulas: tales and facts. Extremes, 9, 3–20.
  34. Nicoloutsopoulos, D. (2005). Parametric and Bayesian non-parametric estimation of copulas, Ph.D. Thesis.
35. Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. New York: John Wiley & Sons, Inc.
  36. Okhrin, O., and Ristig, A. (2014). Hierarchical Archimedean copulae: the HAC package. Journal of Statistical Software, 58, 4.
  37. Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12, 1151–1172.
  38. Salmon, F. (2012) The formula that killed Wall Street. Significance, 9(1), 16–20.
39. Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
40. Shemyakin, A., and Youn, H. (2000). Statistical aspects of joint-life insurance pricing. 1999 Proceedings of the American Statistical Association, 34–38.
  41. Shemyakin, A., and Youn, H. (2001). Bayesian estimation of joint survival functions in life insurance. In: Bayesian Methods with Applications to Science, Policy and Official Statistics, 489–496. European Communities.
  42. Shemyakin, A., and Youn, H. (2006) Copula models of joint last survivor insurance. Applied Stochastic Models of Business and Industry, 22(2), 211–224.
43. Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64(4), 583–639.
  44. Trafimow, D. and Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2.
  45. Trivedi, K. S. (2008). Probability and Statistics with Reliability, Queueing and Computer Science Applications. London/New York: John Wiley & Sons, Inc.
  46. Volinsky, C.T., and Raftery, A.E. (2000). Bayesian information criterion for censored survival models. Biometrics, 56, 256–262.
47. Wasserstein, R. L., and Lazar, N. A. (2016). The ASA's statement on p-values: context, process, and purpose. American Statistician, 70(2), 129–133.
48. Wu, J., McHenry, S., and Quandt, J. (2000). An application of Weibull analysis to determine failure rates in automotive components. NHTSA, U.S. Department of Transportation, Paper No. 13-0027.

Exercises

1. Calculate the values of the association parameter corresponding to Kendall's τ = 0.5 for the Clayton, Frank, and Gumbel–Hougaard families. Use them to calculate the values Cα(u = 0.1, v = 0.1) and Cα(u = 0.9, v = 0.9) for all three classes, and compare.

  2. Verify all results obtained in Table 7.4.

3. Develop parametric models for the adjusted residual returns for daily index values of the IGBM Madrid Stock Exchange and JSE Africa stock indexes in the file IGBMJSE.xlsx, or use empirical distributions, to estimate separately for each index the probability of a one-day drop by one or more standard deviations.

4. Based on the models of the marginals from the previous problem, estimate copula parameters for (a) a Gaussian copula model and (b) a Clayton copula model. Use any of the statistical methods discussed in Chapter 7. Explain your choice of the method and your choice of the copula between (a) and (b).

5. Using the results of the previous problem, estimate the probability of a simultaneous one-day drop of both indexes by one or more standard deviations.

6. Suggest and implement a Bayesian model for the data in Table 7.1, based on the Clayton copula and noninformative priors.
