Marie Kratz
ESSEC Business School, CREAR, Paris, France
AMS 2000 subject classification. 60F05; 62G32; 62G30; 62P05; 62G20; 91B30; 91G70.
A universally accepted lesson of the last financial crisis has been the urgent need to improve risk analysis within financial institutions. Taking extreme risks into account is recognized nowadays as a necessary condition for good risk management in any financial institution and is no longer restricted to reinsurance companies. Minimizing the impact of extreme risks, or even ignoring them because of their small probability of occurrence, has been considered by many professionals and supervisory authorities as an aggravating factor of the last financial crisis; the American Senate and the Basel Committee on Banking Supervision confirm this statement in their reports. It has therefore become crucial to include and correctly evaluate extreme risks. This is our goal here when considering a portfolio of heavy-tailed risks, notably when the tail index is larger than 2, that is, when the variance is finite. This is the case when studying not only financial assets but also insurance liabilities. It concerns life insurance as well, because of investment risks and interest rates; omitting them was at the origin of the bankruptcy of several life insurance companies, as, for instance, Executive Life in the United States, Mannheimer in Germany, or Scottish Widows in the United Kingdom.
When considering financial assets, because of a finite variance, a normal approximation is often chosen in practice for the unknown distribution of the yearly log returns, justified by the use of the central limit theorem (CLT) under the assumption of independent and identically distributed (i.i.d.) observations. Such a choice of modeling, in particular using light-tailed distributions, proved grossly inadequate during the last financial crisis when dealing with risk measures, because it leads to underestimating the risk.
Recently, a study was done by Furrer (2012) on simulated i.i.d. Pareto random variables (r.v.'s) to measure the impact of the choice and the use of the limiting distribution of aggregated risks, in particular for the computation of standard risk measures (value-at-risk or expected shortfall). In this study, the standard general central limit theorem (GCLT) (see, e.g., (Samorodnitsky and Taqqu, 1994)) is recalled, providing a limiting stable distribution or a normal one, depending on the value of the shape parameter of the Pareto r.v.'s. Then, considering Pareto samples of various sizes and different values of the shape parameter, Furrer evaluated the distance between the empirical distribution and the theoretical limiting distribution; then computed the empirical value-at-risk (denoted VaR) and expected shortfall, also called tail value-at-risk (denoted ES or TVaR); and compared them with the ones computed from the limiting distribution. It appeared clearly that not only the choice of the limiting distribution but also the rate of convergence matters, and hence so does the way of aggregating the variables. From this study, we also notice that the normal approximation appears really inadequate when considering aggregated risks coming from a moderately heavy-tailed distribution, that is, a Pareto with a shape parameter or tail index larger than 2 but below 4.
A few comments can be added to this study. First, the numerical results obtained in Furrer (2012) confirm what is already known in the literature. In particular, there are two main drawbacks when using the CLT for moderately heavy-tailed distributions (e.g., Pareto with a shape parameter larger than 2). On one hand, even if the CLT applies to the sample mean because of a finite variance, we also know that it provides a normal approximation with a very slow rate of convergence, which may be improved when removing extremes from the sample (see, e.g., (Hall, 1984)). Hence, even if we are interested only in the sample mean, samples of small or moderate sizes will lead to a bad approximation. To improve the rate of convergence, the existence of moments of order larger than 2 is necessary (see, e.g., Section 3.2 in Embrechts et al. (1997) or, for more details, Petrov (1995)). On the other hand, it has also been proved theoretically (see, e.g., (Pictet et al., 1998)) as well as empirically (see, e.g., (Dacorogna et al., 2001), Section 5.4.3) that the CLT approach applied to a heavy-tailed distributed sample does not bring any information on the tail and therefore should not be used to evaluate risk measures. Indeed, a heavy tail may appear clearly on high-frequency data (e.g., daily ones) but become invisible when aggregating them into, for example, yearly data (i.e., short samples), although it is known, by Fisher theorem, that the tail index of the underlying distribution remains constant under aggregation. It is a phenomenon on which many authors have insisted, as, for example, Dacorogna et al. (2001). Figure 11.1 on the S&P 500 returns illustrates this last issue very clearly.
In these figures, the plot of the S&P 500 daily returns from 1987 to 2007 helps to detect a heavy tail. When aggregating the daily returns into monthly returns, the plot looks more like a normal one, and the very few observations appearing above the threshold of , such as the financial crises of 1998 and 1987, could almost be considered as outliers, as it is well known that financial returns are symmetrically distributed.
Now, look at Figure 11.2. When adding data from 2008 to 2013, the plot looks much the same, that is, normal, except that another “outlier” appears ... with the date of October 2008! Instead of looking again at daily data for the same years, let us consider a larger sample of monthly data from 1791 to 2013. With a larger sample size, the heavy tail becomes visible again. And now we see that the financial crisis of 2008 does belong to the heavy tail of the distribution and can no longer be considered as an outlier. So we clearly see the importance of the sample size when dealing with moderately heavy tails to estimate the risk. Thus we need a method that does not depend on the sample size but looks at the shape of the tail.
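To visualize this aggregation effect outside of the S&P 500 data, here is a purely illustrative R sketch; it uses simulated Student-t daily returns with 3 degrees of freedom (an assumption of ours, not market data) and a helper exc.kurt defined below, and shows how aggregating 21 daily returns into a monthly one hides the heavy tail from a standard diagnostic such as the sample excess kurtosis.

```r
## Illustrative sketch (R), not the S&P 500 data: heavy-tailed "daily returns"
## simulated from a Student-t with 3 degrees of freedom, then aggregated by
## blocks of 21 days into "monthly returns".
set.seed(1)
daily   <- rt(21 * 250 * 20, df = 3) / 100      # ~20 years of synthetic daily returns
monthly <- colSums(matrix(daily, nrow = 21))    # aggregation factor a = 21
## sample excess kurtosis: typically very large for the daily data (the theoretical
## value is infinite for 3 degrees of freedom), much closer to 0 after aggregation
exc.kurt <- function(x) mean((x - mean(x))^4) / mean((x - mean(x))^2)^2 - 3
round(c(daily = exc.kurt(daily), monthly = exc.kurt(monthly)), 2)
## qqnorm(monthly) looks almost linear, while qqnorm(daily) shows clear heavy tails
```

On short aggregated samples the heavy tail is thus easily mistaken for Gaussian behavior, which is precisely the trap discussed above.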
The main objective is to obtain the most accurate evaluation of the distribution of aggregated risks and of risk measures when working on financial data in the presence of fat tails. We explore various approaches to handle this problem, theoretically, empirically, and numerically. The application to log returns, which motivated the construction of this method, illustrates the case of time aggregation, but the method is general and concerns any type of aggregation, for example, of assets.
After reviewing briefly the existing methods, from the GCLT to extreme value theory (EVT), we will propose and develop two new methods, both inspired by the work of Zaliapin et al. (2005) in which the sum of i.i.d. r.v.'s is rewritten as the sum of the associated order statistics.
The first method, named Normex, answers the question of how many largest order statistics would explain the divergence between the underlying moderately heavy-tailed distribution and the normal approximation, whenever the CLT applies, and combines a normal approximation with the exact distribution of this number (independent of the size of the sample) of largest order statistics. It provides in general the sharpest results among the different methods, whatever the sample size is and for any heaviness of the tail.
The second method is empirical and consists of a weighted normal approximation. Of course, we cannot expect as sharp a result as the one obtained with Normex. However, it provides a simple tool that allows us to remain in the Gaussian realm. We introduce a shift in the mean and a weight in the variance as correcting terms for the Gaussian parameters.
Then we will proceed to an analytical comparison between the exact distribution of the Pareto sum and its approximation given by Normex before turning to the application of evaluating risk measures.
Finally a numerical study will follow, applying the various methods on simulated samples to compare the accuracy of the estimation of extreme quantiles, used as risk measures in solvency calculation.
In the rest of the chapter, with financial/actuarial applications in mind, and without loss of generality, we will use power law models for the marginal distributions of the risks such as the Pareto distribution.
will denote the integer part of any nonnegative real such that .
Let be the probability space on which we will be working.
Let and denote, respectively, the cumulative distribution function (cdf) and the probability density function (pdf) of the standard normal distribution and and the cdf and pdf of the normal distribution with mean and variance .
Let be a random variable (r.v.) Pareto (type I) distributed with shape parameter , pdf denoted by and cdf defined by
and probability density function (pdf) denoted by .
Note that the inverse function of is given by
Recall that for , and for , .
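For reference, and assuming the usual normalization in which the Pareto scale (minimum value) equals 1, these quantities take the standard forms
\[
F_\alpha(x) = 1 - x^{-\alpha}, \qquad f_\alpha(x) = \alpha\, x^{-\alpha-1}, \qquad x \ge 1,
\]
\[
F_\alpha^{-1}(u) = (1-u)^{-1/\alpha}, \quad 0<u<1, \qquad
\mathbb{E}(X) = \frac{\alpha}{\alpha-1}\ (\alpha>1), \qquad
\operatorname{var}(X) = \frac{\alpha}{(\alpha-1)^2(\alpha-2)}\ (\alpha>2).
\]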
We will consider i.i.d. Pareto r.v.'s in this study and denote by the Pareto sum being an -sample with parent r.v. and associated order statistics .
When dealing with financial assets (market risk data), we define the returns as
being the daily price and representing the aggregation factor.
Note that we can also write
In what follows, we will denote by .
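As a hedged illustration of this time aggregation, written with the daily price \(P_t\) and an aggregation factor \(a\) as above (the notation \(X_t^{(a)}\) for the aggregated return is introduced here only for illustration), the log return over a period of length \(a\) is the sum of the \(a\) daily log returns:
\[
X_t^{(a)} \;=\; \ln\frac{P_t}{P_{t-a}} \;=\; \sum_{i=0}^{a-1} \ln\frac{P_{t-i}}{P_{t-i-1}}.
\]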
Further comments or questions
Is it still worth considering i.i.d. r.v.'s, whereas most recent research focuses on dependent ones?
Concerning the i.i.d. condition, note that this study fills a gap in the literature on the sum of i.i.d. moderately heavy-tailed r.v.'s (see, e.g., (Feller, 1966); (Hahn et al., 1991), and (Petrov, 1995)). Moreover, in our practical example of log returns (the motivation of this work), the independence condition is satisfied (see, e.g., (Taylor, 1986); (Dacorogna et al., 2001)) and hence is not a restriction in this case of time aggregation.
Another theoretical reason comes from EVT; indeed we know that the tail index of the aggregated distribution corresponds to that of the marginal with the heaviest tail, and hence this issue does not depend on the dependence structure.
Finally, there was still a missing mathematical “brick” in the study of the behavior of the sum of i.i.d. r.v.'s with a moderately heavy tail, for which the CLT applies (to the center of the distribution!) but converges slowly for the mean behavior and certainly does not provide a satisfactory approximation for the tail. With this work, we aim at filling this gap by looking for an appropriate limit distribution.
Why consider the Pareto distribution?
It is justified by the EVT (see, e.g., (Leadbetter et al., 1983); (Resnick, 1987), and (Embrechts et al., 1997)). Indeed recall the Pickands theorem (see (Pickands, 1975) for the seminal work) proving that for sufficiently high threshold , the generalized Pareto distribution (GPD) (with tail index and scale parameter ) is a very good approximation to the excess cdf of a r.v. defined by :
if and only if the distribution of is in the domain of attraction of one of the three limit laws. When considering risks in the presence of a heavy tail, this implies that the extreme risks follow a GPD with a positive tail index (also called extreme value index) , which amounts to saying that the risks belong to the Fréchet maximum domain of attraction (see, e.g., (Galambos, 1978); (Leadbetter et al., 1983); (Resnick, 1987), or (Embrechts et al., 1997)). In particular, for ,
for some constant . It is then natural and quite general to consider a Pareto distribution (with shape parameter ) for heavy-tailed risks.
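For completeness, the GPD and the Fréchet-domain tail behavior referred to here take the standard forms (written with tail index \(\xi = 1/\alpha > 0\) and scale \(\beta > 0\); the notation may differ from the one used elsewhere in the chapter):
\[
G_{\xi,\beta}(y) \;=\; 1-\Bigl(1+\xi\,\frac{y}{\beta}\Bigr)^{-1/\xi}, \quad y\ge 0,
\qquad\text{and}\qquad
\bar F(x) \;=\; 1-F(x) \;=\; x^{-\alpha}\,L(x),
\]
with \(L\) slowly varying; the displayed equivalence \(\bar F(x)\sim C\,x^{-\alpha}\) corresponds to the case where \(L(x)\to C\).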
A natural extension would then be considering r.v.'s with other distributions belonging to the Fréchet maximum domain of attraction.
Limit theorems for the sum of i.i.d. r.v.'s are well known. Nevertheless, they can be misused in practice for various reasons, such as a too small sample size, as we have seen. As a consequence, this leads to wrong estimates of the risk measures for aggregated data. To make practitioners sensitive to this issue, we consider the simple example of aggregated heavy-tailed risks, where the risks are represented by i.i.d. Pareto r.v.'s. We start by reviewing the existing methods, from the GCLT to EVT, before applying them to simulated Pareto samples to show the pros and cons of those methods.
with
Note that the tail distribution of satisfies (see (Samorodnitsky and Taqqu, 1994)):
When focusing on the tail of the distribution, in particular for the estimation of the risk measures, the information on the entire distribution is not necessary, hence the alternative of the EVT approach.
Recall the Fisher–Tippett theorem (see (Fisher and Tippett, 1928)), which states that the limiting distribution of the rescaled sample maximum can only be of three types: Fréchet, Weibull, or Gumbel. The three types of extreme value distribution have been combined into a single three-parameter family ((von Mises, 1936); (Jenkinson, 1955)) known as the generalized extreme value (GEV) distribution, given by
with (scale parameter), (location parameter), and (tail index or extreme value index). The tail index determines the nature of the tail distribution:
: Fréchet, : Gumbel, : Weibull.
Under the assumption of regular variation of the tail distribution, the tail of the cdf of the sum of i.i.d. r.v.'s is mainly determined by the tail of the cdf of the maximum of these r.v.'s. Indeed, we have the following lemma.
It applies of course to Pareto r.v.'s.
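In its standard form (valid for subexponential distributions, which include regularly varying tails; the precise statement in the text may differ slightly), the lemma reads: for each fixed \(n\),
\[
\lim_{x\to\infty}\frac{\mathbb{P}(S_n>x)}{\mathbb{P}\bigl(\max(X_1,\dots,X_n)>x\bigr)} \;=\; 1,
\qquad\text{equivalently}\qquad
\mathbb{P}(S_n>x)\;\sim\; n\,\bar F(x) \quad (x\to\infty),
\]
so that the tail of the sum is driven by the tail of the maximum.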
Combining (11.8) with the GEV limiting distribution in the case of -Pareto r.v.'s provides that the tail distribution of the rescaled sum of Pareto r.v.'s is asymptotically Fréchet:
where is defined as in (2.1).
An alternative approach to the GCLT one has been proposed by Zaliapin et al. (see (Zaliapin et al., 2005)) when the Pareto shape parameter satisfies , a case where the variance of the Pareto r.v.'s does not exist. The neat idea of the method is to rewrite the sum of the 's as the sum of the order statistics and to separate it into two terms, one with the first order statistics having finite variance and the other as the complement
They can then treat these two subsums separately. Even if not always rigorously developed in their paper and, as we will see later, quite approximative, their method provides better numerical results than the GCLT does, for any number of summands and any quantile. Nevertheless, there are some mathematical issues in this paper. One of them is that the authors treat these two subsums as independent. Another one is that they approximate the quantile of the total (Pareto) sum by directly summing the quantiles of each subsum, although quantiles are not additive. For the case , they reduce the behavior of the sum arbitrarily to the last two upper order statistics.
Another drawback of this method would be, when considering the case , to remain with one sum of all terms with a finite variance, hence in general with a poor or slow normal approximation.
We are mainly interested in the case of a shape parameter larger than 2, since it is the missing part in the literature and of practical relevance when studying market risk data, for instance. For such a case, the CLT applies because of the finiteness of the second moment, but using it to obtain information on anything other than the average is simply wrong in the presence of fat tails, even if in some situations (e.g., when working on aggregated data or on short samples) the plot of the empirical distribution fits a normal one. The CLT only concentrates on the mean behavior; it is equivalent to the CLT on the trimmed sum (i.e., the sum minus a given number of the largest order statistics (or tail)) (see (Mori, 1984)), for which the rate of convergence improves (see, e.g., (Hahn et al., 1991); (Hall, 1984)).
Inspired by Zaliapin et al.'s paper, we go further in the direction of separating mean and extreme behaviors in order to improve approximations, for any , and we build two alternative methods, called Normex and the weighted normal limit, respectively. This means answering rigorously the question of how many largest order statistics would explain the divergence between the underlying distribution and the normal approximation when considering a Pareto sum with , or the stable approximation when considering a Pareto sum with .
Both methods rely initially on Zaliapin et al.'s approach of splitting the Pareto sum into a trimmed sum, to which the CLT applies, and another sum with the remaining largest order statistics. The main idea of the two methods is to determine in an “optimal way” (in order to improve the distribution approximation as much as possible), which we are going to explain, the number that corresponds to a threshold when splitting the sum of order statistics into two subsums, the second one consisting of the largest order statistics. We will develop these methods under realistic assumptions, dropping in particular Zaliapin et al.'s assumption of independence between the two subsums. Our two methods differ from each other in two points:
Our study is developed on the Pareto example, but its goal is to propose a method that may be applied to any heavy-tailed distribution (with positive tail index) and to real data, hence this choice of looking for limit theorems in order to approximate the true (and most of the time unknown) distribution.
Let us start by studying the behavior of the trimmed sum when writing down the sum of the i.i.d. -Pareto r.v.'s (with ), , as
Much literature, since the 1980s, has been concerned with the behavior of trimmed sums by removing extremes from the sample; see, for example, Hall (1984), Mori (1984), and Hahn et al. (1991).
The main issue is the choice of the threshold , in order to use the CLT but also to improve its fit since we want to approximate the behavior of by a normal one.
We know that a necessary and sufficient condition for the CLT to apply to is to require the summands , , to be -r.v.'s. But we also know that requiring only the finiteness of the second moment may lead to a poor normal approximation if higher moments do not exist, as occurs, for instance, with financial market data. In particular, the finiteness of the third moment provides a better rate of convergence to the normal distribution in the CLT (Berry–Esséen inequality). Another piece of information that might be quite useful to improve the approximation of the distribution of by its limit distribution is the Fisher index, defined by the ratio , which is a kurtosis index. The skewness of and measures the closeness of the cdf to . Hence we will choose based on the condition of existence of the fourth moment of the summands of (i.e., the first order statistics).
The following Edgeworth expansion, involving the Hermite polynomials, points out that requiring the finiteness of the fourth moments appears as what we call the “optimal” solution (of course, the more higher-order moments exist, the finer the normal approximation becomes, but this would imply conditions that are too strong and difficult to handle). If denotes the cdf of the standardized defined by , then
uniformly in , with
The rate of convergence appears clearly as whenever , .
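For reference, the classical form of this expansion for the cdf \(F_n\) of a standardized sum of \(n\) i.i.d. non-lattice r.v.'s with skewness \(\lambda_3\) and excess kurtosis \(\lambda_4\) reads (see, e.g., Petrov (1995); the notation may differ from the one used in the text):
\[
F_n(x) \;=\; \Phi(x) \;-\; \varphi(x)\left[\frac{\lambda_3}{6\sqrt{n}}\,(x^2-1)
\;+\;\frac{1}{n}\Bigl(\frac{\lambda_4}{24}\,(x^3-3x)+\frac{\lambda_3^2}{72}\,(x^5-10x^3+15x)\Bigr)\right]
\;+\; o\!\left(\frac{1}{n}\right),
\]
uniformly in \(x\), provided the fourth moment is finite; the polynomials \(x^2-1\), \(x^3-3x\), and \(x^5-10x^3+15x\) are the Hermite polynomials \(H_2\), \(H_3\), and \(H_5\).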
Note that in our Pareto case, the skewness and the excess kurtosis are, respectively,
Therefore we set (but prefer to keep the notation so that it remains general) to obtain what we call an “optimal” approximation. Then we select the threshold such that
which when applied to our case of -Pareto i.i.d. r.v.'s (using (11.17)) gives
This condition then allows us to determine a fixed number as a function of the shape parameter of the underlying heavy-tailed distribution of the 's, but not of the size of the sample. We take it as small as possible in order to best fit both the mean and tail behaviors of . Note that we look for the smallest possible in order to be able to compute explicitly the distribution of the last upper order statistics appearing as the summands of the second sum . For this reason, based on condition (11.12), we choose
Let us summarize in Table 11.1 the necessary and sufficient condition on (for ) for the existence of the 4th moments of the upper order statistics for , respectively, using (11.12) written as .
Table 11.1 Necessary and sufficient condition on α for having a finite 4th moment of the order statistic X_(n-j), j = 0, ..., 7
j | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
condition | α > 4 | α > 2 | α > 4/3 | α > 1 | α > 4/5 | α > 2/3 | α > 4/7 | α > 1/2
We deduce the value of the threshold satisfying (11.13), for which the fourth moment is finite, according to the range of definition of ; see Table 11.2. We notice from Table 11.2 that we would use Zaliapin et al.'s decomposition only when . When considering, as they do, , we would rather introduce the decomposition , with varying from 2 to 5 depending on the value of , in order to improve the approximation of the distribution of , if we omit the discussion on their conditions.
Table 11.2 Value of k, as a function of α, for having finite 4th moments of the order statistics X_(n-j) up to j = k
α ∈ | ]1/2, 4/7] | ]4/7, 2/3] | ]2/3, 4/5] | ]4/5, 1] | ]1, 4/3] | ]4/3, 2] | ]2, 4]
k = | 7 | 6 | 5 | 4 | 3 | 2 | 1
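As an illustration of how this threshold is read off the tables, here is a small R helper; the function name choose_k is ours, and it simply encodes the fourth-moment condition of Table 11.1 (the smallest j with α > 4/(j+1)).

```r
## Sketch (R): smallest number k of upper order statistics to set aside so that
## the remaining summands have a finite 4th moment, i.e., the smallest j such
## that alpha > 4 / (j + 1) (the condition summarized in Table 11.1).
choose_k <- function(alpha, p = 4, jmax = 50) {
  j <- 0:jmax
  min(j[alpha > p / (j + 1)])
}
choose_k(3)    # 1  (alpha in ]2, 4])
choose_k(1.5)  # 2  (alpha in ]4/3, 2])
choose_k(0.6)  # 6  (alpha in ]4/7, 2/3])
```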
First we apply known results on the distribution of order statistics (see, e.g., (David and Nagaraja, 2003)) when considering Pareto distributions. Next we compute conditional distributions of order statistics, as well as conditional moments, and apply them to the Pareto case.
Distribution of Pareto order statistics
For -Pareto r.v.'s, the pdf of () and the pdf of the order statistics , (), with , are expressed, respectively, as
and, for ,
When considering successive order statistics, for , for , , with , we obtain
Moments of -Pareto order statistics satisfy (see also, e.g., (Zaliapin et al., 2005); Theorem 1)
and, for ,
Conditional distribution of order statistics. Application to Pareto r.v.'s
Now straightforward computations lead to new properties that will be needed to build Normex. We express them in the general case (and labeled), with the notation and , and then for Pareto r.v.'s.
We deduce from (11.14) and (11.15) that the pdf of given , for , is, for ,
and that the joint pdf of given , for , is, for ,
Using (11.14) and (11.16) provides, for ,
Then we can compute the first conditional moments. We obtain, using (11.18) and the change of variables ,
For , via (11.19) and the change of variables and , it comes
Moreover, the joint conditional distribution of given , for , denoted by , or when no ambiguity exists, is, for ,
from which we recover the well-known result that are independent of and when and are given, and that the order statistics form a Markov chain.
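For later use by Normex, this conditional structure can be stated in its standard form (a direct consequence of the Markov property recalled above): given \(X_{(n-k+1)}=y\), the vector \((X_{(1)},\dots,X_{(n-k)})\) is distributed as the order statistics of an \((n-k)\)-sample drawn from the parent cdf truncated above at \(y\), that is, with cdf \(F(\cdot)/F(y)\) on \((-\infty,y]\), and it is conditionally independent of \((X_{(n-k+2)},\dots,X_{(n)})\).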
(see (Kratz, 2014))
Whatever the size of the sample is, because of the small magnitude of , we are able to compute explicitly the distribution of the last upper order statistics appearing as the summands of the second sum defined in (11.10). The choice of allows also to obtain a good normal approximation for the distribution of the trimmed sum . Nevertheless, since and are not independent, we decompose the Pareto sum in a slightly different way than in (11.10) (but keeping the same notation), namely,
and use the property of conditional independence (recalled in Section 11.1.2) between the two subsums and conditional on (for ).
Then we obtain the following approximation of the distribution of , for (i.e., when the th moment of the largest order statistics does not exist).
Comments
and, for
where the convolution product can be numerically evaluated using either the recursive convolution equation , for , (it will be fast, being small) and or, if , the explicit expression (12) (replacing by ) given in Ramsay (2006).
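As a purely illustrative, grid-based sketch of such a recursive numerical convolution (not the chapter's actual implementation), assuming the densities involved are tabulated on a regular grid of step h:

```r
## Sketch (R): recursive numerical convolution of densities tabulated on a
## regular grid of step h. Each density is a vector of values; the grid of the
## result starts at the sum of the individual grid origins and keeps step h.
conv_density <- function(dens_list, h) {
  g <- dens_list[[1]]
  for (f in dens_list[-1]) {
    g <- convolve(g, rev(f), type = "open") * h  # ordinary (open) discrete convolution
  }
  g
}
## toy check: the density of the sum of two Exp(1) r.v.'s is Gamma(2, 1)
h <- 0.01; x <- seq(0, 20, by = h)
out <- conv_density(list(dexp(x), dexp(x)), h)
max(abs(out[seq_along(x)] - dgamma(x, shape = 2)))  # small numerical error
```

Since only a small number of largest order statistics is involved, the recursion stays short and fast, as noted above.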
Note that this lemma implies the result given in (11.7), and as a consequence in the Pareto case, we have
To estimate the quality of the approximation of the distribution of the Pareto sum , we compare analytically the exact distribution of with the distribution defined in Theorem 11.2. It could also be done numerically, as, for instance, in Furrer (2012) with the distance between two distributions and defined by , with . We will proceed numerically only when considering the tail of the distributions and estimating the distance in the tails through the VaR measure (see Section 11.4.3). When looking at the entire distributions, we will focus on the analytical comparison mainly for the case (with some hints for the case ). Note that it is not possible to compare directly the expressions of the VaR corresponding to, respectively, the exact and approximative distributions, since they can only be expressed as the inverse function of a cdf. Nevertheless, we can compare the tails of these two distributions to calibrate the accuracy of the approximative VaR since
Moreover, we will compare analytically our result with a normal approximation made on the entire sum (and not the trimmed one) since, for , the CLT applies and, as already noticed, is often used in practice.
Since Normex uses the exact distribution of the last upper order statistics, comparing the true distribution of with its approximation simply comes back to the comparison of the true distribution of i.i.d. r.v.'s with the normal distribution (when applying the CLT). Note that, when extending Normex to any distribution, an error term should be added to this latter evaluation; it comes from the approximation of the extreme distribution by a Pareto one.
Suppose . Applying the CLT gives the normal approximation , with and , where in the case of a Pareto sum, , and . We know that applying the CLT directly to leads to nonsatisfactory results even for the mean behavior, since, for any , the quantity , involving the third moment of and appearing in the error (11.11) made when approximating the exact distribution of by a normal one, is infinite for any . The rate of convergence in is reduced to . When , even if the rate of convergence improves because , we still have (because the fourth moment of does not exist), which means that we cannot get a rate of order .
Now let us look at the rate of convergence when approximating with .
Considering the exact distribution of the Pareto sum means taking, at given and for any , with i.i.d. r.v.'s with parent r.v. with finite th moment and pdf defined, for , by
Let us look at the three first moments of . The direct dependence is on (and ) and indirectly on since . We have
(note that , for any that we consider, and any ) and
using the expressions of and given in Theorem 11.2. A straightforward computation of the third centered moment of provides
where denotes the antiderivative of the function , that is, if ,
whereas, if ,
and, if ,
For simplicity, let us look at the case and consider the Berry–Esséen inequality. For , we would use the Edgeworth expansion, with similar arguments as developed later. Various authors have worked on this type of Berry–Esséen inequality, in particular to sharpen the constant appearing in it. The value of this constant factor has decreased from 7.59 by Esséen (1942) to 0.4785 by Tyurin (2010) and to 0.4690 by Shevtsova (2013) in the i.i.d. case (0.5600 in the general case). Note also that over the past decades much literature ((Stein, 1972, 1986); (Chen and Shao, 2004); (Cai, 2012); (Pinelis, 2013), etc.) has been dedicated to the generalization of this type of inequality, such as the remarkable contribution by Stein.
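For reference, the classical i.i.d. Berry–Esséen inequality referred to here reads, with \(C_0\) the absolute constant whose successive improvements are listed above:
\[
\sup_{x\in\mathbb{R}}\;\Bigl|\,\mathbb{P}\Bigl(\frac{S_n-n\mu}{\sigma\sqrt{n}}\le x\Bigr)-\Phi(x)\Bigr|
\;\le\; C_0\,\frac{\mathbb{E}|X-\mu|^{3}}{\sigma^{3}\sqrt{n}},
\]
where \(\mu=\mathbb{E}(X)\), \(\sigma^{2}=\operatorname{var}(X)\), and the third absolute centered moment is assumed finite.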
We can propose the following bound.
Note that depends on the two parameters and . We represent this function on the same plot for a given value of but for various values of , namely, , and 1000, respectively, to compare its behavior according to the parameter . Then we repeat the operation for different , namely, for , respectively (Figure 11.3).
We observe that the bound is an increasing then decreasing function of , with a maximum less than , which is decreasing with and . The -coordinate of the maximum is proportional to , with the proportion decreasing with . The interval on the -axis for which the error is larger than has a small amplitude, which is decreasing with .
We show in Table 11.3 the values of the coordinates of the maximum of , computed in R, for and (corresponding to aggregating weekly returns to obtain yearly returns), 100, 250 (corresponding to aggregating daily returns to obtain yearly returns), 500, and 1000, respectively.
Table 11.3 Coordinates of the maximum of (defined in (11.35)), as a function of and (one pair of columns per value of considered)
n | x-coordinate | maximum | x-coordinate | maximum | x-coordinate | maximum
52 | 101 | 4.9 | 86 | 4.9 | 78 | 4.9
100 | 196 | 4.6 | 166 | 4.6 | 150 | 4.6
250 | 494 | 4.2 | 417 | 4.1 | 376 | 4.0
500 | 990 | 3.9 | 834 | 3.7 | 751 | 3.5
1000 | 1984 | 3.6 | 1667 | 3.3 | 1501 | 3.0
Hence the result of Proposition 11.2.
Indeed, we have
Note that the Berry–Esséen inequality has been proved by Petrov to hold also for probability density functions (see (Petrov, 1956) or (Petrov, 1995)). It has been refined by Shevtsova (2007), and we will use her result to evaluate . We need to go back to the pdf of the standardized sum of i.i.d. r.v.'s with pdf , which can be expressed as
It is straightforward to show by induction that
Then, since , we can write
Since we consider a sum of i.i.d. r.v.'s () with parent r.v. having a finite th moment, we obtain via (Petrov, 1956) and (Shevtsova, 2007) that there exists a constant such that
where is defined in (11.34).
Hence, combining (11.36) and (11.37) gives
from which we deduce that
As in the case (Proposition 11.2), this bound could be computed numerically.
In this method, we go back to the first decomposition (11.10) of and use limit theorems for both terms and , instead of proceeding via conditional independence with a small given . This means that we need to choose as a function of such that as , for the approximation of the distribution of via its limit to be relevant.
First we consider a normal approximation for the trimmed sum , which implies some conditions on the threshold (see (Csörgö et al., 1986)). We need to select a threshold such that
Note that the condition (11.12) will be implied by the condition . Hence, for this method, does not depend directly on the value of .
We can then enunciate the following.
Note that is chosen in such a way that is finite. The case corresponds to the one developed in Zaliapin et al. (but with a different set of definition for ).
Let us turn now to the limit behavior of the partial sum . The main idea of this method relies on using an estimation (involving the last order statistics) of the expected shortfall of , defined for an -Pareto r.v. by , being the confidence level (see Section 11.4.1), in order to propose an approximation for the second term . Hence we must assume , that is, .
Let us recall the following result (see (Acerbi and Tasche, 2002) for the proof or (Embrechts et al., 1997)) that we are going to use.
In other words, expected shortfall at confidence level can be thought of as the limiting average of the upper order statistics from a sample of size from the loss distribution.
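Up to the precise integer-part conventions (which may differ from the statement used in the text), this result can be written, for a loss r.v. with finite mean and confidence level \(q\), as
\[
\mathrm{ES}_q \;=\; \lim_{n\to\infty}\;\frac{1}{\lfloor n(1-q)\rfloor}\sum_{i=1}^{\lfloor n(1-q)\rfloor} X_{(n-i+1)}
\qquad\text{with probability 1,}
\]
where \(X_{(n)}\ge X_{(n-1)}\ge\cdots\) denote the upper order statistics of an i.i.d. sample of size \(n\) from the loss distribution.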
Now we can enunciate the main empirical result.
Comments
Variance and standard deviation were historically the dominant risk measures in finance. However, they require the underlying distribution to have a finite second moment and are appropriate for symmetric distributions. Because of this restricted frame, they have often been replaced in practical applications by VaR, which was, until recently, the most popular downside risk measure in finance. VaR has been criticized for a number of different reasons, the most important being its lack of subadditivity and the fact that it completely ignores the severity of losses in the far tail of the loss distribution. The coherent risk measure expected shortfall was introduced to solve these issues. Two years ago, ES was in turn shown not to be elicitable (Gneiting, 2012); hence the search, in the meantime, for coherent and elicitable alternatives, such as expectiles ((Bellini et al., 2013); (Ziegel, 2014)). Properties of these popular risk measures, like coherence, comonotonic additivity, robustness, and elicitability, as well as their impact on important issues in risk management like diversification benefit and capital allocation, have been discussed in a recent paper (Emmer et al., 2015).
Here we are going to consider only the risk measures used in solvency calculations (the other risk measures would be treated in the same way), namely, the value-at-risk, denoted VaR, and the expected shortfall (also named tail value-at-risk), denoted ES (or TVaR), of an r.v. with continuous cdf (and inverse function denoted by ):
We will simplify the notation of those risk measures writing or when no confusion is possible.
Note that, in the case of an -Pareto distribution, analytical expressions of those two risk measures can be deduced from (11.2), namely,
Recall also that the shape parameter totally determines the ratio when we go far enough out into the tail:
Note that this result holds also for the GPD with shape parameter .
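For a unit-scale α-Pareto r.v. (the normalization assumed here), these expressions and the limiting ratio read
\[
\mathrm{VaR}_q(X) = (1-q)^{-1/\alpha},
\qquad
\mathrm{ES}_q(X) = \frac{\alpha}{\alpha-1}\,(1-q)^{-1/\alpha}
\quad(\alpha>1),
\]
so that \(\mathrm{ES}_q(X)/\mathrm{VaR}_q(X) = \alpha/(\alpha-1)\); for the GPD with tail index \(\xi=1/\alpha\), the same value \(1/(1-\xi)=\alpha/(\alpha-1)\) is obtained in the limit \(q\to 1\).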
When looking at aggregated risks , it is well known that the risk measure ES is coherent (see (Artzner et al., 1999)). In particular it is subadditive, that is,
whereas VaR is not a coherent measure, because it is not subadditive. Indeed many examples can be given where VaR is superadditive, that is,
see, e.g., Embrechts et al. (2009) and Daníelsson et al. (2005).
In the case of -Pareto i.i.d. r.v.'s, the risk measure VaR is asymptotically superadditive (subadditive, respectively) if (, respectively).
Recently, numerical and analytical techniques have been developed to evaluate the risk measures VaR and ES under different dependence assumptions on the loss r.v.'s. This certainly helps toward a better understanding of the aggregation and diversification properties of risk measures, in particular of noncoherent ones such as VaR. We will not review these techniques and results in this chapter but refer to Embrechts et al. (2013) for an overview and references therein. Let us add to those references some recent work by Mikosch and Wintenberger (2013) on large deviations under dependence, which allows an evaluation of VaR. It is also worth mentioning a new numerical algorithm introduced by Embrechts et al. (2013), which allows for the computation of reliable lower and upper bounds for the VaR of high-dimensional (inhomogeneous) portfolios, whatever the dependence structure is.
As an example, we treat the case of one of the two main risk measures and choose the VaR, since it is the main one used for solvency requirement. We would proceed in the same way for the expected shortfall.
It is straightforward to deduce, from the various limit theorems, the approximations of the VaR of order of the aggregated risks , that is, the quantile of order of the sum defined by . The index indicates the chosen method, namely, (i) for the GCLT approach, (ii) for the CLT one, (iii) for the max one, (iv) for Zaliapin et al.'s method, (v) for Normex, and (vi) for the weighted normal limit. We obtain the following:
Since there is no explicit analytical formula for the true quantiles of , we will complete the analytical comparison of the distributions of and given in Section 11.3.2.2, providing here a numerical comparison between the quantile of and the quantiles obtained by the various methods seen so far.
Nevertheless, in the case , we can compare analytically the VaR obtained when doing a rough normal approximation directly on , namely, , with the one obtained via the shifted normal method, namely, . So, we obtain the correcting term to the CLT as
We simulate with parent r.v. -Pareto distributed, with different sample sizes, varying from (corresponding to aggregating weekly returns to obtain yearly returns) through (corresponding to aggregating daily returns to obtain yearly returns) to representing a large size portfolio.
We consider different shape parameters, namely, , respectively. Recall that simulated Pareto r.v.'s can be obtained by simulating a uniform r.v. on and then applying the transformation .
For each and each , we aggregate the realizations 's (). We repeat the operation times, thus obtaining realizations of the Pareto sum , from which we can estimate its quantiles.
Let denote the empirical quantile of order of the Pareto sum (associated with the empirical cdf and pdf ) defined by
Recall, for completeness, that the empirical quantile of converges to the true quantile as and has an asymptotically normal behavior, from which we deduce the following confidence interval at probability level a for the true quantile: , where can be empirically estimated for such a large . We do not compute these intervals numerically: being very large, the bounds are close.
We compute the values of the quantiles of order , ( indicating the chosen method), obtained by the main methods: the GCLT method, the Max one, Normex, and the weighted normal method, respectively. We do this for various values of and . We compare them with the (empirical) quantile obtained via Pareto simulations (estimating the true quantile). For that, we introduce the approximative relative error:
We consider three possible orders : , (the threshold for Basel II), and (the threshold for Solvency 2).
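The following R sketch reproduces the spirit of this numerical experiment on a small scale; the settings alpha = 3, n = 250, and ns = 1e5 are illustrative choices of ours, not necessarily those of the chapter. It simulates the Pareto sums by inverse transform, estimates the empirical quantiles at the three orders, and computes the approximative relative error of a plain CLT (normal) approximation.

```r
## Sketch (R): empirical quantiles of a Pareto sum vs. a plain CLT approximation.
## Illustrative settings (not necessarily those of the chapter): alpha = 3,
## n = 250 summands, ns = 1e5 simulated sums, unit-scale Pareto via inverse transform.
set.seed(1)
alpha <- 3; n <- 250; ns <- 1e5
S <- replicate(ns, sum(runif(n)^(-1/alpha)))         # X = U^(-1/alpha), U ~ Unif(0,1)
q     <- c(0.95, 0.99, 0.995)
q_emp <- quantile(S, probs = q)                      # empirical quantiles of the sum
mu <- alpha / (alpha - 1)                            # mean of a unit-scale Pareto(alpha)
s2 <- alpha / ((alpha - 1)^2 * (alpha - 2))          # its variance (finite since alpha > 2)
q_clt <- qnorm(q, mean = n * mu, sd = sqrt(n * s2))  # plain normal (CLT) approximation
round(100 * (q_clt - q_emp) / q_emp, 2)              # approximative relative error (%)
```

With such settings, the CLT quantile typically falls below the empirical one at the 99% and 99.5% orders, in line with the underestimation of extreme quantiles discussed throughout the chapter.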
We use the software R to perform this numerical study, with different available packages. Let us particularly mention the use of the procedure Vegas in the package R2Cuba for the computation of the double integrals. This procedure turns out not to always be very stable for the most extreme quantiles, mainly for low values of . In practice, for the computation of the integrals, we would advise testing the various procedures in R2Cuba (Suave, Divonne, and Cuhre, besides Vegas) or looking for other packages. Another possibility would be to implement the algorithm using different software altogether, for example, Python.
All codes and results, obtained for various , are given in Kratz (2013) (available upon request); we will draw conclusions based on all of these results.
We start with a first example, when , to illustrate our main focus: data in the presence of a moderately heavy tail. We present here the case in Table 11.4.
Table 11.4 Approximations of extreme quantiles (95%; 99%; 99.5%) by various methods (CLT, Max, Normex, weighted normal) and associated approximative relative error to the empirical quantile , for n = 52, 100, 250, 500 respectively, and
n = 52:
q | empirical quantile | CLT | Max | Normex | Weighted normal
95% | 103.23 | 104.35 | 102.60 | 103.17 | 109.25
rel. error (%) | | 1.08 | 0.61 | 0.06 | 5.83
99% | 119.08 | 111.67 | 117.25 | 119.11 | 118.57
rel. error (%) | | 6.22 | 1.54 | 0.03 | 0.43
99.5% | 128.66 | 114.35 | 127.07 | 131.5 | 121.98
rel. error (%) | | 11.12 | 1.24 | 2.21 | 5.19

n = 100:
q | empirical quantile | CLT | Max | Normex | Weighted normal
95% | 189.98 | 191.19 | 187.37 | 189.84 | 197.25
rel. error (%) | | 0.63 | 1.38 | 0.07 | 3.83
99% | 210.54 | 201.35 | 206.40 | 209.98 | 209.74
rel. error (%) | | 4.36 | 1.96 | 0.27 | 0.38
99.5% | 222.73 | 205.06 | 219.14 | 223.77 | 214.31
rel. error (%) | | 7.93 | 1.61 | 0.47 | 3.78

n = 250:
q | empirical quantile | CLT | Max | Normex | Weighted normal
95% | 454.76 | 455.44 | 446.53 | 453.92 | 464.28
rel. error (%) | | 0.17 | 1.81 | 0.18 | 2.09
99% | 484.48 | 471.5 | 473.99 | 483.27 | 483.83
rel. error (%) | | 2.68 | 2.17 | 0.25 | 0.13
99.5% | 501.02 | 477.38 | 492.38 | 501.31 | 490.98
rel. error (%) | | 4.72 | 1.73 | 0.06 | 2.00

n = 500:
q | empirical quantile | CLT | Max | Normex | Weighted normal
95% | 888.00 | 888.16 | 872.74 | 886.07 | 900.26
rel. error (%) | | 0.02 | 1.72 | 0.22 | 1.38
99% | 928.80 | 910.88 | 908.97 | 925.19 | 927.80
rel. error (%) | | 1.93 | 2.14 | 0.39 | 0.11
99.5% | 950.90 | 919.19 | 933.23 | 948.31 | 937.89
rel. error (%) | | 3.33 | 1.86 | 0.27 | 1.37
Let us also illustrate in Table 11.5 the heavy-tail case, choosing, for instance, , which means that . Take, for example, , respectively, to illustrate the fit of Normex even for small samples. Note that the weighted normal method does not apply here since .
Table 11.5 Approximations of extreme quantiles (95%; 99%; 99.5%) by various methods (GCLT, Max, Normex) and associated approximative relative error to the empirical quantile , for , respectively, and for
First sample size:
q | empirical quantile | GCLT | Max | Normex
95% | 246.21 | 280.02 | 256.92 | 245.86
rel. error (%) | | 13.73 | 4.35 | 0.14
99% | 450.74 | 481.30 | 455.15 | 453.92
rel. error (%) | | 6.78 | 0.97 | 0.71
99.5% | 629.67 | 657.91 | 631.66 | 645.60
rel. error (%) | | 4.48 | 0.31 | 2.53

Second sample size:
q | empirical quantile | GCLT | Max | Normex
95% | 442.41 | 491.79 | 456.06 | 443.08
rel. error (%) | | 11.16 | 3.09 | 0.15
99% | 757.82 | 803.05 | 762.61 | 761.66
rel. error (%) | | 5.97 | 0.63 | 0.51
99.5% | 1031.56 | 1076.18 | 1035.58 | 1032.15
rel. error (%) | | 4.33 | 0.39 | 0.06
From the results we obtained, Normex appears to be the best method among the ones we studied, applicable for any and . This comparison was done on simulated data; a next step will be to apply it to real data.
Let us sketch a step-by-step procedure for how Normex might be used and interpreted in practice on real data when considering aggregated heavy-tailed risks.
We have at our disposal a sample , with unknown heavy-tailed cdf having a positive tail index . We order the sample as and consider the aggregated risks , which can be rewritten as .
The main motivation of this study was to propose a sharp approximation of the entire distribution of aggregated risks when working on financial or insurance data in the presence of fat tails. This corresponds to one of the daily duties of actuaries when modeling investment or insurance portfolios. In particular, the aim is to obtain the most accurate evaluations of risk measures. After reviewing the existing methods, we built two new methods, Normex and the weighted normal method. Normex is a method mixing a CLT and the exact distribution for a small number (defined according to the range of and the choice of the number of existing moments of order ) of the largest order statistics. The second approach is based on a weighted normal limit, with a shifted mean and a weighted variance, both expressed in terms of the tail distribution.
In this study, Normex has been proved, theoretically as well as numerically, to deliver a sharp approximation of the true distribution, for any sample size and for any positive tail index , and to be generally better than existing methods. The weighted normal method consists of trimming the total sum by taking away a large number of extremes, approximating the trimmed sum with a normal distribution, and then shifting it by the (almost sure) limit of the average of the extremes and correcting the variance with a weight depending on the shape of the tail. It is a simple and reasonable tool, which allows us to express explicitly the tail contribution to be added to the VaR when applying the CLT to the entire sample. It has been developed empirically in this work and still requires further analytical study. It constitutes a simple and exploratory tool to remedy the underestimation of extreme quantiles above 99%.
An advantage of both methods, Normex and the weighted normal, is their generality. Indeed, trimming the total sum by taking away extremes having infinite moments (of order ) is always possible and allows us to better approximate the distribution of the trimmed sum with a normal one (via the CLT). Moreover, fitting a normal distribution for the mean behavior can apply not only to the Pareto distribution but to any underlying distribution, without having to know it, whereas for the extreme behavior, we pointed out that a Pareto type is standard in this context.
Normex could also be used from another point of view. We could apply it to a type of inverse problem: finding a range for the tail index when fitting this explicit mixed distribution to the empirical one. Note that the topic of tail index estimation has been studied extensively in the literature on the statistics of extremes (see, e.g., (Beirlant et al., 2004); (Reiss and Thomas, 2007), and references therein). Approaches to this estimation may be classified into two classes: supervised procedures, in which the threshold used to estimate the tail is chosen according to the problem (see, e.g., for seminal references, the weighted-moments (Hosking and Wallis, 1987) and MEP (Davison and Smith, 1990) methods, or (Hill, 1975); (Kratz and Resnick, 1996); (Beirlant et al., 1996)), and unsupervised ones, where the threshold is algorithmically determined (as, e.g., in (Bengio and Carreau, 2009) and references therein, or (Debbabi and Kratz, 2014)). Normex would then be classified as a new unsupervised approach, since is chosen algorithmically for a range of .
Other perspectives concern the application of this study to real data and its extension to the dependent case, using the CLT under weak dependence, some recent results on stable limits for sums of dependent infinite-variance r.v.'s from Bartkiewicz et al. (2012), and large deviation principles from (Mikosch and Wintenberger, 2013).
Finally, this study may constitute a first step in understanding the behavior of VaR under aggregation and be helpful in analyzing the scaling behavior of VaR under aggregation, the next important problem that we want to tackle.