Isabel Fraga Alves1 and Cláudia Neves2
1CEAUL, University of Lisbon, Portugal
2CEAUL, Portugal and Department of Mathematics and Statistics, University of Reading, United Kingdom
“It seems that the rivers know the theory. It only remains to convince the engineers of the validity of this analysis.”
–Emil Julius Gumbel (1891–1966)
In this chapter we give an introduction to the most important results in extreme value theory (EVT), with a flavor of how they can be applied in practice. EVT underpins the study of the asymptotic distribution of extreme or rare events, namely those that are huge relative to the bulk of the observations. Relying on well-founded theory, on which parametric and semiparametric statistical models are built for handling rare events, EVT is the adequate theory for modeling and measuring events that occur with very small probability. It has proven to be a powerful and useful tool for describing atypical situations that may have a significant impact in many application areas where knowledge of the tail behavior of the actual distribution is in demand. The main objective is to tackle the problem of modeling rare phenomena of large magnitude, hence lying outside the range of the available observations (out-of-sample).
The typical question we would like to answer is
If things go wrong, how wrong can they go?
which in a certain sense is the mitigation attitude to Murphy's law:
If anything can go wrong, it will!
In fact, the statistical analysis of extremes is the key step in the analysis of many risk management problems related not only to insurance, reinsurance, and finance in general but also to other fields such as geophysics and the environment, where the analysis of extremes is of primordial importance, as happens with sea levels, river levels, snow avalanches, wind speeds, temperatures, rainfall, snow, air pollution, storms, hurricanes, and earthquakes, or even other areas such as Internet traffic, reliability, and athletics. One should not forget natural hazards with extreme consequences for society, often entailing great loss of human life. For instance, one learns from catastrophic events such as the Lisbon earthquake and tsunami of 1755, which lasted about 9 minutes (Figures 4.1 and 4.2).
When dealing with financial or even meteorological data, there are two situations worth distinguishing: the case where the data concentrate tightly around the average value, with none of the observed values being dominant, and the case where a few observations overpower the remainder of the sample by their large (or low) magnitude. Since the latter can have a very negative impact, it is important to quantify its occurrence. Typically, one is interested in the analysis of maximal (or minimal) observations and records over time, since these may entail the negative consequences. Reinsurance is a good example of this: the reinsurance premium needs to be computed to withstand the extremal behavior of the claims process. Another problem concerns the so-called return period (or waiting time period) for a high level u, which corresponds to the average time until a random variable (r.v.) of interest exceeds the high level u; closely related, the dual problem of return levels is also most important in applications. In hydrology, design levels typically correspond to return periods of 100 years or more; however, time series of 100 or more years are rare. A model for extrapolation is required, and here EVT intervenes. More precisely, suppose the problem consists in estimating the tail probability associated with an r.v. X with cumulative distribution function (c.d.f.) F:
p = P(X > x) = 1 − F(x), with p small, that is, a near-zero probability. This entails a large (1 − p) quantile x_p, so that x_p approaches the right endpoint of F, defined as x^F := sup{x : F(x) < 1}. On the other hand, in the context of financial variables, for instance, a primary tool for the assessment of financial risks is the value-at-risk, VaR(p), which is nothing more than a (1 − p)-quantile of the distribution of returns, for a very small probability p of an adverse extreme price movement. Bearing the previous estimation purpose in mind, suppose that x_{1:n} ≤ ... ≤ x_{n:n} is an ordered sample of n observations from the distribution function F. One can use the empirical distribution function (e.d.f.), defined by
F_n(x) = (1/n) #{i : x_i ≤ x}, for real x. For small p, the e.d.f. estimate of the exceedance probability, the proportion of sample values above x, can however lead us to a null estimated probability, and clearly we cannot assume that such extreme values are simply "impossible"! For the purpose of VaR estimation, this amounts to saying that historical simulation fails. On the other hand, the classical theory suggests a possibly inadequate methodology in which a specific probabilistic model is fitted to the whole sample, for instance the normal model, and that model is used to estimate the tail probability as 1 − Φ((x − μ)/σ), with estimated mean value μ and standard deviation σ (notation: Φ is the c.d.f. of a standard normal r.v.). But what if the variance or even the mean value does not exist? Then the central limit theorem (CLT) does not apply, and the classical theory, dominated by the normal distribution, is no longer pertinent. These types of problems associated with rare events are very important, since the consequences can be catastrophic. When we deal with log returns in finance, for instance, most of the observations are central, and a globally fitted distribution will rely mainly on those central observations, while extreme observations will not play a very important role because of their scarcity (see Figure 4.3 for illustration); yet those extreme values are exactly the ones that constitute the focus for traders, investors, asset managers, risk managers, and regulators. Hence EVT proves useful in modeling the impact of crashes or situations of extreme stress on investor portfolios. The classical result in EVT is Gnedenko's theorem (Gnedenko, 1943). It establishes that there are three types of possible limiting (max-stable) distributions for maxima of blocks of observations (the annual maxima (AM) approach), which are unified in a single representation, the generalized extreme value (GEV) distribution. The second theorem in EVT is the so-called Pickands–Balkema–de Haan theorem (Balkema and de Haan, 1974; Pickands, 1975).
Loosely speaking, it allows us to approximate the distribution of the excesses over high thresholds (the peaks-over-threshold (POT) approach) by the generalized Pareto (GP) distribution, for distributions in the domain of attraction of a GEV distribution. Complementary to these parametric approaches, we also present a possible semiparametric approach, comparing it with the previous ones. In order to present some of the basic ideas underlying EVT, in the next section we discuss the most important results in the univariate case under the simplifying independent and identically distributed (i.i.d.) assumption; for instance, in an insurance context, losses are i.i.d., as in risk models for aggregated claims, and most of the results can be extended to more general models.
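To make the failure of the empirical distribution function concrete before the theory is developed, here is a minimal Python sketch (the Student-t sample, its size, and the levels queried are illustrative assumptions, not data from the chapter): the e.d.f. assigns probability zero to any level beyond the sample maximum, which is precisely the region where tail inference is needed.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative heavy-tailed "historical" sample (Student-t with 3 d.f.).
sample = rng.standard_t(df=3, size=1000)

def edf_exceedance(data, x):
    """Empirical exceedance probability: proportion of observations above x."""
    return float(np.mean(data > x))

# For a level inside the observed range, the e.d.f. gives a sensible answer...
p_inside = edf_exceedance(sample, 2.0)

# ...but for any out-of-sample level it can only answer zero,
# which is not the same as "impossible".
x_out = sample.max() + 1.0
p_out = edf_exceedance(sample, x_out)
print(p_inside, p_out)
```

Historical simulation for VaR hits exactly this wall: however the sample is resampled, it contains no information beyond its own maximum, and EVT supplies the extrapolation model.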
Extreme value analysis (EVA) can be broadly described as the branch of statistics that focuses on inference for a c.d.f. F near the endpoint of its support. In the univariate case, one usually considers the upper tail, that is, the survival function 1 − F, in the neighborhood of the right endpoint x^F of the distribution. The most powerful feature of EVT is that the type of limiting distribution for extreme values does not depend on the exact common c.d.f. F, but only on its tail; this allows us to "neglect" the precise form of the unknown c.d.f. and pay attention only to the tail. A semiparametric approach then enables inference for rare events. Large sample results in EVT are applied by letting the sample size n tend to infinity.
Let X_1, ..., X_n be a sample of i.i.d. r.v.'s with c.d.f. F, and let the corresponding nondecreasing order statistics (o.s.'s) be X_{1:n} ≤ ... ≤ X_{n:n}. In particular, X_{1:n} and X_{n:n} represent the sample minimum and the sample maximum, respectively. We will focus only on results for the sample maximum, since analogous results for the sample minimum can be obtained from those for the maximum using the device min(X_1, ..., X_n) = −max(−X_1, ..., −X_n).
The exact distribution of the sample maximum can be obtained from the c.d.f. F, as follows: P(X_{n:n} ≤ x) = P(X_1 ≤ x, ..., X_n ≤ x) = F^n(x).
Notice that, as n → ∞, the c.d.f. of the partial maxima converges to a degenerate distribution at the right endpoint x^F, that is, F^n(x) → 0 for x < x^F and F^n(x) → 1 for x ≥ x^F.
Figure 4.4 illustrates this behavior of the c.d.f. of the sample maximum for several beta distributions. From the two top rows of Figure 4.4, we clearly see that as the sample size increases, the c.d.f. of the maximum approaches a degenerate distribution at the right endpoint, equal to one. The following theorem expounds this result in a slightly stronger statement.
Moreover, the strong convergence X_{n:n} → x^F also holds (notation: almost sure convergence). Since X_{n:n} has a degenerate asymptotic distribution, a suitable normalization of X_{n:n} is thus required in order to attain a nondegenerate limiting distribution, which constitutes one key step for statistical inference on rare events. We henceforth consider a linear normalization (X_{n:n} − b_n)/a_n for the partial maxima of the sequence of i.i.d. r.v.'s, for real sequences a_n and b_n, with positive scale a_n > 0. Then
If we look at the two bottom rows of Figure 4.4, it is clear that beta models with different shapes may lead to exactly the same asymptotic distribution for the linearized maximum. Indeed, what determines the limit is the shape of the probability density function near the right endpoint.
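As a numerical companion to these figures, the following Python sketch (block size, number of replications, and the Exp(1) model are arbitrary illustrative choices) checks that exponential maxima, linearized with a_n = 1 and b_n = log n, are already close to the Gumbel c.d.f. Λ(x) = exp(−exp(−x)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 5000
# reps independent maxima of blocks of n i.i.d. Exp(1) variables,
# linearized with a_n = 1 and b_n = log n.
maxima = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)

# Compare the empirical c.d.f. of the linearized maxima with the Gumbel
# c.d.f. Lambda(x) = exp(-exp(-x)) at a few points.
for x in (-1.0, 0.0, 1.0, 2.0):
    emp = np.mean(maxima <= x)
    gum = np.exp(-np.exp(-x))
    print(f"x = {x:+.1f}: empirical {emp:.3f} vs Gumbel {gum:.3f}")
```

For the exponential model the agreement is already good at moderate block sizes; for other distributions the speed of convergence can be much slower.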
We now explore the possible limiting distributions for the linearized maxima. In the sequel, we assume there exist real constants a_n > 0 and b_n such that (X_{n:n} − b_n)/a_n converges in distribution to some r.v. Z, as n → ∞
(notation: convergence in distribution), where Z is a nondegenerate r.v. with c.d.f. G; that is, we have that F^n(a_n x + b_n) → G(x), as n → ∞,
for every continuity point x of G. The first problem is to determine which c.d.f.'s G may appear as the limit in (4.1), the extreme value distributions (EVD). First, we introduce the notion of "type."
It means that the two c.d.f.'s are the same apart from location and scale parameters, that is, they belong to the same location/scale family. The class of EVD essentially involves three types of extreme value distributions, types I, II, and III, defined as follows.
The three types can be expressed by the corresponding location/scale families, with location λ real and scale δ > 0:
Among these three families of distribution functions, type I is the one most commonly referred to in discussions of extreme values (see also Figures 4.5 and 4.6). Indeed, the Gumbel distribution is often coined "the" extreme value distribution (see Figure 4.7).
The following short biographical notes are borrowed from an entry in International Encyclopedia of Statistical Science (Lovric, 2011).
The Gumbel distribution, named after one of the pioneering scientists in practical applications of EVT, the German mathematician Emil Gumbel (1891–1966), has been extensively used in various fields, including hydrology, for modeling extreme events. Gumbel applied EVT to real-world problems in engineering and to meteorological phenomena such as annual flood flows (Gumbel, 1958).
The EVD of type II was named after Maurice Fréchet (1878–1973), a French mathematician who derived one possible limiting distribution for a sequence of maxima, under a convenient scale normalization (Fréchet, 1927). In applications to finance, the Fréchet distribution has been of great use in the modeling of market returns, which are often heavy tailed.
The EVD of type III was named after Waloddi Weibull (1887–1979), a Swedish engineer and scientist well known for his work on strength of materials and fatigue analysis (Weibull, 1939). Even though the Weibull distribution was originally developed to address problems for minima arising in materials science, it is widely used in many other areas thanks to its flexibility. For shape parameter equal to 1, the Weibull distribution function for minima reduces to the exponential model, whereas for shape parameter 2 it coincides with the Rayleigh distribution, which is mainly used in the telecommunications field. Furthermore, it resembles the normal distribution when the shape parameter is approximately 3.5.
Richard von Mises (1883–1953) studied EVT in 1936 (see von Mises, 1936), establishing the well-known von Mises sufficient conditions on the hazard rate (assuming the density exists), each leading to one of the aforementioned three types of limit law and providing the respective extreme domain of attraction. Later on, motivated by a storm surge in the North Sea (31 January–1 February 1953) that caused extensive flooding and many casualties, the government of the Netherlands gave top priority to understanding the causes of such tragedies with a view to risk mitigation. The study of sea-level maxima turned EVT into a scientific priority in the Netherlands. A celebrated work in the field is the doctoral thesis of Laurens de Haan (1970). The fundamental extreme value theorem, worked out by Fisher and Tippett (1928) and Gnedenko (1943), ascertains the GEV distribution in the von Mises–Jenkinson parametrization (von Mises, 1936; Jenkinson, 1955) as a unified version of all possible nondegenerate weak limits of the partial maxima of a sequence of i.i.d. random variables.
Notice that for ξ < 0, ξ = 0, and ξ > 0, the GEV c.d.f. reduces to the Weibull, Gumbel, and Fréchet distributions, respectively. More precisely,
The EVI is closely related to the tail heaviness of the distribution. In that sense, the value ξ = 0 concerns exponential tails, with finite or infinite right endpoint, and can be regarded as a change point: ξ < 0 refers to short tails with finite right endpoint x^F, whereas for ξ > 0 the c.d.f.'s have polynomial tail decay, that is, they are heavy tailed with infinite right endpoint x^F.
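The role of the sign of the EVI can be explored numerically. A short Python sketch with scipy (note that scipy.stats.genextreme parametrizes the shape as c = −ξ, opposite in sign to the convention used here):

```python
import numpy as np
from scipy.stats import genextreme

# scipy's genextreme shape is c = -xi: c > 0 gives the Weibull type (xi < 0),
# c = 0 the Gumbel type, c < 0 the Frechet type (xi > 0).
for xi, label in [(-0.5, "Weibull type"), (0.0, "Gumbel type"), (0.5, "Frechet type")]:
    dist = genextreme(c=-xi)
    right_endpoint = dist.support()[1]  # finite if and only if xi < 0
    print(f"xi = {xi:+.1f} ({label}): right endpoint = {right_endpoint}")
```

For ξ = −0.5 the right endpoint is finite (1/0.5 = 2 in standardized units), while for ξ ≥ 0 it is infinite, matching the change-point role of ξ = 0 described above.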
In many applied sciences where extremes come into play, it is assumed that the EVI of the underlying c.d.f. F is equal to zero, and all subsequent statistical inference procedures concerning rare events in the tail of F, such as the estimation of high quantiles, small exceedance probabilities, or return periods, rest on this assumption. Moreover, the Gumbel and exponential models are also preferred because of the greater simplicity of inference associated with Gumbel or exponential populations. For other details on EV models, see Chapter 22 of Johnson et al. (1995) and the brief entry by Fraga Alves and Neves in the International Encyclopedia of Statistical Science (Fraga Alves and Neves, 2011).
The class GEV, up to location and scale parameters, that is,
represents the only possible max-stable distributions. The GEV model is used as an approximation to the distribution of the maxima of large (finite) random samples. In applications the GEV distribution is also known as the Fisher–Tippett distribution, named after Sir Ronald Aylmer Fisher (1890–1962) and Leonard Henry Caleb Tippett (1902–1985), who proved that these are the only three possible types of limiting functions, as in Definition 4.3.
At this stage, a pertinent question is:
What is the limiting distribution (if there is one) obtained for the maximum from a given c.d.f. F?
One research topic in EVT comprehends the characterization of the max-domains of attraction; this means characterizing the class of c.d.f.'s F that belong to a certain max-domain and finding the suitable sequences a_n and b_n such that the limit in (4.1) holds. We consider first the case of absolutely continuous c.d.f.'s F.
The next theorem presents necessary and sufficient conditions for a c.d.f. to belong to each max-domain of attraction.
The function is called the mean excess function. The following result is also useful for obtaining the normalizing constants for the EVDs.
There are distributions that do not belong to any max-domain of attraction.
Note: (Super-heavy tails) A c.d.f. F whose tail 1 − F is of slow variation is said to have a super-heavy tail; such an F does not belong to any max-domain of attraction. For more information about super-heavy tails, see Fraga Alves et al. (2009).
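The slow-variation property behind super-heavy tails can be checked numerically. A small sketch with the illustrative tail 1 − F(x) = 1/log x, for x ≥ e (an assumed log-Pareto-type example, not one from the chapter): slow variation means the tail ratio at levels t and ct tends to one for any fixed c > 0.

```python
import numpy as np

# Tail 1 - F(x) = 1/log(x) (x >= e) is slowly varying: for any c > 0,
# (1 - F(c*t)) / (1 - F(t)) -> 1 as t -> infinity.
def tail(x):
    return 1.0 / np.log(x)

# The ratio creeps toward 1, but only logarithmically slowly.
for t in (1e2, 1e4, 1e8, 1e16):
    print(t, tail(2.0 * t) / tail(t))
```

The extremely slow approach to 1 is the hallmark of such tails, and it is this behavior that places them outside every max-domain of attraction.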
It is also possible to characterize the max-domains of attraction in terms of the tail quantile function U. The following result constitutes a necessary and sufficient condition for each max-domain of attraction.
The following result gives necessary conditions for each max-domain of attraction.
The following result gives necessary and sufficient conditions for each max-domain of attraction, involving the tail quantile function U.
A brief catalog of some usual distributions and their respective max-domains of attraction follows.
Fréchet domain: The following models belong to the Fréchet max-domain of attraction, with positive EVI:
Weibull domain: The following models belong to the Weibull max-domain of attraction, with negative EVI:
Gumbel domain: The following distributions are from the Gumbel max-domain of attraction:
When we are interested in modeling large observations, we are usually confronted with two extreme value models: the GEV c.d.f. introduced in (4.4) and the GP c.d.f. defined as
The GP c.d.f. is defined more generally with the incorporation of location and scale parameters, as
Statistical inference about rare events can clearly be drawn only from those observations that are extreme in some sense. There are different ways to define such observations, with corresponding alternative approaches to statistical inference on extreme values: the classical Gumbel method of maxima over blocks of size m, also designated AM (see Figure 4.8), a parametric approach that uses the GEV c.d.f. to approximate the c.d.f. F^m of the block maximum; and the peaks-over-threshold (POT) parametric method, which picks up the excesses of the observations (exceedances) above a high threshold u (see Figure 4.9(a)), using the GP class of c.d.f.'s to approximate the c.d.f. F_u of the excesses.
Pickands (1975) and Balkema and de Haan (1974) established the duality between the GEV(ξ) and GP(ξ) distributions, in a result summarized as follows. Given an r.v. X with c.d.f. F, it is important to characterize the distribution of the excesses above a threshold u,
that is,
Another parametric approach for statistical inference is to fit a parametric model to the largest observations (LO), as sketched in Figure 4.10(a). Consider now that those largest observations, after being properly normalized with suitable location and scale real parameters, are reasonably modeled by the joint p.d.f. given by
where g is the p.d.f. associated with the GEV c.d.f. In general, this is the form of the p.d.f. of the nondegenerate limiting distribution of the top o.s.'s from a set of i.i.d. r.v.'s, as stated in the following result.
Although in some practical cases only the annual maxima are available, making the AM approach a natural method, there are other situations in which the data are more complete, with the largest values per year being recorded. For such cases, a possible parametric approach combines the AM and LO methods: it splits the sample data into blocks and takes the largest observations in each of the blocks, through what is called the multidimensional model, as follows: a set of i.i.d. multivariate random vectors, normalized with suitable location and scale sequences, whose common p.d.f. is of the form defined in (4.8). Note that both the AM and LO approaches are particular cases of this multidimensional model. Some references on these two last approaches are Gomes (1981), Smith (1986), Gomes and Alpuim (1986), Gomes (1989), Fraga Alves and Gomes (1996), and Fraga Alves (1999).
In a semiparametric context, rather than fitting a model built on whichever extreme values were chosen as described before, the only assumption on F, the c.d.f. underlying the original random sample X_1, ..., X_n, is that the domain of attraction condition
is satisfied. In this setup, any inference concerning the tail of the underlying distribution can be based on the largest observations above a random threshold (see Figure 4.10(b)). Theoretically, the designated threshold corresponds to an intermediate o.s. X_{n−k:n}, letting k increase to infinity at a lower rate than the sample size n; formally, k = k_n is an intermediate sequence of positive integers such that k_n → ∞ and k_n/n → 0, as n → ∞.
In the context of statistical choice of extreme models, Neves and Alves (2006) and Neves et al. (2006) proposed testing procedures that depend on the k observations from the sample lying above a random threshold, with test statistics based only on the excesses over that threshold:
This setup represents an analogy with the POT approach, but here the random threshold X_{n−k:n} plays the role of the deterministic threshold u. This motivates the peaks-over-random-threshold (PORT) methodology, as sketched in Figure 4.9(b). Another publication related to the PORT methodology, in the context of high quantile estimation with relevance to VaR in finance, is Araújo e Santos et al. (2013).
Regarding the previously presented results in EVT, with special relevance to the main EV Theorem 4.7, the main assumption is that the observed values can fairly be considered outcomes of an i.i.d. sample; however, in many real-world applications, dependence and/or nonstationarity is inherent to the actual processes generating the data. In particular, for statistical inference on rare events, it is of interest to account for dependence at high levels, seasonality, or trend. A simple approach for the latter is given by de Haan et al. (2015). Altogether, the EVT presented so far has to be adapted; for instance, in the AM and POT approaches, it is important to analyze how the respective GEV and GP distributions need to be modified in order to incorporate those features.
For the case of temporal dependence, of utmost importance in financial applications, the EV theorem can be extended by assuming a condition that controls the long-range dependence at extreme levels of the target process. This is known in the literature as the D(u_n) condition, rigorously defined by Leadbetter et al. (1983). For stationary sequences for which this weak local dependence mixing condition holds, it is still possible to obtain a limiting distribution of GEV type. More precisely, consider the i.i.d. sequence associated with the stationary one, that is, an i.i.d. sequence with the same marginal F. The limiting distributions of the partial maxima of the two sequences are related by the so-called extremal index parameter θ through the equality
Consequently, a GEV c.d.f. is still present in this case, due to the max-stability property for GEV, defined in (4.5); the respective parameters in (4.12) satisfy
The extremal index θ, verifying 0 < θ ≤ 1, is a measure of the tendency of the process to cluster at extreme levels, and its existence is guaranteed by a second condition defined by Leadbetter and Nandagopalan (1989). For independent sequences θ = 1, but the converse is not necessarily true. Smaller values of θ imply stronger local dependence, and clusters of extreme values appear; moreover, the extremal index can be identified as the reciprocal of the mean cluster size at high levels. Summing up, in the case of block maxima, provided the long-range independence conditions hold, inference is similar to that in the i.i.d. case, but the AM approach is adapted with location/scale parameters as in (4.13). For details on the weak dependence approach, please see Leadbetter (2017).
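To illustrate the extremal index, the following Python sketch simulates a moving-maximum process (an assumed toy model, not one from the chapter) whose extremal index is θ = 1/2, so extremes arrive in clusters of mean size 2, and applies a runs estimator, one common estimator of θ:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50000
# Unit-Frechet noise via inverse c.d.f.; X_t = max(Z_t, Z_{t-1}) is a
# moving-maximum process with extremal index theta = 1/2: each large Z
# produces a cluster of two consecutive exceedances.
z = 1.0 / -np.log(rng.uniform(size=n + 1))
x = np.maximum(z[1:], z[:-1])

def runs_estimator(series, u):
    """Runs estimator of the extremal index with run length 1: number of
    clusters (exceedances not preceded by an exceedance) over the total
    number of exceedances of u."""
    exc = series > u
    starts = int(exc[0]) + int(np.sum(exc[1:] & ~exc[:-1]))
    return starts / int(np.sum(exc))

u = np.quantile(x, 0.98)          # high (but in-sample) threshold
theta_hat = runs_estimator(x, u)  # true value for this process: 0.5
print(round(theta_hat, 2))
```

The reciprocal of the estimate is the estimated mean cluster size at level u, here close to 2, consistent with the pairing mechanism built into the simulated process.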
The absence of stationarity is another situation common in many applications, with the marginal distribution of the process of interest changing over time (seasonality, trends, and volatility, for instance). For these cases, extreme value models are still useful, and time-dependent parameters can be the answer to some specific problems. In the adapted POT approach, for instance, the GP model then incorporates parameters with functional forms in time, as dictated by the data. With the main goal of estimating a one-day-ahead VaR forecast within this adapted POT framework, Araújo Santos and Fraga Alves (2013) and Araújo Santos et al. (2013) proposed using the durations between excesses over high thresholds (DPOT) as covariates. For a general overview of EVT and its application to VaR, including the use of explanatory variables, see Tsay (2010), for instance. Recent works providing inference for nonstationary extremes are Gardes (2015), de Haan et al. (2015), and Einmahl et al. (2017).
This section is devoted to illustrating how statistical inference for extreme values develops from EVT, using some of the approaches presented before. Two data sets, worked out by Beirlant et al. (2004), will be used for this purpose:
maasmax.txt
—Annual maximal river discharges of the Meuse river from year 1911 to 1995 at Borgharen in Holland.
Available at http://lstat.kuleuven.be/Wiley/Data/maasmax.txt.
soa.txt
—SOA Group Medical Insurance Large Claims Database; claim amounts of at least 25,000 USD in the year 1991.
Available at http://lstat.kuleuven.be/Wiley/Data/soa.txt.
This is a data set of annual maxima, considered here with the objective of illustrating the AM methodology. Figure 4.11(a) is the time series plot of the annual maxima. From Figure 4.11(b) it seems clear that a positively skewed distribution underlies the sample of maxima. With the main goal of making statistical inference on rare events of interest in the field of hydrology, EVT supports the GEV approximation of the c.d.f. of the annual maximum,
and the subsequent estimation of the EVI, jointly with the location/scale parameters. Then, for the annual maximum, the parameters of interest are
The R software (R Development Core Team, 2011) incorporates several packages aimed at statistics of extreme values, for instance, ismev, evir, evd, and fExtremes. For the Meuse data set, the ML parameter estimates obtained with the evir library for the EVI and the location and scale parameters of the GEV are (remember Proposition 4.9)
and the estimated 100-year return level is . Notice that a nonparametric estimate of this high quantile of the annual maxima is given by the empirical quantile of the sample of maxima, and the answer remains the same for any higher level. The evir library can also return confidence intervals (CI) for the parameters involved. For the return level, for instance, the CI based on profile likelihood is . As the EVI estimate is negative, the right endpoint for the annual maximum of the river discharges is estimated by , a value beyond the largest value in the sample of annual maxima. Since the EVI estimate is close to zero, it is also pertinent to fit the Gumbel model to the sample of 85 annual maxima; the estimated location and scale parameters are then
leading to an estimated 100-year return level of . In this case study, it is observed that a small change in the value of the EVI has a big impact on the estimated high quantile. It therefore seems important to make beforehand a statistical choice between the Gumbel model and the other GEV distributions, Weibull and Fréchet. This can be accomplished by a statistical test for the EVI on the hypothesis
Overviews on testing extreme value conditions can be found in Hüsler and Peng (2008) and Neves and Fraga Alves (2008). For other useful preliminary statistical analyses, such as the QQ-plot or the mean excess plot, see Coles (2001), Beirlant et al. (2004), or Castillo et al. (2005), for instance.
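The AM computations above can be reproduced in outline with scipy instead of R's evir. Since the Meuse data are not bundled here, the sketch below fits the GEV to synthetic annual maxima (a Gumbel sample with assumed location 1250 and scale 450, loosely mimicking discharge magnitudes; all numbers are illustrative):

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)
# Synthetic stand-in for 85 annual maxima (illustrative parameters only).
annual_max = rng.gumbel(loc=1250.0, scale=450.0, size=85)

# ML fit of the GEV; scipy's shape c equals -xi.
c_hat, loc_hat, scale_hat = genextreme.fit(annual_max)
xi_hat = -c_hat

def return_level(T):
    """T-year return level: the (1 - 1/T) quantile of the fitted
    annual-maximum distribution."""
    return genextreme.ppf(1.0 - 1.0 / T, c_hat, loc=loc_hat, scale=scale_hat)

print(f"xi_hat = {xi_hat:+.3f}")
print(f"100-year return level = {return_level(100):.0f}")
```

Refitting with the shape fixed at zero (`genextreme.fit(annual_max, f0=0.0)`) gives the Gumbel sub-model, so the sensitivity of the 100-year level to the EVI can be checked directly, mirroring the comparison made in the text.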
This data set comprises large claims registered in 1991 from the Group Medical Insurance Large Claims Database. The box plot in Figure 4.12(b) indicates a substantial right skewness.
Keeping in mind the main goal of estimating a high quantile and a probability of exceedance of a high level, the POT approach will be considered, supported by the Pickands–Balkema–de Haan Theorem 4.19. In Figure 4.12(a) the exceedances of a high threshold u are displayed, together with the respective observed excesses of u, replicates of the excess X − u | X > u, with c.d.f. F_u in (4.7). Denoting by F the c.d.f. of a large claim X, the probability of exceedance of the high level x_0 > u is
for values up to the maximum observed excess. From now on we simplify the notation, with y_0 := x_0 − u. Consider now the approximation of the distribution of the excesses by the GP distribution, F_u(y_0) ≈ H_ξ(y_0; σ_u); with the ML estimates of the EVI and scale returned by the evir
library, respectively and , one obtains
estimating the probability of a large claim exceeding the threshold u by the empirical proportion of exceedances, the target small probability of exceedance of the high level is estimated by
consequently,
For the SOA data this gives , if we assume in (4.14).
Consider a high quantile of F, the c.d.f. of a large claim X, that is, a value x_p, with p small, such that 1 − F(x_p) = p. The estimator of the high quantile is obtained by arguments similar to those used for the probability of exceedance, and it is given by
which is accomplished by setting the tail probability equal to p in expression (4.14) and inverting. For the SOA data, with the same threshold as before, expression (4.15) provides the estimate for the high (1 − p) quantile, , in USD. Figure 4.13 represents the sample path of the POT estimates of the high quantile as the threshold decreases.
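The POT estimates of the exceedance probability (4.14) and the high quantile (4.15) can be sketched in Python with scipy. The data below are synthetic Pareto-type claims; the threshold choice, the shape ξ = 0.4, and the 25,000 scaling are assumptions for illustration, not the SOA figures:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(3)
n = 10000
# Synthetic claims with Pareto tail, EVI xi = 0.4, scaled to resemble
# amounts of at least 25,000.
claims = (rng.pareto(1.0 / 0.4, size=n) + 1.0) * 25000.0

# POT: threshold u at the 95% empirical quantile; GP fit to the excesses.
u = np.quantile(claims, 0.95)
excesses = claims[claims > u] - u
k = excesses.size
xi_hat, _, sigma_hat = genpareto.fit(excesses, floc=0.0)

# Exceedance probability of a high level x0 > u:
# P(X > x0) ~ (k/n) * (1 - H(x0 - u; xi, sigma_u)).
x0 = 10.0 * u
p_hat = (k / n) * genpareto.sf(x0 - u, xi_hat, loc=0.0, scale=sigma_hat)

# High quantile for a small tail probability p, inverting the same relation.
p = 1e-4
q_hat = u + genpareto.ppf(1.0 - p * n / k, xi_hat, loc=0.0, scale=sigma_hat)
print(p_hat, q_hat)
```

Here k/n estimates the probability of exceeding the threshold itself, and the GP factor extrapolates beyond it; the estimated quantile q_hat lies far above the threshold, as expected for a heavy tail.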
For the SOA data, the previously chosen threshold is such that it lies between the and largest data values, respectively and , that is, .
which can easily be checked by substituting in expression (4.15).
Indeed, for admissibility of any right endpoint estimator , one should take
It is assumed that the random sample is i.i.d. from c.d.f.
or, equivalently, by Theorem 4.16, the first-order condition, for some positive auxiliary function:
Statistical inference is based on the top sample
with an intermediate o.s., that is,
Estimation of the EVI and scale: In a semiparametric setup, the EVI is the crucial parameter to be estimated.
with and
the moment EVI estimator is defined by
This EVI estimator involves only three observations from the top of the sample
and is defined by
Under extra conditions on the rate of k and on the tail of F, asymptotic normality is attained for the Hill, moment, and Pickands estimators:
with
In Figure 4.14 the asymptotic variances of the Hill, Pickands, and moment estimators are compared as functions of the EVI.
For finite samples, these semiparametric estimators exhibit the following pattern: for small k, less bias and larger variance, and the other way around for large k.
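The three estimators can be coded in a few lines each. The sketch below uses the standard textbook formulas; the Pareto sample with true ξ = 0.5 and the choice k = 200 are illustrative assumptions:

```python
import numpy as np

def hill(x, k):
    """Hill estimator (valid for xi > 0): mean log-excess of the top k
    order statistics over X_{n-k:n}."""
    xs = np.sort(x)
    return float(np.mean(np.log(xs[-k:])) - np.log(xs[-k - 1]))

def moment(x, k):
    """Moment estimator (valid for any real xi), built from the first two
    empirical moments of the log-excesses."""
    xs = np.sort(x)
    d = np.log(xs[-k:]) - np.log(xs[-k - 1])
    m1, m2 = float(np.mean(d)), float(np.mean(d ** 2))
    return m1 + 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)

def pickands(x, k):
    """Pickands estimator (valid for any real xi): uses exactly three
    order statistics from the top of the sample."""
    xs = np.sort(x)
    n = len(xs)
    a, b, c = xs[n - k], xs[n - 2 * k], xs[n - 4 * k]
    return float(np.log((a - b) / (b - c)) / np.log(2.0))

rng = np.random.default_rng(2)
x = rng.uniform(size=10000) ** -0.5   # Pareto tail, true xi = 0.5
k = 200
print(hill(x, k), moment(x, k), pickands(x, k))
```

Running the three estimators over a range of k values reproduces the bias-variance pattern described above: Pickands is the most variable of the three, Hill the least, at the price of being restricted to positive EVI.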
Probability of exceedance of a high level : .
Theoretically, the results for estimating are established for high levels
A consistent estimator for the probability of exceedance, in the sense that the ratio of the estimator to the true probability converges to one in probability, is
with consistent estimators of the EVI and scale plugged in. In particular,
For positive EVI, the following simpler version of (4.20) is valid:
Note: Compare the semiparametric estimator (4.20) with the expressions in (4.14) under the parametric POT approach.
High quantile : with .
with consistent estimators of the EVI and scale plugged in, in particular the estimators in (4.21).
Note: Compare the semiparametric estimator (4.22) with expression (4.15) under the parametric POT approach.
For positive EVI, the simpler version of (4.22) is valid:
introduced by Weissman (1978).
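For positive EVI, the Weissman extrapolation can be written directly in terms of the Hill estimator. In the Python sketch below, the sample (Pareto with true ξ = 0.5), the level p = 1e-4, and the choice k = 400 are illustrative assumptions:

```python
import numpy as np

def weissman_quantile(x, k, p):
    """Weissman (1978) high-quantile estimator for heavy tails (xi > 0):
    X_{n-k:n} * (k/(n*p))**xi_hat, with xi_hat the Hill estimator."""
    xs = np.sort(x)
    n = len(xs)
    xi_hat = np.mean(np.log(xs[-k:])) - np.log(xs[-k - 1])
    return float(xs[-k - 1] * (k / (n * p)) ** xi_hat)

rng = np.random.default_rng(11)
n = 20000
x = rng.uniform(size=n) ** -0.5    # Pareto tail, true xi = 0.5
p = 1e-4                           # beyond-the-sample level: n*p = 2
q_hat = weissman_quantile(x, k=400, p=p)
print(q_hat)  # for this model the true quantile is p**(-0.5) = 100
```

The estimator anchors at the intermediate order statistic X_{n-k:n} and multiplies by the power-law extrapolation factor, which is exactly where the estimated EVI enters, so small errors in ξ are amplified as p decreases.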
All the classical semiparametric estimators presented are asymptotically normal, under convenient extra conditions on the rate of k and on the tail of F, which enables the construction of CIs for the target parameters. Details can be found, for instance, in de Haan and Ferreira (2006). Another area of current and future research, closely related to the semiparametric methodology in EV, is the estimation of the right endpoint for distributions in the Gumbel domain of attraction, recently advanced by Fraga Alves and Neves (2014). Therein, an application was pursued to the statistical EVA of Anchorage International Airport taxiway centerline deviations for Boeing 747 aircraft. For the SOA data from 1991 of the Group Medical Insurance Large Claims Database, the semiparametric estimates of the EVI and of the high quantile are represented in Figures 4.15 and 4.16, respectively, in comparison with the POT estimates. On this subject of statistical analysis of extreme values, see also Beirlant et al. (2017) and Gomes et al. (2017).
There is no obvious ordering of multivariate observations; rather, there are many possible ones. Hence the interest is not in extraordinarily high levels per se but rather in extreme probabilities, the probability of an extreme or failure set. A fruitful approach in multivariate extreme value (MEV) theory is the modeling of component-wise maxima. We define the vector of component-wise maxima (and minima) as follows. Let
$$\mathbf{X}_1 = (X_{1,1}, \ldots, X_{1,d}),\ \ldots,\ \mathbf{X}_n = (X_{n,1}, \ldots, X_{n,d})$$
be a random sample of $d$-variate outcomes from the random vector $\mathbf{X} = (X_1, \ldots, X_d)$, with the same joint distribution function $F$. The pertaining random vector of component-wise maxima is defined as
$$\mathbf{M}_n := \Bigl(\max_{1 \le i \le n} X_{i,1},\ \ldots,\ \max_{1 \le i \le n} X_{i,d}\Bigr).$$
Analogously, for the vector of component-wise minima, we observe that
$$\min_{1 \le i \le n} X_{i,j} = -\max_{1 \le i \le n} (-X_{i,j}), \qquad j = 1, \ldots, d,$$
so that results for maxima carry over to minima.
It is worth noting that the sample maximum $\mathbf{M}_n$ may not be an observed sample value. Hence, there is no direct transfer of the block maxima method from the univariate to the multivariate case. Nevertheless, a rich theory emanates from looking at the maximal components individually. Let
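For concreteness, a minimal Python sketch (the tiny bivariate data set is made up for illustration) of component-wise maxima and minima; it also shows that the resulting maximum vector need not coincide with any observed point:

```python
# toy 2-variate sample; all numbers are illustrative only
sample = [(1, 5), (4, 2), (3, 3)]
d = len(sample[0])

# component-wise maxima and minima
M = tuple(max(row[j] for row in sample) for j in range(d))
m = tuple(min(row[j] for row in sample) for j in range(d))

# minima via maxima of the negated components: min x = -max(-x)
m_check = tuple(-max(-row[j] for row in sample) for j in range(d))

print(M)        # (4, 5) -- not itself an observed sample point
print(m)        # (1, 2)
print(m_check)  # (1, 2)
```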
$$F^n(\mathbf{x}) = P(\mathbf{M}_n \le \mathbf{x}), \qquad \mathbf{x} \in \mathbb{R}^d,$$
be the distribution function of the component-wise maximum $\mathbf{M}_n$. As in the univariate case, the usual approach is to find sequences of constants $\mathbf{a}_n > \mathbf{0}$ and $\mathbf{b}_n \in \mathbb{R}^d$ such that we get a nontrivial limit for sufficiently large $n$, that is, such that
$$\lim_{n\to\infty} F^n(\mathbf{a}_n \mathbf{x} + \mathbf{b}_n) = G(\mathbf{x}) \qquad (4.24)$$
for every continuity point $\mathbf{x}$ of $G$, with $G$ a c.d.f. with nondegenerate margins $G_j$, $j = 1, \ldots, d$ (operations on vectors are meant component-wise). Any distribution function $G$ arising in the limit is called a MEV distribution, and we then say that $F$ belongs to the max-domain of attraction of $G$ (notation: $F \in \mathcal{D}(G)$). It is important to note that (4.24) implies convergence of the pertaining marginal distributions,
$$\lim_{n\to\infty} F_j^n(a_{n,j}\,x + b_{n,j}) = G_j(x), \qquad j = 1, \ldots, d,$$
which entails, in turn, a known parametric structure in the limit of the corresponding sequence of marginal distributions, that is, $G_j(x) = \exp\bigl(-(1 + \gamma_j x)^{-1/\gamma_j}\bigr)$, for all $x$ such that $1 + \gamma_j x > 0$. Similarly to the univariate case, the parameters $\gamma_j$, $j = 1, \ldots, d$, are called (marginal) extreme value indices. Defining $U_j := \bigl(1/(1 - F_j)\bigr)^{\leftarrow}$, $j = 1, \ldots, d$, then the extended regular variation (see Theorem 4.16) of each marginal tail quantile function holds with auxiliary functions $a_j$, $j = 1, \ldots, d$, that is,
$$\lim_{t\to\infty} \frac{U_j(tx) - U_j(t)}{a_j(t)} = \frac{x^{\gamma_j} - 1}{\gamma_j} \qquad (4.25)$$
for all $x > 0$ (cf. de Haan and Ferreira, 2006, p. 209). Furthermore, since each $F^n$ is monotone and $G$ itself is continuous, because its components are continuous, the convergence in (4.24) holds locally uniformly. Considering, for all $j = 1, \ldots, d$, the sequences $a_{n,j} := a_j(n)$ and $b_{n,j} := U_j(n)$, then by (4.25) we may write
$$\lim_{n\to\infty} F_j^n\bigl(a_j(n)\,x + U_j(n)\bigr) = \exp\bigl(-(1 + \gamma_j x)^{-1/\gamma_j}\bigr), \qquad j = 1, \ldots, d.$$
Therefore, for a suitable choice of the constants $\mathbf{a}_n$ and $\mathbf{b}_n$ in Eq. (4.24), we write
$$\lim_{n\to\infty} F^n\bigl(a_1(n)\,x_1 + U_1(n),\,\ldots,\,a_d(n)\,x_d + U_d(n)\bigr) = G(\mathbf{x}) \qquad (4.26)$$
for all $\mathbf{x} \in \mathbb{R}^d$. This leads to the statement in Theorem 6.1.1 of de Haan and Ferreira (2006). We now go back to the MEV condition (4.24): suppose that the random vector $\mathbf{X} = (X_1, \ldots, X_d)$ of dimension $d$ belongs to the max-domain of attraction of the random vector $\mathbf{M}$. That is, there exist constants $\mathbf{a}_n > \mathbf{0}$ and $\mathbf{b}_n \in \mathbb{R}^d$ such that
$$\frac{\max_{1 \le i \le n} \mathbf{X}_i - \mathbf{b}_n}{\mathbf{a}_n} \xrightarrow{d} \mathbf{M}, \qquad n \to \infty, \qquad (4.27)$$
where $\mathbf{M}$ is a nontrivial random vector and $\mathbf{X}_1, \mathbf{X}_2, \ldots$ are independent copies of $\mathbf{X}$, the maximum being taken component-wise. Unlike the univariate case (4.1), the MEV distribution of $\mathbf{M}$ cannot be represented as a parametric family indexed by a finite-dimensional parameter vector. Instead, the family of MEV distributions is characterized by a class of finite measures. To this effect we reformulate the domain of attraction condition (4.27) as follows: suppose that the marginal distribution functions $F_j$, $j = 1, \ldots, d$, are all continuous. Define the random vector
$$\mathbf{Z} = (Z_1, \ldots, Z_d), \qquad Z_j := \frac{1}{1 - F_j(X_j)}, \quad j = 1, \ldots, d,$$
whose margins are standard Pareto.
By virtue of (4.26), we have, as $n \to \infty$,
$$\frac{1}{n}\Bigl(\max_{1 \le i \le n} Z_{i,1},\,\ldots,\,\max_{1 \le i \le n} Z_{i,d}\Bigr) \xrightarrow{d} \mathbf{Z}^*,$$
where $\mathbf{Z}^*$ has joint distribution function $G_0(\mathbf{z}) := G\bigl((z_1^{\gamma_1} - 1)/\gamma_1, \ldots, (z_d^{\gamma_d} - 1)/\gamma_d\bigr)$, meaning that the marginal distributions no longer intervene in the limit. Hence it is possible to disentangle the marginal distributions from the inherent dependence structure. This process of transformation to standard marginals (Pareto in this case; another popular choice is the tail-equivalent Fréchet marginals) does not pose theoretical difficulties (see, e.g., Resnick, 1987; Deheuvels, 1984). From a practical viewpoint, margins may be estimated via the e.d.f. and then standardized into unit Fréchet or standard Pareto distributions. This approach is also well established in the literature; see, for example, Genest et al. (1995). For the motivation and implications of choosing other standardized marginals, see Section 8.2.6 in Beirlant et al. (2004).
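The rank-based standardization to (approximately) standard Pareto margins just mentioned can be sketched as follows; the toy data and the convention of dividing ranks by $n + 1$ (to keep the transformed values finite) are illustrative assumptions:

```python
# empirical-margin standardization to approximately standard Pareto margins:
# Z_ij = 1 / (1 - Fhat_j(X_ij)), with Fhat_j(X_ij) taken as rank / (n + 1)
def to_standard_pareto(sample):
    n, d = len(sample), len(sample[0])
    z = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # indices of observations sorted by their j-th component
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, i in enumerate(order, start=1):
            z[i][j] = 1.0 / (1.0 - rank / (n + 1.0))
    return z

# toy bivariate data, illustrative only
sample = [(0.1, 10.0), (2.5, -3.0), (1.3, 0.7), (5.0, 2.2)]
for row in to_standard_pareto(sample):
    print(row)
```

The transformed margins take values in $\{1.25, 1.67, 2.5, 5\}$ here, the empirical analogs of standard Pareto quantiles; the dependence between components is preserved through the joint ranks.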
We proceed with the study of the dependence structure in the limit. As in the univariate case, we may take logarithms throughout in order to find that (4.26) is equivalent to
$$\lim_{n\to\infty} n\Bigl(1 - F\bigl(a_1(n)\,x_1 + U_1(n), \ldots, a_d(n)\,x_d + U_d(n)\bigr)\Bigr) = -\log G(\mathbf{x}).$$
With some effort (see Corollary 6.1.4 of de Haan and Ferreira, 2006), we can replace $n$ with a continuous variable $t$ in the foregoing, ending up with a variable running through the real line in a continuous path, that is, for any $\mathbf{z}$ such that $0 < G_0(\mathbf{z}) < 1$,
$$\lim_{t\to\infty} t\,P\bigl(Z_1 > t z_1 \ \text{or} \ \cdots \ \text{or} \ Z_d > t z_d\bigr) = -\log G_0(\mathbf{z}).$$
If we take a scalar $a > 0$ and multiply this scalar with the vector $\mathbf{z}$, we obtain
$$\lim_{t\to\infty} t\,P\bigl(Z_1 > t a z_1 \ \text{or} \ \cdots \ \text{or} \ Z_d > t a z_d\bigr) = -\log G_0(a\mathbf{z})$$
and, replacing $t$ by $t/a$,
$$-\log G_0(a\mathbf{z}) = a^{-1}\bigl(-\log G_0(\mathbf{z})\bigr).$$
Therefore, a measure $\nu$ characterizing the distribution of the limit $\mathbf{M}$ in (4.27) should satisfy the homogeneity relation
$$\nu(a\,A) = a^{-1}\,\nu(A), \qquad a > 0.$$
This is particularly true in the case of the exponent measure (see, e.g., Definition 6.1.7 of de Haan and Ferreira, 2006; p. 256 of Beirlant et al., 2004). The exponent measure $\nu$ is concentrated on $[0, \infty]^d \setminus \{\mathbf{0}\}$ and is such that
$$G_0(\mathbf{z}) = \exp\bigl(-\nu\bigl([\mathbf{0}, \mathbf{z}]^c\bigr)\bigr),$$
with $\nu(a\,A) = a^{-1}\,\nu(A)$, for all $a > 0$ and $A$ a Borel subset of $[0, \infty]^d \setminus \{\mathbf{0}\}$. This homogeneity property suggests a transformation using pseudopolar coordinates, yielding the spectral measure $H$ with respect to the sum-norm $\|\mathbf{z}\| = z_1 + \cdots + z_d$:
$$\nu\bigl(\{\mathbf{z} : \|\mathbf{z}\| > r,\ \mathbf{z}/\|\mathbf{z}\| \in B\}\bigr) = r^{-1}\,H(B), \qquad r > 0,$$
for Borel subsets $B$ of the unit simplex $S_d$ (notation: $S_d$ stands for $\{\mathbf{w} \in [0,1]^d : w_1 + \cdots + w_d = 1\}$), with
$$\int_{S_d} w_j \, dH(\mathbf{w}) = 1, \qquad j = 1, \ldots, d,$$
the moment conditions guaranteeing the standardized margins.
Section 6.1.4 of de Haan and Ferreira (2006) contains results that expound a direct link between convergence in distribution and convergence of exponent measures for sequences of max-stable distributions, in the sense of closure with respect to convergence in distribution. Section 8.2.3 in Beirlant et al. (2004) is fully dedicated to the spectral measure, starting from arbitrary norms. Another way of characterizing max-stable distributions is by the stable tail dependence function
$$l(\mathbf{x}) := -\log G_0\bigl(1/x_1, \ldots, 1/x_d\bigr), \qquad \mathbf{x} \in [0, \infty)^d.$$
The exponent measure and the stable tail dependence function are related via $l(\mathbf{x}) = \nu\bigl(\{\mathbf{z} : z_1 > 1/x_1 \ \text{or} \ \cdots \ \text{or} \ z_d > 1/x_d\}\bigr)$, while $G(\mathbf{x}) = \exp\bigl(-l\bigl(-\log G_1(x_1), \ldots, -\log G_d(x_d)\bigr)\bigr)$. Here the marginal distributions are featured through their extreme value indices $\gamma_1, \ldots, \gamma_d$. Among the properties of the function $l$, listed in Proposition 6.1.21 of de Haan and Ferreira (2006), we mention that $l$ is a convex function, satisfies the homogeneity property of order 1, and is such that $\max(x_1, \ldots, x_d) \le l(\mathbf{x}) \le x_1 + \cdots + x_d$, for all $\mathbf{x} \ge \mathbf{0}$. We also note that only the bivariate case is straightforward in this respect (cf. p. 257 of Beirlant et al., 2004). Pickands dependence function is also a common tool in the bivariate context (Pickands, 1981). On the unit simplex, it is defined as
$$A(t) := l(1 - t, t), \qquad t \in [0, 1],$$
where $l$ denotes again the stable tail dependence function. By homogeneity of the function $l$, Pickands dependence function completely determines the limit $G_0$. Important properties of the function $A$ are that (P1) $A$ is convex, (P2) $A(0) = A(1) = 1$, and (P3) $\max(t, 1 - t) \le A(t) \le 1$, for all $t \in [0, 1]$. Moreover, $A$ is related with the spectral measure $H$ via
$$A(t) = \int_{S_2} \max\bigl(w\,(1 - t),\,(1 - w)\,t\bigr)\,dH(w).$$
A similar relation with respect to an arbitrary choice of norms, possibly other than the sum-norm, is given in Eq. (8.49) of Beirlant et al. (2004). The $d$-variate extension of Pickands dependence function also relies on the homogeneity of the stable tail dependence function $l$, entailing the restriction to the unit simplex:
$$A(\mathbf{w}) := l(\mathbf{w}), \qquad \mathbf{w} \in S_d.$$
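As a numerical illustration of properties (P1)–(P3), consider the Pickands dependence function of the bivariate logistic (Gumbel) model, a standard parametric example assumed here for illustration; the sketch checks the boundary values and the bounds on a grid:

```python
# Pickands dependence function of the bivariate logistic model,
# A(t) = ((1 - t)^(1/alpha) + t^(1/alpha))^alpha with 0 < alpha <= 1:
# alpha = 1 gives independence (A(t) = 1); alpha -> 0 complete dependence
def pickands_logistic(t, alpha):
    return ((1 - t) ** (1 / alpha) + t ** (1 / alpha)) ** alpha

for alpha in (1.0, 0.5, 0.2):
    # (P2): boundary values A(0) = A(1) = 1
    assert pickands_logistic(0.0, alpha) == 1.0
    assert pickands_logistic(1.0, alpha) == 1.0
    # (P3): bounds max(t, 1 - t) <= A(t) <= 1 on an interior grid
    for i in range(1, 10):
        t = i / 10
        assert max(t, 1 - t) <= pickands_logistic(t, alpha) <= 1.0

print(pickands_logistic(0.5, 0.5))  # sqrt(0.5), about 0.7071
```

Convexity (P1) could be checked on the same grid via second differences; it holds for this family for every $0 < \alpha \le 1$.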
We now turn to conditions fulfilled by a distribution function $F$ in the domain of attraction of a max-stable distribution $G$ (notation: $F \in \mathcal{D}(G)$) in the $d$-variate setting. In the bivariate case, Lemmas 2.2 and 2.3 of Barão et al. (2007) can be used to generate distributions which are in the domain of attraction of a multivariate extreme value distribution $G$. An extension of the latter to higher dimensions is detailed in Section 3.1 of Segers (2012). As expected at this point, MEV conditions treat the marginal distributions and the dependence structure separately. In order to be prepared to apply at least one of the previous measures, we consider the random vector $\mathbf{Z}$ with standard Pareto margins, obtained via the transformation
$$Z_j = \frac{1}{1 - F_j(X_j)}, \qquad j = 1, \ldots, d.$$
Denote by $F_Z$ the joint distribution function of $\mathbf{Z}$. If $F \in \mathcal{D}(G)$, then the following are equivalent:
1. $\lim_{t\to\infty} t\,P\bigl(\mathbf{Z} \in t\,A\bigr) = \nu(A)$, for all Borel sets $A \subset [0, \infty]^d \setminus \{\mathbf{0}\}$ bounded away from $\mathbf{0}$ with $\nu(\partial A) = 0$;
2. $\lim_{t\to\infty} t\,P\bigl(\|\mathbf{Z}\| > t,\ \mathbf{Z}/\|\mathbf{Z}\| \in B\bigr) = H(B)$, for all continuity sets $B$ of the spectral measure $H$;
3. $\lim_{n\to\infty} F_Z^n(n\,\mathbf{z}) = G_0(\mathbf{z})$, for all $\mathbf{z} > \mathbf{0}$.
Points 1 and 2 suggest nonparametric estimation procedures, in the sense that the probabilities involved can be replaced by their empirical analogs. The latter also entails that empirical measures are in demand. An estimator for the spectral measure was introduced by Einmahl et al. (1997). We also refer the reader to Einmahl et al. (2001) in this respect. The problem of estimating the dependence structure is tackled in depth by de Haan and Ferreira (2006) (see their Chapter 7 and references therein). Parametric estimators evolve from point 3 by means of likelihood-based statistical inference. In this respect we refer to Coles and Tawn (1991, 1994). Other parametric threshold estimation methods, evolving from point 1, are presented by Ledford and Tawn (1996) and Smith et al. (1997). In these works, the sum-norm is used; other norms have been employed in nonparametric estimation (Einmahl et al., 1993; Einmahl et al., 2001). Estimation of the probability of a failure set is expounded in Chapter 8 of de Haan and Ferreira (2006). A class of bias-corrected estimators for the stable tail dependence function is proposed by Fougères et al. (2014). A finite-sample comparison of several estimators by means of a simulation study is laid out in Barão et al. (2007) for the bivariate case. Altogether, the class of MEV distributions, being infinite dimensional, renders modeling and statistical inference a cumbersome task in practice. When we are dealing with realizations of stochastic processes, these difficulties can be aggravated, although de Haan and Ferreira (2006, p. 293) point out that the theory of infinite-dimensional extremes is quite analogous to the MEV theory addressed in this chapter. For a review of the existing estimation techniques for max-stable processes, see, for example, Padoan et al. (2010), Reich and Shaby (2012), Einmahl et al. (2012), and Yuen and Stoev (2014a).
Within the scope of finance and actuarial applications, Yuen and Stoev (2014b) advocate the use of a specific finite-dimensional max-stable model for extreme risks, which can be effectively estimated from the data, rather than proceeding in the infinite-dimensional setting.
At the two opposite ends of the dependence spectrum of max-stable or MEV distributions are the cases of asymptotic independence and complete dependence. Here, the stable tail dependence function proves to be useful: its main advantage arises from the possibility of setting levels in order to obtain a graphical depiction of the dependence structure. Setting $l(\mathbf{x}) = x_1 + \cdots + x_d$ yields independent components of the limit vector $\mathbf{M}$. On the opposite end, $l(\mathbf{x}) = \max(x_1, \ldots, x_d)$ means that the $d$ components are the same r.v. There are several accounts of how the asymptotic independence assumption fails to provide a satisfactory way to estimate joint tails using MEV distributions (see, e.g., de Haan and Ferreira, 2006; Eastoe et al., 2014). A test for independence is constructed in Genest and Rémillard (2004). A comprehensive essay on the tail dependence function intertwined with the tail copula function is the work by Gudendorf and Segers (2010). The copula function represents the dependence structure of a multivariate random vector; hence the description of extreme or tail dependence does not escape its grasp. In fact, copula theory (cf. Nelsen, 1999; Joe, 1997) and copula estimation have been extensively used in financial applications. Concerning the estimation of general copula functions, several parametric, semiparametric, and nonparametric procedures have been proposed in the literature (see, e.g., Stute, 1984; Genest and Rivest, 1993; Genest et al., 1995). Estimation of tail-related copulas has been tackled, for instance, by Huang (1992), Peng (1998), and Durrleman et al. (2000).
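The two ends of the spectrum can be visualized numerically with the stable tail dependence function of the bivariate logistic model (again a standard example assumed for illustration): $\alpha = 1$ recovers the independence bound $x + y$, while $\alpha \to 0$ approaches the complete-dependence bound $\max(x, y)$.

```python
# stable tail dependence function of the bivariate logistic model:
# l(x, y) = (x^(1/alpha) + y^(1/alpha))^alpha, with 0 < alpha <= 1
def l_logistic(x, y, alpha):
    return (x ** (1 / alpha) + y ** (1 / alpha)) ** alpha

x, y = 1.0, 2.0
print(l_logistic(x, y, 1.0))   # 3.0 = x + y       (independence end)
print(l_logistic(x, y, 0.01))  # close to 2.0 = max(x, y) (complete-dependence end)
```

Intermediate values of $\alpha$ interpolate between the two bounds, which is why plotting $l$ (or the Pickands function) at a few levels gives a quick graphical read of the strength of tail dependence.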
A recent up-to-date review of univariate EVT and the respective statistical inference can be found in Gomes and Guillou (2014). Reference books in EVT and in the field of real-world applications of EVDs and extremal domains of attraction are Embrechts et al. (2001), Beirlant et al. (2004), Coles (2001), de Haan and Ferreira (2006), David and Nagaraja (2003), Gumbel (1958), Castillo et al. (2005), and Reiss and Thomas (2007). Seminal works on MEV theory are the papers by Tiago de Oliveira (1958), Sibuya (1960), de Haan and Resnick (1977), Deheuvels (1978), and Pickands (1981). For books predicated on this subject, we refer to Resnick (1987, 2007), Coles (2001), Beirlant et al. (2004), de Haan and Ferreira (2006), and Salvadori et al. (2007). Applications of MEV theory range from environmental risk assessment (Coles and Tawn, 1991; Joe, 1994; de Haan and de Ronde, 1998; Schlather and Tawn, 2003) and financial risk management (Embrechts, 2000; Longin, 1996; Longin, 2000; Longin and Solnik, 2001; Stărică, 1999; Poon et al., 2003) through Internet traffic modeling (Maulik et al., 2002; Resnick, 2002) to sports (Barão and Tawn, 1999).
This work was funded by FCT – Fundação para a Ciência e a Tecnologia, Portugal, through the project UID/MAT/00006/2013.