Chapter 6
Bootstrap Methods in Statistics of Extremes

M. Ivette Gomes1, Frederico Caeiro2, Lígia Henriques-Rodrigues3 and B.G. Manjunath4

1Universidade de Lisboa, FCUL, DEIO and CEAUL, Portugal

2Universidade Nova de Lisboa, FCT and CMA, Portugal

3Universidade de São Paulo, IME and CEAUL, Brazil

4Universidade de Lisboa, CEAUL, Portugal

AMS 2010 subject classification. Primary 62G32, 62E20; Secondary 65C05.
AMS 2000 subject classification. Primary 62G32, 62E20; Secondary 62G09, 62G30.

6.1 Introduction

Let c06-math-0002 be a random sample from an underlying cumulative distribution function (CDF) c06-math-0003. If we assume that c06-math-0004 is known, we can easily estimate the sampling distribution of any estimator c06-math-0005 of an unknown parameter c06-math-0006 through the use of a Monte Carlo simulation, described in the following algorithm:

  S1.  For c06-math-0007,
    S1.1  generate random samples c06-math-0008,
    S1.2  and compute c06-math-0009.
  S2.  On the basis of the output c06-math-0010, after the c06-math-0011 iterations in Step S1, use such a sample to estimate the sampling distribution of c06-math-0012, through either the associated empirical distribution function or any kernel estimate, among others.
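Steps S1 and S2 can be sketched in Python as follows. This is only a minimal illustration under our own assumptions: the function names, the unit exponential model, and the sample mean as estimator are hypothetical choices, not taken from the chapter.

```python
import bisect
import random
import statistics

def monte_carlo_estimates(sampler, estimator, n, runs, seed=0):
    """Steps S1.1-S1.2: simulate `runs` samples of size n from a known model
    and return the sorted values of the estimator."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):                              # Step S1
        sample = [sampler(rng) for _ in range(n)]      # Step S1.1
        estimates.append(estimator(sample))            # Step S1.2
    return sorted(estimates)

def empirical_cdf(sorted_values, x):
    """Step S2: empirical distribution function of the simulated estimates."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)

# Illustrative assumption: known model = unit exponential, estimator = sample mean.
estimates = monte_carlo_estimates(
    lambda rng: rng.expovariate(1.0), statistics.mean, n=50, runs=2000
)
print(empirical_cdf(estimates, 1.0))  # estimated P(sample mean <= 1)
```

Increasing `runs` reduces the Monte Carlo error of the estimated sampling distribution, at a proportional computational cost.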

If c06-math-0013 goes to infinity, something not achievable in practice, we should get a perfect match to the theoretical calculation, if available; that is, the Monte Carlo error should disappear. But c06-math-0014 is usually unknown. How should we then proceed? The bootstrap methodology is one possible answer.

Bootstrapping (Efron, 1979) is essentially a computer-based and computer-intensive method for assigning measures of accuracy to sample estimates (see Efron and Tibshirani, 1994; Davison and Hinkley, 1997, among others). Concomitantly, this technique also allows estimation of the sampling distribution of almost any statistic using only very simple resampling methods, based on the observed value of the empirical distribution function, given by

In the previously sketched algorithm, we can replace c06-math-0016 by c06-math-0017, the empirical distribution function associated with the original observed data, c06-math-0018, which puts mass c06-math-0019 on each of the c06-math-0020. In Step S1.1 we then generate, with replacement, c06-math-0021; in Step S1.2 we compute c06-math-0022, c06-math-0023; and in Step S2 we use the resulting sample as before.
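A minimal Python sketch of this bootstrap version of the algorithm is given next. Sampling from the empirical distribution function amounts to drawing with replacement from the observed data. The data values, the choice of the median as statistic, and all names are hypothetical and serve only as an illustration.

```python
import random
import statistics

def bootstrap_replicates(data, estimator, runs, seed=0):
    """Resample with replacement from the observed data (mass 1/n on each
    observation) and recompute the estimator on every bootstrap sample."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(runs):
        resample = rng.choices(data, k=n)   # draw from the empirical distribution
        replicates.append(estimator(resample))
    return replicates

# Hypothetical observed sample, used only for illustration.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
reps = bootstrap_replicates(data, statistics.median, runs=1000)
boot_se = statistics.stdev(reps)            # bootstrap standard error of the median
print(round(boot_se, 3))
```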

The main goal of this chapter is to highlight the role of the bootstrap methodology in the field of statistics of univariate extremes, where the bootstrap has been commonly used in the choice of the number c06-math-0024 of top order statistics or of the optimal sample fraction, c06-math-0025, to be taken in the semiparametric estimation of a parameter of extreme events. For an asymptotically consistent choice of the threshold to use in the adaptive estimation of a positive extreme value index (EVI), c06-math-0026, the primary parameter in statistics of extremes, we suggest and discuss a double-bootstrap algorithm. In such an algorithm, apart from the classical Hill (1975) and peaks over random threshold (PORT)-Hill EVI estimators (Araújo Santos et al., 2006), we consider a class of minimum-variance reduced-bias (MVRB) EVI estimators, the simplest one in Caeiro et al. (2005), and the associated PORT-MVRB EVI estimators (Gomes et al., 2011a, 2013). Other bootstrap methods for the choice of c06-math-0027 can be found in Hall (1990), Longin (1995), Caers et al. (1999), Draisma et al. (1999), Danielsson et al. (2001), and Gomes and Oliveira (2001), among others. For a recent comparison between the simple-bootstrap and the double-bootstrap methodology, see Caeiro and Gomes (2014b), where an improved version of Hall's bootstrap methodology was introduced.

After providing, in Section 6.2, a few technical details in the area of extreme value theory (EVT), related to the EVI estimators under consideration in this chapter, we shall briefly discuss, in Section 6.3, the main ideas behind the bootstrap methodology and optimal sample fraction estimation. Along the lines of Gomes et al. (2011b–2012, 2015a), we propose an algorithm for the adaptive consistent estimation of a positive EVI, through the use of computer-intensive resampling methods. The Algorithm is described for the Hill EVI estimator and the associated PORT-Hill, MVRB, and PORT-MVRB EVI estimators, but it can work similarly for the estimation of other parameters of extreme events, like a high quantile, the probability of exceedance, or the return period of a high level. The associated code in the R language for the adaptive EVI estimation is available upon request. Section 6.4 is entirely dedicated to the application of the Algorithm to three simulated samples. Finally, in Section 6.5, we draw some overall conclusions.

6.2 A Few Details on EVT

The results obtained by Fisher and Tippett (1928) on the possible limiting laws of the sample maxima, formalized by Gnedenko (1943) and used by Gumbel (1958) for applications of EVT in engineering, are among the key tools behind the rapid growth of statistical EVT in the last decades. In this chapter, we focus on the behavior of extreme values of a data set, dealing with maximum values and other top order statistics in a univariate framework, working thus in the field of statistics of extremes.

Let us assume that we have access to a random sample c06-math-0028 of independent, identically distributed, or possibly stationary and weakly dependent random variables from an underlying model c06-math-0029, and let us denote by c06-math-0030 the sample of associated ascending order statistics. As usual, let us further assume that it is possible to linearly normalize the sequence of maximum values, c06-math-0031, so that we get a nondegenerate limit. Then (Gnedenko, 1943), that limiting random variable has a CDF of the type of the extreme value distribution, given by

and c06-math-0033 is the so-called EVI, the primary parameter in statistics of extremes. We then say that c06-math-0034 is in the max-domain of attraction of c06-math-0035, in (6.2), and use the notation c06-math-0036. The EVI essentially measures the weight of the right tail function, c06-math-0037. If c06-math-0038, the right tail is short and light, since c06-math-0039 necessarily has a finite right end point, that is, c06-math-0040 is finite. If c06-math-0041, the right tail is heavy and of a negative polynomial type, and c06-math-0042 has an infinite right end point. A positive EVI is also often called the tail index. If c06-math-0043, the right tail is of an exponential type, and the right end point can then be either finite or infinite.

Slightly more restrictively than the full max-domain of attraction of the extreme value distribution, we now consider a positive EVI, that is, we work with heavy-tailed models c06-math-0044 in c06-math-0045. Heavy-tailed models appear often in practice in fields like bibliometrics, biostatistics, finance, insurance, and telecommunications. Power laws, such as the Pareto distribution and Zipf's law, have been observed a few decades ago in some important phenomena in economics and biology and have seriously attracted scientists in recent years. As usual, we shall further use the notations c06-math-0046 for the generalized inverse function of c06-math-0047 and c06-math-0048 for the class of regularly varying functions at infinity with an index of regular variation c06-math-0049, that is, positive Borel measurable functions c06-math-0050 such that c06-math-0051, as c06-math-0052, for all c06-math-0053 (see Bingham et al., 1987, for details on regular variation). Let us further use the notation c06-math-0054 for the tail quantile function. For heavy-tailed models we have the validity of the following first-order conditions:

The first necessary and sufficient condition in the preceding text, related to the right tail function behavior, was proved by Gnedenko (1943), and the second one, related to the tail quantile function behavior, was proved by de Haan (1984).

For these heavy-tailed models, and given a sample c06-math-0056, the classical EVI estimators are Hill estimators (Hill, 1975), with the functional expression

They are thus the average of the c06-math-0058 log-excesses, c06-math-0059, above the random level or threshold c06-math-0060. To have consistency of Hill EVI estimators, we need to have c06-math-0061, and such a random threshold c06-math-0062 needs further to be an intermediate order statistic, that is, we need to have

if we want to have consistent EVI estimation in the whole c06-math-0064. Indeed, under any of the first-order frameworks in (6.3), the log-excesses, c06-math-0065, are approximately the c06-math-0066 order statistics of an exponential sample of size c06-math-0067, with mean value c06-math-0068, hence the reason for the EVI estimators in (6.4).
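The Hill estimator described above, the average of the k log-excesses over the (k+1)-th largest observation, can be sketched in a few lines of Python. The names `hill`, `sample`, and `k` are ours, and the simulated strict Pareto sample (with EVI equal to 0.5) is only an illustrative assumption.

```python
import math
import random

def hill(sample, k):
    """Hill estimator: mean of the k log-excesses above the random
    threshold X_{n-k:n} (the (k+1)-th largest observation)."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    x = sorted(sample)                 # ascending order statistics
    threshold = x[n - k - 1]           # X_{n-k:n}
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    log_thr = math.log(threshold)
    return sum(math.log(x[n - i]) - log_thr for i in range(1, k + 1)) / k

# Illustration: strict Pareto sample with EVI 0.5, i.e. X = U^(-0.5).
rng = random.Random(1)
pareto_sample = [(1.0 - rng.random()) ** -0.5 for _ in range(2000)]
print(hill(pareto_sample, 200))        # should be close to 0.5
```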

Under adequate second-order conditions that rule the rate of convergence in any of the first-order conditions in (6.3), Hill estimators, c06-math-0069, have usually a high asymptotic bias, and recently, several authors have considered different ways of reducing bias (see the overviews in Gomes et al., 2007b, Chapter 6 of Reiss and Thomas, 2007; Gomes et al., 2008a; Beirlant et al., 2012; Gomes and Guillou, 2015). A simple class of MVRB EVI estimators is the class studied in Caeiro et al. (2005), to be introduced in Section 6.2.2. These MVRB EVI estimators depend on the adequate estimation of second-order parameters, and the kind of second-order parameter estimation that enables the building of MVRB EVI estimators, that is, EVI estimators that outperform the Hill estimator for all c06-math-0070, is sketched in Sections 6.2.1 and 6.2.2.

Both Hill and MVRB EVI estimators are invariant to changes in scale but not invariant to changes in location. In particular, the Hill EVI estimators can suffer drastic changes when we induce an arbitrary shift in the data, as can be seen in Figure 6.1.


Figure 6.1 Hill plots, denoted c06-math-0071, associated with unit Pareto samples of size c06-math-0072, from the model c06-math-0073, for c06-math-0074 and c06-math-0075.

Indeed, even if a Hill plot (a function of c06-math-0076 vs c06-math-0077) looks stable, as happens in Figure 6.1 with the c06-math-0078 sample path (where the data, c06-math-0079, c06-math-0080, come from a unit standard Pareto CDF, c06-math-0081, for c06-math-0082), we easily come to the so-called Hill horror plots, a terminology used in Resnick (1997), when we induce a shift in the data. This can also be seen in Figure 6.1 (look now at c06-math-0083), where we present the Hill plot associated with the shifted sample c06-math-0084, from the CDF c06-math-0085, now for c06-math-0086. This led Araújo Santos et al. (2006) to introduce the so-called PORT methodology, to be sketched in Section 6.2.3. The asymptotic behavior of the EVI estimators under consideration is discussed in Section 6.2.4.

6.2.1 Second-Order Reduced-Bias EVI Estimation

For consistent semiparametric EVI estimation, in the whole c06-math-0087, we have already noticed that we merely need to work with adequate functionals, dependent on an intermediate tuning or control parameter c06-math-0088, the number of top order statistics involved in the estimation, that is, (6.5) should hold. To obtain full information on the nondegenerate asymptotic behavior of semiparametric EVI estimators, we often need to further assume a second-order condition, ruling the rate of convergence in any of the first-order conditions, in (6.3). It is often assumed that there exists a function c06-math-0089, such that

Then, we have c06-math-0091. Moreover, if the limit in the left-hand side of (6.6) exists, we can choose c06-math-0092 so that such a limit is necessarily equal to the previously defined c06-math-0093 function (Geluk and de Haan, 1987).

Whenever dealing with reduced-bias estimators of parameters of extreme events, and essentially due to technical reasons, it is common to slightly restrict the domain of attraction, c06-math-0094, and to consider a Pareto-type class of models, assuming that, with c06-math-0095, c06-math-0096, c06-math-0097, and as c06-math-0098,

The class in (6.7) is however a wide class of models that contains most of the heavy-tailed parents useful in applications, like the Fréchet, the generalized Pareto, and the Student-c06-math-0100, with c06-math-0101 degrees of freedom. For Fréchet parents, we get c06-math-0102 and c06-math-0103 in (6.7). For a generalized Pareto distribution, c06-math-0104, with c06-math-0105 given in (6.2), we get c06-math-0106 and c06-math-0107. For Student-c06-math-0108 parents, we get c06-math-0109 and c06-math-0110. For further details and an explicit expression of c06-math-0111 as a function of c06-math-0112, see Caeiro and Gomes (2008), among others. Note that the validity of (6.6) with c06-math-0113 is equivalent to (6.7). To obtain information on the bias of MVRB EVI estimators, it is even common to slightly restrict the class of models in (6.7), further assuming the following third-order condition:

as c06-math-0115, with c06-math-0116. All the aforementioned models still belong to this class. Slightly more generally, we could have assumed a general third-order condition, ruling now the rate of convergence in the second-order condition in (6.6), which guarantees that, for all c06-math-0117,

where c06-math-0119 must then be in c06-math-0120. Equation (6.8) is equivalent to equation (6.9) with c06-math-0121. Further details on the topic can be found in de Haan and Ferreira (2006).

Provided that (6.5) and (6.6) hold, Hill EVI estimators, c06-math-0122, have usually a high asymptotic bias. The adequate accommodation of this bias has recently been extensively addressed. Among the pioneering papers, we mention Peng (1998), Beirlant et al. (1999), Feuerverger and Hall (1999), and Gomes et al. (2000). In these papers, authors are led to reduced-bias EVI estimators, with asymptotic variances larger than or equal to c06-math-0123, where c06-math-0124 is the aforementioned “shape” second-order parameter in (6.6). Recently, as sketched in Section 6.2.2, Caeiro et al. (2005) and Gomes et al. (2007a, 2008c) have been able to reduce the bias without increasing the asymptotic variance, kept at c06-math-0125, just as happens with the Hill EVI estimator.

6.2.2 MVRB EVI Estimation

To reduce bias, keeping the asymptotic variance at the same level, we merely need to use an adequate “external” and a bit more than consistent estimation of the pair of second-order parameters, c06-math-0126, in (6.7). The MVRB EVI estimators outperform the classical Hill EVI estimators for all c06-math-0127, and among them, we now consider the simplest class by Caeiro et al. (2005), used for value-at-risk (VaR) estimation by Gomes and Pestana (2007b). Such a class, denoted by c06-math-0128, has the functional form

where c06-math-0130 is an adequate consistent estimator of c06-math-0131, with c06-math-0132 and c06-math-0133 based on a number of top order statistics c06-math-0134 usually of a higher order than the number of top order statistics c06-math-0135 used in the EVI estimation, as explained in Sections 6.2.2.1 and 6.2.2.2. For different algorithms for the estimation of c06-math-0136, see Gomes and Pestana (2007a,b).
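As an illustration, we sketch in Python the form in which the simplest MVRB class of Caeiro et al. (2005) is usually written in the literature: the Hill estimate multiplied by the correction factor 1 - beta_hat (n/k)^rho_hat / (1 - rho_hat). Since the display (6.10) is not reproduced here, this functional form is our assumption, as are the names `mvrb`, `beta_hat`, and `rho_hat`. The second-order estimates are taken as given inputs.

```python
import math
import random

def hill(sample, k):
    """Hill estimator: mean of the k log-excesses above X_{n-k:n}."""
    n = len(sample)
    x = sorted(sample)
    log_thr = math.log(x[n - k - 1])
    return sum(math.log(x[n - i]) - log_thr for i in range(1, k + 1)) / k

def mvrb(sample, k, beta_hat, rho_hat):
    """Assumed corrected-Hill form: H(k) * (1 - beta_hat * (n/k)**rho_hat / (1 - rho_hat)).
    beta_hat and rho_hat are externally estimated second-order parameters (rho_hat < 0)."""
    n = len(sample)
    correction = 1.0 - beta_hat / (1.0 - rho_hat) * (n / k) ** rho_hat
    return hill(sample, k) * correction

# Illustration with hypothetical second-order estimates.
rng = random.Random(2)
sample = [(1.0 - rng.random()) ** -0.5 for _ in range(100)]
print(hill(sample, 10), mvrb(sample, 10, beta_hat=1.0, rho_hat=-1.0))
```

With `beta_hat = 0` the correction factor is 1 and the Hill estimate is recovered, which gives a quick sanity check of any implementation.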

6.2.2.1 Estimation of the “shape” Second-order Parameter

We consider the most commonly used c06-math-0137-estimators, the ones studied by Fraga Alves et al. (2003), briefly introduced in the sequel. Given the sample c06-math-0138, the c06-math-0139-estimators by Fraga Alves et al. (2003) are dependent on the statistics

defined for any tuning parameter c06-math-0141 and where

equation

Under mild restrictions on c06-math-0143, that is, if (6.5) holds and c06-math-0144, with c06-math-0145 the function in (6.7), the statistics in (6.11) converge toward c06-math-0146, independently of the tuning parameter c06-math-0147, and we can consequently consider the class of admissible c06-math-0148-estimators:

Under adequate general conditions, and for an appropriate tuning parameter c06-math-0150, the c06-math-0151-estimators in (6.12) show highly stable sample paths as functions of c06-math-0152, the number of top order statistics used, for a range of large c06-math-0153-values. Again, it is sensible to advise practitioners not to choose blindly the value of c06-math-0154 in (6.12). Sample paths of c06-math-0155, as functions of c06-math-0156, for a few values of c06-math-0157, should be drawn, in order to select the value of c06-math-0158 that provides the highest stability for large c06-math-0159, by means of any stability criterion. For the most common stability criterion, see Gomes and Pestana (2007b) and Remark 6.6. The value c06-math-0160, considered in the description of the Algorithm in Section 6.3.2, has proved to be the most adequate choice whenever we are in the region c06-math-0161, a common region in applications, and the region where bias reduction is indeed needed. Distributional properties of the estimators in (6.12) can be found in Fraga Alves et al. (2003). Interesting alternative classes of c06-math-0162-estimators have recently been introduced by Goegebeur et al. (2008, 2010), Ciuperca and Mercadier (2010), and Caeiro and Gomes (2014a, 2015).
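The following Python sketch shows one way to compute such estimates of the shape second-order parameter. It assumes the form in which the Fraga Alves et al. (2003) class is usually written: with M1, M2, M3 the first three empirical moments of the log-excesses, the statistic T is a ratio of differences of M1, M2/2 and M3/6 raised to powers tau, tau/2 and tau/3 (logarithms when tau = 0), and the estimate is -|3(T - 1)/(T - 3)|. These formulas, the function names, and the shifted Pareto test sample are our assumptions, since the displays (6.11) and (6.12) are not reproduced here.

```python
import math
import random

def log_excess_moments(sample, k):
    """First three empirical moments of the k log-excesses above X_{n-k:n}."""
    n = len(sample)
    x = sorted(sample)
    log_thr = math.log(x[n - k - 1])
    excesses = [math.log(x[n - i]) - log_thr for i in range(1, k + 1)]
    return tuple(sum(v ** j for v in excesses) / k for j in (1, 2, 3))

def rho_estimator(sample, k, tau=0.0):
    """Assumed form of the rho-estimator: -|3 (T - 1) / (T - 3)|."""
    m1, m2, m3 = log_excess_moments(sample, k)
    a, b, c = m1, m2 / 2.0, m3 / 6.0
    if tau == 0:
        t = (math.log(a) - math.log(b) / 2.0) / (math.log(b) / 2.0 - math.log(c) / 3.0)
    else:
        t = (a ** tau - b ** (tau / 2.0)) / (b ** (tau / 2.0) - c ** (tau / 3.0))
    return -abs(3.0 * (t - 1.0) / (t - 3.0))

# Illustration: shifted Pareto sample, evaluated at a large k (here n**0.999).
rng = random.Random(3)
sample = [(1.0 - rng.random()) ** -0.5 + 1.0 for _ in range(2000)]
k_large = int(len(sample) ** 0.999)
rho0 = rho_estimator(sample, k_large, tau=0.0)
rho1 = rho_estimator(sample, k_large, tau=1.0)
print(rho0, rho1)
```

Plotting `rho_estimator(sample, k, tau)` against k for a few values of `tau` is a simple way to carry out the stability check recommended above.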

6.2.2.2 Estimation of the “scale” Second-order Parameter

For the estimation of the scale second-order parameter c06-math-0163, on the basis of

equation

we shall consider the estimator in Gomes and Martins (2002):

dependent on an adequate c06-math-0166-estimator, c06-math-0167. It has been advised to compute these second-order parameter estimators at a c06-math-0168-value given by

The estimator c06-math-0170, to be plugged in (6.13), is thus c06-math-0171, with c06-math-0172 and c06-math-0173 given in (6.12) and (6.14), respectively.
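A possible Python sketch of this step follows. It assumes the usual published form of the Gomes and Martins (2002) estimator, built on the scaled log-spacings U_i = i (ln X_{n-i+1:n} - ln X_{n-i:n}) and on weighted means d_k and D_k of powers of i/k, together with the commonly recommended level k1 = floor(n^(1 - eps)) with eps = 0.001. Both are our assumptions, since the corresponding displays are not reproduced here, and all names are hypothetical.

```python
import math

def second_order_level(n, eps=0.001):
    """Assumed choice of the number of top order statistics: floor(n**(1 - eps))."""
    return int(n ** (1.0 - eps))

def beta_estimator(sample, k, rho_hat):
    """Assumed form of the beta-estimator, given an estimate rho_hat < 0."""
    n = len(sample)
    logs = sorted((math.log(v) for v in sample), reverse=True)   # descending
    u = [i * (logs[i - 1] - logs[i]) for i in range(1, k + 1)]   # scaled log-spacings

    def d(alpha):
        return sum((i / k) ** (-alpha) for i in range(1, k + 1)) / k

    def big_d(alpha):
        return sum((i / k) ** (-alpha) * u[i - 1] for i in range(1, k + 1)) / k

    numerator = d(rho_hat) * big_d(0.0) - big_d(rho_hat)
    denominator = d(rho_hat) * big_d(rho_hat) - big_d(2.0 * rho_hat)
    return (k / n) ** rho_hat * numerator / denominator

# Sanity check on an artificial sample whose scaled log-spacings are all equal
# to 0.5 (no second-order deviation), for which the estimate should be about 0.
k = 50
log_values = [0.0] * (k + 1)                 # log_values[i] = ln X_{n-i:n}
for i in range(k, 0, -1):
    log_values[i - 1] = log_values[i] + 0.5 / i
flat_sample = [math.exp(v) for v in log_values]
print(beta_estimator(flat_sample, k, rho_hat=-1.0))
```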

Details on the distributional behavior of the estimator in (6.13) can be found in Gomes and Martins (2002) and more recently in Gomes et al. (2008c) and Caeiro et al. (2009). Again, consistency is achieved for models in (6.7) and c06-math-0208-values such that (6.5) holds and c06-math-0209, as c06-math-0210. Alternative estimators of c06-math-0211 can be found in Caeiro and Gomes (2006) and Gomes et al. (2010). Since c06-math-0212 and c06-math-0213, with c06-math-0214, c06-math-0215, and c06-math-0216 given in (6.12)–(6.14), respectively, depend on c06-math-0217, we often use the notation c06-math-0218. But when we work with c06-math-0219 only, as happens in Section 6.3.2, we shall not use the subscript c06-math-0220. Note however that the Algorithm in Section 6.3.2 can also be used for another fixed choice of c06-math-0221, as well as for a data-driven choice of c06-math-0222 provided by any of the algorithms in Gomes and Pestana (2007a,b), among others.

6.2.3 PORT EVI Estimation

The estimators in (6.4) and (6.10) are scale invariant but not location invariant. In order to achieve location invariance for a class of modified Hill EVI estimators and adequate properties for VaR estimators, Araújo Santos et al. (2006) introduced the so-called PORT methodology. The estimators are then functionals of a sample of excesses over a random level c06-math-0223, c06-math-0224, that is, functionals of the sample

Generally, we can have c06-math-0226, for any c06-math-0227 (the random level is an empirical quantile). If the underlying model c06-math-0228 has a finite left end point, c06-math-0229, we can also use c06-math-0230 (the random level can then be the minimum).

If we think, for instance, of the Hill EVI estimators in (6.4), the new classes of PORT-Hill EVI estimators, theoretically studied in Araújo Santos et al. (2006), and for finite samples in Gomes et al. (2008b), are given by

Similarly, if we think of the MVRB EVI estimators in (6.10), the new classes of PORT-MVRB EVI estimators, studied for finite samples in Gomes et al. (2011a, 2013), are given by

with c06-math-0233 in (6.16), c06-math-0234 and c06-math-0235 any adequate estimator of c06-math-0236, the vector of second-order parameters associated with the shifted model, based on the sample c06-math-0237, in (6.15).

These PORT EVI estimators are thus dependent on a tuning parameter c06-math-0238, 0⩽q<1, that makes them highly flexible. Moreover, they are invariant to changes in both location and scale. Just as in Gomes et al. (2013, 2015a), we shall further include in the algorithm the value c06-math-0239, so that with c06-math-0240, c06-math-0241, c06-math-0242, and c06-math-0243, given in (6.4), (6.10), (6.16), and (6.17), respectively, we can consider that c06-math-0244 and c06-math-0245 for c06-math-0246 (with the notations c06-math-0247, c06-math-0248, so that c06-math-0249, c06-math-0250, c06-math-0251).
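The PORT idea can be illustrated with the following Python sketch of a PORT-Hill estimator: the Hill functional is applied to the excesses over the empirical quantile X_{n_q:n}, with n_q = floor(nq) + 1, and with the sample minimum when q = 0. The names `port_hill` and `n_q`, the indexing convention, and the simulated unit Pareto sample are our assumptions.

```python
import math
import random

def hill(sample, k):
    """Hill estimator: mean of the k log-excesses above X_{n-k:n}."""
    n = len(sample)
    x = sorted(sample)
    log_thr = math.log(x[n - k - 1])
    return sum(math.log(x[n - i]) - log_thr for i in range(1, k + 1)) / k

def port_hill(sample, k, q):
    """Hill estimator computed on the excesses over the empirical quantile
    X_{n_q:n}, n_q = floor(n*q) + 1 (the minimum when q = 0)."""
    if not 0 <= q < 1:
        raise ValueError("q must satisfy 0 <= q < 1")
    n = len(sample)
    x = sorted(sample)
    n_q = math.floor(n * q) + 1              # 1-based rank of the random level
    if not 1 <= k < n - n_q:
        raise ValueError("k must satisfy 1 <= k < n - n_q")
    level = x[n_q - 1]
    excesses = [v - level for v in x[n_q:]]  # sample of excesses over the level
    return hill(excesses, k)

# Illustration: a unit Pareto sample and a shifted and a rescaled copy of it.
rng = random.Random(4)
sample = [1.0 / (1.0 - rng.random()) for _ in range(1000)]
shifted = [v + 5.0 for v in sample]
scaled = [3.0 * v for v in sample]
print(hill(sample, 100), hill(shifted, 100))                      # Hill changes under a shift
print(port_hill(sample, 100, 0.1), port_hill(shifted, 100, 0.1))  # PORT-Hill does not
```

In this sketch the shift lowers every log-excess, so the Hill estimate of the shifted sample is smaller, whereas the PORT-Hill estimates agree up to rounding error.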

Further applications of the PORT methodology can be found in Henriques-Rodrigues et al. (2014, 2015), Caeiro et al. (2016) and Gomes et al. (2016), among others.

6.2.4 Asymptotic Properties of the EVI Estimators

The Hill estimator usually reveals a high asymptotic bias. Indeed, from the results of de Haan and Peng (1998), and with c06-math-0266 denoting a normal random variable with mean value c06-math-0267 and variance c06-math-0268, there exists c06-math-0269 such that

where the bias c06-math-0271 under condition (6.8) can be very large, moderate, or small, going, respectively, to c06-math-0272, a nonnull constant, or 0, as c06-math-0273. This nonnull asymptotic bias, together with a rate of convergence of the order of c06-math-0274, leads to sample paths with a high variance for small c06-math-0275, a high bias for large c06-math-0276, and a very sharp mean square error (MSE) pattern, as a function of c06-math-0277. Under the same conditions as before, c06-math-0278 is asymptotically normal with variance also equal to c06-math-0279 but with a null mean value. Indeed, under the validity of the aforementioned third-order condition in (6.8), related to Pareto-type class of models, we can adequately estimate the vector of second-order parameters c06-math-0280 so that c06-math-0281 outperforms c06-math-0282 for all c06-math-0283. Indeed, and for an adequate c06-math-0284, computed by Caeiro et al. (2009), we can write

We can further summarize the aforementioned results in the following theorem.

For the asymptotic behavior of the PORT-Hill EVI estimators, we refer to Araújo Santos et al. (2006). The full asymptotic behavior of the PORT-MVRB EVI estimators is still under development. It is known that the rate of convergence and asymptotic variance do not change. There are, however, big changes in the bias, but for adequate c06-math-0304-values the PORT-MVRB EVI estimators are indeed MVRB EVI estimators. Contrary to what has been done by Gomes et al. (2015a), we shall thus consider for them the same double-bootstrap Algorithm we used for the MVRB EVI estimation.

6.3 The Bootstrap Methodology in Statistics of Univariate Extremes

The use of bootstrap resampling methodologies has proved promising in the choice of the nuisance tuning or control parameter c06-math-0305 or, equivalently, of the optimal sample fraction, c06-math-0306, in the semiparametric estimation of any parameter of extreme events. If we ask how to choose the tuning parameter c06-math-0307 in the EVI estimation, either through c06-math-0308 or c06-math-0309 or c06-math-0310 or c06-math-0311, c06-math-0312, generally denoted c06-math-0313, we usually consider the estimation of

To obtain estimates of c06-math-0315, one can use the so-called double-bootstrap method based on two related bootstrap samples of size c06-math-0316 and c06-math-0317. Such a method is applied to an adequate auxiliary statistic like

which tends to the well-known value zero and has an asymptotic behavior similar to the one of c06-math-0319 (see Gomes and Oliveira, 2001, among others, for the estimation through c06-math-0320 and Gomes et al., 2012, for the estimation through c06-math-0321). See also Gomes et al. (2015a,b) and Section 6.3.2.

On the basis of (6.20) and (6.21), and with AMSE standing for “asymptotic MSE,” the sum of the asymptotic variance and the squared dominant component of the bias, we get

with c06-math-0323 defined in (6.22). See Theorem 1 of Draisma et al. (1999), for a proof of this result, in the case of c06-math-0324. The proof is similar for the cases of c06-math-0325, as already mentioned by Gomes et al. (2012). Things work more intricately for the PORT-MVRB EVI estimators, and as mentioned in the preceding text, we shall consider an algorithm similar to the one devised for the MVRB EVI estimators in case we are working with c06-math-0326, c06-math-0327, since we are interested in the possible specific value of c06-math-0328 that makes these PORT estimators MVRB EVI estimators. The bootstrap methodology enables us to estimate c06-math-0329, in (6.22), in a way similar to the one used for the classical EVI estimators, on the basis of a consistent estimator of c06-math-0330, in (6.24), and now through the use of an auxiliary statistic like the one in (6.23), a method detailed in Gomes et al. (2011b–2012) for the MVRB EVI estimation. For the sake of simplicity, we shall next describe the methodology for c06-math-0331, but similar formulas work for c06-math-0332 provided that we replace c06-math-0333 by c06-math-0334, c06-math-0335. Indeed, under the aforementioned third-order framework in (6.8),

equation

with c06-math-0337 asymptotically standard normal.

Consequently, denoting c06-math-0338, we have

6.3.1 The Resampling Methodology in Action

How does the resampling methodology then work? Given the sample c06-math-0340 from an unknown model c06-math-0341, and the functional in (6.23), c06-math-0342, c06-math-0343, consider for any c06-math-0344, c06-math-0345, the bootstrap sample

equation

from c06-math-0347, in (6.1), the empirical distribution function associated with the available random sample, c06-math-0348.

Next, associate with the bootstrap sample the corresponding bootstrap auxiliary statistic, c06-math-0349, c06-math-0350. Then, with c06-math-0351,

equation

Consequently, for another sample size c06-math-0353, and for every c06-math-0354,

equation

It is then enough to choose c06-math-0356, in order to have independence of c06-math-0357. If we consider c06-math-0358, that is, c06-math-0359, we have

On the basis of (6.26), we are now able to consistently estimate c06-math-0361 and next c06-math-0362 through (6.25), on the basis of any estimate c06-math-0363 of the second-order parameter c06-math-0364. With c06-math-0365 denoting the sample counterpart of c06-math-0366 and c06-math-0367, an adequate c06-math-0368-estimate, we thus have the c06-math-0369 estimate

with

equation

The adaptive estimate of c06-math-0372 is then given by

equation
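The resampling scheme of this subsection can be sketched in Python for the Hill estimator only. The sketch rests on several assumptions of ours, since the displays are not reproduced here: the auxiliary statistic is taken as T(k) = H(floor(k/2)) - H(k); the two bootstrap sample sizes are n1 = floor(n^a), with a = 0.955 as an arbitrary default, and n2 = floor(n1^2/n) + 1; and the final level is min(n - 1, floor(c k1^2 / k2) + 1), with c = (1 - 2^rho)^(2/(1 - 2 rho)), as usually written for the Hill estimator. All names are hypothetical, and `rho_hat` is a fixed illustrative value rather than an estimate.

```python
import math
import random

def hill_path(sample):
    """Return a list `path` with path[k] = Hill estimate H(k), k = 1..n-1."""
    logs = sorted((math.log(v) for v in sample), reverse=True)   # descending
    path = [0.0] * len(logs)              # index 0 is unused
    running_sum = 0.0
    for k in range(1, len(logs)):
        running_sum += logs[k - 1]        # sum of the k largest log-observations
        path[k] = running_sum / k - logs[k]
    return path

def argmin_bootstrap_mse(data, m, runs, rng):
    """Level k minimizing the bootstrap mean of T*(k)^2,
    where T(k) = H(k // 2) - H(k) is the assumed auxiliary statistic."""
    sq_sums = [0.0] * m
    for _ in range(runs):
        path = hill_path(rng.choices(data, k=m))   # bootstrap sample of size m
        for k in range(2, m):
            t = path[k // 2] - path[k]
            sq_sums[k] += t * t
    return min(range(2, m), key=sq_sums.__getitem__)

def double_bootstrap_hill(data, rho_hat, a=0.955, runs=100, seed=0):
    """Double-bootstrap choice of k for the Hill estimator (sketch)."""
    rng = random.Random(seed)
    n = len(data)
    n1 = int(n ** a)                      # first bootstrap sample size
    n2 = n1 * n1 // n + 1                 # second bootstrap sample size
    k1 = argmin_bootstrap_mse(data, n1, runs, rng)
    k2 = argmin_bootstrap_mse(data, n2, runs, rng)
    c_rho = (1.0 - 2.0 ** rho_hat) ** (2.0 / (1.0 - 2.0 * rho_hat))
    k0 = min(n - 1, int(c_rho * k1 * k1 / k2) + 1)
    return k0, hill_path(data)[k0]

# Illustration: shifted Pareto sample; rho_hat is a fixed illustrative value.
rng = random.Random(5)
data = [(1.0 - rng.random()) ** -0.5 + 1.0 for _ in range(500)]
k0, gamma_hat = double_bootstrap_hill(data, rho_hat=-0.5)
print(k0, gamma_hat)
```

In practice one would plug in a data-driven estimate of the second-order parameter and check the sensitivity of the result to the choice of `a` and to the number of bootstrap runs.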

6.3.2 Adaptive EVI Estimation

In the following Algorithm we include the Hill, the MVRB, the PORT-Hill and the PORT-MVRB EVI estimators in the overall selection.

6.4 Applications to Simulated Data

To illustrate the importance of the PORT-Hill and PORT-MVRB EVI estimation in the field of finance, we refer to Gomes and Pestana (2007b) and Gomes et al. (2013), where, respectively, the MVRB and the PORT-MVRB EVI estimation have been applied to log returns associated with a few sets of financial data. Due to the specificity of such real data sets and to the fact that log returns have often been modeled by a Student-c06-math-0493 or its skewed versions (see Jones and Faddy, 2003, among others), we have simulated a random sample of size c06-math-0494, from a Student's c06-math-0495 model with c06-math-0496 degrees of freedom (c06-math-0497 and c06-math-0498). Due to the specificity of the data (infinite left end point), we have considered for both the PORT-Hill and the PORT-MVRB EVI estimation c06-math-0499-values from 0.15 to 1, with step 0.05. When c06-math-0500, we select the Hill or the MVRB EVI estimates. If c06-math-0501, the PORT methodology is selected. We have further considered c06-math-0502, with c06-math-0503 from 0.950 to 0.995, with step 0.0025, and c06-math-0504.

Figure 6.3 is related to the Student-c06-math-0505 generated sample, and we there present as an illustration of the obtained results the PORT-Hill/Hill and PORT-MVRB/MVRB EVI estimates (a), the c06-math-0506-estimates (b), and the RMSE estimates (c). The notation PORT •/• means that we are playing with both the PORT-• c06-math-0507 and the • c06-math-0508 EVI estimators. We have however been led to a PORT estimator, that is, to c06-math-0509-estimates smaller than 1.


Figure 6.3 PORT-Hill/Hill and PORT-MVRB/MVRB adaptive EVI estimates (a), the c06-math-0510-estimates (b), and the RMSE estimates (c) for the generated Student-c06-math-0511 sample.

6.5 Concluding Remarks

  • For the previous simulated sample, we know the true value of c06-math-0512 and the value 0.25, and we can easily assess the reliability of the estimates provided by the Algorithm in Section 6.3.2, immediately coming to the conclusion that, as expected, the PORT-MVRB methodology provides the more reliable EVI estimation.
  • It is clear that, similarly to what usually happens with the Hill EVI estimators, even the PORT-Hill EVI estimation leads to an overestimation of the EVI. The adaptive PORT-MVRB EVI estimates are generally closer to the target.
  • Moreover, the RMSE estimates associated with the adaptive PORT-MVRB EVI estimates are always below the RMSE estimates associated with the adaptive PORT-Hill, another point in favor of the PORT-MVRB methodology.
  • The case studies performed, including the one used here for illustration, clearly call for a simulation study of the Algorithm and for its application to real data sets. These topics are, however, beyond the scope of this chapter.
  • As a general conclusion, we advise the use of the PORT-MVRB methodology for the estimation of a heavy right tail function.

Acknowledgments

Research partially supported by national funds through FCT—Fundação para a Ciência e a Tecnologia, projects UID/MAT/UI0006/2013 (CEA/UL) and UID/MAT/0297/2013 (CMA/UNL) and postdoc grants SFRH/BPD/77319/2011 and SFRH/BPD/72184/2010.

References

  1. Araújo Santos, P., Fraga Alves, M.I., Gomes, M.I. Peaks over random threshold methodology for tail index and quantile estimation. Revstat 2006;4(3):227–247.
  2. Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G. Tail index estimation and an exponential regression model. Extremes 1999;2:177–200.
  3. Beirlant, J., Caeiro, F., Gomes, M.I. An overview and open research topics in the field of statistics of univariate extremes. Revstat 2012;10(1):1–31.
  4. Bingham, N.H., Goldie, C.M., Teugels, J.L. Regular Variation. Cambridge: Cambridge University Press; 1987.
  5. Caeiro, F., Gomes, M.I. A new class of estimators of a “scale” second order parameter. Extremes 2006;9:193–211, 2007.
  6. Caeiro, F., Gomes, M.I. Minimum-variance reduced-bias tail index and high quantile estimation. Revstat 2008;6(1):1–20.
  7. Caeiro, F., Gomes, M.I. A semi-parametric estimator of a shape second order parameter. In: Pacheco, A., Oliveira, M.R., Santos, R., Paulino, C.D., editors. New Advances in Statistical Modeling and Application, Studies in Theoretical and Applied Statistics, Selected Papers of the Statistical Societies. Berlin and Heidelberg: Springer-Verlag; 2014a. 137–144.
  8. Caeiro, F., Gomes, M.I. On the bootstrap methodology for the estimation of the tail sample fraction. In: Gilli, M., Gonzalez-Rodriguez, G., Nieto-Reyes, A., editors. Proceedings of COMPSTAT 2014. Geneva, Switzerland: The International Statistical Institute/International Association for Statistical Computing; 2014b. 545–552.
  9. Caeiro, F., Gomes, M.I. Bias reduction in the estimation of a shape second-order parameter of a heavy tailed model. J Stat Comput Simul 2015;85(17):3405–3419.
  10. Caeiro, F., Gomes, M.I., Henriques-Rodrigues, L. Reduced-bias tail index estimators under a third order framework. Commun Stat Theory Methods 2009;38(7):1019–1040.
  11. Caeiro, F., Gomes, M.I., Pestana, D. Direct reduction of bias of the classical Hill estimator. Revstat 2005;3(2):111–136.
  12. Caeiro, F., Gomes, M.I., Henriques-Rodrigues, L. A location invariant probability weighted moment EVI estimator. Int J Comput Math 2016;93(4):676–695.
  13. Caers, J., Beirlant, J., Maes, M.A. Statistics for modeling heavy tailed distributions in geology: Part I. Methodology. Math Geol 1999;31:391–410.
  14. Ciuperca, G., Mercadier, C. Semi-parametric estimation for heavy tailed distributions. Extremes 2010;13(1):55–87.
  15. Danielsson, J., de Haan, L., Peng, L., de Vries, C.G. Using a bootstrap method to choose the sample fraction in tail index estimation. J Multivariate Anal 2001;76:226–248.
  16. Davison, A., Hinkley, D.V. Bootstrap Methods and their Application. Cambridge: Cambridge University Press; 1997.
  17. de Haan, L. Slow variation and characterization of domains of attraction. In: de Oliveira, T., editor. Statistical Extremes and Applications. Dordrecht: D. Reidel; 1984. p 31–48.
  18. de Haan, L., Ferreira, A. Extreme Value Theory: An Introduction. New York: Springer Science+Business Media, LLC; 2006.
  19. de Haan, L., Peng, L. Comparison of tail index estimators. Stat Neerl 1998;52:60–70.
  20. Draisma, G., de Haan, L., Peng, L., Pereira, M.T. A bootstrap-based method to achieve optimality in estimating the extreme value index. Extremes 1999;2(4):367–404.
  21. Efron, B. Bootstrap methods: another look at the jackknife. Ann Stat 1979;7(1):1–26.
  22. Efron, B., Tibshirani, R.J. An Introduction to the Bootstrap. Boca Raton (FL): CRC Press; 1994.
  23. Feuerverger, A., Hall, P. Estimating a tail exponent by modelling departure from a Pareto distribution. Ann Stat 1999;27:760–781.
  24. Fisher, R.A., Tippett, L.H.C. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc Cambridge Philos Soc 1928;24:180–190.
  25. Fraga Alves, M.I., Gomes, M.I., de Haan, L. A new class of semi-parametric estimators of the second order parameter. Port Math 2003;60(2):194–213.
  26. Geluk, J., de Haan, L. Regular Variation, Extensions and Tauberian Theorems. Amsterdam, Netherlands: CWI Tract 40, Center for Mathematics and Computer Science; 1987.
  27. Gnedenko, B.V. Sur la distribution limite du terme maximum d'une série aléatoire. Ann Math 1943;44:423–453.
  28. Goegebeur, Y., Beirlant, J., de Wet, T. Linking Pareto-tail kernel goodness-of-fit statistics with tail index at optimal threshold and second order estimation. Revstat 2008;6(1):51–69.
  29. Goegebeur, Y., Beirlant, J., de Wet, T. Kernel estimators for the second order parameter in extreme value statistics. J Stat Plann Inference 2010;140(9):2632–2652.
  30. Gomes, M.I., Guillou, A. Extreme value theory and statistics of univariate extremes: a review. Int Stat Rev 2015;83(2):263–292.
  31. Gomes, M.I., Martins, M.J. “Asymptotically unbiased” estimators of the tail index based on external estimation of the second order parameter. Extremes 2002;5(1):5–31.
  32. Gomes, M.I., Oliveira, O. The bootstrap methodology in Statistical Extremes—choice of the optimal sample fraction. Extremes 2001;4(4):331–358.
  33. Gomes, M.I., Pestana, D. A simple second order reduced-bias' tail index estimator. J Stat Comput Simul 2007a;77(6):487–504.
  34. Gomes, M.I., Pestana, D. A sturdy reduced-bias extreme quantile (VaR) estimator. J Am Stat Assoc 2007b;102(477):280–292.
  35. Gomes, M.I., Martins, M.J., Neves, M.M. Alternatives to a semi-parametric estimator of parameters of rare events—the Jackknife methodology. Extremes 2000;3(3):207–229.
  36. Gomes, M.I., Martins, M.J., Neves, M.M. Improving second order reduced-bias tail index estimation. Revstat 2007a;5(2):177–207.
  37. Gomes, M.I., Reiss, R.-D., Thomas, M. Reduced-bias estimation. In: Reiss, R.-D., Thomas, M., editors. Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields. 3rd ed., Chapter 6. Basel, Boston (MA), Berlin: Birkhäuser Verlag; 2007b. p 189–204.
  38. Gomes, M.I., Canto e Castro, L., Fraga Alves, M.I., Pestana, D. Statistics of extremes for IID data and breakthroughs in the estimation of the extreme value index: Laurens de Haan leading contributions. Extremes 2008a;11(1):3–34.
  39. Gomes, M.I., Fraga Alves, M.I., Araújo Santos, P. PORT Hill and moment estimators for heavy-tailed models. Commun Stat Simul Comput 2008b;37:1281–1306.
  40. Gomes, M.I., de Haan, L., Henriques-Rodrigues, L. Tail index estimation for heavy-tailed models: accommodation of bias in weighted log-excesses. J R Stat Soc B 2008c;70(1):31–52.
  41. Gomes, M.I., Pestana, D., Caeiro, F. A note on the asymptotic variance at optimal levels of a bias-corrected Hill estimator. Stat Probab Lett 2009;79:295–303.
  42. Gomes, M.I., Henriques-Rodrigues, L., Pereira, H., Pestana, D. Tail index and second order parameters' semi-parametric estimation based on the log-excesses. J Stat Comput Simul 2010;80(6):653–666.
  43. Gomes, M.I., Henriques-Rodrigues, L., Miranda, C. Reduced-bias location-invariant extreme value index estimation: a simulation study. Commun Stat Simul Comput 2011a;40(3):424–447.
  44. Gomes, M.I., Mendonça, S., Pestana, D. Adaptive reduced-bias tail index and VaR estimation via the bootstrap methodology. Commun Stat Theory Methods 2011b;40(16):2946–2968.
  45. Gomes, M.I., Figueiredo, F., Neves, M.M. Adaptive estimation of heavy right tails: resampling-based methods in action. Extremes 2012;15:463–489.
  46. Gomes, M.I., Henriques-Rodrigues, L., Fraga Alves, M.I., Manjunath, B.G. Adaptive PORT-MVRB estimation: an empirical comparison of two heuristic algorithms. J Stat Comput Simul 2013;83(6):1129–1144.
  47. Gomes, M.I., Henriques-Rodrigues, L., Figueiredo, F. Resampling-based methodologies in statistics of extremes: environmental and financial applications. In: Bourguignon, J.-P., Jeltsch, R., Adrega Pinto, A., Viana, M., editors. Mathematics of Planet Earth: Energy and Climate, CIM Series in Mathematical Sciences, Chapter 8. Switzerland: Springer-Verlag; 2015a. p 197–215.
  48. Gomes, M.I., Figueiredo, F., Martins, M.J., Neves, M.M. Resampling methodologies and reliable tail estimation. S Afr Stat J 2015b;49:1–20.
  49. Gomes, M.I., Henriques-Rodrigues, L., Manjunath, B.G. Mean-of-order-p location-invariant extreme value index estimation. Revstat 2016;14(3):273–296.
  50. Gumbel, E.J. Statistics of Extremes. New York: Columbia University Press; 1958.
  51. Hall, P. Using the bootstrap to estimate mean squared error and select smoothing parameter in nonparametric problems. J Multivariate Anal 1990;32:177–203.
  52. Hill, B.M. A simple general approach to inference about the tail of a distribution. Ann Stat 1975;3:1163–1174.
  53. Henriques-Rodrigues, L., Gomes, M.I., Fraga Alves, M.I., Neves, C. PORT estimation of a shape second-order parameter. Revstat 2014;12(3):299–328.
  54. Henriques-Rodrigues, L., Gomes, M.I., Manjunath, B.G. Estimation of a scale second-order parameter related to the PORT methodology. J Stat Theory Pract 2015;9(3):571–599.
  55. Jones, M.C., Faddy, M.J. A skew extension of the t-distribution, with applications. J R Stat Soc B 2003;65(1):159–174.
  56. Longin, F. Le choix de la loi des rentabilités d'actifs financiers: les valeurs extrêmes peuvent aider. Finance 1995;16:25–47.
  57. Peng, L. Asymptotically unbiased estimators for the extreme-value index. Stat Probab Lett 1998;38(2):107–115.
  58. Peng, L., Qi, Y. Estimating the first and second order parameters of a heavy tailed distribution. Aust N Z J Stat 2004;46:305–312.
  59. Reiss, R.-D., Thomas, M. Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields. 3rd ed. Basel, Boston (MA), Berlin: Birkhäuser Verlag; 2007.
  60. Resnick, S. Heavy tail modelling and teletraffic data. Ann Stat 1997;25(5):1805–1869.
  61. Weissman, I. Estimation of parameters and large quantiles based on the k largest observations. J Am Stat Assoc 1978;73:812–815.