Chapter 6
Bootstrap Methods in Statistics of Extremes

M. Ivette Gomes¹, Frederico Caeiro², Lígia Henriques-Rodrigues³ and B.G. Manjunath⁴

¹Universidade de Lisboa, FCUL, DEIO and CEAUL, Portugal

²Universidade Nova de Lisboa, FCT and CMA, Portugal

³Universidade de São Paulo, IME and CEAUL, Brazil

⁴Universidade de Lisboa, CEAUL, Portugal

AMS 2010 subject classification. Primary 62G32, 62E20; Secondary 65C05.

AMS 2000 subject classification. Primary 62G32, 62E20; Secondary 62G09, 62G30.

6.1 Introduction

Let $c06-math-0002$ be a random sample from an underlying cumulative distribution function (CDF) $c06-math-0003$ . If we assume that $c06-math-0004$ is known, we can easily estimate the sampling distribution of any estimator $c06-math-0005$ of an unknown parameter $c06-math-0006$ through the use of a Monte Carlo simulation, described in the following algorithm:

S1. For each iteration,
  S1.1 generate random samples $c06-math-0008$ ,
  S1.2 and compute $c06-math-0009$ .
S2. On the basis of the output $c06-math-0010$ , after the $c06-math-0011$ iterations in Step S1, use such a sample to estimate the sampling distribution of $c06-math-0012$ , through either the associated empirical distribution function or any kernel estimate, among others.
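Steps S1 and S2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the unit exponential model, the sample mean as estimator, and the sample and iteration sizes are all hypothetical choices, not taken from the chapter.

```python
import random
import statistics


def monte_carlo_sampling_distribution(sampler, estimator, n, n_iter, seed=0):
    """Simulate n_iter values of estimator(sample) for samples of size n from a known model."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        sample = [sampler(rng) for _ in range(n)]  # Step S1.1: draw a sample from the known model
        values.append(estimator(sample))           # Step S1.2: evaluate the estimator
    return sorted(values)


def empirical_cdf(values, x):
    """Step S2: empirical distribution function of the simulated estimates at x."""
    return sum(v <= x for v in values) / len(values)


# Hypothetical illustration: sample mean under a unit exponential model.
draws = monte_carlo_sampling_distribution(
    sampler=lambda rng: rng.expovariate(1.0),
    estimator=statistics.fmean,
    n=50,
    n_iter=2000,
)
```

A kernel estimate could replace `empirical_cdf` in Step S2 without changing the simulation loop.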

If $c06-math-0013$ goes to infinity, something not achievable in practice, we should then get a perfect match to the theoretical calculation, if available, that is, the Monte Carlo error should disappear. But $c06-math-0014$ is usually unknown. How to proceed? The use of the bootstrap methodology is a possible way.

Bootstrapping (Efron, 1979) is essentially a computer-based and computer-intensive method for assigning measures of accuracy to sample estimates (see Efron and Tibshirani, 1994; Davison and Hinkley, 1997, among others). Concomitantly, this technique also allows estimation of the sampling distribution of almost any statistic using only very simple resampling methods, based on the observed value of the empirical distribution function, given by

6.1

We can replace in the previously sketched algorithm $c06-math-0016$ by $c06-math-0017$ , the empirical distribution function associated with the original observed data, $c06-math-0018$ , which puts mass $c06-math-0019$ on each of the $c06-math-0020$ , generating with replacement $c06-math-0021$ , in Step S1.1 of the algorithm in the preceding text, computing $c06-math-0022$ , $c06-math-0023$ , in Step S1.2, and using next such a sample in Step S2.
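A minimal sketch of this replacement, assuming a generic estimator passed as a function: resampling with replacement from the observed data is the same as sampling from the empirical distribution function. The names `bootstrap_replicates`, `observed`, and the choice of the median as statistic are hypothetical.

```python
import random
import statistics


def bootstrap_replicates(data, estimator, n_boot, seed=0):
    """Evaluate estimator on n_boot resamples drawn with replacement from data."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # each observed value has probability 1/n per draw
        replicates.append(estimator(resample))
    return replicates


# Hypothetical observed sample; in practice this is the data at hand.
_rng = random.Random(1)
observed = [_rng.expovariate(1.0) for _ in range(100)]

reps = bootstrap_replicates(observed, statistics.median, n_boot=1000)
boot_se = statistics.stdev(reps)  # bootstrap estimate of the standard error of the median
```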

The main goal of this chapter is to enhance the role of the bootstrap methodology in the field of statistics of univariate extremes, where the bootstrap has been commonly used in the choice of the number $c06-math-0024$ of top order statistics or of the optimal sample fraction, $c06-math-0025$ , to be taken in the semiparametric estimation of a parameter of extreme events. For an asymptotically consistent choice of the threshold to use in the adaptive estimation of a positive extreme value index (EVI), $c06-math-0026$ , the primary parameter in statistics of extremes, we suggest and discuss a double-bootstrap algorithm. In such an algorithm, apart from the classical Hill (1975) and peaks over random threshold (PORT)-Hill EVI estimators (Araújo Santos et al., 2006), we consider a class of minimum-variance reduced-bias (MVRB) EVI estimators, the simplest one in Caeiro et al. (2005), and the associated PORT-MVRB EVI estimators (Gomes et al., 2011a, 2013). Other bootstrap methods for the choice of $c06-math-0027$ can be found in Hall (1990), Longin (1995), Caers et al. (1999), Draisma et al. (1999), Danielsson et al. (2001), and Gomes and Oliveira (2001), among others. For a recent comparison between the simple-bootstrap and the double-bootstrap methodology, see Caeiro and Gomes (2014b), where an improved version of Hall's bootstrap methodology was introduced.

After providing, in Section 6.2, a few technical details in the area of extreme value theory (EVT), related to the EVI estimators under consideration in this chapter, we shall briefly discuss, in Section 6.3, the main ideas behind the bootstrap methodology and optimal sample fraction estimation. Along the lines of Gomes et al. (2011b–2012, 2015a), we propose an algorithm for the adaptive consistent estimation of a positive EVI, through the use of computer-intensive resampling methods. The Algorithm is described for the Hill EVI estimator and the associated PORT-Hill, MVRB, and PORT-MVRB EVI estimators, but it can work similarly for the estimation of other parameters of extreme events, like a high quantile, the probability of exceedance, or the return period of a high level. The associated R code for the adaptive EVI estimation is available upon request. Section 6.4 is entirely dedicated to the application of the Algorithm to three simulated samples. Finally, in Section 6.5, we draw some overall conclusions.

6.2 A Few Details on EVT

The key results obtained by Fisher and Tippett (1928) on the possible limiting laws of the sample maxima, formalized by Gnedenko (1943), and used by Gumbel (1958) for applications of EVT in engineering subjects are some of the key tools that led to the way statistical EVT has been exploding in the last decades. In this chapter, we focus on the behavior of extreme values of a data set, dealing with maximum values and other top order statistics in a univariate framework, working thus in the field of statistics of extremes.

Let us assume that we have access to a random sample $c06-math-0028$ of independent, identically distributed, or possibly stationary and weakly dependent random variables from an underlying model $c06-math-0029$ , and let us denote by $c06-math-0030$ the sample of associated ascending order statistics. As usual, let us further assume that it is possible to linearly normalize the sequence of maximum values, $c06-math-0031$ , so that we get a nondegenerate limit. Then (Gnedenko, 1943), that limiting random variable has a CDF of the type of the extreme value distribution, given by

6.2

and $c06-math-0033$ is the so-called EVI, the primary parameter in statistics of extremes. We then say that $c06-math-0034$ is in the max-domain of attraction of $c06-math-0035$ , in (6.2), and use the notation $c06-math-0036$ . The EVI essentially measures the weight of the right tail function, $c06-math-0037$ . If $c06-math-0038$ , the right tail is short and light, since $c06-math-0039$ necessarily has a finite right end point, that is, $c06-math-0040$ is finite. If $c06-math-0041$ , the right tail is heavy and of a negative polynomial type, and $c06-math-0042$ has an infinite right end point. A positive EVI is also often called the tail index. If $c06-math-0043$ , the right tail is of an exponential type, and the right end point can then be either finite or infinite.

Slightly more restrictively than the full max-domain of attraction of the extreme value distribution, we now consider a positive EVI, that is, we work with heavy-tailed models $c06-math-0044$ in $c06-math-0045$ . Heavy-tailed models appear often in practice in fields like bibliometrics, biostatistics, finance, insurance, and telecommunications. Power laws, such as the Pareto distribution and Zipf's law, have been observed a few decades ago in some important phenomena in economics and biology and have seriously attracted scientists in recent years. As usual, we shall further use the notations $c06-math-0046$ for the generalized inverse function of $c06-math-0047$ and $c06-math-0048$ for the class of regularly varying functions at infinity with an index of regular variation $c06-math-0049$ , that is, positive Borel measurable functions $c06-math-0050$ such that $c06-math-0051$ , as $c06-math-0052$ , for all $c06-math-0053$ (see Bingham et al., 1987, for details on regular variation). Let us further use the notation $c06-math-0054$ for the tail quantile function. For heavy-tailed models we have the validity of the following first-order conditions:

6.3

The first necessary and sufficient condition in the preceding text, related to the right tail function behavior, was proved by Gnedenko (1943), and the second one, related to the tail quantile function behavior, was proved by de Haan (1984).

For these heavy-tailed models, and given a sample $c06-math-0056$ , the classical EVI estimators are Hill estimators (Hill, 1975), with the functional expression

6.4

They are thus the average of the $c06-math-0058$ log-excesses, $c06-math-0059$ , above the random level or threshold $c06-math-0060$ . To have consistency of Hill EVI estimators, we need to have $c06-math-0061$ , and such a random threshold $c06-math-0062$ needs further to be an intermediate order statistic, that is, we need to have

6.5

if we want to have consistent EVI estimation in the whole $c06-math-0064$ . Indeed, under any of the first-order frameworks in (6.3), the log-excesses, $c06-math-0065$ , are approximately the $c06-math-0066$ order statistics of an exponential sample of size $c06-math-0067$ , with mean value $c06-math-0068$ , hence the reason for the EVI estimators in (6.4).
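As an illustration of the description above (the average of the log-excesses of the top order statistics over an intermediate order statistic), the following sketch computes a Hill estimate in its standard textbook form. The function name `hill`, the strict Pareto test model with EVI 0.5, and the values of the sample size and of the number of top order statistics are hypothetical; the sketch should be checked against (6.4).

```python
import math
import random


def hill(sample, k):
    """Hill estimate from the k largest observations of a positive sample."""
    x = sorted(sample)
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[n - k - 1]  # random threshold: the (k+1)-th largest observation
    # Average of the k log-excesses over the threshold.
    return sum(math.log(x[n - i] / threshold) for i in range(1, k + 1)) / k


# Hypothetical check on a strict Pareto sample with EVI equal to 0.5.
rng = random.Random(42)
data = [(1.0 - rng.random()) ** -0.5 for _ in range(5000)]
gamma_hat = hill(data, 500)
```

For a strict Pareto model the log-excesses are exactly exponential, so the estimate should be close to 0.5 for any admissible number of top order statistics.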

Under adequate second-order conditions that rule the rate of convergence in any of the first-order conditions in (6.3), Hill estimators, $c06-math-0069$ , have usually a high asymptotic bias, and recently, several authors have considered different ways of reducing bias (see the overviews in Gomes et al., 2007b, Chapter 6 of Reiss and Thomas, 2007; Gomes et al., 2008a; Beirlant et al., 2012; Gomes and Guillou, 2015). A simple class of MVRB EVI estimators is the class studied in Caeiro et al. (2005), to be introduced in Section 6.2.2. These MVRB EVI estimators depend on the adequate estimation of second-order parameters, and the kind of second-order parameter estimation that enables the building of MVRB EVI estimators, that is, EVI estimators that outperform the Hill estimator for all $c06-math-0070$ , is sketched in Sections 6.2.1 and 6.2.2.

Both Hill and MVRB EVI estimators are invariant to changes in scale but not invariant to changes in location. And particularly the Hill EVI estimators can suffer drastic changes when we induce an arbitrary shift in the data, as can be seen in Figure 6.1.

Figure 6.1 Hill plots, denoted $c06-math-0071$ , associated with unit Pareto samples of size $c06-math-0072$ , from the model $c06-math-0073$ , for $c06-math-0074$ and $c06-math-0075$ .

Indeed, even if a Hill plot (a function of $c06-math-0076$ vs $c06-math-0077$ ) looks stable, as happens in Figure 6.1 with the $c06-math-0078$ sample path (where the data, $c06-math-0079$ , $c06-math-0080$ , come from a unit standard Pareto CDF, $c06-math-0081$ , for $c06-math-0082$ ), we easily arrive at the so-called Hill horror plots, a terminology used in Resnick (1997), when we induce a shift in the data. This can also be seen in Figure 6.1 (look now at $c06-math-0083$ ), where we present the Hill plot associated with the shifted sample $c06-math-0084$ , from the CDF $c06-math-0085$ , now for $c06-math-0086$ . This led Araújo Santos et al. (2006) to introduce the so-called PORT methodology, to be sketched in Section 6.2.3. The asymptotic behavior of the EVI estimators under consideration is discussed in Section 6.2.4.
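The scale invariance and the sensitivity to location can be checked numerically. The sketch below uses the standard form of the Hill estimator on a simulated unit Pareto sample; the shift value 5, the scale factor 3, the sample size, and the grid of values of the number of top order statistics are hypothetical and are not the ones used in Figure 6.1.

```python
import math
import random


def hill(sample, k):
    """Hill estimate from the k largest observations of a positive sample."""
    x = sorted(sample)
    n = len(x)
    threshold = x[n - k - 1]
    return sum(math.log(x[n - i] / threshold) for i in range(1, k + 1)) / k


rng = random.Random(2016)
pareto = [1.0 / (1.0 - rng.random()) for _ in range(1000)]  # unit Pareto, EVI equal to 1
shifted = [x + 5.0 for x in pareto]                         # hypothetical location shift
scaled = [3.0 * x for x in pareto]                          # hypothetical change of scale

ks = (25, 50, 100, 200, 400)
path_original = [hill(pareto, k) for k in ks]
path_shifted = [hill(shifted, k) for k in ks]
path_scaled = [hill(scaled, k) for k in ks]
```

A positive shift shrinks every ratio between a top order statistic and the threshold, so each shifted estimate falls below the corresponding unshifted one, and the gap tends to widen as more order statistics are used.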

6.2.1 Second-Order Reduced-Bias EVI Estimation

For consistent semiparametric EVI estimation, in the whole $c06-math-0087$ , we have already noticed that we merely need to work with adequate functionals, dependent on an intermediate tuning or control parameter $c06-math-0088$ , the number of top order statistics involved in the estimation, that is, (6.5) should hold. To obtain full information on the nondegenerate asymptotic behavior of semiparametric EVI estimators, we often need to further assume a second-order condition, ruling the rate of convergence in any of the first-order conditions, in (6.3). It is often assumed that there exists a function $c06-math-0089$ , such that

6.6

Then, we have $c06-math-0091$ . Moreover, if the limit in the left-hand side of (6.6) exists, we can choose $c06-math-0092$ so that such a limit is necessarily equal to the previously defined $c06-math-0093$ function (Geluk and de Haan, 1987).

Whenever dealing with reduced-bias estimators of parameters of extreme events, and essentially due to technical reasons, it is common to slightly restrict the domain of attraction, $c06-math-0094$ , and to consider a Pareto-type class of models, assuming that, with $c06-math-0095$ , $c06-math-0096$ , $c06-math-0097$ , and as $c06-math-0098$ ,

6.7

The class in (6.7) is however a wide class of models that contains most of the heavy-tailed parents useful in applications, like the Fréchet, the generalized Pareto, and the Student- $c06-math-0100$ , with $c06-math-0101$ degrees of freedom. For Fréchet parents, we get $c06-math-0102$ and $c06-math-0103$ in (6.7). For a generalized Pareto distribution, $c06-math-0104$ , with $c06-math-0105$ given in (6.2), we get $c06-math-0106$ and $c06-math-0107$ . For Student- $c06-math-0108$ parents, we get $c06-math-0109$ and $c06-math-0110$ . For further details and an explicit expression of $c06-math-0111$ as a function of $c06-math-0112$ , see Caeiro and Gomes (2008), among others. Note that the validity of (6.6) with $c06-math-0113$ is equivalent to (6.7). To obtain information on the bias of MVRB EVI estimators, it is even common to slightly restrict the class of models in (6.7), further assuming the following third-order condition:

6.8

as $c06-math-0115$ , with $c06-math-0116$ . All the aforementioned models still belong to this class. Slightly more generally, we could have assumed a general third-order condition, ruling now the rate of convergence in the second-order condition in (6.6), which guarantees that, for all $c06-math-0117$ ,

6.9

where $c06-math-0119$ must then be in $c06-math-0120$ . Equation (6.8) is equivalent to equation (6.9) with $c06-math-0121$ . Further details on the topic can be found in de Haan and Ferreira (2006).

Provided that (6.5) and (6.6) hold, Hill EVI estimators, $c06-math-0122$ , have usually a high asymptotic bias. The adequate accommodation of this bias has recently been extensively addressed. Among the pioneering papers, we mention Peng (1998), Beirlant et al. (1999), Feuerverger and Hall (1999), and Gomes et al. (2000). In these papers, authors are led to reduced-bias EVI estimators, with asymptotic variances larger than or equal to $c06-math-0123$ , where $c06-math-0124$ is the aforementioned “shape” second-order parameter in (6.6). Recently, as sketched in Section 6.2.2, Caeiro et al. (2005) and Gomes et al. (2007a, 2008c) have been able to reduce the bias without increasing the asymptotic variance, kept at $c06-math-0125$ , just as happens with the Hill EVI estimator.

6.2.2 MVRB EVI Estimation

To reduce bias, keeping the asymptotic variance at the same level, we merely need to use an adequate “external” and a bit more than consistent estimation of the pair of second-order parameters, $c06-math-0126$ , in (6.7). The MVRB EVI estimators outperform the classical Hill EVI estimators for all $c06-math-0127$ , and among them, we now consider the simplest class by Caeiro et al. (2005), used for value-at-risk (VaR) estimation by Gomes and Pestana (2007b). Such a class, denoted by $c06-math-0128$ , has the functional form

6.10

where $c06-math-0130$ is an adequate consistent estimator of $c06-math-0131$ , with $c06-math-0132$ and $c06-math-0133$ based on a number of top order statistics $c06-math-0134$ usually of a higher order than the number of top order statistics $c06-math-0135$ used in the EVI estimation, as explained in Sections 6.2.1 and 6.2.2. For different algorithms for the estimation of $c06-math-0136$ , see Gomes and Pestana (2007a,b).
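A hedged sketch of the bias correction follows. It assumes the form usually attributed to Caeiro et al. (2005), namely the Hill estimate multiplied by one minus beta times (n/k) to the power rho, divided by one minus rho, and should be checked against (6.10). The names `mvrb`, `beta_hat`, and `rho_hat` are hypothetical, and the plug-in values 0.5 and -1 used below are illustrative constants, not estimates computed from the data.

```python
import math
import random


def hill(sample, k):
    """Hill estimate from the k largest observations of a positive sample."""
    x = sorted(sample)
    n = len(x)
    threshold = x[n - k - 1]
    return sum(math.log(x[n - i] / threshold) for i in range(1, k + 1)) / k


def mvrb(sample, k, beta_hat, rho_hat):
    """Bias-corrected Hill estimate (assumed form, see the lead-in) with external beta_hat, rho_hat."""
    n = len(sample)
    correction = beta_hat * (n / k) ** rho_hat / (1.0 - rho_hat)
    return hill(sample, k) * (1.0 - correction)


# Hypothetical illustration on a simulated Frechet sample with EVI equal to 1.
rng = random.Random(3)
n, k = 2000, 400
frechet = [rng.expovariate(1.0) ** -1.0 for _ in range(n)]
hill_value = hill(frechet, k)
mvrb_value = mvrb(frechet, k, beta_hat=0.5, rho_hat=-1.0)
```

In practice `beta_hat` and `rho_hat` would come from the external second-order parameter estimation described in the text, computed at a level larger than `k`.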

6.2.2.1 Estimation of the “shape” Second-order Parameter

We consider the most commonly used $c06-math-0137$ -estimators, the ones studied by Fraga Alves et al. (2003), briefly introduced in the sequel. Given the sample $c06-math-0138$ , the $c06-math-0139$ -estimators by Fraga Alves et al. (2003) are dependent on the statistics

6.11

defined for any tuning parameter $c06-math-0141$ and where

Under mild restrictions on $c06-math-0143$ , that is, if (6.5) holds and $c06-math-0144$ , with $c06-math-0145$ the function in (6.7), the statistics in (6.11) converge toward $c06-math-0146$ , independently of the tuning parameter $c06-math-0147$ , and we can consequently consider the class of admissible $c06-math-0148$ -estimators:

6.12

Under adequate general conditions, and for an appropriate tuning parameter $c06-math-0150$ , the $c06-math-0151$ -estimators in (6.12) show highly stable sample paths as functions of $c06-math-0152$ , the number of top order statistics used, for a range of large $c06-math-0153$ -values. Again, it is sensible to advise practitioners not to choose the value of $c06-math-0154$ in (6.12) blindly. Sample paths of $c06-math-0155$ , as functions of $c06-math-0156$ , for a few values of $c06-math-0157$ , should be drawn, in order to select the value of $c06-math-0158$ that provides higher stability for large $c06-math-0159$ , by means of any stability criterion. For the most common stability criterion, see Gomes and Pestana (2007b) and Remark 6.6. The value $c06-math-0160$ , considered in the description of the Algorithm in Section 6.3.2, has proved to be the most adequate choice whenever we are in the region $c06-math-0161$ , a common region in applications, and the region where bias reduction is indeed needed. Distributional properties of the estimators in (6.12) can be found in Fraga Alves et al. (2003). Interesting alternative classes of $c06-math-0162$ -estimators have recently been introduced by Goegebeur et al. (2008, 2010), Ciuperca and Mercadier (2010), and Caeiro and Gomes (2014a, 2015).

6.2.2.2 Estimation of the “scale” Second-order Parameter

For the estimation of the scale second-order parameter $c06-math-0163$ , on the basis of

we shall consider the estimator in Gomes and Martins (2002):

6.13

dependent on an adequate $c06-math-0166$ -estimator, $c06-math-0167$ . It has been advised to compute these second-order parameter estimators at a $c06-math-0168$ -value given by

6.14

The estimator $c06-math-0170$ , to be plugged in (6.13), is thus $c06-math-0171$ , with $c06-math-0172$ and $c06-math-0173$ given in (6.12) and (6.14), respectively.

Remark 6.1

Note that only the external estimation of both $c06-math-0174$ and $c06-math-0175$ at an adequately chosen level $c06-math-0176$ and the EVI-estimation at a level $c06-math-0177$ , or at a specific value $c06-math-0178$ , can lead to an MVRB EVI-estimator, with an asymptotic variance $c06-math-0179$ . Such a choice of $c06-math-0180$ is theoretically possible, as shown in Gomes et al. (2009) and Caeiro et al. (2009), but under conditions difficult to guarantee in practice. As a compromise between theoretical and practical results, we have so far advised any choice $c06-math-0181$ , with $c06-math-0182$ small (see Caeiro et al., 2005, 2009; Gomes et al., 2007a,b, 2008c, among others). With the choice of $c06-math-0185$ in (6.14), condition (6.5) obviously holds, and whenever $c06-math-0186$ , as $c06-math-0187$ (an almost irrelevant restriction, from a practical point of view), we get $c06-math-0190$ , a condition needed in order not to have any increase in the asymptotic variance of the bias-corrected Hill EVI-estimator in equation (6.10), compared with that of the Hill EVI-estimator, in (6.4).

Details on the distributional behavior of the estimator in (6.13) can be found in Gomes and Martins (2002) and more recently in Gomes et al. (2008c) and Caeiro et al. (2009). Again, consistency is achieved for models in (6.7) and $c06-math-0208$ -values such that (6.5) holds and $c06-math-0209$ , as $c06-math-0210$ . Alternative estimators of $c06-math-0211$ can be found in Caeiro and Gomes (2006) and Gomes et al. (2010). Because $c06-math-0212$ and $c06-math-0213$ , with $c06-math-0214$ , $c06-math-0215$ , and $c06-math-0216$ given in (6.12)–(6.14), respectively, depend on $c06-math-0217$ , we often use the notation $c06-math-0218$ . But when we work with $c06-math-0219$ only, as happens in Section 6.3.2, we shall not use the subscript $c06-math-0220$ . Note however that the Algorithm in Section 6.3.2 can also be used for another fixed choice of $c06-math-0221$ , as well as for a data-driven choice of $c06-math-0222$ provided by any of the algorithms in Gomes and Pestana (2007a,b), among others.

6.2.3 PORT EVI Estimation

The estimators in (6.4) and (6.10) are scale invariant but not location invariant. In order to achieve location invariance for a class of modified Hill EVI estimators and adequate properties for VaR estimators, Araújo Santos et al. (2006) introduced the so-called PORT methodology. The estimators are then functionals of a sample of excesses over a random level $c06-math-0223$ , $c06-math-0224$ , that is, functionals of the sample

6.15

Generally, we can have $c06-math-0226$ , for any $c06-math-0227$ (the random level is an empirical quantile). If the underlying model $c06-math-0228$ has a finite left end point, $c06-math-0229$ , we can also use $c06-math-0230$ (the random level can then be the minimum).

If we think, for instance, of Hill EVI estimators, in (6.4), the new classes of PORT-Hill EVI estimators, theoretically studied in Araújo Santos et al. (2006), and for finite samples in Gomes et al. (2008b), are given by

6.16

Similarly, if we think of the MVRB EVI estimators, in (6.10), the new classes of PORT-MVRB EVI estimators, studied for finite samples in Gomes et al. (2011a, 2013), are given by

6.17

with $c06-math-0233$ in (6.16), $c06-math-0234$ and $c06-math-0235$ any adequate estimator of $c06-math-0236$ , the vector of second-order parameters associated with the shifted model, based on the sample $c06-math-0237$ , in (6.15).

These PORT EVI estimators are thus dependent on a tuning parameter $c06-math-0238$ , $0 \le q < 1$ , that makes them highly flexible. Moreover, they are invariant to changes in both location and scale. Just as in Gomes et al. (2013, 2015a), we shall further include in the algorithm the value $c06-math-0239$ , so that with $c06-math-0240$ , $c06-math-0241$ , $c06-math-0242$ , and $c06-math-0243$ , given in (6.4), (6.10), (6.16), and (6.17), respectively, we can consider that $c06-math-0244$ and $c06-math-0245$ for $c06-math-0246$ (with the notations $c06-math-0247$ , $c06-math-0248$ , so that $c06-math-0249$ , $c06-math-0250$ , $c06-math-0251$ ).
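The location and scale invariance of a PORT-Hill estimate can be illustrated as follows. The sketch assumes that the random level is the order statistic of rank floor(nq) + 1, the convention commonly used in the PORT literature, and it should be checked against (6.15) and (6.16). The function name `port_hill` and the values chosen for the tuning parameter, the shift, and the scale factor are hypothetical.

```python
import math
import random


def port_hill(sample, k, q):
    """Hill-type estimate computed on the excesses over an empirical quantile (assumed rank floor(n*q) + 1)."""
    x = sorted(sample)
    n = len(x)
    n_q = math.floor(n * q) + 1  # rank of the random level; q = 0 gives the sample minimum
    if not 1 <= k < n - n_q:
        raise ValueError("k must satisfy 1 <= k < n - n_q")
    level = x[n_q - 1]
    threshold = x[n - k - 1] - level
    return sum(math.log((x[n - i] - level) / threshold) for i in range(1, k + 1)) / k


rng = random.Random(11)
pareto = [1.0 / (1.0 - rng.random()) for _ in range(1000)]
base = port_hill(pareto, k=100, q=0.25)
after_shift = port_hill([x + 7.0 for x in pareto], k=100, q=0.25)  # hypothetical shift
after_scale = port_hill([3.0 * x for x in pareto], k=100, q=0.25)  # hypothetical scale
```

Subtracting the empirical quantile removes any common shift before the logarithms are taken, which is why the three values agree up to rounding.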

Further applications of the PORT methodology can be found in Henriques-Rodrigues et al. (2014, 2015), Caeiro et al. (2016) and Gomes et al. (2016), among others.

6.2.4 Asymptotic Properties of the EVI Estimators

The Hill estimator usually reveals a high asymptotic bias. Indeed, from the results of de Haan and Peng (1998), and with $c06-math-0266$ denoting a normal random variable with mean value $c06-math-0267$ and variance $c06-math-0268$ , there exists $c06-math-0269$ such that

6.20

where the bias $c06-math-0271$ under condition (6.8) can be very large, moderate, or small, going, respectively, to $c06-math-0272$ , a nonnull constant, or 0, as $c06-math-0273$ . This nonnull asymptotic bias, together with a rate of convergence of the order of $c06-math-0274$ , leads to sample paths with a high variance for small $c06-math-0275$ , a high bias for large $c06-math-0276$ , and a very sharp mean square error (MSE) pattern, as a function of $c06-math-0277$ . Under the same conditions as before, $c06-math-0278$ is asymptotically normal with variance also equal to $c06-math-0279$ but with a null mean value. Indeed, under the validity of the aforementioned third-order condition in (6.8), related to Pareto-type class of models, we can adequately estimate the vector of second-order parameters $c06-math-0280$ so that $c06-math-0281$ outperforms $c06-math-0282$ for all $c06-math-0283$ . Indeed, and for an adequate $c06-math-0284$ , computed by Caeiro et al. (2009), we can write

6.21

We can further summarize the aforementioned results in the following theorem.

For the asymptotic behavior of the PORT-Hill EVI estimators, we refer to Araújo Santos et al. (2006). The full asymptotic behavior of the PORT-MVRB EVI estimators is still under development. It is known that the rate of convergence and asymptotic variance do not change. There are, however, big changes in the bias, but for adequate $c06-math-0304$ -values the PORT-MVRB EVI estimators are indeed MVRB EVI estimators. Contrary to what was done by Gomes et al. (2015a), we shall thus consider for them the same double-bootstrap Algorithm we used for the MVRB EVI estimation.

6.3 The Bootstrap Methodology in Statistics of Univariate Extremes

The use of bootstrap resampling methodologies has proved promising in the choice of the nuisance tuning or control parameter $c06-math-0305$ or equivalently of the optimal sample fraction, $c06-math-0306$ , in the semiparametric estimation of any parameter of extreme events. If we ask how to choose the tuning parameter $c06-math-0307$ in the EVI estimation, either through $c06-math-0308$ or $c06-math-0309$ or $c06-math-0310$ or $c06-math-0311$ , $c06-math-0312$ , generally denoted $c06-math-0313$ , we usually consider the estimation of

6.22

To obtain estimates of $c06-math-0315$ , one can use the so-called double-bootstrap method based on two related bootstrap samples of size $c06-math-0316$ and $c06-math-0317$ . Such a method is applied to an adequate auxiliary statistic like

6.23

which tends to the well-known value zero and has an asymptotic behavior similar to the one of $c06-math-0319$ (see Gomes and Oliveira, 2001, among others, for the estimation through $c06-math-0320$ and Gomes et al., 2012, for the estimation through $c06-math-0321$ ). See also Gomes et al. (2015a,b) and Section 6.3.2.

On the basis of (6.20) and (6.21), and with AMSE standing for “asymptotic MSE,” the sum of the asymptotic variance and the squared dominant component of the bias, we get

6.24

with $c06-math-0323$ defined in (6.22). See Theorem 1 of Draisma et al. (1999), for a proof of this result, in the case of $c06-math-0324$ . The proof is similar for the cases of $c06-math-0325$ , as already mentioned by Gomes et al. (2012). Things work more intricately for the PORT-MVRB EVI estimators, and as mentioned in the preceding text, we shall consider an algorithm similar to the one devised for the MVRB EVI estimators in case we are working with $c06-math-0326$ , $c06-math-0327$ , since we are interested in the possible specific value of $c06-math-0328$ that makes these PORT estimators MVRB EVI estimators. The bootstrap methodology enables us to estimate $c06-math-0329$ , in (6.22), in a way similar to the one used for the classical EVI estimators, on the basis of a consistent estimator of $c06-math-0330$ , in (6.24), and now through the use of an auxiliary statistic like the one in (6.23), a method detailed in Gomes et al. (2011b–2012) for the MVRB EVI estimation. For the sake of simplicity, we shall next describe the methodology for $c06-math-0331$ , but similar formulas work for $c06-math-0332$ provided that we replace $c06-math-0333$ by $c06-math-0334$ , $c06-math-0335$ . Indeed, under the aforementioned third-order framework in (6.8),

with $c06-math-0337$ asymptotically standard normal.

Consequently, denoting $c06-math-0338$ , we have

6.25

6.3.1 The Resampling Methodology in Action

How does the resampling methodology then work? Given the sample $c06-math-0340$ from an unknown model $c06-math-0341$ , and the functional in (6.23), $c06-math-0342$ , $c06-math-0343$ , consider for any $c06-math-0344$ , $c06-math-0345$ , the bootstrap sample

from $c06-math-0347$ , in (6.1), the empirical distribution function associated with the available random sample, $c06-math-0348$ .

Next, associate with the bootstrap sample the corresponding bootstrap auxiliary statistic, $c06-math-0349$ , $c06-math-0350$ . Then, with $c06-math-0351$ ,

Consequently, for another sample size $c06-math-0353$ , and for every $c06-math-0354$ ,

It is then enough to choose $c06-math-0356$ , in order to have independence of $c06-math-0357$ . If we consider $c06-math-0358$ , that is, $c06-math-0359$ , we have

6.26

On the basis of (6.26), we are now able to consistently estimate $c06-math-0361$ and next $c06-math-0362$ through (6.25), on the basis of any estimate $c06-math-0363$ of the second-order parameter $c06-math-0364$ . With $c06-math-0365$ denoting the sample counterpart of $c06-math-0366$ and $c06-math-0367$ , an adequate $c06-math-0368$ -estimate, we thus have the $c06-math-0369$ estimate

6.27

with

The adaptive estimate of $c06-math-0372$ is then given by

6.3.2 Adaptive EVI Estimation

In the following Algorithm we include the Hill, the MVRB, the PORT-Hill and the PORT-MVRB EVI estimators in the overall selection.

Algorithm

Adaptive bootstrap estimation of $c06-math-0374$

1. Consider a finite set $c06-math-0375$ with values in $c06-math-0376$ and define $c06-math-0377$ . For example, if $c06-math-0378$ has a finite left end point, we can select $c06-math-0379$ . On the other hand, if $c06-math-0380$ has an infinite left end point, we should not select values close to zero.
2. Given an observed sample, execute the following steps for each value in the set defined in Step 1:
  2.1 Obtain the sample $c06-math-0383$ , in (6.15). If $c06-math-0384$ , $c06-math-0385$ .
  2.2 Compute, for the tuning parameter $c06-math-0386$ , the observed values of $c06-math-0387$ , with $c06-math-0388$ defined in (6.12).
  2.3 Work with $c06-math-0389$ and $c06-math-0390$ , with $c06-math-0391$ and $c06-math-0392$ given in (6.13) and (6.14), respectively.
  2.4 Compute $c06-math-0393$ , in (6.16), and $c06-math-0394$ , in (6.17), for $c06-math-0395$ .
  2.5 Consider subsamples of size $c06-math-0396$ and $c06-math-0397$ .
  2.6 For $c06-math-0398$ from 1 to $c06-math-0399$ , independently generate from the observed empirical distribution function, $c06-math-0400$ , associated with the observed sample $c06-math-0401$ , $c06-math-0402$ bootstrap samples

  with sizes $c06-math-0404$ and $c06-math-0405$ , respectively.
  2.7 Generally denoting by $c06-math-0406$ any of the estimators under study, let us denote by $c06-math-0407$ the bootstrap counterpart of the auxiliary statistic in (6.23), obtain

  the observed values of the statistics $c06-math-0409$ , compute

  and obtain $c06-math-0411$ , $c06-math-0412$ .
  2.8 Compute $c06-math-0413$ in (6.27).
  2.9 Obtain $c06-math-0414$ and $c06-math-0415$ .
3. With $c06-math-0416$ , compute for $c06-math-0417$ and all values $c06-math-0418$ ,

as well as
4. Compute $c06-math-0421$ and $c06-math-0422$ .
5. Obtain the adaptive EVI estimates:
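The resampling core of the Algorithm (Steps 2.5 to 2.9), restricted to the Hill estimator without any PORT shift, might be sketched as follows. Several ingredients are assumptions taken from the usual presentation of the double bootstrap in the cited literature and should be checked against (6.23) to (6.27): the auxiliary statistic is the difference between the Hill estimates at floor(k/2) and at k, the smaller bootstrap sample is nested in the larger one, the second subsample size is floor(n1^2 / n) + 1, and the constant for the Hill estimator is (1 - 2^rho)^(2 / (1 - 2 rho)). All function names, the exponent 0.955, the number of bootstrap samples, and the plug-in value of `rho_hat` are hypothetical.

```python
import math
import random


def hill_path(sample):
    """Hill estimates for k = 1, ..., n-1; entry k-1 of the list is the estimate at k."""
    logs = sorted((math.log(v) for v in sample), reverse=True)
    path, running = [], 0.0
    for k in range(1, len(logs)):
        running += logs[k - 1]
        path.append(running / k - logs[k])
    return path


def argmin_aux_mse(boot_paths):
    """k in 2, ..., m minimising the bootstrap mean of the squared auxiliary statistic."""
    m = len(boot_paths[0])
    best_k, best_mse = None, math.inf
    for k in range(2, m + 1):
        # Auxiliary statistic (assumed form): Hill at floor(k/2) minus Hill at k.
        mse = sum((p[k // 2 - 1] - p[k - 1]) ** 2 for p in boot_paths) / len(boot_paths)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k


def double_bootstrap_hill(sample, rho_hat, a=0.955, n_boot=100, seed=0):
    """Return (k0_hat, adaptive Hill estimate) from two nested bootstrap sample sizes."""
    rng = random.Random(seed)
    n = len(sample)
    n1 = math.floor(n ** a)
    n2 = math.floor(n1 ** 2 / n) + 1
    paths1, paths2 = [], []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n1)        # draw from the empirical distribution
        paths1.append(hill_path(resample))
        paths2.append(hill_path(resample[:n2]))     # nested sample of size n2
    k1, k2 = argmin_aux_mse(paths1), argmin_aux_mse(paths2)
    c_rho = (1.0 - 2.0 ** rho_hat) ** (2.0 / (1.0 - 2.0 * rho_hat))  # assumed constant
    k0 = min(n - 1, math.floor(c_rho * k1 ** 2 / k2) + 1)
    return k0, hill_path(sample)[k0 - 1]


# Hypothetical illustration on a simulated Frechet sample with EVI equal to 1.
rng = random.Random(7)
frechet = [rng.expovariate(1.0) ** -1.0 for _ in range(1000)]
k0_hat, gamma_hat = double_bootstrap_hill(frechet, rho_hat=-1.0)
```

Extending the sketch to the MVRB and PORT estimators would mean replacing `hill_path` by the corresponding estimator path and, for the reduced-bias case, using the appropriate constant in place of `c_rho`.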

Remark 6.8

A few practical questions may be raised under the setup developed: How does the asymptotic method work for moderate sample sizes? What is the type of the sample path of the new estimator for different values of $c06-math-0450$ ? What is the dependence of the method on the choice of $c06-math-0451$ ? What is the sensitivity of the method with respect to the choice of the $c06-math-0452$ -estimator? Although aware of the theoretical need of $c06-math-0453$ , what happens if we choose $c06-math-0454$ ? Answers to these questions were given by Gomes and Oliveira (2001) for the estimation of $c06-math-0455$ through the Hill estimator and can be addressed here. Quite often, the method is only moderately dependent on the choice of the nuisance parameter $c06-math-0456$ , in Step 5 of the algorithm, particularly for the MVRB EVI-estimators. This enhances the practical value of the method. Moreover, although aware of the need of $c06-math-0457$ , it seems that we get good results up till $c06-math-0458$ , again particularly for the MVRB EVI-estimator, $c06-math-0459$ , in (6.10). To detect the sensitivity of the algorithm to changes of $c06-math-0460$ , we have run it for $c06-math-0461$ and values of $c06-math-0462$ , $c06-math-0463$ , different values of $c06-math-0464$ , and different models. In Figure 6.2, as an illustration, we present for a Fréchet underlying parent, from a CDF $c06-math-0465$ , $c06-math-0466$ , with $c06-math-0467$ , the bootstrap $c06-math-0468$ -estimates $c06-math-0469$ and $c06-math-0470$ as a function of $c06-math-0471$ , for $c06-math-0472$ and $c06-math-0473$ .

Figure 6.2 Bootstrap adaptive EVI estimates, $c06-math-0444$ and $c06-math-0445$ , as a function of $c06-math-0446$ , in $c06-math-0447$ , for $c06-math-0448$ (a) and $c06-math-0449$ (b).

A few comments on the results:

As expected, and due to the fact that the method works asymptotically, there is a general improvement in the estimation as the sample size, $c06-math-0474$ , increases.
The sensitivity of the Algorithm in Section 6.3.2 to the nuisance parameter $c06-math-0475$ is quite weak for both $c06-math-0476$ and $c06-math-0477$ , particularly if $c06-math-0478$ is large.
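
The sensitivity to the subsample size mentioned in these comments can be explored numerically. The Python sketch below is one possible reading of a double-bootstrap choice of the threshold for the Hill estimator only, in the spirit of Gomes and Oliveira (2001); it is not the Algorithm of Section 6.3.2 itself. All function names are hypothetical. The subsample sizes `n1 = floor(n^eps)` and `n2 = floor(n1^2/n) + 1`, the default `eps = 0.955`, the auxiliary statistic `H(floor(k/2)) - H(k)`, and the factor `(1 - 2^rho)^(2/(1 - 2*rho))` are assumptions, since the corresponding formulas are not reproduced in the text above. The second-order parameter `rho` is supplied by the user rather than estimated, and the Fréchet sample with `gamma = 0.25` and `n = 500` is only an illustrative choice.

```python
import math
import random


def hill_path(sample):
    """Return a list h with h[k] = Hill estimate H(k), k = 1..n-1 (h[0] is unused)."""
    if min(sample) <= 0:
        raise ValueError("this sketch assumes a strictly positive sample")
    logs = sorted((math.log(x) for x in sample), reverse=True)
    path = [float("nan")]
    running = 0.0
    for k in range(1, len(logs)):
        running += logs[k - 1]                 # sum of the k largest log-values
        path.append(running / k - logs[k])     # minus the log of X_{n-k:n}
    return path


def hill_c_rho(rho):
    """Assumed Hill-specific factor linking the two bootstrap minimisers."""
    return (1.0 - 2.0 ** rho) ** (2.0 / (1.0 - 2.0 * rho))


def double_bootstrap_k0(sample, rho, eps=0.955, n_boot=100, seed=0):
    """Double-bootstrap estimate of the threshold k0 for the Hill estimator."""
    rng = random.Random(seed)
    n = len(sample)
    n1 = int(n ** eps)                         # first subsample size (assumed form)
    n2 = min(n1, n1 * n1 // n + 1)             # second subsample size, capped at n1
    if n2 < 4:
        raise ValueError("sample too small for this sketch")
    mse1, mse2 = [0.0] * n1, [0.0] * n2
    for _ in range(n_boot):
        big = rng.choices(sample, k=n1)        # resample from the empirical CDF
        # The size-n2 resample is nested in the size-n1 resample.
        for sub, mse in ((big, mse1), (big[:n2], mse2)):
            h = hill_path(sub)
            for k in range(2, len(sub)):
                t = h[k // 2] - h[k]           # auxiliary statistic T_k
                mse[k] += t * t                # n_boot times the bootstrap MSE of T_k
    k01 = min(range(2, n1), key=mse1.__getitem__)
    k02 = min(range(2, n2), key=mse2.__getitem__)
    k0 = min(n - 1, int(hill_c_rho(rho) * k01 * k01 / k02) + 1)
    return k0, k01, k02


# Illustration on a simulated Frechet sample; gamma = 0.25, n = 500 and the
# eps grid are hypothetical choices. For a Frechet parent, rho = -1.
rng = random.Random(1)
gamma = 0.25
data = [rng.expovariate(1.0) ** -gamma for _ in range(500)]
path = hill_path(data)
for eps in (0.90, 0.955, 0.99):
    k0, k01, k02 = double_bootstrap_k0(data, rho=-1.0, eps=eps, n_boot=50)
    print(f"eps={eps}: k0={k0}, H(k0)={path[k0]:.4f}")
```

Rerunning `double_bootstrap_k0` over a grid of `eps` values gives a rough picture of how the selected threshold, and hence the adaptive Hill estimate, reacts to the subsample size.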

6.4 Applications to Simulated Data

To illustrate the relevance of the PORT-Hill and PORT-MVRB EVI estimation in the field of finance, we refer to Gomes and Pestana (2007b) and Gomes et al. (2013), where, respectively, the MVRB and the PORT-MVRB EVI estimation have been applied to log returns associated with a few sets of financial data. Due to the specificity of such real data sets and to the fact that log returns have often been modeled by a Student- $c06-math-0493$ or its skewed versions (see Jones and Faddy, 2003, among others), we have simulated a random sample of size $c06-math-0494$ from a Student- $c06-math-0495$ model with $c06-math-0496$ degrees of freedom ( $c06-math-0497$ and $c06-math-0498$ ). Since the model has an infinite left end point, we have considered for both the PORT-Hill and the PORT-MVRB EVI estimation $c06-math-0499$ -values from 0.15 to 1, with step 0.05. When $c06-math-0500$ , we select the Hill or the MVRB EVI estimates. If $c06-math-0501$ , the PORT methodology is selected. We have further considered $c06-math-0502$ , with $c06-math-0503$ from 0.950 to 0.995, with step 0.0025, and $c06-math-0504$ .
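
A setting of this kind can be mimicked with a short Python sketch that simulates a Student-t sample and computes the Hill and PORT-Hill estimates at a fixed threshold. The function names, the sample size `n = 1000`, the number of top order statistics `k = 100`, and the degrees of freedom `nu = 4` are assumptions made for illustration; `nu = 4` is chosen because a Student-t model with `nu` degrees of freedom has EVI equal to `1/nu`, which gives 0.25. The rank convention `n_q = floor(n*q) + 1` with `0 <= q < 1` follows the usual PORT definition and may differ from the parametrization of the tuning parameter used in this chapter. The MVRB estimators and the adaptive choice of `k` and `q` are not covered.

```python
import math
import random


def student_t(nu, rng):
    """One draw from a Student-t with nu degrees of freedom (normal / chi-square ratio)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)     # chi-square with nu degrees of freedom
    return z / math.sqrt(chi2 / nu)


def hill(sample, k):
    """Hill estimator H(k): mean log-excess of the k largest values over X_{n-k:n}."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = xs[n - k - 1]                  # order statistic X_{n-k:n}
    if threshold <= 0:
        raise ValueError("the threshold X_{n-k:n} must be positive")
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


def port_hill(sample, k, q):
    """Hill estimator applied to the excesses over the empirical q-quantile X_{n_q:n}."""
    xs = sorted(sample)
    n_q = math.floor(len(xs) * q) + 1          # assumed rank convention, 0 <= q < 1
    shift = xs[n_q - 1]                        # random threshold X_{n_q:n}
    return hill([x - shift for x in xs[n_q:]], k)


# Hypothetical illustration: n = 1000 draws from a Student-t with nu = 4,
# for which the true EVI is 1 / nu = 0.25.
rng = random.Random(2024)
nu, n, k = 4, 1000, 100
sample = [student_t(nu, rng) for _ in range(n)]
print(f"Hill(k={k}) = {hill(sample, k):.4f}")
for q in (0.15, 0.50):
    print(f"PORT-Hill(k={k}, q={q}) = {port_hill(sample, k, q):.4f}")
```

By construction, `port_hill` is unchanged when a constant is added to every observation, whereas `hill` is only invariant to changes of scale; this is the motivation for using excesses over a random threshold.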

Figure 6.3 refers to the Student- $c06-math-0505$ generated sample and presents, as an illustration of the results obtained, the PORT-Hill/Hill and PORT-MVRB/MVRB EVI estimates (a), the $c06-math-0506$ -estimates (b), and the RMSE estimates (c). The notation PORT •/• means that we are considering both the PORT-• $c06-math-0507$ and the • $c06-math-0508$ EVI estimators. For this sample, however, we have been led to a PORT estimator, that is, to $c06-math-0509$ -estimates smaller than 1.

Figure 6.3 PORT-Hill/Hill and PORT-MVRB/MVRB adaptive EVI estimates (a), the $c06-math-0510$ -estimates (b), and the RMSE estimates (c) for the generated Student- $c06-math-0511$ sample.

6.5 Concluding Remarks

For the previous simulated sample, we know the true value of $c06-math-0512$ , namely 0.25, and we can therefore easily assess the reliability of the estimates provided by the Algorithm in Section 6.3.2, immediately coming to the conclusion that, as expected, the PORT-MVRB methodology provides the more reliable EVI estimation.
It is clear that, similarly to what usually happens with the Hill EVI estimators, even the PORT-Hill EVI estimation leads to an overestimation of the EVI. The adaptive PORT-MVRB EVI estimates are generally closer to the target.
Moreover, the RMSE estimates associated with the adaptive PORT-MVRB EVI estimates are always below the RMSE estimates associated with the adaptive PORT-Hill, another point in favor of the PORT-MVRB methodology.
The case studies performed, including the one used here for illustration, clearly call for a simulation study of the Algorithm and for its application to real data sets. These topics are, however, beyond the scope of this chapter.
As a general conclusion, we advise the use of the PORT-MVRB methodology for the estimation of a heavy right tail function.

Acknowledgments

Research partially supported by national funds through FCT—Fundação para a Ciência e a Tecnologia, projects UID/MAT/UI0006/2013 (CEA/UL) and UID/MAT/0297/2013 (CMA/UNL) and postdoc grants SFRH/BPD/77319/2011 and SFRH/BPD/72184/2010.

References

Araújo Santos, P., Fraga Alves, M.I., Gomes, M.I. Peaks over random threshold methodology for tail index and quantile estimation. Revstat 2006;4(3):227–247.
Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G. Tail index estimation and an exponential regression model. Extremes 1999;2:177–200.
Beirlant, J., Caeiro, F., Gomes, M.I. An overview and open research topics in the field of statistics of univariate extremes. Revstat 2012;10(1):1–31.
Bingham, N.H., Goldie, C.M., Teugels, J.L. Regular Variation. Cambridge: Cambridge University Press; 1987.
Caeiro, F., Gomes, M.I. A new class of estimators of a “scale” second order parameter. Extremes 2006;9:193–211, 2007.
Caeiro, F., Gomes, M.I. Minimum-variance reduced-bias tail index and high quantile estimation. Revstat 2008;6(1):1–20.
Caeiro, F., Gomes, M.I. A semi-parametric estimator of a shape second order parameter. In: Pacheco, A., Oliveira, M.R., Santos, R., Paulino, C.D., editors. New Advances in Statistical Modeling and Application, Studies in Theoretical and Applied Statistics, Selected Papers of the Statistical Societies. Berlin and Heidelberg: Springer-Verlag; 2014a. 137–144.
Caeiro, F., Gomes, M.I. On the bootstrap methodology for the estimation of the tail sample fraction. In: Gilli, M., Gonzalez-Rodriguez, G., Nieto-Reyes, A., editors. Proceedings of COMPSTAT 2014. Geneva, Switzerland; The International Statistical Institute/International Association for Statistical Computing 2014b. 545–552.
Caeiro, F., Gomes, M.I. Bias reduction in the estimation of a shape second-order parameter of a heavy tailed model. J. Stat Comput Simul 2015;85(17):3405–3419.
Caeiro, F., Gomes, M.I., Henriques-Rodrigues, L. Reduced-bias tail index estimators under a third order framework. Commun Stat Theory Methods 2009;38(7):1019–1040.
Caeiro, F., Gomes, M.I., Pestana, D. Direct reduction of bias of the classical Hill estimator. Revstat 2005;3(2):111–136.
Caeiro, F., Gomes, M.I., Henriques-Rodrigues, L. A location invariant probability weighted moment EVI estimator. International Journal of Computer Mathematics 2016;93(4):676–695.
Caers, J., Beirlant, J., Maes, M.A. Statistics for modeling heavy tailed distributions in geology: Part I. Methodology. Math Geol 1999;31:391–410.
Ciuperca, G., Mercadier, C. Semi-parametric estimation for heavy tailed distributions. Extremes 2010;13(1):55–87.
Danielsson, J., de Haan, L., Peng, L., de Vries, C.G. Using a bootstrap method to choose the sample fraction in tail index estimation. Journal of Multivariate Analysis 2001;76:226–248.
Davison, A., Hinkley, D.V. Bootstrap Methods and their Application. Cambridge: Cambridge University Press; 1997.
de Haan, L. Slow variation and characterization of domains of attraction. In: de Oliveira, T., editor. Statistical Extremes and Applications. Dordrecht: D. Reidel; 1984. 31–48.
de Haan, L., Ferreira, A. Extreme Value Theory: An Introduction. New York: Springer Science+Business Media, LLC; 2006.
de Haan, L., Peng, L. Comparison of tail index estimators. Stat Neerl 1998;52:60–70.
Draisma, G., de Haan, L., Peng, L., Pereira, M.T. A bootstrap-based method to achieve optimality in estimating the extreme value index. Extremes 1999;2(4):367–404.
Efron, B. Bootstrap methods: another look at the jackknife. Ann Stat 1979;7(1):1–26.
Efron, B., Tibshirani, R.J. An Introduction to the Bootstrap. Boca Raton (FL): CRC Press; 1994.
Feuerverger, A., Hall, P. Estimating a tail exponent by modelling departure from a Pareto distribution. Ann Stat 1999;27:760–781.
Fisher, R.A., Tippett, L.H.C. Limiting forms of the frequency of the largest or smallest member of a sample. Proc Cambridge Philos Soc 1928;24:180–190.
Fraga Alves, M.I., Gomes, M.I., de Haan, L. A new class of semi-parametric estimators of the second order parameter. Portugaliae Mathematica 2003;60(2):194–213.
Geluk, J., de Haan, L. Regular Variation, Extensions and Tauberian Theorems. Amsterdam, Netherlands: CWI Tract 40, Center for Mathematics and Computer Science; 1987.
Gnedenko, B.V. Sur la distribution limite du terme maximum d'une série aléatoire. Ann Math 1943;44:423–453.
Goegebeur, Y., Beirlant, J., de Wet, T. Linking Pareto-tail kernel goodness-of-fit statistics with tail index at optimal threshold and second order estimation. Revstat 2008;6(1):51–69.
Goegebeur, Y., Beirlant, J., de Wet, T. Kernel estimators for the second order parameter in extreme value statistics. J Stat Plann Inference 2010;140(9):2632–2652.
Gomes, M.I., Guillou, A. Extreme value theory and statistics of univariate extremes: a review. Int Stat Rev 2015;83(2):263–292.
Gomes, M.I., Martins, M.J. “Asymptotically unbiased” estimators of the tail index based on external estimation of the second order parameter. Extremes 2002;5(1):5–31.
Gomes, M.I., Oliveira, O. The bootstrap methodology in Statistical Extremes—choice of the optimal sample fraction. Extremes 2001;4(4):331–358.
Gomes, M.I., Pestana, D. A simple second order reduced-bias' tail index estimator. J Stat Comput Simul 2007a;77(6):487–504.
Gomes, M.I., Pestana, D. A sturdy reduced-bias extreme quantile (VaR) estimator. J Am Stat Assoc 2007b;102(477):280–292.
Gomes, M.I., Martins, M.J., Neves, M.M. Alternatives to a semi-parametric estimator of parameters of rare events—the Jackknife methodology. Extremes 2000;3(3):207–229.
Gomes, M.I., Martins, M.J., Neves, M.M. Improving second order reduced-bias tail index estimation. Revstat 2007a;5(2):177–207.
Gomes, M.I., Reiss, R.-D., Thomas, M. Reduced-bias estimation. In: Reiss, R.-D., Thomas, M., editors. Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields. 3rd ed., Chapter 6. Basel, Boston (MA), Berlin: Birkhäuser Verlag; 2007b. p 189–204.
Gomes, M.I., Canto e Castro, L., Fraga Alves, M.I., Pestana, D. Statistics of extremes for IID data and breakthroughs in the estimation of the extreme value index: Laurens de Haan leading contributions. Extremes 2008a;11(1):3–34.
Gomes, M.I., Fraga Alves, M.I., Araújo Santos, P. PORT hill and moment estimators for heavy-tailed models. Commun Stat Simul Comput 2008b;37:1281–1306.
Gomes, M.I., de Haan, L., Henriques-Rodrigues, L. Tail index estimation for heavy-tailed models: accommodation of bias in weighted log-excesses. J R Stat Soc B 2008c;70(1):31–52.
Gomes, M.I., Pestana, D., Caeiro, F. A note on the asymptotic variance at optimal levels of a bias-corrected Hill estimator. Stat Probab Lett 2009;79:295–303.
Gomes, M.I., Henriques-Rodrigues, L., Pereira, H., Pestana, D. Tail index and second order parameters' semi-parametric estimation based on the log-excesses. J Stat Comput Simul 2010;80(6):653–666.
Gomes, M.I., Henriques-Rodrigues, L., Miranda, C. Reduced-bias location-invariant extreme value index estimation: a simulation study. Commun Stat Simul Comput 2011a;40(3):424–447.
Gomes, M.I., Mendonça, S., Pestana, D. Adaptive reduced-bias tail index and VaR estimation via the bootstrap methodology. Commun Stat Theory Methods 2011b;40(16):2946–2968.
Gomes, M.I., Figueiredo, F., Neves, M.M. Adaptive estimation of heavy right tails: resampling-based methods in action. Extremes 2012;15:463–489.
Gomes, M.I., Henriques-Rodrigues, L., Fraga Alves, M.I., Manjunath, B.G. Adaptive PORT-MVRB estimation: an empirical comparison of two heuristic algorithms. J Stat Comput Simul 2013;83(6):1129–1144.
Gomes, M.I., Henriques-Rodrigues, L., Figueiredo, F. Resampling-based methodologies in statistics of extremes: environmental and financial applications. In: Bourguignon, J.-P., Jeltsch, R., Adrega Pinto, A., Viana, M., editors. Mathematics of Planet Earth: Energy and Climate, CIM Series in Mathematical Sciences, Chapter 8. Switzerland: Springer-Verlag; 2015a. p 197–215.
Gomes, M.I., Figueiredo, F., Martins, M.J., Neves, M.M. Resampling methodologies and reliable tail estimation. S Afr Stat J 2015b;49:1–20.
Gomes, M.I., Henriques-Rodrigues, L., Manjunath, B.G. Mean-of-order-p location-invariant extreme value index estimation. Revstat 2016;14(3):273–296.
Gumbel, E.J. Statistics of Extremes. New York: Columbia University Press; 1958.
Hall, P. Using the bootstrap to estimate mean squared error and select smoothing parameter in nonparametric problems. J Multivariate Anal 1990;32:177–203.
Hill, B.M. A simple general approach to inference about the tail of a distribution. Ann Stat 1975;3:1163–1174.
Henriques-Rodrigues, L., Gomes, M.I., Fraga Alves, M.I., Neves, C. PORT estimation of a shape second-order parameter. Revstat 2014;12(3):299–328.
Henriques-Rodrigues, L., Gomes, M.I., Manjunath, B.G. Estimation of a scale second-order parameter related to the PORT methodology. Journal of Statistical Theory and Practice 2015;9(3):571–599.
Jones, M.C., Faddy, M.J. A skew extension of the $c06-math-0513$ -distribution, with applications. J R Stat Soc B 2003;65(1):159–174.
Longin, F. Le choix de la loi des rentabilités d'actifs financiers: les valeurs extrêmes peuvent aider. Finance 1995;16:25–47.
Peng, L. Asymptotically unbiased estimator for the extreme-value index. Stat Probab Lett 1998;38(2):107–115.
Peng, L., Qi, Y. Estimating the first and second order parameters of a heavy tailed distribution. Aust N Z J Stat 2004;46:305–312.
Reiss, R.-D., Thomas, M. Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields. 3rd ed. Basel, Boston (MA), Berlin: Birkhäuser Verlag; 2007.
Resnick, S. Heavy tail modelling and teletraffic data. Ann Stat 1997;25(5):1805–1869.
Weissman, I. Estimation of parameters and large quantiles based on the $c06-math-0514$ largest observations. J Am Stat Assoc 1978;73:812–815.
