M.R. Leadbetter
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, North Carolina
I first encountered the field of extreme value theory (EVT) as a young mathematician when it had become an essentially complete and major discipline for independent, identically distributed (i.i.d.) random variables (r.v.'s) and widely used though often with seemingly little thought given to the validity of the i.i.d. assumptions. I was aware that sequential dependence of data was intrinsic to very many classic common time series situations (daily high temperatures, sea levels, stock prices) and found it fascinating that the i.i.d. theory of extreme values seemed to apply to such data without change. Interest was indeed developing in extension to dependence as a natural mathematical undertaking stimulated by corresponding central limit theory (CLT) results as I will indicate (e.g., Watson, 1954) and the landmark 1956 introduction of mixing conditions by Rosenblatt providing a general framework for discussion of long-range dependence.
In any case the time was ripe for a period of high activity by many researchers to investigate EVT under more general assumptions (particularly stationarity and Gaussian modeling). I was personally highly privileged to work with outstanding mentors and collaborators, among those seeking extension of the theory to provide greater realism in EVT applications. It turned out that under wide conditions, the same central results were found to apply to stationary series as if the data were i.i.d., requiring just a simple adjustment of constants in the limiting distributional results for maxima and explaining the early success of the classical theory when applied to non-i.i.d. data. This was also a precursor of some of the extremal problems in financial settings which have seen tremendous developments and which are the main concern of this volume.
Our plan in this short contribution is to recall personal impressions of the development of EVT for stochastic sequences and processes from the existing i.i.d. results already in a satisfying detailed form in the 1950s. Of course extreme values have been of concern since time immemorial, for example, as observed by Tiago de Oliveira (one of the champions of EVT development and use), biblical accounts of maximum age (Methuselah) and extreme floods (Noah's ark and issues of its structural safety relying on divine guidance rather than mathematics). But formal development of what we know as classical EVT took place in the first half of the twentieth century. This primarily focused on limiting results for the distribution of the maximum $M_n = \max(X_1, \ldots, X_n)$ of r.v.'s $X_1, \ldots, X_n$ as $n \to \infty$, when the $X_i$ are assumed to be i.i.d.
The development of EVT is intertwined with that of CLT whose results motivated many of those of EVT. At the risk of possible appearance of some lack of continuity, we sketch a brief history of these two disciplines in parallel—typically alternating CLT with EVT results which they motivate. We first indicate some milestones in the early theories for i.i.d. sequences followed by the again parallel activity when dependence is introduced via stationarity. No attempt is made at completeness, and we focus only on the theory of EVT and not its applications—a reader wishing to learn both the structural theory of extremes and its use in application would be well advised to study one of a number of available excellent accounts such as the splendid volume of de Haan and Ferreira (2006).
A paper by Dodd (1923) is sometimes regarded as giving birth to EVT and primarily involves convergence in probability of for some sequence and various classes of the distribution functions (d.f.'s) of the i.i.d. r.v.'s . Its first result is that in probability where is the right end point of the d.f.'s of each (and hence also almost surely since its monotonicity implies the existence of a limit, finite or infinite). Thus has the almost sure limit . When , limits in probability for are shown for several classes of d.f. . For example, for a sequence of standard normal r.v.'s , it is shown that in probability. Also, for a sequence of Pareto r.v.'s with d.f. , in probability.
This is reminiscent of CLT where weak and strong laws of large numbers give the “degenerate” convergence of averages $S_n/n$, with $S_n = X_1 + \cdots + X_n$, to the mean $\mu$ with probability one. But it is found to be much more useful to consider distributional convergence of the normalized sums $a_n(S_n - b_n)$ for appropriate constants $a_n > 0$, $b_n$, where this is possible, and to determine what limits in distribution can occur and their domains of attraction.
The simplest example of such theory is of course the central limit theorem where $(S_n - n\mu)/(\sigma n^{1/2})$ is shown to have a standard normal distributional limit for (i.i.d.) r.v.'s $X_i$ having finite means $\mu$ and variances $\sigma^2$. This was greatly generalized (almost ad infinitum) in the study of a wide variety of “central limit” results for “array sums” $\sum_i X_{n,i}$ where $X_{n,i}$ are i.i.d. for each $n$ in which the possible limits may be the class of self-decomposable, stable, or infinitely divisible distributions.
It does not seem surprising, at least in hindsight, that the extensive CLT for sums should suggest the possibility of similar asymptotic distributional results for the maximum $M_n = \max(X_1, \ldots, X_n)$ of i.i.d. $X_i$, that is, results of the form $P\{a_n(M_n - b_n) \le x\} \to G(x)$ for some constants $a_n > 0$, $b_n$ and some distribution $G$. This probability is clearly $F^n(x/a_n + b_n)$ which is known exactly when the d.f. $F$ of each $X_i$ is known but changes with $n$ and may be difficult to calculate.
Following the model of CLT, obviously there would be great practical utility if one $G$ corresponded to many different $F$'s aside from changes of normalizing constants. It was found in a series of papers (including Fréchet, 1927; Fisher and Tippett, 1928; and von Mises, 1936) that certain specific $G$ could be limits and in fact that they must have one of three general forms (extreme value “types”) to be limiting distributions for maxima in the sense given in the previous paragraph. These results were given a rigorous formulation and proof by Gnedenko (1943) and were refined by de Haan. This is the centerpiece of EVT and its application referred to by various names including Gnedenko's theorem, Fisher–Tippett–Gnedenko theorem, Gnedenko–de Haan theorem, and extremal types theorem (ETT). The theorem is stated as follows.
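For reference, the three types are commonly written in the following form. This is a sketch of the standard statement, assuming the normalization convention $a_n(M_n - b_n)$ with $a_n > 0$ for the limit relation that the text refers to as (2.1); the source's own display may differ in layout.

```latex
% Extremal types theorem (standard form; normalization convention assumed).
% If, for i.i.d. X_1, X_2, ... with M_n = max(X_1, ..., X_n) and constants a_n > 0, b_n,
\[
  P\{a_n(M_n - b_n) \le x\} \to G(x) \qquad (n \to \infty)
\]
% for a nondegenerate d.f. G, then G is of one of the following three types:
\begin{align*}
  \text{Type I:}   \quad & G(x) = \exp(-e^{-x}), && -\infty < x < \infty, \\
  \text{Type II:}  \quad & G(x) =
      \begin{cases} 0, & x \le 0, \\ \exp(-x^{-\alpha}), & x > 0, \end{cases}
      && \alpha > 0, \\
  \text{Type III:} \quad & G(x) =
      \begin{cases} \exp(-(-x)^{\alpha}), & x \le 0, \\ 1, & x > 0, \end{cases}
      && \alpha > 0.
\end{align*}
```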
In these $x$ may be replaced by $ax + b$ for any $a > 0$, $b$. In other words, the specific expressions listed are representatives of the types. Also, types II and III are really families of types, one type for each $\alpha > 0$.
For each of these types $G$, there will be a family of d.f.'s $F$ for which $G$ applies as the limiting d.f. for (normalized) $M_n$, referred to as the domain of attraction ($D(G)$) for $G$. Not all d.f.'s $F$ lead to a limiting distribution for a linearly normalized version of $M_n$ (e.g., if $F$ is Poisson), that is, some $F$'s belong to no domain of attraction. However, most common continuous d.f.'s do belong to the domain of attraction of one of the types.
Note that the limiting distribution (2.1) for $M_n$ can be written as $P\{M_n \le u_n\} \to e^{-\tau}$ where $u_n = x/a_n + b_n$ and $\tau = -\log G(x)$. The following almost trivially proved result is basic for classical EVT and a cornerstone for the natural extension when dependence is introduced.
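The result in question (the "basic lemma" that the later discussion relies on) is usually stated as below. This is a sketch of the standard formulation with a one-line justification; the symbols $u_n$ and $\tau$ follow the usage elsewhere in this chapter, and the exact wording of the source's lemma is an assumption.

```latex
% Basic lemma for i.i.d. sequences (standard form).
% Let X_1, X_2, ... be i.i.d. with d.f. F, M_n = max(X_1, ..., X_n),
% 0 <= tau <= infinity, and {u_n} a sequence of real constants. Then
\[
  n\,[1 - F(u_n)] \to \tau
  \quad \Longleftrightarrow \quad
  P\{M_n \le u_n\} \to e^{-\tau}.
\]
% Sketch of the argument for finite tau: by independence,
\[
  P\{M_n \le u_n\} = F^n(u_n)
  = \Bigl(1 - \frac{n\,[1 - F(u_n)]}{n}\Bigr)^{n},
\]
% and (1 - c_n / n)^n -> e^{-tau} if and only if c_n -> tau.
```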
It is seen at once from this that (2.1) holds for a given $G$ (i.e., $F \in D(G)$) and constants $a_n > 0$, $b_n$ if and only if
$$n\,[1 - F(x/a_n + b_n)] \to -\log G(x)$$
for each $x$. In some cases for given $F$, the search among the three types for which the previous equation holds for some constants (and hence $G$) is very simple. For example, for a uniform distribution on $(0,1)$, it is immediate that $n\,[1 - F(1 + x/n)] = -x$ for $x < 0$ and $n > -x$, giving $a_n = n$, $b_n = 1$, a type III limit $G(x) = e^{x}$, $x \le 0$, with $\alpha = 1$. On the other hand the determination of which $G$ (if any) applies for a given $F$ can be an intricate matter facilitated by domain of attraction criteria which have been developed. Our purpose here is not to review the extensive theory now available for extremes of i.i.d. r.v.'s but to indicate and motivate the extension to dependent cases with personal observation on some of its history.
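The equivalence between the tail condition and the limit for the maximum can be checked numerically. The following is a minimal sketch, assuming the normalization $u_n = x/a_n + b_n$; the function names are hypothetical, and the unit exponential case (with $a_n = 1$, $b_n = \log n$, a type I limit) is an added illustration not taken from the text.

```python
import math


def max_cdf_iid(cdf, n, u):
    """Exact P{M_n <= u} = F(u)^n for n i.i.d. r.v.'s with d.f. F."""
    return cdf(u) ** n


def uniform_cdf(x):
    """D.f. of the uniform distribution on (0, 1)."""
    return min(max(x, 0.0), 1.0)


def exponential_cdf(x):
    """D.f. of the unit exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0


def uniform_case(n, x):
    """Uniform(0,1), x < 0, with a_n = n, b_n = 1 (type III, alpha = 1).

    Returns (n[1 - F(u_n)], exact P{M_n <= u_n}, limit e^x).
    """
    u_n = 1.0 + x / n
    tau_n = n * (1.0 - uniform_cdf(u_n))
    return tau_n, max_cdf_iid(uniform_cdf, n, u_n), math.exp(x)


def exponential_case(n, x):
    """Unit exponential with a_n = 1, b_n = log n (type I).

    Returns (n[1 - F(u_n)], exact P{M_n <= u_n}, limit exp(-e^{-x})).
    """
    u_n = x + math.log(n)
    tau_n = n * (1.0 - exponential_cdf(u_n))
    return tau_n, max_cdf_iid(exponential_cdf, n, u_n), math.exp(-math.exp(-x))


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, uniform_case(n, -1.0), exponential_case(n, 0.5))
```

In both cases the first returned value plays the role of $\tau = -\log G(x)$, and the second approaches the third as $n$ grows.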
One convenient view of the i.i.d. theory is that it (i) first involves result (2.2) and (ii) allows the determination of constants $a_n$, $b_n$ such that $u_n = x/a_n + b_n$ satisfies $n\,[1 - F(u_n)] \to -\log G(x)$ for some extremal d.f. $G$. As noted earlier, success in this gives the domain of attraction and much related detailed theory. Part (ii) of the activity is essentially unaltered under dependence assumptions, and hence the extension of the ETT to dependent cases depends on finding a modification to Lemma 2.1 for useful non-i.i.d. situations.
First we mention some interesting and useful implications of the choice of constants $u_n$ to satisfy $n\,[1 - F(u_n)] \to \tau$. Regarding $u_n$ as a “level,” we say that $X_i$ has an exceedance of $u_n$ if $X_i > u_n$. The choice of $u_n$ clearly implies that the mean number of exceedances of $u_n$ by $X_1, \ldots, X_n$ converges to the value $\tau$. Further if the $X_i$ are i.i.d., then the events $\{X_i > u_n\}$ are independent in $i$ and have probability $p_n = 1 - F(u_n)$ so that the number of exceedances of $u_n$ for $1 \le i \le n$ is binomial in distribution, $B(n, p_n)$, which converges as $n \to \infty$ to a Poisson r.v. with mean $\tau$.
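The binomial-to-Poisson convergence of the exceedance count can be made concrete by computing the total variation distance between the two laws. This is a minimal sketch under the simplifying assumption that $1 - F(u_n) = \tau/n$ exactly; the function names are hypothetical, and probabilities are evaluated in log space to avoid overflow.

```python
import math


def log_binomial_pmf(n, p, k):
    """log P{B = k} for B ~ Binomial(n, p), 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log_poisson_pmf(tau, k):
    """log P{N = k} for N ~ Poisson(tau), tau > 0."""
    return -tau + k * math.log(tau) - math.lgamma(k + 1)


def total_variation(n, tau):
    """Total variation distance between Binomial(n, tau/n) and Poisson(tau).

    Assumes 0 < tau < n. The Poisson mass above n is added separately,
    since the binomial law puts no mass there.
    """
    p = tau / n
    diff = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        b = math.exp(log_binomial_pmf(n, p, k))
        q = math.exp(log_poisson_pmf(tau, k))
        diff += abs(b - q)
        poisson_mass += q
    return 0.5 * (diff + max(1.0 - poisson_mass, 0.0))


def prob_no_exceedance(n, tau):
    """P{no exceedance among n trials} = P{M_n <= u_n} = (1 - tau/n)^n."""
    return (1.0 - tau / n) ** n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, total_variation(n, 2.0), prob_no_exceedance(n, 2.0))
```

The distance shrinks as $n$ grows, and the probability of no exceedance approaches $e^{-\tau}$, which is the basic lemma again.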
It is useful to regard the exceedance points as a point process: a series of events occurring in “time.” For this it is more convenient to normalize by the factor $1/n$ and consider the exceedance point process $N_n$ to be the points $i/n$ for which $X_i > u_n$, $1 \le i \le n$. The points of $N_n$ all lie in the unit interval $(0,1]$, and for any set $E \subset (0,1]$, $N_n(E)$ is the number of normalized points in the set $E$, namely, the number of points $i/n \in E$, $1 \le i \le n$, for which $X_i > u_n$. This is a point process on the “space” [0,1], consisting of (no more than $n$) normalized exceedance points and is simply shown to converge in distribution to a Poisson process $N$ with intensity $\tau$ on $(0,1]$ in the full sense of point process convergence. In particular this means that $N_n(E) \to N(E)$ in distribution for any Borel set $E$ and corresponding joint distributional statements for $N_n(E_1), \ldots, N_n(E_k)$ for Borel subsets $E_1, \ldots, E_k$ of $(0,1]$. If the $E_j$ are disjoint, then the limits $N(E_j)$ are independent Poisson r.v.'s with means $\tau m(E_j)$ where $m(E_j)$ is the Lebesgue measure of $E_j$.
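Note that the probability $P\{M_n \le u_n\}$ may be written in terms of $N_n$ as $P\{N_n((0,1]) = 0\}$. Similarly $P\{N_n((0,1]) < k\}$ is just $P\{M_n^{(k)} \le u_n\}$ where $M_n^{(k)}$ is the $k$th largest of $X_i$, $1 \le i \le n$ (the $k$th order statistic). The use of the previous Poisson convergence of $N_n$ to $N$ with $u_n = x/a_n + b_n$, $\tau = -\log G(x)$ immediately gives the limiting distribution for $M_n^{(k)}$, modifying (2.1) to read
$$P\{a_n(M_n^{(k)} - b_n) \le x\} \to G(x) \sum_{s=0}^{k-1} \frac{(-\log G(x))^s}{s!}$$
with the same constants $a_n$, $b_n$, and d.f. $G$ as in (2.1). This shows one of the many uses of the point process in classical EVT. We will see later the interesting way this is modified to accommodate dependence.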
As indicated earlier, i.i.d. theory for maxima followed similar patterns to those established in CLT, replacing the convolution $F^{*n}$ for the d.f. of $X_1 + \cdots + X_n$ by the power $F^n$ for that of $M_n$. This potentially simplifies the theory for maxima, but the situation is reversed for transforms where, for example, the characteristic function for the sum is the $n$th power of that for each $X_i$. In both cases one standard method of including dependence is to make use of the i.i.d. theory by restricting the dependence between two separated groups of $X_i$ in some way. In describing the principles we assume strict stationarity of the sequence $\{X_n\}$, thus introducing dependence between the $X_i$ but leaving them identically distributed.
This originated from a suggestion of Markov (discussed in Bernstein, 1927) to the effect that one expects a CLT to hold if the r.v.'s of the sequence behave more like independent r.v.'s the more they are separated. Specifically, Bernstein introduced the very useful device of dividing the integers into alternating “big blocks” and “small blocks” of respective sizes , such that and . Under specific dependency conditions, he showed that the sums of the over each big block are approximately independent giving a normal limit for their sum, whereas the sum over all small blocks is small by comparison and hence may be discarded in the limit. In this way it is shown (albeit under complex conditions) that the CLT can hold under dependence assumptions.
Later Hoeffding and Robbins (1948) showed that this result holds for -dependent processes—a statistically useful class—under certain very simple conditions by using the block method with big blocks of length - alternating with small blocks of length , for some . Thus the groups of in two different big blocks are independent, and the classical CLT may be applied to their sums. Then showing that the total normalized sum from small blocks tends to zero in probability gives the desired CLT. The proof is straightforward and even simpler if stationarity is assumed.
The previous method of Bernstein was given considerable generality by Rosenblatt (1956) with the formal introduction of a hierarchy of the so-called “mixing conditions” differing in the degrees of dependence restrictions. The most used of these is strong mixing, satisfied by a sequence $\{X_n\}$ if $|P(A \cap B) - P(A)P(B)| \le g(k)$ for some $g(k) \to 0$ as $k \to \infty$ when $A \in \sigma(\ldots, X_{p-1}, X_p)$, $B \in \sigma(X_{p+k}, X_{p+k+1}, \ldots)$ (the $\sigma$-fields generated by the indicated past and future r.v.'s, for any $p$ and $k$). That is, any event $A$ based on the past up to time $p$ is “nearly independent” of any event $B$ based on the future from time $p + k$ onwards.
Rosenblatt obtained a CLT using Bernstein's method and strong mixing as its dependence assumption, initiating significant activity in that area (see, e.g., Ibragimov and Linnik, 1971; Bradley, 2007). In some cases strong mixing can be readily checked, for example, for a stationary Gaussian sequence with continuous spectral density having no zeros on the unit circle (Ibragimov and Linnik, 1971, Theorem 17.3.3). But in general it may be very difficult or impossible, and it has been suggested by a Swedish colleague that to start a theorem with “Let $\{X_n\}$ be a strongly mixing sequence” seems to be essentially assuming what one wants to prove! Nevertheless even if strong mixing cannot be fully verified, it may still be a reasonable assumption in useful cases.
We turn now from this tour of CLT history to the corresponding EVT it motivated. Perhaps the earliest result for dependent EVT was a paper by Watson (1954) generalizing the early paper of Dodd, applicable to i.i.d. r.v.'s and described earlier, to $m$-dependent sequences. In this it is shown that the basic lemma 2.2 of the i.i.d. theory holds for stationary $m$-dependent sequences $\{X_n\}$. This result was motivated by the paper of Hoeffding and Robbins (1948), showing the CLT under $m$-dependence as discussed earlier.
Watson's result was obtained by straightforward probability calculations with a simple form of Bernstein's method. He obtains the basic Lemma 2.2 but does not discuss detailed extremal forms under linear normalization. However, it is readily shown that the limits for the maximum in this case are the same as would apply if the $X_n$ were independent with the same marginal d.f. as the stationary $m$-dependent sequence. In fact this holds for any identically distributed sequence for which the basic lemma holds, regardless of the dependence structure, as the following result shows. We term this a “proposition” at the risk of inflating its importance.
The basic lemma was proved for i.i.d. sequences, but as noted above it was shown by Watson to apply to stationary $m$-dependent sequences. It also applies to other cases with strongly restricted dependence, for example, stationary normal sequences with correlations satisfying Berman's condition (to be discussed next), which indicates low correlations at large separations. One may thus conjecture that the basic lemma applies to sequences which are in some sense “close to being i.i.d.” One way of making this precise is to note that for i.i.d. sequences, exceedances of a high level tend to occur singly and not in clusters, whereas for significant (positive) dependence one high value will tend to be followed by another, initiating a cluster. For many stationary sequences the limiting mean number of exceedances in a cluster is a parameter which we denote by $\theta^{-1}$, and $\theta = 1$ for i.i.d. sequences as well as “nearly i.i.d.” sequences such as stationary normal sequences satisfying Berman's condition.
Another special class of sequences is considered by Berman (1962) in which the r.v.'s are assumed to be exchangeable and the possible limits for the maximum obtained. That paper also considers the classical i.i.d. framework but where a random number of terms are involved.
Berman is perhaps most recognized for his work on maxima of Gaussian sequences and continuous time processes. He shows (Berman, 1964) that for a standard stationary Gaussian sequence $\{X_n\}$ with correlation sequence $\{r_n\}$ satisfying $r_n \log n \to 0$, the maximum $M_n$ has a type I limit $P\{a_n(M_n - b_n) \le x\} \to \exp(-e^{-x})$ where $a_n = (2 \log n)^{1/2}$, $b_n = (2 \log n)^{1/2} - \tfrac{1}{2}(2 \log n)^{-1/2}(\log\log n + \log 4\pi)$, the same constants that apply to i.i.d. standard normal r.v.'s. This condition gives a sufficient condition for the limit, and while not necessary, it is close to being so, and known weaker sufficient conditions only differ slightly from it. As indicated earlier, a stationary Gaussian sequence satisfying Berman's condition exhibits no clustering and satisfies the basic lemma even though not i.i.d.
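The classical normalizing constants for standard normal maxima can be checked against the tail condition $n\,[1 - \Phi(x/a_n + b_n)] \to e^{-x}$. The sketch below assumes the usual constants $a_n = (2\log n)^{1/2}$ and $b_n = (2\log n)^{1/2} - \tfrac{1}{2}(2\log n)^{-1/2}(\log\log n + \log 4\pi)$, which are stated here as an assumption rather than quoted from the source; function names are hypothetical. Convergence is known to be slow, so the agreement is only approximate even for large $n$.

```python
import math


def normal_tail(u):
    """1 - Phi(u) for the standard normal d.f. Phi."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def normal_constants(n):
    """Classical (assumed) normalizing constants a_n, b_n for normal maxima."""
    c = math.sqrt(2.0 * math.log(n))
    a_n = c
    b_n = c - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * c)
    return a_n, b_n


def mean_exceedances(n, x):
    """n[1 - Phi(u_n)] with u_n = x/a_n + b_n; should be near e^{-x}."""
    a_n, b_n = normal_constants(n)
    return n * normal_tail(x / a_n + b_n)


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(n, mean_exceedances(n, 0.0), mean_exceedances(n, 1.0),
              math.exp(-1.0))
```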
For more general stationary processes as noted earlier, Rosenblatt (1956) introduced the concept of strong mixing and used it in discussion of the CLT. Loynes (1965) used the strong mixing (albeit referred to there as “uniform mixing”) assumption in developing EVT for stationary sequences, including the ETT. He also gave a version of the extension of the basic i.i.d. result $n\,[1 - F(u_n)] \to \tau$ iff $P\{M_n \le u_n\} \to e^{-\tau}$ in which under strong mixing the limit $e^{-\tau}$ is replaced by $e^{-\theta\tau}$ for some $\theta$, $0 \le \theta \le 1$, the parameter referred to earlier in the context of clustering ($\theta^{-1} =$ mean cluster size). This foreshadowed the use of the parameter $\theta$ as the “extremal index” (EI) under weaker conditions than strong mixing. As discussed later, this provides a simple and natural link between the limiting distribution for maxima under i.i.d. assumptions and under stationarity.
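In attempting to weaken the strong mixing condition for EVT, one notes that the events of interest for extremes are typically those of the form $\{X_i \le u\}$ or their finite intersections. For example, the event $\{M_n \le u\}$ is just $\bigcap_{i=1}^{n} \{X_i \le u\}$. Hence it is natural to attempt to restrict the events $A$ and $B$ in strong mixing to have the form $A = \{X_{i_1} \le u, \ldots, X_{i_p} \le u\}$, $B = \{X_{j_1} \le u, \ldots, X_{j_q} \le u\}$, where the indices $i_1, \ldots, i_p$ are separated by some $l$ from the $j$'s. For a level $u$, note that $P(A) = F_{i_1 \ldots i_p}(u)$, the joint d.f. of $X_{i_1}, \ldots, X_{i_p}$ with all arguments equal to $u$, and similarly for $P(B)$ and $P(A \cap B)$. This leads to the following weak dependence condition introduced in Leadbetter (1974) (see also Leadbetter et al., 1983). The stationary sequence $\{X_n\}$ is said to satisfy the condition $D(u_n)$ for a sequence $\{u_n\}$ if for any choice of integers $1 \le i_1 < \cdots < i_p < j_1 < \cdots < j_q \le n$, $j_1 - i_p \ge l$,
$$|F_{i_1 \ldots i_p j_1 \ldots j_q}(u_n) - F_{i_1 \ldots i_p}(u_n)\, F_{j_1 \ldots j_q}(u_n)| \le \alpha_{n,l}$$
where $\alpha_{n,l_n} \to 0$ as $n \to \infty$ for some $l_n = o(n)$.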
The ETT holds for a stationary sequence $\{X_n\}$ satisfying $D(u_n)$ for appropriate $u_n$. Specifically if $P\{a_n(M_n - b_n) \le x\}$ converges to a nondegenerate $G(x)$ and $D(u_n)$ holds for $u_n = x/a_n + b_n$, each real $x$, then $G$ is one of the three extreme value types. This of course includes the result of Loynes under strong mixing which clearly implies $D(u_n)$.
The basic lemma however does not hold as stated under $D(u_n)$ but may be modified in a very simple and useful way to relate limits under $D(u_n)$ to those for i.i.d. sequences. Specifically, with a slight abuse of notation, write $u_n(\tau)$ to denote a sequence such that $n\,[1 - F(u_n(\tau))] \to \tau$ as $n \to \infty$ (which exists under wide conditions, certainly if $F$ is continuous). Then if $P\{M_n \le u_n(\tau)\}$ converges for one $\tau > 0$, it may be shown to converge for all $\tau > 0$ (e.g., Leadbetter et al., 1983) and $P\{M_n \le u_n(\tau)\} \to e^{-\theta\tau}$ for all $\tau > 0$ and some fixed $\theta$, $0 \le \theta \le 1$. We term $\theta$ the “Extremal Index (EI)”. From the basic lemma it takes the value 1 for i.i.d. sequences and for some dependent sequences including $m$-dependent stationary sequences and stationary normal sequences under Berman's conditions.
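If $\{X_n\}$ is a stationary sequence, write $\{\hat{X}_n\}$ for a sequence of i.i.d. r.v.'s with the same marginal d.f. $F$ as each $X_n$. $\{\hat{X}_n\}$ has been termed “the independent sequence associated with the stationary sequence $\{X_n\}$” (Loynes, 1965; Leadbetter et al., 1983). Now if $\hat{M}_n = \max(\hat{X}_1, \ldots, \hat{X}_n)$, then by the basic lemma $P\{\hat{M}_n \le u_n\} \to e^{-\tau}$ if $u_n = u_n(\tau)$. If $D(u_n)$ holds and $\{X_n\}$ has EI $\theta$, then as earlier $P\{M_n \le u_n\} \to e^{-\theta\tau}$.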
In particular if $P\{a_n(\hat{M}_n - b_n) \le x\} \to G(x)$, then $P\{a_n(M_n - b_n) \le x\} \to G^{\theta}(x)$ if $D(u_n)$ holds for $u_n = x/a_n + b_n$ for each $x$. That is, if $\hat{M}_n$ has the normalized limit $G$, $M_n$ has the limit $G^{\theta}$ with the same normalizing constants. For each extreme value d.f. $G$, $G^{\theta}$ (with $\theta > 0$) is easily seen to be of the same extremal type as $G$, and indeed by a simple change of normalizing constants, it follows that for some $\alpha_n > 0$, $\beta_n$, $P\{\alpha_n(M_n - \beta_n) \le x\} \to G(x)$. Hence under $D(u_n)$ assumptions the normalized maximum for the stationary sequence has a limiting distribution if (and only if) it would if the $X_n$ were independent, with the same distribution. Further, the form of the limit in the stationary case is trivially determined from the i.i.d. limit $G$, either as $G^{\theta}$ with the same normalizing constants or as $G$ itself by a change of normalizers.
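The relation between the stationary limit $G^{\theta}$ and the i.i.d. limit $G$ can be illustrated on a model where everything is available in closed form. The example below is not from the text: it is a hypothetical max-autoregressive sequence $X_i = \max(a X_{i-1}, (1-a) Z_i)$ with i.i.d. unit Fréchet $Z_i$ and $0 \le a < 1$, for which the marginal d.f. is unit Fréchet and $P\{M_n \le u\} = \exp\{-[1 + (1-a)(n-1)]/u\}$, so that the extremal index is $\theta = 1 - a$. All function names are illustrative.

```python
import math
import random


def armax_path(n, a, rng):
    """Simulate X_i = max(a X_{i-1}, (1-a) Z_i) with unit Frechet Z_i.

    X_1 is drawn from the stationary (unit Frechet) marginal.
    """
    def unit_frechet():
        u = rng.random()
        while u <= 0.0:          # avoid log(0)
            u = rng.random()
        return -1.0 / math.log(u)

    path = [unit_frechet()]
    for _ in range(n - 1):
        path.append(max(a * path[-1], (1.0 - a) * unit_frechet()))
    return path


def stationary_prob_max_below(n, a, u):
    """Exact P{M_n <= u} for the max-autoregressive sequence."""
    return math.exp(-(1.0 + (1.0 - a) * (n - 1)) / u)


def iid_prob_max_below(n, u):
    """Exact P{M_n <= u} for the associated i.i.d. unit Frechet sequence."""
    return math.exp(-n / u)


def implied_extremal_index(n, a, tau):
    """log P{M_n <= u_n} / log P{iid max <= u_n} at level u_n = n / tau."""
    u_n = n / tau                # n[1 - F(u_n)] -> tau for unit Frechet F
    return (math.log(stationary_prob_max_below(n, a, u_n))
            / math.log(iid_prob_max_below(n, u_n)))


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, implied_extremal_index(n, a=0.4, tau=1.5))
```

With `a = 0.4` the printed ratio approaches `0.6`, so the stationary maximum has limit $e^{-0.6\tau}$ where the i.i.d. maximum has limit $e^{-\tau}$.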
Finally in our personal tour of the development of EVT under dependence, we return to the discussion of exceedances of a level $u_n$ normalized to occur on the unit interval as the points $i/n$ for which $X_i > u_n$. As already indicated these form a point process $N_n$ on $(0,1]$ which converges to a Poisson process with intensity $\tau$ if the $X_i$ are i.i.d. When the $X_i$ form a stationary sequence satisfying $D(u_n)$ with $u_n = u_n(\tau)$ and having EI $\theta > 0$, the exceedance points tend to coalesce in groups to become clusters, the locations of which form a Poisson process with intensity $\theta\tau$ in the limit. The limiting cluster sizes cause multiple events in the point process $N_n$ which converges to a “compound Poisson process” if the dependence restriction is strengthened in a natural and modest way (see, e.g., Hsing, 1987). For $\theta = 1$ the limiting point process is Poisson as discussed earlier. Other related Poisson processes are of considerable interest in addition to that of exceedances and of their locations. For example, the point process of sums of values in a cluster or the maximum values in a cluster are of interest, the latter generalizing the popular “peaks over thresholds” notions used in classical i.i.d. theory, with typically compound Poisson limits.
We have focused in our tour on some milestones in the historical development of EVT in its classical results for i.i.d. sequences and the evolution of the natural extensions to dependent (stationary) cases. These are more realistic since, for example, temporal data is almost always correlated in time at least at some small spacing. We have not discussed statistical analysis at all; methods for it abound and are documented in many publications and books. But it should be noted that the recognition of the EI and the (extended) basic lemma can really facilitate the application of inference for i.i.d. situations to, for example, stationary sequences. As a simple example one traditional way of fitting an extremal distribution from a series of observed maxima is to graphically compare the empirical distribution with each EV type. For example, if $G$ is type I, $G(x) = \exp(-e^{-(ax+b)})$, then $-\log(-\log G(x)) = ax + b$ and so $a$, $b$ may be chosen by linear regression of $-\log(-\log \hat{G}(x))$ on $x$, where $\hat{G}$ denotes the empirical d.f. of the observed maxima. If the fit is good, one concludes that type I is the appropriate choice of extremal type and can estimate the normalizing constants by linear regression. This procedure is valid for a stationary sequence with some EI $\theta > 0$ (its extremal limit $G^{\theta}$ is of the same type as $G$). One cannot therefore differentiate between stationarity and independence but can in either case hope to determine the correct limiting type and use the regression to estimate the normalizing constants giving the limit in standard form. It is thus by no means a test for stationarity but makes the method of determination of extremal type (and constants) valid whether the data are i.i.d. or stationary. This may account for success in determining extremal types for data by applying i.i.d. methods to (perhaps clearly) correlated data before the advent of the dependent theory!
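The graphical fitting procedure for the type I case can be sketched as an ordinary least-squares fit on the transformed empirical d.f. This is a minimal sketch, not the author's procedure in detail: the plotting positions $(i - 0.5)/m$ for the $i$th smallest of $m$ maxima are an assumption (other conventions exist), and the function name is hypothetical.

```python
import math


def gumbel_plot_fit(maxima):
    """Regress -log(-log G_hat(x)) on x for a sample of observed maxima.

    Returns (a, b, r2) for the fitted line a*x + b, where r2 is the squared
    correlation; a value near 1 suggests a type I limit fits well.
    Plotting positions (i - 0.5)/m are an assumed convention.
    """
    xs = sorted(maxima)
    m = len(xs)
    ys = [-math.log(-math.log((i + 0.5) / m)) for i in range(m)]
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    r2 = sxy * sxy / (sxx * syy)
    return a, b, r2


if __name__ == "__main__":
    # Idealized "data": exact type I quantiles with location 3 and scale 2,
    # so the fitted line should be a = 1/2, b = -3/2 with r2 = 1.
    m = 50
    sample = [3.0 - 2.0 * math.log(-math.log((i + 0.5) / m)) for i in range(m)]
    print(gumbel_plot_fit(sample))
```

Because the limit for a stationary sequence with $\theta > 0$ is of the same type, the same regression applies there; only the fitted constants absorb the effect of $\theta$.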
In the foregoing we have focused on extremes in sequences (i.i.d. and stationary) which are traditionally basic for very many applications. However (stationary) processes in continuous time also have significant applications—for example, in continuous monitoring of values of a pollutant for environmental regulation. Some such cases may be approximated by high-frequency sampling to give a discrete series, but the consideration of continuous parameters can be natural and helpful.
In fact much of the continuous parameter theory parallels that for sequences at least under stationarity. For example, let $M(T) = \sup\{\xi(t) : 0 \le t \le T\}$ where $\xi(t)$ is a stationary process on $t \ge 0$. Then under weak dependence restrictions (akin to $D(u_n)$), the ETT holds: If $P\{a_T(M(T) - b_T) \le x\}$ has a limit $G(x)$ (for some $a_T > 0$, $b_T$ and nondegenerate $G$), then $G$ must be one of the EV types. If $\xi(t)$ is stationary and Gaussian with correlation function $r(t)$, then the previous limiting distribution is of type I, under the weak dependence restriction of Berman, $r(t) \log t \to 0$ as $t \to \infty$. This is entirely analogous to the sequence case described previously.
For a continuous parameter process $\xi(t)$, of course exceedances of a level $u$ occur in ranges rather than discrete points and hence do not form a point process. However, the closely related “upcrossings” of $u$ (points $t$ at which $\xi(t) = u$, but $\xi(s) \le u$ for $s < t$ and $\xi(s) \ge u$ for $s > t$ when $s$ is sufficiently close to $t$) do form a useful point process.
Analogous (e.g., Poisson) results hold under appropriate conditions to those for exceedance in the sequence case described earlier with close connections to maxima. For example, $M(T) > u$ if and only if either $\xi(0) > u$ or $\xi(t)$ has at least one upcrossing of $u$ in $[0, T]$. A systematic study of upcrossings was initiated by the pioneering electrical engineer Rice (see, e.g., Rice, 1944) and is important for assisting with obtaining asymptotic distributional properties of $M(T)$ but also in many other engineering applications. For example, the intensity of upcrossings (expected number per unit time) of ozone levels is of real interest in environmental (tropospheric) ozone regulation. Discussions of issues regarding maxima and level crossings by stationary stochastic processes may be found, for example, in Cramér and Leadbetter (1967) and Leadbetter et al. (1983) as well as other references cited.
A well developed useful theory for a class of one-dimensional problems of any kind often attracts interest in extensions to higher dimensions. Sometimes such extensions are not obviously useful and done because they are “there for the taking” and sometimes are too intricate, requiring too much effort in calculation, but often can lead to new and interesting theories which are not just obvious extensions of the one-dimensional case.
For a stochastic process or sequence , there are two obvious forms of introducing multidimensional versions of results in one dimension. One is to consider a finite family (vector) , for example, if , and may be the gross national products of China, the United States and Russia in year , to compare economies over a period of years.
There is a huge literature on the study of the vector of maxima , , known as multivariate EVT (see, e.g., de Haan and Ferreira, 2006). This does not yield the simple classification of possible limit distributions into the three forms as in one dimension but does give useful and interesting classification methods regarding families of possible limits.
The other extension of the classical theory to higher dimensions is to consider r.v.'s indexed by multidimensional parameters, for example, , a r.v. measured at a point of the plane with coordinates for, for example, , (a square area) or a discrete version for say. Such an (or ) is termed a random field (r.f.). A simple example is where is the coordinate location of a point on a map with -coordinate and -coordinate . may be measured levels at that location at a specified time, and one is interested in , that is, the maximum level at locations in the square area with and coordinates no more than .
A regulating agency, for example, may be interested in modeling the distribution of this maximum in an area (e.g., a county) in which measurements are made to determine compliance with environmental standards. Trends over time may be assessed by introducing a further (time) parameter to define a “spatio temporal” r.f. at spatial location and time .
In one dimension conditions such as or strong mixing really assert a degree of independence between past and future of a sequence or process . But in two dimensions there is no natural ordering of the pairs of points and so no natural definition of past and future. One can of course limit dependence between in regions separated by large distances, but this is far too restrictive and can require the terms to be almost independent for different points.
A promising approach is to not seek a single condition based on some measure of separation of two sets but rather require a “$D(u_n)$” type of condition applied sequentially in each coordinate direction, taking advantage of each past–future structure. This is explored in Leadbetter and Rootzén (1998) where, for example, an ETT is shown.
It is interesting to note the continuing intertwining of EVT and CLT, for example, the CLT result of Bolthausen (1982).
There are also many structural results even for i.i.d. situations which are important for inference but not included in our sketch of development. For example, we have barely referred to order statistics (of any kind—“extremal,” “central,” “intermediate”), associated point processes (e.g., exceedances of several levels), exceedance, and related point processes as marked point processes in the plane. See also Hsing (1987) for general results relevant to a number of these topics. In this chapter there has been no attempt to review the growing literature involving specific models for economic extremes. But the general methodology described earlier complements these, as should be clear. Economic data is certainly dependent and if not stationary can often be split into periods of stationarity which can be separately studied. High exceedances are certainly of considerable interest and can potentially be used to test for reality of changes, for example, of stock price levels. Study of exceedance clustering may well give insight into underlying causative mechanisms.
Our selection of papers considered has not been on the basis of mathematical depth or topic importance though some are universally considered to be pathbreaking. Rather we have attempted to give sign posts in the development of EVT from its i.i.d. beginnings, its pathway to and through stationarity, and its parallels with CLT from our own personal perspective.