In this chapter we deal with models that are useful for describing individual and collective claim size data and physical measurement data of natural hazards such as earthquakes and floods. Because we want to give special attention to the modelling of large claims we provide some general background on this topic. In what follows we will try to distinguish between claims that are considered small and claims that are large. It is intuitively clear that reinsurance contracts will depend heavily on whether or not the individual claims should be considered large.
One of the main reasons for taking reinsurance is the possible appearance of large claims. While this sounds like an obvious statement, a useful and acceptable definition of what is meant by a large claim is far from obvious. Among the possible examples of distributions, some are better suited to model large claims than others. In view of future applications to actuarial topics one definitely needs a way of distinguishing between average claims and large claims. An acceptable guideline is to compare claim distributions with the exponential distribution. In some sense the exponential distribution acts as a dividing line between small and large. A first and rather vague criterion would be to check whether the claim distribution under consideration has a fatter tail than the exponential distribution or not. If it does not, then we could call the distribution super‐exponential; in the alternative case we might call the distribution sub‐exponential, but unfortunately this term already has a standardized meaning in the probabilistic literature.
We will call a distribution F super‐exponential if 1 − F(x) is bounded above by a decreasing exponential. A more quantifiable definition can be given in terms of the Laplace transform $\int_0^\infty e^{-sx}\,dF(x)$ of F, in that this transform has a strictly negative abscissa of convergence σF. Following Taylor [731] we will indeed conclude that the super‐exponential class of distributions is a reliable family of light‐tailed claim size distributions. The exponential distribution itself also satisfies this criterion of super‐exponentiality. The aggregate claim distribution (but also other risk quantities such as ruin probabilities) often exhibits exact or approximate exponential behavior when the underlying claim size distribution is super‐exponential.
For a distribution F with a fatter tail than any exponential the abscissa of convergence σF of its Laplace transform will be zero. Unfortunately this property is not sufficiently specific to be useful as a definition. Instead one uses some classes of distributions with σF = 0 that have a bit more structure. In particular we will deal later with sub‐exponential distributions, Pareto‐type distributions, extreme value distributions, etc. If we take the claim size distribution from such a class, the corresponding aggregate risk and ruin quantities will show no trace of exponential behavior.
Before we offer candidates for claim size distributions, we need to remind the reader that one of our main objectives is to provide adequate models for large claims. This section contains a few thoughts on what we might rightfully call a large claim and how one can perhaps distinguish it from others. For an early general discussion on the role of large claims see the summary report by Albrecht [32]. For an attempt to define the even more difficult concept of a catastrophic claim, see Ajne et al. [10].
Consider all claims {X1, X2, …, Xn} related to a specific portfolio. Let Sn = X1 + X2 + … + Xn be the total claim amount and consider the maximum value Xn,n. Under which conditions should we consider this largest claim to be actually large? More generally, which of the extreme order statistics could be considered to be large? For an attempt to define large, see Teugels [738]. Beirlant et al. [105] put large claims into a statistical and actuarial context. Here are some interpretations, the first two theoretically inspired.
The last equivalence follows from
The relation (3.2.1) is precisely the definition that F belongs to the class S of sub‐exponential distributions. A property of S, proved by Chistyakov [211], is that if F ∈ S then for any non‐negative integer n, as x → ∞, 1 − F*n(x) ∼ n(1 − F(x)).
In the sub‐exponential case, the tail 1 − F*n is (up to the quantity n) just as heavy as that of 1 − F.
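The statement that the tail of a convolution is asymptotically n times the tail of F can be checked numerically for n = 2. The following is a minimal sketch, assuming a strict Pareto claim size with 1 − F(x) = x^(−α) on [1, ∞) and α = 1.5; the function names and the midpoint integration rule are illustrative choices, not part of the text.

```python
def pareto_tail(x, alpha):
    """1 - F(x) for a strict Pareto claim on [1, infinity) (assumed example)."""
    return 1.0 if x <= 1.0 else x ** (-alpha)


def pareto_density(x, alpha):
    """Density of the same strict Pareto distribution."""
    return alpha * x ** (-alpha - 1.0) if x >= 1.0 else 0.0


def convolution_tail(x, alpha, steps=100_000):
    """1 - F*2(x) for x > 2, obtained by conditioning on the first claim y.

    Either y > x - 1 (the sum then exceeds x because the second claim is
    at least 1), or 1 <= y <= x - 1 and the second claim must exceed x - y.
    The integral over y is approximated with a midpoint rule.
    """
    upper = x - 1.0
    h = (upper - 1.0) / steps
    integral = 0.0
    for i in range(steps):
        y = 1.0 + (i + 0.5) * h
        integral += pareto_tail(x - y, alpha) * pareto_density(y, alpha)
    return pareto_tail(upper, alpha) + integral * h


def tail_ratio(x, alpha=1.5):
    """(1 - F*2(x)) / (1 - F(x)); sub-exponentiality says this tends to 2."""
    return convolution_tail(x, alpha) / pareto_tail(x, alpha)


for x in (10.0, 100.0, 1000.0):
    print(x, tail_ratio(x))
```

The printed ratios approach 2 only slowly, which illustrates why a limiting property of this kind is hard to verify from a finite data set.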
Members of S are automatically members of the class of long‐tailed distributions¹ denoted by L, which means that for all y ∈ ℝ, (1 − F(x + y))/(1 − F(x)) → 1 as x → ∞.
The class S and its subclasses have been constantly used as candidates for claim size distributions with a heavy tail but also in other probabilistic contexts like branching processes, queueing theory, etc. The major drawback of S is that it is only defined in terms of a limiting property, which is hard to verify in practice, and that it is defined by a non‐parametric condition. Up to now the sub‐exponential class has defied a representation. Actually, it remains a challenging problem to decide whether a set of actuarial data comes from a sub‐exponential distribution or not. Further, it is known that S is not closed under convolution or under convex combinations. Cline and Samorodnitsky [195] have shown that nevertheless large subclasses of S are closed under product operations (see also the work by Rosinski et al. [657]). A variety of sufficient conditions for membership of S can be found in the literature, for example see Pitman [622], Teugels [737], Klüppelberg [493], Pinelis [619], Smith [704], and Foss et al. [356] for a recent survey. Refinements are available in Chover et al. [212] and Willekens [779].
Hence, practitioners avoid the class as such, opting for distributions that are sub‐exponential but at the same time contain enough parameters to be fitted to data. Among the many parametrized examples in S we mention the log‐normal distribution, the Pareto distribution and Pareto‐type distributions to be defined next, as well as non‐normal stable distributions.
The extreme value distributions $G_\gamma(x) = \exp\{-(1+\gamma x)^{-1/\gamma}\}$, read as $\exp(-e^{-x})$ for γ = 0, are the only possible limits in (3.2.3), with γ ∈ ℝ called the extreme value index (EVI). It has to be understood that Gγ has to be a proper distribution. This means in particular that the range of Gγ extends over the interval (−1/γ, ∞) if γ > 0 (the Fréchet–Pareto case), over (−∞, −1/γ) if γ < 0 (the extremal Weibull case), or over ℝ if γ = 0 (the Gumbel case).
For any specific γ, the max‐domain of attraction, containing the distributions F for which there exist sequences an > 0 and bn such that (3.2.3) holds, has also been described. The class of distributions in the max‐domain of attraction can be defined in terms of the tail quantile function. Given that F is a distribution, its quantile function is defined by the inverse function
\[ Q(p) = \inf\{x : F(x) \ge p\}, \qquad 0 < p < 1. \]
The tail quantile function is defined and denoted by
\[ U(x) = Q(1 - 1/x), \qquad x > 1. \]
The following condition (Cγ) is a necessary and sufficient condition for the existence of normalizing and centering constants for the weak convergence (3.2.3) of the maximum of a sample from the distribution F: there exists a positive measurable function a such that, for u ≥ 1,
\[ \lim_{x\to\infty} \frac{U(xu) - U(x)}{a(x)} = \frac{u^{\gamma} - 1}{\gamma}, \]
where the right‐hand side is read as log u when γ = 0.
Here the restriction to u ≥ 1 can be broadened to u > 0. It follows from the general theory of regularly varying functions that a is automatically regularly varying at infinity with index γ: a(x) = x^γ ℓ(x) where ℓ is slowly varying, that is, a measurable and ultimately positive function that satisfies
\[ \lim_{x\to\infty} \frac{\ell(tx)}{\ell(x)} = 1 \]
for all t > 0. For more information see Bingham et al. [135].
It has been shown by de Haan [257] that the above definition can be equivalently stated in terms of the original distribution. The alternative condition is
\[ \lim_{t \to x_+} P\left(\frac{X - t}{h(t)} > u \,\Big|\, X > t\right) = (1 + \gamma u)^{-1/\gamma} \qquad (3.2.6) \]
for all u such that 1 + γu > 0 and where h ∘ U = a. For γ = 0 we read e^{−u}. The above relation holds locally uniformly in u. Furthermore if γ > 0, then h(t)/t → γ as t → ∞. However, if γ < 0, then F has a finite upper limit x+ and h(t)/(x+ − t) → −γ as t → x+. The limit distribution in (3.2.6), that is,
\[ H_\gamma(u) = 1 - (1 + \gamma u)^{-1/\gamma}, \]
is called the generalized Pareto distribution (GPD).
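The excess condition can be made concrete for a strict Pareto tail, where it holds exactly rather than only in the limit. The sketch below assumes the standard GPD form H_γ(u) = 1 − (1 + γu)^(−1/γ) and the scaling h(t) = γt for 1 − F(x) = x^(−α) with γ = 1/α; both function names are illustrative.

```python
import math


def gpd_cdf(u, gamma):
    """Standard GPD distribution function H_gamma(u) for u >= 0 (assumed form)."""
    if u <= 0.0:
        return 0.0
    if gamma == 0.0:
        return 1.0 - math.exp(-u)          # limiting case gamma -> 0
    base = 1.0 + gamma * u
    if base <= 0.0:
        return 1.0                          # gamma < 0: u lies at or beyond the endpoint -1/gamma
    return 1.0 - base ** (-1.0 / gamma)


def pareto_excess_tail(t, u, alpha):
    """P((X - t)/h(t) > u | X > t) for 1 - F(x) = x**(-alpha), using h(t) = t/alpha."""
    h = t / alpha
    return ((t + u * h) / t) ** (-alpha)


# For a strict Pareto tail the scaled excess is GPD at every threshold t.
print(pareto_excess_tail(5.0, 2.0, 2.5), 1.0 - gpd_cdf(2.0, 1.0 / 2.5))
```

For Pareto‐type tails with a non‐constant slowly varying factor the two quantities agree only asymptotically, which is why the threshold must be chosen high enough in applications.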
In case γ > 0, the class Cγ equals the class of Pareto‐type distributions defined by
\[ 1 - F(x) = x^{-\alpha}\,\ell(x), \qquad (3.2.7) \]
where α = 1/γ > 0 and ℓ is slowly varying. Note that 1 − F in (3.2.7) is regularly varying with index − α.
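Regular variation of the tail can be illustrated numerically: for fixed t the ratio (1 − F(tx))/(1 − F(x)) should approach t raised to the index of regular variation. The sketch below uses a Burr tail of the form β^α(β + x^τ)^(−α) as an assumed example; for this tail the index of regular variation is −ατ, so the product ατ plays the role of α in (3.2.7). Parameter values are arbitrary.

```python
def burr_tail(x, alpha, beta, tau):
    """Burr tail 1 - F(x) = beta**alpha * (beta + x**tau)**(-alpha), x > 0."""
    return (beta / (beta + x ** tau)) ** alpha


def scaling_ratio(x, t, alpha, beta, tau):
    """(1 - F(t*x)) / (1 - F(x)), which tends to t**(-alpha*tau) as x grows."""
    return burr_tail(t * x, alpha, beta, tau) / burr_tail(x, alpha, beta, tau)


alpha, beta, tau, t = 2.0, 1.0, 1.5, 2.0
limit = t ** (-alpha * tau)
for x in (10.0, 100.0, 10_000.0):
    print(x, scaling_ratio(x, t, alpha, beta, tau), limit)
```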
When γ ≤ 0, the underlying X has a tail that is lighter than that of Pareto‐type distributions. In case γ = 0 the distribution of X can have a finite endpoint or infinite support.
Seemingly the first attempt to model large claims with a parametrized distribution is due to Benckert et al. [109]. Here the authors assume that the claim size distribution starts out as a Pareto distribution, meaning that for large x, 1 − F(x) ∼ c x^{−α} for some positive α. The distribution is then “cut off” at the point corresponding to the sum insured, in which the remaining mass of the Pareto distribution is concentrated. This then yields a model with negative γ. A bit later, Benktander [110] pointed out how the Pareto distribution itself (or its variants) could be used to model large claims. In particular he considered the Pareto class as a dividing class between claim size distributions for which all moments are finite and those for which most moments diverge. Pareto‐type distributions have always been popular, for instance when modelling fire, storm and liability data, as will be illustrated in Chapter 4. For a survey of extreme value theory and relevant references, see Embrechts et al. [329], Beirlant et al. [100], and de Haan and Ferreira [258].
Before continuing we need to stress the difference between a large claim and an outlier. While the first is a genuine member of the sample of claim sizes, an outlier is considered an extraneous value. Next to clear misprints, events can occur which are completely unexpected in view of all data before such an event. Using methods from extreme value analysis (EVA) one can estimate how unlikely certain events are in view of all prior information. When events with an extremely low likelihood do occur, however, one has to be ready to change the statistical models.
In this section we state the traditional examples of claim size distributions that are commonly considered in the actuarial literature. Some of these examples are simple while others are more elaborate variations. For other surveys of common claim size distributions, see Kupper [517], Ammeter [40] and Klugman et al. [491].
In many cases, distributions can be derived from a simple original by a transformation. Among the most popular are the following:
Note that such transformations may dramatically change the tail behavior of the distribution.
Because of the importance of extreme values in reinsurance, the extreme value distribution Gγ, γ ∈ ℝ, and the generalized Pareto distribution are important candidates for modelling purposes in view of the limit results (3.2.3) and (3.2.6). The sets of extreme value distributions and generalized Pareto distributions are one‐parameter families of distributions ranging from light tails (γ ≤ 0, with a finite endpoint when γ < 0) up to Pareto‐type tails (when γ > 0). Applying a normalization to these families we obtain the location‐scale versions with μ ∈ ℝ and σ > 0:
\[ G_{\gamma,\mu,\sigma}(x) = \exp\left\{-\left(1 + \gamma\,\frac{x-\mu}{\sigma}\right)^{-1/\gamma}\right\} \]
and
\[ H_{\gamma,\mu,\sigma}(x) = 1 - \left(1 + \gamma\,\frac{x-\mu}{\sigma}\right)^{-1/\gamma}, \]
where 1 + γ(x − μ)/σ > 0. The latter distribution has been used to model aggregate claim distributions in McNeil [567]. Condition (3.2.6) leads to the popular peaks‐over‐threshold (POT) approach in EVA, as discussed in Chapter 4.
We now list a number of examples of models with tails that are exponentially bounded and then turn to tails heavier than exponential.
A classical example of a light‐tailed distribution with finite endpoint x+ is given by the beta distribution with x+ = 1 and distribution function
\[ F(x) = \frac{1}{B(p,q)} \int_0^x u^{p-1}(1-u)^{q-1}\,du, \qquad 0 < x < 1, \]
with extreme value index γ = −1/q (here B(p, q) = Γ(p)Γ(q)/Γ(p + q) denotes the beta function). The uniform distribution on (0, 1) is of course a special case with p = q = 1. This is then a possible model for loss degree data. Beta distributions can be constructed starting from a Pareto‐type random variable Y (cf. (3.2.7)) through the transformation X = x+ − 1/Y leading to an extreme value index γ = −1/α:
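Another way to produce a light tail with finite endpoint from a heavy‐tailed distribution W is by conditioning on W < T for some value T:
\[ X \overset{d}{=} W \mid W < T, \qquad \text{that is,} \qquad P(X \le x) = \frac{F_W(x)}{F_W(T)}, \quad x < T, \]
for a continuous distribution function F_W of W.
Such an operation is called here upper‐truncation.² A first reference in this respect is Benckert et al. [109]. See Clark [217] for a reference in enterprise risk management.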
With T fixed, one can show that X is then light tailed with EVI γ = −1. When modelling large claims it appears appropriate to consider T sufficiently large, possibly with the meaning of a sum insured. Another example is found in the Gutenberg–Richter model for earthquake magnitudes, as will be discussed when treating earthquake data in Chapter 4.
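Upper‐truncation is straightforward to implement by inverse transform. The sketch below assumes a strict Pareto variable W with 1 − F_W(x) = (x/x0)^(−α) for x ≥ x0; the function names and parameter values are illustrative only. The final lines show that the tail of X near T is approximately linear in T − x, which is consistent with an EVI of −1.

```python
import random


def pareto_cdf(x, alpha, x0):
    """F_W(x) for a strict Pareto variable with lower endpoint x0 (assumed example)."""
    return 0.0 if x <= x0 else 1.0 - (x / x0) ** (-alpha)


def truncated_cdf(x, alpha, x0, T):
    """P(W <= x | W < T): the Pareto distribution function rescaled by F_W(T)."""
    if x >= T:
        return 1.0
    return pareto_cdf(x, alpha, x0) / pareto_cdf(T, alpha, x0)


def truncated_quantile(p, alpha, x0, T):
    """Solve F_W(x) = p * F_W(T) for x, with 0 <= p < 1."""
    return x0 * (1.0 - p * pareto_cdf(T, alpha, x0)) ** (-1.0 / alpha)


def sample_truncated(n, alpha, x0, T, seed=0):
    """Inverse-transform sample of size n from the upper-truncated Pareto."""
    rng = random.Random(seed)
    return [truncated_quantile(rng.random(), alpha, x0, T) for _ in range(n)]


alpha, x0, T = 1.5, 1.0, 100.0
claims = sample_truncated(1000, alpha, x0, T)
print(max(claims) <= T)

# Near the endpoint the tail behaves like a constant times (T - x).
for eps in (1e-2, 1e-3):
    print(eps, (1.0 - truncated_cdf(T - eps, alpha, x0, T)) / eps)
```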
Many other special Box–Cox forms are available. We mention here the transformed gamma distribution, obtained from the gamma distribution via a power transformation. We find
a distribution with three parameters.
When there is good reason to believe that a claim comes from one of several different risk classes and for each of these classes one has a good idea about the claim size distribution, then a mixing distribution will be a natural model. In this context, mixtures of Erlang distributions are very popular in claims modelling, for example see Willmot and Woo [790]. Such mixed Erlang distributions are used in Chapter 4 to produce global fits in combination with separate tail fits. A popular, tractable and more general class of super‐exponential type in such a probabilistic construction context are phase‐type distributions (see Bladt and Nielsen [139], [57, Ch. IX] and Asmussen et al. [65] for the statistical perspective). For a recent variant of infinite‐dimensional phase‐type distributions with finitely many parameters leading to a heavy‐tailed distribution, see Bladt et al. [140].
Hence σF = −β/2 and the distribution is super‐exponential. For further properties see Embrechts [323]. The closure under convolution makes this distribution an interesting candidate for claim size modelling; Seal [692] was probably the first to consider it for this purpose. Later applications can be found in Gendron et al. [380], ter Berg [736] and Mack [555].
This class corresponds to the Pareto‐type distributions as defined in (3.2.7).
Hogg and Klugman [449] have suggested the log t‐distribution by applying first a logarithmic transformation to be followed by a normalization. There results a density with slightly more general parameters
As a special case one finds the one‐sided Cauchy distribution with density
The list of distributions discussed above is summarized in Table 3.1, where the models are ordered from light to heavy classes, mentioning the sign of γ. For γ = 0 we also indicate whether the tail is heavier than exponential (HTE) or lighter than exponential (LTE).
| Sign of γ | Distribution | 1 − F(x) | (x−, x+) |
| γ < 0 | Beta | | (0, 1) |
| | Reversed Burr | β^α (β + (x+ − x)^{−τ})^{−α} | (0, x+) |
| | Upper‐truncated Pareto | | (x0, T) |
| γ = 0, LTE | Second Benktander | b > 1 | |
| | Weibull | τ > 1 | |
| γ = 0 | Inverse Gaussian | | |
| | Gamma | | |
| | Exponential | e^{−λx} | |
| γ = 0, HTE | Second Benktander | 0 < b < 1 | |
| | Weibull | τ < 1 | |
| | Log‐normal | | |
| | Quasi‐log‐normal | | |
| γ > 0 | Strict Pareto | (x/x0)^{−α} | |
| | GPD | | |
| | Burr | β^α (β + x^τ)^{−α} | |
| | Fréchet | | |
| | Log‐gamma | | |
| | One‐sided t | | |
| | GB2 | | |
| | First Benktander | | |
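Under an unlimited XL treaty with retention u, the expected amount to be paid by the reinsurer is given by E[(X − u)+] = e(u)(1 − F(u)), where e(u) is the mean excess amount
\[ e(u) = E(X - u \mid X > u). \]
Assuming E(X) < ∞, the mean excess function or mean residual life function e is well defined, and its calculation for a random variable with tail function $\bar F = 1 - F$ starts from the formula
\[ e(u) = \frac{1}{\bar F(u)} \int_u^\infty \bar F(x)\,dx. \]
On the other hand, the distribution function F (of a non‐negative X with F(0) = 0) can also be calculated from e if it exists:
\[ 1 - F(x) = \frac{e(0)}{e(x)} \exp\left(-\int_0^x \frac{du}{e(u)}\right). \qquad (3.4.11) \]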
In fact the first Benktander distribution was derived by applying (3.4.11) to the mean excess function .
Thanks to its memoryless property, the exponential distribution plays a central role when using e: for 1 − F(x) = e^{−λx} one has e(u) = 1/λ for all u ≥ 0.
When the tail of the distribution of X is HTE, then we find that the mean excess function ultimately increases while for LTE tails e ultimately decreases. For example, for the Weibull distribution with 1 − F(x) = exp(−λx^τ) we obtain as x → ∞
\[ e(x) \sim \frac{x^{1-\tau}}{\lambda\tau}, \]
yielding an ultimately decreasing (respectively increasing) e in case τ > 1 (respectively τ < 1). In the case of a Pareto‐type distribution the function e ultimately has a linearly increasing behavior since when α > 1
\[ e(x) \sim \frac{x}{\alpha - 1} \qquad \text{as } x \to \infty. \]
Distributions with a finite endpoint x+ show a mean excess function that ultimately decreases and e(x+) = 0.
Hence the mean excess function can play an important role in deciding whether a tail is HTE. This will be exploited in Chapter 4.
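The contrast between a flat and a linearly increasing mean excess function can be seen with a simple empirical estimator: the average of the excesses x − u over all observations above u. The sketch below is illustrative only. To keep the output reproducible it replaces random samples by a deterministic grid of quantiles, and it assumes an exponential distribution with rate 2 and a strict Pareto distribution with α = 3 and x0 = 1.

```python
import math


def mean_excess(sample, u):
    """Empirical mean excess: average of x - u over observations x > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observations above the threshold")
    return sum(excesses) / len(excesses)


def quantile_grid(quantile, n=100_000):
    """Deterministic pseudo-sample: the quantile function on a midpoint grid."""
    return [quantile((i + 0.5) / n) for i in range(n)]


# Exponential with rate 2: e(u) = 1/2 for every u.
exponential = quantile_grid(lambda p: -math.log(1.0 - p) / 2.0)
# Strict Pareto with alpha = 3, x0 = 1: e(u) = u / (alpha - 1) for u >= 1.
pareto = quantile_grid(lambda p: (1.0 - p) ** (-1.0 / 3.0))

for u in (0.5, 1.0):
    print("exponential", u, mean_excess(exponential, u))
for u in (2.0, 4.0):
    print("pareto", u, mean_excess(pareto, u))
```

With real data the estimates at high thresholds rest on few observations and become volatile, so the shape of the plot is usually judged over a range of thresholds rather than at a single point.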
A good fit of the severity model over the entire range of loss sizes, from the many smaller to the few large ones, is essential in many practical situations. The traditional models listed above are often not able to capture the entire severity range. If one is restricted to the very large losses, the Pareto‐like distributions frequently will be the best choice, but these heavy‐tailed distributions rarely have the right shape to fit well below the tail area. One way to deal with this problem is by splicing a tail fit to the right of some large threshold t, with a model which fits the bulk of the data. The basic idea here is to stick pieces of two (or more) different models together. This fits in with mixing models where, as in a classical actuarial collective model, different processes f1, …, fm act on different contracts with proportions p1, …, pm (p1 + … + pm = 1) so that
\[ f(x) = \sum_{j=1}^{m} p_j f_j(x). \]
Splicing concerns a specific kind of mixing reflecting that insurance data exhibit different statistical behavior over some subintervals of the outcome set of loss amounts due to different scrutinies. An m‐component spliced distribution then has a density expressed as
\[ f(x) = \pi_j\,\frac{f_j(x)}{F_j(c_j) - F_j(c_{j-1})}, \qquad c_{j-1} < x \le c_j, \quad j = 1, \dots, m, \]
with c0 and cm the lower and upper endpoints of the support, with πj > 0 and π1 + … + πm = 1, where fj, respectively Fj, (j = 1, …, m) denote densities and distribution functions of random variables. Restrictions on the parameters can be imposed, requiring continuity, or even differentiability, of the density f at the junction points c1, …, cm−1.
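A spliced density can be sketched generically: on each interval the component density is renormalized by the probability mass that the component assigns to that interval, and then multiplied by the weight πj. The code below is a minimal illustration under that reading; the function name, the exponential body, the Pareto tail and all parameter values are assumptions made for the example.

```python
import math


def spliced_density(x, weights, densities, cdfs, cuts):
    """Density of an m-component spliced model.

    cuts = [c_0, c_1, ..., c_m] with c_0 < c_1 < ... < c_m (c_m may be math.inf);
    component j lives on (c_{j-1}, c_j] and is renormalized to that interval.
    """
    for j, weight in enumerate(weights):
        lower, upper = cuts[j], cuts[j + 1]
        if lower < x <= upper:
            mass = cdfs[j](upper) - cdfs[j](lower)
            return weight * densities[j](x) / mass
    return 0.0


rate, alpha, t = 0.5, 2.0, 4.0          # illustrative parameter values


def exp_pdf(x):
    return rate * math.exp(-rate * x)


def exp_cdf(x):
    return 1.0 - math.exp(-rate * x)


def par_pdf(x):
    return alpha * t ** alpha / x ** (alpha + 1.0)


def par_cdf(x):
    return 1.0 - (x / t) ** (-alpha) if x > t else 0.0


def f(x):
    """Exponential body on (0, t] with weight 0.8, Pareto tail beyond t with weight 0.2."""
    return spliced_density(x, [0.8, 0.2], [exp_pdf, par_pdf],
                           [exp_cdf, par_cdf], [0.0, t, math.inf])


print(f(1.0), f(t), f(8.0))
```

No continuity is imposed at t in this example, so the density generally jumps at the junction point.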
Several splicing models using m = 2 components have recently been proposed. Motivated by the methods from EVA, Beirlant et al. [100, Sec. 6.2.4] proposed a composite exponential Pareto model for a motor insurance data set of the type
where k is the number of extremes referring to the number of exceedances above an appropriate threshold t.
An alternative version based on EVA developed in the next chapter consists of splicing a generalized Pareto distribution with a bulk model:
where F1 is the distribution function of an appropriately chosen distribution for the modal part of a loss distribution. If F1 is chosen to have a continuous density f1, the density of (3.5.14) is given by
Lee et al. [531] considered a mixture of two exponentials
Cooray and Ananda [226] proposed a composite log‐normal Pareto model, which was suitably modified by Scollnik [687]. Scollnik and Sun [688] considered spliced Weibull–Pareto models, while Calderín‐Ojeda and Kwok [180] also introduce splicing log‐normal and Weibull models with a tail model. In Fackler [343] a classification of potential combinations for small and large losses is considered. Miljkovic and Grün [578] is another recent reference on this topic.
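Following Scollnik [687], consider as an example splicing a log‐normal distribution with density function
\[ f_1(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \qquad x > 0, \]
and a Pareto distribution with density
\[ f_2(x) = \frac{\alpha\,t^{\alpha}}{x^{\alpha+1}}, \qquad x > t. \]
The density of the composite model is then given by
\[ f(x) = \begin{cases} \pi_1\, f_1(x)/\Phi(\tau), & 0 < x \le t, \\ (1-\pi_1)\, f_2(x), & x > t, \end{cases} \]
with τ = (ln t − μ)/σ and Φ the standard normal distribution function.
Some authors require smoothness at t. When splicing a log‐normal and a Pareto distribution, imposing continuity at t leads to
\[ \pi_1 = \frac{\alpha\sigma\,\Phi(\tau)}{\alpha\sigma\,\Phi(\tau) + \varphi(\tau)}, \]
with φ the standard normal density, while differentiability at t leads to τ = ασ. Smoothness reduces the number of free parameters, which is appropriate in case data are scarce. On the other hand it links the geometries of the body and tail fits, reducing the flexibility the splicing is trying to offer.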
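The smoothness constraints can be verified numerically. The sketch below assumes a log‐normal body with parameters μ and σ truncated to (0, t], a Pareto tail with density αt^α/x^(α+1) beyond t, and body weight w; all names are illustrative. Under these assumptions continuity at t fixes w = ασΦ(τ)/(ασΦ(τ) + φ(τ)) with τ = (ln t − μ)/σ, and the one‐sided slopes at t agree once μ is chosen so that τ = ασ.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def continuity_weight(mu, sigma, alpha, t):
    """Body weight that makes the composite density continuous at t."""
    tau = (math.log(t) - mu) / sigma
    a = alpha * sigma * norm_cdf(tau)
    return a / (a + norm_pdf(tau))


def composite_pdf(x, mu, sigma, alpha, t, w):
    """Log-normal body truncated to (0, t] with weight w, Pareto tail with weight 1 - w."""
    if x <= 0.0:
        return 0.0
    if x <= t:
        z = (math.log(x) - mu) / sigma
        tau = (math.log(t) - mu) / sigma
        return w * norm_pdf(z) / (x * sigma) / norm_cdf(tau)
    return (1.0 - w) * alpha * t ** alpha / x ** (alpha + 1.0)


def one_sided_slopes(mu, sigma, alpha, t, h=1e-4):
    """Finite-difference slopes of the continuous composite density just left and right of t."""
    w = continuity_weight(mu, sigma, alpha, t)

    def g(x):
        return composite_pdf(x, mu, sigma, alpha, t, w)

    return (g(t) - g(t - h)) / h, (g(t + h) - g(t)) / h


sigma, alpha, t = 0.8, 1.5, 5.0
mu_smooth = math.log(t) - alpha * sigma ** 2     # enforces tau = alpha * sigma
print(one_sided_slopes(mu_smooth, sigma, alpha, t))
print(one_sided_slopes(0.0, sigma, alpha, t))    # continuous but with a kink at t
```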
Pigeon and Denuit [618] considered a mixed composite log‐normal Pareto model, where one assumes that every observation Xi may have its own threshold c1i (i = 1, …, n), which are realizations of some non‐negative random variable Θ. More specifically the case with Θ being gamma distributed was worked out in detail by these authors.
The max‐domain of attraction in the multivariate case has been worked out in detail for marginal ordering: for d‐dimensional vectors x = (x1, …, xd) and y = (y1, …, yd) the relation x ≤ y is defined as xj ≤ yj, j = 1, …, d. Moreover we use the notations xy = (x1y1, …, xdyd), and x + y = (x1 + y1, …, xd + yd). Considering a sample of d‐dimensional observations Xi = (Xi,1, …, Xi,d) (i = 1, …, n), we denote the sample maximum by Mn with components
\[ M_{n,j} = \max_{1 \le i \le n} X_{i,j}, \qquad j = 1, \dots, d. \]
The distribution function of Mn of an independent sample X1, …, Xn from a distribution function F is given by
\[ P(M_n \le x) = F^n(x). \]
As in the univariate case, one needs to normalize Mn in order to obtain a non‐trivial limit distribution as n → ∞. The domain of attraction problem is then concerned with finding sequences an > 0 = (0, …, 0) and bn such that there exists a d‐variate distribution function G for which
\[ P\left(\frac{M_n - b_n}{a_n} \le x\right) = F^n(a_n x + b_n) \to G(x), \qquad n \to \infty. \qquad (3.6.15) \]
Again, as in the univariate case, we say that F is in the max‐domain of attraction of G, and G is called a (multivariate) extreme value distribution.
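Because the ordering is marginal, the sample maximum is taken coordinate by coordinate and need not coincide with any single observation. A minimal sketch with a made‐up three‐point sample (the function name is illustrative):

```python
def componentwise_max(sample):
    """Sample maximum under marginal ordering: the maximum of each coordinate."""
    return tuple(max(column) for column in zip(*sample))


observations = [(1.0, 5.0), (3.0, 2.0), (2.0, 4.0)]   # hypothetical bivariate losses
m_n = componentwise_max(observations)
print(m_n, m_n in observations)
```

Here the first coordinate of the maximum comes from the second observation and the second coordinate from the first, so the maximum itself is not an observed point.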
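Let Fj and Gj denote the jth marginal distribution functions of F and G, respectively. Then one easily derives from (3.6.15) that for j = 1, …, d
\[ F_j^n(a_{n,j} x_j + b_{n,j}) \to G_j(x_j), \qquad n \to \infty, \]
that is, Gj itself is a univariate extreme value distribution and Fj is in its domain of attraction. Below we will use the following general parametrization of Gj:
\[ G_j(x_j) = \exp\left\{-\left(1 + \gamma_j\,\frac{x_j - \mu_j}{\alpha_j}\right)^{-1/\gamma_j}\right\}, \qquad 1 + \gamma_j\,\frac{x_j - \mu_j}{\alpha_j} > 0, \]
with γj the EVI for the jth margin.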
Also, the notion of max‐stability of G carries over from the univariate case, that is,
for any positive integer k and , with vectors αk > 0 and βk with and as .
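An extreme value distribution function G can be reconstructed from its margins and its stable tail dependence function (STDF) l. This function is defined as
\[ l(v) = -\log G\big(Q_1(e^{-v_1}), \dots, Q_d(e^{-v_d})\big), \qquad v \in [0, \infty)^d, \]
with Qj the quantile function of the jth margin of G (j = 1, …, d). One then gets
\[ G(x) = \exp\{-l(-\log G_1(x_1), \dots, -\log G_d(x_d))\}. \]
The expression for an extreme value copula
\[ C(u) = G\big(Q_1(u_1), \dots, Q_d(u_d)\big) \]
then follows:
\[ C(u) = \exp\{-l(-\log u_1, \dots, -\log u_d)\}. \]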
Note that the STDF describes the dependence between the components after transforming the margins to a standard exponential distribution, which is in contrast to the use of copulas where the margins are transformed to uniform (0, 1) distributions.
A STDF l has the following properties:
On the basis of (L1) it follows that an extreme value copula satisfies
The upper and lower bounds in (L3) are themselves STDFs: the lower bound corresponds to complete dependence G(x) = min{G1(x1), …, Gd(xd)}, whereas the upper bound corresponds to independence G(x) = G1(x1)…Gd(xd).
Finally we note that properties (L1) to (L4) do not characterize the class of STDFs, that is, a function l that satisfies (L1)–(L4) is not necessarily an STDF.
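As an illustration of how an STDF determines an extreme value copula, the sketch below uses the bivariate logistic form l(v1, v2) = (v1^(1/θ) + v2^(1/θ))^θ with 0 < θ ≤ 1. This parametric form and the relation C(u1, u2) = exp(−l(−log u1, −log u2)) are standard but are stated here as assumptions, and the function names are illustrative. The code checks homogeneity of order one and the bounds max(v1, v2) ≤ l(v1, v2) ≤ v1 + v2, which is how properties (L1) and (L3) are read in this example.

```python
import math


def logistic_stdf(v1, v2, theta):
    """Bivariate logistic STDF for v1, v2 >= 0 and 0 < theta <= 1 (assumed form)."""
    return (v1 ** (1.0 / theta) + v2 ** (1.0 / theta)) ** theta


def ev_copula(u1, u2, theta):
    """Extreme value copula built from the STDF: margins mapped via -log u."""
    return math.exp(-logistic_stdf(-math.log(u1), -math.log(u2), theta))


v1, v2, theta = 0.4, 1.1, 0.5
value = logistic_stdf(v1, v2, theta)
print(max(v1, v2) <= value <= v1 + v2)                # bounds
print(logistic_stdf(3 * v1, 3 * v2, theta), 3 * value)  # homogeneity of order one

# theta = 1 gives independence; theta close to 0 approaches complete dependence.
print(ev_copula(0.3, 0.6, 1.0), 0.3 * 0.6)
print(ev_copula(0.3, 0.6, 0.05), min(0.3, 0.6))
```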
Classical examples of bivariate STDFs are
If F is in the max‐domain of attraction of G, then
or, as ,
from which
Setting now (j = 1, …, d) and approximating by 1 − vj/n, we find that a multivariate distribution function F is in the max‐domain of attraction of an extreme value distribution with STDF l if the tail dependence function 1 − F(Q1(1 − v1), …, Qd(1 − vd)) converges in the following way to the STDF l of G:
which can be rewritten as
or, when using the corresponding copula:
For more details concerning multivariate extreme value theory see Chapter 8 in Beirlant et al. [100].
Copulas and stable tail dependence functions which describe the dependence between the components are infinite‐dimensional objects and therefore not always easy to handle. One can restrict to a parametric model, such as a logistic model, but alternatively one can summarize the main properties of the dependence structure in a number of well‐chosen dependence coefficients. We restrict the list here to the bivariate case.
As in the univariate case, the domain of attraction condition (3.6.15) can be cast in terms of exceedances over a high threshold. The event {X ≰ t} is called an exceedance over the (multivariate) threshold t. This means that there is at least one coordinate variable Xj that exceeds the corresponding threshold tj, although the precise coordinate where this happens remains unspecified. We are then interested in the asymptotic distribution of the excess vector X − t conditionally on X ≰ t, as . It was shown, for example in Beirlant et al. [100] or Rootzén and Tajvidi [654], that if as , and 0 < G(0) < 1,
as , where denotes the lower endpoint of Gj. H is then the distribution function of the multivariate generalized Pareto distribution.
Based on (3.6.24), (3.6.17) and (3.6.19) we then obtain, when αj > γjμj, j = 1, …, d, that
Setting σj = αj − γjμj and , we arrive at
for such that σ + γx > 0. Finally, when x ≥0 we obtain that
and (j = 1, …, d). Further note that with properties (L1) and (L2) for STDFs, we obtain in case x ≥0
with . Imposing the constraint l(ζ1, …, ζd) = 1 we then have
For further properties concerning multivariate generalized Pareto distributions see Rootzén and Tajvidi [654] and Kiriliouk et al. [488].