4
Statistics for Claim Sizes

If there is any suspicion of heavy‐tailed distributions, then it is advisable that the actuary should make a number of different data plots. Modelling of large claims is quite an uncertain undertaking, and hence the more graphs considered the better in order to make a balanced conclusion.

As a baseline distribution one might start from the exponential distribution and inspect for HTE tails. If the right tail of the distribution is obviously heavier than that of any exponential distribution, then Weibull, log‐normal or Pareto quantile plots offer potential improvements. Such a first step can be performed using different kinds of quantile plots (exponential, log‐normal, Weibull or Pareto) and their derivative plots.

After this, large claims modelling using extreme value methodology comes into play. Here the maximum likelihood methodology applied to the peaks over threshold (POT) approach plays the central role. We also emphasize methods based on quantile plotting in order to allow for graphical validation of the models and results. We first discuss the classical case of independent and identically distributed data, followed by regression settings, censored and multivariate data. In reinsurance, the development of large claims can take several years. When evaluating a portfolio, not all the claims are fully developed and the indexed payments at the last available development year underestimate the real final indexed payment. When historical incurred information per claim is available, this should assist in the estimation of the tails of the payment distribution.

It remains desirable to construct a distribution with an appropriate tail fit which at the same time has enough parameters to also fit the medium range. An early reference here is Albrecht [32], who pointed out that claim size data are often well described by a Pareto distribution for large claims, while the log‐normal distribution provides a good fit for medium‐sized claims. For a general review on the construction of mixture models with tail components, see Scarrott and MacDonald [667]. Here we discuss the method of splicing different distributions in more detail, and in particular we propose combining a mixed Erlang distribution with a tail fit.

All of this material will be illustrated using the data sets introduced in Chapter 1. While the automobile liability data and the Dutch fire insurance data will be used throughout, we end the chapter by analysing the Austrian storm risk, European flood risk data, the Groningen earthquake data, and the Danish fire insurance case in order to illustrate statistical methods for tail estimation.

For a more general survey of the statistical methods of extreme value theory see Embrechts et al. [329], Reiss et al. [645], Coles [221], and Beirlant et al. [100]. These references also contain more technical details that are omitted here.

4.1 Heavy or Light Tails: QQ‐ and Derivative Plots

As discussed in Section 3.4 the mean excess function offers a first tool to discriminate between HTE and LTE tails. In practice, based on a sample X₁, X₂, …, X_n, the mean excess function can be naively estimated by replacing the expectation by its empirical counterpart:

\[ \hat e_n(t) = \frac{\sum_{i=1}^{n} X_i \, 1_{(t,\infty)}(X_i)}{\sum_{i=1}^{n} 1_{(t,\infty)}(X_i)} - t, \]

where for any set A, 1_A(X_i) equals 1 if X_i ∈ A, and 0 otherwise. The value t is often taken equal to one of the data points, say the (k + 1)‐largest observation X_{n−k, n} for some k = 1, 2, …, n − 1. We then obtain

\[ e_{k,n} := \hat e_n(X_{n-k,n}) = \frac{1}{k} \sum_{j=1}^{k} X_{n-j+1,n} - X_{n-k,n}. \tag{4.1.1} \]

The mean excess values e_{k, n} can be plotted as a function of the threshold x_{n−k, n} or as a function of the inverse rank k.
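As a minimal sketch of how the mean excess values might be computed in practice, the following Python function takes e_{k, n} to be the average of the k largest observations minus the threshold X_{n−k, n}, and returns the points (k, X_{n−k, n}, e_{k, n}). The function name and the toy sample are illustrative assumptions, and the sketch assumes there are no ties in the data.

```python
def mean_excess_values(claims):
    """Return a list of (k, X_{n-k,n}, e_{k,n}) for k = 1, ..., n-1."""
    xs = sorted(claims)              # xs[i-1] is the order statistic X_{i,n}
    n = len(xs)
    points = []
    top_sum = 0.0                    # running sum of the k largest observations
    for k in range(1, n):
        top_sum += xs[n - k]         # adds X_{n-k+1,n}
        threshold = xs[n - k - 1]    # X_{n-k,n}
        points.append((k, threshold, top_sum / k - threshold))
    return points


# Toy sample (hypothetical values, for illustration only).
sample = [1.0, 2.0, 4.0, 8.0, 16.0]
mean_excess_plot = mean_excess_values(sample)
for k, threshold, e_kn in mean_excess_plot:
    print(k, threshold, e_kn)
```

Plotting the second coordinate against the third gives the mean excess plot as a function of the threshold; plotting the first against the third gives it as a function of k.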

There is an interesting link between the values e_{k, n} and exponential QQ‐plots. For an exponential distribution with parameter λ the quantile values Q_λ(p) stand in linear relationship to the corresponding quantiles −log(1 − p) of the standard exponential distribution:

\[ Q_\lambda(p) = \frac{1}{\lambda} \bigl( -\log(1-p) \bigr), \quad p \in (0,1). \]

Hence, when estimating Q_λ(j/(n + 1)) by the empirical quantiles X_{j, n}, we have that the exponential QQ‐plot, defined by

\[ \Bigl( -\log\Bigl(1 - \frac{j}{n+1}\Bigr), \; X_{j,n} \Bigr), \quad j = 1, \ldots, n, \]

should exhibit a linear pattern which passes through the origin for the exponential model to be a plausible model. An estimator of the slope can then also be used as an estimator of 1/λ.
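The construction of the exponential QQ‐plot can be sketched as follows, assuming the plotting positions j/(n + 1) and a least squares line forced through the origin as the slope estimator. The helper names are hypothetical, and the data are exact exponential quantiles chosen so that the fitted slope can be checked against 1/λ.

```python
import math


def exponential_qq(claims):
    """Points (-log(1 - j/(n+1)), X_{j,n}), j = 1, ..., n."""
    xs = sorted(claims)
    n = len(xs)
    return [(-math.log(1.0 - j / (n + 1)), xs[j - 1]) for j in range(1, n + 1)]


def slope_through_origin(points):
    """Least squares slope of a line through the origin."""
    sxy = sum(q * x for q, x in points)
    sqq = sum(q * q for q, _ in points)
    return sxy / sqq


# Illustrative data: exact quantiles of an exponential distribution with
# rate lam (an assumption made only so that the result is easy to check).
lam = 0.5
n = 50
claims = [-math.log(1.0 - j / (n + 1)) / lam for j in range(1, n + 1)]

points = exponential_qq(claims)
slope = slope_through_origin(points)   # estimates 1 / lam
print(slope)
```

With real claim data the points will not lie exactly on a line; systematic convexity or concavity of the plot is then the feature of interest.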

Now e_{k, n} can be viewed as an estimate of the slope 1/λ_k of the exponential QQ‐plot to the right of an anchor point (−log((k + 1)/(n + 1)), X_{n−k, n}), and hence (x_{n−k, n}, e_{k, n}) or (k, e_{k, n}) for k = 1, …, n − 1, can be interpreted as a derivative plot of the exponential QQ‐plot. When fitting a regression line which passes through the anchor point using least squares regression minimizing

with respect to 1/λ_k, one indeed obtains

so that using the approximation , which is sharp even for small k.

Also, when the data come from a distribution with a tail heavier than exponential, the exponential QQ‐plot will ultimately be convex and ultimately upcross the fitted regression line for every k, so that the slopes e_{k, n} will increase always with increasing X_{n−k, n} (or decreasing k), while for a tail lighter than exponential, the QQ‐plot will ultimately be concave, ultimately appearing under the fitted regression line for every k, and the slopes will decrease with increasing X_{n−k, n} (or decreasing k).

When modelling reinsurance claim data we expect convex exponential QQ‐plots linked with increasing mean excess plots (x_{n−k, n}, e_{k, n}). A popular second step is to inspect log‐normal or Pareto QQ‐plots. Note that the mean excess plots of a Pareto‐type distribution ultimately will be linear increasing with slope 1/(α − 1), as follows from (3.4.12). Again, log‐normal, respectively Pareto, tail fits appear appropriate when the right upper end of the corresponding QQ‐plot is linear from some point on. It is advisable to accompany the QQ‐plot with the corresponding derivative plots.

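Since log X is exponentially distributed with λ = α when X is strict Pareto(α) distributed, the Pareto QQ‐plot is defined as

\[ \Bigl( -\log\Bigl(1 - \frac{j}{n+1}\Bigr), \; \log X_{j,n} \Bigr), \quad j = 1, \ldots, n, \]

with derivative plot

\[ (\log X_{n-k,n}, \, H_{k,n}) \quad \text{or} \quad (k, \, H_{k,n}), \quad k = 1, \ldots, n-1, \]

where

\[ H_{k,n} = \frac{1}{k} \sum_{j=1}^{k} \log X_{n-j+1,n} - \log X_{n-k,n}. \tag{4.1.2} \]

H_{k, n} is the estimator of 1/α introduced by Hill [442]. Indeed, if the data come from a Pareto distribution, then the Pareto QQ‐plot is linear and the derivative plot is horizontal at the level 1/α.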
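A minimal Python sketch of the Hill estimator, taken here as the average of the k largest log‐observations minus log X_{n−k, n}, may help to fix the indexing. The function name and the artificial data (exact quantiles of a strict Pareto distribution with α = 2) are assumptions made for illustration only.

```python
import math


def hill(claims, k):
    """Hill estimator H_{k,n} based on the k largest observations."""
    xs = sorted(claims)                      # xs[i-1] is X_{i,n}
    n = len(xs)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    log_threshold = math.log(xs[n - k - 1])  # log X_{n-k,n}
    return sum(math.log(x) for x in xs[n - k:]) / k - log_threshold


# Illustrative data: exact quantiles of a strict Pareto(alpha) distribution.
alpha = 2.0
n = 999
claims = [(1.0 - j / (n + 1)) ** (-1.0 / alpha) for j in range(1, n + 1)]

# Points (log X_{n-k,n}, H_{k,n}) of the derivative plot for a few k.
for k in (10, 50, 200):
    print(k, math.log(sorted(claims)[n - k - 1]), hill(claims, k))
```

For these data the estimates lie close to 1/α = 0.5, so the derivative plot is roughly horizontal, as the text describes for the strict Pareto case.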
The normal QQ‐plot based on the logarithms of the data provides the log‐normal QQ‐plot

\[ \Bigl( \Phi^{-1}\Bigl(\frac{j}{n+1}\Bigr), \; \log X_{j,n} \Bigr), \quad j = 1, \ldots, n, \]

where Φ⁻¹ denotes the standard normal quantile function. The derivative plot is then given by

with

since, with φ denoting the standard normal density,
The quantile function of the Weibull distribution is given by

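so that for this model log Q(p) is a linear function of log(−log(1 − p)). Again taking p = i/(n + 1) (i = 1, …, n) and estimating Q(i/(n + 1)) by X_{i, n} leads to the definition of the Weibull QQ‐plot

\[ \Bigl( \log\Bigl( -\log\Bigl(1 - \frac{i}{n+1}\Bigr) \Bigr), \; \log X_{i,n} \Bigr), \quad i = 1, \ldots, n. \]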
The derivative plot is then given by

with

Insurance claim data often exhibit different statistical behavior over various subsets of the outcome set which can be observed in mean excess plots, starting with components in the center of the data followed by a Pareto tail. Sometimes such Pareto tails then turn out to be upper‐truncated, as defined in Section 3.3.1.1.

Case studies.

In Figures 4.1–4.3 the exponential, Pareto, log‐normal, and Weibull QQ‐plots together with the corresponding derivative plots are given for the Dutch fire insurance data, and the ultimate values for the car liability insurance from Companies A and B. In Figure 4.1 the regression lines based on the top 100 Dutch fire claim observations and passing through the corresponding anchor point at k = 100 are given. The corresponding slope estimate can be traced back in the derivative plot through the vertical coordinate of the anchor point in the QQ‐plot, which then is the horizontal coordinate of the slope estimate in the derivative plot.

The Dutch fire insurance data show a heavy‐tailed behavior since the exponential QQ‐plot is convex, which is consistent with the mean excess plot being increasing over the whole data range. However, in total at least three components can be detected with different slopes in the QQ‐ and derivative plots, with Pareto behavior for , a decreasing Pareto derivative plot for , and ultimately a heavy tail piece at (approximately). Note the horizontal behavior of the H_{k, n} plot for log x between 14 and 16, followed by constant Weibull derivatives for . However, ultimately at the largest data points again Pareto behavior appears.

With the ultimate data values for Company A a three‐component spliced distribution can be observed in Figure 4.2, starting with a component with decreasing derivative plots for , followed by a Pareto component when and a HTE tail piece for x ∈ (13, 15). Finally, there is an ultimate section using the top eight data points which shows a strong downward trend in each derivative plot, which could indicate upper‐truncation near some high value T. So here possible model candidates for tail fits are log‐normal, Pareto or an upper‐truncated tail.

Finally, the ultimate data from Company B also show three components, , and , ending with a short Pareto piece appearing at the top 10 data points which follows from a linear increasing mean excess plot in that area.

These QQ‐ and derivative plots give first indications which then should be studied further using the extreme value and splicing methods developed next.□

**Figure 4.1** Dutch fire insurance data: exponential QQ‐plot and mean excess plot (x_{n−k, n}, e_{k, n}) (top); Pareto QQ‐plot and Hill plot (second line); log‐normal QQ‐plot and derivative plot (third line); Weibull QQ‐plot and derivative plot (bottom). For each QQ‐plot the regression line through X_{n−99, n}, …, X_{n, n} is plotted.

4.2 Large Claims Modelling through Extreme Value Analysis

4.2.1 EVA for Pareto‐type Tails

In order to model large claims, Pareto tail modelling is probably the most common approach. Here we use the subset of models with tails heavier than exponential for which the EVI γ is positive, as discussed in Section 3.3.2.2, which in fact equals the set of Pareto‐type models that can be defined through tail functions 1 − F, quantile functions Q, or tail quantile functions U(x) = Q(1 − 1/x). Indeed

\[ 1 - F(x) = x^{-1/\gamma} \, \ell_F(x) \iff U(x) = x^{\gamma} \, \ell_U(x), \tag{4.2.3} \]

where γ = 1/α > 0 and ℓ_F and ℓ_U are slowly varying functions. Also (3.2.7) is equivalent to

\[ \frac{1 - F(tu)}{1 - F(t)} \to u^{-1/\gamma} \quad \text{as } t \to \infty \tag{4.2.4} \]

for every u > 0. In this section we discuss the estimation of γ = 1/α, large quantiles Q(1 − p) = U(1/p), and small tail probabilities p_x = P(X > x). We assume in this and the next subsection that the data are independent and identically distributed (i.i.d.). Moreover, mathematical approximations of variances (AVar), bias (ABias), mean squared error (AMSE), and distributions of estimators using the k largest observations of the n data will hold when k, n → ∞ and k/n → 0.

4.2.1.1 Estimating a Positive EVI

The most popular estimator for γ is given by the Hill estimator H_{k, n} defined in (4.1.2), see [442]. In the preceding section this estimator was retrieved through regression on the Pareto QQ‐plot. Here, we also show how the maximum likelihood method based on the so‐called POT approach leads to the same estimation method in the Pareto‐type case.

In Section 4.1 the Hill estimator was motivated as an estimator of the slope of a linear Pareto QQ‐plot to the right of an anchor point (log((n + 1)/(k + 1)), log X_{n−k, n}). In fact, this interpretation can be carried over to the general case of Pareto‐type distributions since then ultimately for x → ∞ the Pareto QQ‐plot is still linear with slope γ for a small enough k or, equivalently, for a large enough X_{n−k, n}. Indeed, under (4.2.3),

\[ \log U(x) = \gamma \log x + \log \ell_U(x). \]

It can now be shown that for every slowly varying function ℓ

\[ \frac{\log \ell(x)}{\log x} \to 0 \]

as x → ∞. Hence, whereas Pareto QQ‐plots are hardly ever completely linear, they are ultimately linear at some set of largest values. The speed at which the linearity sets in depends on the underlying slowly varying function. Like many publications, following Hall [419], we assume here that

\[ \ell_U(x) = C \bigl( 1 + D x^{-\beta} (1 + o(1)) \bigr), \quad x \to \infty, \tag{4.2.5} \]
for some C, β > 0, and D a real constant. This can, however, be generalized to

as with b essentially a power function or, more correctly, a regularly varying function with index − β, and . Under (4.2.5)
(4.2.6)
from which H_{k, n} follows taking x = (n + 1)/j (j = 1, …, k), estimating U((n + 1)/j) by X_{n−j+1, n} (j = 1, …, k + 1), and taking the average of both sides of (4.2.6) over j = 1, …, k after deleting the last term on the right‐hand side. Omitting this final term (or, equivalently, assuming a strict Pareto distribution with constant slowly varying function) causes a bias which will be more important with smaller β. Adverse situations for the Hill estimator are logarithmic slowly varying functions ℓ_U, as in case of the log‐gamma distribution. Such cases exhibit β = 0.
Alternatively, the Hill estimator is also a maximum likelihood estimator based on (4.2.4). Indeed, extreme value methodology proposes fitting the limiting Pareto distribution with distribution function 1 − x^{−1/γ} to the POT values Y = X/t over a high threshold t conditionally on X > t. Note that the use of the mathematical limit in (4.2.4) to fit the exceedance data introduces an approximation error that leads to estimation bias. Let N_t denote the number of exceedances over t and Y₁, …, Y_{N_t} the corresponding POT values. Then the log‐likelihood equals

\[ \log L(\gamma) = -N_t \log \gamma - \Bigl(1 + \frac{1}{\gamma}\Bigr) \sum_{i=1}^{N_t} \log Y_i \]

with

\[ \frac{d \log L(\gamma)}{d\gamma} = -\frac{N_t}{\gamma} + \frac{1}{\gamma^2} \sum_{i=1}^{N_t} \log Y_i, \]

leading to the maximum likelihood estimator

\[ \hat\gamma = \frac{1}{N_t} \sum_{i=1}^{N_t} \log Y_i. \]

Choosing an upper order statistic X_{n−k, n} for the threshold t (so that N_t = k) we obtain H_{k, n}.
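The maximum likelihood interpretation can be checked numerically. The sketch below assumes the strict Pareto log‐likelihood −N_t log γ − (1 + 1/γ) Σ log Y_i for relative excesses Y_i = X_i/t, maximizes it with a plain golden‐section search (a generic optimizer chosen here for self‐containedness, not a method from the text), and compares the result with the average of the log Y_i. All names and the four data values are hypothetical.

```python
import math


def pot_loglik(gamma, ys):
    """Strict Pareto log-likelihood for relative excesses ys = X/t > 1."""
    return -len(ys) * math.log(gamma) - (1.0 + 1.0 / gamma) * sum(math.log(y) for y in ys)


def argmax_golden(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - ratio * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + ratio * (b - a)
    return (a + b) / 2.0


ys = [1.5, 2.0, 3.0, 10.0]       # hypothetical relative excesses
gamma_closed = sum(math.log(y) for y in ys) / len(ys)
gamma_numeric = argmax_golden(lambda g: pot_loglik(g, ys), 0.01, 10.0)
print(gamma_closed, gamma_numeric)
```

The two values agree up to the tolerance of the search, which illustrates that the Hill estimator is the maximizer of this likelihood when the threshold is an upper order statistic.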
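From Section 4.1 it also follows that the Hill statistic can be interpreted as an estimator of the mean excess function of the log‐transformed data, that is, E(log X − log t | X > t), with the threshold value t substituted by X_{n−k, n}. As in (3.4.10) we here find

\[ E(\log X - \log t \mid X > t) = \int_t^{\infty} \frac{1 - F(u)}{1 - F(t)} \, \frac{du}{u}. \]

Estimating F(u) using the empirical distribution function

\[ \hat F_n(u) = \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty, u]}(X_i) \tag{4.2.7} \]

with value (n − j)/n over the interval [X_{n−j, n}, X_{n−j+1, n}), and taking t = X_{n−k, n}, we are led to the estimator

\[ \int_{X_{n-k,n}}^{X_{n,n}} \frac{1 - \hat F_n(u)}{1 - \hat F_n(X_{n-k,n})} \, \frac{du}{u} = \sum_{j=1}^{k} \frac{j}{k} \bigl( \log X_{n-j+1,n} - \log X_{n-j,n} \bigr). \tag{4.2.8} \]

Using summation by parts one observes that this final expression equals the Hill estimator:

\[ H_{k,n} = \frac{1}{k} \sum_{j=1}^{k} Z_j \tag{4.2.9} \]

with Z_j = j(log X_{n−j+1, n} − log X_{n−j, n})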

(with ).

To deduce approximate expressions for the variance and bias of the Hill estimator it is helpful to consider the preceding interpretation in terms of the scaled log‐spacings Z_j. Thanks to the Rényi representation j(E_{n−j+1, n} − E_{n−j, n}) =_d E_j (j = 1, …, n) concerning order statistics E_{1, n} ≤ E_{2, n} ≤ … ≤ E_{n, n} from a random sample E₁, E₂, …, E_n of n independent standard exponential random variables, we have in case of a strict Pareto distribution (i.e., with ℓ_U constant), that

\[ Z_j = j \bigl( \log X_{n-j+1,n} - \log X_{n-j,n} \bigr) \overset{d}{=} \gamma \, j \bigl( E_{n-j+1,n} - E_{n-j,n} \bigr) \overset{d}{=} \gamma E_j, \quad j = 1, \ldots, k. \tag{4.2.10} \]

This representation is based on the memoryless property of the exponential distribution and the fact that nE_{1, n} is standard exponentially distributed. From (4.2.9) and (4.2.10) we expect that, as k, n → ∞,

\[ \mathrm{AVar}(H_{k,n}) = \frac{\gamma^2}{k}. \]
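The identity between the Hill estimator and the average of the scaled log‐spacings Z_j = j(log X_{n−j+1, n} − log X_{n−j, n}) holds exactly for any sample, and can be verified with a short sketch. The simulated strict Pareto sample (γ = 0.5, seed 1) and the function names are assumptions chosen only for illustration.

```python
import math
import random


def hill(claims, k):
    """Hill estimator: mean of the k largest log-values minus log X_{n-k,n}."""
    xs = sorted(claims)
    n = len(xs)
    return sum(math.log(x) for x in xs[n - k:]) / k - math.log(xs[n - k - 1])


def scaled_log_spacings(claims, k):
    """Z_j = j * (log X_{n-j+1,n} - log X_{n-j,n}) for j = 1, ..., k."""
    xs = sorted(claims)
    n = len(xs)
    return [j * (math.log(xs[n - j]) - math.log(xs[n - j - 1])) for j in range(1, k + 1)]


# Simulated strict Pareto sample with EVI gamma (illustrative choice).
random.seed(1)
gamma = 0.5
claims = [(1.0 - random.random()) ** (-gamma) for _ in range(2000)]

k = 200
z = scaled_log_spacings(claims, k)
print(sum(z) / k, hill(claims, k))   # the two numbers coincide
```

Under the strict Pareto model the Z_j behave like independent exponential variables with mean γ, which is what motivates the variance approximation γ²/k for their average.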

Concerning the bias due to the approximation error, we confine ourselves to the model (4.2.5). Then the theoretical analogue of the Hill estimator is given by

with images . Hence, the approximate mean squared error is given by

while in order to construct confidence bounds we have that, as k, n → ∞ with k/n → 0,

(4.2.11)

4.2.1.2 Estimating Large Quantiles and Small Tail Probabilities

One of the most important applications of EVA is the estimation of extreme quantiles q_p = Q(1 − p) with p small, also termed Value‐at‐Risk (VaR) in risk applications. Alternatively, the return period for a high claim amount x, given by 1/P(X > x), is another measure describing extreme risks.

The estimation of a high quantile under Pareto‐type modelling can be performed by extrapolating along a fitted regression line on the Pareto QQ‐plot through the point (log((n + 1)/(k + 1)), log X_{n−k, n}) with slope H_{k, n}. Following (4.2.6) with x = 1/p, estimating U((n + 1)/(k + 1)) by X_{n−k, n} and γ by H_{k, n}, and omitting the second term on the right‐hand side, that is, using

\[ \log U(1/p) \approx \log X_{n-k,n} + H_{k,n} \log \frac{k+1}{(n+1)p}, \]

we arrive at the estimator

\[ \hat q_{p,k} = X_{n-k,n} \Bigl( \frac{k+1}{(n+1)p} \Bigr)^{H_{k,n}}, \tag{4.2.12} \]

which was first proposed by Weissman [777]. The estimator can also be retrieved from (4.2.3), leading to the approximation U(vx)/U(x) ≈ v^γ for large values of x. Setting vx = 1/p, x = (n + 1)/(k + 1) so that v = (k + 1)/((n + 1)p), and estimating U((n + 1)/(k + 1)) by X_{n−k, n} and γ by H_{k, n}, we obtain q̂_{p, k} again.
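The extrapolation step can be sketched in a few lines of Python. The code assumes the Weissman‐type form X_{n−k, n} · ((k + 1)/((n + 1)p))^{H_{k, n}}, which is what the substitution v = (k + 1)/((n + 1)p) described above produces; the function names and the toy data are hypothetical.

```python
import math


def hill(xs, k):
    """Hill estimator H_{k,n}; xs must be sorted in ascending order."""
    n = len(xs)
    return sum(math.log(x) for x in xs[n - k:]) / k - math.log(xs[n - k - 1])


def weissman_quantile(claims, k, p):
    """Estimate Q(1 - p) by extrapolating from the anchor X_{n-k,n}."""
    xs = sorted(claims)
    n = len(xs)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    anchor = xs[n - k - 1]                       # X_{n-k,n}
    return anchor * ((k + 1) / ((n + 1) * p)) ** hill(xs, k)


# Toy data (illustrative only).
data = [float(v) for v in range(1, 10)]
for p in (0.5, 0.1, 0.01):
    print(p, weissman_quantile(data, 4, p))
```

When p equals (k + 1)/(n + 1) the estimator returns the anchor X_{n−k, n} itself, and smaller p moves further out along the fitted line.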

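Estimation of return periods can be obtained using the inverse relationship on the Pareto QQ‐plot:

\[ \hat p_{x,k} = \frac{k+1}{n+1} \Bigl( \frac{x}{X_{n-k,n}} \Bigr)^{-1/H_{k,n}}. \tag{4.2.13} \]

The expression for p̂_{x, k} can also be deduced from (4.2.4), leading to the approximation (1 − F(tu))/(1 − F(t)) ≈ u^{−1/γ} for large values of t. Setting tu = x, t = X_{n−k, n} so that u = x/X_{n−k, n}, and estimating 1 − F(X_{n−k, n}) by (k + 1)/(n + 1) we obtain p̂_{x, k}; the estimated return period is then 1/p̂_{x, k}.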
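A minimal sketch of the tail probability and return period estimators, assuming the form ((k + 1)/(n + 1)) · (x/X_{n−k, n})^{−1/H_{k, n}} that results from the substitutions just described. Function names and data are illustrative assumptions.

```python
import math


def hill(xs, k):
    """Hill estimator H_{k,n}; xs must be sorted in ascending order."""
    n = len(xs)
    return sum(math.log(x) for x in xs[n - k:]) / k - math.log(xs[n - k - 1])


def tail_probability(claims, k, x):
    """Estimate P(X > x) for a large x from the k largest observations."""
    xs = sorted(claims)
    n = len(xs)
    anchor = xs[n - k - 1]                       # X_{n-k,n}
    return (k + 1) / (n + 1) * (x / anchor) ** (-1.0 / hill(xs, k))


def return_period(claims, k, x):
    """Estimated return period 1 / P(X > x)."""
    return 1.0 / tail_probability(claims, k, x)


# Toy data (illustrative only).
data = [float(v) for v in range(1, 10)]
print(tail_probability(data, 4, 20.0), return_period(data, 4, 20.0))
```

The estimator is the inverse of the quantile extrapolation along the same fitted line: plugging the extrapolated quantile for a given p back in returns p.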

Approximate confidence bounds for such parameters have been derived based on asymptotic distributions of the estimators. In the case of the tail probability estimator we find with

when , k/n → 0, np_x/k → τ ∈ [0, 1) and , while with q_p = Q(1 − p),

when , k/n → 0, np/k → τ ∈ [0, 1) and .

4.2.1.3 Bias Reduction

When constructing confidence intervals for risk measures such as p_x and q_p, again the approximation of the underlying conditional distribution by the simple Pareto distribution entails a bias for all the existing estimators, next to the bias induced by estimating γ. One approach to reduce the bias is to construct estimators based on regression models of the values Z_j. Indeed, under (4.2.5), using the approximation and with the mean value theorem on x^−β at the points j/(n + 1) and (j + 1)/(n + 1), the theoretical analogue of a Z_j random variable can be approximated by

(4.2.14)

(4.2.15)

An alternative representation, using 1 + u ≈ e^u for small values of u, is then

The more accurate approximation

where E_j denotes a sequence of independent standard exponentially distributed random variables, was derived in an asymptotic sense in Beirlant et al. [97]. For each k, model (4.2.15) can be considered as a non‐linear regression model in which one can estimate the intercept γ, the slope b_{k, n}, and the power β with the covariates j/(k + 1). One can estimate these parameters jointly, or by using an external estimate for β, or using external estimation for β and images on the regression model

(4.2.16)

Gomes et al. found that external estimation for B and β should be based on k₁ extreme order statistics where k = o(k₁) and . Such an estimator for β was presented, for example, in Fraga Alves et al. [35]. Given an estimator for β, an estimator for B was given in Gomes and Martins [399]:

When the three parameters are jointly estimated for each k, the asymptotic variance turns out to be γ²((1 + β)/β)⁴, which is to be compared with the asymptotic variance γ² for the Hill estimator. Performing linear regression on images importing an external estimator for β, the asymptotic variance drops down to γ²((1 + β)/β)². The original variance γ² is retained when using the external estimators for B and β in (4.2.16).

Bias reduction of the extreme quantile estimator should not be based solely on replacing H_{k, n} by a bias‐reduced estimator for γ. Here we use the fact that X_{n−k, n} =_d U(1/U_{k+1, n}), where U_{k+1, n} denotes the (k + 1)th smallest order statistic from a uniform (0,1) sample of size n. Then we obtain from (4.2.3) and (4.2.5) with x = 1/p, approximating U_{k+1, n} by its expected value (k + 1)/(n + 1), and using 1 + u ≈ e^u for u small, that

so that a bias‐reduced version of is given by

where , and are bias‐reduced estimators based on the regression model (4.2.15).

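Bias‐reduced estimators can also be obtained by improving on the approximation (4.2.4) of the POT distribution by the simple Pareto distribution, using an extension of the Pareto distribution as introduced in Beirlant et al. [102]. Indeed, when ℓ_U satisfies (4.2.5), then

\[ 1 - F(x) = C^{1/\gamma} x^{-1/\gamma} \Bigl( 1 + \frac{D C^{\beta/\gamma}}{\gamma} \, x^{-\beta/\gamma} (1 + o(1)) \Bigr), \quad x \to \infty. \tag{4.2.17} \]

The distribution of the POTs X/t (X > t) can then be approximated using the expansion (1 + u)^b ≈ 1 + bu for u small:

\[ P(X/t > y \mid X > t) \approx y^{-1/\gamma} \Bigl\{ 1 + \frac{\delta_t}{\gamma} (y^{\tau} - 1) \Bigr\} \approx y^{-1/\gamma} \bigl\{ 1 + \delta_t - \delta_t y^{\tau} \bigr\}^{-1/\gamma}. \]

This leads to the extended Pareto distribution (EPD) with distribution function

\[ G_{\gamma,\delta,\tau}(y) = 1 - \bigl\{ y (1 + \delta - \delta y^{\tau}) \bigr\}^{-1/\gamma} \tag{4.2.18} \]

(y > 1) with δ = δ_t = DC^{β/γ}t^{−β/γ} and τ = −β/γ. Note that for an EPD random variable Y with τ = −1 and δ = γ/σ − 1, it follows that Y − 1 is GPD distributed with parameters γ and σ.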

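Using the density of the EPD

\[ g_{\gamma,\delta,\tau}(y) = \frac{1}{\gamma} \, y^{-1/\gamma - 1} \bigl\{ 1 + \delta (1 - y^{\tau}) \bigr\}^{-1/\gamma - 1} \bigl[ 1 + \delta \{ 1 - (1 + \tau) y^{\tau} \} \bigr], \]

maximum likelihood estimators are then derived through maximization of

\[ \sum_{i=1}^{N_t} \log g_{\gamma,\delta,\tau}(Y_i) \]

with respect to γ, δ using an external estimator of τ through estimates of β and γ, where the values Y₁, …, Y_{N_t} denote the POT values over the threshold t.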
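The following sketch encodes the EPD density quoted above together with a distribution function of the form G(y) = 1 − {y(1 + δ − δy^τ)}^{−1/γ}. That form is an assumption here, adopted because its derivative reproduces the quoted density, which the code checks by a finite difference. The log‐likelihood is the sum of log‐densities over the relative excesses; function names and parameter values are hypothetical.

```python
import math


def epd_cdf(y, gamma, delta, tau):
    """Assumed EPD distribution function for y > 1."""
    return 1.0 - (y * (1.0 + delta - delta * y ** tau)) ** (-1.0 / gamma)


def epd_density(y, gamma, delta, tau):
    """EPD density as quoted in the text."""
    base = 1.0 + delta * (1.0 - y ** tau)
    return (y ** (-1.0 / gamma - 1.0) * base ** (-1.0 / gamma - 1.0)
            * (1.0 + delta * (1.0 - (1.0 + tau) * y ** tau)) / gamma)


def epd_loglik(ys, gamma, delta, tau):
    """Log-likelihood of relative excesses ys (all > 1), tau held fixed."""
    return sum(math.log(epd_density(y, gamma, delta, tau)) for y in ys)


# Illustrative parameter values.
gamma, delta, tau = 0.5, 0.3, -1.2
h = 1e-5
numeric_density = (epd_cdf(2.0 + h, gamma, delta, tau)
                   - epd_cdf(2.0 - h, gamma, delta, tau)) / (2.0 * h)
print(numeric_density, epd_density(2.0, gamma, delta, tau))
print(epd_loglik([1.2, 1.9, 3.5], gamma, delta, tau))
```

In practice `epd_loglik` would be maximized over γ and δ with τ supplied externally, as the text describes; with δ = 0 the functions reduce to the strict Pareto case.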

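Bias‐reduced estimation of return periods is then obtained using X_{n−k, n} again as a threshold t:

\[ \hat p^{EP}_{x,k} = \frac{k+1}{n+1} \Bigl\{ 1 - G_{\hat\gamma,\hat\delta,\hat\tau}\Bigl( \frac{x}{X_{n-k,n}} \Bigr) \Bigr\}, \]

where G_{γ, δ, τ} denotes the EPD distribution function (4.2.18) and γ̂, δ̂, τ̂ the fitted EPD parameters.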

4.2.1.4 Estimating the Scale Parameter

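Finally, note that the scale parameter C in (4.2.5), or A = C^{1/γ} in (4.2.17), can be estimated with

\[ \hat C_{k,n} = X_{n-k,n} \Bigl( \frac{k+1}{n+1} \Bigr)^{H_{k,n}}, \tag{4.2.19} \]

\[ \hat A_{k,n} = \frac{k+1}{n+1} \, X_{n-k,n}^{1/H_{k,n}}, \tag{4.2.20} \]

which follows, for instance, from (4.2.17) replacing x by X_{n−k, n} and estimating 1 − F(X_{n−k, n}) by the empirical probability (k + 1)/(n + 1).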
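The scale estimates can be sketched as follows. The code assumes the leading‐order tail approximation 1 − F(x) ≈ A x^{−1/γ} with A = C^{1/γ}, solves it at x = X_{n−k, n} with the empirical probability (k + 1)/(n + 1) on the left, and plugs in the Hill estimate for γ. The names are hypothetical and the data are exact strict Pareto quantiles with C = A = 1.

```python
import math


def hill(xs, k):
    """Hill estimator H_{k,n}; xs must be sorted in ascending order."""
    n = len(xs)
    return sum(math.log(x) for x in xs[n - k:]) / k - math.log(xs[n - k - 1])


def scale_estimates(claims, k):
    """Return (C_hat, A_hat) obtained from the anchor X_{n-k,n}."""
    xs = sorted(claims)
    n = len(xs)
    h = hill(xs, k)
    anchor = xs[n - k - 1]                    # X_{n-k,n}
    fraction = (k + 1) / (n + 1)              # empirical exceedance probability
    c_hat = anchor * fraction ** h
    a_hat = fraction * anchor ** (1.0 / h)
    return c_hat, a_hat


# Illustrative data: exact quantiles of a strict Pareto distribution (C = 1).
alpha = 2.0
n = 999
claims = [(1.0 - j / (n + 1)) ** (-1.0 / alpha) for j in range(1, n + 1)]
c_hat, a_hat = scale_estimates(claims, 200)
print(c_hat, a_hat)
```

By construction the two estimates are linked through Â = Ĉ^{1/H_{k, n}}, mirroring the relation A = C^{1/γ}.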

The estimator Ĉ_{k, n} can also be retrieved using least squares regression on the k top points of the log–log plot

minimizing

(4.2.21)

Substituting H_{k, n} for γ and taking the derivative with respect to log C indeed gives

In Beirlant et al. [104] it is shown that Â_k,n is asymptotically normally distributed with asymptotic variance images and asymptotic bias .

A bias‐reduced estimator of the scale parameter A is then given by

where is a bias‐reduced estimator of γ, and estimators of b_{k, n} and β.

Case studies.

The Hill and bias‐reduced estimators of a positive EVI are plotted in Figure 4.5 as a function of k and log k. Estimators for extreme quantiles and return periods are given in Figure 4.6, while the scale estimates can be found in Figure 4.7.

For the Dutch fire insurance data set a level γ ≈ 0.8 is visible for k > 150 using the bias‐reduced estimators, while for the smallest k, values between 0.4 and 0.5 appear when plotting the estimates as a function of log k. These plots are to be compared with the plots on the second line in Figure 4.1, where the H_{k, n} values are plotted against the data values.

Concerning the estimation of the quantile Q(0.999) again two levels become apparent, namely around 1.5 × 10⁸ at k < 150 and 3 × 10⁸ at k > 150. Correspondingly, for the return period two values e^7.5 when k < 150 and e^6.5 for k > 150 are detected. These components are found back again in the scale plots of Figure 4.7 with values around 9 and 11. These two values correspond to extrapolating on the Pareto QQ‐plot using only the 150 largest values, compared to setting the anchor much deeper in the QQ‐plot. Of course this last choice leads to a much more conservative tail extrapolation and eventually higher reinsurance premiums, as discussed below. Note that the scale parameter “compensates” for the lower EVI value for k < 150 with a larger value for the scale.

Concerning the ultimate values of Company A, note the three γ levels appearing from Figure 4.5 (middle): for k ≥ 600, when k ∈ (300;600), ending with when k < 200. In fact, two Pareto components are also visible in the mean excess plot in Figure 4.2 with two linear pieces with different slopes. Finally, the estimates drop down to 0 when k → 1, which could be due to upper‐truncation. For the estimates of Q(0.999) notice an overall stable bias‐reduced value at 5 million, with some slightly higher value at k ∈ (300;600). Note, however, that this value could be too large in view of the possible upper‐truncation. Concerning the return period for values over 4 million, again we observe two levels: a return period close to e⁶ for smaller values of k and a value somewhat larger than e⁵ for k ∈ (300;600). We revisit the estimation of this tail using a truncated Pareto model below. Note that the three segments are also visible in the scale estimates in Figure 4.7 (middle).

For the ultimate values of Company B, a Pareto component with is clearly visible for k > 250 from Figure 4.5 (bottom). After a systematic decrease for k down to 100, a level is reached for k ∈ (1;50). This corresponds to the graphs from Figure 4.3, where a Pareto component is followed by a light tail component, ending with an ultimate Pareto section at the top data. The two Pareto levels are also visible at the estimators of the quantile Q(0.999) with levels 6 and 30 million. This lowest level is of course only based on a few top observations. Finally, the return period over 6 million is estimated at e^4.5 when k ∈ (100, 400) and a value around e⁷ when using only a few exceedances.□

**Figure 4.7** Scale estimates Â_{k, n} and Â^{BR}_{k, n} as a function of k: Dutch fire insurance data (top); MTPL data for Company A, ultimate values (middle); MTPL data for Company B, ultimate values (bottom).

4.2.2 General Tail Modelling using EVA

In order to allow tail modelling with log‐normal or Weibull tails, one has to incorporate the case where the EVI γ can be 0, next to positive values. Estimation of γ, extreme quantiles and return periods under the max‐domains of attraction conditions C_γ in (3.2.5) or (3.2.6), with as few restrictions on the value of γ as possible, is the next step in tail modelling. Again we have two possible approaches: using quantile plotting or using a likelihood approach on POT values.

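Here, several existing estimators start from the following condition, which follows from (3.2.5): for all u ≥ 1 as x → ∞
\[ \frac{\log U(ux) - \log U(x)}{a(x)/U(x)} \to \begin{cases} \log u, & \gamma \ge 0, \\ (u^{\gamma} - 1)/\gamma, & \gamma < 0. \end{cases} \tag{4.2.22} \]
From this it follows with γ₋ = min(γ, 0), that as k, n → ∞, k/n → 0
\[ EH_{k,n} \sim \frac{a((n+1)/(k+1))}{U((n+1)/(k+1))} \, \frac{1}{1 - \gamma_-}. \tag{4.2.23} \]
Hence estimating EH_{k, n} by H_{k, n}, and U((n + 1)/(k + 1)) by X_{n−k, n}, we find that for any estimator γ̂ of γ

\[ \hat a = X_{n-k,n} \, H_{k,n} \, \bigl( 1 - \min(\hat\gamma, 0) \bigr) \tag{4.2.24} \]
leads to an estimator for a((n + 1)/(k + 1)).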

Since a regularly varies with index γ, it also follows from (4.2.23) that U((n + 1)/(k + 1))EH_{k, n} = ((n + 1)/(k + 1))^γ ℓ((n + 1)/(k + 1)) for some slowly varying function ℓ. Hence the approach using linear regression and extrapolation on linear tail patterns on a QQ‐plot can be generalized to the case of a real‐valued EVI using the generalized QQ‐plot

\[ \Bigl( \log \frac{n+1}{k+1}, \; \log \bigl( X_{n-k,n} H_{k,n} \bigr) \Bigr), \quad k = 1, \ldots, n-1, \]

which ultimately for smaller values of k will be linear with slope γ, whatever the sign or values of γ. Hence if a generalized QQ‐plot is ultimately horizontal, then tail modelling using a distribution in the Gumbel domain of attraction is appropriate. An ultimately decreasing generalized QQ‐plot indicates a negative EVI, which can occur, for instance, for truncated heavy‐tailed distributions.

A generalized Hill estimator of γ estimating the slopes at the last k points on the generalized QQ‐plot is then given by

\[ \hat\gamma^{UH}_{k,n} = \frac{1}{k} \sum_{j=1}^{k} \log UH_{j,n} - \log UH_{k+1,n}, \]

where UH_{j, n} = X_{n−j, n}H_{j, n}.

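Another generalization of the Hill estimator to real‐valued EVI was given in Dekkers et al. [268], termed the moment estimator:

\[ M_{k,n} = H_{k,n} + 1 - \frac{1}{2} \Bigl( 1 - \frac{H_{k,n}^2}{H^{(2)}_{k,n}} \Bigr)^{-1}, \]

where

\[ H^{(2)}_{k,n} = \frac{1}{k} \sum_{j=1}^{k} \bigl( \log X_{n-j+1,n} - \log X_{n-k,n} \bigr)^2. \]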
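Both estimators for a real‐valued EVI can be sketched in Python. The code assumes the usual forms: the generalized Hill estimator as the average of log UH_{j, n} (j = 1, …, k) minus log UH_{k+1, n}, with UH_{j, n} = X_{n−j, n}H_{j, n}, and the moment estimator as H_{k, n} + 1 − ½(1 − H²_{k, n}/H⁽²⁾_{k, n})⁻¹, where H⁽²⁾_{k, n} is the mean squared log‐excess over X_{n−k, n}. Function names and the two artificial data sets are illustrative assumptions.

```python
import math


def hill(xs, k):
    """Hill estimator H_{k,n}; xs must be sorted in ascending order."""
    n = len(xs)
    return sum(math.log(x) for x in xs[n - k:]) / k - math.log(xs[n - k - 1])


def generalized_hill(claims, k):
    """Slope estimate on the generalized QQ-plot using UH_{1,n}, ..., UH_{k+1,n}."""
    xs = sorted(claims)
    n = len(xs)
    if not 1 <= k <= n - 2:
        raise ValueError("k must satisfy 1 <= k <= n - 2")
    # log_uh[j - 1] holds log UH_{j,n} = log(X_{n-j,n} * H_{j,n})
    log_uh = [math.log(xs[n - j - 1] * hill(xs, j)) for j in range(1, k + 2)]
    return sum(log_uh[:k]) / k - log_uh[k]


def moment_estimator(claims, k):
    """Moment estimator built from the first two empirical log-excess moments."""
    xs = sorted(claims)
    n = len(xs)
    log_t = math.log(xs[n - k - 1])
    excess = [math.log(x) - log_t for x in xs[n - k:]]
    h1 = sum(excess) / k
    h2 = sum(e * e for e in excess) / k
    return h1 + 1.0 - 0.5 / (1.0 - h1 * h1 / h2)


n = 200
pareto = [(1.0 - j / (n + 1)) ** (-0.5) for j in range(1, n + 1)]   # EVI 0.5
uniform = [j / (n + 1) for j in range(1, n + 1)]                    # EVI -1
print(moment_estimator(pareto, 50), moment_estimator(uniform, 50))
print(generalized_hill(pareto, 50), generalized_hill(uniform, 50))
```

On the Pareto quantiles the moment estimate is positive, on the uniform quantiles (a bounded distribution) it is negative, which is the behavior that distinguishes these estimators from the Hill estimator.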
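Condition (3.2.6) for a distribution to belong to a domain of attraction of an extreme value distribution means that the generalized Pareto law is the limit distribution of the distribution of POT values X − t given X > t when t → x₊: setting h(t) = σ_t
\[ P\Bigl( \frac{X - t}{\sigma_t} > y \Bigm| X > t \Bigr) \to (1 + \gamma y)^{-1/\gamma} \quad \text{as } t \to x_+. \tag{4.2.25} \]
Hence, we are led to modelling the tail function of POT values Y = X − t with X > t using the GPD with survival function (1 + γy/σ)^{−1/γ}. Denoting the number of exceedances over t again by N_t, the log‐likelihood is given by

\[ \log L(\gamma, \sigma) = -N_t \log \sigma - \Bigl( 1 + \frac{1}{\gamma} \Bigr) \sum_{i=1}^{N_t} \log \Bigl( 1 + \frac{\gamma}{\sigma} Y_i \Bigr). \]

Using a reparametrization (γ, σ) → (γ, τ) with τ = γ/σ leads to the likelihood equations

\[ \frac{1}{N_t} \sum_{i=1}^{N_t} \log(1 + \tau Y_i) = \gamma, \qquad \frac{1}{N_t} \sum_{i=1}^{N_t} \frac{1}{1 + \tau Y_i} = \frac{1}{1 + \gamma}. \]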

Replacing t by an intermediate order statistic X_{n−k, n} again gives

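In order to assess the goodness‐of‐fit of the GPD when modelling the POT values Y = X − t for a given threshold t, one can use the transformation

\[ R = \frac{1}{\gamma} \log \Bigl( 1 + \frac{\gamma}{\sigma} Y \Bigr), \]

so that R is standard exponentially distributed if the POT values do follow a GPD, and the fit can be validated inspecting the overall linearity of the exponential QQ‐plot

\[ \Bigl( -\log\Bigl( 1 - \frac{i}{N_t + 1} \Bigr), \; \hat R_{i,N_t} \Bigr), \quad i = 1, \ldots, N_t, \]

where R̂_{i, N_t} (i = 1, …, N_t) denote the ordered values of R̂_i = (1/γ̂) log(1 + (γ̂/σ̂)Y_i), with γ̂ and σ̂ the fitted GPD parameters.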
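A small sketch of this goodness‐of‐fit check, assuming the transformation R = (1/γ) log(1 + (γ/σ)Y), which maps GPD excesses to standard exponential values (with R = Y/σ in the limiting case γ = 0). The function names are hypothetical, and the excesses are exact GPD quantiles so that the QQ‐plot falls on the diagonal.

```python
import math


def gpd_residuals(excesses, gamma, sigma):
    """Ordered values of R = log(1 + gamma * Y / sigma) / gamma."""
    if gamma == 0.0:
        # Limiting exponential case of the GPD.
        return sorted(y / sigma for y in excesses)
    return sorted(math.log(1.0 + gamma * y / sigma) / gamma for y in excesses)


def exponential_qq_points(residuals):
    """Points (-log(1 - i/(m+1)), R_{i,m}) for ordered residuals."""
    m = len(residuals)
    return [(-math.log(1.0 - i / (m + 1)), r)
            for i, r in enumerate(residuals, start=1)]


# Illustrative excesses: exact GPD(gamma, sigma) quantiles at i/(m+1).
gamma, sigma, m = 0.3, 2.0, 99
excesses = [sigma / gamma * ((1.0 - i / (m + 1)) ** (-gamma) - 1.0)
            for i in range(1, m + 1)]

points = exponential_qq_points(gpd_residuals(excesses, gamma, sigma))
max_dev = max(abs(q - r) for q, r in points)
print(max_dev)   # essentially zero for exact GPD quantiles
```

With real excesses and fitted parameters, departures of these points from the diagonal indicate that the threshold is too low or that the GPD is inadequate.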

For all these estimators √k(γ̂ − γ) is asymptotically normal under some regularity conditions on the underlying distributions when k, n → ∞ and k/n → 0, with mean 0 if k is not too large (or, equivalently, if the threshold t = x_{n−k, n} is not too small), and asymptotic variances (or covariance matrix for the ML estimators of (γ, σ)) given by

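Estimators for small tail probabilities or return periods can easily be constructed from the POT approach. In fact, the approximation of P(X − t > y|X > t) by (1 + (γ/σ)y)^{−1/γ} for y > 0, setting t + y = x and t = X_{n−k, n}, leads to

\[ \hat p^{ML}_{x,k} = \frac{k+1}{n+1} \Bigl( 1 + \frac{\hat\gamma}{\hat\sigma} (x - X_{n-k,n}) \Bigr)^{-1/\hat\gamma}. \]

Inversion leads to an extreme quantile estimator

\[ \hat q^{ML}_{p,k} = X_{n-k,n} + \frac{\hat\sigma}{\hat\gamma} \Bigl[ \Bigl( \frac{k+1}{(n+1)p} \Bigr)^{\hat\gamma} - 1 \Bigr]. \]

The estimators for high quantiles based on the approach used in the construction of the moment estimator are defined by

\[ \hat q^{M}_{p,k} = X_{n-k,n} + \hat a \, \frac{\bigl( (k+1)/((n+1)p) \bigr)^{M_{k,n}} - 1}{M_{k,n}}, \]

with â defined in (4.2.24). Note that â can be seen as an alternative estimator for σ when comparing the expressions of q̂^{ML}_{p, k} and q̂^{M}_{p, k}. This then in turn leads to a moment tail probability estimator

\[ \hat p^{M}_{x,k} = \frac{k+1}{n+1} \Bigl( 1 + \frac{M_{k,n}}{\hat a} (x - X_{n-k,n}) \Bigr)^{-1/M_{k,n}}. \]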
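The POT‐based tail probability and its inversion can be sketched as follows, assuming the GPD survival function (1 + (γ/σ)y)^{−1/γ} for the excesses and an exceedance fraction (k + 1)/(n + 1) for the threshold. The function names are hypothetical and the parameter values are arbitrary placeholders; in practice γ and σ would be fitted (or replaced by the moment estimator and â).

```python
def gpd_tail_probability(x, threshold, exceed_fraction, gamma, sigma):
    """Estimate P(X > x) for x above the threshold (gamma != 0)."""
    return exceed_fraction * (1.0 + gamma / sigma * (x - threshold)) ** (-1.0 / gamma)


def gpd_quantile(p, threshold, exceed_fraction, gamma, sigma):
    """Invert the tail probability: estimate Q(1 - p) (gamma != 0)."""
    return threshold + sigma / gamma * ((exceed_fraction / p) ** gamma - 1.0)


# Placeholder values (assumptions for illustration only).
threshold, gamma, sigma = 10.0, 0.3, 2.0
k, n = 49, 999
exceed_fraction = (k + 1) / (n + 1)          # 0.05

q_999 = gpd_quantile(0.001, threshold, exceed_fraction, gamma, sigma)
p_back = gpd_tail_probability(q_999, threshold, exceed_fraction, gamma, sigma)
print(q_999, p_back, 1.0 / p_back)           # quantile, probability, return period
```

The round trip from p to the quantile and back recovers p, which confirms that the two expressions are inverses of each other.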

The asymptotic distributions of these tail estimators have been derived in the literature. For instance, for we have under some regularity conditions that with a_n = (k + 1)/(p(n + 1)) as np_n → c ≥ 0, one has

for γ > 0
for γ < 0

For further details see de Haan and Ferreira [258].

Case studies.

The estimators of the EVI, extreme quantiles and return periods, which are consistent under all max‐domains of attraction, are given in Figures 4.8 and 4.9.

In the case of the Dutch fire example two linear increasing parts are clearly visible in the generalized QQ‐plot with a smaller slope at the largest values. These correspond again to the two γ levels in the Hill derivative estimates, namely 0.8 for higher values of k and 0.4 for k ↓ 1. The ML‐GPD estimators are somewhat lower. The quantile estimates and return period estimates confirm two levels, as in Figure 4.6, but the quantile levels are somewhat lower than under the Pareto analysis.

In the case of the ultimate values from Company A, we find an ultimately decreasing generalized QQ‐plot and correspondingly negative values of the EVI estimators at the smallest values of k. Concerning the quantile Q(0.999) estimates, here no stable pictures appear for the estimates based on the generalized Hill estimator, with a decreasing plot as k decreases, ending at approximately 4 million Euros. The quantile level at 5 million appearing in the Pareto analysis is confirmed here for the smallest k values. The return period corresponding to 4 million is higher than e⁶ in contrast to the Pareto analysis, which hints at a value just below e⁶.

In the case of the ultimate values from Company B, the segment with k ∈ (100, 250), which was already visible through a decreasing mean excess plot in Figure 4.3, corresponds with a slightly decreasing generalized QQ‐plot and negative EVI estimates in that region. Given the ultimate Pareto tail at the top data in this case, the EVI estimates return to positive values when k ↓ 1. In Figure 4.9 (bottom) the estimate of the quantile level Q(0.999) of 5 million found from a Pareto analysis at the smallest k values reappears. Note finally that the return periods for values over 6 million yield similar results as with Pareto tail modelling.□

4.2.3 EVA under Upper‐truncation

Practical problems can arise when using the strict Pareto distribution and its generalization to the Pareto‐type model because some probability mass can still be assigned to loss amounts that are unreasonably large or even impossible. With respect to tail fitting of insurance claim data, upper‐truncation is of interest and can be due to the existence of a maximum possible loss. Such truncation effects are sometimes visible in data, for instance when an overall linear Pareto QQ‐plot shows non‐linear deviations at only a few top data. Let W be an underlying non‐truncated distribution with distribution function F_W, quantile function Q_W and tail quantile function U_W. Upper‐truncation of the distribution of W at some value T was defined in the preceding chapter through the conditioning operation W|W < T. Let F_T and U_T be the distribution and tail quantile function of this truncated distribution. In practice one does not always know if the data X₁, …, X_n come from a truncated or non‐truncated distribution, and hence the behavior of estimators should be evaluated under both cases, and a statistical test for upper‐truncation is useful. This section is taken from Beirlant et al. [95].

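Upper‐truncation of the distribution of W at some truncation point T yields

\[ F_T(x) = P(W \le x \mid W < T) = \frac{F_W(x)}{F_W(T)}, \quad x < T. \]

The corresponding quantile function Q_T is then given by

\[ Q_T(p) = Q_W\bigl( p \, F_W(T) \bigr), \quad 0 < p < 1, \]

while the tail quantile function U_T satisfies

\[ U_T(x) = Q_T(1 - 1/x) = U_W\Bigl( \frac{1}{1 - F_W(T)(1 - 1/x)} \Bigr) \tag{4.2.26} \]

\[ \phantom{U_T(x)} = U_W\Bigl( \frac{1}{1 - F_W(T)} \Bigl[ 1 + \frac{1}{x D_T} \Bigr]^{-1} \Bigr) \tag{4.2.27} \]

with D_T = (1 − F_W(T))/F_W(T) the odds of the truncated probability mass under the untruncated distribution W. Note also that for a fixed T, upper‐truncation models are known to exhibit an EVI γ = −1. This follows from verifying (3.2.5) for U_T as given in (4.2.27). For instance when U_W(x) = x^γ, we find U_T(x) = T(1 + 1/(xD_T))^{−γ} = T(1 − γ/(xD_T)(1 + o(1))) as x → ∞. This final expression satisfies (3.2.5) with γ = −1.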

4.2.3.1 EVA for Upper‐truncated Pareto‐type Distributions

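We restrict attention to tail estimation for upper‐truncated Pareto‐type distributions:

$$1 - F_W(w) = w^{-1/\gamma}\,\ell_F(w), \quad \gamma > 0,$$

where ℓ_F is a slowly varying function at infinity or, with W/t denoting the peaks over a threshold t when W > t,

$$P(W/t > x \mid W > t) \to x^{-1/\gamma}, \quad x > 1, \text{ as } t \to \infty.$$

Upper‐truncation of a Pareto‐type distribution at a high value T then necessarily requires T → ∞ and D_T → 0.

One can now consider two cases as t, T → ∞: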

Rough upper‐truncation with the threshold t: T/t → β > 1 and, for 1 < x < β,
$$P(X/t > x \mid X > t) \to \frac{x^{-1/\gamma} - \beta^{-1/\gamma}}{1 - \beta^{-1/\gamma}}. \tag{4.2.28}$$
This corresponds to situations where the deviation from the Pareto behavior due to upper‐truncation at a high value will be visible in the data from the threshold t onwards, and an adaptation of the above Pareto tail extrapolation methods appears appropriate.
Light (or no) upper‐truncation with the threshold t: T/t → ∞ and, for x > 1,
$$P(X/t > x \mid X > t) \to x^{-1/\gamma}, \tag{4.2.29}$$
so that hardly any truncation is visible in the data from the threshold t onwards, and the Pareto‐type model without truncation and the corresponding extreme value methods for Pareto‐type tails appear appropriate when restricted to excesses over t.

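Under rough upper‐truncation we have, for 1 < x < β,

$$P(X/t > x \mid X > t) \approx \frac{x^{-1/\gamma} - \beta^{-1/\gamma}}{1 - \beta^{-1/\gamma}}$$

with density

$$f(x) = \frac{1}{\gamma}\,\frac{x^{-1/\gamma - 1}}{1 - \beta^{-1/\gamma}}, \quad 1 < x < \beta.$$

Estimating T by X_{n, n} and taking t = X_{n−k, n} so that 1/β is estimated by R_{k, n} = X_{n−k, n}/X_{n, n}, we obtain the following log‐likelihood:

$$\log L(\gamma) = -k\log\gamma - \left(1 + \frac{1}{\gamma}\right)\sum_{j=1}^{k}\log\frac{X_{n-j+1,n}}{X_{n-k,n}} - k\log\left(1 - R_{k,n}^{1/\gamma}\right).$$

Now, with $H_{k,n} = \frac{1}{k}\sum_{j=1}^{k}\log(X_{n-j+1,n}/X_{n-k,n})$ the Hill estimator,

$$\frac{d\log L(\gamma)}{d\gamma} = -\frac{k}{\gamma} + \frac{k\,H_{k,n}}{\gamma^2} - \frac{k}{\gamma^2}\,\frac{R_{k,n}^{1/\gamma}\log R_{k,n}}{1 - R_{k,n}^{1/\gamma}},$$

which leads to the defining equation for the likelihood estimator $\hat\gamma$ of γ:

$$H_{k,n} = \hat\gamma + \frac{R_{k,n}^{1/\hat\gamma}\log R_{k,n}}{1 - R_{k,n}^{1/\hat\gamma}}.$$

This estimator was first proposed in Aban et al. [6]. Beirlant et al. [95] showed that with κ = β^{1/γ} − 1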

From (4.2.27) it is clear that the estimation of D_T is an intermediate step in important estimation problems following the estimation of γ, namely of extreme quantiles and of the endpoint T. When U_W satisfies (4.2.3) it follows from (4.2.27) that as and T/t → β

(4.2.30)

so that

(4.2.31)

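Motivated by (4.2.31) and estimating Q_T(1 − (k + 1)/(n + 1))/Q_T(1 − 1/(n + 1)) by R_{k, n}, one arrives at

$$\hat D_T = \frac{k+1}{n+1}\,\frac{R_{k,n}^{1/\hat\gamma} - \frac{1}{k+1}}{1 - R_{k,n}^{1/\hat\gamma}} \tag{4.2.32}$$

as an estimator for D_T, with $\hat\gamma$ the likelihood estimator of γ under upper‐truncation. In practice one makes use of the admissible estimator

$$\max\{\hat D_T,\, 0\}$$

to make it useful for truncated and non‐truncated Pareto‐type distributions.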
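
The two estimation steps above can be sketched numerically. The following Python fragment is only an illustration and is not taken from the cited papers: it assumes that the likelihood equation has the form H_{k,n} = γ + R^{1/γ} log R/(1 − R^{1/γ}) with R = R_{k,n} = X_{n−k,n}/X_{n,n}, and that D_T is estimated by the admissible version of ((k + 1)/(n + 1))(R^{1/γ} − 1/(k + 1))/(1 − R^{1/γ}). All function names, the bisection bounds and the simulation set‐up are hypothetical choices.

```python
import math
import random

def trunc_mean(g, r):
    """Right-hand side g + r**(1/g) * log(r) / (1 - r**(1/g)) of the assumed equation."""
    u = r ** (1.0 / g)
    return g + u * math.log(r) / (1.0 - u)

def truncated_hill(data, k, lo=1e-6, hi=1e3, iters=200):
    """Return (gamma_hat, H_{k,n}, R_{k,n}) based on the k largest observations."""
    xs = sorted(data)
    n = len(xs)
    t = xs[n - k - 1]                                   # threshold X_{n-k,n}
    h = sum(math.log(x / t) for x in xs[n - k:]) / k    # Hill statistic H_{k,n}
    r = t / xs[-1]                                      # R_{k,n}
    if h >= -0.5 * math.log(r):
        # trunc_mean(g, r) increases from 0 to -log(r)/2, so no finite root exists
        raise ValueError("no finite solution at this k")
    for _ in range(iters):                              # bisection on an increasing function
        mid = 0.5 * (lo + hi)
        if trunc_mean(mid, r) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), h, r

def odds_estimate(gamma_hat, r, k, n):
    """Admissible (non-negative) estimate of the odds D_T."""
    u = r ** (1.0 / gamma_hat)
    return max(0.0, (k + 1) / (n + 1) * (u - 1.0 / (k + 1)) / (1.0 - u))

def simulate_truncated_pareto(n, gamma, T, seed=1):
    """Strict Pareto with 1 - F_W(w) = w**(-1/gamma), w >= 1, truncated at T."""
    rng = random.Random(seed)
    f_T = 1.0 - T ** (-1.0 / gamma)                     # F_W(T)
    return [(1.0 - rng.random() * f_T) ** (-gamma) for _ in range(n)]

# Illustration on simulated data (gamma = 0.5, truncation point T = 20)
sample = simulate_truncated_pareto(2000, 0.5, 20.0)
gamma_hat, hill, ratio = truncated_hill(sample, k=200)
d_hat = odds_estimate(gamma_hat, ratio, k=200, n=2000)
```

Bisection is used here because the right‐hand side of the assumed equation is increasing in γ; when the Hill statistic exceeds −log(R_{k,n})/2 the equation has no finite solution, which the sketch reports as an error rather than returning a value.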

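For D_T > 0, in order to construct estimators of T and extreme quantiles q_p = Q_T(1 − p), as in (4.2.31) we find that

$$\frac{Q_T(1-p)}{Q_T\big(1 - \frac{k+1}{n+1}\big)} \approx \left(\frac{D_T + \frac{k+1}{n+1}}{D_T + p}\right)^{\gamma}. \tag{4.2.33}$$

Then taking logarithms on both sides of (4.2.33) and estimating Q_T(1 − (k + 1)/(n + 1)) by X_{n−k, n} we find an estimator of q_p:

$$\log\hat q_p = \log X_{n-k,n} + \hat\gamma\,\log\left(\frac{\hat D_T + \frac{k+1}{n+1}}{\hat D_T + p}\right), \tag{4.2.34}$$

which equals the Weissman estimator when $\hat D_T = 0$. An estimator of T follows from letting p → 0 in the above expressions for $\hat q_p$:

$$\log\hat T = \max\left\{\log X_{n-k,n} + \hat\gamma\,\log\left(1 + \frac{k+1}{(n+1)\hat D_T}\right),\; \log X_{n,n}\right\}. \tag{4.2.35}$$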

Here we take the maximum of log X_n,n and the value following from (4.2.34) with p → 0 in order for this endpoint estimator to be admissible. It has been shown that is superefficient under rough upper‐truncation, which means that the asymptotic variance is o(1/k) and the asymptotic bias is also smaller than, for instance, that of the moment quantile estimator .

However, is not a consistent estimator for q_p under light upper‐truncation and when np_n → 0. In that case one should use

(4.2.36)

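The estimation of tail probabilities p_x = P(X > x) can be based directly on (4.2.28) using R_{k, n} as an estimator for 1/β:

$$\hat p_x = \frac{k+1}{n+1}\,\frac{(x/X_{n-k,n})^{-1/\hat\gamma} - R_{k,n}^{1/\hat\gamma}}{1 - R_{k,n}^{1/\hat\gamma}}. \tag{4.2.37}$$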

Of course, in order to decide between (4.2.34) and (4.2.36) one should use a statistical test for deciding between rough and light upper‐truncation.

4.2.3.2 Testing for Upper‐truncated Pareto‐type Tails

Aban et al. [6] proposed a test for versus under the strict Pareto model, rejecting H₀ at level q ∈ (0, 1) when

(4.2.38)

for some 1 < k < n with A the scale parameter in the Pareto model. In (4.2.38), γ is estimated by H_{k, n}, the maximum likelihood estimator under H₀, while A is estimated using Â_k,n from (4.2.19). Note that the rejection rule (4.2.38) can be rewritten as

(4.2.39)

and the P‐value is given by images .

Considering the testing problem

under the upper‐truncated Pareto‐type model, Beirlant et al. [95] propose to reject when an appropriate estimator of (n + 1)D_T/(k + 1) is significantly different from 0. Here we construct such an estimator generalizing with an average of ratios (X_{n−k, n}/X_{n−j+1, n})^1/γ, j = 1, …, k, which then possesses an asymptotic normal distribution under the null hypothesis. Observe that with (4.2.30) under as

Estimating Ē_k,n by

leads now to

(4.2.40)

as an estimator of (n + 1)D_T/(k + 1), with an appropriate estimator of γ. Under , the Hill estimator H_{k, n} is an appropriate estimator of γ. Moreover, it can be shown that under some regularity assumptions on the underlying Pareto‐type distribution, we have under for and k/n → 0, that is asymptotically normal with mean 0 and variance 1/12. It is then also shown under rough upper‐truncation as , k/n → 0 that L_{k, n}(H_{k, n}) tends to a negative constant so that an asymptotic test based on L_{k, n}(H_{k, n}) rejects on level q when

(4.2.41)

with . The P‐value is then given by .

Case study.

Given the fact that the mean excess function of the ultimate values from Company A is ultimately decreasing at the largest values, an upper‐truncated Pareto model is a possible tail model. This is also clear from the Hill plot in Figure 4.10, which systematically decreases to 0 as k ↓ 1, and from the plots of the generalized Hill, moment and POT estimators of γ, which decrease to −1 as k ↓ 1. The P‐values of the T_B test for upper‐truncation are lower than 0.05 in areas around k = 100 and k = 250. This means that, at the corresponding thresholds t = Q_T(1 − k/n), the upper‐truncated Pareto model in (4.2.28) yields a more appropriate model than the strict Pareto model to fit the distribution of the exceedances X/t. This is illustrated on the Pareto QQ‐plot in Figure 4.10, which overlays this upper‐truncated Pareto fit on the top 250 points. While the strict Pareto fit corresponds to a regression line on these points, the concave curve provided by modelling a truncation effect appears to provide a better fit.

The estimates , and even more are quite stable as a function of k leading to the approximate values, respectively , just above 6, and just below 5 million. This is a bit lower than the earlier estimates of the 0.999 quantile, and leads to a return period corresponding with 4 million, which is close to the values returned by the GPD‐ML method. Note that the endpoint T is estimated here at around 5 million at k = 100 and k = 250.□

4.3 Global Fits: Splicing, Upper‐truncation and Interval Censoring

Given an appropriate tail fit, the ultimate goal consists of fitting a distribution with a global satisfactory fit. Rather than trying to splice specific parametric models such as log‐normal or Weibull models for the modal part of the distribution, one can rely on fitting a mixed Erlang (ME) distribution, as discussed in Verbelen et al. [758]. We also consider this set‐up in the presence of truncation and censoring.

4.3.1 Tail‐mixed Erlang Splicing

The Erlang distribution has a gamma density

$$f_E(x; r, \lambda) = \frac{\lambda^{r} x^{r-1} e^{-\lambda x}}{(r-1)!}, \quad x > 0,$$

where r is a positive integer shape parameter. Following Lee and Lin [534] we consider mixtures of M Erlang distributions with common scale parameter 1/λ having density

$$f_{ME}(x; \alpha, r, \lambda) = \sum_{j=1}^{M}\alpha_j\,\frac{\lambda^{r_j} x^{r_j-1} e^{-\lambda x}}{(r_j-1)!}, \quad x > 0,$$

and tail function

$$1 - F_{ME}(x; \alpha, r, \lambda) = \sum_{j=1}^{M}\alpha_j\sum_{n=0}^{r_j-1} e^{-\lambda x}\,\frac{(\lambda x)^{n}}{n!}, \quad x > 0,$$

where the positive integers r = (r₁, …, r_M) with r₁ < r₂ < … < r_M are the shape parameters of the Erlang distributions, and α = (α₁, …, α_M) with α_j > 0 and $\sum_{j=1}^{M}\alpha_j = 1$ are the weights in the mixture. Tijms [745] showed that the class of mixtures of Erlang distributions with a common scale 1/λ is dense in the space of positive continuous distributions on ℝ⁺. Moreover this class is also closed under mixtures, convolution and compounding. Hence aggregate risk calculations are simple, and XL premiums and risk measures based on quantiles can also be evaluated in a rather straightforward way.

For instance, a composite ME generalized Pareto distribution can be built using (3.5.14), that is, a two‐component spliced distribution with density

If a continuity requirement at t were imposed, this would lead to

The survival function of this spliced distribution is given by

Alternatively, one can take , where k_* is an appropriate number of top order statistics corresponding to an extreme value threshold .

Fitting ME distributions through direct likelihood maximization is difficult. A first algorithm was proposed by Tijms [745], but it turns out to be slow and can lead to overfitting. Lee and Lin [534] use the expectation‐maximization (EM) algorithm proposed by Dempster et al. [271] to fit the ME distribution. Model selection criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), are then used to avoid overfitting. Verbelen et al. [758] extend this approach to censored and/or truncated data. The need for the EM algorithm follows from the data incompleteness due to mixing and censoring.

The EM algorithm is used to compute the maximum likelihood estimator (MLE) for incomplete data where direct maximization is impossible. It consists of two steps that are put in an iteration until convergence:

E‐step: Compute the conditional expectation of the log‐likelihood given the observed data and previous parameter estimates.
M‐step: Determine a subsequent set of parameter estimates in the parameter range through maximization of the conditional expectation computed in the E‐step.
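
To make the two steps concrete, the following Python sketch runs an EM iteration for an ME distribution in the simplest setting: complete (uncensored, untruncated) data and fixed integer shapes r. It is an illustrative assumption‐laden fragment, not the algorithm of Lee and Lin [534] or Verbelen et al. [758] as implemented by those authors; the function names, starting values and simulated sample are hypothetical. In this setting the E‐step reduces to computing posterior component probabilities, and the M‐step has the closed form α_j = average posterior probability and λ = Σ_j α_j r_j divided by the sample mean.

```python
import math
import random

def erlang_logpdf(x, r, lam):
    """Log-density of an Erlang distribution with integer shape r and rate lam."""
    return r * math.log(lam) + (r - 1) * math.log(x) - lam * x - math.lgamma(r)

def loglik(data, shapes, alpha, lam):
    """Observed-data log-likelihood of the Erlang mixture."""
    return sum(
        math.log(sum(a * math.exp(erlang_logpdf(x, r, lam))
                     for a, r in zip(alpha, shapes)))
        for x in data
    )

def em_mixed_erlang(data, shapes, alpha, lam, iters=100):
    """EM for mixture weights alpha and common rate lam; shapes stay fixed."""
    n = len(data)
    mean_x = sum(data) / n
    for _ in range(iters):
        # E-step: posterior probability that x belongs to component j
        post_sum = [0.0] * len(shapes)
        for x in data:
            logw = [math.log(a) + erlang_logpdf(x, r, lam) if a > 0 else -math.inf
                    for a, r in zip(alpha, shapes)]
            top = max(logw)
            w = [math.exp(v - top) for v in logw]       # stabilised weights
            total = sum(w)
            for j, wj in enumerate(w):
                post_sum[j] += wj / total
        # M-step: closed-form maximisers of the expected complete log-likelihood
        alpha = [s / n for s in post_sum]
        lam = sum(a * r for a, r in zip(alpha, shapes)) / mean_x
    return alpha, lam

# Illustration: two Erlang components with shapes 1 and 5 and common scale 2
rng = random.Random(0)
sample = ([rng.gammavariate(1, 2.0) for _ in range(300)]
          + [rng.gammavariate(5, 2.0) for _ in range(200)])
alpha_hat, lam_hat = em_mixed_erlang(sample, [1, 5], [0.5, 0.5], 1.0, iters=50)
```

With censored or truncated data the E‐step no longer has this simple form, which is where the extension of Verbelen et al. [758] is needed.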

Rather than proposing a data‐driven estimator of the splicing point t, we use an expert opinion on the splicing point t based on EVA as outlined above. Then, π can be estimated by the fraction of the data not larger than t. Similarly, T is deduced from the EVA. The extreme value index γ is estimated in the algorithm, starting from the value obtained from the EVA at the threshold t. The final estimates for γ always turned out to be close to the EVA estimates. Next, the ME parameters (α, λ) are estimated using the EM algorithm as developed in Verbelen et al. [758]. The number of ME components M is estimated using a backward stepwise search, starting from a certain upper value, whereby the smallest shape is deleted if this decreases an information criterion such as AIC or BIC. Moreover, for each value of M, the shapes r are adjusted based on maximizing the likelihood starting from r = (s, 2s, …, M × s), where s is a chosen spread factor.

Of course tail splicing of an ME can also be performed using a simple Pareto fit, or an EPD fit, whether or not adapted for truncation. For instance, splicing an ME with an upper‐truncated Pareto approximation leads to

(4.3.42)

4.3.2 Tail‐mixed Erlang Splicing under Censoring and Upper‐truncation

In reinsurance data left truncation appears at some point, denoted here by t^l, which can be a deductible or a percentage of the retention u from an XL contract. Claims leading to a cumulative payment below t^l at a given stage during development are then left truncated. Such a claim constitutes an IBNR claim. As discussed above, an upper‐truncation mechanism at some point T can appear.

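We denote the ME density and distribution function by f_ME and F_ME, and similarly f_EV and F_EV for the EVA distribution. We then define, omitting the model parameters from the notation for the moment,

$$f_1(x) = \frac{f_{ME}(x)}{F_{ME}(t) - F_{ME}(t^l)}\,1_{\{t^l \le x \le t\}}, \qquad f_2(x) = \frac{f_{EV}(x)}{F_{EV}(T) - F_{EV}(t)}\,1_{\{t \le x \le T\}},$$

with 0 ≤ t^l < t < T where T can be equal to ∞. The densities f₁ and f₂ are then valid densities on the intervals [t^l, t] and [t, T], respectively. For the first density, this means that it is lower truncated at t^l and upper truncated at t, and the second density is lower truncated at t and upper truncated at T. The corresponding distribution functions are

$$F_1(x) = \frac{F_{ME}(x) - F_{ME}(t^l)}{F_{ME}(t) - F_{ME}(t^l)}, \quad t^l \le x \le t, \qquad F_2(x) = \frac{F_{EV}(x) - F_{EV}(t)}{F_{EV}(T) - F_{EV}(t)}, \quad t \le x \le T.$$

We consider the splicing density and distribution function

$$f(x) = \begin{cases} \pi f_1(x), & t^l \le x \le t,\\ (1-\pi) f_2(x), & t < x \le T, \end{cases}$$

and

$$F(x) = \begin{cases} \pi F_1(x), & t^l \le x \le t,\\ \pi + (1-\pi) F_2(x), & t < x \le T. \end{cases} \tag{4.3.43}$$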

Next to truncation, censoring mechanisms occur in reinsurance¹ :

right censoring occurs for instance when a claim has not been settled at the evaluation date (RBNS claims). See Chapter 1 for the case of motor liability data. The final claim amount x_i will be larger than the lower censoring value l_i
left censoring occurs when only an upper bound u_i to the claim x_i is given
interval censoring means that the final claim value x_i is only known to be inside an interval [l_i, u_i] ⊂ [t^l, T].

In the splicing context with an EVA component from a threshold t on, we have the following five classes of observations:

Uncensored observations with t^l ≤ l_i = u_i = x_i ≤ t < T.
Uncensored observations with t^l < t < l_i = u_i = x_i ≤ T.
Interval censored observations with t^l ≤ l_i < u_i ≤ t < T.
Interval censored observations with t^l < t ≤ l_i < u_i ≤ T.
Interval censored observations with t^l ≤ l_i < t < u_i ≤ T.

These classes are shown in Figure 4.12. In the conditioning argument in the E‐step of the algorithm, the fifth case is split into x_i ≤ t and x_i > t, as indicated in Figure 4.12.

**Figure 4.12** The different classes of censored observations.

For the Erlang mixture, the number M and the integer shapes r are fixed when estimating Θ₁ = (α, λ). Also, Θ₂ denotes the extreme value parameter γ (together with σ when using the GPD tail fit). The idea behind the EM algorithm in this context is to consider the censored sample in contrast to the complete data which is not fully observed. Given a complete version of the data, we can construct a complete likelihood function as

where is the indicator function for the event {X_i ≤ t}. The corresponding complete data log‐likelihood function is

As we do not fully observe the complete version of the data sample, it is not possible to optimize the complete data log‐likelihood directly. The intuitive idea for obtaining parameter estimates in the case of incomplete data is to compute the expectation of the complete data log‐likelihood and then use this expected log‐likelihood function to estimate the parameters. However, taking the expectation of the complete data log‐likelihood requires the knowledge of the parameter vector, and so the algorithm has to run iteratively. Starting from an initial guess for the parameter vector, the EM algorithm iterates between two steps. In the hth iteration of the E‐step the expected value of the complete data log‐likelihood is computed with respect to the unknown data given the observed data and using the current estimate of the parameter vector Θ^(h−1) as true values,

In the M‐step, one maximizes the expected value of the complete data log‐likelihood obtained in the E‐step with respect to the parameter vector:

Both steps are iterated until convergence.

In the E‐step we distinguish the five cases of data points again to determine the contribution of a data point to this expectation:

Note that the event {t^l ≤ l_i = u_i ≤ t < T} indicates that we know t^l, l_i = u_i, t and T, and that the ordering t^l ≤ l_i = u_i ≤ t < T holds. Similar reasonings hold for the other conditional arguments in the expectations. Then, using the law of total probability, the final case can be rewritten as

where {t^l ≤ l_i < X_i ≤ t < u_i ≤ T} denotes that t^l, l_i, t, u_i and T are known, that the ordering t^l ≤ l_i < t < u_i ≤ T holds, and that {X_i ≤ t}. Using (4.3.43) we find that the probability in the first term is then given by

and similarly for the second term. The M‐step with maximization with respect to π, Θ₁ and Θ₂, and the choice of the initial values, is discussed in detail in Reynkens et al. [647].

EVA is not available in the literature for interval censored data. The role of the empirical survival and quantile functions in the construction of a tail analysis for complete data (i.e., setting X_{n−j+1, n} as an estimator of Q(1 − j/(n + 1)), j = 1, …, n) is taken over by the Turnbull [747] estimator . The Turnbull estimator is an extension to interval censoring of the Kaplan–Meier estimator or product‐limit estimator [482], that is, when .

The Kaplan–Meier estimator of 1 − F is defined as follows: letting 0 = τ₀ < τ₁ < τ₂ < … < τ_N (with N < n) denote the observed, possibly censored, data, N_j the number of observations X_i ≥ τ_j, and d_j the number of values l_i equal to τ_j, then

$$1 - \hat F^{KM}(x) = \prod_{j:\,\tau_j \le x}\left(1 - \frac{d_j}{N_j}\right).$$

This expression is motivated from the fact that

$$1 - F(\tau_j) = \prod_{i=1}^{j} P(X > \tau_i \mid X > \tau_{i-1}).$$

Turnbull’s algorithm is then constructed as follows: Let 0 = τ₀ < τ₁ < … < τ_m denote here the grid of all points l_i, u_i, i = 1, 2, …, n. Define δ_{ij} as the indicator whether the observation in the interval (l_i, u_i] could be equal to τ_j, j = 1, …, m: δ_{ij} equals 1 if (τ_{j−1}, τ_j] ⊂ (l_i, u_i] and 0 otherwise. Initial values are assigned to 1 − F(τ_j) by distributing the mass 1/n for the ith individual equally to each possible τ_j ∈ (l_i, u_i]. The algorithm is given as:
1. Compute the probability p_j that an observation equals τ_j by p_j = F(τ_j) − F(τ_{j−1}), j = 1, …, m.
2. Estimate the number of observations at τ_j by $d_j = \sum_{i=1}^{n} \delta_{ij}\,p_j \big/ \sum_{l=1}^{m}\delta_{il}\,p_l$.
3. Compute the estimated number of data with l_i ≥ τ_j by $N_j = \sum_{l=j}^{m} d_l$.
4. Update the product‐limit estimator using the values of d_j and N_j found in the two preceding steps. Stop the iterative process if the new and old estimate of 1 − F for all τ_j do not differ too much.
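
The iteration can be sketched in a few lines of Python. The fragment below is an illustrative implementation under stated assumptions, not code from Turnbull [747]: exact observations are encoded as l_i = u_i, right‐censored ones as u_i = ∞, and the expected count at τ_j is taken to be d_j = Σ_i δ_{ij} p_j / Σ_l δ_{il} p_l. Because N_j − d_j = N_{j+1} when N_j = Σ_{l ≥ j} d_l, the product‐limit update of step 4 reduces to p_j = d_j/n, which is what the sketch uses. Function names and tolerances are hypothetical.

```python
import math

def turnbull(intervals, max_iter=1000, tol=1e-10):
    """Self-consistency iteration for interval-censored data.

    intervals: list of (l, u) with 0 <= l <= u and u > 0; l == u encodes an
    exact observation, u = math.inf a right-censored one.
    Returns the grid points tau_j and the estimated probability masses p_j.
    """
    grid = sorted({v for l, u in intervals for v in (l, u) if v > 0})
    # delta[i][j] is True when tau_j is a possible value for observation i
    delta = [[(l < t <= u) or (l == u == t) for t in grid] for l, u in intervals]
    n, m = len(intervals), len(grid)
    p = [0.0] * m
    for row in delta:                     # initial masses: spread 1/n equally
        c = sum(row)
        for j in range(m):
            if row[j]:
                p[j] += 1.0 / (c * n)
    for _ in range(max_iter):
        d = [0.0] * m                     # step 2: expected counts at tau_j
        for row in delta:
            denom = sum(p[j] for j in range(m) if row[j])
            for j in range(m):
                if row[j]:
                    d[j] += p[j] / denom
        new_p = [dj / n for dj in d]      # steps 3-4 combined: p_j = d_j / n
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:                  # stop when the estimate stabilises
            break
    return grid, p

def survival(grid, p, x):
    """Estimated 1 - F(x): total mass placed strictly above x."""
    return sum(pj for t, pj in zip(grid, p) if t > x)

# Illustration: two interval-censored, one exact and one right-censored claim
grid, masses = turnbull([(0.0, 2.0), (1.0, 3.0), (2.0, 2.0), (2.5, math.inf)])
```

For data without censoring the iteration stops immediately at the empirical distribution, which gives a quick sanity check of an implementation.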

In case of interval censored data we can then estimate the mean excess function e (see (3.4.10)) substituting 1 − F by the Turnbull estimator $1 - \hat F^{TB}$:

$$\hat e^{TB}(x) = \frac{\int_x^{\infty}\big(1 - \hat F^{TB}(u)\big)\,du}{1 - \hat F^{TB}(x)}. \tag{4.3.44}$$

As discussed in Section 4.2.1, the mean excess function based on the log‐data leads to an estimator of a positive extreme value index γ. As in (4.2.8), using the Turnbull estimator rather than the classical empirical distribution we obtain an estimator of γ > 0 in the case of incomplete data:

$$H^{TB}(x) = \frac{\int_x^{\infty}\big(1 - \hat F^{TB}(u)\big)\,\frac{du}{u}}{1 - \hat F^{TB}(x)}. \tag{4.3.45}$$

We then compute these statistics at the positions , k = 1, …, n − 1. Such plots will assist in choosing an appropriate threshold t and estimates of the extreme value index γ to validate the tail component in the splicing.

Case study: Dutch fire insurance data.

In this case no censoring is present, while the EVA did not indicate any upper‐truncation effect. However, there is a left truncation point t^l = 900 000. Fitting (4.3.42) with , π = 0.925 and t = 9 075 878 on the basis of the mean excess plot, setting γ = 0.784 to be compared with the Hill estimator at the threshold t, combined with an ME component with M = 2, α = (0.901, 0.099), r = (1, 5), λ⁻¹ = 1 038 901 leads to the fit presented in Figure 4.13.

Note, however, that the tail fit following from images (j = 1, …, n) (Figure 4.13, bottom right) is unsatisfactory. This is expected from the EVA discussed above, where we found two Pareto tail pieces with a lower EVI value following from the bias‐reduced estimators at the top 1% of the data. This is also visible with the extreme quantile estimates of Q(0.999) in Figure 4.6 (top). From this the following splicing model was fitted:

with t^l = 900 000, π₁ = 0.925, π₂ = 0.065, t₁ = 9 075 878, t₂ = 45 000 000, γ₁ = 0.947 and γ₂ = 0.427, while M = 2, α = (0.901, 0.099), and r = (1, 5), λ⁻¹ = 1 038 901 (see Figure 4.14).□

Case study: MTPL data for Company A.

The interval censoring approach is considered here with the indexed payments in 2010 as a lower bound and upper bounds for the non‐closed claims which are derived from the indexed incurred values. Concerning the upper bounds two methods are applied here.

First, in Figure 4.15 (top) we plot the percentage of incurred values which correctly act as upper bounds for the final payments of the closed claims as a function of the development year. From this we observe that from the sixth year of development the incurred values start to be reliable upper bounds with 90% confidence. We then restrict attention to the claims with at least 5 years of development, that is, with accident year before 2006. This restricted data set contains 596 claims of which 45% are censored. First we inspect the tails within the interval censoring approach on the basis of e^TB and H^TB from (4.3.44) and (4.3.45) with x taken in . We conclude that this mean excess plot adapted for interval censoring based on (4.3.44) has a shape comparable to the mean excess plot based on the ultimate values, but with a different horizontal scale and with a Hill‐type estimate (see (4.3.45)) that is situated between the two levels found in Figure 4.5. We coupled an ME with a Pareto (ME‐Pa). The parameters are:

See Figure 4.16.

Another approach follows from Figure 4.15 (bottom) where, for every development year d, we present the boxplots based on all claims, closed or non‐closed in 2010, of the ratios R_{i, d} of the final cumulative payment Z_i in 2010 over the incurred value I_{i, d} for the given development year for claim i: R_{i, d} = Z_i/I_{i, d}. When a claim is closed before a particular development year d, the ratio for that claim in year d equals 1. This plot yields relevant information on the possibility of using the incurred values as an upper bound: if a ratio R_{i, d} is larger than 1, the incurred value is smaller than the final available cumulative payment. The ratios R_{i, d} are also right censored in case the cumulative payment is censored. Estimating the right endpoints of the distributions of the R_{i, d} values per d using the methods developed in Einmahl et al. [315] then leads to factors f_d so that Ĩ_{i, d} = f_d I_{i, d} provide more reliable upper bounds for the real final cumulative payments. We then still deleted the claims from 2010 since the upper bounds for these losses are still not reliable. In Figure 4.15 (bottom) we also plot the factors f_d. We then inspect the tails again within the interval censoring approach on the basis of e^TB and H^TB from (4.3.44) and (4.3.45) with Ĩ_{i, d} serving as an upper bound for the final cumulative payment of claim i. The corresponding tail fit and splicing results, given in Figure 4.17, compare well with the results in Figure 4.16. However, the confidence intervals based on the Turnbull estimator are wider when using the larger upper bounds Ĩ_{i, d} for larger claim sizes (see Figure 4.17, bottom right).

The parameters of the splicing model here are:

Case study: MTPL data for Company B.

Again, the interval censoring approach is considered here with the indexed payments in 2010 as a lower bound and the indexed incurred values as upper bound of the intervals for claims under development in 2010. Here we restrict attention to the claims with at least 5 years of development and use the incurred values I_{i, d} as upper bounds. This restricted data set contains 428 claims, of which 48% are censored.

On the basis of e^TB and H^TB from (4.3.44) and (4.3.45) with x in , we conclude that the mean excess plot adapted for interval censoring has a shape comparable to the mean excess plot based on the ultimate values, but with a different horizontal scale, and with a stable plot of the Hill type estimates for γ at 0.5. This value can be seen as a compromise between the two levels found in Figure 4.5. We then splice an ME with a Pareto (ME‐Pa) tail. The parameters are:

See Figure 4.18.□

**Figure 4.15** MTPL data for Company A: percentage of closed claims with incurred value being a correct upper bound for final payment as a function of the number of development years (DY) (top); boxplots of R_{i, d} for every development year d and factor f_d used in the interval censoring approach (bottom).

If the upper bounds are put to ∞, that is, if one uses the right censoring framework, then, under the random right censoring assumption of independence between the real cumulative payment X at closure of the claim and the censoring variable C which is observed in case the claim is right censored, estimators of γ > 0 have been proposed in Beirlant et al. [96, 101], Einmahl et al. [315], and Worms and Worms [796]. Using the likelihood approach, Beirlant et al. [101] proposed the estimator

$$\hat\gamma_k^{(c)} = \frac{H_{k,n}^{(Z)}}{\hat p_k} \tag{4.3.46}$$

with $H_{k,n}^{(Z)}$ the Hill estimator based on the observed data Z_i = min(X_i, C_i) (i = 1, …, n) and $\hat p_k$ the proportion of non‐censored data in the top k Z‐data. Einmahl et al. [315] derived asymptotic results, while Beirlant et al. [96] proposed a bias‐reduced version. Worms and Worms [796] obtained a tail index estimator through the estimation of the mean excess function of the log‐data, comparable with the estimator derived in (4.3.45):

(4.3.47)

where the Kaplan–Meier estimator can be written as

$$1 - \hat F^{KM}(x) = \prod_{i:\,Z_{i,n} \le x}\left(\frac{n-i}{n-i+1}\right)^{\Delta_{i,n}}$$

with Δ_{i, n} equal to 1 if the ith smallest observation Z_{i, n} is non‐censored, and 0 otherwise.
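
As a small illustration of the likelihood‐based estimator under random right censoring, the Python sketch below divides the Hill statistic of the observed Z‐data by the proportion of non‐censored observations among the top k. This reading of the estimator of Beirlant et al. [101] is an assumption of the sketch, and the function name and toy data are hypothetical.

```python
import math

def censored_hill(z, delta, k):
    """Hill statistic of the top k Z-values divided by their non-censored share.

    z     : observed values Z_i = min(X_i, C_i), all positive
    delta : 1 if Z_i is a non-censored (closed) claim, 0 if right censored
    k     : number of top order statistics, 1 <= k < len(z)
    """
    pairs = sorted(zip(z, delta))
    n = len(pairs)
    threshold = pairs[n - k - 1][0]                    # Z_{n-k,n}
    top = pairs[n - k:]                                # k largest observations
    hill = sum(math.log(zi / threshold) for zi, _ in top) / k
    share = sum(d for _, d in top) / k                 # proportion non-censored
    if share == 0:
        raise ValueError("all top-k observations are censored")
    return hill / share

# Toy illustration: without censoring the estimator is the ordinary Hill estimator
z_values = [1.0, 2.0, 4.0, 8.0]
uncensored = censored_hill(z_values, [1, 1, 1, 1], k=2)   # (log 2 + log 4) / 2
half_censored = censored_hill(z_values, [1, 1, 1, 0], k=2)
```

The division by the non‐censored share inflates the Hill statistic, reflecting that censored payments underestimate the final claim sizes.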

4.4 Incorporating Covariate Information

In certain instances, the assumption of i.i.d. random variables, underlying the extreme value methods discussed above, may be violated. When analysing claim data from different companies, the tail fits may differ. Also, loss distributions may change over calendar years or along the number of development years. Sometimes considering covariates may remedy the situation. Let the covariate information, whether using continuous or indicator variables, be contained in a covariate vector x = (x₁, …, x_p). The extension of the POT approach based on (4.2.25) has been popular in literature, starting with the seminal paper by Davison and Smith [253]. However, there are also some methods available that focus on response random variables that exhibit Pareto‐type tails. Here we denote the response variables Z_i (rather than X_i as in the preceding sections) with the corresponding exceedances or POTs Y = Z/t or Y = Z − t when Z > t.

4.4.1 Pareto‐type Modelling

When modelling time dependence or incorporating any other covariate information in an independent data setting with Pareto‐type distributed responses Z_i, the exceedances are defined through Y_i = Z_i/t for some appropriate threshold t. Note that in many circumstances the threshold should then also be modelled along x = x_i, i = 1, …, n. As before, we assume that as

(4.4.48)

where A_i, γ_i > 0. Regression can be modelled through the scale parameter A and/or the extreme value index γ.

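Changes in γ can be modelled in a parametric way using likelihood techniques. Suppose, for instance, that regression modelling of γ > 0 using an exponential link function appears appropriate in a given case study:

$$\gamma_i = \gamma(x_i) = \exp(\beta' x_i).$$

The log‐likelihood function, with sums running over the exceedances Y_i, is then given by

$$\log L(\beta) = -\sum_{i}\Big[\beta' x_i + \big(1 + \exp(-\beta' x_i)\big)\log Y_i\Big],$$

leading to the likelihood equations

$$\sum_{i}\big(\exp(-\beta' x_i)\log Y_i - 1\big)\,x_{ij} = 0, \quad j = 1, \ldots, p.$$

Beirlant and Goegebeur [99] propose to inspect the goodness‐of‐fit of such a regression model under constant scale parameter A on the basis of a Pareto QQ‐plot using

$$R_i = Y_i^{\exp(-\hat\beta' x_i)},$$

which are indeed approximately Pareto distributed with tail index 1, when the regression model is appropriate.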
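
A numerical sketch may help to see how such a fit and the residual check can be carried out. The Python fragment below assumes strict Pareto exceedances with P(Y_i > y) = y^{−1/γ_i}, an intercept plus one covariate, γ_i = exp(b₀ + b₁x_i), and uses Newton–Raphson on the resulting concave log‐likelihood. None of this is taken from Beirlant and Goegebeur [99]; the function names, the starting values and the simulated data are hypothetical.

```python
import math
import random

def fit_pareto_regression(y, x, iters=50):
    """Newton-Raphson for gamma_i = exp(b0 + b1 * x_i), exceedances y_i >= 1."""
    b0 = b1 = 0.0
    logy = [math.log(v) for v in y]
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ly, xi in zip(logy, x):
            w = math.exp(-(b0 + b1 * xi)) * ly    # log(y_i) / gamma_i
            g0 += w - 1.0                         # score for b0
            g1 += (w - 1.0) * xi                  # score for b1
            h00 += w                              # entries of minus the Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det         # Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def pareto_residuals(y, x, b0, b1):
    """Residuals Y_i ** (1 / gamma_i); close to unit Pareto if the model fits."""
    return [v ** math.exp(-(b0 + b1 * xi)) for v, xi in zip(y, x)]

def simulate(n, b0, b1, seed=0):
    """Simulate covariates on (0, 1) and strict Pareto exceedances."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [(1.0 - rng.random()) ** (-math.exp(b0 + b1 * xi)) for xi in xs]
    return ys, xs

# Illustration on simulated data with b0 = -1 and b1 = 0.5
y_sim, x_sim = simulate(4000, -1.0, 0.5)
b0_hat, b1_hat = fit_pareto_regression(y_sim, x_sim)
residuals = pareto_residuals(y_sim, x_sim, b0_hat, b1_hat)
```

Plotting the sorted log‐residuals against standard exponential quantiles then gives the Pareto QQ‐plot used for the goodness‐of‐fit check.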

The case where γ does not depend on i, while A does depend on i, was formalized in Einmahl et al. [316] assuming that there exists a tail function 1 − F and a continuous, positive function A defined on [0, 1] such that

(4.4.49)

uniformly for all and all i = 1, …, n with images . A is then called the skedasis function, which characterizes the trend in the extremes through the changes in the scale parameter A. Under (4.4.49), Einmahl et al. [316] showed that the Hill estimator H_{k, n} is still a consistent estimator for γ. Assuming equidistant covariates x_i = i/n, i = 1, …, n, as in (4.2.19),

where images denotes the number of Z values larger than the threshold Z_{n−k, n} with covariate value x in a neighbourhood of x_i. The contribution of the observations to is governed by a symmetric density kernel function K on [−1, 1] and K_h(x) = K(x/h)/h, so that K gives more weight to the observations with covariates closer to x_i. We hence obtain

Finally, estimators of small tail probabilities and large quantiles follow directly from (4.4.49), (4.2.12) and (4.2.13):

4.4.2 Generalized Pareto Modelling

Let Y₁, …, Y_n be independent GPD random variables and let x_i denote the covariate information vector, that is,

where γ(x), σ(x), μ(x) denote admissible functions of x, whether of parametric nature using three vectors of regression coefficients β_j (j = 1, 2, 3) of length p with , and , or of non‐parametric nature. Again this model is used as an approximation of the conditional distribution of excesses Y (x) = Z − μ(x) over a high threshold μ(x) given that there is an exceedance. The choice of an appropriate threshold μ(x) is of course even more difficult than in the non‐regression setting since the threshold can depend on the covariates in order to take the relative extremity of the observations into account.

When parametric functions , and have been chosen, the estimators of β_j (j = 1, 2, 3) can be obtained by maximizing the log‐likelihood function

where N_μ denotes the number of excesses over the threshold function μ(x).
Alternatively, non‐parametric regression techniques are available to estimate the parameter functions γ(x), σ(x). Consider independent random variables Z₁, …, Z_n and associated covariate information x₁, …, x_n. Suppose we focus on estimating the tail of the distribution of Z at x^*. Fix a high local threshold μ(x^*) and compute the exceedances Y_i = Z_j − μ(x^*), provided Z_j > μ(x^*), . Here j is the index of the ith exceedance in the original sample, and denotes the number of exceedances over the threshold μ_x^*. Then re‐index the covariates in an appropriate way such that x_i denotes the covariate associated with exceedance Y_i. Using local polynomial maximum likelihood estimation, one approximates γ(x) and σ(x) by polynomials, centered at x^*. Let h denote a bandwidth parameter and consider a univariate covariate x. Assuming γ, respectively σ, being m₁, respectively m₂, times differentiable one has for |x_i − x^*| ≤ h,

where

The coefficients of these approximations can be estimated by local maximum likelihood fits of the GPD, with the contribution of each observation to the log‐likelihood being governed by a kernel function K. The local polynomial maximum likelihood estimator (β₁, β₂) = () is then the maximizer of the kernel weighted log‐likelihood function

with respect to (β₁, β₂), where g(y; γ, σ) = (1/σ)(1 + (γ/σ)y)^{−1−1/γ} is the density of the generalized Pareto distribution.

A more recent approach is using penalized log‐likelihood optimization based on spline functions. Let the covariates x be one‐dimensional within an interval [a, b]. The goal is to fit reasonably smooth functions h_γ and h_σ with γ(x) = h_γ(x) and σ(x) = h_σ(x) to the observations (Y_i, x_i), i = 1, …, N_μ. The penalized log‐likelihood is then given by

The introduction of the penalty terms is a standard technique to avoid over‐fitting when one is interested in fitting smooth functions (see Hastie and Tibshirani [428] or Green and Silverman [408]). Next • stands for γ or σ. Intuitively the penalty functions measure the roughness of twice‐differentiable curves and the smoothing parameters λ_• are chosen to regulate the smoothness of the estimates ĥ_• : larger values of these parameters lead to smoother fitted curves.

Let a = s₀ < s₁ < … < s_m < s_m+1 = b denote the ordered and distinct values among . A function h defined on [a, b] is a cubic spline with the above knots if the following conditions are satisfied:
- on each interval [s_i, s_i+1], h is a cubic polynomial
- at each knot s_i, h and its first and second derivatives are continuous.
A cubic spline is a natural cubic spline if in addition to the two latter conditions it satisfies the natural boundary condition that the second and third derivatives of h at a and b are zero. It follows from Green and Silverman [408] that for a natural cubic spline h with knots s₁, …, s_m one has

where h_• = (h_•(s₁), …, h_•(s_m)), and K is a symmetric m × m matrix of rank m − 2 only depending on the knots s₁, …, s_m. Hence

In order to assess the validity of a chosen regression model one can generalize the exponential QQ‐plot of generalized residuals defined before in the non‐regression case:

with

Finally, given regression estimators for (γ(x), σ(x)) using an appropriate threshold function μ(x), extreme quantile estimators are given by

where can, for instance, be taken to be equal to the Nadaraya–Watson estimator

For more details and other non‐parametric methods, see Davison and Ramesh [252], Hall and Tajvidi [420], Chavez‐Demoulin and Davison [200], Daouia et al. [248, 249], Gardes and Girard [368, 369], Gardes and Stupfler [370], Goegebeur, Guillou and Osmann [393], and Stupfler [713]; see Chavez‐Demoulin et al. [201] for further non‐parametric extreme value regression methods and applications.
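The residual QQ‐plot mentioned above can be sketched as follows. The code assumes that the generalized residuals are the usual GPD‐to‐exponential transform log(1 + γ(x_i)Y_i/σ(x_i))/γ(x_i), which is standard exponential when the fitted GPD regression model holds; the function names and the plotting positions i/(n + 1) are illustrative choices.

```python
import math

def gpd_residuals(exceedances, gammas, sigmas):
    """Map exceedances Y_i to residuals that are Exp(1) under a correct GPD fit."""
    residuals = []
    for y, g, s in zip(exceedances, gammas, sigmas):
        if abs(g) < 1e-12:
            residuals.append(y / s)  # exponential limit as gamma -> 0
        else:
            residuals.append(math.log1p(g * y / s) / g)
    return residuals

def exponential_qq_points(residuals):
    """Pairs (standard exponential quantile, ordered residual) for a QQ-plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    return [(-math.log(1.0 - i / (n + 1.0)), r) for i, r in enumerate(ordered, start=1)]
```

Points close to the first diagonal support the regression model; isolated points far above it flag potential outliers.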

Case study: Austrian storm claim data.

We consider here the modelling of the normalized historical losses of residential buildings from Section 1.3.3 in the Vienna and Upper Austria provinces as a function of the building‐value‐weighted wind index W. We model the conditional extreme value index γ = β₁ as constant in W, while log σ_W is taken to be linear in W, with intercept β_{2, 1} and slope β_{2, 2}:

Finally, we take here μ_W = 0, that is, we take all the data, since the data can already be considered as exceedances. Hence the model is

The results from a maximum likelihood analysis are

Upper Austria: β₁ = 0.445, β_{2, 1} = −7.2, β_{2, 2} = 0.046;
Vienna: β₁ = 0.337, β_{2, 1} = −10.8, β_{2, 2} = 0.065

(cf. Figure 4.20). We also plot the estimates of the quantile Q(0.97|W) using parametric and non‐parametric fits, jointly with the residual QQ‐plots. From the residual QQ‐plot for the Vienna province we deduce that the storm with w = 59 is an outlier. Deleting that storm from the data set leads to β₁ = −0.163, β_{2, 1} = −10.5 and β_{2, 2} = 0.064. Hence this particular storm has a high influence on the analysis.□
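As an illustration, the parametric quantile Q(0.97|W) can be computed directly from the reported estimates. The sketch assumes the model is a GPD with threshold 0, extreme value index β₁ and scale σ_W = exp(β_{2, 1} + β_{2, 2}W); the dictionary `FITS` and the function names are illustrative.

```python
import math

# Maximum likelihood estimates (beta_1, beta_21, beta_22) reported in the case study.
FITS = {
    "Upper Austria": (0.445, -7.2, 0.046),
    "Vienna": (0.337, -10.8, 0.065),
}

def sigma_w(w, beta21, beta22):
    """Scale parameter with log sigma_W linear in the wind index."""
    return math.exp(beta21 + beta22 * w)

def gpd_survival(y, gamma, sigma):
    """P(Y > y) for a GPD with threshold 0 and gamma > 0."""
    return (1.0 + gamma * y / sigma) ** (-1.0 / gamma)

def conditional_quantile(p, w, region):
    """Quantile Q(p | W = w) of the fitted GPD regression model."""
    gamma, beta21, beta22 = FITS[region]
    sigma = sigma_w(w, beta21, beta22)
    return sigma / gamma * ((1.0 - p) ** (-gamma) - 1.0)
```

For example, `conditional_quantile(0.97, 40.0, "Vienna")` gives the fitted 97% quantile at wind index 40; because only the scale depends on W, the quantile curve is log‐linear in W.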

4.4.3 Regression Extremes with Censored Data

In Section 4.3 we discussed the problem of estimating the distribution of the final payments based on censored data, using the Kaplan–Meier estimator of the distribution of the payment data. Here we propose regression modelling of the final payments given the development time at the closure of a claim. Note, however, that the final payments and the development periods are both right censored, and that the two variables are censored (or not censored) at the same time. We again use the notation Z_i (i = 1, …, n) from that section for the observed cumulative payment at the end of the study, and similarly nDY_{e, i} for the observed number of development years at the end of 2010. Again Δ_{i, n} denotes the indicator of non‐censoring corresponding to the ith smallest observed payment Z_{i, n}. Akritas and Van Keilegom [11] proposed the following non‐parametric estimator of the conditional distribution of X given a specific value of nDY, assuming that X and the censoring variable C (see Section 4.3) are conditionally independent given nDY:

with weights

Denoting by W_{i, n} the weight W corresponding to the ith smallest Z value Z_{i, n}, we then arrive at the following Hill‐type estimator of the conditional extreme value index given nDY = d, generalizing the unconditional Worms and Worms estimator defined in (4.3.47):

Pareto QQ‐plots adapted for censoring per chosen d value can then be defined as

**Figure 4.21** MTPL data for Company A: time plots of cumulative payments *Z_i* as a function of *nDY_e* (top); Pareto QQ‐plots (middle) and Hill estimates (bottom) adapted for right censoring at development years *nDY* = 3, 8, 13.
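A kernel‐weighted product‐limit estimator of the kind described above can be sketched as follows. The code assumes Nadaraya–Watson weights with a Gaussian kernel (the kernel choice is an assumption, not taken from the text) and no ties among the observed payments; the function names are illustrative.

```python
import math

def kernel_weights(d, d_obs, h):
    """Nadaraya-Watson weights W_i(d; h); the Gaussian kernel is an assumed choice."""
    k = [math.exp(-0.5 * ((d - di) / h) ** 2) for di in d_obs]
    total = sum(k)
    return [ki / total for ki in k]

def conditional_survival(x, z, delta, weights):
    """Weighted product-limit estimate of P(X > x | nDY = d).

    z: observed payments, delta: 1 if uncensored and 0 if right censored,
    weights: kernel weights of the observations at the chosen d.
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    at_risk = sum(weights)  # total weight of observations with Z_j >= current Z_i
    surv = 1.0
    for i in order:
        if z[i] > x:
            break
        if delta[i] == 1 and at_risk > 0.0:
            surv *= max(0.0, 1.0 - weights[i] / at_risk)
        at_risk -= weights[i]
    return surv
```

With equal weights the estimator reduces to the Kaplan–Meier estimator, and with equal weights and no censoring it reduces to the empirical survival function.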

In order to derive a full model for the complete payments X as a function of nDY = d a local version of the splicing algorithm from Section 4.3.2 can be developed considering random right censoring on X with the kernel weights W_i = W_i(d;h) as introduced above. The EM algorithm can then be applied using a kernel weighted log‐likelihood, comparable with the approach from Section 4.4.1. For instance, given the complete version of the data, the complete likelihood function is then given by

The corresponding complete data weighted log‐likelihood function then equals

Table 4.1

	nDY = 3	nDY = 8	nDY = 13
π	0.859	0.846	0.826
t	390 000	390 000	390 000
M	2	2	2
α	(0.235,0.765)	(0.213,0.787)	(0.173,0.827)
r	(1,5)	(1,4)	(1,4)
λ⁻¹	39 693	50 954	51 966
γ	0.441	0.453	0.468

**Figure 4.22** MTPL data for Company A: fit of splicing model at development years *nDY* = 3, 8, 13; PP plot of empirical survival function against splicing model RTF at *nDY* = 8 (top); idem with −log transformation (bottom).

4.5 Multivariate Analysis of Claim Distributions

Joint or multivariate estimation of claim distributions, for example originating from different lines of business which are possibly dependent, requires estimation of each component or marginal separately and of the dependence structure. The joint analysis of loss and allocated loss adjustment expenses (ALAE) forms another example in insurance. An early analysis of such a case is provided in Frees and Valdez [359]. A detailed EVA of such a data set using the concept of extremal dependence is found in Chapter 9 in Beirlant et al. [100].

We first model the multivariate tails using data that are large in at least one component, followed by a splicing exercise combining a tail and a modal fit. In a multivariate setting this program is of course much more complex than in the univariate case. For the tail section we refer to the multivariate POT modelling using the multivariate generalized Pareto distribution, as introduced in Section 3.6. The joint modelling of “small” losses will be based on a multivariate generalization of the mixed Erlang distribution introduced by Lee and Lin [533]. Research on this topic has started only recently, and here we only examine an ad hoc model for the Danish fire insurance data.

4.5.1 The Multivariate POT Approach

From (3.6.25) and (3.6.26) one observes the importance of estimating the stable tail dependence function (STDF) l defined in Chapter 3. The estimation of the tail dependence can be performed non‐parametrically or using parametric models. We refer to Kiriliouk et al. [488] for fitting parametric multivariate generalized Pareto models using censored likelihood methods.

A non‐parametric estimator of an STDF is given by

(4.5.50)

with

where R_{i, j} denotes the rank of X_{i, j} among X_{1, j}, …, X_{n, j}:

The estimator is a direct empirical version of definition (3.6.22) of l with u = n/k.

A slightly different version is given by

(4.5.51)

where the order statistic in the indicator denotes the ith smallest observation of component j. Bias‐reduced versions of these estimators were proposed in Fougères et al. [357] and Beirlant et al. [98]. In the bivariate case, the value at (1, 1) of either estimator then acts as an estimator of the extremal coefficient θ = l(1, 1).
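A rank‐based empirical STDF can be sketched as below. The code assumes the common convention of counting observations whose rank in at least one component exceeds n + 1/2 − k·x_j; the exact threshold convention in (4.5.50) may differ, and the function names are illustrative.

```python
def ranks(values):
    """Rank of each value within the sample (1 = smallest); ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def stdf_estimate(data, k, x):
    """Empirical estimate of l(x_1, ..., x_d) from n rows of d components.

    k is the number of upper order statistics used per margin.
    """
    n, d = len(data), len(x)
    col_ranks = [ranks([row[j] for row in data]) for j in range(d)]
    count = 0
    for i in range(n):
        # Observation i counts if it is extreme in at least one component.
        if any(col_ranks[j][i] > n + 0.5 - k * x[j] for j in range(d)):
            count += 1
    return count / k
```

In the bivariate case `stdf_estimate(data, k, (1.0, 1.0))` estimates the extremal coefficient: it equals 1 when the largest values of both components always occur together and 2 when they never do.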

An estimator of the extremal dependence coefficient χ can be constructed on the basis of an estimator of χ(u) for u → 1 using the estimator of C(u, u)

(4.5.52)

where Û_{i, j} = F̂_j(X_{i, j}) (j = 1, 2; i = 1, …, n), with F̂_j denoting the empirical distribution function of the jth marginal and X_{i, j} the ith observation of the jth component. Hence

As an application, note that an estimator of the parameter τ in the logistic dependence model can be obtained from χ(u) → 2 − 2^τ as u → 1, from which τ = log(2 − χ)/log 2, with χ replaced by its estimate.

Setting

the Hill estimator based on (i = 1, …, n) leads to an estimator of the coefficient of tail dependence η. Of course bias reduction techniques can be applied here too.
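The estimation of χ(u) and of the logistic parameter τ can be sketched as follows. The code assumes the form χ(u) = 2 − log C(u, u)/log u with C replaced by the empirical copula of the rank‐transformed data, and inverts χ = 2 − 2^τ; function names are illustrative, and ties are not handled.

```python
import math

def pseudo_observations(values):
    """Empirical distribution function evaluated at each observation (rank / n)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u

def chi_estimate(x1, x2, u):
    """Estimate chi(u) = 2 - log C(u, u) / log u using the empirical copula."""
    u1, u2 = pseudo_observations(x1), pseudo_observations(x2)
    c_uu = sum(1 for a, b in zip(u1, u2) if a <= u and b <= u) / len(x1)
    return 2.0 - math.log(c_uu) / math.log(u)

def logistic_tau(chi):
    """Invert chi = 2 - 2**tau for the logistic dependence model."""
    return math.log(2.0 - chi) / math.log(2.0)
```

In practice one would plot `chi_estimate` over a range of u close to 1 and read off a stable level before applying `logistic_tau`.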

4.5.2 Multivariate Mixtures of Erlangs

Lee and Lin [533] defined a d‐variate Erlang mixture where each mixture component is the joint distribution of d independent Erlang distributions with a common scale parameter 1/λ > 0. The dependence structure is then captured by the combination of the positive integer shape parameters of the Erlangs in each dimension. We denote the positive integer shape parameters of the jointly independent Erlang distributions in a mixture component by the vector r = (r₁, …, r_d) and the set of all shape vectors with non‐zero weight by ℛ. The density of a d‐variate Erlang mixture evaluated in x > 0 can then be written as

(4.5.53)

Lee and Lin [533] showed that, given any density f(x), the d‐variate Erlang mixture

with mixing weights

converges to the distribution with density f as λ → ∞. The weights α_r of the components in the mixture are defined by integrating the density over the corresponding d‐dimensional rectangle of the grid formed by the shape parameters multiplied by the common scale. As λ increases, this grid becomes finer, which drives the convergence of the sequence of Erlang mixtures to the underlying distribution function.

Verbelen et al. [757] provided a flexible fitting procedure for multivariate mixed Erlangs (MMEs), which iteratively uses the EM algorithm, by introducing a computationally efficient initialization and adjustment strategy for the shape parameter vectors. Randomly censored and fixed truncated data can also be dealt with.
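To make the structure of a multivariate Erlang mixture concrete, the sketch below evaluates its density and distribution function as weighted sums of products of univariate Erlang terms with a common rate λ. The parameter values are those reported for the Danish fire case study later in this section, used here purely for illustration and ignoring the truncation to the fitting rectangle; function names are illustrative.

```python
import math

def erlang_pdf(x, shape, rate):
    """Erlang density with integer shape and rate lambda."""
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.factorial(shape - 1)

def erlang_cdf(x, shape, rate):
    """Erlang distribution function with integer shape and rate lambda."""
    partial = sum((rate * x) ** n / math.factorial(n) for n in range(shape))
    return 1.0 - math.exp(-rate * x) * partial

def mme_pdf(x, shapes, weights, rate):
    """Density of a multivariate Erlang mixture at the point x."""
    return sum(
        a * math.prod(erlang_pdf(xj, rj, rate) for xj, rj in zip(x, r))
        for a, r in zip(weights, shapes)
    )

def mme_cdf(x, shapes, weights, rate):
    """Distribution function of a multivariate Erlang mixture at the point x."""
    return sum(
        a * math.prod(erlang_cdf(xj, rj, rate) for xj, rj in zip(x, r))
        for a, r in zip(weights, shapes)
    )

# Illustrative values: shape vectors, weights and scale 1/lambda = 1.49.
SHAPES = [(1, 1), (3, 8)]
WEIGHTS = [0.92, 0.08]
RATE = 1.0 / 1.49
```

Within each mixture component the margins are independent, so all dependence between the components comes from the way the weights are spread over the shape vectors.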

Case study: Danish fire insurance data.

Here we consider a bivariate splicing model for the components building and contents, conditional on (building, contents) exceeding t^l = (1, 1). We first fitted a bivariate GPD, based on a logistic extreme value distribution, to the excesses over the threshold vector t = (7.32, 10.27) corresponding to k = 35 in the univariate extreme value plots. Univariate EVA leads to γ values around 0.5 for the building component and around 0.6 for the contents component. Fitting the GPD to each component leads to initial σ estimates. The parameter τ in the logistic dependence model can be estimated through estimating θ = l(1, 1) = 2^τ or by estimating χ(u) → 2 − 2^τ as u → 1. Both estimates are plotted in Figure 4.23 (middle), the χ‐based estimate being taken at u = 0.5. We further consider this second estimate.

Concerning the tail dependence coefficient η, the level is dominating, while at the smallest k values the estimates increase systematically with decreasing k. As the plot appears to indicate asymptotic dependence corresponding with η equal to 1, one has to be cautious interpreting the plot which indeed ultimately for the smallest k tends to values around 1.

The bivariate cumulative distribution function (c.d.f.) of a splicing model with a bivariate mixed Erlang and a bivariate GPD is now given by

with F_MGPD denoting the distribution function of the bivariate GPD as given in (3.6.25).

A bivariate mixed Erlang distribution was fitted following the method of Verbelen et al. [757], conditioned on [1, 7.32] × [1, 10.27], leading to r vectors (1, 1) and (3, 8) with α weights 0.92 and 0.08, and 1/λ = 1.49. The proportion for the bivariate mixed Erlang fit is π = 0.794. The c.d.f. corresponding to f₁ is then given by

The bivariate distribution function of the fitted bivariate GPD is given by

In order to guarantee that the marginal distributions have support on one has to impose the constraints and , which then lead to the parameter values (γ₁ = 0.57, σ₁ = 3.57) and (γ₂ = 0.65, σ₂ = 6.05).

4.6 Estimation of Other Tail Characteristics

In Section 4.2.1.2 using EVA we discussed the estimation of an extreme quantile or a VaR

in detail. Another popular tail characteristic is the conditional tail expectation CTE_{1−p}(X) defined by

when , where e denotes the mean excess function defined in Section 3.4. If X is a continuous random variable, the CTE equals the Tail‐VaR and the expected shortfall (ES) (cf. Section 7.2.2, where the role of these quantities for determining the solvency capital is discussed).

For an unlimited XL treaty with retention u, recall from Chapter 2 that the expected reinsured amount of a single claim X is given by

Π(u) = E[(X − u)₊] = ∫_u^∞ (1 − F(x)) dx,

which is also referred to as the pure premium for R (see Chapter 7 for details). One immediately observes

With a finite layer size v in the XL treaty, the pure premium becomes

Π(u) − Π(u + v).

Hence the estimation of VaR_{1−p}(X) and Π(u) at small and intermediate values of p, and at high and intermediate values of u, is an important building block in measuring and managing risk.

When estimating VaR_{1−p}(X) for a two‐component spliced distribution, we have from (4.3.43)
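For a strict Pareto claim these quantities have closed forms, which makes the relations between VaR, CTE and the pure premium easy to check. The sketch assumes P(X > x) = (x/t)^(−1/γ) for x > t with 0 < γ < 1, uses Π(u) = E[(X − u)₊], the layer premium Π(u) − Π(u + v) and the identity CTE = VaR + Π(VaR)/p for continuous X; all function names are illustrative.

```python
def pure_premium(u, t, gamma):
    """Pi(u) = E[(X - u)_+] for a strict Pareto claim, valid for u >= t and 0 < gamma < 1."""
    if not 0.0 < gamma < 1.0 or u < t:
        raise ValueError("requires 0 < gamma < 1 and u >= t")
    return gamma / (1.0 - gamma) * u * (u / t) ** (-1.0 / gamma)

def layer_premium(u, v, t, gamma):
    """Pure premium of the layer of size v above retention u."""
    return pure_premium(u, t, gamma) - pure_premium(u + v, t, gamma)

def value_at_risk(p, t, gamma):
    """VaR_{1-p}: the level exceeded with probability p."""
    return t * p ** (-gamma)

def conditional_tail_expectation(p, t, gamma):
    """CTE_{1-p} = VaR_{1-p} + Pi(VaR_{1-p}) / p for a continuous claim size."""
    q = value_at_risk(p, t, gamma)
    return q + pure_premium(q, t, gamma) / p
```

In this Pareto case the CTE simplifies to VaR/(1 − γ), so both the premium and the CTE grow without bound as γ approaches 1.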

(4.6.54)

where Q₁ denotes the quantile function of the ME component and Q₂ of the tail component. Q₁ can be obtained numerically. When the tail component is given by a simple Pareto distribution we have

and hence with t = X_{n−k, n} and 1 − π = (k + 1)/(n + 1), (4.6.54) yields from (4.2.12) when π < p ≤ 1. Using an upper‐truncated Pareto or a generalized Pareto tail fit, one can use or , respectively, for π < p ≤ 1.
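The quantile function of a mixed Erlang and Pareto splicing model can be sketched as follows, with q denoting the level of the distribution function. The code assumes the spliced c.d.f. equals π F₁(x)/F₁(t) below the splicing point t and π + (1 − π)(1 − (x/t)^(−1/γ)) above it, takes the lower truncation point to be 0, and inverts the mixed Erlang body by bisection. The parameter values are the nDY = 8 column of Table 4.1; the names `FIT`, `me_cdf`, `spliced_cdf` and `spliced_quantile` are illustrative.

```python
import math

# Estimates from Table 4.1 for nDY = 8; a lower truncation point of 0 is assumed.
FIT = {"pi": 0.846, "t": 390_000.0, "alpha": (0.213, 0.787), "r": (1, 4),
       "rate": 1.0 / 50_954.0, "gamma": 0.453}

def me_cdf(x, fit=FIT):
    """Mixed Erlang distribution function with integer shapes and common rate."""
    lam_x = fit["rate"] * x
    total = 0.0
    for a, r in zip(fit["alpha"], fit["r"]):
        tail = math.exp(-lam_x) * sum(lam_x ** n / math.factorial(n) for n in range(r))
        total += a * (1.0 - tail)
    return total

def spliced_cdf(x, fit=FIT):
    """Spliced distribution function: mixed Erlang body, Pareto tail above t."""
    pi, t = fit["pi"], fit["t"]
    if x <= t:
        return pi * me_cdf(x, fit) / me_cdf(t, fit)
    return pi + (1.0 - pi) * (1.0 - (x / t) ** (-1.0 / fit["gamma"]))

def spliced_quantile(q, fit=FIT):
    """Quantile at level q of the distribution function, 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    pi, t = fit["pi"], fit["t"]
    if q > pi:
        # Pareto tail: closed-form inversion.
        return t * ((1.0 - q) / (1.0 - pi)) ** (-fit["gamma"])
    # Mixed Erlang body: solve me_cdf(x) = q / pi * me_cdf(t) by bisection on [0, t].
    target = q / pi * me_cdf(t, fit)
    lo, hi = 0.0, t
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if me_cdf(mid, fit) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Above the splicing point the quantile has the Weissman form t((1 − q)/(1 − π))^(−γ), so only the body requires numerical inversion.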

When estimating Π(u) we again identify two cases: u ≤ t = x_{n−k, n} and u > t = x_{n−k, n}, in which case the EVA modelling can be used.

When u > t, then from (4.3.43)

where Π₂(u) is given by the following expressions for the different possible EVA tail fits with EVI estimate smaller than 1:

Truncated Pareto fit:
EPD fit: using the notation from (4.2.18)
Generalized Pareto fit:

When u < t, we have from (4.3.43) that

Note that Π(u) = 0 for u ≥ T and Π(u) = Π(t^l) + (t^l − u) for u ≤ t^l. For the mixed Erlang distribution we get

with

and, assuming that r_n = n, n = 1, …, M,

Case study: MTPL data for Company A.

Based on the splicing model for the data of Company A within the interval censored framework, using an unbounded Pareto, the fit of which is shown in Figure 4.16, we calculate the XL pure premium Π(u) as a function of u in Figure 4.24. We also add an estimate for Π(u) when taking , that is, considering only the lower bounds for the censored claims. The resulting value is significantly higher, which is consistent with the high estimates of the extreme value index as indicated in Figure 4.19. In order to compare with the classical approach using a statistical model for the ultimate estimates of the open claims, we also provide a comparison with the results based on the splicing model from Figure 4.11. This “classical” pure premium is also uniformly higher than the one obtained using interval censoring.□

**Figure 4.24** MTPL data for Company A: XL pure premium Π(u) based on ME‐Pa fit taking interval censoring into account (solid line). Comparison with the result when the upper bounds are ignored (right censoring, dashed line) and when the premium is based on the ultimates (dotted line).

Of course the estimation of Π(u) can be extended to a regression context. For instance, when u = u(x) is larger than a threshold function μ(x) of a one‐dimensional covariate x, and using the GPD modelling approach, one obtains for

The result of this procedure based on the GPD regression fit for the storm claim data of Upper Austria, with GPD(0.445, e^{−7.2+0.046w}, 0), is shown in Figure 4.25.

**Figure 4.25** Austrian storm claim data: XL pure premium Π(exp(0.001w)) for Upper Austria based on a GPD regression fit with the wind index W as covariate.
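For the storm claim model the pure premium has a closed form, sketched below. The code assumes a GPD with threshold 0, so that every observation is an exceedance and no exceedance probability factor is needed, and uses E[(Y − u)₊] = σ/(1 − γ) · (1 + γu/σ)^(1 − 1/γ) for 0 < γ < 1 together with the reported Upper Austria estimates; function names are illustrative.

```python
import math

# Upper Austria fit: GPD(0.445, exp(-7.2 + 0.046 w), 0).
GAMMA, BETA21, BETA22 = 0.445, -7.2, 0.046

def gpd_survival(u, gamma, sigma):
    """P(Y > u) for a GPD with threshold 0 and gamma > 0."""
    return (1.0 + gamma * u / sigma) ** (-1.0 / gamma)

def gpd_pure_premium(u, gamma, sigma):
    """E[(Y - u)_+] for a GPD with threshold 0 and 0 < gamma < 1."""
    return sigma / (1.0 - gamma) * (1.0 + gamma * u / sigma) ** (1.0 - 1.0 / gamma)

def storm_pure_premium(u, w):
    """XL pure premium at retention u for wind index w under the fitted model."""
    return gpd_pure_premium(u, GAMMA, math.exp(BETA21 + BETA22 * w))
```

At u = 0 the expression reduces to the GPD mean σ/(1 − γ), and its derivative in u is minus the survival function, which gives two simple checks of the formula.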

4.7 Further Case Studies

We end this chapter by analysing the case studies on flood risk and earthquake risk which were introduced in Chapter 1.

Flood risk. Here we model the aggregate annual loss data introduced in Section 1.3.4 (given as a percentage of the building value) for Germany and the UK. All presented derivative plots for Germany in Figure 4.27 based on the Pareto, log‐normal, and Weibull QQ‐plots ultimately are decreasing, while for the UK in Figure 4.26 the decrease in the Weibull derivative plot is small and this plot is closest to being constant when . The systematic decrease in the different estimators of γ with increasing threshold, together with the P‐values of the T_B test for upper‐truncation does indicate some evidence for a truncated Pareto tail. Indeed, for both countries the truncated Pareto model fits well. The estimates of the right truncation point T are situated around 0.25 for the UK and 0.35 for Germany. However, for the UK data, a Weibull fit provides a valid alternative.
Earthquake risk. We consider recent magnitude data of the 200 largest earthquakes in the Groningen area (the Netherlands), which are caused by gas extraction. In Figure 4.28 (top left), we present the exponential QQ‐plot. A linear pattern is visible for a large section of the magnitude data, while some concave curvature appears at the largest values. According to the Gutenberg–Richter (1956) law, the magnitudes of independent earthquakes are drawn from a doubly truncated exponential distribution

Kijko and Singh [487] provide a review of the vast literature on estimating the maximum possible magnitude T_M. The energy E released by earthquakes, expressed in megajoules, relates to the magnitude M by

When transforming the magnitude data back to the energy scale, the Gutenberg–Richter model predicts a truncated Pareto tail. In Figure 4.28, plotting the Hill estimates we observe a systematic decrease with decreasing k, while the moment and ML‐GPD estimators tend to −1 near k = 1. The estimates of stay rather stable at a level . The P‐values of the T_B test for upper‐truncation are borderline significant at significance level 0.05 for k ∈ (30, 70). The amount of truncation is estimated around . The goodness of fit of the truncated Pareto fit is illustrated on the Pareto QQ‐plot of the energy data, where the truncated Pareto model is fitted based on the top 50 values. The maximum magnitude is then estimated at 3.75 for the Groningen area.
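The link between truncated exponential magnitudes and truncated Pareto energies can be verified numerically. The sketch assumes a generic log‐linear relation log₁₀ E = a + bM with hypothetical constants a and b (the exact constants of the relation in the text are not reproduced here); under that assumption a doubly truncated exponential magnitude with rate λ maps to a truncated Pareto energy with tail index α = λ/(b ln 10).

```python
import math

def truncated_exp_cdf(m, lam, m_min, m_max):
    """Doubly truncated exponential distribution function on [m_min, m_max]."""
    return (1.0 - math.exp(-lam * (m - m_min))) / (1.0 - math.exp(-lam * (m_max - m_min)))

def energy(m, a, b):
    """Hypothetical log-linear magnitude-energy relation: log10 E = a + b * m."""
    return 10.0 ** (a + b * m)

def truncated_pareto_cdf(e, alpha, e_min, e_max):
    """Truncated Pareto distribution function on [e_min, e_max] with tail index alpha."""
    return (1.0 - (e / e_min) ** (-alpha)) / (1.0 - (e_max / e_min) ** (-alpha))

def pareto_index_from_rate(lam, b):
    """Tail index of the energy distribution implied by the magnitude rate lam."""
    return lam / (b * math.log(10.0))
```

The extreme value index of the energy distribution is then 1/α = b ln 10/λ, and the upper truncation point on the energy scale is the energy at the maximum possible magnitude.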

4.8 Notes and Bibliography

In case of the Gumbel domain of attraction with γ = 0, EVA based on fitting a generalized Pareto distribution to POT values is known to exhibit slow convergence rates in many cases. To this end more specific models have been proposed, for example El Methni et al. [320] and De Valk and Cai [262].

In the last few decades some papers have appeared concerning robust estimation methods. Robust methods can improve the quality of extreme value data analysis by providing information on influential observations, deviating substructures and possible mis‐specification of a model, while guaranteeing good statistical properties over a whole set of underlying distributions around the assumed one. On the other hand, an EVA is performed precisely to consider and emphasize the role of extremes. Hence in a risk management context it can hardly be the purpose to delete the most extreme observations when they were correctly reported. Robust and non‐robust estimators then yield different scenarios for risk assessment, which should be compared. An interesting discussion on this can be found in Dell’Aquila and Embrechts [270].

EVA is an active field of research. A notable recent contribution is Naveau et al. [587], which gives an alternative to splicing methods in order to produce full models in a hydrological context. In De Valk [261] and Guillou et al. [415], some further new modelling approaches in the multivariate case are introduced.

A Bayesian approach to estimate the total cost of claims in XL reinsurance has been covered in Hesselager [441]. For the estimation of the Pareto index in XL treaties, see Reiss et al. [644]. Leadbetter [529] studied the connection between tail inference and high‐level exceedance modelling, which is relevant for the XL case. Examples of early statistical analyses of large fire losses are Ramachandran [639], Ramlau‐Hansen [640], and Corradin et al. [227]. Resnick [646] also studied the Danish fire insurance data set. Data for coverages of homes are analyzed in Grace et al. [404]. For glass losses, see Ramlau‐Hansen [640]. Property reinsurance for the USA is covered in Gogol [394], for example.
