Chapter 7

Multivariate Survival Models

7.1 Introduction

Sometimes, survival data come in clusters, and multivariate, or frailty, models are appropriate to use.

Ever since the paper by Vaupel, Manton & Stallard (1979), the concept of frailty has spread in even wider circles of the research community. Although their primary purpose was to show various consequences of admitting individual frailties (“individuals age faster than cohorts,” due to the selection effect), the effect was that people started to implement their frailty model in Cox regression models.

7.1.1 An Introductory Example

Let us assume that in a follow-up study, the cohort is not homogeneous but instead consists of two equally sized groups with differing hazard rates. Assume further that we have no indication of which group an individual belongs to, and that members of both groups follow an exponential life length distribution:

h1(t)=λ1h2(t)=λ2t>0.

This implies that the corresponding survival functions S1 and S2 are

S1(t)=eλ1tS2(t)=eλ2tt>0,

and a randomly chosen individual will follow the “population mortality” S, which is a mixture of the two distributions:

S(t)=12S1(t)+12S2(t),t>0.

Let us calculate the hazard function for this mixture. We start by finding the density function f:

f(t)=dS(x)dx=12(λ1eλ1t+λ2eλ2t),t>0.

Then, by the definition of h, we get

h(t)=f(t)S(t)=ω(t)λ1+(1ω(t))λ2,t>0,(7.1)

with

ω(t)=eλ1teλ1t+eλ2t.

It is easy to see that

ω(t){ 0,λ1>λ212,λ1=λ21,λ1<λ2   ,as t,

implying that

h(t)min(λ1,λ2),t,

see Figure 7.1. The important point here is that it is impossible to tell from data alone whether the population is homogeneous, with all individuals following the same hazard function (7.1), or if it in fact consists of two groups, each following a constant hazard rate. Therefore, individual frailty models (like hi(t) = Zih(t), i = 1,..., n, where Zi is the “frailty” for individual No. i, and Z1, ..., Zn are independent and identically distributed (iid)) are less useful.

Figure 7.1

Figure showing “population hazard function” (solid line). the dashed lines are the hazard functions of each group, λ1 = 1, λ2 = 2.

“Population hazard function” (solid line). The dashed lines are the hazard functions of each group, λ1 = 1, λ2 = 2.

A heuristic explanation to all this is the dynamics of the problem: we follow a population (cohort) over time, and the composition of it changes over time. The weaker individuals die first, and the proportion stronger will steadily grow as time goes by.

Another terminology is to distinguish between individual and population hazards. In Figure 7.1 the solid line is the population hazard, and the dashed lines represent the two kinds of individual hazards present. Of course, in a truly homogeneous population, these two concepts coincide.

7.2 Frailty Models

Frailty models in survival analysis correspond to hierarchical models in linear or generalized linear models. They are also called mixed effects models. A general theory, with emphasis on using R, of mixed effects models can be found in (Pinheiro & Bates 2000).

7.2.1 The Simple Frailty Model

Vaupel et al. (1979) described an individual frailty model,

h(t;,x,Z)=h0(t)Zeβx,t>0,

where Z is assumed to be drawn independently for each individual. Hazard rates for “random survivors” are not proportional, but converging (to each other) if the frailty distribution has finite variance. Thus, the problem may be less pronounced in AFT than in PH regression. However, as indicated in the introductory example, with individual frailty the identification problems are large, and such models are best avoided.

7.2.2 The Shared Frailty Model

Frailty models work best when there is a natural grouping of the data, so that observations from the same group are dependent (share the same frailty), while two individual survival times from different groups can be regarded as independent. Such a model may be described as

hi(t;x)=hi0(t)eβx,i=1,...,s;  t>0,(7.2)

which simply is a stratified Cox regression model. By assuming

hi0(t)=Zih0(t),i=1,...s;  t>0,(7.3)

the traditional multivariate frailty model emerges. Here it is assumed that Z1, ..., Zs are independent and identically distributed (iid), usually with a lognormal distribution. From (7.2) and (7.3) we get, with Ui = log(Zi),

hi(t;x)=h0(t)eβx+Ui,i=1,....s;  t>0.

In this formulation, U1, ..., Us are iid normal with mean zero and unknown variance σ2. Another popular choice of distribution for the Z:s is the gamma distribution.

In R, the package coxme (Therneau 2011) fits frailty models. We look at the fertility data set in the R package eha.

> require(eha)
> data(fert)
> head(fert)
 id parity age year next.ivl event prev.ivl ses parish
1 1  0 24 1825 0.411 1  NA farmer SKL
2 1  1 25 1826  22.348 0 0.411 farmer SKL
3 2  0 18 1821 0.304 1  NA unknown SKL
4 2  1 19 1821 1.837 1 0.304 unknown SKL
5 2  2 21 1823 2.546 1 1.837 unknown SKL
6 2  3 23 1826 2.541 1 2.546 unknown SKL

It seems natural to assume that the lengths of birth intervals vary with mother, so we try a frailty model with id as the grouping variable. Also notice that the first interval for each woman is measured from marriage (only married women are included in this data set) to first birth, so we will start by removing them. They are characterized by parity being 0.

> fe <- fert [fert$parity != 0,]
> require(coxme)
> fit <- coxme(Surv(next.ivl, event) ˜ age + ses + parity +
    (1 | id), data = fe)
> fit
Cox mixed-effects model fit by maximum likelihood
 Data: fe
 events, n = 8458, 10312
 Iterations= 15 65
    NULL Integrated Fitted
Log-likelihood -71308.31 -69905.17 -68211.87
    Chisq  df p AIC  BIC
Integrated loglik 2806.28 6.00 0 2794.28 2752.02
 Penalized loglik 6192.88 1211.67 0 3769.54 -4764.08
Model: Surv(next.ivl, event) ˜ age + ses + parity + (1 | id)
Fixed coefficients
     coef exp(coef) se(coef) z   p
age   -0.07960314 0.9234828 0.004165514 -19.11 0.00000
sesunknown -0.07785406 0.9250994 0.060674528 -1.28 0.20000
sesupper 0.10090487 1.1061714 0.157986684  0.64 0.52000
seslower  -0.17029644 0.8434148 0.049672825 -3.43 0.00061
parity -0.11410191 0.8921670 0.010136840 -11.26 0.00000
Random effects
 Group Variable Std Dev  Variance
 id Intercept 0.7518637 0.5652990

The estimates of the fixed effects have the same interpretation as in ordinary Cox regression. The question is whether the results point to the significance of including frailty terms. In the last line of the output we get the estimate of the frailty variance, σ2 = 0.565, but no p-value for the test of the null hypothesis H0 : σ = 0. One explanation of this is that ordinary asymptotic theory does not hold for parameter values at the boundary of the parameter space.

One way to get a feeling for the impact of the frailty effect is to fit the same model but without frailty, that is, the term (1 | id).

> fit0 <- coxreg(Surv(next.ivl, event) ˜ age + ses + parity,
     data = fe)
> fit0
Call:
coxreg(formula = Surv(next.ivl, event) ˜ age + ses + parity,
 data = fe)
Covariate   Mean  Coef Rel.Risk S.E. Wald p
age   33.865 -0.080 0.923  0.003 0.000
ses
  farmer  0.487 0  1 (reference)
 unknown  0.183 -0.085 0.919 0.030  0.005
  upper  0.018 0.147 1.158 0.083  0.077
  lower  0.312 -0.083 0.920 0.026  0.001
parity    4.434 0.013 1.013 0.007  0.068

Events     8458
Total time at risk  -21348
Max. log. likelihood -70391
LR test statistic  1834
Degrees of freedom  5
Overall p-value  0

We can compare the two “max log likelihoods,” in the frailty model the “Integrated” value −69905.17, and in the fixed effects case −70391.12. The difference is so large (485.94) that we safely can reject the hypothesis that the frailty model is not needed. As an “informal” test, you could take twice that difference and treat it as a χ2 statistic with 1 degree of freedom, calculate a p-value and take as real p-value one half of that (all this because ordinary asymptotic theory does not hold for parameter values on the boundary of the parameter space!). This gives an approximation of the true p-value that is not too bad.

As a final example, let us look at old age mortality in the R package eha. This example also shows a dangerous situation that is too easy to overlook. It has nothing to do with frailty, but with a problem caused by missing data.

Example 27 Old age mortality for siblings

Take a look at the variables in oldmort:

> data(oldmort)
> head(oldmort)
  id enter  exit event birthdate m.id f.id  sex
1 765000603 94.510 95.813 TRUE 1765.490  NA  NA female
2 765000669 94.266 95.756 TRUE 1765.734  NA  NA female
3 768000648 91.093 91.947 TRUE 1768.907  NA  NA female
4 770000562 89.009 89.593 TRUE 1770.991  NA  NA female
5 770000707 89.998 90.211 TRUE 1770.002  NA  NA female
6 771000617 88.429 89.762 TRUE 1771.571  NA  NA female
   civ ses.50 birthplace imr.birth  region
1 widow unknown remote 22.20000 rural
2 unmarried unknown parish 17.71845 industry
3 widow unknown parish 12.70903 rural
4 widow unknown parish 16.90544 industry
5 widow middle  region 11.97183 rural
6 widow unknown parish 13.08594 rural

The variable m.id is mother’s id. This means that siblings will have the same value on that variable, and we can check whether we find a “sibling effect” in the sense that siblings tend to have a similar risk of dying.

> fit <- coxme(Surv(enter, exit, event) ˜ sex + civ + (1 | m.id),
    data = oldmort)
> fit
Cox mixed-effects model fit by maximum likelihood
 Data: oldmort
 events, n = 888, 3340 (3155 observations deleted due to missing)
 Iterations= 16 84
    NULL Integrated Fitted
Log-likelihood -5702.662  -5693.43 -5646.252
    Chisq df  p  AIC BIC
Integrated loglik 18.46 4.00 1.0012e-03 10.46  -8.69
 Penalized loglik 112.82 49.23 6.7155e-07 14.36 -221.39

Model: Surv(enter, exit, event) ˜ sex + civ + (1 | m.id)
Fixed coefficients
    coef exp(coef)  se(coef)  z  p
sexfemale -0.2026325 0.8165783 0.07190559 -2.82 0.00480
civmarried -0.4653648 0.6279060 0.12144548 -3.83 0.00013
civwidow  -0.3305040 0.7185615 0.11964284 -2.76 0.00570

Random effects
 Group Variable Std Dev Variance
 m.id Intercept 0.23719204 0.05626007

Now, compare with the corresponding fixed effects model:

> fit0 <- coxreg(Surv(enter, exit, event) ˜ sex + civ,
     data = oldmort)
> fit0
Call:
coxreg(formula = Surv(enter, exit, event) ˜ sex + civ,
  data = oldmort)
Covariate   Mean  Coef Rel.Risk  S.E. Wald p
sex
   male 0.406 0  1(reference)
  female 0.594 -0.243 0.784 0.047  0.000
civ
  unmarried 0.080 0  1 (reference)
  married 0.530 -0.397 0.672 0.081 0.000
   widow 0.390 -0.261 0.770 0.079 0.001
Events    1971
Total time at risk  37824
Max. log. likelihood  -13558
LR test statistic  41.2
Degrees of freedom   3
Overall p-value   5.83367e-09

Note that we have now got a very much smaller value of the maximized log likelihood, −13557.98 compared to −5693.43! Something is wrong, and the big problem is that the two analyses were performed on different data sets. How is that possible, when we used oldmort on both occasions? The variable m.id has a lot of missing values, almost 50% are missing (NA), and the standard treatment of NA:s in R is to simply remove each record that contains an NA on any of the variables in the analysis. So, in the first case, the frailty model, a lot of records are removed before analysis, but not in the second. To be able to compare the models, we must remove all records with m.id = NA from the second analysis.

> olm <- oldmort[!is.na(oldmort$m.id),]
> fit0 <- coxreg(Surv(enter, exit, event) ˜ sex + civ,
     data = olm)
> fit0
Call:
coxreg(formula = Surv(enter, exit, event) ˜ sex + civ, data = olm)
Covariate   Mean  Coef Rel.Risk  S.E.  Wald p
sex
   male 0.418 0  1 (reference)
  female 0.582 -0.196 0.822 0.070  0.005
civ
  unmarried 0.076 0  1 (reference)
  married 0.555 -0.443 0.642 0.118  0.000
   widow 0.369 -0.310 0.733 0.116  0.007

Events    888
Total time at risk  19855
Max. log. likelihood  -5693.8
LR test statistic  17.7
Degrees of freedom   3
Overall p-value   0.000504597

This is another story! We now got very similar values of the maximized log likelihoods, −5693.81 compared to −5693.43! The conclusion is that in this case, there is no frailty effect whatsoever.

One lesson to learn from this example is that you have to be very cautious when a data set contains missing values. Some functions, like drop1, give a warning when a situation like this is detected, but especially when comparisons are made in more than on step, it is too easy to miss the dangers.

Also note the warning that is printed in the results of coxme: 3163 observations deleted due to missingness. This is a warning that should be taken seriously.

7.3 Parametric Frailty Models

It is possible utilize the connection between Poisson regression and the piecewise constant proportional hazards model discussed in Chapter 6 to fit parametric frailty models. We look at the fertility data again. The standard analysis without frailty effects is

> fit0 <- phreg(Surv(next.ivl, event) ˜ parity + ses,
    dist = “pch”, cuts = 1:13, data = fe)
> drop1(fit0, test = “Chisq”)
Single term deletions
Model:
Surv(next.ivl, event) ˜ parity + ses
  Df  AIC LRT  Pr(Chi)
<none> 29044
parity 1 29852 809.62 < 2.2e-16
ses 3 29104 65.93 3.17e-14
> fit0
Call:
phreg(formula = Surv(next.ivl, event) ˜ parity + ses, data = fe,
 dist = “pch”, cuts = 1:13)
Covariate  W.mean  Coef Exp(Coef) se(Coef) Wald p
parity   4.242 -0.131 0.878 0.005 0.000
ses
   farmer  0.491 0  1   (reference)
  unknown  0.186 0.094 1.099 0.029 0.001
   upper  0.018 0.086 1.089 0.083 0.302
   lower  0.305 -0.150 0.860 0.026 0.000
Events    8458
Total time at risk  29806
Max. log. likelihood  -14518
LR test statistic  838
Degrees of freedom   4
Overall p-value   0

For testing the presence of a random effect over women, an alternative is to use glmmML in eha.

> fe13 <- survSplit(fe, end = “next.ivl”, event = “event”,
     cut = 1:13, episode = “years”,
     start = “start”) # 1
> fe13$years <- as.factor(fe13$years) # 2
> fe13$offs <- log(fe13$next.ivl - fe13$start) # 3
> fit1 <- glmmML(event ˜ parity + ses + years + offset(offs),
     cluster = id, family = poisson, method = “ghq”,
     data = fe13, n.points = 9)
> fit1
Call: glmmML(formula = event ˜ parity + ses + years + offset(offs),
   family = poisson, data = fe13, cluster = id,
   method = “ghq”, n.points = 9)
    coef se(coef)   z Pr(>|z|)
(Intercept) -3.72891 0.090077 -41.3970 0.00e+00
parity  -0.27362 0.006041 -45.2906 0.00e+00
sesunknown  0.08574 0.065331  1.3124 1.89e-01
sesupper -0.03306 0.170860 -0.1935 8.47e-01
seslower -0.26173 0.054002 -4.8468 1.25e-06
years1  3.16821 0.085245 37.1659 0.00e+00
years2  4.61527 0.085842 53.7649 0.00e+00
years3  4.67617 0.091603 51.0483 0.00e+00
years4  4.22567 0.102653 41.1646 0.00e+00
years5  3.56732 0.127991 27.8717 0.00e+00
years6  2.99978 0.172067 17.4338 0.00e+00
years7  2.53474 0.232450 10.9045 0.00e+00
years8  1.62728 0.388357  4.1902 2.79e-05
years9  1.96967 0.370240  5.3200 1.04e-07
years10  2.30379 0.347516  6.6293 3.37e-11
years11  1.42206 0.586882  2.4231 1.54e-02
years12  1.95708 0.510188  3.8360 1.25e-04
years13  0.34816 0.584738  0.5954 5.52e-01

Scale parameter in mixing distribution: 0.8451 gaussian
Std. Error:      0.0242

   LR p-value for H_0: sigma = 0: 1.2e-277

Residual deviance: 27770 on 34699 degrees of freedom
AIC: 27810

Note the use of the function survSplit (# 1), the transformation to factor for the slices of time (years) created by survSplit (# 2), and the creation of an offset (# 3). This is described in detail in Chapter 6.

The conclusion is very much the same as with the nonparametric (coxme) approach: The clustering effect of mother is very strong and must be taken into account in the analysis of birth intervals. The nonparametric approach is easier to use and recommended.

7.4 Stratification

A simple way to eliminate the effect of clustering is to stratify on the clusters. In the birth intervals example, it would mean that intervals are only compared to other birth intervals from the same mother. The drawback with a stratified analysis is that it is not possible to estimate the effect of covariates that are constant within clusters. In the birth intervals case, it is probable that ses, socioeconomic status, would vary little within families. On the other hand, the effect of birth order or mother’s age would be suitable to analyze in a stratified setting.

> fit2 <- coxreg(Surv(next.ivl, event) ˜ parity + prev.ivl +
     strata(id), data = fe)
> fit2
Call:
coxreg(formula = Surv(next.ivl, event) ˜ parity + prev.ivl +
 strata(id), data = fe)
Covariate   Mean  Coef Rel.Risk S.E. Wald p
parity   4.434  -0.329 0.720  0.008 0.000
prev.ivl   2.578  -0.054 0.948  0.016 0.001

Events    8458
Total time at risk  -21348
Max. log. likelihood -9557.9
LR test statistic   2989
Degrees of freedom  2
Overall p-value  0

Contrast this result with an unstratified analysis.

> fit3 <- coxreg(Surv(next.ivl, event) ˜ parity + prev.ivl,
     data = fe)
> fit3
Call:
coxreg(formula = Surv(next.ivl, event) ˜ parity + prev.ivl,
  data = fe)
Covariate   Mean  Coef Rel.Risk S.E.  Wald p
parity   4.434  -0.084 0.920  0.005 0.000
prev.ivl   2.578  -0.319 0.727  0.011 0.000

Events    8458
Total time at risk  -21348
Max. log. likelihood -70391
LR test statistic   1835
Degrees of freedom  2
Overall p-value  0

Note how the effect of parity is diminished when aggregating comparison over all women, while the effect of the length of the previous interval is enlarged. Try to explain why this result is expected!

Example 28 Matched data

Under certain circumstances, it is actually possible to estimate an effect of a covariate that is constant within strata, but only if it is interacted with a covariate that is not constant within strata. See Example 30 in Chapter 9.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset