Chapter 8
Classical competing risks models will be discussed, as well as their modern interpretations. The basic problem is that we want to consider more than one type of event, but where exactly one will occur. For instance, consider different causes of death.
The first problem is: How can the intensities
be nonparametrically estimated? It turns out that this is quite simple. The trick is to take one cause at at time and estimate its intensity as if the rest of the causes (events) are censorings. The real problem starts when these intensities are turned into probabilities.
First we need some strict definitions, so that the correct questions are asked, and the correct answers are obtained.
The cause-specific cumulative hazard functions are
The total mortality is
and total survival is
So far, this is not controversial. But asking for “cause-specific survivor functions” is.
The quantities Γk, k = 1, 2, 3, Λ, and S can be estimated in the usual way. Γ1, the cumulative hazard function for cause 1 is estimated by regarding all other causes (2 and 3) as censorings. The total survivor function S is estimated by the method of Kaplan–Meier, regarding all causes as the same cause (just death).
Is it meaningful to estimate (calculate) Sk(t) = exp {-Γk(t)}, k = 1, 2, 3? The answer is “No.” The reason is that it is difficult to define what these probabilities mean in the presence of other causes of death. For instance, what would happen if one cause was eradicated?
It is meaningful to estimate (and calculate)
the probability to die from cause k before time t. Note that
Now, estimation is straightforward with the following estimators:
and , t > 0 is the ordinary Kaplan–Meier estimator for total mortality.
See the function comp in Section 8.6 for how to write code for this estimation.
It is also possible to include covariates in the estimating procedure.
These equations are estimated separately. In R, this is done with the package cmprsk (Fine & Gray 1999, Gray 2011).
Example 29 Mortality and Emigration
People born in Skellefteå in northern Sweden around the year 1800 are followed over time from 15 years of age to outmigration or death. Right censoring occurs for those who are still present on January 1, 1870. At exit the cause is noted; see Figure 8.2.
The data at hand looks like this:
> load(“zz.RData”)
> head(zz)
Pnr Mpnr Fpnr birthdate enter exit event
1 801000758 775000541 771000493 1801.023 15 39.47241 2
2 801000759 758000309 753000242 1801.028 15 50.00000 0
3 801000760 770000458 765000452 1801.031 15 50.00000 0
4 801000763 777000494 775000404 1801.039 15 26.10868 2
5 801000768 780000572 774000453 1801.061 15 16.43292 2
6 801000770 774000442 776000421 1801.072 15 48.59237 1
parity ab
1 3 leg
2 4 leg
3 10 leg
4 3 leg
5 2 leg
6 2 leg
> summary(zz[, -(1:3)])
birthdate enter exit event
Min. :1801 Min. :15 Min. :15.00 Min. :0.0000
1st Qu.:1835 1st Qu.:15 1st Qu.:20.47 1st Qu.:0.0000
Median :1856 Median :15 Median :28.34 Median :0.0000
Mean :1852 Mean :15 Mean :31.82 Mean :0.7138
3rd Qu.:1872 3rd Qu.:15 3rd Qu.:44.82 3rd Qu.:2.0000
Max. :1886 Max. :15 Max. :50.00 Max. :4.0000
parity ab
Min. : 1.000 leg :16423
1st Qu. : 2.000 illeg: 850
Median : 3.000
Mean : 3.916
3rd Qu. : 6.000
Max. :19.000
The variable event above is treated as a continuous variable, which is not very useful, so we make a table instead.
> table(zz$event)
0 1 2 3 4
10551 2102 4097 59 464
The variable event is coded as shown in Table 8.1.
Competing risks in Skellefteå
Code |
Meaning |
Frequency |
---|---|---|
0 |
Censored |
10543 |
1 |
Dead |
2 102 |
3 |
Moved to other parish in Sweden |
4 097 |
4 |
Moved to other Scandinavian country |
59 |
5 |
Moved outside the Scandinavian countries |
464 |
Without covariates we get the picture shown in Figure 8.3.
> source(“comp.R”)
> comp(zz$enter, zz$exit, zz$event, start.age = 15)
The code for the function comp is displayed at the end of this chapter. It is not necessary to understand before reading on. It may be useful as a template for analyzing competing risks. And with covariates and cmprsk; the syntax to get it done is a little bit nonstandard. Here, moving to other parish in Sweden is analysed (failcode = 2).
> library(cmprsk)
> xx <- model.matrix(˜ -1 + parity, zz)
> fit <- crr(zz$exit, zz$event, xx, failcode = 3)
> fit
convergence: TRUE
coefficients:
parity
0.03927
standard errors:
[1] 0.04489
two-sided p-values:
parity
0.38
Note that this is not Cox regression (but close!). The corresponding Cox regression is
> coxreg(Surv(enter, exit, event == 3) ˜ parity, data = zz)
Call:
coxreg(formula = Surv(enter, exit, event == 3) ˜ parity, data = zz)
Covariate Mean Coef Rel.Risk S.E. Wald p
parity 3.890 0.045 1.046 0.048 0.355
Events 59
Total time at risk 290569
Max. log. likelihood -548.83
LR test statistic 0.83
Degreesoffreedom 1
Overall p-value 0.363362
The code that produced Figure 8.3 is shown here.
> comp
function (enter, exit, event, start.age = 0)
{
require(eha)
n <- max(event)
rs.tot <- risksets(Surv(enter, exit, event > 0.5))
haz.tot <- rs.tot$n.events/rs.tot$size
n.times <- length(haz.tot) + 1
S <- numeric(n.times)
S[1] <- 1
for (i in 2:n.times) S[i] <- S[i - 1] * (1 - haz.tot[i - 1])
haz <- matrix(0, nrow = n, ncol = length(haz.tot))
P <- matrix(0, nrow = n, ncol = length(haz.tot) + 1)
for (row in 1:n) {
rs <- risksets(Surv(enter, exit, event == row))
haz.row <- rs$n.events/rs$size
tmp <- 0
cols <- which(rs.tot$risktimes %in% rs$risktimes)
haz[row, cols] <- haz.row
P[row, 2:NCOL(P)] <- cumsum(S[1:(n.times - 1)] * haz[row,
])
}
plot(c(start.age, rs.tot$risktimes), S, ylim = c(0, 1),
xlim = c(start.age,
max(rs.tot$risktimes)), xlab = “Age”, type = “s”,
ylab = “Probability”)
for (i in 1:n) lines(c(start.age, rs.tot$risktimes), P[i,
], lty = i + 1, type = “s”)
abline(h = 0)
legend(35, 0.95, lty = 1:(n + 1), legend = c(“Survival”,
“Dead”, “Other parish”, “Scandinavia”,
“Outside Scandinavia”))
invisible(list(P = P, S = S))
}