Chapter 3

Cox Regression

3.1 Introduction

We move slowly from the two-sample and k-sample cases, where the logrank test is introduced, to the general regression situation. There the logrank test is generalized to Cox regression (Cox 1972). The fundamental concept of proportional hazards is introduced.

3.2 Proportional Hazards

The property of proportional hazards is fundamental in Cox regression. It is in fact the essence of Cox’s simple yet ingenious idea. The definition is as follows.

Definition 1 Proportional hazards

If h1(t) and h0(t) are hazard functions from two separate distributions, we say that they are proportional if

$$h_1(t) = \psi h_0(t), \quad t \ge 0, \qquad (3.1)$$

for some positive constant $\psi$. Further, if (3.1) holds, then the same property holds for the corresponding cumulative hazard functions $H_1(t)$ and $H_0(t)$:

$$H_1(t) = \psi H_0(t), \quad t \ge 0, \qquad (3.2)$$

with the same proportionality constant $\psi$ as in (3.1).

Strictly speaking, the second part of this definition follows easily from the first (and vice versa), so it would be more correct to state one part as a definition and the other as a corollary. The important part of this definition is “for all t ≥ 0”, and that the constant ψ does not depend on t. Think of the hazard functions as age-specific mortality for two groups, for example, women and men. It is “well known” that women have lower mortality than men at all ages. It would therefore be reasonable to assume proportional hazards in that case. It would mean that the female relative advantage is equally large at all ages. See Example 4.5 for a demonstration of this example of proportional hazards.

It must be emphasized that proportional hazards is an assumption that always must be carefully checked. In many other situations it would not be reasonable to assume proportional hazards. If in doubt, check the data by plotting the Nelson–Aalen estimates for each group in the same figure.

See Figure 3.1 for an example with two Weibull hazard functions and the proportionality constant ψ = 2. The proportionality is often difficult to judge by eye, so in order to make it easier to see, the plot can be made on a log scale, see Figure 3.2. Note that both dimensions are on a log scale. This type of plot, constructed from empirical data, is called a Weibull plot in reliability applications: If the lines are straight lines, then data are well fitted by a Weibull distribution. Additionally, if the slope of the line is 1 (45 degrees), then an exponential model fits well.

Figure 3.1

Two hazard functions that are proportional. The proportionality constant is ψ = 2.

To summarize Figure 3.2: (i) the hazard functions are proportional, because the vertical distance on the log-log scale is constant; (ii) both hazard functions represent a Weibull distribution, because both lines are straight; and (iii) neither represents an exponential distribution, because the slopes are not one. This latter fact may be difficult to see because of the different scales on the axes.

Figure 3.2

Two hazard functions that are proportional are at a constant distance from each other on a log-log scale. The proportionality constant is ψ = 2, corresponding to a vertical distance of log(2) ≈ 0.693.
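A plot like Figures 3.1 and 3.2 is easily produced in a few lines of R. The following sketch uses arbitrarily chosen Weibull parameters (shape 2, scale 5); only the value ψ = 2 matters for the point made here.

> t <- seq(0.01, 10, length.out = 200)
> haz0 <- (2 / 5) * (t / 5)^(2 - 1)  # Weibull hazard, shape 2, scale 5
> haz1 <- 2 * haz0                   # proportional hazard, psi = 2
> plot(t, haz1, type = "l", xlab = "Time", ylab = "Hazard")
> lines(t, haz0, lty = 2)
> ## On a log-log scale, the two curves become parallel straight lines,
> ## a constant vertical distance log(2) apart:
> plot(t, haz1, type = "l", log = "xy", xlab = "Time", ylab = "Hazard")
> lines(t, haz0, lty = 2)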

3.3 The Log-Rank Test

The log-rank test is a k-sample test of equality of survival functions. We first look at the two-sample case, that is, k = 2.

3.3.1 Two Samples

Suppose that we have the small data set illustrated in Figure 3.3. There are two samples, the letters (A, B, C, D, E) and the numbers (1, 2, 3, 4, 5). We are interested in investigating whether letters and numbers have the same survival chances or not. Therefore, the hypothesis

H0 : No difference in survival between numbers and letters

Figure 3.3

Two-sample data, the letters (dashed) and the numbers (solid). Circles denote censored observations, plusses uncensored.

is formulated. In order to test H0, we make five tables, one for each observed event time (see Table 3.1). Let us look at the first table at failure time t(1), that is, the first three rows in Table 3.1, from the viewpoint of the numbers.

Table 3.1

Five 2 × 2 tables. Each table reports the situation at one observed failure time point, i.e., at t(1), ..., t(5)

Time   Group     Deaths   Survivals   Total
t(1)   numbers        1           4       5
       letters        1           4       5
       Total          2           8      10
t(2)   numbers        0           3       3
       letters        1           2       3
       Total          1           5       6
t(3)   numbers        1           1       2
       letters        0           2       2
       Total          1           3       4
t(4)   numbers        0           1       1
       letters        1           1       2
       Total          1           2       3
t(5)   numbers        1           0       1
       letters        0           1       1
       Total          1           1       2

  • The observed number of deaths among numbers: 1.
  • The expected number of deaths among numbers: 2 × 5/10 = 1.

The expected number is calculated under H0, that is, as if there is no difference between letters and numbers. It is further assumed that the two margins are given (fixed). Then, given two deaths in total and five out of ten observations coming from the group numbers, the expected number of deaths is calculated as above.

This procedure is repeated for each of the five tables, and the results are summarized in Table 3.2. Finally, the observed test statistic T is calculated as

$$T = \frac{0.2^2}{1.41} \approx 0.028.$$

Table 3.2

Observed and expected number of deaths for numbers at the five observed event times

Time   Observed   Expected   Difference   Variance
t(1)          1        1.0          0.0       0.44
t(2)          0        0.5         −0.5       0.25
t(3)          1        0.5          0.5       0.25
t(4)          0        0.3         −0.3       0.22
t(5)          1        0.5          0.5       0.25
Sum           3        2.8          0.2       1.41

Under the null hypothesis, this is an observed value from a χ²(1) distribution, and H0 should be rejected for large values of T. Using a significance level of 5%, the critical value for T is 3.84, far from our observed value of 0.028. The conclusion is therefore that there is no (statistically significant) difference in survival chances between letters and numbers. Note, however, that this result depends on asymptotic (large sample) properties, and in this toy example, the sample is far too small for these properties to hold.
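The calculation is easy to replicate by hand in R. This sketch uses the rounded numbers from Table 3.2; with exact, unrounded values the statistic becomes the 0.02 reported by coxph below.

> o <- c(1, 0, 1, 0, 1)                 # observed deaths, numbers
> e <- c(1.0, 0.5, 0.5, 0.3, 0.5)       # expected under H0
> v <- c(0.44, 0.25, 0.25, 0.22, 0.25)  # variances
> Tstat <- sum(o - e)^2 / sum(v)
> round(Tstat, 3)
[1] 0.028
> round(pchisq(Tstat, df = 1, lower.tail = FALSE), 3)  # p-value
[1] 0.866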

For more detail about the underlying theory, see Appendix A.

In R, the log-rank test is performed by the coxph function in the package survival (there are other options). With the data in our example, see Table 3.3, we get:

> library(survival)
> ex.data <- read.table("exlr.dat", header = TRUE)
> fit <- coxph(Surv(time, event) ~ group, data = ex.data)
> summary(fit)
Call:
coxph(formula = Surv(time, event) ~ group, data = ex.data)

  n= 10, number of events= 6

               coef exp(coef) se(coef)     z Pr(>|z|)
groupnumbers 0.1130    1.1196   0.8231 0.137    0.891

             exp(coef) exp(-coef) lower .95 upper .95
groupnumbers      1.12     0.8931    0.2231     5.619

Concordance= 0.481  (se = 0.15)
Rsquare= 0.002   (max possible= 0.85)
Likelihood ratio test= 0.02  on 1 df,   p=0.8908
Wald test            = 0.02  on 1 df,   p=0.8908
Score (logrank) test = 0.02  on 1 df,   p=0.8907

Table 3.3

Data in a simple example

group     time   event
numbers    4.0       1
numbers    2.0       0
numbers    6.0       1
numbers    1.0       1
numbers    3.5       0
letters    5.0       1
letters    3.0       1
letters    6.0       0
letters    1.0       1
letters    2.5       0

Obviously, there is a lot of information here. We have, in fact, performed a Cox regression ahead of schedule! This also tells us that the simplest way of performing a logrank test in R is to run a Cox regression. The result of the logrank test is displayed on the last line of output. The p-value is 0.891; that is, there is no significant difference between groups whatsoever. This is not very surprising, since we are using a toy example with almost no information, only six events in total.

Let us now look at a real data example, the old age mortality data set oldmort in eha.

> require(eha) # Loads 'survival' as well.
> data(oldmort)
> fit <- coxph(Surv(enter, exit, event) ~ sex, data = oldmort)
> summary(fit)
Call:
coxph(formula = Surv(enter, exit, event) ~ sex, data = oldmort)

  n= 6495, number of events= 1971

             coef exp(coef) se(coef)      z Pr(>|z|)
sexfemale -0.1930    0.8245   0.0456 -4.232 2.32e-05

          exp(coef) exp(-coef) lower .95 upper .95
sexfemale    0.8245      1.213     0.754    0.9016

Concordance= 0.532  (se = 0.007)
Rsquare= 0.003   (max possible= 0.985)
Likelihood ratio test= 17.73  on 1 df,   p=2.545e-05
Wald test            = 17.91  on 1 df,   p=2.32e-05
Score (logrank) test = 17.96  on 1 df,   p=2.254e-05

As in the trivial example, the last line of the output gives the p-value of the logrank test, p = 0.0000225 (in “exponential notation” in the output). This is very small, and we can conclude that the mortality difference between women and men is highly statistically significant. But how large is it, that is, what is the value of the proportionality constant? The answer to that question is found five rows up, where sexfemale reports the value 0.8245 for exp(coef). The interpretation of this is that female mortality is about 82.45 per cent of the male mortality, that is, 17.55 per cent lower. This holds for ages above 60, because this data set contains only information about that age range.

Remember that this result depends on the proportional hazards assumption. We can graphically check it as follows.

> ##plot(Surv(oldmort$enter, oldmort$exit, oldmort$event),
> ##     strata = oldmort$sex, fn = "cum")
> with(oldmort, plot(Surv(enter, exit, event), strata = sex))

First, note that the out-commented (##) version works equally well, but it is more tedious to type and also harder to read. The with function is very handy and will be used often throughout the book. Note also that the grouping factor (sex) is given through the argument strata. The result is shown in Figure 3.4. The proportionality assumption seems to be a good description from 60 to 85–90 years of age, but it seems more doubtful at the very high ages. One reason for this may be that the high-age estimates are based on few observations (most of the individuals in the sample died earlier), so random fluctuations have a large impact at high ages.

Figure 3.4

Old age mortality, women vs. men, cumulative hazards.

3.3.2 Several Samples

The result for the two-sample case is easily extended to the k-sample case. Instead of one 2 × 2 table per observed event time, we get one k × 2 table per observed event time and we have to calculate expected and observed numbers of events for (k − 1) groups at each failure time. The resulting test statistic will have (k − 1) degrees of freedom and still be approximately χ2 distributed. This is illustrated with the same data set, oldmort, as above, but with the covariate birthplace, which is a factor with three levels, instead of sex.

> require(eha) # Loads 'survival' as well.
> data(oldmort)
> fit <- coxph(Surv(enter, exit, event) ~ birthplace,
    data = oldmort)
> summary(fit)

Call:
coxph(formula = Surv(enter, exit, event) ~ birthplace,
  data = oldmort)

  n= 6495, number of events= 1971

                   coef exp(coef) se(coef)     z Pr(>|z|)
birthplaceregion 0.0594    1.0612   0.0552 1.076   0.2819
birthplaceremote 0.1008    1.1060   0.0595 1.694   0.0903

                 exp(coef) exp(-coef) lower .95 upper .95
birthplaceregion     1.061     0.9423    0.9524     1.182
birthplaceremote     1.106     0.9041    0.9843     1.243

Concordance= 0.507  (se = 0.007)
Rsquare= 0.001   (max possible= 0.985)
Likelihood ratio test= 3.25  on 2 df,   p=0.1968
Wald test            = 3.28  on 2 df,   p=0.1941
Score (logrank) test = 3.28  on 2 df,   p=0.1939

The degrees of freedom for the score test are now 2, equal to the number of levels in birthplace (parish, region, remote) minus one. Birthplace thus does not seem to have any great impact on old age mortality. It is, however, recommended to check the proportionality assumption graphically; see Figure 3.5.

> with(oldmort, plot(Surv(enter, exit, event), strata = birthplace))

Figure 3.5

Old age mortality by birthplace, cumulative hazards.

There is obviously nothing that indicates nonproportionality in this case either.

We do not go deeper into this matter here, mainly because the logrank test is a special case of Cox regression, which will be described in detail later in this chapter. There are, however, two special cases of the logrank test, the weighted and the stratified versions, that may be of interest in their own right. The interested reader may consult the textbooks by Kalbfleisch & Prentice (2002, pp. 22–23) or Collett (2003, pp. 49–53).

3.4 Proportional Hazards in Continuous Time

The definition in Section 3.2 is valid for models in continuous time. Figure 3.6 shows the relationships between the cumulative hazard functions, the density functions, and the survival functions when the hazard functions are proportional. Note that the cumulative hazard functions are proportional by implication, with the same proportionality constant (ψ = 2 in this case). On the other hand, for the density and survival functions, proportionality does not hold; it is, in fact, theoretically impossible except in the trivial case where the proportionality constant is unity.

Figure 3.6

The effect of proportional hazards on the density and survival functions.

3.4.1 Proportional Hazards, Two Groups

The definition of proportionality, given in equation (3.1), can equivalently be written as

$$h_x(t) = \psi^x h_0(t), \quad t > 0, \ x = 0, 1, \ \psi > 0. \qquad (3.3)$$

It is easy to see that this “trick” is equivalent to (3.1): When x = 0 it simply says that h0(t) = h0(t), and when x = 1 it says that h1(t) = ψh0(t). Since ψ > 0, we can calculate β = log(ψ), and rewrite (3.3) as

$$h_x(t) = e^{\beta x} h_0(t), \quad t > 0; \ x = 0, 1; \ -\infty < \beta < \infty,$$

or with a slight change in notation,

$$h(t; x) = e^{\beta x} h_0(t), \quad t > 0; \ x = 0, 1; \ -\infty < \beta < \infty. \qquad (3.4)$$

The sole idea of this rewriting is to pave the way for the introduction of Cox’s regression model (Cox 1972), which in its elementary form is a proportional hazards model. In fact, we can already interpret equation (3.4) as a Cox regression model with an explanatory variable x with corresponding regression coefficient (to be estimated from data) β. The covariate x is still only a dichotomous variate, but we will now show how it is possible to generalize this to a situation with explanatory variables of any form. The first step is to go from the two-sample situation to a k-sample one.

3.4.2 Proportional Hazards, More Than Two Groups

Can we generalize to (k + 1) groups, k ≥ 2? Yes, by expanding the procedure in the previous subsection:

$$\begin{aligned}
h_0(t) &\sim \text{group } 0\\
h_1(t) &\sim \text{group } 1\\
&\;\;\vdots\\
h_k(t) &\sim \text{group } k
\end{aligned}$$

The underlying model is $h_j(t) = \psi_j h_0(t)$, $t \ge 0$, $j = 1, 2, \ldots, k$. That is, with (k + 1) groups, we need k proportionality constants $\psi_1, \ldots, \psi_k$ in order to define proportional hazards. Note also that in this formulation (there are others), one group is “marked” as a reference group, that is, to this group there is no proportionality constant attached. All relations are relative to the reference group. Note also that it essentially doesn’t matter which group is chosen as the reference. This choice doesn’t change the model itself, only its representation.

With (k + 1) groups, we need k indicators. Let

$$\mathbf{x} = (x_1, x_2, \ldots, x_k).$$

Then

$$\begin{aligned}
\mathbf{x} &= (0, 0, \ldots, 0) \quad \text{in group } 0\\
\mathbf{x} &= (1, 0, \ldots, 0) \quad \text{in group } 1\\
\mathbf{x} &= (0, 1, \ldots, 0) \quad \text{in group } 2\\
&\;\;\vdots\\
\mathbf{x} &= (0, 0, \ldots, 1) \quad \text{in group } k
\end{aligned}$$

and

$$h(t; \mathbf{x}) = h_0(t) \prod_{\ell=1}^{k} \psi_\ell^{x_\ell} =
\begin{cases}
h_0(t), & \mathbf{x} = (0, 0, \ldots, 0)\\
h_0(t)\,\psi_j, & x_j = 1, \quad j = 1, \ldots, k
\end{cases}$$

With $\psi_j = e^{\beta_j}$, $j = 1, \ldots, k$, we get

$$h(t; \mathbf{x}) = h_0(t)\, e^{x_1\beta_1 + x_2\beta_2 + \cdots + x_k\beta_k} = h_0(t)\, e^{\mathbf{x}\boldsymbol{\beta}}, \qquad (3.5)$$

where $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_k)$.

3.4.3 The General Proportional Hazards Regression Model

We may now generalize (3.5) by letting the components of $\mathbf{x}_i$ take any value. Let data and model take the following form:

Data: $(t_{i0}, t_i, d_i, \mathbf{x}_i)$, $i = 1, \ldots, n$, where

  • $t_{i0}$ is the left truncation time point. If $t_{i0} = 0$ for all $i$, then this variable may be omitted.
  • $t_i$ is the end time point.
  • $d_i$ is the “event indicator” (1 or TRUE if event, else 0 or FALSE).
  • $\mathbf{x}_i$ is a vector of explanatory variables.

Model:

$$h(t; \mathbf{x}_i) = h_0(t)\, e^{\mathbf{x}_i \boldsymbol{\beta}}, \quad t > 0. \qquad (3.6)$$

This is a regression model where

  • the response variable is $(t_0, t, d)$ (we will call it a survival object)
  • and the explanatory variable is x, possibly (often) vector valued.

In equation (3.6) there are two components to estimate, the regression coefficients β, and the baseline hazard function h0(t), t > 0. For the former task, the partial likelihood (Cox 1975) is used. See Appendix A for a brief summary.

3.5 Estimation of the Baseline Hazard

The usual estimator (continuous time) of the baseline cumulative hazard function is

$$\hat{H}_0(t) = \sum_{j: t_j \le t} \frac{d_j}{\sum_{\ell \in R_j} e^{\mathbf{x}_\ell \hat{\boldsymbol{\beta}}}}, \qquad (3.7)$$

where $d_j$ is the number of events at $t_j$ and $R_j$ is the risk set at $t_j$. Note that if $\hat{\boldsymbol{\beta}} = \mathbf{0}$, this reduces to

$$\hat{H}_0(t) = \sum_{j: t_j \le t} \frac{d_j}{n_j}, \qquad (3.8)$$

the Nelson–Aalen estimator. In (3.8), $n_j$ is the size of $R_j$.

In the R package eha, the baseline hazard is estimated at the means of the covariates (or, more precisely, at the means of the columns of the design matrix; this makes a difference for factors).

$$\hat{H}_0(t) = \sum_{j: t_j \le t} \frac{d_j}{\sum_{\ell \in R_j} e^{(\mathbf{x}_\ell - \bar{\mathbf{x}})\hat{\boldsymbol{\beta}}}}. \qquad (3.9)$$

In order to calculate the cumulative hazards function for an individual with a specific covariate vector x, use the formula

$$\hat{H}(t; \mathbf{x}) = \hat{H}_0(t)\, e^{(\mathbf{x} - \bar{\mathbf{x}})\hat{\boldsymbol{\beta}}}. \qquad (3.10)$$

The corresponding survival functions may be estimated by the relation

$$\hat{S}(t; \mathbf{x}) = \exp\left(-\hat{H}(t; \mathbf{x})\right).$$

It is also possible to use the terms in the sum (3.7) to build an estimator analogous to the Kaplan–Meier estimator (3.8). In practice, there is no big difference between the two methods.
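In the survival package, the estimated cumulative baseline hazard after a Cox fit is available through the basehaz function. A minimal sketch, reusing the oldmort fit from Section 3.3 (by default, basehaz centers at the covariate means, in the spirit of (3.9)):

> fit <- coxph(Surv(enter, exit, event) ~ sex, data = oldmort)
> bh <- basehaz(fit)  # centered = TRUE is the default
> head(bh, 3)         # columns: hazard (cumulative) and time
> ## For a specific covariate vector x, rescale as in (3.10):
> ## bh$hazard * exp((x - xbar) %*% coef(fit))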

3.6 Explanatory Variables

Explanatory variables, or covariates, may be of essentially two different types, continuous and discrete. The discrete type usually takes only a finite number of distinct values and is often called a factor, for example, in R. A special case of a factor is one that takes only two distinct values, say 0 and 1. Such a factor is called an indicator, because we can let the value 1 indicate the presence of a certain property and 0 denote its absence. To summarize, there are three types:

Covariate: taking values in an interval (e.g., age, blood pressure).

Factor: taking a finite number of values (e.g., civil status, occupation).

Indicator: a factor taking two values (e.g., gender).

3.6.1 Continuous Covariates

We use the qualifier continuous to stress that factors are excluded, because often the term covariate is used as a synonym for explanatory variable.

Values taken by a continuous covariate are ordered. The effect on the response is by model definition ordered in the same or reverse order. On the other hand, values taken by a factor are unordered (but may be defined as ordered in R).

3.6.2 Factor Covariates

An explanatory variable that can take only a finite (usually small) number of distinct values is called a categorical variable. In the R language, it is called a factor. Examples of such variables are gender, socioeconomic status, and birth place. Students of statistics have long been taught to create dummy variables in such situations, in the following way:

  • Given a categorical variable F with (k + 1) levels (f0, f1, f2, ..., fk),
  • create k indicator (“dummy”) variables (I1, I2, ..., Ik).

F      I1   I2   I3   ...   Ik−1   Ik   Comment
f0      0    0    0   ...      0    0   reference
f1      1    0    0   ...      0    0
f2      0    1    0   ...      0    0
f3      0    0    1   ...      0    0
...   ...  ...  ...   ...    ...  ...
fk−1    0    0    0   ...      1    0
fk      0    0    0   ...      0    1

The level f0 is the reference category, characterized by the fact that all indicator variables are zero for an individual with this value. Generally, for level fi, i = 1, ..., k, the indicator variable Ii is one and the rest are zero. In other words, for a single individual, at most one indicator is one, and the rest are zero.

In R, there is no need to explicitly create dummy variables; it is done behind the scenes by the functions factor and as.factor.
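The coding R generates can be inspected with model.matrix; a small sketch:

> f <- factor(c("f0", "f1", "f2", "f1"))
> model.matrix(~ f)[, -1]  # drop the intercept column
  ff1 ff2
1   0   0
2   1   0
3   0   1
4   1   0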

Note that a factor with two levels, that is, an indicator variable, can always be treated as a continuous covariate, if coded numerically (e.g., 0 and 1).

Example 10 Infant mortality and mother’s age

Consider a demographic example, the influence of mother’s age (a continuous covariate) on infant mortality. It is considered well known that a young mother means high risk for the infant, and also that an old mother means high risk, compared to “in-between-aged” mothers. So the risk ordering is not the same as (or the reverse of) the age ordering.

One solution to this problem is to factorize: Let, for instance,

$$\text{mother's age} =
\begin{cases}
\text{low}, & 15 < \text{age} \le 25\\
\text{middle}, & 25 < \text{age} \le 35\\
\text{high}, & 35 < \text{age}
\end{cases} \qquad (3.11)$$

In this layout, there will be two parameters measuring the deviation from the reference category, which will be the first category by default.

In R, this is easily achieved with the aid of the cut function. It works like this:

> age <- rnorm(100, 30, 6)
> age.group <- cut(age, c(15, 25, 35, 48))
> summary(age.group)
(15,25] (25,35] (35,48]    NA's
     18      64      17       1

Note that the created intervals by default are closed to the right and open to the left. This has consequences for how observations exactly on a boundary are treated; they belong to the lower-valued interval. The argument right in the call to cut can be used to switch this behaviour the other way around.
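For example, with right = FALSE the intervals instead become closed to the left and open to the right:

> age.group2 <- cut(age, c(15, 25, 35, 48), right = FALSE)
> levels(age.group2)
[1] "[15,25)" "[25,35)" "[35,48)"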

Note further that values falling below the smallest value (15 in our example) or above the largest value (48) are reported as missing values (NA in R terminology, Not Available).

For further information about the use of the cut function, see the help page.

3.7 Interactions

The meaning of interaction between two explanatory variables is described by an example, walking through the three possible combinations: two factors, one factor and one continuous covariate, and two continuous covariates.

Example 11 Mortality in lung cancer

There are four possible explanatory variables behind survival after a diagnosis of lung cancer in the example:

$$\begin{aligned}
\text{smoking habits} &=
\begin{cases}
\text{nonsmoker}\\
\text{smoker} & (e^{\alpha_1})
\end{cases}\\[1ex]
\text{tumor size} &=
\begin{cases}
\text{no tumor}\\
\text{small tumor} & (e^{\beta_1})\\
\text{large tumor} & (e^{\beta_2})
\end{cases}\\[1ex]
\text{age at diagnosis } (x_1) &\qquad (e^{\gamma_1 x_1})\\
\text{tumor size in mm } (x_2) &\qquad (e^{\gamma_2 x_2})
\end{aligned}$$

The two first are factors, and the last two are continuous covariates. We go through the three distinct combinations and illustrate the meaning of interaction in each.

3.7.1 Two Factors

The concept of interaction is easiest to explain with two factors; see Table 3.4. With no interaction, the model is called multiplicative, because the (relative) effect of smoking is then the same, e^(α1), for each level of tumor size, and vice versa. The model with interaction is sometimes called saturated, because each combination of the two levels has its own parameter, except for the reference cell, to which all other effects are compared.

Table 3.4

The role of interaction, two factors

                                  Tumor size
                     no tumor   small              large
No interaction
Smoking  nonsmoker   1          e^(β1)             e^(β2)
habits   smoker      e^(α1)     e^(α1+β1)          e^(α1+β2)
With interaction
Smoking  nonsmoker   1          e^(β1)             e^(β2)
habits   smoker      e^(α1)     e^(α1+β1+θ11)      e^(α1+β2+θ12)

In this example, two extra parameters, θ11 and θ12, measure the interaction. If both are zero, we get the multiplicative model in the upper part of Table 3.4.

3.7.2 One Factor and One Continuous Covariate

With one covariate and one factor, the interpretation of no interaction is that the “slope” (the coefficient of the covariate) is the same for each level of the factor; only the “intercept” varies. With interaction, the slopes also vary between the levels of the factor. In Table 3.5, the parameters θ11 and θ12 measure the strength and direction of the interaction between the factor and the covariate.

Table 3.5

The role of interaction, one factor and one continuous covariate

                              Tumor size
                    no tumor    small                 large
No interaction
age at diagnosis    e^(γ1x1)    e^(β1+γ1x1)           e^(β2+γ1x1)
With interaction
age at diagnosis    e^(γ1x1)    e^(β1+(γ1+θ11)x1)     e^(β2+(γ1+θ12)x1)

Table 3.6

The role of interaction, two continuous covariates

                    Tumor size in mm
No interaction
age at diagnosis    e^(γ1x1+γ2x2)
With interaction
age at diagnosis    e^(γ1x1+γ2x2+θ12x1x2)

3.7.3 Two Continuous Covariates

With two continuous covariates, the interaction term is constructed through a new covariate which is the product of the two. The main effects have a specific interpretation in the presence of an interaction: each measures the effect of one covariate when the other covariate is zero. Note that this will have severe consequences if zero (0) is not included in the natural range of the covariate(s). For instance, in the examples in this book, calendar year (e.g., birthdate) is often present as a potential covariate. If that variable is used “as is” in an interaction, its reference value will come from the year 0 (zero)! This is an extreme extrapolation when studying nineteenth century mortality. The way around this is to “center” the covariate by subtracting a value in its natural range.
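In an R formula, the product term is created by the * operator, and the centering can be done in place with the I function. A minimal sketch, assuming a data frame dat with hypothetical covariates age and year:

> ## main effects plus their product (the interaction term);
> ## both covariates centered at values in their natural ranges
> fit <- coxph(Surv(time, event) ~ I(age - 60) * I(year - 1810),
               data = dat)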

3.8 Interpretation of Parameter Estimates

In the proportional hazards model the parameter estimates are logarithms of risk ratios relative to the baseline hazard. The precise interpretations of the coefficients for the two types of covariates are discussed. The conclusion is that eβ has a more direct and intuitive interpretation than β itself.

3.8.1 Continuous Covariate

If $x$ is a continuous covariate, and $h(t; x) = h_0(t)e^{\beta x}$, then

$$\frac{h(t; x + 1)}{h(t; x)} = \frac{h_0(t)\, e^{\beta(x+1)}}{h_0(t)\, e^{\beta x}} = e^{\beta},$$

so the risk increases by a factor $e^\beta$ when $x$ is increased by one unit. In other words, $e^\beta$ is a relative risk (or a hazard ratio, which is often the preferred term in certain professions).
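As a numerical illustration (numbers chosen only for the arithmetic): if $\hat{\beta} = 0.1$ for age in years, the hazard ratio per additional year is $e^{0.1} \approx 1.105$, about a 10.5 per cent increase, and over ten years the factor is $e^{10 \times 0.1} \approx 2.72$.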

3.8.2 Factor

For a factor covariate, in the usual coding with a reference category, $e^\beta$ is the relative risk compared to the reference category.

3.9 Proportional Hazards in Discrete Time

In discrete time, the hazard function is, as we saw earlier, a set of (conditional) probabilities, and so its range is restricted to the interval (0, 1). Therefore, the definition of proportional hazards used for continuous time is impractical; the multiplication of a probability by a constant may easily result in a quantity larger than one.

One way of introducing proportional hazards in discrete time is to regard the discreteness as a result of grouping true continuous time data, for which the proportional hazards assumption holds. For instance, in a follow-up study of human mortality, we may only have data recorded once a year, so life length can only be measured in years. Thus, we assume that there is a true exact life length $T$, but we can only observe that it falls in an interval $(t_i, t_{i+1})$.

Assume continuous proportional hazards, and a partition of time:

$$0 = t_0 < t_1 < t_2 < \cdots < t_k = \infty.$$

Then

$$\begin{aligned}
P(t_{j-1} \le T < t_j \mid T \ge t_{j-1}; x) &= \frac{S(t_{j-1} \mid x) - S(t_j \mid x)}{S(t_{j-1} \mid x)}\\
&= 1 - \frac{S(t_j \mid x)}{S(t_{j-1} \mid x)}\\
&= 1 - \left(\frac{S_0(t_j)}{S_0(t_{j-1})}\right)^{\exp(\beta x)}\\
&= 1 - (1 - h_j)^{\exp(\beta x)}, \qquad (3.12)
\end{aligned}$$

with $h_j = P(t_{j-1} \le T < t_j \mid T \ge t_{j-1}; x = 0)$, $j = 1, \ldots, k$. We take (3.12) as the definition of proportional hazards in discrete time.

3.9.1 Logistic Regression

It turns out that a proportional hazards model in discrete time, according to definition (3.12), is nothing else than a logistic regression model with the cloglog link (cloglog is short for “complementary log-log”; the link function is $\log(-\log(1 - p))$). In order to see that, let

$$1 - h_j = \exp\bigl(-\exp(\alpha_j)\bigr), \quad j = 1, \ldots, k,$$

and

$$X_j =
\begin{cases}
1, & t_{j-1} \le T < t_j\\
0, & \text{otherwise},
\end{cases} \quad j = 1, \ldots, k.$$

Then

$$\begin{aligned}
P(X_1 = 1; x) &= 1 - \exp\bigl(-\exp(\alpha_1 + \beta x)\bigr)\\
P(X_j = 1 \mid X_1 = \cdots = X_{j-1} = 0; x) &= 1 - \exp\bigl(-\exp(\alpha_j + \beta x)\bigr), \quad j = 2, \ldots, k.
\end{aligned}$$

This is logistic regression with a cloglog link. Note that extra parameters α1,..., αk are introduced, one for each potential event age. They correspond to the baseline hazard function in continuous time, and are estimated simultaneously with the regression parameters.
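In R, such a model can be fitted with glm. A minimal sketch, assuming the data have been expanded to one row per individual and risk interval, with event indicator d, an interval factor ivl, and a covariate x (all names hypothetical):

> fit <- glm(d ~ ivl + x, family = binomial(link = "cloglog"),
             data = longdata)
> ## The ivl coefficients correspond to the alpha's (the baseline),
> ## and the coefficient of x estimates beta, on the same scale as
> ## in Cox regression.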

3.10 Model Selection

In regression modeling, there are often several competing models for describing data. In general, there are no strict rules for “correct selection.” However, for nested models, there are some formal guidelines. For a precise definition of this concept, see Appendix A.

3.10.1 Model Selection in General

Some general advice regarding model selection is given here.

  • Remember, there are no true models, only some useful ones. This statement is due to G.E.P. Box.
  • More than one model may be useful.
  • Keep important covariates in the model.
  • Avoid automatic stepwise procedures!
  • If interaction effects are present, the corresponding main effects must be there.

3.11 Male Mortality

We utilize the male mortality data, Skellefteå 1800–1820, to illustrate the aspects of an ordinary Cox regression. Males aged 20 (exact) are sampled between 1820 and 1840 and followed to death or reaching the age of 40, when they are right censored. So, in effect, male mortality in the ages 20–40 is studied. The survival time starts counting at zero for each individual at his twentieth birthday (i.e., at exact age 20 years).

There are two R functions that are handy for a quick look at a data frame, str and head. The first gives a summary description of the structure of an R object, often a data frame:

> data(mort) # Loads the data frame 'mort'
> str(mort)
'data.frame':   1208 obs. of 6 variables:
$ id        : int 1 2 3 3 4 5 5 6 7 7 ...
$ enter     : num 0 3.48 0 13.46 0 ...
$ exit      : num 20 17.6 13.5 20 20 ...
$ event     : int 0 1 0 0 0 0 0 0 0 1 ...
$ birthdate : num 1800 1800 1800 1800 1800 ...
$ ses       : Factor w/ 2 levels "lower","upper": 2 1 2 1 1 1 2..

First, there is information about the data frame: It is a data.frame, with six variables measured on 1208 objects. Then each variable is individually described: name, type, and a few of the first values. The values are usually rounded to a few digits. The Factor line is worth noticing: It is, of course, a factor covariate; it takes two levels, lower and upper. Internally, the levels are coded 1 and 2, respectively. Then (by default), lower (with internal code equal to 1) is the reference category. If desired, this can be changed by the relevel function.

> mort$ses2 <- relevel(mort$ses, ref = "upper")
> str(mort)
'data.frame':   1208 obs. of 7 variables:
$ id        : int 1 2 3 3 4 5 5 6 7 7 ...
$ enter     : num 0 3.48 0 13.46 0 ...
$ exit      : num 20 17.6 13.5 20 20 ...
$ event     : int 0 1 0 0 0 0 0 0 0 1 ...
$ birthdate : num 1800 1800 1800 1800 1800 ...
$ ses       : Factor w/ 2 levels "lower","upper": 2 1 2 1 1 1 2..
$ ses2      : Factor w/ 2 levels "upper","lower": 1 2 1 2 2 2 1..

Comparing the information about the variables ses and ses2, you find that the codings, and also the reference categories, are reversed. Otherwise the variables are identical, and any analysis involving any of them as an explanatory variable would lead to identical conclusions (but not identical parameter estimates, see below).

The function head prints the first few lines (observations) of a data frame (there is also a corresponding tail function that prints a few of the last rows).

> head(mort)
 id enter  exit event birthdate  ses ses2
1 1 0.000 20.000 0 1800.010 upper upper
2 2 3.478 17.562 1 1800.015 lower lower
3 3 0.000 13.463 0 1800.031 upper upper
4 3 13.463 20.000 0 1800.031 lower lower
5 4 0.000 20.000 0 1800.064 lower lower
6 5 0.000 0.089 0 1800.084 lower lower

Apparently, the variables ses and ses2 are identical, but we know that behind the scenes they are formally different; different coding and (therefore) different reference category. This shows up in analyses involving the two variables, one at a time.

First, the analysis is run with ses.

> res <- coxreg(Surv(enter, exit, event) ~ ses + birthdate,
    data = mort)
> res
Call:
coxreg(formula = Surv(enter, exit, event) ~ ses + birthdate,
 data = mort)
Covariate          Mean    Coef  Rel.Risk   S.E.  Wald p
ses
        lower     0.416   0      1 (reference)
        upper     0.584  -0.484  0.616     0.121  0.000
birthdate      1811.640   0.029  1.029     0.011  0.008

Events                 276
Total time at risk   17038
Max. log. likelihood -1841.7
LR test statistic     23.1
Degrees of freedom       2
Overall p-value      9.72167e-06

Then the variable ses2 is inserted instead.

> res2 <- coxreg(Surv(enter, exit, event) ~ ses2 + birthdate,
     data = mort)
> res2
Call:
coxreg(formula = Surv(enter, exit, event) ~ ses2 + birthdate,
 data = mort)
Covariate          Mean    Coef  Rel.Risk   S.E.  Wald p
ses2
        upper     0.584   0      1 (reference)
        lower     0.416   0.484  1.623     0.121  0.000
birthdate      1811.640   0.029  1.029     0.011  0.008

Events                 276
Total time at risk   17038
Max. log. likelihood -1841.7
LR test statistic     23.1
Degrees of freedom       2
Overall p-value      9.72167e-06

Notice the difference; it is only formal. The substantive results are completely equivalent.

3.11.1 Likelihood Ratio Test

If we judge the result by the Wald p-values, it seems as if both variables are highly significant. To be sure, it is advisable to apply the likelihood ratio test. In R, this is easily achieved by applying the drop1 function on the result, res, of the Cox regression.

> drop1(res, test = "Chisq")
Single term deletions
Model:
Surv(enter, exit, event) ~ ses + birthdate
          Df    AIC     LRT   Pr(Chi)
<none>        3687.3
ses        1  3701.4 16.1095 5.978e-05
birthdate  1  3692.6  7.2752  0.006991

In fact, we have performed two likelihood ratio tests here. First, the effect of removing ses from the full model. The p-value for this test is very small, p = 0.00005978. The scientific notation, 5.978e-05, means that the decimal point should be moved five steps to the left (that’s the minus sign). The reason for this notation is essentially to save space. Then, the second LR test concerns the absence or presence of birthdate in the full model. This also gives a small p-value, p = 0.006991.

The earlier conclusion is not altered; both variables are highly significant.

3.11.2 The Estimated Baseline Cumulative Hazard Function

The estimated baseline cumulative hazard function (see (3.9)) is best plotted by the plot command.

> plot(res)

The result is shown in Figure 3.7. This figure shows the baseline cumulative hazard function for an average individual (i.e., with the value zero on all covariates when all covariates are centered around their weighted means).

Figure 3.7

Estimated cumulative baseline hazard function.

3.11.3 Interaction

The introduction of the interaction effect is done as follows.

> res2 <- coxreg(Surv(enter, exit, event) ~ ses * birthdate,
     data = mort)
> res2
Call:
coxreg(formula = Surv(enter, exit, event) ~ ses * birthdate,
 data = mort)
Covariate          Mean     Coef  Rel.Risk   S.E.   Wald p
ses
        lower     0.416    0      1 (reference)
        upper     0.584  -53.514  0.000     39.767  0.178
birthdate      1811.640    0.016  1.016      0.014  0.252
ses:birthdate
        upper:             0.029  1.030      0.022  0.182

Events                 276
Total time at risk   17038
Max. log. likelihood -1840.8
LR test statistic     24.9
Degrees of freedom       3
Overall p-value      1.64202e-05

Note that main effects mean nothing in isolation when involved in an interaction. For instance, if the covariate birthdate is centered,

> res3 <- coxreg(Surv(enter, exit, event) ~ ses * I(birthdate - 1810),
    data = mort)
> res3
Call:
coxreg(formula = Surv(enter, exit, event) ~ ses * I(birthdate - 1810),
 data = mort)
Covariate                Mean    Coef  Rel.Risk   S.E.  Wald p
ses
        lower           0.416   0      1 (reference)
        upper           0.584  -0.558  0.572     0.134  0.000
I(birthdate - 1810)     1.640   0.016  1.016     0.014  0.252
ses:I(birthdate - 1810)
        upper:                  0.029  1.030     0.022  0.182

Events                 276
Total time at risk   17038
Max. log. likelihood -1840.8
LR test statistic     24.9
Degrees of freedom       3
Overall p-value      1.64202e-05

the main effects are different. Note the use of the function I; it is the identity function, that is, it returns its argument unaltered. The reason for using it here is that it is the (only) way to avoid a mix-up with the formula notation. The drawback with this is that the printed variable name is too long; an alternative is to first create a new variable.

> mort$bth1810 <- mort$birthdate - 1810
> res3 <- coxreg(Surv(enter, exit, event) ~ ses * bth1810,
    data = mort)
> res3
Call:
coxreg(formula = Surv(enter, exit, event) ~ ses * bth1810,
  data = mort)
Covariate          Mean    Coef  Rel.Risk   S.E.  Wald p
ses
        lower     0.416   0      1 (reference)
        upper     0.584  -0.558  0.572     0.134  0.000
bth1810           1.640   0.016  1.016     0.014  0.252
ses:bth1810
        upper:            0.029  1.030     0.022  0.182

Events                 276
Total time at risk   17038
Max. log. likelihood -1840.8
LR test statistic     24.9
Degrees of freedom       3
Overall p-value      1.64202e-05

The likelihood ratio test of interaction becomes

> drop1(res2, test = "Chisq")
Single term deletions
Model:
Surv(enter, exit, event) ~ ses * birthdate
               Df    AIC    LRT Pr(Chi)
<none>            3687.5
ses:birthdate   1 3687.3 1.7899  0.1809

with non-centered birthdate, and

> drop1(res3, test = "Chisq")
Single term deletions
Model:
Surv(enter, exit, event) ~ ses * bth1810
             Df    AIC    LRT Pr(Chi)
<none>          3687.5
ses:bth1810   1 3687.3 1.7899  0.1809

with centered birthdate. The results are identical, so the interaction effect is unaffected by centering.

So, there are two things to be noted: (i) The results with centered or non-centered birthdate are the same, and (ii) the main effects are not tested. Such tests would be (almost) meaningless in the presence of an interaction effect.

In Figure 3.8, the effect of birthdate (centered at 1810) on mortality is shown for the two levels of ses. The two curves seem to converge over time, meaning that the socioeconomic differences in male mortality shrink over time. On the other hand, mortality increases over time in both groups.

Figure 3.8

Effect of birthdate (centered at 1810) on mortality, upper class (upper curve) and lower class (lower curve). Model with interaction.

Note that the reference point is at the intersection of the year 1810 (e.g., at 1 January 1810, 00:00) and the curve corresponding to the ses category lower (marked by a circle in the figure).

In Figure 3.9, the effect of birthdate (not centered) on mortality is shown for the two levels of ses. Here the reference point is at year 0 (which doesn’t even exist!), at the intersection with the curve corresponding to the lower class. That is the reason for the ridiculously large numbers on the y-axis, and for the extremely large absolute value of the coefficient for the main effect of upper: it measures the extrapolated, hypothetical difference between lower and upper at year zero (1 January, 00:00). This is, of course, nonsense, and it is the background for the earlier statement that main effects mean nothing when the corresponding interaction effects are in the model.

Figure 3.9

Effect of birthdate (not centered) on mortality, upper class (upper curve) and lower class (lower curve). Model with interaction.

That said, given proper centering of continuous covariates, the main effects do mean something: a main effect is the effect of its covariate when the other covariate is kept at zero, or at its reference category if it is a factor. But then, anyway, it may be argued that the term main effect is quite misleading.
