Chapter 8

A Two-Stage Longitudinal Propensity Adjustment for Analysis of Observational Data

Andrew C. Leon
Donald Hedeker
Chunshan Li

Abstract

8.1 Introduction

8.2 Longitudinal Model of Propensity for Treatment

8.3 Longitudinal Propensity-Adjusted Treatment Effectiveness Analyses

8.4 Application

8.5 Summary

Acknowledgments

References

 

Abstract

A two-stage longitudinal propensity adjustment is described for treatment effectiveness analyses of observational data. The propensity score is estimated in a mixed-effects logistic regression model. Treatment effectiveness is then examined with quintile-stratified mixed-effects grouped-time survival models. Tests of model assumptions are described, including representativeness of treatments in each quintile, treatment by quintile interaction, and balance between treatment groups. An application that evaluates the effectiveness of antidepressants is presented for illustration.

 

8.1 Introduction

Treatment effectiveness evaluations of observational data face several fundamental challenges. First, because of non-randomized treatment assignment, there is the clear possibility of non-equivalent comparison groups. This is because selection biases often play a role in the particular treatment that is received by those in need. For instance, those who seek and receive treatment are likely to be more severely ill than those who do not. As a consequence, unadjusted effectiveness analyses would likely find that untreated subjects have better outcomes. In part, this is because the factors that contribute to treatment choices are confounding variables because they are also associated with outcome. Second, a longitudinal observation study will have repeated measures on many of the subjects. Third, unlike randomized controlled clinical trials, the duration of treatment is not standardized in a research protocol but instead determined by the clinician and the patient.

Rosenbaum and Rubin (1983) proposed the propensity score as a balancing score for comparison of non-equivalent comparison groups. They defined the propensity score as the conditional probability of assignment to receive a particular treatment given a vector of observed covariates (x):

e(xi ) = P (Ti = 1 | xi),

for subject i (i = 1, … , N), for Ti, which represents treatment.

The propensity adjustment is used to reduce the impact of confounding variables on effectiveness analyses and, thus, the propensity score can reduce bias in estimates of treatment effect in an observational study (Rosenbaum and Rubin, 1983, 1984). They proposed implementation through matching, subclassification, or covariate adjustment, yet discussed caveats regarding use of the latter. Here we use subclassification, also called stratification.

 

8.2 Longitudinal Model of Propensity for Treatment

A strategy to implement the propensity for longitudinal, observational data has been proposed. It was a dynamic adaptation of the propensity adjustment for ordinal doses (Leon et al., 2001, 2003). For simplicity, here we simply consider the longitudinal propensity adjustment for binary treatment. It is a two-stage, longitudinal data analytic strategy that includes a model of propensity for treatment and a model of treatment effectiveness. In the first stage, the propensity model examines repeated observations of binary treatment over time. The model can include multiple treatment intervals per subject over time and variations in both within-subject propensity for treatment and within-subject treatment over the course of the study. The model characterizes those who either did or did not receive treatment over time based on covariates such as clinical and demographic features. In the second stage, a treatment effectiveness model examines the time from the start of each treatment until a prespecified event. In this chapter we examine the time until recurrence.

The mixed-effects framework accounts for the correlated recurrence times that represent the successive within-subject treatment intervals. A mixed-effects logistic regression model examines treatment as a function of these characteristics, whether time-invariant or time-varying:

e(xij, ui ) = P (Tij = 1 | νi, xij),

for subject i (i = 1, … , N), at time j (j = 1, … , Ji), and where ui is a subject-specific random intercept. Assuming this mixed-effects logistic model, the propensity score, which ranges in value from 0 to 1, can be expressed using the logistic response function for subject i at time j as:

e(xij,νi)=exp(α+xij'β+νi)1+exp(α+xij'β+νi),

A subject with a low propensity score presents as someone unlikely to receive treatment at time point j, whereas a subject with a high propensity score has characteristics of someone more likely to receive treatment.

 

8.3 Longitudinal Propensity-Adjusted Treatment Effectiveness Analyses

The propensity adjustment is implemented in the treatment effectiveness analyses through stratification. This is based on the assumption that treatment assignment is ignorable within a propensity score stratum. (Here we stratify into quintiles.) Stated differently, within a quintile subjects who do and do not receive treatment will not differ on the covariates (x) that are included in the propensity score. Next, we discuss a method to examine the extent to which balance between groups has been achieved with the adjustment. Based on the ignorability assumption, causal inferences can be drawn regarding direct effects of treatment, conditioned on the random intercept.

 

8.3.1 Propensity-Based Quintile Classification

Based on the distribution of thepropensity score, each observation of subject i at time j is classified into one propensity quintile, q(1) … q(5). The propensity score e(xij, νi ) comprises both time-varying and time-invariant variables. Therefore, each subject’s propensity score varies over time and, consequently, the subject’s propensity quintile could also vary in a longitudinal study. Quintile classification is conducted so that treatment effectiveness analyses can be conducted separately for each quintile. The rationale is that this approach will remove much of the confounding effect of each variable that is included in the propensity score. However, before quintile-specific analyses can be conducted, the representation of each treatment group in each quintile must be examined. If a treatment is not well-represented in a particular quintile, of course, treatment effectiveness analyses cannot be conducted for that quintile. Quintile representation is evaluated by examining cell frequencies in a propensity quintile by treatment contingency table.

 

8.3.2 Longitudinal Treatment Effectiveness Analyses

The effectiveness of treatment is examined using survival analysis methods in which the time until recurrence of disease is the dependent variable. In the example used in this chapter, the measurement of survival is ascertained in time intervals, examining whether the event occurred since the prior follow-up visit. Therefore, a grouped-time approach to survival analysis is used. Furthermore, in an effort to incorporate all data throughout follow-up, which typically involves multiple observations per subject (for example, multiple recurrences), a mixed-effects grouped-time survival analysis model is used (Hedeker et al., 2000). The model examines the probability of recurrence up to, and including, time interval t for observation j of subject i as:

Pijt = Pr(tijt)

Using a complementary log-log link function, the model is a proportional hazards regression model that describes the cumulative probability of recurrence as a function of treatment:

Pijt =1-exp(-exp(αt + x’ijβ + νi))

where αt represents the intercept term (that is, the baseline hazard), x is an indicator variable to represent the treatment group, and νi represents a random subject effect. Of note, marginal structural models, which we do not consider here, provide an alternative approach to evaluating time-varying treatments in which inverse probability weighting attempts to produce between-group balance in covariates. (See Chapter 9.)

 

8.3.3 Effectiveness Analysis Stratification

As stated earlier, the treatment effectiveness analyses are conducted separately for each propensity quintile. This is based on the work of Cochran (1968), who showed that quintile-stratification on a confounding variable removes a substantial proportion of the bias associated with that variable. The quintile-specific estimates of treatment effectiveness are then pooled to obtain one unified estimate of the treatment effect.

 

8.3.4 Pooling the Quintile-Specific Effectiveness Results

The quintile-specific results are pooled using the Mantel-Haenszel (1959) procedure as described by Fleiss (1981). Using this strategy, each quintile-specific parameter estimate is weighted by the inverse of its squared standard error and the pooled estimate is calculated as a weighted mean. The strategy assumes that there is not a treatment by propensity interaction (that is, that the treatment effect does not differ across quintiles). We assess this assumption in analyses of the pooled data set, which includes all observations from the five quintiles. A likelihood ratio test compares two mixed-effects models:

1. main effects terms for treatment and quintile (expressed with indicator variables)

2. main effects and the interaction of treatment and quintile

A significant interaction would indicate that the effect of treatment varies across propensity quintiles. In such a case, pooling of quintile-specific results is not indicated. Instead, treatment effectiveness conclusions must be reported at the quintile-specific level. In contrast, if there is no treatment by propensity interaction, the focus of the hypothesis test is the pooled treatment effect: H0: β=0.

 

8.4 Application

This application examines data from the National Institute of Mental Health Collaborative Depression Study. The study enrolled subjects with affective disorders from 1978–1981 who sought treatment for one of the major affective disorders at five medical centers in the United States (Boston, Chicago, Iowa City, New York, and St. Louis). Each subject provided written informed consent. The design and objectives of this longitudinal, observational study have been described elsewhere (Katz and Klerman, 1979). Subjects were followed with semi-annual assessments for the first five years and annual assessments for as much as 23 additional years. We examine somatic antidepressant treatment effectiveness for relapse prevention among those who recovered from unipolar major depression as defined by the Research Diagnostic Criteria (RDC) (Spitzer et al., 1978). The data that were used in the analyses reported here include up to 20 years of follow-up assessments.

 

8.4.1 Data Analytic Procedures

The analyses proceeded in two stages, as previously described. Initially, the model of propensity for treatment examined the magnitude of the association of demographic and clinical variables that were hypothesized to be associated with receiving somatic antidepressant treatment. In this example, treatment was the binary dependent variable. (An ordinal treatment intensity variable, described in Keller, 1988, and Leon et al., 2003, was dichotomized so that a treatment intensity of 0 was compared with intensities of 2, 3, and 4 combined. Intensity levels of 1 were excluded in the analyses for this application.) A mixed-effects logistic regression model was used because many subjects had multiple observations of treatment over time. A subject-specific random intercept was included in this model to account for the within-subject clustering of the repeated observations within subjects.

The second stage involved a mixed-effects grouped-time survival model of antidepressant treatment effectiveness that examined the time from the start of the course of treatment until the recurrence of an affective episode. The survival variable, time until recurrence, represented the number of consecutive weeks during which treatment status (yes/no) remained unchanged during a well period. The survival intervals could terminate in one of three ways:

1. recurrence of an affective episode

2. change in treatment status

3. end of follow-up

The latter two were classified as censored. In these analyses, it was assumed that censoring due to the end of follow-up was unrelated to time until recurrence. Subjects accrued additional survival intervals, also referred to here as treatment intervals, with each new episode and each change in treatment status while in episode. Treatment intervals were the unit of analysis for both the propensity and effectiveness models. Thus the data set included multiple observations per subject, one observation for each treatment interval. The intervals represent the distinct courses of treatment, including no treatment, that each subject received. Separate propensity scores were calculated for each of the treatment intervals.

 

8.4.2 Results

8.4.2.1 Subjects

Four hundred thirty-one subjects had major depressive disorder at study intake (Keller et al., 1992). Eighty-two of those subjects were excluded from the analyses because they developed bipolar disorder and another 46 of them did not recover from their intake episode and, therefore, were also excluded. Of those who otherwise met criteria for these analyses, 19 were excluded due to missing covariate data required for calculating a propensity score. Therefore, the analyses described here include 284 subjects with 1,319 observations.

8.4.2.2 Propensity for Treatment

The propensity models includes five variables: symptom severity (Psychiatric Status Rating, PSR, which ranges from 0-6:variable name meanpsr), symptom trajectory over the prior 8 weeks (increasing vs. decreasing vs. stable: symdec and syminc), educational status (colgrad, somecol, and highsch), age category (agelt30, age40t49, age50t59, and agege60), and study site (site1, site3, site4, and site5). (See Program 8.1.) Three of the findings in the model are highlighted. (See Output from Program 8.1.) Subjects with more severe symptoms were significantly more likely to receive treatment (odds ratio [OR]: 1.99; 95% confidence interval [CI]: 1.70-2.34). Those whose symptom severity decreased in the 8 weeks prior to the interval were marginally less likely to get treated (OR: 0.70; 95% CI: 0.49-1.02). Similarly, subjects from three of the study sites (New York, Iowa, and Chicago) were more likely to get treatment than were subjects from the other sites. Whether or not a subject received treatment, across the multiple treatment intervals, was moderately consistent over time (intraclass correlation coefficient=0.305).

Program 8.1 Computing the Propensity Score with Mixed-Effects Logistic Regression

TITLE1 'Performing Mixed-effects Logistic Regression of Propensity Model’;

/* Propensity Analysis for Binary tx   */
/* Binary LOGISTIC RANDOM-INTERCEPT MODEL */

Data data1;

  SET “c:uniint.sas7bdat”;

 

PROC NLMIXED;
PARMS b0=0 b1=0 b2=0 b3=0 b4=0 b5=0 b6=0 b7=0 b8=0 b9=0 b10=0 b11=0 b12=0 b13=0 b14=0 sd=1;
z = b0 + b1*meanpsr + b2*colgrad + b3*somecol + b4*highsch + b5*site1 + b6*site3 + b7*site4 + b8*site5 + b9*agelt30 + b10*age40t49 + b11*age50t59
         + b12*agege60 + b13*symdec + b14*syminc + sd*u;
IF (tx=0) THEN
  p = 1 – (1 / (1 + EXP(-z)));
ELSE IF (tx=1) THEN
  p = 1 / (1 + EXP(-z));
like = LOG(p);
MODEL tx ~ GENERAL(like);
RANDOM u ~ NORMAL(0,1) SUBJECT=id;
PREDICT z OUT=zest;
ESTIMATE 'ICC' sd*sd/((((ATAN(1)*4)**2)/3)+sd*sd);
RUN;

 

Output from Program 8.1

The NLMIXED Procedure

 

Observations Used 1319
Observations Not Used 0
Total Observations 1319
Subjects 284
Max Obs Per Subject 27
Parameters 16
Quadrature Points 5

 

NOTE: GCONV convergence criterion satisfied.

 

Fit Statistics

 

-2 Log Likelihood 1537.3
AIC (smaller is better) 1569.3
AICC (smaller is better) 1569.8
BIC (smaller is better) 1627.7

 

Parameter Estimates

 

Parameter Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper Gradient
                   
b0 -3.5115 0.4758 283 -7.38 <.0001 0.05 -4.4481 -2.5749 0.003336
b1 0.6898 0.08074 283 8.54 <.0001 0.05 0.5309 0.8487 0.008001
b2 0.6704 0.3528 283 1.90 0.0584 0.05 -0.02410 1.3649 0.000295
b3 -0.1628 0.3320 283 -0.49 0.6241 0.05 -0.8162 0.4906 -0.00037
b4 0.4327 0.3210 283 1.35 0.1788 0.05 -0.1992 1.0645 0.003104
b5 1.3591 0.4779 283 2.84 0.0048 0.05 0.4185 2.2997 0.001145
b6 0.4026 0.3673 283 1.10 0.2739 0.05 -0.3203 1.1255 -0.00018
b7 1.1222 0.3770 283 2.98 0.0032 0.05 0.3802 1.8642 0.000473
b8 1.0876 0.4245 283 2.56 0.0109 0.05 0.2520 1.9232 0.0013
b9 -0.5782 0.2439 283 -2.37 0.0184 0.05 -1.0584 -0.09803 0.000034
b10 0.3813 0.2100 283 1.82 0.0705 0.05 -0.03206 0.7946 0.002361
b11 0.2037 0.2979 283 0.68 0.4948 0.05 -0.3827 0.7900 0.000226
b12 0.5608 0.3100 283 1.81 0.0715 0.05 -0.04930 1.1710 0.00244
b13 -0.3508 0.1889 283 -1.86 0.0644 0.05 -0.7226 0.02108 0.000302
b14 0.3339 0.2110 283 1.58 0.1148 0.05 -0.08154 0.7493 0.001543
sd 1.2023 0.1390 283 8.65 <.0001 0.05 0.9287 1.4759 0.001859

 

Additional Estimates

 

Label Estimate Standard
Error
DF t Value Pr > |t| Alpha Lower Upper
                 
ICC 0.3053 0.04903 283 6.23 <.0001 0.05 0.2088 0.4018

 

The results of the propensity model (see Output from Program 8.1) were used to calculate a propensity score for each observation. Note that because time-varying covariates (that is, symptom severity and symptom trajectory) are included in the propensity score, the propensity score itself is time-varying. Each observation was then classified into one of the propensity score quintiles. (See Program 8.2.) A contingency table is displayed to evaluate the representativeness of treatments in each quintile. (See Output from Program 8.2.) It is clear that subjects in the lower quintiles were less likely to receive treatment, whereas those in higher quintiles were more likely to receive treatment. This lends some support to the validity of the model. Importantly, each treatment group is represented in each quintile and, therefore, quintile-specific treatment effectiveness analyses can be used to compare the groups.

Program 8.2 Quintile Stratification

/* Generate QUINTILE values (0 to 4) based on estimates from above model */
DATA data2 (Keep = id event tx meanpsr colgrad somecol highsch site1 site3 site6 site7 agelt30 age40t49 age50t59 agege60 symdec syminc surv2 pred ppred);
MERGE data1 zest;

/* convert to probability scale */
ppred = 1 / (1 + EXP(-pred));

PROC RANK GROUPS=5 OUT=rankout;
     VAR ppred;
     RANKS QUINT;

DATA all; SET rankout;
q1=0;q2=0;q3=0;q4=0;
if quint eq 1 then q1 = 1;
if quint eq 2 then q2 = 1;
if quint eq 3 then q3 = 1;
if quint eq 4 then q4 = 1;

/* Table of Quintile by tx */
PROC FREQ;
TABLES tx*QUINT;
RUN;

 

Output from Program 8.2

The FREQ Procedure

 

Table of tx by QUINT

 

tx(Treated yes/no)     QUINTILE(Rank for Variable ppred)

 

Image

 

8.4.3 Treatment Effectiveness Analyses

Initially, the treatment by propensity interaction was evaluated with a likelihood ratio test that compared the fit of the main effect model (-2 log likelihood = 3203.7; see Output from Program 8.5) with that of the main effects and interaction model (-2 log likelihood = 3197.3; see Output from Program 8.6). The interaction is not statistically significant (c2=6.40, df=4, p=0.171). Therefore, results from the quintile-specific treatment effectiveness analyses were pooled as described previously. The pooled results show that when subjects received somatic antidepressant therapy, they were 35% less likely to have a recurrence than when they did not receive somatic treatment (OR: exp [-0.427] = 0.65; 95% CI: 0.50-0.85; see Output from Program 8.4), after controlling for propensity for treatment.

Program 8.3 Quintile-Specific Treatment Effectiveness Analyses

/* Analysis for Grouped-Time Survival – QUINTILE-SPECIFIC ANALYSES
*/

/* Binary Complementary Log-Log RANDOM-INTERCEPT MODEL with censoring
*/
PROC SORT; BY QUINT ID;
PROC NLMIXED;
PARMS b0=0 b1=0 sd=1 t2=1 t3=1.25 t4=1.5 t5=1.75 t6=2 t7=2.25;
ODS OUTPUT ParameterEstimates=estb;
z = b0 + b1*tx + sd*u;
IF (event = 1) THEN 
DO;                                     /* event occurred */
  IF (surv2=1) THEN
    p = 1 - EXP(0 - EXP(0+z));
  ELSE IF (surv2=2) THEN
    p = (1 - EXP(0 - EXP(t2+z))) -  (1 - EXP(0 - EXP(0+z)));
  ELSE IF (surv2=3) THEN
    p = (1 - EXP(0 - EXP(t3+z))) -  (1 - EXP(0 - EXP(t2+z)));
  ELSE IF (surv2=4) THEN
    p = (1 - EXP(0 - EXP(t4+z))) -  (1 - EXP(0 - EXP(t3+z)));
  ELSE IF (surv2=5) THEN
    p = (1 - EXP(0 - EXP(t5+z))) -  (1 - EXP(0 - EXP(t4+z)));
  ELSE IF (surv2=6) THEN
    p = (1 - EXP(0 - EXP(t6+z))) -  (1 - EXP(0 - EXP(t5+z)));
  ELSE IF (surv2=7) THEN
    p = (1 - EXP(0 - EXP(t7+z))) -  (1 - EXP(0 - EXP(t6+z)));
END;
IF (event = 0) THEN 
DO;                      /* event did not occur - censored */
  IF (surv2=1) THEN
    p = 1 - (1 - EXP(0 - EXP(0+z)));
  ELSE IF (surv2=2) THEN
    p = 1 - (1 - EXP(0 - EXP(t2+z)));
  ELSE IF (surv2=3) THEN
    p = 1 - (1 - EXP(0 - EXP(t3+z)));
  ELSE IF (surv2=4) THEN
    p = 1 - (1 - EXP(0 - EXP(t4+z)));
  ELSE IF (surv2=5) THEN
    p = 1 - (1 - EXP(0 - EXP(t5+z)));
  ELSE IF (surv2=6) THEN
    p = 1 - (1 - EXP(0 - EXP(t6+z)));
  ELSE IF (surv2=7) THEN
    p = 1 - (1 - EXP(0 - EXP(t7+z)));
END;

like = LOG(p);
MODEL surv2 ~ GENERAL(like);
RANDOM u ~ NORMAL(0,1) SUBJECT=id;
ESTIMATE 'ICC' sd*sd/((((ATAN(1)*4)**2)/6)+sd*sd);
BY QUINT;
RUN;

Program 8.4 Pooling Quintile-Specific Treatment Effectiveness Results

/* Generate pooled results based on results from the above quintile-
   specific models */
DATA estw; SET estb;
w = 1 / StandardError**2;
west = Estimate*w;

PROC SORT; BY Parameter;

PROC MEANS NOPRINT; CLASS Parameter; VAR west w;
OUTPUT OUT=sums SUM = sumwest sumw;

DATA poolest; SET sums; IF _TYPE_ EQ 1;
poolest = sumwest / sumw;
poolse  = 1 / sqrt(sumw);
poolz   = poolest / poolse;
poolp   = 2*(1 - probnorm(abs(poolz)));

/* Print the pooled results */
PROC PRINT;
VAR Parameter poolest poolse poolz poolp;
RUN;

 

Output from Program 8.4

Obs Parameter poolest poolse poolz poolp
             
1 b0 -3.84229 0.23478 -16.3656 0.00000
2 b1 -0.42659 0.13324 -3.2016 0.00137
3 sd 0.13043 0.09210 1.4161 0.15674
4 t2 0.50464 0.13715 3.6795 0.00023
5 t3 0.89581 0.16553 5.4118 0.00000
6 t4 1.60004 0.19464 8.2204 0.00000
7 t5 2.45576 0.21083 11.6482 0.00000
8 t6 2.86478 0.21645 13.2354 0.00000
9 t7 4.10759 0.23911 17.1784 0.00000

Program 8.5 Treatment Effectiveness Analyses Pooled across All Subjects: Main Effects

/* Analysis for Grouped-Time Survival - all subjects – all QUINTILES
   main effects */
/* Binary Complementary Log-Log RANDOM-INTERCEPT MODEL with censoring   */
PROC NLMIXED;
PARMS b0=0 b1=0 b2=0 b3=0 b4=0 b5=0  sd=1 
      t2=1 t3=1.25 t4=1.5 t5=1.75 t6=2 t7=2.25;
z = b0 + b1*tx + b2*q1 + b3*q2 + b4*q3 + b5*q4 + sd*u;

…more SAS statements…

like = LOG(p);
MODEL surv2 ~ GENERAL(like);
RANDOM u ~ NORMAL(0,1) SUBJECT=id;
ESTIMATE 'ICC' sd*sd/((((ATAN(1)*4)**2)/6)+sd*sd);
RUN;

 

Output from Program 8.5

NOTE: GCONV convergence criterion satisfied.

 

Fit Statistics

 

-2 Log Likelihood 3203.7
AIC (smaller is better) 3229.7
AICC (smaller is better) 3230.0
BIC (smaller is better) 3277.2

Program 8.6 Treatment Effectiveness Analyses Pooled across All Subjects: Main Effects and Interactions

/* Analysis for Grouped-Time Survival - all subjects – all QUINTILES main effects and interactions */
/* Binary Complementary Log-Log RANDOM-INTERCEPT MODEL with censoring    */
PROC NLMIXED;
PARMS b0=0 b1=0 b2=0 b3=0 b4=0 b5=0 b6=0 b7=0  b8=0 b9=0 sd=1 
      t2=1 t3=1.25 t4=1.5 t5=1.75 t6=2 t7=2.25;
z = b0 + b1*tx + b2*q1 + b3*q2 + b4*q3 + b5*q4
       + b6*tx*q1 + b7*tx*q2 + b8*tx*q3 + b9*tx*q4 + sd*u;

…more SAS statements…

like = LOG(p);
MODEL surv2 ~ GENERAL(like);
RANDOM u ~ NORMAL(0,1) SUBJECT=id;
ESTIMATE 'ICC' sd*sd/((((ATAN(1)*4)**2)/6)+sd*sd);
RUN;

 

Output from Program 8.6

NOTE: GCONV convergence criterion satisfied.

 

Fit Statistics

 

-2 Log Likelihood 3197.3
AIC (smaller is better) 3231.3
AICC (smaller is better) 3231.7
BIC (smaller is better) 3293.3

 

8.4.4 Evaluating Balance across Treatment Groups

The propensity score is a balancing score that, if assumptions are fulfilled, allows for comparison of non-equivalent groups. The extent to which balance has been achieved can be examined. In an effort to parallel the propensity adjustment that has been described, balance is examined here by using a quintile-stratification strategy in each of a series of mixed-effects models for the time-varying variables that comprise the propensity score. In this case, however, the independent variable is the treatment group and the respective dependent variables in the successive models are the variables that were included in the propensity score. (Contrast this with the propensity model described previously in which the dependent variable is the treatment and the independent variables are those that are components of the propensity score.) If the primary objective of the propensity adjustment has been achieved, there would not be substantial differences between treatment groups on each variable included in the propensity score. We illustrate this for one time-varying continuous variable, symptom severity (PSR). The results show that the magnitude of the association between treatment group and symptom severity has been muted (B=.589 and p<.0001 [in Output from Program 8.7] vs. B=0.037 and p=0.425 [in Output from Program 8.8]) and therefore, in the case of symptom severity, the objective has been achieved. Note that this examination must focus not simply on whether the statistical significance of the association has been attenuated but also on the reduction in the magnitude of the bivariate association between treatment and symptom severity.

Program 8.7 Evaluating Balance of Psychiatric Status Rating (PSR) across Treatment Groups with No Propensity Adjustment

Title2 "Compare the treatment groups on PSR using unadjusted model";
PROC SORT DATA=ALL; BY ID;
RUN;
PROC MIXED data =ALL;
MODEL meanpsr = tx / S;
RANDOM INTERCEPT / SUB=id;
RUN;

 

Output from Program 8.7

Compare the treatment groups on PSR using unadjusted model

 

The Mixed Procedure

 

Solution for Fixed Effects

 

Effect Estimate Standard
Error
DF t Value Pr > |t|
           
Intercept 2.4170 0.04891 283 49.42 <.0001
tx 0.5889 0.06359 1034 9.26 <.0001

Program 8.8 Evaluating Balance of Psychiatric Status Rating across Treatment Groups with a Propensity Adjustment

Title2 "Compare the treatment groups on PSR using quintile adjustment";
/*Linear random intercept model*/
PROC SORT DATA=ALL; BY QUINT ID;

PROC MIXED data=ALL;
ODS OUTPUT SolutionF=estc1;
MODEL meanpsr = tx / S;
RANDOM INTERCEPT / SUB=id;
BY QUINT;
RUN;

/* Generate pooled results based on results from the above quintile-specific models */
DATA estw1; SET estc1;
w = 1 / StdErr**2;
west = Estimate*w;

PROC SORT; BY Effect;

PROC MEANS NOPRINT; CLASS Effect; VAR west w;
OUTPUT OUT=sums1 SUM = sumwest sumw;

DATA poolest1; SET sums1; IF _TYPE_ EQ 1;
poolest = sumwest / sumw;
poolse  = 1 / sqrt(sumw);
poolz   = poolest / poolse;
poolp   = 2*(1 - probnorm(abs(poolz)));

/* Print the pooled results */
PROC PRINT;
VAR Effect poolest poolse poolz poolp;
RUN;

 

Output from Program 8.8

Compare the treatment groups on PSR using quintile adjustment.

 

Obs Effect poolest poolse poolz poolp
         
1 Intercept 2.47631 0.044452 55.7073 0.00000
2 tx -0.03650 0.045767 -0.7976 0.42512

This approach is not necessary for time-invariant components of the propensity score (for example, ethnicity). Instead, a simpler approach (not shown here) is to compare the strength of the association of each (time-invariant) component of the propensity score (in this case, the independent variable) with treatment (in this case, the dependent variable). Results from two models, the unadjusted model and a propensity-adjusted model that includes four quintile indicator variables as covariates, can be compared (for example, see Leon et al., 2007). If balance is achieved, the propensity-adjusted models will show substantially attenuated, and presumably nonsignificant, associations.

 

8.5 Summary

A two-stage longitudinal propensity adjustment has been described for treatment effectiveness analyses of observational data. A mixed-effects logistic regression model is used to estimate the propensity for treatment, and quintile-stratified, mixed-effects, grouped-time survival models are used to estimate the treatment effect. Tests of three of the model assumptions have been described, including the representativeness of treatments in each quintile, the treatment by quintile interaction, and the balance.

Finally, we mention two topics that have not been considered here. First, the impact of model propensity misspecification on cross-sectional and longitudinal analyses has been examined in simulation studies (Drake, 1993; Leon and Hedeker, 2007b). Sensitivity analyses to evaluate propensity model misspecification have been described in detail elsewhere (Rosenbaum, 2002). Second, the sample size required for the propensity adjustment has not been discussed. The choice of sample size is guided by statistical power analyses and the N needed for stable estimates in mixed-effects models, which each have been examined in simulation studies (Leon and Hedeker, 2005; Leon et al., 2007). The sample size is also driven by the stratification process, which necessitates analyses of five quintile-specific effectiveness models, each of which contains only 20% of the observations.

 

Acknowledgments

The National Institute of Mental Health Collaborative Depression Study was conducted with current participation of the following investigators: M.B. Keller, M.D. (Chairperson, Providence); W. Coryell (Co-Chair Person, Iowa City); D.A. Solomon, M.D. (Providence); W.A. Scheftner, M.D. (Chicago); W. Coryell, M.D. (Iowa City); J. Endicott, Ph.D., A.C. Leon, Ph.D., J. Loth, M.S.W. (New York); and J. Rice, Ph.D. (St. Louis). Other current contributors include: H.S. Akiskal, M.D.; J. Fawcett, M.D.; L.L. Judd, M.D.; P.W. Lavori, Ph.D.; J.D. Maser, Ph.D.; and T.I. Mueller, M.D. This manuscript has been reviewed by the Publication Committee of the Collaborative Depression Study, and has its endorsement. The data for this manuscript came from the National Institute of Mental Health (NIMH) Collaborative Program on the Psychobiology of Depression-Clinical Studies (Katz and Klerman, 1979). The Collaborative Program was initiated in 1975 to investigate nosologic, genetic, family, prognostic, and psychosocial issues of mood disorders, and is an ongoing, long-term multidisciplinary investigation of the course of mood and related affective disorders. The original principal and co-principal investigators were from five academic centers and included Gerald Klerman, M.D. (Co-Chairperson), Martin Keller, M.D., and Robert Shapiro, M.D. (Massachusetts General Hospital, Harvard Medical School); Eli Robins, M.D., Paula Clayton, M.D., Theodore Reich, M.D., and Amos Wellner, M.D. (Washington University Medical School); Jean Endicott, Ph.D., and Robert Spitzer, M.D. (Columbia University); Nancy Andreasen, M.D., Ph.D., William Coryell, M.D., and George Winokur, M.D. (University of Iowa); and Jan Fawcett, M.D., and William Scheftner, M.D. (Rush-Presbyterian-St. Luke’s Medical Center). The NIMH Clinical Research Branch was an active collaborator in the origin and development of the Collaborative Program with Martin M. Katz, Ph.D., Branch Chief as the Co-Chairperson, and Robert Hirschfeld, M.D., as the Program Coordinator. Other past contributors include: J. Croughan, M.D.; M.T. Shea, Ph.D.; R. Gibbons, Ph.D.; M.A. Young, Ph.D.; and D.C. Clark, Ph.D.

This research was supported, in part, by NIH grants MH60447 and MH49762.

 

References

Cochran, W. G. 1968. “The effectiveness of adjustment by subclassification in removing bias in observational studies.” Biometrics 24: 295–313.

Drake, C. 1993. “Effects of misspecification of the propensity score on estimators of the treatment effect.” Biometrics 49: 1231–1236.

Fleiss, J. L. 1981. Statistical Methods for Rates and Proportions. New York: John Wiley & Sons.

Hedeker, D., O. Siddiqui, and F. B. Hu. 2000. “Random-effects regression analysis of correlated grouped-time survival data.” Statistical Methods in Medical Research 9: 161–179.

Katz, M. M., and G. L. Klerman. 1979. “Introduction: overview of the clinical studies program.” The American Journal of Psychiatry 136(1): 49–51.

Keller, M. B. 1988. “Undertreatment of major depression.” Psychopharmacology Bulletin 24(1): 75–80.

Keller, M. B., P. W. Lavori, T. I. Mueller, J. Endicott, W. Coryell, R. M. Hirschfeld, T. Shea. 1992. “Time to recovery, chronicity, and levels of psychopathology in major depression: A five-year prospective follow-up of 431 subjects. “Archives of General Psychiatry 49(10):809-816.

Leon, A. C., and D. Hedeker. 2005. “A mixed-effects quintile-stratified propensity adjustment for effectiveness analyses of ordered categorical doses.” Statistics in Medicine 24: 647–658.

Leon, A. C., and D. Hedeker. 2007a. “A comparison of mixed-effects quantile stratification propensity-adjustment strategies for longitudinal treatment effectiveness analyses of continuous outcomes.” Statistics in Medicine 26: 2650–2665.

Leon, A. C., and D. Hedeker. 2007b. “Quintile stratification based on a misspecified propensity score in longitudinal treatment effectiveness analyses of ordinal doses.” Computational Statistics and Data Analysis 51(12): 6114–6122.

Leon, A. C., D. Hedeker, and J. J. Teres. 2007. “Bias reduction in effectiveness analyses of longitudinal ordinal doses with a mixed-effects propensity adjustment.” Statistics in Medicine 26: 110–123.

Leon, A. C., T. L. Mueller, D. A. Solomon, and M. B. Keller. 2001. “A dynamic adaptation of the propensity score adjustment for effectiveness analyses of ordinal doses of treatment.” Statistics in Medicine 20(9/10): 1487–1498.

Leon, A. C., D. A. Solomon, T. I. Mueller, J. Endicott, J. P. Rice, J. D. Maser, W. Coryell, and M. B. Keller. 2003. “A 20-year longitudinal, observational study of somatic antidepressant treatment effectiveness.” The American Journal of Psychiatry 160: 727–733.

Mantel, N., and W. Haenszel. 1959. “Statistical aspects of the analysis of data from retrospective studies of disease.” Journal of the National Cancer Institute 22(4): 719–748.

Rosenbaum, P. R. 2002. Observational Studies. 2d ed. New York: Springer.

Rosenbaum, P. R., and D. B. Rubin. 1983. “The central role of the propensity score in observational studies for causal effects.” Biometrika 70(1): 41–55.

Rosenbaum, P. R., and D. B. Rubin. 1984. “Reducing bias in observational studies using subclassification on the propensity score.” Journal of the American Statistical Association 79(387): 516–24.

Spitzer, R. L., J. Endicott, and E. Robins. 1978. Research diagnostic criteria: rationale and reliability.” Archives of General Psychiatry 35(6): 773–782.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset