376 Handbook of Big Data
respect the knowledge encoded in the statistical model and not introduce new assumptions.
An estimator should be selected for analysis based on its performance (e.g., bias, variance,
robustness) as opposed to convenience or habit.
Common Pitfall: Confusing Estimation Methods with the Causal
Parameters
Causal models and causal parameters help to specify a statistical estimation problem
(i.e., the observed data, statistical model, and estimand) that is optimally informed by
background knowledge and aims to answer the underlying scientific or policy question.
However, there is nothing causal about the estimation step. A given estimand can be
estimated in many different ways, and alternative algorithms can be compared simply
based on their statistical properties, such as bias and variance. For example, (working)
marginal structural models are often used to define a target counterfactual parameter
equal, under needed causal assumptions, to a specific estimand. This estimand can
be estimated with inverse probability weights [31,53], regression of the outcome on
exposure and confounders, or double robust efficient methods [3,54]. There is nothing
more or less causal about these estimators.
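To make this concrete, the following minimal Python sketch applies two of the estimators named above to the same estimand, the covariate-standardized difference in mean outcome. Everything in it is an assumption made for illustration: the data-generating probabilities, the single binary covariate W, and the function names (`simulate`, `substitution_estimate`, `iptw_estimate`) are hypothetical and do not come from the chapter.

```python
import random
from collections import Counter

def simulate(n, seed=0):
    """Draw n hypothetical observations (W, A, Y), all binary."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.4)                      # measured covariate
        a = int(rng.random() < (0.7 if w else 0.3))      # exposure depends on W
        y = int(rng.random() < 0.2 + 0.1 * a + 0.3 * w)  # outcome depends on A and W
        rows.append((w, a, y))
    return rows

def cell_stats(rows):
    """Counts and outcome sums within each (A, W) stratum."""
    count, y_sum = Counter(), Counter()
    for w, a, y in rows:
        count[(a, w)] += 1
        y_sum[(a, w)] += y
    return count, y_sum

def substitution_estimate(rows):
    """Plug stratum-specific outcome means into the estimand and average over W."""
    count, y_sum = cell_stats(rows)
    total = 0.0
    for w, _, _ in rows:
        total += y_sum[(1, w)] / count[(1, w)] - y_sum[(0, w)] / count[(0, w)]
    return total / len(rows)

def iptw_estimate(rows):
    """Weight each outcome by the inverse of the estimated exposure probability."""
    count, _ = cell_stats(rows)
    total = 0.0
    for w, a, y in rows:
        g1 = count[(1, w)] / (count[(1, w)] + count[(0, w)])  # estimated P(A=1 | W=w)
        total += y / g1 if a == 1 else -y / (1 - g1)
    return total / len(rows)

rows = simulate(50_000, seed=1)
print("substitution:", round(substitution_estimate(rows), 3))
print("IPTW:        ", round(iptw_estimate(rows), 3))
```

In this fully stratified setting the two estimators agree up to floating-point error, which underlines the point: they are two algorithms for one statistical quantity. With parametric or data-adaptive nuisance estimates they would generally differ in bias and variance, and would be compared on those grounds.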
20.8 Interpretation of the Results
The last step of the roadmap is interpreting the results. In our running example, the
identifiability assumptions did not hold. Nonetheless, the statistical estimand
(Equation 20.3) always has a statistical interpretation as the difference between the
expected outcome, given the exposure and covariates in the adjustment set, and the expected
outcome, given the control and covariates in the adjustment set, standardized with
respect to the covariate distribution in the population. For our example, $\Psi(P_0)$ can
be interpreted as the marginal risk difference: the difference in the mortality risk
among patients with early versus delayed ART initiation but the same values of the
measured covariates (e.g., baseline CD4 count, age, and sex), averaged with respect
to the distribution of these covariates. This estimand can be considered as the best
approximation to the causal quantity of interest, given the limitations in the observed
data. If the identifiability assumptions hold, our estimate would be endowed with a
causal interpretation: a summary of how the distribution of the data would change
under a specific intervention. For our example, the causal interpretation would be the
difference in the 5-year counterfactual mortality risk if all patients initiated early ART
versus if all patients delayed ART initiation. Further interpretation in terms of the
impact of a real-world intervention or in terms of a randomized trial requires additional
assumptions.
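For reference, the statistical interpretation described above can be written out in the notation of the observed data $O = (W, A, Y)$. The expression below is a sketch of the standard form of this estimand, assuming a discrete adjustment set $W$ so that the standardization is a sum. It is not a reproduction of Equation 20.3.

$$
\Psi(P_0) = E_0\big[E_0(Y \mid A = 1, W) - E_0(Y \mid A = 0, W)\big]
= \sum_{w} \big[E_0(Y \mid A = 1, W = w) - E_0(Y \mid A = 0, W = w)\big] P_0(W = w).
$$

Only if the identifiability assumptions hold does this quantity equal the causal risk difference $E_{U,X,0}[Y(1) - Y(0)]$, where $Y(a)$ denotes the counterfactual outcome under exposure level $a$ (notation assumed here by analogy with the appendix).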
Common Pitfall: Lack of Identifiability Is Different from Statistical Bias
During the identifiability step, we advocate that a clear distinction be made between
assumptions based on knowledge, encoded in the structural causal model $\mathcal{M}^F$, and those
Tutorial for Causal Inference 377
based on convenience, $\mathcal{M}^{F*}$. This delineation emphasizes that the estimand may not
equal the causal parameter. The discrepancy depends on unmeasured quantities and
nontestable assumptions. In other words, the needed assumptions cannot be evaluated
statistically using the observed data alone [1]. Nonetheless, sensitivity analyses can help
in evaluating the potential magnitude of the deviations between the causal parameter
and the statistical estimand [55–58]. By contrast, the statistical bias of an estimator is a
statistical concept, characterizing how an estimator performs on average across multiple
repetitions of the experiment. Statistical bias can be evaluated through simulations and
minimized with data-driven techniques.
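As a minimal sketch of evaluating statistical bias by simulation, the Python code below repeatedly draws samples from a hypothetical data-generating process and averages each estimator's error relative to the statistical estimand. The probabilities, the value `TRUE_ESTIMAND`, and the function names are all assumptions chosen for illustration. The bias computed here is measured against the estimand. It says nothing about whether the estimand equals a causal parameter.

```python
import random
from collections import Counter

# Value of the standardized risk difference implied by the hypothetical
# outcome probabilities below (0.1 within every stratum of W).
TRUE_ESTIMAND = 0.1

def draw_sample(n, rng):
    """One hypothetical experiment: n draws of binary (W, A, Y)."""
    sample = []
    for _ in range(n):
        w = int(rng.random() < 0.4)
        a = int(rng.random() < (0.7 if w else 0.3))
        y = int(rng.random() < 0.2 + 0.1 * a + 0.3 * w)
        sample.append((w, a, y))
    return sample

def unadjusted(sample):
    """Crude difference in mean outcome between exposure groups."""
    y1 = [y for _, a, y in sample if a == 1]
    y0 = [y for _, a, y in sample if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def adjusted(sample):
    """Stratum-specific differences, standardized to the distribution of W."""
    count, y_sum = Counter(), Counter()
    for w, a, y in sample:
        count[(a, w)] += 1
        y_sum[(a, w)] += y
    estimate = 0.0
    for w in (0, 1):
        n_w = count[(0, w)] + count[(1, w)]
        diff = y_sum[(1, w)] / count[(1, w)] - y_sum[(0, w)] / count[(0, w)]
        estimate += diff * n_w / len(sample)
    return estimate

def estimated_bias(estimator, reps=200, n=2000, seed=0):
    """Average estimate across repeated experiments, minus the estimand."""
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(n, rng)) for _ in range(reps)]
    return sum(estimates) / reps - TRUE_ESTIMAND

print("bias of unadjusted estimator:", round(estimated_bias(unadjusted), 3))
print("bias of adjusted estimator:  ", round(estimated_bias(adjusted), 3))
```

In this setup the adjusted estimator is close to unbiased for the estimand, while the crude difference is not. Both statements are purely statistical and can be checked from simulated data alone.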
20.9 Conclusion
In this chapter, we introduced a formal framework for causal inference [3,4]. Our
running example was to estimate the effect of early ART initiation (within 1 month
of diagnosis) on 5-year mortality risk among HIV+ adults in Sub-Saharan Africa. Our
structural causal model $\mathcal{M}^F$ only reflected the causal ordering of our variables; we
did not make any exclusion restrictions, independence assumptions, or functional form
assumptions. Counterfactual outcomes were generated by deterministically intervening on
the data generating system, described by the structural causal model, to set $A = 1$
(i.e., early initiation) and also to set $A = 0$ (i.e., delayed initiation). We focused on
the average treatment effect for this static exposure. The observed data $O = (W, A, Y)$
were assumed to be generated by sampling $n$ independent times from a probability
distribution compatible with the structural causal model $\mathcal{M}^F$, which implied a
nonparametric statistical model $\mathcal{M}$. Although our identifiability assumptions did not hold,
we still defined a statistical estimand $\Psi(P_0)$ as a best approximation of our wished-for
causal quantity. We briefly discussed a simple (parametric) substitution estimator and
a targeted substitution estimator (TMLE), which allows for data-adaptive estimation
while obtaining valid inference. Because our needed identifiability assumptions were not
met, we interpreted our estimate as the marginal difference between the mortality risk, given
early ART initiation and the measured covariates, and the mortality risk, given delayed
ART initiation and the measured covariates, standardized with respect to the covariate
distribution.
This framework is easily extended to more complicated data structures. Consider,
for example, the following scientific questions, corresponding to interventions on multiple
exposure nodes and to alternate counterfactual treatment assignment mechanisms:
Longitudinal treatment effects [31,51,53,54,59–69]: How does cumulative time until ART
initiation affect mortality among recently diagnosed HIV+ adults? What is the effect of
routine HIV viral load monitoring, compared to routine CD4+ T cell count monitoring,
on mortality among patients initiating early ART? What would be the impact of early ART
initiation on the 5-year mortality if there were no losses to follow-up?
Dynamic regimes (individualized treatment rules) [22–25,61,70–72]: How would mortal-
ity have differed if HIV+ adults initiated ART based on HIV RNA viral loads as opposed
to CD4+ T cell counts?
Direct and indirect effects [73–76]: What is the direct effect of early ART initiation on
5-year mortality that is not mediated through changes in HIV RNA viral load?
Stochastic interventions (nondeterministic interventions) [26]: What would be the
5-year mortality if the distribution of time until ART initiation shifted toward shorter
wait times? What is the impact of early ART initiation on 5-year mortality if HIV RNA
viral load, the intermediate, remained at the value it would have been in the absence of
the exposure (i.e., the natural direct effect [77–79])?
Overall, access to unprecedented amounts of data does not undo the age-old adage:
“correlation is not causation.” Indeed, there are numerous sources of association (depen-
dence) between two variables: direct effects, indirect effects, measured confounding, unmea-
sured confounding, and selection bias. The methods, introduced here, allow researchers to
move from saying drug X is associated with an adverse side effect to saying (under the
necessary and transparently stated assumptions) an adverse side effect is caused by drug
X. Even if the needed identifiability assumptions are not expected to hold, this framework
helps us to estimate a statistical parameter that comes as close as possible to the desired
causal parameter.
In other words, this framework ensures that the scientific question is driving the analysis
and not the other way around.
Appendix: Extensions to Multiple Time Point Interventions
As an introduction to causal inference, we focused on causal parameters corresponding to a
static intervention on a single node. In this appendix, we step through the causal roadmap
for an example of a longitudinal effect, corresponding to a multiple time point intervention.
Step 1—Specify the scientific question: What is the effect of delayed ART initiation
on patient outcomes? As before, we want to be specific about the target population:
recently diagnosed HIV+ adults in Sub-Saharan Africa. We also need to be clear about
the definition and timing of the exposures. For simplicity, let us assume that the patients
have monthly clinic visits and therefore could initiate ART or not each month. (This
framework could easily be extended to shorter or longer time intervals.) Suppose the
outcome is viral suppression after 12 months of follow-up.
Step 2—Specify the causal model: Let baseline ($t = 0$) be the time that the patient
is diagnosed with HIV. Let $L_0$ represent the vector of baseline covariates, including
sociodemographics, clinical measurements, and social constructs. Likewise, let $L_t$ represent
the vector of time-updated covariates (e.g., clinical measurements). Let $A_t$ be an indicator
that the patient initiated ART at time $t$. For example, $A_0 = 1$ represents starting ART
on the same day as diagnosis (i.e., month 0), whereas $A_1 = 1$ represents initiation at the
first month visit. Finally, let $Y$ be an indicator that the patient had undetectable HIV
RNA viral load at the end of follow-up. For simplicity, let us consider only three time
points and assume complete follow-up. Our structural causal model $\mathcal{M}^F$, only reflecting
the causal ordering, is given by
Endogenous nodes: $X = (L_0, A_0, L_1, A_1, Y)$
Exogenous nodes: $U = (U_{L_0}, U_{A_0}, U_{L_1}, U_{A_1}, U_Y)$ with some true joint distribution
$P_{U,0}$. We place no assumptions on the set of possible distributions for $U$. (During the
identifiability step, we will need to make some independence assumptions. However,
we want to keep our true knowledge, as specified by structural causal model $\mathcal{M}^F$,
separate from the additional assumptions needed for identifiability.)
Structural equations:
$$
\begin{aligned}
L_0 &= f_{L_0}(U_{L_0}) \\
A_0 &= f_{A_0}(L_0, U_{A_0}) \\
L_1 &= f_{L_1}(L_0, A_0, U_{L_1}) \\
A_1 &= f_{A_1}(L_0, A_0, L_1, U_{A_1}) \\
Y &= f_Y(L_0, A_0, L_1, A_1, U_Y).
\end{aligned}
$$
We have not made any exclusion restrictions or independence assumptions. The
corresponding directed acyclic graph is given in Figure 20.5a.
Step 3—Specify the target causal quantity: Let $Y(a_0, a_1)$ denote the counterfactual
outcome (viral suppression) if a patient, possibly contrary to fact, had treatment history
$(a_0, a_1)$. Counterfactuals are generated by intervening on the structural causal model:
$$
\begin{aligned}
L_0 &= f_{L_0}(U_{L_0}) \\
A_0 &= a_0 \\
L_1 &= f_{L_1}(L_0, a_0, U_{L_1}) \\
A_1 &= a_1 \\
Y &= f_Y(L_0, a_0, L_1, a_1, U_Y).
\end{aligned}
$$
For the two binary exposures (initiate or not at time $t$), the set of possible exposure
combinations is $\mathcal{A} = \{10, 01, 00\}$ (the combination 11 is excluded because ART can be
initiated only once). For example, $Y(0, 1)$ corresponds to preventing ART
initiation at month 0 and starting ART at the 1-month clinic visit. Suppose our goal
is to contrast the expected counterfactual outcome if, possibly contrary to fact, all patients
immediately initiated ART with the expected counterfactual outcome if, possibly
contrary to fact, all patients delayed ART initiation until 1 month after diagnosis:
$$
\Psi^F(P_{U,X,0}) = E_{U,X,0}\big[Y(1, 0) - Y(0, 1)\big].
$$
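The mechanics of generating counterfactuals by intervention can be sketched in Python. The functional forms and probabilities below are hypothetical, since the structural causal model itself leaves every $f$ unspecified. The sketch also assumes independent uniform background factors, which the model does not require. The key point is that the same draw of $U$ is passed through both intervened systems.

```python
import random

# Hypothetical structural equations; each U is an independent Uniform(0, 1) draw.
def f_L0(u_l0):
    return int(u_l0 < 0.5)

def f_A0(l0, u_a0):
    return int(u_a0 < 0.2 + 0.4 * l0)

def f_L1(l0, a0, u_l1):
    return int(u_l1 < 0.3 + 0.2 * l0 + 0.3 * a0)

def f_A1(l0, a0, l1, u_a1):
    # ART can be initiated only once, so A1 = 0 whenever A0 = 1.
    return 0 if a0 == 1 else int(u_a1 < 0.3 + 0.4 * l1)

def f_Y(l0, a0, l1, a1, u_y):
    return int(u_y < 0.2 + 0.1 * l0 + 0.3 * a0 + 0.2 * l1 + 0.15 * a1)

def draw_background(rng):
    """One draw of the exogenous factors U."""
    return {name: rng.random() for name in ("L0", "A0", "L1", "A1", "Y")}

def observed(u):
    """Run the system without intervention to obtain (L0, A0, L1, A1, Y)."""
    l0 = f_L0(u["L0"])
    a0 = f_A0(l0, u["A0"])
    l1 = f_L1(l0, a0, u["L1"])
    a1 = f_A1(l0, a0, l1, u["A1"])
    return l0, a0, l1, a1, f_Y(l0, a0, l1, a1, u["Y"])

def counterfactual_Y(u, a0, a1):
    """Replace the equations for A0 and A1 by constants; keep everything else."""
    l0 = f_L0(u["L0"])
    l1 = f_L1(l0, a0, u["L1"])
    return f_Y(l0, a0, l1, a1, u["Y"])

def target_parameter(n=50_000, seed=0):
    """Monte Carlo approximation of E[Y(1,0) - Y(0,1)] under this toy model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        u = draw_background(rng)
        total += counterfactual_Y(u, 1, 0) - counterfactual_Y(u, 0, 1)
    return total / n

print("approximate target parameter:", round(target_parameter(), 3))
```

Under these hypothetical equations the target parameter can also be worked out by hand: $E[Y(1,0)] = 0.69$ and $E[Y(0,1)] = 0.48$, a difference of 0.21. In practice $U$ and the functions $f$ are unknown, which is why the identifiability step is needed.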
[Figure 20.5: two directed acyclic graphs over the nodes $L_0$, $A_0$, $L_1$, $A_1$, $Y$; panel (a) additionally shows the background factors $U$.]
FIGURE 20.5
Directed acyclic graph corresponding to the longitudinal effect when (a) we make no
independence assumptions on background factors and (b) when we assume that the background
factors are all independent. $L_0$ denotes baseline covariates; $A_0$ denotes whether the patient
initiated ART at $t = 0$; $L_1$ denotes time-updated covariates; $A_1$ denotes whether the
patient initiated ART at $t = 1$; and $Y$ denotes undetectable viral load.
Step 4—Specify the observed data and its link to the causal model: The observed data
consist of $n$ i.i.d. copies of
$$
O = (L_0, A_0, L_1, A_1, Y) \sim P_0.
$$
We assume that the observed data were generated by sampling $n$ independent times
from a data generating process compatible with $\mathcal{M}^F$. The resulting statistical model $\mathcal{M}$,
describing the possible observed data distributions, is nonparametric.
Step 5—Assess identifiability: For the purposes of discussion, suppose that the unmeasured
factors $U = (U_{L_0}, U_{A_0}, U_{L_1}, U_{A_1}, U_Y)$ are all independent (Figure 20.5b). Even if this
assumption held, there is no single set of covariates that simultaneously satisfies the
back-door criterion for all intervention nodes. The baseline covariates $L_0$ alone fail, because
there is an unblocked back-door path from $Y$ through $L_1$ to $A_1$. In other words, the effect
of initiation at 1 month $A_1$ on the outcome $Y$ is confounded by time-updated covariates
$L_1$. The baseline and time-updated covariates $(L_0, L_1)$ jointly fail, because we are losing
(blocking) the effect of early ART initiation $A_0$ on the outcome $Y$ that goes through the
covariates $L_1$. This challenge is generally known as time-dependent confounding [27,31,48]:
time-varying covariates confound the effect of future exposures on the outcome, but are
affected by past exposures.
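A small simulation can illustrate both failures. The Python sketch below uses a hypothetical data-generating process (all probabilities and function names are assumptions, with independent background factors as in Figure 20.5b) in which the true contrast $E[Y(1,0) - Y(0,1)]$ equals 0.21 by construction. It then standardizes the outcome over $L_0$ alone and over $(L_0, L_1)$ jointly.

```python
import random
from collections import Counter

def simulate(n, seed=0):
    """Hypothetical observed data (L0, A0, L1, A1, Y), all binary."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)
        a0 = int(rng.random() < 0.2 + 0.4 * l0)
        l1 = int(rng.random() < 0.3 + 0.2 * l0 + 0.3 * a0)            # affected by A0
        a1 = 0 if a0 == 1 else int(rng.random() < 0.3 + 0.4 * l1)     # depends on L1
        y = int(rng.random() < 0.2 + 0.1 * l0 + 0.3 * a0 + 0.2 * l1 + 0.15 * a1)
        rows.append((l0, a0, l1, a1, y))
    return rows

def standardized_mean(rows, a0, a1, adjust_for_l1):
    """Mean of Y among those with exposure history (a0, a1), standardized to the
    marginal distribution of L0, or of (L0, L1) if adjust_for_l1 is True."""
    n_stratum, n_exposed, y_exposed = Counter(), Counter(), Counter()
    for l0, x0, l1, x1, y in rows:
        key = (l0, l1) if adjust_for_l1 else l0
        n_stratum[key] += 1
        if (x0, x1) == (a0, a1):
            n_exposed[key] += 1
            y_exposed[key] += y
    return sum(y_exposed[k] / n_exposed[k] * n_stratum[k] / len(rows)
               for k in n_stratum)

def naive_contrast(rows, adjust_for_l1):
    """Single-adjustment-set contrast of the histories (1, 0) and (0, 1)."""
    return (standardized_mean(rows, 1, 0, adjust_for_l1)
            - standardized_mean(rows, 0, 1, adjust_for_l1))

rows = simulate(200_000, seed=1)
print("adjusting for L0 only:   ", round(naive_contrast(rows, False), 3))
print("adjusting for (L0, L1):  ", round(naive_contrast(rows, True), 3))
```

Under these hypothetical equations, adjusting for $L_0$ alone converges to roughly 0.17 and adjusting for $(L_0, L_1)$ jointly to roughly 0.15, neither of which is the true 0.21. The first leaves the path through $L_1$ to $A_1$ unblocked. The second removes the part of the effect of $A_0$ that operates through $L_1$.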
To identify the effects of longitudinal interventions, we consider the problem sequentially.
For each $A_k$ in sequence, we ask if its effect on $Y$ can be identified by conditioning on some
subset of the observed past. This leads to the sequential randomization assumption [27]:
$$
Y(a_0, a_1) \perp\!\!\!\perp A_0 \mid L_0 \quad \text{and} \quad Y(a_0, a_1) \perp\!\!\!\perp A_1 \mid (L_0, A_0, L_1).
$$
In words, we assume that the counterfactual outcome $Y(a_0, a_1)$ is independent of the
intervention $A_k$ at time $k$, given the observed past. With the sequential randomization
assumption as well as a longitudinal version of the positivity assumption, the expectation
of counterfactual outcomes, indexed by multiple interventions, can be identified by the
longitudinal G-computation formula [27]:
$$
\begin{aligned}
E_{U,X,0}[Y(a_0, a_1)] = \sum_{l_0, l_1} & E_0(Y \mid A_1 = a_1, L_1 = l_1, A_0 = a_0, L_0 = l_0) \\
& \times P_0(L_1 = l_1 \mid A_0 = a_0, L_0 = l_0) \, P_0(L_0 = l_0) = \Psi(P_0).
\end{aligned}
$$
Now we are averaging with respect to the appropriate distribution of covariates and
thereby capturing the effect of both exposures $(a_0, a_1)$ on the outcome $Y$ through the
covariates $(L_0, L_1)$.
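The longitudinal G-computation formula can be evaluated directly when all covariates are discrete. The Python sketch below is a nonparametric plug-in version for binary $L_0$ and $L_1$, applied to data from a hypothetical process. The probabilities and the names `simulate` and `g_computation` are assumptions for illustration, and the process is constructed so that sequential randomization and positivity hold for the two histories considered.

```python
import random
from collections import Counter

def simulate(n, seed=0):
    """Hypothetical observed data (L0, A0, L1, A1, Y), all binary."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)
        a0 = int(rng.random() < 0.2 + 0.4 * l0)
        l1 = int(rng.random() < 0.3 + 0.2 * l0 + 0.3 * a0)
        a1 = 0 if a0 == 1 else int(rng.random() < 0.3 + 0.4 * l1)
        y = int(rng.random() < 0.2 + 0.1 * l0 + 0.3 * a0 + 0.2 * l1 + 0.15 * a1)
        rows.append((l0, a0, l1, a1, y))
    return rows

def g_computation(rows, a0, a1):
    """Plug empirical frequencies into the longitudinal G-computation formula."""
    n_l0 = Counter()      # count of L0 = l0
    n_l0_a0 = Counter()   # count of L0 = l0 among A0 = a0
    n_l1 = Counter()      # count of (L0, L1) among A0 = a0
    n_full = Counter()    # count of (L0, L1) among A0 = a0, A1 = a1
    y_full = Counter()    # sum of Y in the same cells
    for l0, x0, l1, x1, y in rows:
        n_l0[l0] += 1
        if x0 != a0:
            continue
        n_l0_a0[l0] += 1
        n_l1[(l0, l1)] += 1
        if x1 != a1:
            continue
        n_full[(l0, l1)] += 1
        y_full[(l0, l1)] += y
    psi = 0.0
    for l0 in (0, 1):
        for l1 in (0, 1):
            e_y = y_full[(l0, l1)] / n_full[(l0, l1)]   # E(Y | a1, l1, a0, l0)
            p_l1 = n_l1[(l0, l1)] / n_l0_a0[l0]         # P(L1 = l1 | a0, l0)
            p_l0 = n_l0[l0] / len(rows)                 # P(L0 = l0)
            psi += e_y * p_l1 * p_l0
    return psi

rows = simulate(200_000, seed=1)
contrast = g_computation(rows, 1, 0) - g_computation(rows, 0, 1)
print("G-computation contrast:", round(contrast, 3))
```

Under these hypothetical equations the counterfactual means are 0.69 for history (1, 0) and 0.48 for history (0, 1), so the estimate should be close to 0.21. The distribution of $L_1$ is taken conditional on the intervened value $a_0$, not marginally, which is what lets the effect of $A_0$ through $L_1$ be retained.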
Step 6—Estimation and inference: As with single time point interventions, there are
a variety of methods to estimate statistical parameters, corresponding under the
necessary assumptions to longitudinal causal effects. Examples include longitudinal
IPTW, “parametric G-computation” (maximum likelihood estimation of the longitudinal
G-computation formula), and TMLE [31,51,53,54,59–69].
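As one illustration of these options, the following Python sketch implements a simple longitudinal IPTW estimator of $E[Y(a_0, a_1)]$ with nonparametrically estimated treatment probabilities. The data-generating process, its probabilities, and the function names are hypothetical assumptions. This is not the implementation used in the cited references.

```python
import random
from collections import Counter

def simulate(n, seed=0):
    """Hypothetical observed data (L0, A0, L1, A1, Y), all binary."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)
        a0 = int(rng.random() < 0.2 + 0.4 * l0)
        l1 = int(rng.random() < 0.3 + 0.2 * l0 + 0.3 * a0)
        a1 = 0 if a0 == 1 else int(rng.random() < 0.3 + 0.4 * l1)
        y = int(rng.random() < 0.2 + 0.1 * l0 + 0.3 * a0 + 0.2 * l1 + 0.15 * a1)
        rows.append((l0, a0, l1, a1, y))
    return rows

def iptw_mean(rows, a0, a1):
    """Weight subjects who followed (a0, a1) by the inverse of their estimated
    probability of following that history, given the observed past."""
    n_l0, n_l0_a0 = Counter(), Counter()
    n_hist, n_hist_a1 = Counter(), Counter()
    for l0, x0, l1, x1, _ in rows:
        n_l0[l0] += 1
        n_l0_a0[(l0, x0)] += 1
        n_hist[(l0, x0, l1)] += 1
        n_hist_a1[(l0, x0, l1, x1)] += 1
    total = 0.0
    for l0, x0, l1, x1, y in rows:
        if (x0, x1) != (a0, a1):
            continue
        g0 = n_l0_a0[(l0, x0)] / n_l0[l0]                        # P(A0 = a0 | L0)
        g1 = n_hist_a1[(l0, x0, l1, x1)] / n_hist[(l0, x0, l1)]  # P(A1 = a1 | L0, A0, L1)
        total += y / (g0 * g1)
    return total / len(rows)

rows = simulate(200_000, seed=1)
contrast = iptw_mean(rows, 1, 0) - iptw_mean(rows, 0, 1)
print("longitudinal IPTW contrast:", round(contrast, 3))
```

Under the hypothetical equations used here, the true contrast is 0.21. With high-dimensional or continuous covariates, the treatment probabilities would have to be modeled or learned data-adaptively, and the choice among IPTW, G-computation, and TMLE would then matter for bias and variance.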
Step 7—Interpretation of the Results: As with the single time point setting, the strength of
our interpretations depends on rigorous evaluation of the needed assumptions. Even when
the identifiability assumptions do not hold, we always have a statistical interpretation
of $\Psi(P_0)$.