20. Tutorial for Causal Inference (4/6)

376 Handbook of Big Data

respect the knowledge encoded in the statistical model and not introduce new assumptions.

An estimator should be selected for analysis based on its performance (e.g., bias, variance,

robustness) as opposed to convenience or habit.

Common Pitfall: Confusing Estimation Methods with the Causal

Parameters

Causal models and causal parameters help to specify a statistical estimation problem

(i.e., the observed data, statistical model, and estimand) that is optimally informed by

background knowledge and aims to answer the underlying scientiﬁc or policy question.

However, there is nothing causal about the estimation step. A given estimand can be

estimated in many diﬀerent ways, and alternative algorithms can be compared simply

based on their statistical properties, such as bias and variance. For example, (working)

marginal structural models are often used to deﬁne a target counterfactual parameter

equal, under needed causal assumptions, to a speciﬁc estimand. This estimand can

be estimated with inverse probability weights [31,53], regression of the outcome on

exposure and confounders, or double robust eﬃcient methods [3,54]. There is nothing

more or less causal about these estimators.
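
To make this concrete, the sketch below applies three estimators (a simple substitution estimator, inverse probability weighting, and a double robust augmented IPW estimator) to the same statistical estimand. The data-generating probabilities, the variable names, and the choice of saturated fits are assumptions made for illustration; they are not the chapter's running example. With saturated fits on a single binary covariate, the three estimates coincide up to rounding, which illustrates that the estimators differ in statistical properties, not in how causal they are.

```python
import random

def simulate(n, seed=0):
    """Draw n hypothetical observations (W, A, Y), all binary (assumed model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.4)                      # measured covariate
        a = int(rng.random() < (0.7 if w else 0.3))      # exposure depends on W
        y = int(rng.random() < 0.2 + 0.1 * a + 0.3 * w)  # outcome depends on A and W
        rows.append((w, a, y))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def fit(rows):
    """Saturated estimates of E(Y | A, W) and P(A = 1 | W)."""
    q = {(a, w): mean(y for wi, ai, y in rows if (ai, wi) == (a, w))
         for a in (0, 1) for w in (0, 1)}
    g = {w: mean(ai for wi, ai, _ in rows if wi == w) for w in (0, 1)}
    return q, g

def substitution(rows, q):
    """Plug the outcome regression into the estimand and average over W."""
    return mean(q[1, w] - q[0, w] for w, _, _ in rows)

def iptw(rows, g):
    """Weight each outcome by the inverse probability of the exposure received."""
    return mean(a * y / g[w] - (1 - a) * y / (1 - g[w]) for w, a, y in rows)

def double_robust(rows, q, g):
    """Augmented IPW: substitution estimate plus a weighted residual correction."""
    return mean(q[1, w] - q[0, w]
                + (a / g[w] - (1 - a) / (1 - g[w])) * (y - q[a, w])
                for w, a, y in rows)

rows = simulate(20000)
q, g = fit(rows)
estimates = (substitution(rows, q), iptw(rows, g), double_robust(rows, q, g))
print(estimates)
```

In this assumed model the estimand equals 0.1 by construction. With parametric or data-adaptive fits the three estimates would generally differ, and the choice among them would rest on bias, variance, and robustness.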

20.8 Interpretation of the Results

The last step of the roadmap is interpreting the results. In our running example, the

identifiability assumptions did not hold. Nonetheless, the statistical estimand

(Equation 20.3) always has a statistical interpretation as the difference in the expected

outcome, given the exposure and covariates in the adjustment set, and the expected

outcome, given the control and covariates in the adjustment set, standardized with

respect to the covariate distribution in the population. For our example, $\Psi(P_0)$ can

be interpreted as the marginal risk diﬀerence: the diﬀerence in the mortality risk

among patients with early versus delayed ART initiation but the same values of the

measured covariates (e.g., baseline CD4 count, age, and sex), averaged with respect

to the distribution of these covariates. This estimand can be considered as the best

approximation to the causal quantity of interest, given the limitations in the observed

data. If the identiﬁability assumptions hold, our estimate would be endowed with a

causal interpretation: a summary of how the distribution of the data would change

under a speciﬁc intervention. For our example, the causal interpretation would be the

diﬀerence in the 5-year counterfactual mortality risk if all patients initiated early ART

versus if all patients delayed ART initiation. Further interpretation in terms of the

impact of a real-world intervention or in terms of a randomized trial requires additional

assumptions.
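
The two interpretations can be set side by side in symbols. Equation 20.3 itself is not reproduced in this part of the chapter, so the display below is a reconstruction from the verbal description above, assuming the adjustment set is the covariate vector $W$ from the observed data $O = (W, A, Y)$ and that $Y(a)$ denotes the counterfactual outcome under exposure level $a$.

```latex
\underbrace{\Psi(P_0) = E_0\big[\, E_0(Y \mid A = 1, W) - E_0(Y \mid A = 0, W) \,\big]}_{\text{statistical estimand: always interpretable}}
\qquad \text{versus} \qquad
\underbrace{E_{U,X,0}\big[\, Y(1) - Y(0) \,\big]}_{\text{causal parameter: requires identifiability}}
```

The left-hand quantity is a function of the observed data distribution only; the two are equal only under the identifiability assumptions.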

Common Pitfall: Lack of Identiﬁability Is Diﬀerent from Statistical Bias

During the identiﬁability step, we advocate that a clear distinction be made between

assumptions based on knowledge, encoded in the structural causal model $\mathcal{M}^F$, and those

based on convenience, $\mathcal{M}^{F*}$. This delineation emphasizes that the estimand may not

equal the causal parameter. The discrepancy depends on unmeasured quantities and

nontestable assumptions. In other words, the needed assumptions cannot be evaluated

statistically using the observed data alone [1]. Nonetheless, sensitivity analyses can help

in evaluating the potential magnitude of the deviations between the causal parameter

and the statistical estimand [55–58]. By contrast, the statistical bias of an estimator is a

statistical concept, characterizing how an estimator performs on average across multiple

repetitions of the experiment. Statistical bias can be evaluated through simulations and

minimized with data-driven techniques.
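
A minimal simulation sketch of this idea follows. It repeats a hypothetical experiment many times and compares the average of each estimator with the known value of the statistical estimand. The data-generating probabilities, sample size, and function names are assumptions chosen for illustration. Note that the bias is measured against the statistical estimand; the simulation says nothing about whether that estimand equals a causal parameter.

```python
import random

def draw(n, rng):
    """One repetition of a hypothetical experiment: n draws of binary (W, A, Y)."""
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.8 if w else 0.2))
        y = int(rng.random() < 0.2 + 0.1 * a + 0.4 * w)
        rows.append((w, a, y))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def unadjusted(rows):
    """Difference in mean outcome between exposed and unexposed, ignoring W."""
    return (mean(y for _, a, y in rows if a == 1)
            - mean(y for _, a, y in rows if a == 0))

def standardized(rows):
    """Stratum-specific differences averaged over the empirical distribution of W.
    Assumes every (A, W) cell is nonempty, which holds with near certainty here."""
    total = 0.0
    for w in (0, 1):
        stratum = [(a, y) for wi, a, y in rows if wi == w]
        diff = (mean(y for a, y in stratum if a == 1)
                - mean(y for a, y in stratum if a == 0))
        total += diff * len(stratum) / len(rows)
    return total

def statistical_bias(estimator, truth, repetitions=200, n=1000, seed=0):
    """Average estimate across repetitions minus the true estimand value."""
    rng = random.Random(seed)
    return mean(estimator(draw(n, rng)) for _ in range(repetitions)) - truth

TRUE_ESTIMAND = 0.1  # value of the standardized risk difference in the assumed model

print(statistical_bias(standardized, TRUE_ESTIMAND))
print(statistical_bias(unadjusted, TRUE_ESTIMAND))
```

In this assumed model the unadjusted contrast converges to 0.34 rather than 0.1, so its bias for the estimand is large, while the standardized estimator is approximately unbiased.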

20.9 Conclusion

In this chapter, we introduced a formal framework for causal inference [3,4]. Our

running example was to estimate the eﬀect of early ART initiation (within 1 month

of diagnosis) on 5-year mortality risk among HIV+ adults in Sub-Saharan Africa. Our

structural causal model $\mathcal{M}^F$ only reflected the causal ordering of our variables; we

did not make any exclusion restrictions, independence assumptions, or functional form

assumptions. Counterfactual outcomes were generated by deterministically intervening on

the data generating system, described by the structural causal model, to set A = 1

(i.e., early initiation) and also to set A = 0 (i.e., delayed initiation). We focused on

the average treatment effect for this static exposure. The observed data O = (W, A, Y)

were assumed to be generated by sampling n independent times from a probability

distribution compatible with the structural causal model $\mathcal{M}^F$, which implied a

nonparametric statistical model $\mathcal{M}$. Although our identifiability assumptions did not hold,

we still defined a statistical estimand $\Psi(P_0)$ as a best approximation of our wished-for

causal quantity. We brieﬂy discussed a simple (parametric) substitution estimator and

a targeted substitution estimator (TMLE), which allows for data-adaptive estimation

while obtaining valid inference. Because our needed identiﬁability assumptions were not

met, we interpreted our estimate as the marginal diﬀerence in the mortality risk, given

early ART initiation and the measured covariates, and the mortality risk, given delayed

ART initiation and the measured covariates, standardized with respect to the covariate

distribution.

This framework is easily extended to more complicated data structures. Consider,

for example, the following scientiﬁc questions, corresponding to interventions on multiple

exposure nodes and to alternate counterfactual treatment assignment mechanisms:

• Longitudinal treatment eﬀects [31,51,53,54,59–69]: How does cumulative time until ART

initiation aﬀect mortality among recently diagnosed HIV+ adults? What is the eﬀect of

routine HIV viral load monitoring, compared to routine CD4+ T cell count monitoring,

on mortality among patients initiating early ART? What would be the impact of early ART

initiation on the 5-year mortality if there were no losses to follow-up?

• Dynamic regimes (individualized treatment rules) [22–25,61,70–72]: How would mortality

have differed if HIV+ adults initiated ART based on HIV RNA viral loads as opposed

to CD4+ T cell counts?

• Direct and indirect eﬀects [73–76]: What is the direct eﬀect of early ART initiation on

5-year mortality that is not mediated through changes in HIV RNA viral load?

• Stochastic interventions (nondeterministic interventions) [26]: What would be the

5-year mortality if the distribution of time until ART initiation shifted toward shorter

wait times? What is the impact of early ART initiation on 5-year mortality if HIV RNA

viral load, the intermediate, remained at the value it would have been in the absence of

the exposure (i.e., the natural direct eﬀect [77–79])?

Overall, access to unprecedented amounts of data does not undo the age-old adage:

“correlation is not causation.” Indeed, there are numerous sources of association

(dependence) between two variables: direct effects, indirect effects, measured confounding,

unmeasured confounding, and selection bias. The methods introduced here allow researchers to

move from saying drug X is associated with an adverse side eﬀect to saying (under the

necessary and transparently stated assumptions) an adverse side eﬀect is caused by drug

X. Even if the needed identiﬁability assumptions are not expected to hold, this framework

helps us to estimate a statistical parameter that comes as close as possible to the wished-for causal parameter.

In other words, this framework ensures that the scientiﬁc question is driving the analysis

and not the other way around.

Appendix: Extensions to Multiple Time Point Interventions

As an introduction to causal inference, we focused on causal parameters corresponding to a

static intervention on a single node. In this appendix, we step through the causal roadmap

for an example of a longitudinal eﬀect, corresponding to a multiple time point intervention.

Step 1—Specify the scientiﬁc question: What is the eﬀect of delayed ART initiation

on patient outcomes? As before, we want to be speciﬁc about the target population:

recently diagnosed HIV+ adults in Sub-Saharan Africa. We also need to be clear about

the deﬁnition and timing of the exposures. For simplicity, let us assume that the patients

have monthly clinic visits and therefore could initiate ART or not each month. (This

framework could easily be extended to shorter or longer time intervals.) Suppose the

outcome is viral suppression after 12 months of follow-up.

Step 2—Specify the causal model: Let baseline (t = 0) be the time that the patient

is diagnosed with HIV. Let $L_0$ represent the vector of baseline covariates, including

sociodemographics, clinical measurements, and social constructs. Likewise, let $L_t$ represent

the vector of time-updated covariates (e.g., clinical measurements). Let $A_t$ be an indicator

that the patient initiated ART at time $t$. For example, $A_0 = 1$ represents starting ART

on the same day as diagnosis (i.e., month 0), whereas $A_1 = 1$ represents initiation at the

ﬁrst month visit. Finally, let Y be an indicator that the patient had undetectable HIV

RNA viral load at the end of follow-up. For simplicity, let us consider only three time

points and assume complete follow-up. Our structural causal model $\mathcal{M}^F$, only reflecting

the causal ordering, is given by

Endogenous nodes: $X = (L_0, A_0, L_1, A_1, Y)$

Exogenous nodes: $U = (U_{L_0}, U_{A_0}, U_{L_1}, U_{A_1}, U_Y)$ with some true joint distribution

$P_{U,0}$. We place no assumptions on the set of possible distributions for $U$. (During the

identifiability step, we will need to make some independence assumptions. However,

we want to keep our true knowledge, as specified by structural causal model $\mathcal{M}^F$,

separate from the additional assumptions needed for identifiability.)

Structural equations:

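$$
\begin{aligned}
L_0 &= f_{L_0}(U_{L_0}) \\
A_0 &= f_{A_0}(L_0, U_{A_0}) \\
L_1 &= f_{L_1}(L_0, A_0, U_{L_1}) \\
A_1 &= f_{A_1}(L_0, A_0, L_1, U_{A_1}) \\
Y &= f_Y(L_0, A_0, L_1, A_1, U_Y)
\end{aligned}
$$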

We have not made any exclusion restrictions or independence assumptions. The

corresponding directed acyclic graph is given in Figure 20.5a.

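Step 3—Specify the target causal quantity: Let $Y(a_0, a_1)$ denote the counterfactual

outcome (viral suppression) if a patient, possibly contrary to fact, had treatment history

$(a_0, a_1)$. Counterfactuals are generated by intervening on the structural causal model:

$$
\begin{aligned}
L_0 &= f_{L_0}(U_{L_0}) \\
A_0 &= a_0 \\
L_1(a_0) &= f_{L_1}(L_0, a_0, U_{L_1}) \\
A_1 &= a_1 \\
Y(a_0, a_1) &= f_Y(L_0, a_0, L_1(a_0), a_1, U_Y)
\end{aligned}
$$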

For the two binary exposures (initiate or not at time t), the set of possible exposure

combinations is A = {10, 01, 00}. For example, Y (0, 1) corresponds to preventing ART

initiation at month 0 and starting ART at the 1-month clinic visit. Suppose our goal

is to contrast the expected counterfactual outcome if, possibly contrary to fact, all patients

immediately initiated ART with the expected counterfactual outcome if, possibly

contrary to fact, all patients delayed ART initiation until 1 month after diagnosis:

$$\Psi^F(P_{U,X,0}) = E_{U,X,0}[Y(1, 0) - Y(0, 1)].$$
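
The intervention can be sketched in code. The example below writes down one hypothetical set of structural equations that respects the causal ordering, then generates counterfactual outcomes by replacing the two exposure equations with fixed values while reusing the same background factors. The functional forms, the coefficients, and the independent uniform background factors are all assumptions made for illustration; the structural causal model in the text places no such restrictions.

```python
import random

NODES = ("L0", "A0", "L1", "A1", "Y")

# Hypothetical structural equations; each background factor u is Uniform(0, 1).
def f_L0(u):
    return int(u < 0.5)

def f_A0(l0, u):
    return int(u < 0.3 + 0.4 * l0)

def f_L1(l0, a0, u):
    return int(u < 0.3 + 0.2 * l0 + 0.4 * a0)

def f_A1(l0, a0, l1, u):
    # A patient who already initiated at month 0 cannot initiate again at month 1.
    return (1 - a0) * int(u < 0.8 - 0.5 * l1)

def f_Y(l0, a0, l1, a1, u):
    return int(u < 0.2 + 0.3 * a0 + 0.3 * l1 + 0.1 * a1)

def evaluate(u, regime=None):
    """Solve the equations for one unit with background factors u.
    regime=None gives the observed data; regime=(a0, a1) replaces f_A0 and f_A1."""
    l0 = f_L0(u["L0"])
    a0 = f_A0(l0, u["A0"]) if regime is None else regime[0]
    l1 = f_L1(l0, a0, u["L1"])
    a1 = f_A1(l0, a0, l1, u["A1"]) if regime is None else regime[1]
    return l0, a0, l1, a1, f_Y(l0, a0, l1, a1, u["Y"])

def causal_contrast(n=50000, seed=1):
    """Monte Carlo approximation of E[Y(1, 0) - Y(0, 1)] in the assumed model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        u = {node: rng.random() for node in NODES}
        total += evaluate(u, (1, 0))[4] - evaluate(u, (0, 1))[4]
    return total / n

print(causal_contrast())
```

In this assumed model the target quantity can also be worked out by hand: the expected outcome is 0.74 under immediate initiation and 0.42 under initiation at month 1, a difference of 0.32.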

FIGURE 20.5

Directed acyclic graph corresponding to the longitudinal effect when (a) we make no

independence assumptions on background factors and (b) we assume that the background

factors are all independent. $L_0$ denotes baseline covariates; $A_0$ denotes whether the patient

initiated ART at $t = 0$; $L_1$ denotes time-updated covariates; $A_1$ denotes whether the

patient initiated ART at $t = 1$; and $Y$ denotes undetectable viral load.

Step 4—Specify the observed data and its link to the causal model : The observed data

consist of $n$ i.i.d. copies of

$$O = (L_0, A_0, L_1, A_1, Y) \sim P_0.$$

We assume that the observed data were generated by sampling $n$ independent times

from a data generating process compatible with $\mathcal{M}^F$. The resulting statistical model $\mathcal{M}$,

describing the possible observed data distributions, is nonparametric.

Step 5—Assess identiﬁability: For the purposes of discussion, suppose that the unmeasured

factors $U = (U_{L_0}, U_{A_0}, U_{L_1}, U_{A_1}, U_Y)$ are all independent (Figure 20.5b). Even if this

assumption held, there is not one set of covariates that simultaneously satisfies the back-door

criterion for all intervention nodes. The baseline covariates $L_0$ alone fail, because

there is an unblocked back-door path from $Y$ through $L_1$ to $A_1$. In other words, the effect

of initiation at 1 month $A_1$ on the outcome $Y$ is confounded by time-updated covariates

$L_1$. The baseline and time-updated covariates $(L_0, L_1)$ jointly fail, because we are losing

(blocking) the effect of early ART initiation $A_0$ on the outcome $Y$ that goes through the

covariates $L_1$. This challenge is generally known as time-dependent confounding [27,31,48]:

time-varying covariates confound the eﬀect of future exposures on the outcome, but are

aﬀected by past exposures.

To identify the eﬀects of longitudinal interventions, we consider the problem sequentially.

For each $A_k$ in sequence, we ask if its effect on $Y$ can be identified by conditioning on some

subset of the observed past. This leads to the sequential randomization assumption [27]:

$$Y(a_0, a_1) \perp\!\!\!\perp A_0 \mid L_0 \quad \text{and} \quad Y(a_0, a_1) \perp\!\!\!\perp A_1 \mid L_0, A_0 = a_0, L_1.$$

In words, we assume that the counterfactual outcome $Y(a_0, a_1)$ is independent from the

intervention $A_k$ at time $k$, given the observed past. With the sequential randomization

assumption as well as a longitudinal version of the positivity assumption, the expectation

of counterfactual outcomes, indexed by multiple interventions, can be identiﬁed by the

longitudinal G-computation formula [27]:

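$$
\begin{aligned}
E_{U,X,0}[Y(a_0, a_1)] = \sum_{l_0, l_1} & E_0(Y \mid A_1 = a_1, L_1 = l_1, A_0 = a_0, L_0 = l_0) \\
& \times P_0(L_1 = l_1 \mid A_0 = a_0, L_0 = l_0) \, P_0(L_0 = l_0) = \Psi(P_0).
\end{aligned}
$$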

Now we are averaging with respect to the appropriate distribution of covariates and

thereby capturing the effect of both exposures $(a_0, a_1)$ on the outcome $Y$ through the

covariates $(L_0, L_1)$.
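
The contrast between the longitudinal G-computation formula and a single adjustment set can be checked numerically. The sketch below evaluates each formula exactly on a small, hypothetical observed-data distribution for binary $(L_0, A_0, L_1, A_1, Y)$. The probabilities are assumptions chosen so that sequential randomization and positivity hold by construction; under them the true counterfactual means are 0.74 for regime (1, 0) and 0.42 for regime (0, 1).

```python
from itertools import product

NAMES = ("l0", "a0", "l1", "a1", "y")
CELLS = list(product((0, 1), repeat=5))

def bern(p, x):
    """Probability that a Bernoulli(p) variable equals x."""
    return p if x == 1 else 1.0 - p

def p0(l0, a0, l1, a1, y):
    """Hypothetical observed-data probability mass function (assumed for illustration)."""
    p_a1 = (1 - a0) * (0.8 - 0.5 * l1)  # no second initiation after A0 = 1
    return (bern(0.5, l0)
            * bern(0.3 + 0.4 * l0, a0)
            * bern(0.3 + 0.2 * l0 + 0.4 * a0, l1)
            * bern(p_a1, a1)
            * bern(0.2 + 0.3 * a0 + 0.3 * l1 + 0.1 * a1, y))

def prob(**event):
    """Probability of the event that fixes the named variables at the given values."""
    return sum(p0(*cell) for cell in CELLS
               if all(cell[NAMES.index(k)] == v for k, v in event.items()))

def cond_mean_y(**given):
    """E_0(Y | given); requires the conditioning event to have positive probability."""
    return prob(y=1, **given) / prob(**given)

def g_formula(a0, a1):
    """Longitudinal G-computation: L1 is averaged given the earlier exposure a0."""
    return sum(cond_mean_y(a1=a1, l1=l1, a0=a0, l0=l0)
               * prob(l1=l1, a0=a0, l0=l0) / prob(a0=a0, l0=l0)
               * prob(l0=l0)
               for l0, l1 in product((0, 1), repeat=2))

def adjust_jointly(a0, a1):
    """Single adjustment set (L0, L1), averaged over its marginal distribution."""
    return sum(cond_mean_y(a1=a1, l1=l1, a0=a0, l0=l0) * prob(l0=l0, l1=l1)
               for l0, l1 in product((0, 1), repeat=2))

def adjust_baseline_only(a0, a1):
    """Single adjustment set L0 only."""
    return sum(cond_mean_y(a1=a1, a0=a0, l0=l0) * prob(l0=l0) for l0 in (0, 1))

for name, fn in (("g-formula", g_formula), ("adjust (L0, L1)", adjust_jointly),
                 ("adjust L0", adjust_baseline_only)):
    print(name, fn(1, 0) - fn(0, 1))
```

Only the G-computation formula recovers the contrast of 0.32. Adjusting for $(L_0, L_1)$ jointly removes the part of the effect of $A_0$ that passes through $L_1$, and adjusting for $L_0$ alone leaves the effect of $A_1$ confounded by $L_1$.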

Step 6—Estimation and inference: As with single time point interventions, there are

a variety of methods to estimate statistical parameters, corresponding under the

necessary assumptions to longitudinal causal eﬀects. Examples include longitudinal

IPTW, “parametric G-computation” (maximum likelihood estimation of the longitudinal

G-computation formula), and TMLE [31,51,53,54,59–69].
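
As one illustration of these options, the following is a minimal sketch of longitudinal IPTW on simulated data. It is not the implementation used in the cited work. The simulation probabilities, the saturated (stratum-mean) estimates of the treatment mechanisms, and all function names are assumptions made for illustration. Each patient who followed the regime of interest is weighted by the inverse of the estimated probability of the observed exposure history given the observed past.

```python
import random
from collections import defaultdict

def simulate(n, seed=2):
    """Draw n hypothetical observations (L0, A0, L1, A1, Y), all binary."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)
        a0 = int(rng.random() < 0.3 + 0.4 * l0)
        l1 = int(rng.random() < 0.3 + 0.2 * l0 + 0.4 * a0)
        a1 = (1 - a0) * int(rng.random() < 0.8 - 0.5 * l1)  # initiate at most once
        y = int(rng.random() < 0.2 + 0.3 * a0 + 0.3 * l1 + 0.1 * a1)
        rows.append((l0, a0, l1, a1, y))
    return rows

def stratum_means(rows, key, value):
    """Empirical mean of value(row) within each stratum defined by key(row)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[key(row)] += value(row)
        counts[key(row)] += 1
    return {k: sums[k] / counts[k] for k in counts}

def iptw_mean(rows, a0, a1):
    """IPTW estimate of the standardized mean outcome under regime (a0, a1)."""
    g0 = stratum_means(rows, key=lambda r: r[0], value=lambda r: r[1])   # P(A0=1 | L0)
    g1 = stratum_means(rows, key=lambda r: r[:3], value=lambda r: r[3])  # P(A1=1 | L0, A0, L1)
    total = 0.0
    for row in rows:
        l0, obs_a0, l1, obs_a1, y = row
        if (obs_a0, obs_a1) != (a0, a1):
            continue  # only patients who followed the regime contribute
        p_a0 = g0[l0] if a0 == 1 else 1 - g0[l0]
        p_a1 = g1[row[:3]] if a1 == 1 else 1 - g1[row[:3]]
        total += y / (p_a0 * p_a1)
    return total / len(rows)

rows = simulate(40000)
estimate = iptw_mean(rows, 1, 0) - iptw_mean(rows, 0, 1)
print(estimate)
```

In this assumed model the estimand equals 0.32. In practice the treatment mechanisms would be estimated with parametric or data-adaptive regressions, and weights are often stabilized or truncated when some exposure histories are rare.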

Step 7—Interpretation of the Results: As with the single time point setting, the strength of

our interpretations depends on rigorous evaluation of the needed assumptions. Even when

the identifiability assumptions do not hold, we always have a statistical interpretation

of $\Psi(P_0)$
