21
A Review of Some Recent Advances in Causal
Inference
Marloes H. Maathuis and Preetam Nandy
CONTENTS
21.1 Introduction
21.1.1 Causal versus Noncausal Research Questions
21.1.2 Observational versus Experimental Data
21.1.2.1 Observational Data
21.1.2.2 Experimental Data
21.1.3 Problem Formulation
21.1.3.1 Outline of This Chapter
21.2 Estimating Causal Effects When the Causal Structure Is Known
21.2.1 Graph Terminology
21.2.2 Structural Equation Model
21.2.3 Postintervention Distributions and Causal Effects
21.2.3.1 Truncated Factorization Formula
21.2.3.2 Defining the Total Effect
21.2.3.3 Computing the Total Effect
21.3 Causal Structure Learning
21.3.1 Constraint-Based Methods
21.3.2 Score-Based Methods
21.3.3 Hybrid Methods
21.3.4 Learning SEMs with Additional Restrictions
21.4 Estimating the Size of Causal Effects When the Causal Structure Is Unknown
21.4.1 IDA
21.4.2 JointIDA
21.4.3 Application
21.5 Extensions
21.5.1 Local Causal Structure Learning
21.5.2 Causal Structure Learning in the Presence of Hidden Variables and Feedback Loops
21.5.3 Time Series Data
21.5.4 Causal Structure Learning from Heterogeneous Data
21.5.5 Covariate Adjustment
21.5.6 Measures of Uncertainty
21.6 Summary
References
388 Handbook of Big Data
21.1 Introduction
Causal questions are fundamental in all parts of science. Answering such questions from
observational data is notoriously difficult, but there has been a lot of recent interest and
progress in this field. This chapter gives a selective review of some of these results, intended
for researchers who are not familiar with graphical models and causality, and with a focus
on methods that are applicable to large datasets.
To clarify the problem formulation, we first discuss the difference between causal and
noncausal questions, and between observational and experimental data. We then formulate
the problem setting and give an overview of the rest of this chapter.
21.1.1 Causal versus Noncausal Research Questions
We use a small hypothetical example to illustrate the concepts.
Example 21.1 Suppose that there is a new rehabilitation program for prisoners, aimed at
lowering the recidivism rate. Among a random sample of 1500 prisoners, 500 participated
in the program. All prisoners were followed for a period of 2 years after release from prison,
and it was recorded whether or not they were rearrested within this period. Table 21.1 shows
the (hypothetical) data. We note that the rearrest rate among the participants of the program
(20%) is significantly lower than the rearrest rate among the nonparticipants (50%).
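The rearrest rates quoted above follow directly from the counts in Table 21.1. A minimal sketch of that arithmetic is given here; the names `counts` and `rearrest_rate` are illustrative and not part of the chapter.

```python
# Counts from Table 21.1 (hypothetical data): (rearrested, not rearrested).
counts = {
    "participants": (100, 400),
    "nonparticipants": (500, 500),
}


def rearrest_rate(rearrested, not_rearrested):
    """Return the rearrest rate in percent for one group."""
    return 100.0 * rearrested / (rearrested + not_rearrested)


rates = {group: rearrest_rate(*c) for group, c in counts.items()}
total_prisoners = sum(sum(c) for c in counts.values())

for group, rate in rates.items():
    print(f"{group}: {rate:.0f}%")
print(f"sample size: {total_prisoners}")
```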
We can ask various questions based on these data. For example:
1. Can we predict whether a prisoner will be rearrested, based on participation in
the program (and possibly other variables)?
2. Does the program lower the rearrest rate?
3. What would the rearrest rate be if the program were compulsory for all prisoners?
Question 1 is noncausal, because it involves a standard prediction or classification problem.
We note that this question can be very relevant in practice, for example in parole
considerations. However, because we are interested in causality here, we will not consider
questions of this type.
Questions 2 and 3 are causal. Question 2 asks if the program is the cause of the lower
rearrest rate among the participants. In other words, it asks about the mechanism behind the
data. Question 3 asks for a prediction of the rearrest rate after some novel outside intervention
to the system, namely after making the program compulsory for all prisoners. To make such
a prediction, one needs to understand the causal structure of the system.
Example 21.2 We consider gene expression levels of yeast cells. Suppose that we want
to predict the average gene expression levels after knocking out one of the genes, or after
knocking out multiple genes at a time. These are again causal questions, because we want
to make predictions after interventions to the system.
Thus, causal questions are about the mechanism behind the data or about predictions after
a novel intervention is applied to the system. They arise in all parts of science. Application
areas involving big data include, for example, systems biology (e.g., [12,19,30,32,40,62]),
neuroscience (e.g., [8,20,49,58]), climate science (e.g., [16,17]), and marketing (e.g., [7]).

TABLE 21.1
Hypothetical data about a rehabilitation program for prisoners.
                  Rearrested   Not Rearrested   Rearrest Rate (%)
Participants         100            400                20
Nonparticipants      500            500                50
21.1.2 Observational versus Experimental Data
Going back to the prisoners example, which of the three posed questions can we answer? This
depends on the origin of the data, and brings us to the distinction between observational
and experimental data.
21.1.2.1 Observational Data
Suppose first that participation in the program was voluntary. Then we would have so-called
observational data, because the subjects (prisoners) chose their own treatment (rehabilitation
program or not), while the researchers just observed the results. From observational
data, we can easily answer question 1. It is difficult, however, to answer questions 2 and 3.
Let us first consider question 2. Because the participants form a self-selected subgroup,
there may be many differences between the participants and the nonparticipants. For
example, the participants may be more motivated to change their lives, and this may
contribute to the difference in rearrest rates. In this case, the effects of the program and
the motivation of the prisoners are said to be mixed-up or confounded.
Next, let us consider question 3. At first sight, one may think that the answer is simply
20%, because this was the rearrest rate among the participants of the program. But again
we have to keep in mind that the participants form a self-selected subgroup that is likely to
have special characteristics. Hence, the rearrest rate of this subgroup cannot be extrapolated
to the entire prisoner population.
21.1.2.2 Experimental Data
Now suppose that it was up to the researchers to decide which prisoners participated in the
program. For example, suppose that the researchers rolled a die for each prisoner, and let
him/her participate if the outcome was 1 or 2. Then we would have a so-called randomized
controlled experiment and experimental data.
Let us look again at question 2. Because of the randomization, the motivation level of the
prisoners is likely to be similar in the two groups. Moreover, any other factors of importance
(such as social background, type of crime committed, and number of earlier crimes) are
likely to be similar in the two groups. Hence, the groups are comparable in all respects
(up to chance variation), except for participation in the program. The observed difference
in rearrest rate can therefore be attributed to the program. This answers question 2.
Finally, the answer to question 3 is now 20%, because the randomized treatment
assignment ensures that the participants form a representative sample of the population.
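The contrast between self-selection and randomization can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: a hidden binary trait `motivated`, the selection probabilities, the baseline rearrest probability, and the value of `TRUE_EFFECT` are all hypothetical and do not come from the chapter or from Table 21.1.

```python
import random

TRUE_EFFECT = -0.10  # assumed effect of the program on the rearrest probability


def simulate(n, randomized, seed):
    """Simulate n prisoners; return a list of (participates, rearrested) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        motivated = rng.random() < 0.3  # hidden trait (assumed prevalence)
        if randomized:
            # Die roll: participate if the outcome is 1 or 2.
            participates = rng.randint(1, 6) <= 2
        else:
            # Self-selection: motivated prisoners join far more often (assumed).
            participates = rng.random() < (0.8 if motivated else 0.1)
        # Motivation and the program both lower the rearrest probability.
        p_rearrest = 0.6 - 0.3 * motivated + TRUE_EFFECT * participates
        rows.append((participates, rng.random() < p_rearrest))
    return rows


def rate_difference(rows):
    """Rearrest rate of participants minus rearrest rate of nonparticipants."""
    def rate(flag):
        outcomes = [rearrested for participates, rearrested in rows
                    if participates == flag]
        return sum(outcomes) / len(outcomes)
    return rate(True) - rate(False)


obs_diff = rate_difference(simulate(100_000, randomized=False, seed=0))
rct_diff = rate_difference(simulate(100_000, randomized=True, seed=0))
print(f"observational difference: {obs_diff:+.3f}")
print(f"randomized difference:    {rct_diff:+.3f}")
```

Under these assumptions, the observational difference overstates the benefit of the program, because motivated prisoners both join more often and are rearrested less often. The randomized difference stays close to `TRUE_EFFECT`.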
Thus, causal questions are best answered by experimental data, and we should work
with such data whenever possible. Experimental data are not always available, however,
because randomized controlled experiments can be unethical, infeasible, time consuming, or
expensive. On the other hand, observational data are often relatively cheap and abundant.
In this chapter, we therefore consider the problem of answering causal questions about
large-scale systems from observational data.
21.1.3 Problem Formulation
It is relatively straightforward to make standard predictions based on observational data
(see the observational world in Figure 21.1), or to estimate causal effects from randomized
controlled experiments (see the experimental world in Figure 21.1). But we want to
estimate causal effects from observational data. This means that we need to move from
the observational world to the experimental world. This step is fundamentally impossible
without causal assumptions, even in the large sample limit with perfect knowledge about
the observational distribution (cf. Section 2 of [43]). In other words, causal assumptions are
needed to deduce the postintervention distribution from the observational distribution. In
this chapter, we assume that the data were generated from a (known or unknown) causal
structure that can be represented by a directed acyclic graph (DAG).

FIGURE 21.1
[Diagram. Observational world: observational data, observational distribution,
prediction/classification. Experimental world: experimental data, post-intervention
distribution, causal effects. Causal assumptions link the two worlds.]
We want to estimate causal effects from observational data. This means that we need to
move from the observational world to the experimental world. This can only be done by
imposing causal assumptions.
21.1.3.1 Outline of This Chapter
In the next section, we assume that the data were generated from a known DAG. In
particular, we discuss the framework of a structural equation model (SEM) and its
corresponding causal DAG. We also discuss the estimation of causal effects under such
a model. In large-scale networks, however, the causal DAG is often unknown. Next, we
therefore discuss causal structure learning, that is, learning information about the causal
structure from observational data. We then combine these two parts and discuss methods
to estimate (bounds on) causal effects from observational data when the causal structure is
unknown. We also illustrate this method on a yeast gene expression dataset. We close by
mentioning several extensions of the discussed work.
21.2 Estimating Causal Effects When the Causal
Structure Is Known
Causal structures can be represented by graphs, where the random variables are represented
by nodes (or vertices), and causal relationships between the variables are represented by
edges between the corresponding nodes. Such causal graphs have two important practical
advantages. First, a causal graph provides a transparent and compact description of the
causal assumptions that are being made. This allows these assumptions to be discussed and
debated among researchers. Second, after agreeing on a causal graph, one can easily determine
causal effects. In particular, we can read off from the graph which sets of variables can or
cannot be used for covariate adjustment to obtain a given causal effect. We refer to [43,44]
for further details on the material in this section.
21.2.1 Graph Terminology
We consider graphs with directed edges ($\to$) and undirected edges ($-$). There can be at most
one edge between any pair of distinct nodes. If all edges are directed (undirected), then the
graph is called directed (undirected). A partially directed graph can contain both directed
and undirected edges. The skeleton of a partially directed graph is the undirected graph
that results from replacing all directed edges by undirected edges.
Two nodes are adjacent if they are connected by an edge. If $X \to Y$, then $X$ is a parent
of $Y$. The adjacency set and the parent set of a node $X$ in a graph $G$ are denoted by
$\mathrm{adj}(X, G)$ and $\mathrm{pa}(X, G)$, respectively. A graph is complete if every pair of
nodes is adjacent.
A path in a graph $G$ is a sequence of distinct nodes, such that all successive pairs of
nodes in the sequence are adjacent in $G$. A directed path from $X$ to $Y$ is a path between $X$
and $Y$ in which all edges point toward $Y$, that is, $X \to \cdots \to Y$. A directed path from $X$
to $Y$ together with an edge $Y \to X$ forms a directed cycle. A directed graph is acyclic if it
does not contain directed cycles. A directed acyclic graph is also called a DAG.
A node $X$ is a collider on a path if the path has two colliding arrows at $X$, that is, the
path contains $\to X \leftarrow$. Otherwise $X$ is a noncollider on the path. We emphasize that the
collider status of a node is relative to a path; a node can be a collider on one path, while it
is a noncollider on another. The collider $X$ is unshielded if the neighbors of $X$ on the path
are not adjacent to each other in the graph, that is, the path contains $W \to X \leftarrow Z$ and
$W$ and $Z$ are not adjacent in the graph.
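The graph terminology of this section can be expressed in a few lines of code. The sketch below assumes a directed graph stored as a set of (tail, head) pairs; this representation, the example graph, and all function names are illustrative choices rather than notation from the chapter.

```python
# Directed edges stored as (tail, head) pairs, meaning tail -> head.
# Example graph (hypothetical): W -> X <- Z and X -> Y.
edges = {("W", "X"), ("Z", "X"), ("X", "Y")}


def parents(node, edges):
    """pa(node, G): all nodes with a directed edge into node."""
    return {a for (a, b) in edges if b == node}


def adjacent(node, edges):
    """adj(node, G): all nodes connected to node by an edge."""
    return ({a for (a, b) in edges if b == node}
            | {b for (a, b) in edges if a == node})


def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(e) for e in edges}


def is_acyclic(edges):
    """Check for directed cycles by repeatedly removing nodes without parents."""
    nodes = {v for e in edges for v in e}
    remaining = set(edges)
    while nodes:
        sources = {v for v in nodes if not any(b == v for (_, b) in remaining)}
        if not sources:
            return False  # every remaining node has a parent: a cycle exists
        nodes -= sources
        remaining = {(a, b) for (a, b) in remaining if a not in sources}
    return True


def is_collider(path, i, edges):
    """True if path[i] is a collider on the path: path[i-1] -> path[i] <- path[i+1]."""
    prev, node, nxt = path[i - 1], path[i], path[i + 1]
    return (prev, node) in edges and (nxt, node) in edges


def is_unshielded_collider(path, i, edges):
    """True if path[i] is a collider whose two path neighbors are not adjacent."""
    prev, nxt = path[i - 1], path[i + 1]
    return (is_collider(path, i, edges)
            and frozenset((prev, nxt)) not in skeleton(edges))


print(parents("X", edges), adjacent("X", edges), is_acyclic(edges))
```

Note that `is_collider` takes the path as an argument, which reflects that collider status is relative to a path: in this example X is a collider on the path (W, X, Z) but a noncollider on the path (W, X, Y).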
21.2.2 Structural Equation Model
We consider a collection of random variables $X_1, \dots, X_p$ that are generated by structural
equations (see, e.g., [6,69]):

$$X_i \leftarrow g_i(S_i, \epsilon_i), \quad i = 1, \dots, p, \tag{21.1}$$

where $S_i \subseteq \{X_1, \dots, X_p\} \setminus \{X_i\}$ and $\epsilon_i$ is some random noise. We
interpret these equations causally, as describing how each $X_i$ is generated from the variables
in $S_i$ and the noise $\epsilon_i$. Thus, changes to the variables in $S_i$ can lead to changes in
$X_i$, but not the other way around. We use the notation $\leftarrow$ in Equation 21.1 to emphasize
this asymmetric relationship.
Moreover, we assume that the structural equations are autonomous, in the sense that we can
change one structural equation without affecting the others. This will allow the modeling
of local interventions to the system.
The structural equations correspond to a directed graph $G$ that is generated as follows:
the nodes are given by $X_1, \dots, X_p$, and the edges are drawn so that $S_i$ is the parent set of
$X_i$, $i = 1, \dots, p$. The graph $G$ then describes the causal structure and is called the causal
graph: the presence of an edge $X_j \to X_i$ means that $X_j$ is a potential direct cause of $X_i$
(i.e., $X_j$ may play a role in the generating mechanism of $X_i$), and the absence of an edge
$X_k \to X_i$ means that $X_k$ is definitely not a direct cause of $X_i$ (i.e., $X_k$ does not play a
role in the generating mechanism of $X_i$).
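To illustrate the asymmetry and autonomy of structural equations, the following sketch simulates a small linear SEM and models a local intervention by replacing a single equation. The three-variable model, its coefficients, the standard normal noise, and the helper names (`SEM`, `ORDER`, `sample`, `intervene`, `mean`) are hypothetical choices made only for this illustration.

```python
import random

# Hypothetical linear SEM with causal graph X1 -> X2 -> X3 and X1 -> X3.
# Each entry maps X_i to (parent set S_i, function g_i(values, noise)).
SEM = {
    "X1": ((), lambda v, e: e),
    "X2": (("X1",), lambda v, e: 2.0 * v["X1"] + e),
    "X3": (("X1", "X2"), lambda v, e: v["X1"] + 0.5 * v["X2"] + e),
}
ORDER = ["X1", "X2", "X3"]  # parents are generated before their children


def sample(sem, rng):
    """Generate one joint observation by evaluating the equations in order."""
    values = {}
    for name in ORDER:
        _, g = sem[name]
        values[name] = g(values, rng.gauss(0.0, 1.0))  # assumed N(0, 1) noise
    return values


def intervene(sem, name, value):
    """Replace one structural equation by a constant; the others are untouched."""
    new_sem = dict(sem)
    new_sem[name] = ((), lambda v, e: value)
    return new_sem


def mean(sem, name, n=20_000, seed=1):
    """Monte Carlo estimate of the mean of one variable under the given SEM."""
    rng = random.Random(seed)
    return sum(sample(sem, rng)[name] for _ in range(n)) / n


# Setting X2 changes its child X3, but setting X3 does not change its parent X2.
x3_after_setting_x2 = mean(intervene(SEM, "X2", 1.0), "X3")
x2_after_setting_x3 = mean(intervene(SEM, "X3", 5.0), "X2")
print(f"mean of X3 after setting X2 = 1: {x3_after_setting_x2:.2f}")
print(f"mean of X2 after setting X3 = 5: {x2_after_setting_x3:.2f}")
```

In this toy model, setting X2 to 1 shifts the mean of X3 to roughly 0.5, whereas setting X3 leaves the distribution of X2 unchanged, in line with the one-directional reading of Equation 21.1.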