21. A Review of Some Recent Advances in Causal Inference (1/5)

A Review of Some Recent Advances in Causal Inference

Marloes H. Maathuis and Preetam Nandy

CONTENTS

21.1 Introduction
    21.1.1 Causal versus Noncausal Research Questions
    21.1.2 Observational versus Experimental Data
        21.1.2.1 Observational Data
        21.1.2.2 Experimental Data
    21.1.3 Problem Formulation
        21.1.3.1 Outline of This Chapter
21.2 Estimating Causal Effects When the Causal Structure Is Known
    21.2.1 Graph Terminology
    21.2.2 Structural Equation Model
    21.2.3 Postintervention Distributions and Causal Effects
        21.2.3.1 Truncated Factorization Formula
        21.2.3.2 Defining the Total Effect
        21.2.3.3 Computing the Total Effect
21.3 Causal Structure Learning
    21.3.1 Constraint-Based Methods
    21.3.2 Score-Based Methods
    21.3.3 Hybrid Methods
    21.3.4 Learning SEMs with Additional Restrictions
21.4 Estimating the Size of Causal Effects When the Causal Structure Is Unknown
    21.4.1 IDA
    21.4.2 JointIDA
    21.4.3 Application
21.5 Extensions
    21.5.1 Local Causal Structure Learning
    21.5.2 Causal Structure Learning in the Presence of Hidden Variables and Feedback Loops
    21.5.3 Time Series Data
    21.5.4 Causal Structure Learning from Heterogeneous Data
    21.5.5 Covariate Adjustment
    21.5.6 Measures of Uncertainty
21.6 Summary
References

388 Handbook of Big Data

21.1 Introduction

Causal questions are fundamental in all parts of science. Answering such questions from

observational data is notoriously diﬃcult, but there has been a lot of recent interest and

progress in this ﬁeld. This chapter gives a selective review of some of these results, intended

for researchers who are not familiar with graphical models and causality, and with a focus

on methods that are applicable to large datasets.

To clarify the problem formulation, we ﬁrst discuss the diﬀerence between causal and

noncausal questions, and between observational and experimental data. We then formulate

the problem setting and give an overview of the rest of this chapter.

21.1.1 Causal versus Noncausal Research Questions

We use a small hypothetical example to illustrate the concepts.

Example 21.1 Suppose that there is a new rehabilitation program for prisoners, aimed at

lowering the recidivism rate. Among a random sample of 1500 prisoners, 500 participated

in the program. All prisoners were followed for a period of 2 years after release from prison,

and it was recorded whether or not they were rearrested within this period. Table 21.1 shows

the (hypothetical) data. We note that the rearrest rate among the participants of the program

(20%) is signiﬁcantly lower than the rearrest rate among the nonparticipants (50%).
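The rearrest rates quoted in Example 21.1 follow directly from the counts in Table 21.1. The short sketch below redoes that arithmetic; the variable and function names are illustrative choices, not notation from the chapter.

```python
# Counts from Table 21.1 (hypothetical data): (rearrested, not rearrested).
table = {
    "Participants": (100, 400),
    "Nonparticipants": (500, 500),
}


def rearrest_rate(rearrested, not_rearrested):
    """Fraction of a group that was rearrested within the follow-up period."""
    return rearrested / (rearrested + not_rearrested)


rates = {group: rearrest_rate(*counts) for group, counts in table.items()}
total = sum(sum(counts) for counts in table.values())

for group, rate in rates.items():
    print(f"{group}: {rate:.0%}")
print(f"Sample size: {total}")
```

These rates describe the observed groups only; whether the 30 percentage point gap reflects an effect of the program is the causal question taken up next.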

We can ask various questions based on these data. For example:

1. Can we predict whether a prisoner will be rearrested, based on participation in

the program (and possibly other variables)?

2. Does the program lower the rearrest rate?

3. What would the rearrest rate be if the program were compulsory for all prisoners?

Question 1 is noncausal, because it involves a standard prediction or classiﬁcation problem.

We note that this question can be very relevant in practice, for example in parole

considerations. However, because we are interested in causality here, we will not consider

questions of this type.

Questions 2 and 3 are causal. Question 2 asks if the program is the cause of the lower

rearrest rate among the participants. In other words, it asks about the mechanism behind the

data. Question 3 asks for a prediction of the rearrest rate after some novel outside intervention

to the system, namely after making the program compulsory for all prisoners. To make such

a prediction, one needs to understand the causal structure of the system.

Example 21.2 We consider gene expression levels of yeast cells. Suppose that we want

to predict the average gene expression levels after knocking out one of the genes, or after

knocking out multiple genes at a time. These are again causal questions, because we want

to make predictions after interventions to the system.

Thus, causal questions are about the mechanism behind the data or about predictions after

a novel intervention is applied to the system. They arise in all parts of science. Application

areas involving big data include, for example, systems biology (e.g., [12,19,30,32,40,62]),

neuroscience (e.g., [8,20,49,58]), climate science (e.g., [16,17]), and marketing (e.g., [7]).

TABLE 21.1

Hypothetical data about a rehabilitation program for prisoners.

                   Rearrested   Not Rearrested   Rearrest Rate (%)
Participants          100            400                20
Nonparticipants       500            500                50

21.1.2 Observational versus Experimental Data

Going back to the prisoners example, which of the three posed questions can we answer? This

depends on the origin of the data, and brings us to the distinction between observational

and experimental data.

21.1.2.1 Observational Data

Suppose ﬁrst that participation in the program was voluntary. Then we would have so-called

observational data, because the subjects (prisoners) chose their own treatment (rehabilitation

program or not), while the researchers just observed the results. From observational

data, we can easily answer question 1. It is diﬃcult, however, to answer questions 2 and 3.

Let us ﬁrst consider question 2. Because the participants form a self-selected subgroup,

there may be many diﬀerences between the participants and the nonparticipants. For

example, the participants may be more motivated to change their lives, and this may

contribute to the diﬀerence in rearrest rates. In this case, the eﬀects of the program and

the motivation of the prisoners are said to be mixed-up or confounded.

Next, let us consider question 3. At ﬁrst sight, one may think that the answer is simply

20%, because this was the rearrest rate among the participants of the program. But again

we have to keep in mind that the participants form a self-selected subgroup that is likely to

have special characteristics. Hence, the rearrest rate of this subgroup cannot be extrapolated

to the entire prisoner population.

21.1.2.2 Experimental Data

Now suppose that it was up to the researchers to decide which prisoners participated in the

program. For example, suppose that the researchers rolled a die for each prisoner, and let

him/her participate if the outcome was 1 or 2. Then we would have a so-called randomized

controlled experiment and experimental data.

Let us look again at question 2. Because of the randomization, the motivation level of the

prisoners is likely to be similar in the two groups. Moreover, any other factors of importance

(such as social background, type of crime committed, and number of earlier crimes) are

likely to be similar in the two groups. Hence, apart from chance variation, the groups are

comparable in all respects except for participation in the program. The observed difference in

rearrest rate can therefore be attributed to the program. This answers question 2.

Finally, the answer to question 3 is now 20%, because the randomized treatment

assignment ensures that the participants form a representative sample of the population.
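The contrast between self-selected and randomized participation can be made concrete with a small simulation. This is only a sketch under invented assumptions: the probabilities below (half the prisoners are motivated, motivation lowers the rearrest risk by 0.3, the program lowers it by 0.1) are hypothetical and are not taken from the chapter or from Table 21.1.

```python
import random


def simulate(n, randomized, seed=0):
    """Return (rearrest rate among participants, rate among nonparticipants)."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # participates -> [rearrested, total]
    for _ in range(n):
        motivated = rng.random() < 0.5
        if randomized:
            # Die roll: participate if the outcome is 1 or 2.
            participates = rng.randint(1, 6) <= 2
        else:
            # Self-selection (assumed): motivated prisoners join far more often.
            participates = rng.random() < (0.6 if motivated else 0.1)
        # Assumed risks: motivation lowers the risk by 0.3, and the program
        # lowers it by a further 0.1 (the true causal effect in this toy model).
        p_rearrest = 0.6 - 0.3 * motivated - 0.1 * participates
        rearrested = rng.random() < p_rearrest
        counts[participates][0] += rearrested
        counts[participates][1] += 1
    return tuple(counts[g][0] / counts[g][1] for g in (True, False))


obs_part, obs_non = simulate(200_000, randomized=False)
rct_part, rct_non = simulate(200_000, randomized=True)
print(f"observational difference: {obs_part - obs_non:+.3f}")
print(f"randomized difference:    {rct_part - rct_non:+.3f}")
```

Under these assumed probabilities, the observational difference is roughly -0.26 in expectation, which mixes the program effect with the effect of motivation, while the randomized difference is close to the true effect of -0.10. The rearrest rate among randomized participants (about 0.35 in expectation) is also what this toy model would give if the program were compulsory.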

Thus, causal questions are best answered by experimental data, and we should work

with such data whenever possible. Experimental data are not always available, however,

because randomized controlled experiments can be unethical, infeasible, time consuming, or

expensive. On the other hand, observational data are often relatively cheap and abundant.

In this chapter, we therefore consider the problem of answering causal questions about

large-scale systems from observational data.

21.1.3 Problem Formulation

It is relatively straightforward to make standard predictions based on observational data

(see the observational world in Figure 21.1), or to estimate causal effects from randomized

controlled experiments (see the experimental world in Figure 21.1). But we want to

estimate causal effects from observational data. This means that we need to move from

the observational world to the experimental world. This step is fundamentally impossible

without causal assumptions, even in the large sample limit with perfect knowledge about

the observational distribution (cf. Section 2 of [43]). In other words, causal assumptions are

needed to deduce the postintervention distribution from the observational distribution. In

this chapter, we assume that the data were generated from a (known or unknown) causal

structure that can be represented by a directed acyclic graph (DAG).

FIGURE 21.1

We want to estimate causal effects from observational data. This means that we need to

move from the observational world to the experimental world. This can only be done by

imposing causal assumptions.

[Diagram labels: Observational data, Observational distribution, Prediction/classification (observational world); Experimental data, Post-intervention distribution, Causal effects (experimental world); Causal assumptions.]

21.1.3.1 Outline of This Chapter

In the next section, we assume that the data were generated from a known DAG. In

particular, we discuss the framework of a structural equation model (SEM) and its

corresponding causal DAG. We also discuss the estimation of causal eﬀects under such

a model. In large-scale networks, however, the causal DAG is often unknown. Next, we

therefore discuss causal structure learning, that is, learning information about the causal

structure from observational data. We then combine these two parts and discuss methods

to estimate (bounds on) causal eﬀects from observational data when the causal structure is

unknown. We also illustrate this method on a yeast gene expression dataset. We close by

mentioning several extensions of the discussed work.

21.2 Estimating Causal Eﬀects When the Causal

Structure Is Known

Causal structures can be represented by graphs, where the random variables are represented

by nodes (or vertices), and causal relationships between the variables are represented by

edges between the corresponding nodes. Such causal graphs have two important practical

advantages. First, a causal graph provides a transparent and compact description of the

causal assumptions that are being made. This allows these assumptions to be discussed and

debated among researchers. Next, after agreeing on a causal graph, one can easily determine

causal eﬀects. In particular, we can read oﬀ from the graph which sets of variables can or

cannot be used for covariate adjustment to obtain a given causal eﬀect. We refer to [43,44]

for further details on the material in this section.

21.2.1 Graph Terminology

We consider graphs with directed edges (→) and undirected edges (−). There can be at most

one edge between any pair of distinct nodes. If all edges are directed (undirected), then the

graph is called directed (undirected). A partially directed graph can contain both directed

and undirected edges. The skeleton of a partially directed graph is the undirected graph

that results from replacing all directed edges by undirected edges.

Two nodes are adjacent if they are connected by an edge. If X → Y, then X is a parent

of Y. The adjacency set and the parent set of a node X in a graph G are denoted by

adj(X, G) and pa(X, G), respectively. A graph is complete if every pair of nodes is adjacent.
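The definitions so far (skeleton, adjacency set, parent set, completeness) translate into a few lines of code. The sketch below stores a partially directed graph as two edge sets; this representation, the example graph, and the function names are assumptions made for illustration.

```python
# Hypothetical partially directed graph: W -> X, Z -> X, X - Y.
nodes = {"W", "X", "Y", "Z"}
directed = {("W", "X"), ("Z", "X")}      # (a, b) means a -> b
undirected = {frozenset(("X", "Y"))}     # {a, b} means a - b


def skeleton(directed, undirected):
    """Replace every directed edge by an undirected edge."""
    return {frozenset(edge) for edge in directed} | set(undirected)


def adj(x, directed, undirected):
    """Adjacency set: all nodes connected to x by some edge."""
    return {n for edge in skeleton(directed, undirected) if x in edge
            for n in edge if n != x}


def pa(x, directed):
    """Parent set: all nodes a with a directed edge a -> x."""
    return {a for a, b in directed if b == x}


def is_complete(nodes, directed, undirected):
    """Complete means every pair of distinct nodes is adjacent."""
    n = len(nodes)
    return len(skeleton(directed, undirected)) == n * (n - 1) // 2


print(sorted(adj("X", directed, undirected)))
print(sorted(pa("X", directed)))
print(is_complete(nodes, directed, undirected))
```

The completeness check relies on the convention stated above that there is at most one edge between any pair of distinct nodes.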

A path in a graph G is a sequence of distinct nodes, such that all successive pairs of

nodes in the sequence are adjacent in G. A directed path from X to Y is a path between X

and Y in which all edges point toward Y, that is, X → ··· → Y. A directed path from X

to Y together with an edge Y → X forms a directed cycle. A directed graph is acyclic if it

does not contain directed cycles. A directed acyclic graph is also called a DAG.
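Whether a directed graph is acyclic can be checked mechanically. The sketch below uses Kahn's topological-sort procedure, a standard technique that the chapter itself does not discuss; the function name and the example edges are illustrative.

```python
from collections import defaultdict


def is_acyclic(nodes, edges):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming edge.

    The graph is acyclic exactly when every node can be removed this way.
    """
    children = defaultdict(set)
    indegree = {v: 0 for v in nodes}
    for a, b in edges:
        if b not in children[a]:
            children[a].add(b)
            indegree[b] += 1
    queue = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == len(nodes)


nodes = ["X", "Z", "Y"]
dag_edges = [("X", "Z"), ("Z", "Y"), ("X", "Y")]
# Adding Y -> X to the directed path X -> Z -> Y creates a directed cycle.
cyclic_edges = dag_edges + [("Y", "X")]

print(is_acyclic(nodes, dag_edges))
print(is_acyclic(nodes, cyclic_edges))
```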

A node X is a collider on a path if the path has two colliding arrows at X, that is, the

path contains → X ←. Otherwise X is a noncollider on the path. We emphasize that the

collider status of a node is relative to a path; a node can be a collider on one path, while it

is a noncollider on another. The collider X is unshielded if the neighbors of X on the path

are not adjacent to each other in the graph, that is, the path contains W → X ← Z and

W and Z are not adjacent in the graph.
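To illustrate that collider status is relative to a path, and what "unshielded" means, the following sketch works on a small hypothetical directed graph. The edge list and the helper names are assumptions for illustration only.

```python
from itertools import combinations


def is_collider(path, i, edges):
    """True if both path neighbors of path[i] point into it (-> X <-)."""
    edges = set(edges)
    return (path[i - 1], path[i]) in edges and (path[i + 1], path[i]) in edges


def unshielded_colliders(edges):
    """Return triples (W, X, Z) with W -> X <- Z and W, Z not adjacent."""
    edges = set(edges)
    adjacent = {frozenset(edge) for edge in edges}
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = []
    for x, pas in parents.items():
        for w, z in combinations(sorted(pas), 2):
            if frozenset((w, z)) not in adjacent:
                found.append((w, x, z))
    return sorted(found)


# Hypothetical DAG: W -> X <- Z, X -> Y, W -> Y.
edges = [("W", "X"), ("Z", "X"), ("X", "Y"), ("W", "Y")]

print(is_collider(["W", "X", "Z"], 1, edges))   # X on the path W -> X <- Z
print(is_collider(["W", "X", "Y"], 1, edges))   # X on the path W -> X -> Y
print(unshielded_colliders(edges))
```

In this example X is a collider on one path and a noncollider on another. The collider Y on the path X → Y ← W is shielded, because X and W are adjacent, so only the triple around X is reported.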

21.2.2 Structural Equation Model

We consider a collection of random variables $X_1, \ldots, X_p$ that are generated by structural

equations (see, e.g., [6,69]):

$$X_i \leftarrow g_i(S_i, \epsilon_i), \quad i = 1, \ldots, p, \tag{21.1}$$

where $S_i \subseteq \{X_1, \ldots, X_p\} \setminus \{X_i\}$ and $\epsilon_i$ is some random noise. We interpret these equations

causally, as describing how each $X_i$ is generated from the variables in $S_i$ and the noise

$\epsilon_i$. Thus, changes to the variables in $S_i$ can lead to changes in $X_i$, but not the other way

around. We use the notation ← in Equation 21.1 to emphasize this asymmetric relationship.

Moreover, we assume that the structural equations are autonomous, in the sense that we can

change one structural equation without aﬀecting the others. This will allow the modeling

of local interventions to the system.
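A minimal sketch of how structural equations and autonomy might look in code is given below. The three linear equations, the standard normal noise, and the helper names `sample` and `intervene` are all hypothetical choices for illustration; the chapter does not prescribe any of them. The sketch also assumes that the variables can be evaluated in an order in which each variable comes after the variables it depends on.

```python
import random

# Hypothetical SEM with p = 3. Each entry maps X_i to (S_i, g_i), where g_i
# takes the values of S_i followed by the noise term.
sem = {
    "X1": ((), lambda eps: eps),
    "X2": (("X1",), lambda x1, eps: 2.0 * x1 + eps),
    "X3": (("X1", "X2"), lambda x1, x2, eps: x1 - x2 + eps),
}
order = ["X1", "X2", "X3"]  # assumed evaluation order


def sample(sem, order, rng):
    """Draw one observation by evaluating the equations in the given order."""
    values = {}
    for name in order:
        inputs, g = sem[name]
        noise = rng.gauss(0.0, 1.0)  # assumed standard normal noise
        values[name] = g(*(values[q] for q in inputs), noise)
    return values


def intervene(sem, name, value):
    """Replace the equation for `name` by a constant; keep all other equations."""
    new_sem = dict(sem)
    new_sem[name] = ((), lambda eps: value)  # the noise is ignored
    return new_sem


rng = random.Random(0)
do_sem = intervene(sem, "X2", 1.0)
draws = [sample(do_sem, order, rng) for _ in range(20_000)]
mean_x1 = sum(d["X1"] for d in draws) / len(draws)
mean_x3 = sum(d["X3"] for d in draws) / len(draws)
print(f"mean of X1 after setting X2 to 1: {mean_x1:.2f}")
print(f"mean of X3 after setting X2 to 1: {mean_x3:.2f}")
```

Because only the equation for X2 is replaced, X1 keeps its original distribution, while X3 responds through its own unchanged equation: in this toy model its mean after the intervention is 0 - 1 = -1.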

The structural equations correspond to a directed graph G that is generated as follows:

the nodes are given by $X_1, \ldots, X_p$, and the edges are drawn so that $S_i$ is the parent set of

$X_i$, $i = 1, \ldots, p$. The graph G then describes the causal structure and is called the causal

graph: the presence of an edge $X_j \to X_i$ means that $X_j$ is a potential direct cause of $X_i$

(i.e., $X_j$ may play a role in the generating mechanism of $X_i$), and the absence of an edge

$X_j \to X_i$ means that $X_j$ is definitely not a direct cause of $X_i$ (i.e., $X_j$ does not play a role

in the generating mechanism of $X_i$).
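The construction of the causal graph from the sets of variables appearing in the structural equations can be sketched as follows. The parent sets below are hypothetical, and the function names are illustrative.

```python
# Hypothetical sets S_i for p = 4 variables: S[X_i] lists the variables that
# appear on the right-hand side of the structural equation for X_i.
S = {
    "X1": set(),
    "X2": {"X1"},
    "X3": {"X1", "X2"},
    "X4": {"X3"},
}


def causal_graph(S):
    """Draw an edge X_j -> X_i for every X_j in S_i."""
    for child, parents in S.items():
        if child in parents:
            raise ValueError(f"{child} cannot appear in its own equation")
    return sorted((parent, child)
                  for child, parents in S.items() for parent in parents)


def is_potential_direct_cause(xj, xi, edges):
    """An edge X_j -> X_i marks X_j as a potential direct cause of X_i."""
    return (xj, xi) in edges


edges = causal_graph(S)
print(edges)
print(is_potential_direct_cause("X1", "X3", edges))
print(is_potential_direct_cause("X1", "X4", edges))
```

Here X1 has no edge into X4, so it is not a direct cause of X4, although it can still affect X4 indirectly through X3.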
