between them is removed. Next, for each pair of nodes $(X_i, X_j)$ that are still adjacent, it tests conditional independence of the corresponding random variables given all possible subsets of size 1 of $\mathrm{adj}(X_i, G^*) \setminus \{X_j\}$ and of $\mathrm{adj}(X_j, G^*) \setminus \{X_i\}$, where $G^*$ is the current graph. Again, it removes the edge if such a conditional independence is deemed to be true. The algorithm continues in this way, considering conditioning sets of increasing size, until the size of the conditioning sets is larger than the size of the adjacency sets of the nodes.
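To make this procedure concrete, the following is a minimal sketch in R (the language of the pcalg package referenced below), assuming multivariate Gaussian data so that conditional independence can be tested via zero partial correlation; the names pcor_test and skeleton_sketch are ours for illustration and are not part of any package.

```r
## Illustrative helper: p-value for zero partial correlation of variables
## i and j given the set S, from the correlation matrix C of n observations.
pcor_test <- function(C, n, i, j, S) {
  idx <- c(i, j, S)
  P <- solve(C[idx, idx])                       # precision of the relevant block
  r <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])       # partial correlation
  z <- sqrt(n - length(S) - 3) * 0.5 * log((1 + r) / (1 - r))  # Fisher z
  2 * pnorm(-abs(z))
}

## Illustrative sketch of the skeleton phase of the PC algorithm.
skeleton_sketch <- function(data, alpha = 0.01) {
  C <- cor(data); n <- nrow(data); p <- ncol(data)
  A <- matrix(TRUE, p, p); diag(A) <- FALSE     # start from the complete graph
  ell <- 0                                      # current size of conditioning sets
  repeat {
    tested <- FALSE
    for (i in seq_len(p)) for (j in seq_len(p)) {
      if (!A[i, j]) next
      nbrs <- setdiff(which(A[i, ]), j)         # adj(X_i, G*) \ {X_j}
      if (length(nbrs) < ell) next              # adjacency set too small
      tested <- TRUE
      subsets <- if (ell == 0) {
        list(integer(0))
      } else if (length(nbrs) == 1) {
        list(nbrs)                              # guard: combn() misreads a scalar input
      } else {
        combn(nbrs, ell, simplify = FALSE)
      }
      for (S in subsets) {
        if (pcor_test(C, n, i, j, S) > alpha) { # independence not rejected
          A[i, j] <- A[j, i] <- FALSE           # delete the edge
          break
        }
      }
    }
    if (!tested) break   # conditioning sets now exceed all adjacency sets
    ell <- ell + 1
  }
  A                      # adjacency matrix of the estimated skeleton
}
```

Note that this sketch, like the original PC algorithm, removes an edge as soon as an independence is found, which immediately changes the adjacency sets used in later tests; this is the source of the order-dependence discussed below.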
This procedure gives the correct skeleton when using perfect conditional independence information. To see this, note that at any point in the procedure, the current graph is a supergraph of the skeleton of the CPDAG. By construction of the algorithm, this ensures that, for any pair of nodes $(X_i, X_j)$ that are nonadjacent in this skeleton, the conditional independencies $X_i \perp\!\!\!\perp X_j \mid \mathrm{pa}(X_i, G)$ and $X_i \perp\!\!\!\perp X_j \mid \mathrm{pa}(X_j, G)$ were assessed, where $G$ is any DAG in the Markov equivalence class. Since at least one of these conditional independencies must hold for a nonadjacent pair, the corresponding edge is indeed removed.
After applying certain edge orientation rules, the output of the PC algorithm is a
partially directed graph, the estimated CPDAG. This output depends on the ordering
of the variables (except in the limit of an infinite sample size), because the ordering
determines which conditional independence tests are done. This issue was studied in [14],
where it was shown that the order-dependence can be very severe in high-dimensional
settings with many variables and a small sample size (see Section 21.4.3 for a data
example). Moreover, an order-independent version of the PC algorithm, called PC-stable,
was proposed in [14]. This version is now the default implementation in the R-package
pcalg [29].
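As a usage illustration, a minimal call might look as follows, assuming the pcalg package; the simulated data and the chosen parameter values are ours for illustration (randomDAG, rmvDAG, and gaussCItest are pcalg utilities).

```r
library(pcalg)
set.seed(1)
g <- randomDAG(10, prob = 0.2)             # a random sparse Gaussian DAG model
d <- rmvDAG(1000, g)                       # n = 1000 observations from it
suffStat <- list(C = cor(d), n = nrow(d))  # sufficient statistics for the tests
pc.fit <- pc(suffStat, indepTest = gaussCItest, alpha = 0.01,
             p = ncol(d), skel.method = "stable")  # PC-stable skeleton (the default)
pc.fit                                     # the estimated CPDAG
```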
We note that the user has to specify a significance level α for the conditional
independence tests. Because of multiple testing, this parameter does not play the role of an
overall significance level. It should rather be viewed as a tuning parameter for the algorithm,
where smaller values of α typically lead to sparser graphs.
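One quick way to see this effect, again assuming the pcalg package and simulated data, is to rerun the algorithm over a grid of values of $\alpha$ and count the edge marks in each estimated CPDAG.

```r
library(pcalg)
set.seed(1)
d <- rmvDAG(1000, randomDAG(10, prob = 0.2))
suffStat <- list(C = cor(d), n = nrow(d))
for (alpha in c(0.1, 0.01, 0.001)) {
  fit <- pc(suffStat, indepTest = gaussCItest, alpha = alpha, p = ncol(d))
  ## an undirected edge contributes two marks to the adjacency matrix
  cat("alpha =", alpha, " edge marks =", sum(as(fit@graph, "matrix") != 0), "\n")
}
```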
The PC and PC-stable algorithms are computationally feasible for sparse graphs with
thousands of variables. Both PC and PC-stable were shown to be consistent in sparse high-
dimensional settings, when the joint distribution is multivariate Gaussian and conditional
independence is assessed by testing for zero partial correlation [14,28]. By using rank correlation, consistency can be achieved in sparse high-dimensional settings for a broader class of Gaussian copula or nonparanormal models [21].
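As a sketch of this idea, assuming the pcalg package, one can replace the Pearson correlation matrix in the sufficient statistics by the transformed Spearman correlation $2\sin(\pi\rho_S/6)$, which consistently estimates the latent correlation matrix in a Gaussian copula model, and then run PC unchanged.

```r
library(pcalg)
set.seed(1)
d <- exp(rmvDAG(1000, randomDAG(10, prob = 0.2)))  # monotone margins: non-Gaussian data
rho <- cor(d, method = "spearman")
suffStat <- list(C = 2 * sin(pi * rho / 6),        # rank-based correlation estimate
                 n = nrow(d))
rank.fit <- pc(suffStat, indepTest = gaussCItest, alpha = 0.01, p = ncol(d))
```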
21.3.2 Score-Based Methods
Score-based methods learn the CPDAG by (greedily) searching for an optimally scoring DAG, where the score measures how well the data fit the DAG, while penalizing the complexity of the DAG.
A prominent example of such an algorithm is the greedy equivalence search (GES)
algorithm [10]. GES is a grow–shrink algorithm that consists of two phases: a forward
phase and a backward phase. The forward phase starts with an initial estimate (often the
empty graph) of the CPDAG, and sequentially adds single edges, each time choosing the
edge addition that yields the maximum improvement of the score, until the score can no
longer be improved. The backward phase starts with the output of the forward phase, and
sequentially deletes single edges, each time choosing the edge deletion that yields the maximum improvement of the score, until the score can no longer be improved. A computational
advantage of GES over the traditional DAG-search methods is that it searches over the
space of all possible CPDAGs, instead of over the space of all possible DAGs.
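As a usage illustration, again assuming the pcalg package, ges() runs the forward and backward phases with an $\ell_0$-penalized Gaussian log-likelihood score (a BIC-type score); the simulated data are ours for illustration.

```r
library(pcalg)
set.seed(1)
d <- rmvDAG(1000, randomDAG(10, prob = 0.2))
score <- new("GaussL0penObsScore", d)   # penalized Gaussian log-likelihood score
ges.fit <- ges(score)                   # forward phase followed by backward phase
ges.fit$essgraph                        # the estimated CPDAG (essential graph)
```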
The GES algorithm requires the scoring criterion to be score equivalent, meaning
that every DAG in a Markov equivalence class gets the same score. Moreover, the
choice of scoring criterion is crucial for computational and statistical performances. The
so-called decomposability property of a scoring criterion allows fast updates of scores during
the forward and the backward phase. For example, (penalized) log-likelihood scores are