Variable elimination

Let's try to do some inference tasks over the restaurant network in Fig 3.1. Let's say we want to find P(C). We know the following from the chain rule of probability:

P(C) = Σ_{L,Q} P(C, L, Q) = Σ_{L,Q} P(C | L, Q) P(L, Q)

Also, we know that the random variables L and Q are independent of each other if C is not observed. So, we can write the preceding equation as follows:

P(C) = Σ_{L,Q} P(C | L, Q) P(L) P(Q)

Now, we can see that we know all the probability values involved in this product. We have the values of P(c | l, q) from the CPD of C, the values of P(l) from the CPD of L, and the values of P(q) from the CPD of Q. Summing the products of these probabilities, we can easily find the probability of C.

We can also note that the computational cost of this calculation is O(|C| * |L| * |Q|), where |X| represents the number of states of the variable X. We can see that in order to compute the probability of each state of C, we need to compute the product for each combination of states of L and Q, and then add them together. This means that for each state of C, we have 2 * |L| * |Q| products and |L| * |Q| - 1 additions. Here, 2 appears in the number of products because there are two product operations in the equation. Also, we need to do this computation |C| times, once for each state of C.
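To make this concrete, the following is a minimal numpy sketch of this brute-force computation for the restaurant model. The CPD values are taken from the pgmpy example later in this section, and the array names are just illustrative:

import numpy as np

# CPDs of the restaurant model (values taken from the pgmpy example later in this section)
p_l = np.array([0.6, 0.4])                       # P(L): location
p_q = np.array([0.3, 0.5, 0.2])                  # P(Q): quality
p_c_lq = np.array([[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                   [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]]).reshape(2, 2, 3)   # P(C | L, Q)[c, l, q]

# P(C) = sum over l, q of P(C | l, q) P(l) P(q)
p_c = np.zeros(2)
for c in range(2):
    for l in range(2):
        for q in range(3):
            p_c[c] += p_c_lq[c, l, q] * p_l[l] * p_q[q]

print(p_c, p_c.sum())      # P(C); the probabilities sum to 1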

Now, let's take the example of another simple model, a chain A → B → C → D, and try to find P(D). We can find P(D) simply as follows:

P(D) = Σ_{A,B,C} P(A) P(B | A) P(C | B) P(D | C)

However, we can see that the complexity of computing P(D) this way is O(|A| * |B| * |C| * |D|), and for much more complex models, the complexity becomes too high to be practical. Now, let's see how we can use the concept of dynamic programming to avoid computing the same values multiple times and to reduce the complexity. To see the scope for using dynamic programming in this problem, let's first simply unroll the summation and check which values we are computing. For simplicity, we will assume that each of the variables has only two states. Unrolling the summation, we get the following:

P(D = d_0) = P(a_0) P(b_0|a_0) P(c_0|b_0) P(d_0|c_0) + P(a_0) P(b_0|a_0) P(c_1|b_0) P(d_0|c_1)
           + P(a_0) P(b_1|a_0) P(c_0|b_1) P(d_0|c_0) + P(a_0) P(b_1|a_0) P(c_1|b_1) P(d_0|c_1)
           + P(a_1) P(b_0|a_1) P(c_0|b_0) P(d_0|c_0) + P(a_1) P(b_0|a_1) P(c_1|b_0) P(d_0|c_1)
           + P(a_1) P(b_1|a_1) P(c_0|b_1) P(d_0|c_0) + P(a_1) P(b_1|a_1) P(c_1|b_1) P(d_0|c_1)

P(D = d_1) = P(a_0) P(b_0|a_0) P(c_0|b_0) P(d_1|c_0) + P(a_0) P(b_0|a_0) P(c_1|b_0) P(d_1|c_1)
           + P(a_0) P(b_1|a_0) P(c_0|b_1) P(d_1|c_0) + P(a_0) P(b_1|a_0) P(c_1|b_1) P(d_1|c_1)
           + P(a_1) P(b_0|a_1) P(c_0|b_0) P(d_1|c_0) + P(a_1) P(b_0|a_1) P(c_1|b_0) P(d_1|c_1)
           + P(a_1) P(b_1|a_1) P(c_0|b_1) P(d_1|c_0) + P(a_1) P(b_1|a_1) P(c_1|b_1) P(d_1|c_1)

To calculate P(D), we must calculate P(D = d_0) and P(D = d_1) separately. After unrolling the summations, we can see that many computations are repeated if we take this simple linear approach. In the preceding equations, we have computed P(a_0) P(b_0|a_0) four times, P(a_0) P(b_1|a_0) four times, and so on. Let's first group these computations together:

P(D = d_0) = P(c_0|b_0) P(d_0|c_0) [P(a_0) P(b_0|a_0) + P(a_1) P(b_0|a_1)]
           + P(c_1|b_0) P(d_0|c_1) [P(a_0) P(b_0|a_0) + P(a_1) P(b_0|a_1)]
           + P(c_0|b_1) P(d_0|c_0) [P(a_0) P(b_1|a_0) + P(a_1) P(b_1|a_1)]
           + P(c_1|b_1) P(d_0|c_1) [P(a_0) P(b_1|a_0) + P(a_1) P(b_1|a_1)]

P(D = d_1) = P(c_0|b_0) P(d_1|c_0) [P(a_0) P(b_0|a_0) + P(a_1) P(b_0|a_1)]
           + P(c_1|b_0) P(d_1|c_1) [P(a_0) P(b_0|a_0) + P(a_1) P(b_0|a_1)]
           + P(c_0|b_1) P(d_1|c_0) [P(a_0) P(b_1|a_0) + P(a_1) P(b_1|a_1)]
           + P(c_1|b_1) P(d_1|c_1) [P(a_0) P(b_1|a_0) + P(a_1) P(b_1|a_1)]

Now, let's replace these repeated computations with symbols that we compute only once and then use everywhere. Replacing P(a_0) P(b_0|a_0) + P(a_1) P(b_0|a_1) with τ_1(b_0) and P(a_0) P(b_1|a_0) + P(a_1) P(b_1|a_1) with τ_1(b_1), we get:

P(D = d_0) = P(c_0|b_0) P(d_0|c_0) τ_1(b_0) + P(c_1|b_0) P(d_0|c_1) τ_1(b_0)
           + P(c_0|b_1) P(d_0|c_0) τ_1(b_1) + P(c_1|b_1) P(d_0|c_1) τ_1(b_1)

P(D = d_1) = P(c_0|b_0) P(d_1|c_0) τ_1(b_0) + P(c_1|b_0) P(d_1|c_1) τ_1(b_0)
           + P(c_0|b_1) P(d_1|c_0) τ_1(b_1) + P(c_1|b_1) P(d_1|c_1) τ_1(b_1)

Again, grouping common parts together, we get:

P(D = d_0) = P(d_0|c_0) [P(c_0|b_0) τ_1(b_0) + P(c_0|b_1) τ_1(b_1)]
           + P(d_0|c_1) [P(c_1|b_0) τ_1(b_0) + P(c_1|b_1) τ_1(b_1)]

P(D = d_1) = P(d_1|c_0) [P(c_0|b_0) τ_1(b_0) + P(c_0|b_1) τ_1(b_1)]
           + P(d_1|c_1) [P(c_1|b_0) τ_1(b_0) + P(c_1|b_1) τ_1(b_1)]

Now, replacing P(c_0|b_0) τ_1(b_0) + P(c_0|b_1) τ_1(b_1) with τ_2(c_0) and P(c_1|b_0) τ_1(b_0) + P(c_1|b_1) τ_1(b_1) with τ_2(c_1), we get:

P(D = d_0) = P(d_0|c_0) τ_2(c_0) + P(d_0|c_1) τ_2(c_1)

P(D = d_1) = P(d_1|c_0) τ_2(c_0) + P(d_1|c_1) τ_2(c_1)

Notice how, instead of doing a summation over the complete product P(A) P(B|A) P(C|B) P(D|C), we did the summation over parts of it:

P(D) = Σ_C P(D | C) Σ_B P(C | B) Σ_A P(A) P(B | A)
     = Σ_C P(D | C) Σ_B P(C | B) τ_1(B) = Σ_C P(D | C) τ_2(C)

Here, we have been able to push the summations inside the equation because not all the terms in the equation involve all the variables. Only the terms P(A) and P(B|A) depend on A, so we can sum them over the values of A first.
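The same idea of pushing the summations inside translates directly into code. The following is a minimal numpy sketch for the binary chain A → B → C → D with made-up CPD values; it computes P(D) both by brute force and by eliminating A, then B, then C, and checks that the two results agree:

import numpy as np

# Made-up CPDs for the binary chain A -> B -> C -> D
p_a = np.array([0.3, 0.7])                       # P(A)
p_b_a = np.array([[0.9, 0.2], [0.1, 0.8]])       # P(B | A), indexed [b, a]
p_c_b = np.array([[0.4, 0.6], [0.6, 0.4]])       # P(C | B), indexed [c, b]
p_d_c = np.array([[0.5, 0.1], [0.5, 0.9]])       # P(D | C), indexed [d, c]

# Brute force: sum the full product over every combination of A, B and C
p_d_naive = np.zeros(2)
for d in range(2):
    for a in range(2):
        for b in range(2):
            for c in range(2):
                p_d_naive[d] += p_a[a] * p_b_a[b, a] * p_c_b[c, b] * p_d_c[d, c]

# Variable elimination: push the summations inside
tau_1 = p_b_a @ p_a          # tau_1(B) = sum over A of P(A) P(B | A)
tau_2 = p_c_b @ tau_1        # tau_2(C) = sum over B of P(C | B) tau_1(B)
p_d_ve = p_d_c @ tau_2       # P(D)     = sum over C of P(D | C) tau_2(C)

assert np.allclose(p_d_naive, p_d_ve)
print(p_d_ve)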

To make things clearer, let's see another run of variable elimination, this time on the restaurant model, where L stands for location, Q for quality, C for cost, and N for the number of people:

Step 1 - Variable eliminated: L. Factors involved: P(L), P(C | L, Q), P(N | C, L). Intermediate factor: ψ_1(L, Q, C, N). New factor: τ_1(Q, C, N).

Step 2 - Variable eliminated: Q. Factors involved: P(Q), τ_1(Q, C, N). Intermediate factor: ψ_2(Q, C, N). New factor: τ_2(C, N).

Step 3 - Variable eliminated: C. Factors involved: τ_2(C, N). Intermediate factor: ψ_3(C, N). New factor: τ_3(N).

Step 4 - Variable eliminated: N. Factors involved: τ_3(N). Intermediate factor: ψ_4(N). New factor: τ_4 (a constant).
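These steps can also be traced with a few lines of numpy. The following sketch uses the CPD values from the pgmpy example below; each einsum call forms the intermediate factor and sums out one variable, so steps 1 to 3 leave exactly P(N), and step 4 would just sum it to 1:

import numpy as np

# Restaurant CPDs, same values as in the pgmpy example below
p_l = np.array([0.6, 0.4])                                               # P(L)
p_q = np.array([0.3, 0.5, 0.2])                                          # P(Q)
p_c_lq = np.array([[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                   [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]]).reshape(2, 2, 3)    # P(C | L, Q)[c, l, q]
p_n_cl = np.array([[0.6, 0.8, 0.1, 0.6],
                   [0.4, 0.2, 0.9, 0.4]]).reshape(2, 2, 2)               # P(N | C, L)[n, c, l]

tau_1 = np.einsum('l,clq,ncl->qcn', p_l, p_c_lq, p_n_cl)   # step 1: eliminate L
tau_2 = np.einsum('q,qcn->cn', p_q, tau_1)                 # step 2: eliminate Q
tau_3 = tau_2.sum(axis=0)                                  # step 3: eliminate C, leaving P(N)
print(tau_3, tau_3.sum())                                  # step 4 would sum out N, giving 1.0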

This method helps us to significantly reduce the computation required to compute the probabilities. In the case of the earlier chain model, we just need to compute τ_1(B), which requires four multiplications and two additions, τ_2(C), which requires another four multiplications and two additions, and finally P(D), which requires four more multiplications and two additions. Hence, we need a total of just 12 multiplications and 6 additions to compute P(D). However, in the case of computing P(D) directly from the joint probability distribution, we require 16 * 3 = 48 multiplications and 14 additions. Hence, we see that using variable elimination brings about a huge improvement in the complexity of making the inference.

Now, let's see how to make the inference using variable elimination with pgmpy:

In [1]: from pgmpy.models import BayesianModel
In [2]: from pgmpy.inference import VariableElimination
In [3]: from pgmpy.factors import TabularCPD

# Now first create the model.
In [3]: restaurant = BayesianModel(
                               [('location', 'cost'),
                                ('quality', 'cost'), 
                                ('cost', 'no_of_people'), 
                                ('location', 'no_of_people')])
In [4]: cpd_location = TabularCPD('location', 2, [[0.6, 0.4]])
In [5]: cpd_quality = TabularCPD('quality', 3, [[0.3, 0.5, 0.2]])
In [6]: cpd_cost = TabularCPD('cost', 2, 
                              [[0.8, 0.6, 0.1, 0.6, 0.6, 0.05], 
                               [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]], 
                               ['location', 'quality'], [2, 3])
In [7]: cpd_no_of_people = TabularCPD(
                            'no_of_people', 2,
                            [[0.6, 0.8, 0.1, 0.6], 
                             [0.4, 0.2, 0.9, 0.4]],
                             ['cost', 'location'], [2, 2])
In [8]: restaurant.add_cpds(cpd_location, cpd_quality, 
                            cpd_cost, cpd_no_of_people)

# Creating the inference object of the model
In [9]: restaurant_inference = VariableElimination(restaurant)

# Doing simple queries over one or multiple variables.
In [10]: restaurant_inference.query(variables=['location'])
Out[10]: {'location': <Factor representing phi(location:2) at 
                                                0x7fea25e02898>}
In [11]: restaurant_inference.query(
                         variables=['location', 'no_of_people'])
Out[11]: {'location': <Factor representing phi(location:2) at
                                                0x7fea25e02b00>, 
          'no_of_people': <Factor representing phi(no_of_people:2)  
                                             at 0x7fea25e026a0>}

# We can also specify the order in which the variables are to be 
# eliminated. If not specified pgmpy automatically computes the 
# best possible elimination order.
In [12]: restaurant_inference.query(variables=['no_of_people'],
               elimination_order=['location', 'cost',  'quality'])
Out[12]: {'no_of_people': <Factor representing phi(no_of_people:2) 
                                               at 0x7fea25e02160>}

We saw the case of making an inference when no condition was given. Now, let's take a case where some evidence is given; let's say we know that the cost of the restaurant is high and we want to compute the probability of the number of people in the restaurant. Basically, we want to compute P(N | c), where c is the observed state of C. For this, we could simply use probability theory and first compute P(N, c), then sum it over N to get P(c), and then compute P(N | c) as:

P(N | c) = P(N, c) / P(c) = P(N, c) / Σ_N P(N, c)

Now, the question is, how do we compute P(N, c)? We first reduce all the factors involving C to the observed state c and then run our normal variable elimination over the remaining variables.
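Sticking with the numpy sketches, the reduce-then-eliminate idea looks as follows for the restaurant model, using the same CPD values as in the pgmpy example above and conditioning on the second state of C:

import numpy as np

# Restaurant CPDs, same values as in the pgmpy example above
p_l = np.array([0.6, 0.4])                                               # P(L)
p_q = np.array([0.3, 0.5, 0.2])                                          # P(Q)
p_c_lq = np.array([[0.8, 0.6, 0.1, 0.6, 0.6, 0.05],
                   [0.2, 0.4, 0.9, 0.4, 0.4, 0.95]]).reshape(2, 2, 3)    # P(C | L, Q)[c, l, q]
p_n_cl = np.array([[0.6, 0.8, 0.1, 0.6],
                   [0.4, 0.2, 0.9, 0.4]]).reshape(2, 2, 2)               # P(N | C, L)[n, c, l]

c_obs = 1                                    # evidence: C takes its second state

# Reduce every factor involving C to the observed state
phi_lq = p_c_lq[c_obs]                       # phi(L, Q) = P(c | L, Q)
phi_nl = p_n_cl[:, c_obs, :]                 # phi(N, L) = P(N | c, L)

# Eliminate Q, then L, to get the unnormalized P(N, c)
tau_l = phi_lq @ p_q                         # tau(L) = sum over Q of P(c | L, Q) P(Q)
p_n_c = (phi_nl * (p_l * tau_l)).sum(axis=1) # P(N, c) = sum over L of phi(N, L) P(L) tau(L)

print(p_n_c / p_n_c.sum())                   # normalize over N to get P(N | c)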

In pgmpy, we can simply pass another argument to the query method for evidence. Let's see how to run such conditional queries using pgmpy:

# If we have some evidence for the network we can simply pass it 
# as an argument to the query method in the form of 
# {variable: state}
In [13]: restaurant_inference.query(variables=['no_of_people'],
                                    evidence={'location': 1})
Out[13]: {'no_of_people': <Factor representing phi(no_of_people:2) 
                                               at 0x7fea25e02588>}
In [14]: restaurant_inference.query(
                      variables=['no_of_people'], 
                      evidence={'location': 1, 'quality': 1})
Out[14]: {'no_of_people': <Factor representing phi(no_of_people:2) 
                                               at 0x7fea25e02d30>}
# In this case also we can simply pass the elimination order for 
# the variables.
In [15]: restaurant_inference.query(
                      variables=['no_of_people'],
                      evidence={'location': 1},
                      elimination_order=['quality', 'cost'])
Out[15]: {'no_of_people': <Factor representing phi(no_of_people:2) 
                                               at 0x7fea25e02eb8>}

Analysis of variable elimination

We have already seen that variable elimination is much more efficient for calculating probability distributions than normalizing and marginalizing the joint probability distribution. Now, let's do an exact analysis to find the complexity of variable elimination.

Let's start by putting the variable elimination algorithm in simple terms. In variable elimination, we start by choosing a variable X_i to eliminate, then we compute the factor product ψ_i of all the factors that involve X_i, and then eliminate the variable by summing it out, resulting in a new factor τ_i = Σ_{X_i} ψ_i whose scope is Scope(ψ_i) − {X_i}.
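None of this is specific to pgmpy. The following is a minimal general-purpose sketch of the procedure, assuming factors are represented as a pair of a scope list and a numpy array with one axis per variable; all the helper names are illustrative and not part of any library:

import numpy as np
from functools import reduce

def product(f1, f2):
    # Factor product: computed with einsum over the union of the two scopes
    (v1, a1), (v2, a2) = f1, f2
    out = list(dict.fromkeys(list(v1) + list(v2)))                 # union of the scopes
    letters = {v: chr(ord('a') + i) for i, v in enumerate(out)}    # works for up to 26 variables
    spec = '{},{}->{}'.format(''.join(letters[v] for v in v1),
                              ''.join(letters[v] for v in v2),
                              ''.join(letters[v] for v in out))
    return out, np.einsum(spec, a1, a2)

def marginalize(f, var):
    # Sum out `var` from the factor f
    variables, values = f
    axis = list(variables).index(var)
    return [v for v in variables if v != var], values.sum(axis=axis)

def variable_elimination(factors, elimination_order):
    # Eliminate the variables one by one, exactly as described above
    factors = list(factors)
    for var in elimination_order:
        involved = [f for f in factors if var in f[0]]    # factors mentioning var
        rest = [f for f in factors if var not in f[0]]
        psi = reduce(product, involved)                   # intermediate factor psi_i
        tau = marginalize(psi, var)                       # new factor tau_i
        factors = rest + [tau]
    return reduce(product, factors)

# Example: P(D) for the binary chain A -> B -> C -> D with made-up CPDs
factors = [(['A'], np.array([0.3, 0.7])),                          # P(A)
           (['A', 'B'], np.array([[0.9, 0.1], [0.2, 0.8]])),       # P(B | A), indexed [a, b]
           (['B', 'C'], np.array([[0.4, 0.6], [0.6, 0.4]])),       # P(C | B), indexed [b, c]
           (['C', 'D'], np.array([[0.5, 0.5], [0.1, 0.9]]))]       # P(D | C), indexed [c, d]
print(variable_elimination(factors, ['A', 'B', 'C']))              # the factor P(D)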

Now, let's consider that we have a network with n variables and m factors. In the case of a Bayesian network, the number of CPDs will always be equal to the number of variables, therefore, m = n for a Bayesian network. However, in the case of a Markov network, the number of factors can be more than the number of variables in the network. For simplicity, let's assume that we will be eliminating all the variables in the network.

In variable elimination, we have been performing just two types of operations, multiplication and addition. So, to find the overall complexity, let's start by counting these operations. For the multiplication step, each of the initial m factors and each of the n intermediately formed factors τ_i is multiplied into some intermediate factor ψ_i exactly once, and each such multiplication requires at most N_i operations, where N_i is the size of ψ_i. Defining N_max = max_i N_i, so that N_i <= N_max always holds, the total number of multiplications is at most (m + n) * N_max. Now, if we calculate the total number of addition operations, we will be iterating over the entries of each ψ_i once while summing out its variable, resulting in a total of at most n * N_max additions. So, we see that the total number of operations for variable elimination comes out to be at most (m + n) * N_max + n * N_max. Hence, the complexity of the overall operation is O(m * N_max), because n <= m.

Now, let's try to analyze the variable elimination algorithm using the graph structure. We can treat a Bayesian model as a Markov model with undirected edges between all the variables that appear together in a CPD of the network. Now, let's try to see what happens when we run variable elimination over this network. We choose any variable X and multiply all the factors involving X, resulting in a factor ψ whose scope contains X and all of its neighbors. After this, we eliminate the variable X by summing it out, which gives a resulting factor τ with scope Y = Scope(ψ) − {X}. Now, as we have a factor with scope Y, we need to have edges in the network between every pair of variables in Y. So, we add extra edges to the network, which are known as fill edges. For the elimination of the next variable, we use this new network structure and perform similar operations on it.
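This graph-side view is easy to simulate with networkx. The following sketch assumes the structure of the late-for-school model used earlier in the book (J depends on A and R, L depends on J and G, Q depends on J); eliminating a node connects all of its remaining neighbors with fill edges and then removes it:

import networkx as nx
from itertools import combinations

def eliminate_nodes(graph, order):
    # Simulate variable elimination on an undirected graph: return the induced
    # graph (original edges plus all fill edges) and the list of fill edges.
    work = graph.copy()
    induced = graph.copy()
    fill_edges = []
    for node in order:
        neighbors = list(work.neighbors(node))
        # The neighbors of the eliminated node form the scope of the new factor,
        # so they must be pairwise connected; any missing edge is a fill edge.
        for u, v in combinations(neighbors, 2):
            if not work.has_edge(u, v):
                work.add_edge(u, v)
                induced.add_edge(u, v)
                fill_edges.append((u, v))
        work.remove_node(node)
    return induced, fill_edges

# Moral graph of the late-for-school model (structure assumed from earlier chapters)
g = nx.Graph([('A', 'J'), ('R', 'J'), ('A', 'R'),   # from P(J | A, R)
              ('J', 'L'), ('G', 'L'), ('J', 'G'),   # from P(L | J, G)
              ('J', 'Q')])                          # from P(Q | J)
induced, fill = eliminate_nodes(g, ['A', 'R', 'J', 'Q', 'G'])   # order used in Fig 3.2
print(fill)                                         # fill edges added by this ordering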

Let's see an example on our late-for-school model showing the graph structure during the various steps of variable elimination:

Fig 3.2(a): Initial state of the network

Fig 3.2(b): After eliminating Traffic Accident (A)

Fig 3.2(c): After eliminating Heavy Rain (R)

Fig 3.2(d): After eliminating Traffic Jam (J)

Fig 3.2(e): After eliminating Long Queues (Q)

Fig 3.2(f): After eliminating Getting Up Late (G)

An induced graph is defined as the undirected graph constructed by taking the union of all the graphs formed in each step of variable elimination on the network. Fig 3.3 shows the induced graph for the preceding variable elimination:


Fig 3.3: The induced graph formed by running the preceding variable elimination

We can also check the induced graph using pgmpy:

In [16]: induced_graph = restaurant_inference.induced_graph(
                 ['cost', 'location', 'no_of_people', 'quality'])
In [17]: induced_graph.nodes()
Out[17]: ['location', 'quality', 'cost', 'no_of_people']
In [18]: induced_graph.edges()
Out[18]: 
[('location', 'quality'), 
 ('location', 'cost'), 
 ('location', 'no_of_people'), 
 ('quality', 'cost'), 
 ('quality', 'no_of_people'), 
 ('cost', 'no_of_people')]

Finding elimination ordering

In variable elimination, the order in which we eliminate the variables has a huge impact on the computational cost of running the algorithm. Let's look at how different elimination orderings play out on the late-for-school model. The steps of variable elimination with the elimination order A, G, J, L, Q, R are shown in the following table:

Step 1 - Variable eliminated: A. Factors involved: P(A), P(J | A, R). Intermediate factor: ψ_1(A, R, J). New factor: τ_1(R, J).

Step 2 - Variable eliminated: G. Factors involved: P(G), P(L | J, G). Intermediate factor: ψ_2(G, J, L). New factor: τ_2(J, L).

Step 3 - Variable eliminated: J. Factors involved: P(Q | J), τ_1(R, J), τ_2(J, L). Intermediate factor: ψ_3(R, J, L, Q). New factor: τ_3(R, L, Q).

Step 4 - Variable eliminated: L. Factors involved: τ_3(R, L, Q). Intermediate factor: ψ_4(R, L, Q). New factor: τ_4(R, Q).

Step 5 - Variable eliminated: Q. Factors involved: τ_4(R, Q). Intermediate factor: ψ_5(R, Q). New factor: τ_5(R).

Step 6 - Variable eliminated: R. Factors involved: P(R), τ_5(R). Intermediate factor: ψ_6(R). New factor: τ_6 (a constant).

The steps of variable elimination with the elimination order R, Q, L, J, G, A are shown in the following table:

Step 1 - Variable eliminated: R. Factors involved: P(R), P(J | A, R). Intermediate factor: ψ_1(A, R, J). New factor: τ_1(A, J).

Step 2 - Variable eliminated: Q. Factors involved: P(Q | J). Intermediate factor: ψ_2(J, Q). New factor: τ_2(J).

Step 3 - Variable eliminated: L. Factors involved: P(L | J, G). Intermediate factor: ψ_3(J, G, L). New factor: τ_3(J, G).

Step 4 - Variable eliminated: J. Factors involved: τ_1(A, J), τ_2(J), τ_3(J, G). Intermediate factor: ψ_4(A, J, G). New factor: τ_4(A, G).

Step 5 - Variable eliminated: G. Factors involved: P(G), τ_4(A, G). Intermediate factor: ψ_5(A, G). New factor: τ_5(A).

Step 6 - Variable eliminated: A. Factors involved: P(A), τ_5(A). Intermediate factor: ψ_6(A). New factor: τ_6 (a constant).

For every intermediate factor, we add fill edges between all the variables in its scope, so every intermediate factor introduces a clique in the induced graph. Hence, the scope of every intermediate factor generated during the variable elimination process is a clique in the induced graph. Also, notice that every maximal clique in the induced graph is the scope of some intermediate factor generated during the process. Therefore, having a larger maximal clique in the induced graph means having a larger intermediate factor, which means a higher computation cost.

Let's see a few definitions related to induced graphs:

  • Width: This is defined as the number of nodes in the largest clique of the graph minus 1
  • Induced width: This is defined as the width of an induced graph over some network, given an elimination ordering
  • Tree width: The tree width of a network is defined as its minimal induced width over all possible elimination orderings

We have seen how the computational complexity of variable elimination depends on the choice of elimination order, and how this is reflected in the induced width of the resulting induced graph. An elimination order with a smaller induced width gives a better complexity than one with a larger induced width. So, our problem has now been reduced to selecting an elimination order that keeps the induced width minimal.
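Since the induced graph returned by pgmpy in In [16] is a networkx graph, the width of an induced graph, and hence the induced width of the corresponding ordering, can be measured directly; a minimal sketch:

import networkx as nx

def width(graph):
    # Width of a graph: number of nodes in its largest maximal clique minus one
    return max(len(clique) for clique in nx.find_cliques(graph)) - 1

# induced_graph is the networkx graph returned by pgmpy in In [16];
# its width is the induced width of that elimination ordering
print(width(induced_graph))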

Unfortunately, finding an elimination ordering that achieves the minimal induced width, that is, the tree width of the network, is NP-complete, so there is no easy way to determine the complexity of inference over a network by simply looking at its structure. However, there are many techniques that we can use to find good elimination orderings.

Using the chordal graph property of induced graphs

We define a graph as chordal if every cycle of length greater than three has a chord, that is, an edge between two nonadjacent nodes of the cycle. In other words, every minimal cycle in a chordal graph is of length three.

Now, if you look carefully, you will see that every induced graph is a chordal graph. The converse also holds: every chordal graph over the variables of a network corresponds to some elimination ordering for that network. The proofs of both of these results are beyond the scope of this book.
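This property is easy to check for any particular graph; for example, networkx provides a chordality test that we can run on the induced graph obtained in In [16]:

import networkx as nx

# The induced graph from In [16] should always pass the chordality test
print(nx.is_chordal(induced_graph))      # True
# A plain 4-cycle, in contrast, has no chord
print(nx.is_chordal(nx.cycle_graph(4)))  # False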

So, to find an elimination ordering, we can use the maximum cardinality search algorithm. In this algorithm, we iterate n times, where n is the number of variables; in each iteration, we select the unmarked variable with the largest number of marked neighbors and mark it. Eliminating the variables in the reverse of the order in which they were marked gives us an elimination ordering.
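A minimal sketch of maximum cardinality search over an undirected networkx graph could look as follows; the helper name is illustrative and this is not pgmpy's own implementation:

def maximum_cardinality_search(graph):
    # Mark nodes one by one, always picking the unmarked node with the most
    # marked neighbors; eliminating in the reverse of the marking order is the
    # usual way to turn this into an elimination ordering.
    marked = []
    unmarked = set(graph.nodes())
    while unmarked:
        best = max(unmarked,
                   key=lambda node: sum(1 for n in graph.neighbors(node) if n in marked))
        marked.append(best)
        unmarked.remove(best)
    return list(reversed(marked))

# Example: an ordering for the induced graph obtained from pgmpy in In [16]
print(maximum_cardinality_search(induced_graph))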

Minimum fill/size/weight search

Another approach to finding an elimination ordering is to be greedy and, in each step, select the variable that seems to be the best option for that step. So, in each iteration, we compute a cost function for eliminating each of the remaining nodes and select the node that results in the minimum cost (a minimal min-fill sketch follows the list). Some of the cost criteria are as follows:

  • Min-neighbors: The cost of a node is defined by the number of neighbors it has in the graph.
  • Min-weight: The cost of a node is the product of the cardinalities of its neighbors.
  • Min-fill: The cost of node elimination is the number of edges that need to be added to the graph for the elimination of that node.
  • Weighted-min-fill: The cost of node elimination is the sum of the weights of the edges that need to be added to the graph for its elimination. The weight of an edge is defined as the product of the weights of the nodes between which it lies.
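
A minimal sketch of the greedy approach with the min-fill criterion, again over an undirected networkx graph and with illustrative helper names, could look as follows:

import networkx as nx
from itertools import combinations

def min_fill_cost(graph, node):
    # Number of fill edges needed to eliminate `node`
    neighbors = list(graph.neighbors(node))
    return sum(1 for u, v in combinations(neighbors, 2) if not graph.has_edge(u, v))

def greedy_ordering(graph, cost=min_fill_cost):
    # Greedy elimination ordering: repeatedly eliminate the currently cheapest node
    work = graph.copy()
    order = []
    while len(work) > 0:
        node = min(work.nodes(), key=lambda n: cost(work, n))
        for u, v in combinations(list(work.neighbors(node)), 2):
            work.add_edge(u, v)                 # add the fill edges for this node
        work.remove_node(node)
        order.append(node)
    return order

# Moral graph of the restaurant model, built by hand from its CPD scopes
g = nx.Graph([('location', 'cost'), ('quality', 'cost'), ('location', 'quality'),
              ('cost', 'no_of_people'), ('location', 'no_of_people')])
print(greedy_ordering(g))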

Here, we have seen two different approaches to finding a good elimination ordering. The second, greedy approach doesn't look like a promising way to obtain a globally optimal elimination ordering; however, it turns out that, in most cases, it gives very good results compared to the maximum cardinality search algorithm.
