Bayesian and Markov networks

Until now, we have discussed two different types of graphical models: Bayesian networks and Markov networks. Each of these can represent independence constraints that the other cannot. In this section, we will look at the relationship between these two models.

Converting Bayesian models into Markov models

Both Bayesian models and Markov models parameterize a probability distribution using a graphical structure. Further, these structures also encode the independencies among the random variables. So, when converting a Bayesian model into a Markov one, we have to look at it from the following two perspectives:

  • From the perspective of parameterization, that is, representing the probability distribution P_B of the Bayesian model using a fully parameterized Markov model
  • From the perspective of independencies, that is, representing the independence constraints encoded by the Bayesian model using the Markov model

From the first perspective, conversion of a Bayesian model into a Markov model is fairly simple. Let's begin by considering a probability distribution P_B, where B is a parameterized Bayesian network over a graph G. If we look at the parameterization of the Bayesian network, we can also think of it as the parameterization of a Gibbs distribution: each CPD P(X_i | Pa(X_i)) over a variable X_i can be viewed as a factor with scope {X_i} ∪ Pa(X_i), where Pa(X_i) denotes the parents of X_i in G. This set of factors defines a Gibbs distribution whose partition function is equal to 1.
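To see why the partition function equals 1, here is a minimal sketch in plain Python (the tiny Rain/TrafficJam numbers below are purely illustrative, not taken from the model built later): multiplying the CPD tables over all joint assignments and summing gives exactly 1, so the product of CPDs is already normalized.

from itertools import product

# CPDs of a tiny network Rain -> TrafficJam, viewed as factors
p_rain = {0: 0.4, 1: 0.6}                # P(Rain)
p_jam = {(0, 0): 0.9, (1, 0): 0.1,       # P(Jam | Rain=0)
         (0, 1): 0.6, (1, 1): 0.4}       # P(Jam | Rain=1)

# Partition function Z = sum over all assignments of the factor product
Z = sum(p_rain[r] * p_jam[(j, r)] for r, j in product([0, 1], repeat=2))
print(Z)   # 1.0 (up to floating-point rounding): the CPDs already define a normalized Gibbs distribution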

Looking from the second perspective, let's try to find out what kind of undirected graph would be an I-Map for this Gibbs distribution. To understand it more clearly, let's go back to our previous Bayesian network example and try to convert it into a Markov network:


Fig 2.5 Simple Bayesian model

Let's try to convert this Bayesian model into a Markov model simply by replacing directed edges with undirected ones, starting with the edges (A, J) and (R, J). However, this representation has a problem. The Markov blanket of node A would be just J. Thus, this representation asserts that A is independent of every other node in the model given J, and in particular that (A ⊥ R | J). However, the Bayesian network asserts the exact opposite: A and R become dependent once their common child J is observed, that is, (A ⊥ R | J) does not hold. Thus, we require an additional undirected edge between A and R. Similarly, replacing directed edges with undirected edges and adding extra edges where required, we get the network in the following figure:


Fig 2.6 Moral graph of Bayesian model represented in Fig 2.5

Hence, we can conclude that to convert a Bayesian model into a Markov model, we need to do the following:

  • Replace all the directed edges between the nodes with undirected edges
  • Add additional undirected edges between nodes that are parents of a common child node

This new structure, created by replacing directed edges with undirected ones and adding edges between parents, is known as the moral graph of the Bayesian network, and the process of constructing it is known as moralization of the network.
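The moralization procedure itself is only a few lines of graph manipulation. The following is a minimal sketch using networkx (the moralize function below is illustrative, not pgmpy's implementation, which we will use in the next listing):

from itertools import combinations
import networkx as nx

def moralize(dag):
    # Drop edge directions, then "marry" the parents of every node
    moral = nx.Graph(dag.edges())
    for node in dag.nodes():
        for u, v in combinations(dag.predecessors(node), 2):
            moral.add_edge(u, v)
    return moral

dag = nx.DiGraph([('Rain', 'TrafficJam'), ('Accident', 'TrafficJam'),
                  ('TrafficJam', 'LongQueues'),
                  ('TrafficJam', 'LateForSchool'),
                  ('GettingUpLate', 'LateForSchool')])
print(sorted(moralize(dag).edges()))
# Two extra "marrying" edges appear: Rain-Accident and TrafficJam-GettingUpLate

These are exactly the two extra edges that appear in the pgmpy output later in this section.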

We can see that the moral graph H of a Bayesian model G loses some information regarding the independencies. For example, (A ⊥ R) holds in the graph G, but not in H. However, I(H) ⊆ I(P_B), so we can say that H is an I-Map for this Gibbs distribution. Note that moral graphs don't always lose information about the independencies. For example, if there had already been an edge between A and R, then no information regarding independencies would have been lost.
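We can check this loss of information directly with a small sketch using networkx (assuming a version that provides nx.d_separated; newer releases rename it nx.is_d_separator). In the directed graph, Accident and Rain are marginally d-separated, while in the moral graph the marrying edge connects them directly, so the corresponding separation is gone.

import networkx as nx

# Directed graph G: Accident -> TrafficJam <- Rain
dag = nx.DiGraph([('Accident', 'TrafficJam'), ('Rain', 'TrafficJam')])
# (Accident ⊥ Rain) holds in G: they are d-separated given the empty set
print(nx.d_separated(dag, {'Accident'}, {'Rain'}, set()))  # True

# Moral graph H: the marrying edge Accident - Rain has been added
moral = nx.Graph([('Accident', 'TrafficJam'), ('Rain', 'TrafficJam'),
                  ('Accident', 'Rain')])
# With an empty conditioning set, any path means they are not separated in H
print(nx.has_path(moral, 'Accident', 'Rain'))  # True, so (Accident ⊥ Rain) is lost in H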

In pgmpy, a Bayesian model can be converted into a Markov model as follows:

In [1]: from pgmpy.models import BayesianModel
In [2]: from pgmpy.factors import TabularCPD

# Creating the above Bayesian network
In [3]: model = BayesianModel()
In [4]: model.add_nodes_from(['Rain', 'TrafficJam'])
In [5]: model.add_edge('Rain', 'TrafficJam')
In [6]: model.add_edge('Accident', 'TrafficJam')
In [7]: cpd_rain = TabularCPD('Rain', 2, [[0.4], [0.6]])
In [8]: cpd_accident = TabularCPD('Accident', 2, [[0.2], [0.8]])
In [9]: cpd_traffic_jam = TabularCPD(
                  'TrafficJam', 2,
                  [[0.9, 0.6, 0.7, 0.1],
                   [0.1, 0.4, 0.3, 0.9]],
                  evidence=['Rain', 'Accident'],
                  evidence_card=[2, 2])
In [10]: model.add_cpds(cpd_rain, cpd_accident, cpd_traffic_jam)
In [11]: model.add_node('LongQueues')
In [12]: model.add_edge('TrafficJam', 'LongQueues')
In [13]: cpd_long_queues = TabularCPD('LongQueues', 2,
                                      [[0.9, 0.2],
                                       [0.1, 0.8]],
                                      evidence=['TrafficJam'],
                                      evidence_card=[2])
In [14]: model.add_cpds(cpd_long_queues)
In [15]: model.add_nodes_from(['GettingUpLate', 'LateForSchool'])
In [16]: model.add_edges_from([('GettingUpLate', 'LateForSchool'),
                               ('TrafficJam', 'LateForSchool')])
In [17]: cpd_getting_up_late = TabularCPD('GettingUpLate', 2,
                                          [[0.6], [0.4]])
In [18]: cpd_late_for_school = TabularCPD(
                         'LateForSchool', 2,
                         [[0.9, 0.45, 0.8, 0.1],
                          [0.1, 0.55, 0.2, 0.9]],
                         evidence=['GettingUpLate', 'TrafficJam'],
                         evidence_card=[2, 2])
In [19]: model.add_cpds(cpd_getting_up_late, cpd_late_for_school)

# Conversion from BayesianModel to MarkovModel is accomplished by
In [20]: mm = model.to_markov_model()
In [21]: mm.edges()
Out[21]:
[('TrafficJam', 'Accident'),
 ('TrafficJam', 'LongQueues'),
 ('TrafficJam', 'LateForSchool'),
 ('TrafficJam', 'Rain'),
 ('TrafficJam', 'GettingUpLate'),
 ('LateForSchool', 'GettingUpLate'),
 ('Accident', 'Rain')]

Converting Markov models into Bayesian models

The conversion of a Markov model into a Bayesian model is not as simple as the case of converting a Bayesian model into a Markov model.

Let's start with our simple Markov model example and try to convert it into a Bayesian model. In this section, we will be looking from the perspective of independencies, that is, finding a Bayesian model that is an I-Map of the corresponding Markov model:


Fig 2.7(a) Markov model

Fig 2.7(b) Bayesian model formed by changing the undirected edges into directed ones

We can simply replace all the undirected edges in the Markov model (Fig 2.7(a)) with directed edges (Fig 2.7(b)). However, does this Bayesian model encode all the independencies of the corresponding Markov model? Before getting into this, let's take a simpler example: the Markov model formed by removing the node C:


Fig 2.8(a) Markov model formed by removing node C

Fig 2.8(b) Bayesian model formed by changing the undirected edges into directed ones

Fig 2.8(a) represents the Markov model formed by removing the node C, and Fig 2.8(b) is formed just by converting its undirected edges into directed ones. The Markov model encodes the independence assertion (B ⊥ D | A), which is also encoded in the corresponding Bayesian model. So, the Bayesian model formed is a perfect I-Map of the Markov model. Now, let's go back to our previous example and examine the independencies encoded in both the Markov model and the Bayesian model formed simply by converting undirected edges into directed ones.

The Markov model H encodes the assertions (A ⊥ C | B, D) and (B ⊥ D | A, C), but the corresponding Bayesian model G encodes (B ⊥ D | A), which is not true for H, where B and D remain connected through C when only A is observed. So, for G to be an I-Map of H, there should be a directed edge between B and D. However, why did simply converting the undirected edges into directed ones not suffice, as it did in the example in Fig 2.8?

We can see that the graph in the example in Fig 2.7 is non-triangulated (non-chordal). A triangulated or chordal graph is a graph in which every cycle of four or more vertices has a chord (an edge that is not part of the cycle but connects two vertices of the cycle). By simply converting the edges of a non-triangulated graph into directed ones, we introduce immoralities. An immorality is a v-structure X → Z ← Y in which there is no directed edge between X and Y. So, why does the introduction of an immorality pose an issue? To answer this question, let's look at the example again. Before the conversion of the edges into directed ones, we had (B ⊥ D | A, C); after the immorality B → C ← D is introduced, observing C activates the path between B and D, and we are left with (B ⊥ D | A) instead.

So, we can conclude that converting a Markov model into a Bayesian model requires us to add edges to the network to make it chordal. This process is known as triangulation.
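Chordality is easy to test, and a triangulation easy to obtain, with standard graph routines. Here is a minimal sketch using networkx (assuming nx.complete_to_chordal_graph, available in networkx 2.4 and later); pgmpy's own conversion and triangulation methods are shown in the next listings:

import networkx as nx

# The 4-cycle A - B - C - D - A from Fig 2.7(a) has no chord
cycle = nx.Graph([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A')])
print(nx.is_chordal(cycle))        # False

# Add fill-in edges to obtain a chordal (triangulated) graph
chordal, order = nx.complete_to_chordal_graph(cycle)
print(nx.is_chordal(chordal))      # True
added = [e for e in chordal.edges() if not cycle.has_edge(*e)]
print(added)                       # a single chord, e.g. [('A', 'C')]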

In pgmpy, we can convert a Markov model into a Bayesian model in the following way:

In [1]: from pgmpy.models import MarkovModel
In [2]: from pgmpy.factors import Factor
In [3]: model = MarkovModel()

# Fig 2.7(a) represents the Markov model
In [4]: model.add_nodes_from(['A', 'B', 'C', 'D'])
In [5]: model.add_edges_from([('A', 'B'), ('B', 'C'), 
                              ('C', 'D'), ('D', 'A')])

# Adding some factors.
In [6]: phi_A_B = Factor(['A', 'B'], [2, 2], [1, 100, 100, 1])
In [7]: phi_B_C = Factor(['B', 'C'], [2, 2], [100, 1, 1, 100])
In [8]: phi_C_D = Factor(['C', 'D'], [2, 2], [1, 100, 100, 1])
In [9]: phi_D_A = Factor(['D', 'A'], [2, 2], [100, 1, 1, 100])
In [10]: model.add_factors(phi_A_B, phi_B_C, phi_C_D, phi_D_A)
In [11]: bayesian_model = model.to_bayesian_model()
In [12]: bayesian_model.edges()
Out[12]: [('D', 'C'), ('D', 'B'), ('D', 'A'), 
          ('B', 'C'), ('B', 'A')]

Chordal graphs

As we have seen, in the case of converting a Bayesian model into a Markov model, we lost some of the independence conditions. The same holds true in this case as well, and we can see from the example that we lose the following conditions:

  • The marginal independence between the parents of a common child node in a Bayesian network is lost when it is converted into a Markov one, because moralization adds an edge between those parents
  • Addition of extra edges to convert a Markov model into a Bayesian one leads to the loss of local independence information

We also see that for the perfect conversion of the model, we must have a chordal graph. The process of converting a non-chordal graph into a chordal one is called triangulation. A triangulated graph can be obtained from an undirected graph by adding links.

In pgmpy, we can triangulate a graph as follows:

In [1]: from pgmpy.models import MarkovModel
In [2]: from pgmpy.factors import Factor
In [3]: import numpy as np
In [4]: model = MarkovModel()

# Fig 2.7(a) represents the MarkovModel
In [5]: model.add_nodes_from(['A', 'B', 'C', 'D'])
In [6]: model.add_edges_from(
                [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A')])

# Adding some factors
In [7]: phi_A_B = Factor(['A', 'B'], [2, 2], [1, 100, 100, 1])
In [8]: phi_B_C = Factor(['B', 'C'], [2, 2], [100, 1, 1, 100])
In [9]: phi_C_D = Factor(['C', 'D'], [2, 2], [1, 100, 100, 1])
In [10]: phi_D_A = Factor(['D', 'A'], [2, 2], [100, 1, 1, 100])
In [11]: model.add_factors(phi_A_B, phi_B_C, phi_C_D, phi_D_A)
In [12]: chordal_graph = model.triangulate()

# Fig 2.9 represents the chordal graph created by triangulation
In [13]: chordal_graph.edges()
Out[13]: [('C', 'D'), ('C', 'B'), ('D', 'B'),
          ('D', 'A'), ('A', 'B')]

The following is the chordal graph formed by triangulating the Markov model:


Fig 2.9 Chordal graph

There are six heuristics presented in Heuristic Algorithms for the Triangulation of Graphs by Andrés Cano and Serafín Moral for adding links to an undirected graph to triangulate it. A detailed explanation of these heuristics is beyond the scope of this book (you can go through the paper for the details). These heuristics are also implemented in pgmpy and can be used in the following way:

# Six heuristics for creating a chordal graph are implemented in pgmpy.
# A heuristic can be selected by passing the keyword argument heuristic
# as 'H1', 'H2', 'H3', 'H4', 'H5' or 'H6'
In [14]: chordal_graph = model.triangulate(heuristic='H1')

Tip

If no heuristic is provided, H6 is used by default.
