Until now, we have discussed two different types of graphical models. Each of these can represent independence constraints that the other cannot. In this section, we will look at the relationship between these two models.
Both Bayesian models and Markov models parameterize a probability distribution using a graphical structure, and both structures also encode the independencies among the random variables. So, when converting a Bayesian model into a Markov one, we have to look at it from the following two perspectives: the parameterization of the distribution and the independencies encoded by the structure.
From the first perspective, conversion of the Bayesian model into the Markov model is fairly simple. Let's begin by considering the case of a probability distribution P_B, where B is a parameterized Bayesian network over a graph G. If we look at the parameterization of the Bayesian network, we can also think of it as a parameterization of a Gibbs distribution. We can think of the CPD over a variable X as a factor with scope {X} ∪ Par_G(X). This set of factors defines a Gibbs distribution with the partition function being equal to 1, since the product of the CPDs is already a normalized distribution.
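To make this concrete, here is a minimal sketch in plain NumPy (using the CPD values for the Rain, Accident, and TrafficJam nodes that appear later in this section) showing that when CPDs are treated as factors, their product is already normalized, so the partition function of the resulting Gibbs distribution is exactly 1:

```python
import numpy as np

# CPDs of the fragment Rain -> TrafficJam <- Accident
p_r = np.array([0.4, 0.6])                      # P(Rain)
p_a = np.array([0.2, 0.8])                      # P(Accident)
# P(TrafficJam | Rain, Accident), axes ordered (T, R, A)
p_t_ra = np.array([[[0.9, 0.6], [0.7, 0.1]],
                   [[0.1, 0.4], [0.3, 0.9]]])

# Treat each CPD as a factor and take their product (broadcasting
# aligns each factor with its scope)
joint = p_t_ra * p_r[None, :, None] * p_a[None, None, :]

# The partition function (sum over all assignments) is exactly 1
Z = joint.sum()
print(round(Z, 10))  # 1.0
```

This is why no renormalization is needed when reading a Bayesian network as a Gibbs distribution.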
Looking from the second perspective, let's try to find out what kind of undirected graph would be an I-Map for this Gibbs distribution. To understand it more clearly, let's go back to our previous Bayesian network example and try to convert it into a Markov network:
Let's try to convert this Bayesian model into a Markov model simply by replacing directed edges with undirected ones, and start by replacing the edges (A, J) and (R, J) with undirected edges. However, this representation has a problem. The Markov blanket of node A would be just J. Thus, this representation asserts that A is independent of all the other nodes in the model except J, given J; in particular, it asserts (A ⊥ R | J). However, the Bayesian network asserts the exact opposite of this: A and R are dependent given their common child J, that is, (A ⊥ R | J) does not hold. Thus, it requires an additional undirected edge between A and R. Similarly, replacing directed edges with undirected edges and adding extra edges where required, we get the network in the following figure:
Hence, we can conclude that to convert a Bayesian model into a Markov model, we need to do the following:

1. Replace all the directed edges of the Bayesian network with undirected edges.
2. Add an undirected edge between every pair of nodes that have a common child (that is, "marry" the parents).
This new structure, created by replacing directed edges with undirected ones and adding edges between the parents of a common child, is known as the moral graph of the Bayesian network, and the process of constructing it is known as moralization.
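The moralization procedure is easy to sketch directly. The following illustrative snippet (a minimal sketch, assuming networkx is available; it is not part of the pgmpy API shown in this section) drops edge directions and marries the parents of each node; for the A → J ← R fragment discussed above, it adds the extra edge between A and R:

```python
import itertools
import networkx as nx

def moralize(dag: nx.DiGraph) -> nx.Graph:
    """Build the moral graph: drop edge directions, then 'marry' the
    parents of every node by connecting them with undirected edges."""
    moral = nx.Graph(dag.to_undirected())
    for node in dag.nodes():
        for u, v in itertools.combinations(list(dag.predecessors(node)), 2):
            moral.add_edge(u, v)
    return moral

# The A -> J <- R fragment discussed above
dag = nx.DiGraph([('A', 'J'), ('R', 'J')])
print(sorted(tuple(sorted(e)) for e in moralize(dag).edges()))
# [('A', 'J'), ('A', 'R'), ('J', 'R')]
```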
We can see that the moral graph H of a Bayesian model G loses some information regarding the independencies. For example, (A ⊥ R) holds in the graph G, but not in H. However, every independence encoded by H also holds in G, that is, I(H) ⊆ I(G), so we can say that H is an I-Map for this Gibbs distribution. Note that moral graphs don't always lose information about the independencies. For example, if there had been an edge between A and R already, then no information regarding independencies would have been lost.
In pgmpy, a Bayesian model can be converted into a Markov model as follows:
In [1]: from pgmpy.models import BayesianModel
In [2]: from pgmpy.factors import TabularCPD

# Creating the above bayesian network
In [3]: model = BayesianModel()
In [4]: model.add_nodes_from(['Rain', 'TrafficJam'])
In [5]: model.add_edge('Rain', 'TrafficJam')
In [6]: model.add_edge('Accident', 'TrafficJam')
In [7]: cpd_rain = TabularCPD('Rain', 2, [[0.4], [0.6]])
In [8]: cpd_accident = TabularCPD('Accident', 2, [[0.2], [0.8]])
In [9]: cpd_traffic_jam = TabularCPD(
   ...:     'TrafficJam', 2,
   ...:     [[0.9, 0.6, 0.7, 0.1],
   ...:      [0.1, 0.4, 0.3, 0.9]],
   ...:     evidence=['Rain', 'Accident'],
   ...:     evidence_card=[2, 2])
In [10]: model.add_cpds(cpd_rain, cpd_accident, cpd_traffic_jam)
In [11]: model.add_node('LongQueues')
In [12]: model.add_edge('TrafficJam', 'LongQueues')
In [13]: cpd_long_queues = TabularCPD('LongQueues', 2,
    ...:                              [[0.9, 0.2], [0.1, 0.8]],
    ...:                              evidence=['TrafficJam'],
    ...:                              evidence_card=[2])
In [14]: model.add_cpds(cpd_long_queues)
In [15]: model.add_nodes_from(['GettingUpLate', 'LateForSchool'])
In [16]: model.add_edges_from([('GettingUpLate', 'LateForSchool'),
    ...:                       ('TrafficJam', 'LateForSchool')])
In [17]: cpd_getting_up_late = TabularCPD('GettingUpLate', 2,
    ...:                                  [[0.6], [0.4]])
In [18]: cpd_late_for_school = TabularCPD(
    ...:     'LateForSchool', 2,
    ...:     [[0.9, 0.45, 0.8, 0.1],
    ...:      [0.1, 0.55, 0.2, 0.9]],
    ...:     evidence=['GettingUpLate', 'TrafficJam'],
    ...:     evidence_card=[2, 2])
In [19]: model.add_cpds(cpd_getting_up_late, cpd_late_for_school)

# Conversion from BayesianModel to MarkovModel is accomplished by
In [20]: mm = model.to_markov_model()
In [21]: mm.edges()
Out[21]: [('TrafficJam', 'Accident'),
          ('TrafficJam', 'LongQueues'),
          ('TrafficJam', 'LateForSchool'),
          ('TrafficJam', 'Rain'),
          ('TrafficJam', 'GettingUpLate'),
          ('LateForSchool', 'GettingUpLate'),
          ('Accident', 'Rain')]
The conversion of a Markov model into a Bayesian model is not as simple as the case of converting a Bayesian model into a Markov model.
Let's start with our simple Markov model example and try to convert it into a Bayesian model. In this section, we will be looking from the perspective of independencies, that is, finding a Bayesian model that is an I-Map of the corresponding Markov model:
We can simply replace all the undirected edges in the Markov model (Fig 2.7(a)) with directed edges (Fig 2.7(b)). However, does this Bayesian model encode all the independencies of the corresponding Markov model? Before getting into this, let's take a simpler example of the Markov model formed by removing the node C:
Fig 2.8(a) represents a Markov model formed by removing the node C. Fig 2.8(b) is formed just by converting the undirected edges into directed edges. The Markov model encodes the independence assertion (B ⊥ D | A), which is also encoded in the corresponding Bayesian model. So, the Bayesian model formed is a perfect I-Map of the Markov model. Now, let's go back to our previous example and examine the independencies encoded in both the Markov model and the Bayesian model formed simply by converting undirected edges into directed ones.
The Markov model H encodes (B ⊥ D | A, C), but the corresponding Bayesian model G encodes (B ⊥ D | A), which is not true for H, where B and D are dependent given A alone (the path B–C–D remains active). So, for G to be an I-Map for H, there would have to be a directed edge between B and D. However, why did simply converting the undirected edges into directed edges not suffice, as it did in the example in Fig 2.8?
We can see that the example in Fig 2.7 is a non-triangulated (non-chordal) graph. A triangulated, or chordal, graph is a graph in which each of its cycles of four or more vertices has a chord (an edge that is not part of the cycle but connects two vertices of the cycle). By simply converting the edges of a non-triangulated graph into directed ones, we introduce immoralities. An immorality is a v-structure X → Z ← Y in which there is no directed edge between X and Y. So why does the introduction of an immorality pose an issue? To get the answer to this question, let's look at the example again. Before the conversion of the edges into directed ones, we had (B ⊥ D | A, C), but after the introduction of the immorality B → C ← D, conditioning on C activates the path between B and D, so (B ⊥ D | A, C) no longer holds.
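We can detect immoralities mechanically by checking, for every node, whether any two of its parents lack a connecting edge. A minimal sketch (assuming networkx, with one possible orientation of the Fig 2.7(a) cycle; the orientation is an illustrative assumption, not taken from the figure):

```python
from itertools import combinations

import networkx as nx

def immoralities(dag: nx.DiGraph):
    """Return triples (X, Z, Y) where X -> Z <- Y is a v-structure
    with no direct edge between X and Y."""
    found = []
    for node in dag.nodes():
        for u, v in combinations(list(dag.predecessors(node)), 2):
            if not (dag.has_edge(u, v) or dag.has_edge(v, u)):
                found.append((u, node, v))
    return found

# One orientation of the 4-cycle A-B-C-D: A->B, B->C, A->D, D->C
dag = nx.DiGraph([('A', 'B'), ('B', 'C'), ('A', 'D'), ('D', 'C')])
print(immoralities(dag))  # [('B', 'C', 'D')] -- the v-structure at C
```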
So, we can conclude that the process of converting a Markov model into a Bayesian model requires us to add edges to the network to make it chordal. This process is known as triangulation.
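Chordality itself is easy to check outside pgmpy. As a hedged sketch (assuming networkx 2.4 or later, which provides `is_chordal` and `complete_to_chordal_graph`), we can verify that the four-node cycle of Fig 2.7(a) is not chordal and compute one possible fill-in:

```python
import networkx as nx

# The four-node cycle A-B-C-D from Fig 2.7(a) has no chord
cycle = nx.Graph([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A')])
print(nx.is_chordal(cycle))    # False

# complete_to_chordal_graph returns a chordal supergraph and an
# elimination ordering; the extra edges are the fill-in chords
chordal, _ = nx.complete_to_chordal_graph(cycle)
print(nx.is_chordal(chordal))  # True
print(set(chordal.edges()) - set(cycle.edges()))  # the added chord(s)
```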
In pgmpy, we can convert a Markov model into a Bayesian model in the following way:
In [1]: from pgmpy.models import MarkovModel
In [2]: from pgmpy.factors import Factor
In [3]: model = MarkovModel()

# Fig 2.7(a) represents the Markov model
In [4]: model.add_nodes_from(['A', 'B', 'C', 'D'])
In [5]: model.add_edges_from([('A', 'B'), ('B', 'C'),
   ...:                       ('C', 'D'), ('D', 'A')])

# Adding some factors.
In [6]: phi_A_B = Factor(['A', 'B'], [2, 2], [1, 100, 100, 1])
In [7]: phi_B_C = Factor(['B', 'C'], [2, 2], [100, 1, 1, 100])
In [8]: phi_C_D = Factor(['C', 'D'], [2, 2], [1, 100, 100, 1])
In [9]: phi_D_A = Factor(['D', 'A'], [2, 2], [100, 1, 1, 100])
In [10]: model.add_factors(phi_A_B, phi_B_C, phi_C_D, phi_D_A)
In [11]: bayesian_model = model.to_bayesian_model()
In [12]: bayesian_model.edges()
Out[12]: [('D', 'C'), ('D', 'B'), ('D', 'A'), ('B', 'C'), ('B', 'A')]
As we have seen, in the case of converting a Bayesian model into a Markov model, we lost some of the independence conditions. The same holds true in this case as well: from the example, we can see that we lose the condition (B ⊥ D | A, C), because the conversion adds an edge between B and D.
We also see that for a perfect conversion of the model, we must have a chordal graph. The process of converting a non-chordal graph into a chordal one, by adding links to the undirected graph, is called triangulation.
In pgmpy, we can triangulate a graph as follows:
In [1]: from pgmpy.models import MarkovModel
In [2]: from pgmpy.factors import Factor
In [3]: import numpy as np
In [4]: model = MarkovModel()

# Fig 2.7(a) represents the MarkovModel
In [5]: model.add_nodes_from(['A', 'B', 'C', 'D'])
In [6]: model.add_edges_from([('A', 'B'), ('B', 'C'),
   ...:                       ('C', 'D'), ('D', 'A')])

# Adding some factors
In [7]: phi_A_B = Factor(['A', 'B'], [2, 2], [1, 100, 100, 1])
In [8]: phi_B_C = Factor(['B', 'C'], [2, 2], [100, 1, 1, 100])
In [9]: phi_C_D = Factor(['C', 'D'], [2, 2], [1, 100, 100, 1])
In [10]: phi_D_A = Factor(['D', 'A'], [2, 2], [100, 1, 1, 100])
In [11]: model.add_factors(phi_A_B, phi_B_C, phi_C_D, phi_D_A)
In [12]: chordal_graph = model.triangulate()

# Fig 2.9 represents the chordal graph created by triangulation
In [13]: chordal_graph.edges()
Out[13]: [('C', 'D'), ('C', 'B'), ('D', 'B'), ('D', 'A'), ('A', 'B')]
The following is the chordal graph formed by triangulating the Markov model:
There are six heuristics presented in Heuristic Algorithms for the Triangulation of Graphs by Andrés Cano and Serafín Moral for adding links to an undirected graph to triangulate it. The detailed explanation of these heuristics is beyond the scope of this book (for a detailed explanation, you can go through this paper). These heuristics are also implemented in pgmpy in the following way:
# Creating a chordal graph using the first heuristic. There are six
# heuristics implemented in pgmpy, selected by passing the keyword
# argument heuristic as H1, H2, H3, H4, H5, or H6
In [15]: chordal_graph = model.triangulate(heuristic='H1')