Chapter 4. Approximate Inference

In the previous chapter, we saw algorithms for exact inference on graphical models. The computational complexity of exact inference is exponential in the treewidth of the network, so for larger networks with a large treewidth, exact inference becomes infeasible. Also, in many real-life problems, we are not particularly concerned with the exact probabilities of random variables; rather, we are much more interested in the relative probabilities of the states of variables. Therefore, in this chapter, we will discuss algorithms to perform approximate inference over networks. There are many algorithms for approximate inference, but the approach to finding an approximate distribution remains the same in all of them: we usually define a target class Q of easy distributions, and then from this class, we try to find the distribution that is closest to our actual distribution $P_\Phi$ and answer inference queries from this estimated distribution.

In this chapter, we will discuss:

  • Approximate inference as an optimization problem
  • Solving optimization problems using Lagrange multipliers
  • Deriving a clique tree algorithm from an optimization problem
  • The loopy belief propagation algorithm with code examples
  • The expectation propagation algorithm with code examples
  • The mean field algorithm with code examples
  • The full particles and collapsed particles sampling methods
  • The Markov chain Monte Carlo method

The optimization problem

Let's start with a little recap of exact inference. Assume that we have a factorized distribution in the following form:

$$P_\Phi(X_1, \ldots, X_n) = \frac{1}{Z} \prod_{\phi \in \Phi} \phi(U_\phi)$$

Here, $Z$ is the partition function, $\phi \in \Phi$ are the factors in the network, and $U_\phi$ is the scope of the factor $\phi$. In the case of exact inference, we computed $P_\Phi(X_1, \ldots, X_n)$ and then answered queries over this distribution.
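To make this concrete, here is a minimal sketch, not from the text, that uses plain NumPy arrays as factors over three binary variables (the factor values are arbitrary) to show how the unnormalized product of factors is turned into $P_\Phi$ by dividing by the partition function $Z$:

```python
# A minimal sketch: NumPy arrays as factors phi1(A, B) and phi2(B, C).
# The values are made up purely for illustration.
import numpy as np

phi1 = np.array([[10., 1.], [1., 10.]])   # phi1(A, B)
phi2 = np.array([[5., 1.], [1., 5.]])     # phi2(B, C)

unnormalized = phi1[:, :, None] * phi2[None, :, :]   # product over the scopes (A, B, C)
Z = unnormalized.sum()                               # partition function
P = unnormalized / Z                                 # P_Phi(A, B, C)

print(Z)
print(P.sum())                                       # sums to 1 by construction
```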

In the case of belief propagation, the end result of running the algorithm was a set of beliefs on the clusters and sepsets. This set of beliefs was able to represent the joint distribution $P_\Phi$. So, in the case of exact inference, we tried to find a set of calibrated beliefs that was able to represent our joint distribution exactly. For approximate algorithms, we will instead select, from all the sets of beliefs that conform to the cluster tree, the set that best represents our original distribution $P_\Phi$.

So now the question is, how do we compare the similarity between these two distributions? There are many methods that we can use to compute the relative similarity of two distributions, for example, the Euclidean distance, the $L_1$ distance, and the relative entropy. However, the problem with most of these methods is that we need to answer hard queries on $P_\Phi$ to compute the distance, and the whole purpose of approximate inference is to avoid computing the exact joint distribution. By using relative entropy to measure the similarity between the distributions, we can avoid answering hard queries on $P_\Phi$. Now, let's see how relative entropy is defined over distributions.

The relative entropy between two distributions $P_1$ and $P_2$ is defined as follows:

$$D(P_1 \| P_2) = E_{P_1}\left[\ln \frac{P_1(X_1, \ldots, X_n)}{P_2(X_1, \ldots, X_n)}\right]$$

The relative entropy is always non-negative and is 0 only when $P_1 = P_2$. Also, the relative entropy is a nonsymmetrical quantity, so in general $D(P_1 \| P_2) \neq D(P_2 \| P_1)$.
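As a small numerical illustration (the two distributions below are made up), the following snippet computes the relative entropy between two discrete distributions and shows its non-negativity and asymmetry:

```python
# Relative entropy between two hand-made discrete distributions.
import numpy as np

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

def relative_entropy(p, q):
    """D(p || q) = E_p[ln(p/q)] for strictly positive discrete distributions."""
    return np.sum(p * np.log(p / q))

print(relative_entropy(P, Q))   # always >= 0
print(relative_entropy(Q, P))   # generally different from D(P || Q): relative entropy is asymmetric
print(relative_entropy(P, P))   # exactly 0 when the two distributions are equal
```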

Now, in our case of approximate inference, we will use $D(Q \| P_\Phi)$ (not $D(P_\Phi \| Q)$, because computing it would again require performing inference on $P_\Phi$). We can then find the distribution $Q$ that minimizes $D(Q \| P_\Phi)$.

Summarizing our complete optimization problem, let's assume that we have a cluster tree $T$ for a distribution $P_\Phi$ and are given the following set of beliefs:

$$Q = \{\beta_i : C_i \in T\} \cup \{\mu_{i,j} : (C_i - C_j) \in T\}$$

Here, $C_i$ denotes the clusters in $T$, $\beta_i$ denotes the beliefs over the cluster $C_i$, and $\mu_{i,j}$ denotes the beliefs over the sepset $S_{i,j} = C_i \cap C_j$. This set of beliefs represents a distribution $Q$ as follows:

$$Q(X_1, \ldots, X_n) = \frac{\prod_{C_i \in T} \beta_i(C_i)}{\prod_{(C_i - C_j) \in T} \mu_{i,j}(S_{i,j})}$$

As the cluster tree is calibrated, it satisfies the marginal consistency constraints: for each sepset $S_{i,j}$, the sepset beliefs $\mu_{i,j}$ are the marginals of both adjacent cluster beliefs $\beta_i$ and $\beta_j$. Therefore, the set of calibrated beliefs $Q$ satisfies the following equations:

$$\mu_{i,j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \beta_i(C_i)$$
$$\mu_{i,j}(S_{i,j}) = \sum_{C_j \setminus S_{i,j}} \beta_j(C_j)$$
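As a quick numerical check of these equations (the belief values below are made up but already calibrated), summing out the non-sepset variable from either adjacent cluster belief must yield the same sepset belief:

```python
# Marginal consistency check for two adjacent cluster beliefs sharing sepset {B}.
import numpy as np

beta_i = np.array([[0.30, 0.10], [0.20, 0.40]])   # beta_i(A, B)
beta_j = np.array([[0.25, 0.25], [0.10, 0.40]])   # beta_j(B, C)

mu_from_i = beta_i.sum(axis=0)    # sum out A, leaving the sepset variable B
mu_from_j = beta_j.sum(axis=1)    # sum out C, leaving B

print(mu_from_i, mu_from_j)
print(np.allclose(mu_from_i, mu_from_j))   # True for a calibrated tree
```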

Now, we can define our optimization problem as selecting the set of beliefs $Q$ from the space of calibrated belief sets:

$$\text{Find } Q = \{\beta_i : C_i \in T\} \cup \{\mu_{i,j} : (C_i - C_j) \in T\}$$

This must be done such that it minimizes $D(Q \| P_\Phi)$ with the following constraints:

$$\mu_{i,j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \beta_i(C_i) = \sum_{C_j \setminus S_{i,j}} \beta_j(C_j) \quad \forall (C_i - C_j) \in T$$
$$\sum_{C_i} \beta_i(C_i) = 1 \quad \forall C_i \in T$$

To solve this optimization problem, we examine the different configurations of beliefs that satisfy the marginal consistency constraints and select the one that minimizes our objective function $D(Q \| P_\Phi)$.
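The following is a minimal sketch of this optimization, not taken from the text: it uses plain NumPy and scipy.optimize (rather than pgmpy) on a made-up two-cluster tree over binary variables A, B, and C, parameterizing the beliefs directly and minimizing $D(Q \| P_\Phi)$ under the consistency and normalization constraints. Because a clique tree for this small chain can represent $P_\Phi$ exactly, the minimum found should be close to 0.

```python
# A toy version of the belief optimization problem for the chain A - B - C.
# Clusters: C1 = {A, B}, C2 = {B, C}; sepset: {B}. Factor values are made up.
import numpy as np
from scipy.optimize import minimize

# Factors phi1(A, B) and phi2(B, C); P(A, B, C) = phi1 * phi2 / Z
phi1 = np.array([[30., 5.], [1., 10.]])
phi2 = np.array([[100., 1.], [1., 100.]])
P = phi1[:, :, None] * phi2[None, :, :]
P /= P.sum()                                 # divide by the partition function Z

def unpack(x):
    """Split the parameter vector into beta1(A,B), beta2(B,C), mu(B)."""
    return x[0:4].reshape(2, 2), x[4:8].reshape(2, 2), x[8:10]

def kl_q_p(x):
    """Objective: D(Q || P) with Q = beta1 * beta2 / mu."""
    beta1, beta2, mu = unpack(x)
    Q = beta1[:, :, None] * beta2[None, :, :] / mu[None, :, None]
    return np.sum(Q * np.log(Q / P))

constraints = [
    # marginal consistency: mu(B) equals the marginal of each adjacent cluster belief
    {'type': 'eq', 'fun': lambda x: unpack(x)[2] - unpack(x)[0].sum(axis=0)},
    {'type': 'eq', 'fun': lambda x: unpack(x)[2] - unpack(x)[1].sum(axis=1)},
    # normalization: each cluster belief sums to 1
    {'type': 'eq', 'fun': lambda x: unpack(x)[0].sum() - 1},
    {'type': 'eq', 'fun': lambda x: unpack(x)[1].sum() - 1},
]

x0 = np.full(10, 0.25)
x0[8:10] = 0.5                               # feasible uniform starting point
res = minimize(kl_q_p, x0, method='SLSQP', constraints=constraints,
               bounds=[(1e-6, 1)] * 10)

beta1, beta2, mu = unpack(res.x)
print("D(Q || P) at optimum:", res.fun)      # close to 0 for this tree-structured example
print("beta1(A, B):\n", beta1)
print("exact marginal P(A, B):\n", P.sum(axis=2))
```

For larger, loopy networks this direct formulation is intractable, which is what motivates the approximate message-passing and sampling algorithms discussed in the rest of this chapter.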
