Gibbs sampling

In the Gibbs sampling algorithm, we start by reducing all the factors with the observed variables. After this, we generate a first sample for the unobserved variables from the prior using some sampling method, for example, forward sampling on the mutilated Bayesian network. We then iterate over each of the unobserved variables, generating a new value for one variable at a time, conditioned on our current sample values for all the other variables.

Let's take the example of our restaurant model to make this clearer. Assume that we have already observed that the cost of the restaurant is high, that is, $C = c^1$. So, we reduce all the CPDs involving the cost variable with this evidence. We start by generating our first sample with forward sampling; let's say it assigns the values $n^{(0)}$, $l^{(0)}$, and $q^{(0)}$ to the unobserved variables. We will now iterate over all of our unobserved variables N, L, Q. Starting with N, we will sample it from the distribution $P(N \mid l^{(0)}, q^{(0)}, c^1)$. As we are computing the distribution over a single variable, we can compute it very easily as follows:

$$P(N \mid l^{(0)}, q^{(0)}, c^1) = \frac{P(N, l^{(0)}, q^{(0)}, c^1)}{\sum_{n} P(n, l^{(0)}, q^{(0)}, c^1)}$$

Now that we have sampled a new value $n^{(1)}$ from the distribution $P(N \mid l^{(0)}, q^{(0)}, c^1)$, we continue with the iteration and sample L by conditioning the distribution on this new value of N, that is, from $P(L \mid n^{(1)}, q^{(0)}, c^1)$. Similarly, we go on generating samples.
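The following is a minimal sketch of this sweep in plain Python. It assumes the structure Q, L → C and C, L → N for the restaurant model, and all the CPD values are made-up placeholders; the point is only to show how each variable is resampled from its conditional given the current values of the other variables and the evidence C = c¹ (high cost):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical CPDs for the restaurant model; all numbers are made up for
# illustration. Every variable is binary (0/1). Assumed structure:
# L, Q -> C and C, L -> N, with the evidence C = 1 (high cost).
P_Q = np.array([0.6, 0.4])                      # P(Q)
P_L = np.array([0.7, 0.3])                      # P(L)
P_C = np.zeros((2, 2, 2))                       # P(C | Q, L), indexed [q, l, c]
P_C[0, 0] = [0.8, 0.2]; P_C[0, 1] = [0.6, 0.4]
P_C[1, 0] = [0.5, 0.5]; P_C[1, 1] = [0.1, 0.9]
P_N = np.zeros((2, 2, 2))                       # P(N | C, L), indexed [c, l, n]
P_N[0, 0] = [0.7, 0.3]; P_N[0, 1] = [0.4, 0.6]
P_N[1, 0] = [0.5, 0.5]; P_N[1, 1] = [0.2, 0.8]

c = 1  # observed evidence: the cost is high

def unnorm(q, l, n):
    # Unnormalized joint over the unobserved variables with C clamped to c.
    return P_Q[q] * P_L[l] * P_C[q, l, c] * P_N[c, l, n]

# First sample (the text obtains it by forward sampling; we just fix one here).
q, l, n = 0, 0, 0

samples = []
for _ in range(1000):
    # One Gibbs sweep: resample each unobserved variable from its
    # conditional given the current values of all the other variables.
    p_n = np.array([unnorm(q, l, 0), unnorm(q, l, 1)])
    n = rng.choice(2, p=p_n / p_n.sum())
    p_l = np.array([unnorm(q, 0, n), unnorm(q, 1, n)])
    l = rng.choice(2, p=p_l / p_l.sum())
    p_q = np.array([unnorm(0, l, n), unnorm(1, l, n)])
    q = rng.choice(2, p=p_q / p_q.sum())
    samples.append((n, l, q))

# Fraction of samples with N = 1 estimates P(N = 1 | C = high).
print(np.mean([s[0] for s in samples]))

Averaging over the collected samples, as in the last line, gives an estimate of the posterior probability of N given the evidence.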

The thing to notice here is that, unlike forward sampling, we are taking the evidence into consideration while sampling. This method will not give the true posterior right away, because we began by sampling from the prior distribution; however, by considering the evidence, we are able to generate samples that are much closer to the posterior, and repeating this process keeps generating samples that get closer and closer to the posterior distribution.

In the following sections, we will formalize this concept using the Markov chain Monte Carlo method, which will allow us to generate samples that are much closer to the posterior distribution.

Markov chains

In the case of graphical models, a Markov chain is a graph over the states of the variables X, where the edges represent the probability of transitioning from one state to another. So, an edge $x \rightarrow x'$ represents the probability of transitioning from the state $x$ to $x'$, denoted by $\mathcal{T}(x \rightarrow x')$:


Fig 4.20: Markov chain for a drunk man

Let's take the example of a drunk man walking along a road. The position of the person on the road can be represented by a random variable. Let's say the person started at point 0 and can move forward up to +4 or backward down to -4, but there are walls beyond these points, so even if he tries to go beyond them he will stay at the same point. Also, the probability of moving either forward or backward is 0.4, and the probability of staying in the same position is 0.2, that is, $\mathcal{T}(x \rightarrow x+1) = 0.4$, $\mathcal{T}(x \rightarrow x-1) = 0.4$, and $\mathcal{T}(x \rightarrow x) = 0.2$ respectively. Also, $\mathcal{T}(+4 \rightarrow +5) = \mathcal{T}(-4 \rightarrow -5) = 0$, as the road is blocked by the walls.

We can consider the position of the man at any given time $t$ to be a random variable represented by $X^{(t)}$. The distribution over his position at the next time instant can be computed as follows:

$$P(X^{(t+1)} = x') = \sum_{x} P(X^{(t)} = x)\,\mathcal{T}(x \rightarrow x')$$

Putting the earlier equation in words, we can say that the probability of the person being at point $x'$ at time $(t + 1)$ is equal to the sum, over all the states $x$, of the probability of the person being in state $x$ at time $t$ multiplied by the probability of transitioning from $x$ to $x'$.

Let's now try computing a few probability values for the man's position. We know that the man started from the point 0, so at time $t = 0$, $P(X^{(0)} = 0) = 1$. Now, at time $t = 1$, the probability of the man being at point 0 is $0.2$, and the probability of being at +1 or -1 is $0.4$ each. Moving on, at time $t = 2$, the probability of the man being at point 0 is $0.36$, at point +1 or -1 is $0.16$ each, and at point +2 or -2 is $0.16$ each. We can now see that the probability of being at different states spreads out with each time instance, and eventually we will reach a uniform distribution.
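We can verify these numbers, and watch the distribution flatten out, with a short simulation. The following sketch builds the 9 x 9 transition matrix for the positions -4 to +4 described above and repeatedly applies the transition equation:

import numpy as np

# States are the positions -4 .. +4 (nine states in total).
positions = np.arange(-4, 5)
n = len(positions)

# Transition matrix T[i, j] = T(x_i -> x_j) for the drunk-man chain.
T = np.zeros((n, n))
for i in range(n):
    T[i, i] += 0.2                    # stay in the same position
    if i + 1 < n:
        T[i, i + 1] += 0.4            # step forward
    else:
        T[i, i] += 0.4                # blocked by the wall at +4, so he stays
    if i - 1 >= 0:
        T[i, i - 1] += 0.4            # step backward
    else:
        T[i, i] += 0.4                # blocked by the wall at -4, so he stays

# The man starts at point 0, so P(X^(0) = 0) = 1.
p = np.zeros(n)
p[positions == 0] = 1.0

for t in range(1, 201):
    p = p @ T                         # P(X^(t+1) = x') = sum_x P(X^(t) = x) T(x -> x')
    if t in (1, 2, 200):
        print(t, dict(zip(positions.tolist(), np.round(p, 3))))

# t = 1 and t = 2 reproduce the values above; by t = 200 every position
# has probability close to 1/9, that is, the distribution has become uniform.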

To sample from the Markov chain, we can simply select states at each instant of time using the distribution for that instant. However, a Markov chain like this is not a very good method if we want to sample from a uniform distribution, because for the range $\{-k, \ldots, +k\}$ it takes on the order of $k^2$ steps on average to get close to the uniform distribution. So now, let's try to find out when a Markov chain converges and what the distribution at convergence is.

To make the computation simpler, let's take an example of a similar, but much smaller network, as shown in Fig 4.21:


Fig 4.21: An example of a Markov chain

At equilibrium, we can say that for any state $x'$, $P(X^{(t+1)} = x')$ should be almost equal to $P(X^{(t)} = x')$:

$$P(X^{(t+1)} = x') = \sum_{x} P(X^{(t)} = x)\,\mathcal{T}(x \rightarrow x') \approx P(X^{(t)} = x')$$

At equilibrium, the distribution is known as the stationary distribution and is represented by $\pi(X)$. We can write this equilibrium condition as follows:

$$\pi(x') = \sum_{x} \pi(x)\,\mathcal{T}(x \rightarrow x')$$

Now, let's try to compute the stationary distribution for the Markov chain in Fig 4.21. We can write the following equations:

$$\pi(x^1) = \pi(x^1)\,\mathcal{T}(x^1 \rightarrow x^1) + \pi(x^2)\,\mathcal{T}(x^2 \rightarrow x^1) + \pi(x^3)\,\mathcal{T}(x^3 \rightarrow x^1)$$
$$\pi(x^2) = \pi(x^1)\,\mathcal{T}(x^1 \rightarrow x^2) + \pi(x^2)\,\mathcal{T}(x^2 \rightarrow x^2) + \pi(x^3)\,\mathcal{T}(x^3 \rightarrow x^2)$$
$$\pi(x^3) = \pi(x^1)\,\mathcal{T}(x^1 \rightarrow x^3) + \pi(x^2)\,\mathcal{T}(x^2 \rightarrow x^3) + \pi(x^3)\,\mathcal{T}(x^3 \rightarrow x^3)$$

For this to be a legal distribution, it should also satisfy:

$$\pi(x^1) + \pi(x^2) + \pi(x^3) = 1$$

We can now solve this set of linear equations, plugging in the transition probabilities shown in Fig 4.21, to get a unique value for each of $\pi(x^1)$, $\pi(x^2)$, and $\pi(x^3)$.
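In code, the stationary distribution of a small chain can be found by solving this linear system directly. Since the transition probabilities of Fig 4.21 are not reproduced here, the matrix below uses made-up values purely for illustration:

import numpy as np

# Hypothetical transition matrix for a three-state chain. The real values
# would come from Fig 4.21; these numbers are placeholders for illustration.
T = np.array([[0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4],
              [0.5, 0.1, 0.4]])       # T[i, j] = T(x^i -> x^j), rows sum to 1

# Stationary condition pi = pi T, plus the normalization sum(pi) = 1,
# written as one over-determined linear system A pi = b.
A = np.vstack([T.T - np.eye(3), np.ones((1, 3))])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print(pi)          # the stationary distribution pi(x^1), pi(x^2), pi(x^3)
print(pi @ T)      # applying T leaves it unchanged, confirming equilibrium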

In this case, we get a unique solution for the stationary distribution, but in general, we cannot guarantee that the distribution will always converge. For a finite state Markov chain, we can check the following two conditions to verify whether the distribution converges:

  • It is possible to get from any state to another state using a positive probability path
  • For each node, there is a single-step positive probability path to get back to it, that is, a self-loop with positive probability

These two conditions together are sufficient, but not necessary, to guarantee convergence of the distribution.
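As a rough sketch, both conditions can be checked mechanically from a finite transition matrix: the second condition just asks for positive diagonal entries, and the first can be checked by accumulating reachability over the positive-probability edges. The helper function below is our own illustration, not a library routine:

import numpy as np

def check_convergence_conditions(T):
    # Check the two sufficient conditions for a finite transition matrix
    # T[i, j] = T(x_i -> x_j).
    n = T.shape[0]
    # Condition 2: every state has a positive-probability self-loop.
    self_loops = np.all(np.diag(T) > 0)
    # Condition 1: every state can reach every other state along a
    # positive-probability path; accumulate reachability over n steps.
    adj = (T > 0).astype(float)
    reach = np.eye(n) + adj
    for _ in range(n - 1):
        reach = reach @ (np.eye(n) + adj)
    connected = np.all(reach > 0)
    return bool(self_loops and connected)

T = np.array([[0.5, 0.5],
              [0.4, 0.6]])
print(check_convergence_conditions(T))   # True: both conditions hold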
