In the previous section, we saw that to find the approximate distribution $Q$, we need to minimize the relative entropy $D(Q \| P_\Phi)$, but computing the relative entropy requires a summation over all possible instantiations of $\mathcal{X}$. To avoid this, we will now transform our objective into the form of an energy functional.
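To make the cost concrete, the following sketch computes $D(Q \| P_\Phi)$ directly for a hypothetical three-variable model. The factor values, the names `phi_ab`, `phi_bc`, `q_marg`, and the fully factorized form of $Q$ are all illustrative assumptions. Note that the brute-force sum visits every one of the $2^n$ joint assignments and also needs the normalized $P_\Phi$, which in turn needs $Z$.

```python
import itertools
import numpy as np

# Hypothetical toy model: three binary variables A, B, C with two
# pairwise factors phi_AB(A, B) and phi_BC(B, C). Values are illustrative.
phi_ab = np.array([[30.0, 5.0], [1.0, 10.0]])
phi_bc = np.array([[100.0, 1.0], [1.0, 100.0]])

def unnormalized_p(a, b, c):
    """Unnormalized measure: the product of the factors."""
    return phi_ab[a, b] * phi_bc[b, c]

# All 2^3 joint instantiations of (A, B, C); this list grows exponentially.
assignments = list(itertools.product([0, 1], repeat=3))
Z = sum(unnormalized_p(*x) for x in assignments)  # partition function

def p(x):
    """Normalized P_Phi(x); requires Z, itself a sum over all assignments."""
    return unnormalized_p(*x) / Z

# Hypothetical fully factorized Q with q_marg[i] = Q(X_i = 1).
q_marg = np.array([0.3, 0.6, 0.5])

def q(x):
    return float(np.prod([m if xi else 1.0 - m for m, xi in zip(q_marg, x)]))

def kl_brute_force(q, p, assignments):
    """D(Q || P) = sum_x Q(x) [ln Q(x) - ln P(x)], enumerating every x."""
    return sum(q(x) * (np.log(q(x)) - np.log(p(x))) for x in assignments)

print(kl_brute_force(q, p, assignments))
```

With $n$ binary variables the loop has $2^n$ iterations, which is exactly what the energy-functional reformulation tries to avoid.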
We know the following:

$$D(Q \| P_\Phi) = E_Q[\ln Q(\mathcal{X})] - E_Q[\ln P_\Phi(\mathcal{X})]$$
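Using the product form of $P_\Phi$, that is, $P_\Phi(\mathcal{X}) = \frac{1}{Z} \prod_{\phi \in \Phi} \phi(U_\phi)$, where $U_\phi$ is the scope of factor $\phi$, we have the following:

$$D(Q \| P_\Phi) = E_Q[\ln Q(\mathcal{X})] - \sum_{\phi \in \Phi} E_Q[\ln \phi(U_\phi)] + \ln Z$$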
Also, we know that $H_Q(\mathcal{X}) = -E_Q[\ln Q(\mathcal{X})]$. Using this in the preceding equation, we get the following:

$$D(Q \| P_\Phi) = \ln Z - F[\tilde{P}_\Phi, Q]$$
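Here, $F[\tilde{P}_\Phi, Q]$ is the energy functional, where:

$$F[\tilde{P}_\Phi, Q] = \sum_{\phi \in \Phi} E_Q[\ln \phi(U_\phi)] + H_Q(\mathcal{X})$$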
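The important thing to note here is that the $\ln Z$ term in the relative entropy doesn't depend on $Q$. Hence, minimizing the relative entropy is equivalent to maximizing the energy functional $F[\tilde{P}_\Phi, Q]$. Moreover, because the relative entropy is always non-negative, $F[\tilde{P}_\Phi, Q] \le \ln Z$ for every $Q$, so the energy functional is a lower bound on $\ln Z$.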
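The identity $D(Q \| P_\Phi) = \ln Z - F[\tilde{P}_\Phi, Q]$ can be checked numerically on a small example. The sketch below reuses a hypothetical three-variable model (the factor values and the names `phi_ab`, `phi_bc`, `q_marg` are illustrative assumptions) and computes both sides by brute-force enumeration, deliberately, so that the check does not depend on any shortcut.

```python
import itertools
import numpy as np

# Hypothetical toy Markov network over binary A, B, C with
# factors phi_AB and phi_BC (values chosen only for illustration).
phi_ab = np.array([[30.0, 5.0], [1.0, 10.0]])
phi_bc = np.array([[100.0, 1.0], [1.0, 100.0]])
assignments = list(itertools.product([0, 1], repeat=3))

def p_tilde(x):
    """Unnormalized measure: product of the factors."""
    a, b, c = x
    return phi_ab[a, b] * phi_bc[b, c]

Z = sum(p_tilde(x) for x in assignments)  # partition function

def q(x, q_marg):
    """Fully factorized Q with q_marg[i] = Q(X_i = 1)."""
    return float(np.prod([m if xi else 1.0 - m for m, xi in zip(q_marg, x)]))

def relative_entropy(q_marg):
    """D(Q || P_Phi), by enumerating every joint assignment."""
    return sum(
        q(x, q_marg) * (np.log(q(x, q_marg)) - np.log(p_tilde(x) / Z))
        for x in assignments
    )

def energy_functional(q_marg):
    """F = sum_phi E_Q[ln phi(U_phi)] + H_Q(X), also by brute force here."""
    energy = sum(
        q(x, q_marg) * (np.log(phi_ab[x[0], x[1]]) + np.log(phi_bc[x[1], x[2]]))
        for x in assignments
    )
    entropy = -sum(q(x, q_marg) * np.log(q(x, q_marg)) for x in assignments)
    return energy + entropy

for q_marg in ([0.3, 0.6, 0.5], [0.9, 0.1, 0.2]):
    d, f = relative_entropy(q_marg), energy_functional(q_marg)
    print(f"D = {d:.4f}, F = {f:.4f}, D + F = {d + f:.4f}, ln Z = {np.log(Z):.4f}")
```

For any choice of `q_marg`, `D + F` should match `ln Z`, and `F` never exceeds it; changing `q_marg` trades off the two terms without affecting their sum.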
Now, the energy functional has two terms. The first one, known as the energy term, is the sum of the expectations of the logarithms of the factors in $\Phi$. Each factor $\phi$ therefore appears separately, and its expectation depends only on the marginal of $Q$ over the factor's scope $U_\phi$. Hence, if the factors have small scopes, each expectation involves far fewer variables than a sum over all of $\mathcal{X}$. The second term is called the entropy term, and it is the entropy of $Q$. The cost of computing it depends on our choice of $Q$.
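As an illustration of this locality, the sketch below assumes a fully factorized (mean-field) $Q$ over binary variables and pairwise factors; both are assumptions made for the example, not requirements of the energy functional itself. Under that choice, $E_Q[\ln \phi(U_\phi)]$ needs only the product of the marginals over the factor's scope (four terms per pairwise factor), and the entropy term splits into a sum of single-variable entropies. The factor values and helper names are hypothetical; the brute-force version is included only to confirm that both routes give the same value.

```python
import itertools
import numpy as np

# Hypothetical model: factors keyed by their scope (tuple of variable indices).
factors = {
    (0, 1): np.array([[30.0, 5.0], [1.0, 10.0]]),    # phi_AB
    (1, 2): np.array([[100.0, 1.0], [1.0, 100.0]]),  # phi_BC
}
n_vars = 3

def marginal(q_marg, i):
    """Q(X_i) as [Q(X_i = 0), Q(X_i = 1)] for a fully factorized Q."""
    return np.array([1.0 - q_marg[i], q_marg[i]])

def energy_functional_local(q_marg):
    """F computed using only each factor's scope (assumes mean-field Q)."""
    energy = 0.0
    for (i, j), table in factors.items():
        # Under a fully factorized Q, Q(X_i, X_j) = Q(X_i) Q(X_j).
        q_scope = np.outer(marginal(q_marg, i), marginal(q_marg, j))
        energy += np.sum(q_scope * np.log(table))  # 4 terms, not 2^n
    # The entropy of a product distribution is the sum of the marginal entropies.
    entropy = 0.0
    for i in range(n_vars):
        m = marginal(q_marg, i)
        entropy -= np.sum(m * np.log(m))
    return energy + entropy

def energy_functional_brute(q_marg):
    """Same quantity, enumerating all 2^n joint assignments for comparison."""
    total = 0.0
    for x in itertools.product([0, 1], repeat=n_vars):
        qx = np.prod([marginal(q_marg, i)[xi] for i, xi in enumerate(x)])
        log_p_tilde = sum(np.log(t[x[i], x[j]]) for (i, j), t in factors.items())
        total += qx * (log_p_tilde - np.log(qx))
    return total

q_marg = np.array([0.3, 0.6, 0.5])
print(energy_functional_local(q_marg), energy_functional_brute(q_marg))
```

The local version scales with the number of factors and the size of their scopes rather than with $2^n$; with a richer choice of $Q$ the entropy term would no longer decompose this simply.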