Mutual information

Mutual information between two random variables tells us how much information we can obtain about one random variable by observing the other. The mutual information between two random variables x and y is given as follows:

$$I(x; y) = H(y) - H(y \mid x)$$

It is basically the difference between the entropy of y and the conditional entropy of y given x.
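To make the definition concrete, here is a minimal sketch that computes $H(y) - H(y \mid x)$ for a small discrete joint distribution. The helper names `entropy` and `mutual_information` and the two toy joint tables are illustrative assumptions, not part of the original text; entropies are measured in nats.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(x; y) = H(y) - H(y | x), where joint[i][j] = P(x=i, y=j)."""
    p_x = [sum(row) for row in joint]          # marginal of x (row sums)
    p_y = [sum(col) for col in zip(*joint)]    # marginal of y (column sums)
    # H(y | x) = sum_i P(x=i) * H(y | x=i)
    h_y_given_x = sum(
        px * entropy([pxy / px for pxy in row])
        for px, row in zip(p_x, joint)
        if px > 0
    )
    return entropy(p_y) - h_y_given_x


# Toy case 1: y is an exact copy of x, so x removes all uncertainty about y.
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

# Toy case 2: x and y are independent, so x says nothing about y.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

mi_dependent = mutual_information(dependent)      # equals H(y) = log 2
mi_independent = mutual_information(independent)  # equals 0
```

In the first table the conditional entropy vanishes, so the mutual information equals the full entropy of y; in the second, conditioning on x does not reduce the entropy at all, so the mutual information is zero.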

Mutual information between the code c and the generator output $G(z, c)$ tells us how much information we can obtain about c through $G(z, c)$. If the mutual information between c and $G(z, c)$ is high, then knowing the generator output helps us to infer c. But if the mutual information is low, then we cannot infer c from the generator output. Our goal is to maximize the mutual information.

The mutual information between the code c and the generator output, $G(z, c)$, is given as follows:

$$I(c; G(z, c)) = H(c) - H(c \mid G(z, c))$$

Let's look at the elements of the formula:

  • $H(c)$ is the entropy of the code
  • $H(c \mid G(z, c))$ is the conditional entropy of the code c given the generator output $G(z, c)$

But the problem is, how do we compute $H(c \mid G(z, c))$? To compute this value, we need to know the posterior, $P(c \mid G(z, c))$, which we don't know. So, we approximate the posterior with an auxiliary distribution, $Q(c \mid x)$, where x denotes the generator output.

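Let's say $x = G(z, c)$; then we can derive a lower bound on the mutual information as follows:

$$
\begin{aligned}
I(c; G(z, c)) &= H(c) - H(c \mid G(z, c)) \\
&= \mathbb{E}_{x \sim G(z, c)}\left[\mathbb{E}_{c' \sim P(c \mid x)}\left[\log P(c' \mid x)\right]\right] + H(c) \\
&= \mathbb{E}_{x \sim G(z, c)}\left[D_{KL}\left(P(\cdot \mid x) \,\|\, Q(\cdot \mid x)\right) + \mathbb{E}_{c' \sim P(c \mid x)}\left[\log Q(c' \mid x)\right]\right] + H(c) \\
&\geq \mathbb{E}_{x \sim G(z, c)}\left[\mathbb{E}_{c' \sim P(c \mid x)}\left[\log Q(c' \mid x)\right]\right] + H(c)
\end{aligned}
$$

The inequality holds because the KL divergence is never negative.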

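Thus, we can say:

$$I(c; G(z, c)) \geq \mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}\left[\log Q(c \mid x)\right] + H(c)$$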
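The lower bound $\mathbb{E}[\log Q(c \mid x)] + H(c)$ can be checked numerically on a small discrete example. The sketch below is only an illustration under assumed values: a binary code c, a "generator output" x that takes one of three values, a hand-picked conditional table `p_x_given_c`, and two hypothetical auxiliary distributions `rough_q` and `uniform_q`. None of these come from the original text.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Assumed toy setup: P(c) and P(x | c) for c in {0, 1}, x in {0, 1, 2}.
p_c = [0.5, 0.5]
p_x_given_c = [[0.8, 0.1, 0.1],
               [0.1, 0.1, 0.8]]

codes, outputs = range(2), range(3)

# Joint P(c, x), marginal P(x), and true posterior P(c | x).
joint = [[p_c[c] * p_x_given_c[c][x] for x in outputs] for c in codes]
p_x = [sum(joint[c][x] for c in codes) for x in outputs]
posterior = [[joint[c][x] / p_x[x] for x in outputs] for c in codes]

# Exact mutual information: I(c; x) = H(c) - H(c | x).
h_c = entropy(p_c)
h_c_given_x = sum(
    p_x[x] * entropy([posterior[c][x] for c in codes]) for x in outputs
)
true_mi = h_c - h_c_given_x


def lower_bound(q):
    """E_{c ~ P(c), x ~ P(x | c)}[log Q(c | x)] + H(c), with q[c][x] = Q(c | x)."""
    expected_log_q = sum(
        joint[c][x] * math.log(q[c][x])
        for c in codes
        for x in outputs
        if joint[c][x] > 0
    )
    return expected_log_q + h_c


# Hypothetical auxiliary distributions (each column sums to 1 over c).
rough_q = [[0.7, 0.5, 0.3],
           [0.3, 0.5, 0.7]]
uniform_q = [[0.5, 0.5, 0.5],
             [0.5, 0.5, 0.5]]

bound_exact = lower_bound(posterior)   # Q equals the true posterior: bound is tight
bound_rough = lower_bound(rough_q)     # imperfect Q: bound falls below the true MI
bound_uniform = lower_bound(uniform_q) # uninformative Q: bound drops to 0
```

The closer Q is to the true posterior, the tighter the bound becomes, which is why training Q to predict c from the generator output also pushes the bound toward the true mutual information.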

Maximizing mutual information basically means maximizing our knowledge about c given the generated output, that is, learning about one variable through another.

