Understanding InfoGAN

InfoGAN is an unsupervised version of CGAN. In CGAN, we learned how to condition the generator and discriminator to generate the image we want. But how can we do that when the dataset has no labels? Assume we have an MNIST dataset with no labels: how can we tell the generator to generate a specific image we are interested in? Since the dataset is unlabeled, we do not even know what classes are present in it.

We know that the generator takes noise z as input and generates an image. The generator encapsulates all the necessary information about the image in z; this is called an entangled representation, because all of the image's semantic features are mixed together in a single vector. If we can disentangle this vector, then we can discover interesting features of our image.

So, we will split this z into two parts:

  • The usual noise z
  • A code c

What is this code? The code c holds interpretable, disentangled information about the image. For MNIST data, say, code c1 could represent the digit label, c2 the width, c3 the stroke thickness of the digit, and so on. We collectively represent all of these codes by the term c.
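To make this concrete, the following is a minimal NumPy sketch of how such a generator input could be sampled. The sizes are illustrative assumptions (a 62-dimensional noise vector, a 10-class categorical code c1, and two continuous codes c2 and c3) rather than values fixed by InfoGAN itself:

    import numpy as np

    batch_size = 16

    # Usual noise z: carries everything we do not ask the codes to explain
    z = np.random.uniform(-1, 1, size=(batch_size, 62))

    # c1: a categorical code, here a 10-class one-hot vector (digit label)
    c1 = np.eye(10)[np.random.randint(0, 10, size=batch_size)]

    # c2 and c3: continuous codes, say, width and stroke thickness
    c2_c3 = np.random.uniform(-1, 1, size=(batch_size, 2))

    # The generator receives the concatenation of the noise and the codes
    generator_input = np.concatenate([z, c1, c2_c3], axis=1)
    print(generator_input.shape)  # (16, 74)

Note that the generator simply receives this concatenated vector; it is the training objective, described next, that forces the codes to become meaningful.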

Now that we have z and c, how can we learn a meaningful code c? Can we learn it from the image the generator produces? Say the generator outputs an image of the digit 7. Then we can say the code c1 is 7, since we know that c1 represents the digit label.

But since a code can mean anything, say, a label, the width of the digit, the stroke, the rotation angle, and so on, how can we learn what we want? The code c is learned based on our choice of prior. For instance, if we choose a multinomial (categorical) prior for c, then our InfoGAN might assign a digit label to it; if we choose a Gaussian prior, it might assign a rotation angle, and so on. We can also have more than one prior, as in the preceding sketch, where c1 uses a categorical prior while c2 and c3 use a continuous uniform prior.

The prior distribution for c can be anything; InfoGAN assigns different properties to the code depending on the distribution. Unlike CGAN, where we explicitly specify c, in InfoGAN the code c is inferred automatically based on the generator output.

In a nutshell, we are inferring the code c based on the generator output, G(z, c). But how exactly do we infer c? We use a concept from information theory called mutual information.
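Concretely, InfoGAN adds a mutual-information term to the usual GAN objective, min_G max_D V(D, G) − λ I(c; G(z, c)), and, since I(c; G(z, c)) is intractable, maximizes a variational lower bound on it with an auxiliary network Q that looks at a generated image and tries to recover the code that produced it. The following TensorFlow/Keras sketch shows that idea only in outline: the layer sizes are assumptions, and the squared error on the continuous codes stands in for a fixed-variance Gaussian log-likelihood, a common simplification:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Q head: shares features with the discriminator and tries to recover
    # the code from a generated image's feature vector (sizes are assumed).
    def build_q_network(feature_dim=128, n_classes=10, n_continuous=2):
        features = tf.keras.Input(shape=(feature_dim,))
        h = layers.Dense(128, activation="relu")(features)
        cat_logits = layers.Dense(n_classes)(h)    # predicts c1 (digit label)
        cont_mean = layers.Dense(n_continuous)(h)  # predicts c2 and c3
        return tf.keras.Model(features, [cat_logits, cont_mean])

    # Mutual-information term: how well does Q recover the sampled codes?
    # Minimizing this maximizes a lower bound on I(c; G(z, c)).
    def mutual_info_loss(c1_true, cat_logits, c_cont_true, cont_mean):
        cat_loss = tf.nn.softmax_cross_entropy_with_logits(
            labels=c1_true, logits=cat_logits)
        cont_loss = tf.reduce_sum(tf.square(c_cont_true - cont_mean), axis=-1)
        return tf.reduce_mean(cat_loss + cont_loss)

During training, this loss (scaled by λ) is added to the generator's loss, so the generator is pushed to produce images from which the codes can be recovered; this is exactly the inference from the generator output described above.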
