Are we minimizing JS divergence in GANs?

We know that the generator tries to learn the real data distribution, $p_r(x)$, so that it can generate new samples from its learned distribution, $p_g(x)$, while the discriminator tells us whether an image comes from the real or the fake distribution.

We also learned that when $p_r(x) = p_g(x)$, the discriminator cannot tell whether an image comes from the real or the fake distribution. It simply outputs 0.5 because it cannot differentiate between $p_r(x)$ and $p_g(x)$.

So, for a fixed generator, the optimal discriminator can be given as follows:

$$D^*(x) = \frac{p_r(x)}{p_r(x) + p_g(x)} \tag{1}$$
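
To make equation (1) concrete, here is a minimal numerical sketch (not from the book; the density values are made up) showing that, for fixed density values at a single point $x$, the discriminator output that maximizes the pointwise objective $p_r(x)\log D + p_g(x)\log(1 - D)$ is exactly $p_r(x)/(p_r(x) + p_g(x))$:

```python
# Grid-search sanity check for equation (1); the density values are hypothetical.
import numpy as np

p_r, p_g = 0.7, 0.2                               # assumed density values at some point x
d_grid = np.linspace(1e-4, 1 - 1e-4, 100000)      # candidate discriminator outputs in (0, 1)
objective = p_r * np.log(d_grid) + p_g * np.log(1 - d_grid)

d_best = d_grid[np.argmax(objective)]             # maximizer found by grid search
d_star = p_r / (p_r + p_g)                        # closed-form optimum from equation (1)

print(f"grid-search optimum : {d_best:.4f}")      # ~0.7778
print(f"p_r / (p_r + p_g)   : {d_star:.4f}")      # 0.7778
```

Note that when $p_r(x) = p_g(x)$, this optimum is exactly 0.5, which matches the earlier observation about the discriminator's output.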

Let's recall the loss function of the GAN, which the discriminator tries to maximize and the generator tries to minimize:

$$V(D, G) = \mathbb{E}_{x \sim p_r(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Since sampling $z \sim p_z(z)$ and passing it through the generator is the same as sampling $x \sim p_g(x)$, it can simply be written as follows:

$$V(D, G) = \int_x \left( p_r(x)\,\log D(x) + p_g(x)\,\log\big(1 - D(x)\big) \right) dx$$

Substituting equation (1) in the preceding equation, we get the following:

$$V(D^*, G) = \int_x \left( p_r(x)\,\log \frac{p_r(x)}{p_r(x) + p_g(x)} + p_g(x)\,\log \frac{p_g(x)}{p_r(x) + p_g(x)} \right) dx$$

It can be solved as follows. Multiplying and dividing by 2 inside each logarithm and pulling the resulting $\log 2$ terms out of the integrals gives:

$$V(D^*, G) = -2\log 2 + \int_x p_r(x)\,\log \frac{p_r(x)}{\frac{p_r(x) + p_g(x)}{2}}\, dx + \int_x p_g(x)\,\log \frac{p_g(x)}{\frac{p_r(x) + p_g(x)}{2}}\, dx$$

The two integrals are the KL divergences from $p_r$ and $p_g$ to the mixture $\frac{p_r + p_g}{2}$, which together make up twice the JS divergence:

$$V(D^*, G) = -2\log 2 + \mathrm{KL}\!\left(p_r \,\Big\|\, \frac{p_r + p_g}{2}\right) + \mathrm{KL}\!\left(p_g \,\Big\|\, \frac{p_r + p_g}{2}\right) = -2\log 2 + 2\,\mathrm{JSD}(p_r \,\|\, p_g)$$
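
The following is a small sanity check of this result (my own sketch, using arbitrary discrete distributions rather than anything from the book): it evaluates the value function at the optimal discriminator and confirms that it equals $2\,\mathrm{JSD}(p_r \,\|\, p_g) - 2\log 2$:

```python
# Verify V(D*, G) = 2 * JSD(p_r || p_g) - 2 * log(2) on two discrete distributions.
import numpy as np

p_r = np.array([0.5, 0.3, 0.2])          # hypothetical real data distribution
p_g = np.array([0.1, 0.3, 0.6])          # hypothetical generator distribution

d_star = p_r / (p_r + p_g)               # optimal discriminator, equation (1)
v = np.sum(p_r * np.log(d_star) + p_g * np.log(1 - d_star))

m = 0.5 * (p_r + p_g)                    # mixture distribution used by the JS divergence
kl = lambda a, b: np.sum(a * np.log(a / b))
jsd = 0.5 * kl(p_r, m) + 0.5 * kl(p_g, m)

print(np.isclose(v, 2 * jsd - 2 * np.log(2)))   # True
```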

As you can see, the JS divergence appears directly in the loss function of the GAN. So, minimizing the loss function of the GAN with respect to the generator (with the discriminator held at its optimum) implies minimizing the JS divergence between the real data distribution, $p_r$, and the fake data distribution, $p_g$, as shown:

$$\min_G V(D^*, G) = \min_G \left( -2\log 2 + 2\,\mathrm{JSD}(p_r \,\|\, p_g) \right)$$

Minimizing the JS divergence between $p_r$ and $p_g$ means that the generator makes its distribution, $p_g$, similar to the real data distribution, $p_r$. But there is a problem with the JS divergence. As you can see from the following figure, there is no overlap between the two distributions. When there is no overlap, or when the two distributions do not share the same support, the JS divergence saturates at a constant value of $\log 2$, so it provides no useful gradient and the GAN cannot learn properly:

So, to avoid this, we need to change our loss function. Instead of minimizing the JS divergence, we use a new distance metric called the Wasserstein distance, which tells us how far apart the two distributions are, even when they don't share the same support.
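
As a rough illustration (my own example, not from the book), the following sketch compares the two metrics on a pair of point-mass distributions with disjoint supports: the JS divergence stays frozen at $\log 2 \approx 0.693$ no matter how far apart the distributions are, while the Wasserstein distance grows with the separation and therefore still provides a useful learning signal:

```python
# Compare JS divergence and Wasserstein distance for non-overlapping distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

positions = np.arange(30)                        # common support for both histograms
for shift in [1, 5, 25]:
    p_r = np.zeros(30); p_r[0] = 1.0             # real mass entirely at position 0
    p_g = np.zeros(30); p_g[shift] = 1.0         # fake mass entirely at position `shift`
    jsd = jensenshannon(p_r, p_g) ** 2           # squared JS distance = JS divergence (nats)
    w = wasserstein_distance(positions, positions, p_r, p_g)
    print(f"shift={shift:2d}  JSD={jsd:.4f}  Wasserstein={w:.4f}")
# JSD is ~0.6931 (= log 2) for every shift, whereas the Wasserstein distance equals the shift.
```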
