Least squares GAN

We just learned how GANs are used to generate images. Least Squares GAN (LSGAN) is another simple variant of the GAN. As the name suggests, it uses the least squares error as the loss function instead of the sigmoid cross-entropy loss. With LSGAN, we can improve the quality of the images generated by the GAN. But how can we do that? Why do vanilla GANs generate poor-quality images?

If you recollect the loss function of the GAN, we used sigmoid cross-entropy as the loss function. The goal of the generator is to learn the distribution of the images in the training set, that is, the real data distribution, and to generate fake samples from the learned fake distribution. So, the GAN tries to bring the fake distribution as close to the true distribution as possible.
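As a quick reminder, following the usual convention (not notation specific to this book), the vanilla GAN objective with the sigmoid cross-entropy loss can be written as:

\[ \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \]

Here, the discriminator, D, outputs a probability through a sigmoid, and the generator, G, maps noise, z, to fake samples.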

But once the fake samples are on the correct side of the decision surface, the gradients tend to vanish, even if the fake samples are still far away from the real distribution. This is due to the sigmoid cross-entropy loss.

Let's understand this with the following figure. It shows the decision boundary of a vanilla GAN with sigmoid cross-entropy as the loss function, where fake samples are represented by crosses, real samples by dots, and the fake samples used for updating the generator by stars.

As you can observe, once the fake samples (stars) generated by the generator are on the correct side of the decision surface, that is, on the same side as the real samples (dots), the gradients tend to vanish even though the fake samples are still far away from the real distribution. This is due to the sigmoid cross-entropy loss: it does not care whether the fake samples are close to the real samples; it only checks whether the fake samples are on the correct side of the decision surface. Because the gradients vanish even when the fake samples are far away from the real data distribution, the generator cannot learn the real distribution of the dataset.
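To make the vanishing gradient concrete, here is a minimal numeric sketch (not GAN training code itself), where the discriminator's raw score, s, for a fake sample acts as a stand-in for how confidently that sample sits past the decision boundary:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Raw discriminator scores for fake samples that are already on the
# "real" side of the decision boundary (large positive scores).
scores = np.array([2.0, 5.0, 10.0])

# Generator loss under sigmoid cross-entropy is -log(sigmoid(s)),
# whose gradient, sigmoid(s) - 1, saturates towards 0 as s grows.
grad_cross_entropy = sigmoid(scores) - 1.0

# Generator loss under least squares is (s - 1)^2, whose gradient,
# 2 * (s - 1), stays proportional to the distance from the target.
grad_least_squares = 2.0 * (scores - 1.0)

print(grad_cross_entropy)  # roughly [-0.119, -0.0067, -0.000045] -> vanishing
print(grad_least_squares)  # [ 2.  8. 18.] -> still informative

The cross-entropy gradients shrink as the fake samples are classified as real with more confidence, while the least squares gradients keep pulling the scores towards the target, no matter which side of the boundary the samples are on.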

So, we can change this decision surface from the sigmoid cross-entropy loss to a least squares loss. Now, as you can see in the following diagram, even when the fake samples generated by the generator are on the correct side of the decision surface, the gradients will not vanish until the fake samples match the true distribution. The least squares loss forces the updates to move the fake samples towards the real samples.
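For reference, the LSGAN objectives from the original paper (Mao et al., 2017) are:

\[ \min_D V(D) = \frac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}(x)}\big[(D(x) - b)^2\big] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - a)^2\big] \]

\[ \min_G V(G) = \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)) - c)^2\big] \]

Here, a and b are the labels for the fake and real data, and c is the value the generator wants the discriminator to assign to fake data; a common choice is a = 0, b = 1, and c = 1.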

So, since we are matching the fake distribution to the real distribution, our image quality is improved when we use the least squares error as the cost function.
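As a minimal sketch of how these losses might look in code (assuming TensorFlow, a discriminator that outputs raw, unsquashed scores, and the common label choice a = 0, b = 1, c = 1; the function names here are just for illustration):

import tensorflow as tf

def discriminator_loss(real_scores, fake_scores):
    # Push the scores of real samples towards b = 1
    # and the scores of fake samples towards a = 0.
    real_loss = 0.5 * tf.reduce_mean(tf.square(real_scores - 1.0))
    fake_loss = 0.5 * tf.reduce_mean(tf.square(fake_scores))
    return real_loss + fake_loss

def generator_loss(fake_scores):
    # Push the scores of fake samples towards c = 1; the gradient stays
    # proportional to how far the scores are from the target, so it does
    # not vanish once the fakes cross the decision boundary.
    return 0.5 * tf.reduce_mean(tf.square(fake_scores - 1.0))

Note that the LSGAN discriminator typically omits the final sigmoid activation, since the least squares loss acts directly on the raw score.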

In a nutshell, gradient updates in vanilla GANs stop once the fake samples are on the correct side of the decision surface, even when they are far from the real samples, that is, the real distribution. This is due to the sigmoid cross-entropy loss, which does not care whether the fake samples are close to the real samples; it only checks whether they are on the correct side of the decision surface. This leads to the problem that we cannot learn the real data distribution perfectly. So, we use LSGAN, which uses the least squares error as the loss function, where the gradient updates do not stop until the fake samples match the real samples, even when the fake samples are already on the correct side of the decision boundary.
