StackGAN

Now we will look at one of the most intriguing types of GAN, the StackGAN. Would you believe that StackGANs can generate photo-realistic images from nothing but a textual description? They can: given a text description, a StackGAN generates a realistic image that matches it.

Let's first understand how an artist draws an image. In the first stage, artists draw primitive shapes and create a basic outline that forms an initial version of the image. In the next stage, they enhance the image by making it more realistic and appealing.

StackGANs work in a similar manner. They divide image generation into two stages. In the first stage, just as an artist would, they sketch a basic outline with primitive shapes and produce a low-resolution version of the image. In the second stage, they enhance the image generated in the first stage, making it more realistic and converting it into a high-resolution image.

But how do StackGANs do this?

They use two GANs, one for each stage. The GAN in the first stage generates a basic image and sends it to the GAN in the next stage, which converts the basic low-resolution image into a proper high-resolution image. The following figure shows how StackGANs generate images in each of the stages based on the text description:

Source: https://arxiv.org/pdf/1612.03242.pdf

As you can see, in the first stage, we have a low-resolution version of the image, but in the second stage, we have a clear, high-resolution image. But still, how do StackGANs do this? Remember when we learned with conditional GANs that we can make our GAN generate the images we want by conditioning it?

StackGANs use conditioning in both stages. In stage I, the network is conditioned on the text description, and from this description it generates a basic version of the image. In stage II, the network is conditioned on the image generated in stage I and also on the text description.
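To make the hand-off between the two stages concrete, here is a minimal shape-level sketch in NumPy. It is not a trained model: both "generators" are untrained stand-ins, and the names `stage1_generator`, `stage2_generator`, and `generate`, the embedding and noise sizes, and the 64x64 and 256x256 resolutions (the sizes reported in the StackGAN paper) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_DIM = 128    # hypothetical size of the text embedding
NOISE_DIM = 100   # hypothetical size of the noise vector
LOW_RES = 64      # stage I output size (assumed, as in the StackGAN paper)
HIGH_RES = 256    # stage II output size (assumed, as in the StackGAN paper)


def stage1_generator(text_embedding, noise):
    """Stand-in for stage I: text description + noise -> low-resolution image."""
    conditioned = np.concatenate([text_embedding, noise])
    # An untrained random projection takes the place of the real network.
    weights = rng.standard_normal((LOW_RES * LOW_RES * 3, conditioned.size)) * 0.01
    image = np.tanh(weights @ conditioned)
    return image.reshape(LOW_RES, LOW_RES, 3)


def stage2_generator(low_res_image, text_embedding):
    """Stand-in for stage II: stage I image + text description -> high-resolution image."""
    scale = HIGH_RES // LOW_RES
    # Nearest-neighbour upsampling takes the place of learned upsampling layers.
    upsampled = low_res_image.repeat(scale, axis=0).repeat(scale, axis=1)
    # A text-dependent per-channel shift takes the place of learned refinement.
    shift = np.tanh(text_embedding[:3])
    return np.clip(upsampled + 0.1 * shift, -1.0, 1.0)


def generate(text_embedding):
    """Run both stages; note that the text embedding is passed to each of them."""
    noise = rng.standard_normal(NOISE_DIM)
    low_res = stage1_generator(text_embedding, noise)
    high_res = stage2_generator(low_res, text_embedding)
    return low_res, high_res


text_embedding = rng.standard_normal(TEXT_DIM)  # placeholder for an encoded sentence
low_res, high_res = generate(text_embedding)
print(low_res.shape, high_res.shape)  # (64, 64, 3) (256, 256, 3)
```

The point of the sketch is the data flow: stage I sees only the text (plus noise), while stage II receives both the stage I image and the same text embedding.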

But why do we have to condition on the text description again in stage II? Because stage I produces only a basic version of the image, it misses some of the details specified in the text description. So, in stage II, we condition on the text description again to fill in the missing details and to make the image more realistic.
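One common way to condition a network on both an image and a text description is to replicate the text embedding across every spatial position of the image's feature map and concatenate the two along the channel axis. The sketch below shows only that wiring step in NumPy. The helper name `condition_on_text` and the feature-map and embedding sizes are hypothetical, and the source does not say that this is exactly how stage II is implemented.

```python
import numpy as np


def condition_on_text(image_features, text_embedding):
    """Attach the same text embedding to every spatial location of a feature map."""
    height, width, _ = image_features.shape
    # Replicate the embedding so that each position sees the full description.
    tiled = np.broadcast_to(text_embedding, (height, width, text_embedding.size))
    # Concatenate along the channel axis: image channels first, text channels after.
    return np.concatenate([image_features, tiled], axis=-1)


rng = np.random.default_rng(0)
image_features = rng.standard_normal((16, 16, 512))  # hypothetical encoded stage I image
text_embedding = rng.standard_normal(128)            # hypothetical text embedding

conditioned = condition_on_text(image_features, text_embedding)
print(conditioned.shape)  # (16, 16, 640)
```

Because the text channels are present at every position, later layers can consult the description wherever details are missing from the stage I image.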

This ability to generate pictures from text alone lends itself to numerous applications. In the entertainment industry, for instance, it can be used to create frames from descriptions, and it can also be used to generate comics, among other things.
