Conditioning augmentation

We have a text description as an input to the GAN, and based on this description it has to generate an image. But how does the GAN understand the meaning of the text well enough to generate a picture?

First, we convert the text into an embedding using an encoder. We represent this text embedding by $\varphi_t$. Can we create variations of $\varphi_t$? By creating variations of the text embedding, $\varphi_t$, we obtain additional training pairs and also increase robustness to small perturbations.

Let $\mu(\varphi_t)$ be the mean and $\Sigma(\varphi_t)$ be the diagonal covariance matrix of our text embedding, $\varphi_t$. We then randomly sample an additional conditioning variable, $\hat{c}$, from the independent Gaussian distribution $\mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t))$. This helps us create variations of the text description while preserving its meaning. Since the same text can be written in many ways, the conditioning variable, $\hat{c}$, gives us multiple versions of the text mapping to the same image.

Thus, given a text description, we first extract its embedding, $\varphi_t$, using the encoder, then compute its mean, $\mu(\varphi_t)$, and covariance, $\Sigma(\varphi_t)$, and finally sample the conditioning variable from the Gaussian distribution of the text embedding, $\hat{c} \sim \mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t))$.
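The following is a minimal sketch of how this sampling step could be implemented, written here in PyTorch for illustration. It assumes a fully connected layer predicts the mean and log-variance of the Gaussian from the text embedding, and it uses the reparameterization trick to keep the sampling differentiable; the class name ConditioningAugmentation and the dimensions embed_dim=1024 and cond_dim=128 are illustrative choices, not values fixed by the text above.

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Maps a text embedding phi_t to a Gaussian and samples a conditioning variable c_hat."""

    def __init__(self, embed_dim=1024, cond_dim=128):
        super().__init__()
        # One linear layer predicts both the mean and the log of the
        # diagonal covariance for the conditioning variable.
        self.fc = nn.Linear(embed_dim, cond_dim * 2)
        self.cond_dim = cond_dim

    def forward(self, text_embedding):
        stats = self.fc(text_embedding)
        mu = stats[:, :self.cond_dim]        # mean mu(phi_t)
        log_var = stats[:, self.cond_dim:]   # log of diagonal covariance Sigma(phi_t)
        std = torch.exp(0.5 * log_var)       # standard deviation
        eps = torch.randn_like(std)          # noise drawn from N(0, I)
        c_hat = mu + eps * std               # reparameterized sample from N(mu, Sigma)
        return c_hat, mu, log_var


# Usage: a batch of encoder outputs -> sampled conditioning variables.
if __name__ == "__main__":
    ca = ConditioningAugmentation(embed_dim=1024, cond_dim=128)
    phi_t = torch.randn(4, 1024)   # stand-in for text embeddings from the encoder
    c_hat, mu, log_var = ca(phi_t)
    print(c_hat.shape)             # torch.Size([4, 128])
```

Because each forward pass draws fresh noise, the same text embedding yields slightly different conditioning variables, which is exactly how the extra training pairs and robustness to small perturbations described above come about.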
