Dissecting variational autoencoders

Now we will look at another very interesting type of autoencoder, called the variational autoencoder (VAE). Unlike the other autoencoders we have seen, VAEs are generative models, which means they learn to generate new data, just like GANs.

Let's say we have a dataset containing facial images of many individuals. When we train our variational autoencoder on this dataset, it learns to generate new, realistic faces that are not present in the dataset. Because of their generative nature, VAEs have various applications, including generating images, songs, and so on. But what makes a VAE generative, and how is it different from other autoencoders? Let's learn that in the coming section.

Just as we learned when discussing GANs, for a model to be generative, it has to learn the distribution of the inputs. For instance, let's say we have a dataset consisting of handwritten digits, such as the MNIST dataset. In order to generate new handwritten digits, our model has to learn the distribution of the digits in the given dataset. Learning this distribution helps the VAE to learn useful properties such as digit width, stroke, height, and so on. Once the model encodes these properties in its distribution, it can generate new handwritten digits by sampling from the learned distribution.

Similarly, say we have a dataset of human faces; learning the distribution of the faces in the dataset helps the model to learn various properties such as gender, facial expression, hair color, and so on. Once the model learns and encodes these properties in its distribution, it can generate a new face just by sampling from the learned distribution.

Thus, in a VAE, instead of mapping the encoder's encodings directly to the latent vector (bottleneck), we map the encodings to a distribution; usually, this is a Gaussian distribution. We sample a latent vector from this distribution and feed it to the decoder, and the decoder then learns to reconstruct the image. As shown in the following diagram, the encoder maps its encodings to a distribution, and we sample a latent vector from this distribution and feed it to the decoder to reconstruct the image:

A Gaussian distribution can be parameterized by its mean and covariance matrix. Thus, we can make our encoder map its encodings to a mean vector and a standard deviation vector, which together parameterize the (approximately) Gaussian distribution. Now, from this distribution, we sample a latent vector and feed it to our decoder, which then reconstructs the image:
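To make this concrete, here is a minimal NumPy sketch of the sampling step only; the mean and standard deviation values and the two-dimensional latent size are made up purely for illustration, and stand in for what a trained encoder would actually produce:

```python
import numpy as np

# Hypothetical encoder outputs for a 2-dimensional latent space
mean = np.array([0.3, -1.2])   # mean vector produced by the encoder
std = np.array([0.5, 0.8])     # standard deviation vector produced by the encoder

# Sample a latent vector z from the Gaussian N(mean, std^2) by first drawing
# standard normal noise and then scaling and shifting it: z = mean + std * epsilon
epsilon = np.random.standard_normal(size=mean.shape)
z = mean + std * epsilon

print(z)  # this latent vector is what we feed to the decoder
```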

In a nutshell, the encoder learns the desirable properties of the given input and encodes them into a distribution. We sample a latent vector from this distribution and feed it as input to the decoder, which then generates new images from the distribution the encoder has learned.
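The following sketch shows how generation works once training is done. It assumes a trained decoder is available as a Python callable named decoder that maps latent vectors to images; both the name and the latent dimensionality are hypothetical and only illustrate the idea of sampling latent vectors and decoding them:

```python
import numpy as np

latent_dim = 2  # assumed size of the latent space

def generate_images(decoder, num_images=5):
    # Draw latent vectors from a standard Gaussian prior
    z = np.random.standard_normal(size=(num_images, latent_dim))
    # The trained decoder turns each latent vector into a generated image
    return decoder(z)
```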

In a VAE, the encoder is also called the recognition model and the decoder is also called the generative model. Now that we have an intuitive understanding of VAEs, in the next section, we will go into detail and learn how they work.
