Variational autoencoder

Variational autoencoders (VAEs) are inspired by the concept of the autoencoder: a model consisting of two neural networks, called the encoder and the decoder. As we have seen, the encoder network tries to encode its input in a compressed form, while the decoder network tries to reconstruct the initial input, starting from the code returned by the encoder.

However, the functioning of the VAE is very different from that of a simple autoencoder. VAEs allow not only the coding/decoding of the input but also the generation of new data. To do this, they treat both the code z and the reconstruction/generation x' as if they belonged to a certain probability distribution. In particular, VAEs are the result of combining deep learning and Bayesian inference: they consist of neural networks trained with the backpropagation algorithm, modified with a technique called re-parameterization. While deep learning has proven to be very effective at approximating complex functions, Bayesian statistics allows us to manage, in the form of probabilities, the uncertainty that derives from random generation.

The VAE uses this same structure to generate new images, similar to those belonging to the training set. In this case, the encoder does not directly produce a code for a given input but calculates the mean and variance of a normal distribution. A value is sampled from this distribution and decoded by the decoder. Training consists of modifying the encoder and decoder parameters so that the result of this decoding is as similar as possible to the starting image. At the end of training, starting from a normal distribution with the mean and variance produced by the encoder, the decoder is able to produce images similar to those belonging to the training set.
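
To make the encode-sample-decode flow concrete, the following is a minimal sketch of this structure in Python with PyTorch. It is an illustrative example rather than code from this chapter: the class name, the layer sizes, and the 784-dimensional input (for example, flattened 28 x 28 images) are all assumptions.

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        # Encoder: maps the input to the mean and log-variance of a normal distribution
        self.enc = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: reconstructs the input from a sampled latent code z
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def reparameterize(self, mu, logvar):
        # Re-parameterization: z = mu + sigma * eps, with eps drawn from a
        # standard normal, so gradients can flow through mu and logvar
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + std * eps

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

The reparameterize method is where the re-parameterization technique mentioned earlier appears: the sample is written as a deterministic function of the mean, the variance, and an external source of noise, so backpropagation can still update the encoder parameters.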

Let's define the following terms:

  • X: Input data vector
  • z: Latent variable
  • P(X): Probability distribution of the data
  • P(z): Probability distribution of the latent variable
  • P(X|z): Likelihood, that is, the distribution of generating data given the latent variable

The likelihood P(X|z) is the probability of observing the data X given the latent variable z.

Our goal is to generate data according to the characteristics contained in the latent variable, so we want to find P(X). For this purpose, we can use the law of total probability according to the following formula:
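
$$P(X) = \int P(X \mid z)\, P(z)\, dz$$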

To understand how we arrived at this formulation, let's reason step by step. Our first task in defining the model is to infer good values of the latent variables starting from the observed data, that is, to calculate the posterior P(z|X). To do this, we can use Bayes' theorem:
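
$$P(z \mid X) = \frac{P(X \mid z)\, P(z)}{P(X)}$$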

In the previous formula, the P(X) term appears. In the context of Bayesian statistics, it may also be referred to as the evidence or model evidence. The evidence can be calculated by marginalizing out the latent variables. This brings us to the starting formula:
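
$$P(X) = \int P(X \mid z)\, P(z)\, dz$$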

The computational estimate of this integral requires exponential time, as it must be evaluated over all the configurations of the latent variables. To reduce the computational cost, we are forced to approximate the estimate of the posterior probability.

In VAEs, as the name suggests, we infer P(z|X) using a method called variational inference (VI), one of the most widely used methods in Bayesian inference. This technique treats inference as an optimization problem: we approximate the true posterior with a simpler distribution that is easy to evaluate (for example, a Gaussian) and minimize the difference between the two distributions using the Kullback-Leibler divergence.
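
In symbols, denoting the simpler approximating distribution by Q(z|X), variational inference looks for the member of the chosen family that is closest to the true posterior:

$$Q^{*}(z) = \arg\min_{Q} D_{KL}\big(Q(z \mid X) \,\|\, P(z \mid X)\big)$$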

The Kullback-Leibler divergence is a non-symmetric measure of the difference between two probability distributions P and Q. Specifically, the Kullback-Leibler divergence of Q from P, denoted by D_KL(P||Q), is the measurement of the information lost when Q is used to approximate P.

For discrete probability distributions P and Q, the Kullback-Leibler divergence from Q to P is defined as follows:
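
$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i)\, \log \frac{P(i)}{Q(i)}$$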

Analyzing the formula, it is evident that the Kullback-Leibler divergence is the expectation of the logarithmic difference between the probabilities P and Q, where the expectation is taken with respect to the probability P.
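
As a small illustrative example (not from this chapter), the following Python snippet computes the divergence between two assumed discrete distributions and shows that it is not symmetric:

import numpy as np

P = np.array([0.5, 0.3, 0.2])   # distribution being approximated
Q = np.array([0.4, 0.4, 0.2])   # approximating distribution

# Expectation, under P, of the log difference between P and Q
kl_pq = np.sum(P * np.log(P / Q))   # D_KL(P||Q), approximately 0.0253
kl_qp = np.sum(Q * np.log(Q / P))   # D_KL(Q||P), approximately 0.0258

print(kl_pq, kl_qp)   # the two values differ: the divergence is not symmetric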
