Variational inference

Before going ahead, let's get familiar with the notation:

  • Let's represent the distribution of the input dataset by $p_\theta(x)$, where $\theta$ represents the parameters of the network that will be learned during training
  • We represent the latent variable by $z$, which encodes all the properties of the input and is sampled from the distribution $p_\theta(z)$
  • $p_\theta(x, z)$ denotes the joint distribution of the input and its properties
  • $p_\theta(z)$ represents the distribution of the latent variable

Using Bayes' theorem, we can write the following:

$$p_\theta(x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(z \mid x)}$$

The preceding equation helps us to compute the probability distribution of the input dataset. But the problem lies in computing $p_\theta(z \mid x)$, because it is intractable. Thus, we need to find a tractable way to estimate $p_\theta(z \mid x)$. Here, we introduce a concept called variational inference.
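One way to see the intractability, sketched here for clarity, is to write the posterior with Bayes' theorem in the usual direction; it then depends on the evidence term, which integrates over every possible latent vector:

$$p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(x)}, \qquad p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz$$

Once $p_\theta(x \mid z)$ is modeled by a neural network, this integral has no closed form, so neither the evidence nor the posterior can be computed directly.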

Instead of inferring the distribution $p_\theta(z \mid x)$ directly, we approximate it using another distribution, say a Gaussian distribution $q_\phi(z \mid x)$. That is, we use $q_\phi(z \mid x)$, which is basically a neural network parameterized by $\phi$, to estimate the value of $p_\theta(z \mid x)$:

  • $q_\phi(z \mid x)$ is basically our probabilistic encoder; that is, it tries to create a latent vector $z$ given the input $x$
  • $p_\theta(x \mid z)$ is the probabilistic decoder; that is, it tries to reconstruct the input $x$ given the latent vector $z$ (a short code sketch of both follows this list)
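
The following is a minimal sketch of the encoder $q_\phi(z \mid x)$ and decoder $p_\theta(x \mid z)$ as small fully connected networks. It assumes PyTorch and a 784-dimensional input (for example, flattened 28 x 28 images); the framework, layer sizes, and latent dimensionality are illustrative assumptions, not the book's exact implementation:

# Minimal sketch of the probabilistic encoder q_phi(z|x) and decoder p_theta(x|z).
# Assumes PyTorch; the input size (784), hidden size, and latent size are illustrative.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """q_phi(z|x): maps an input x to the mean and log-variance of a Gaussian over z."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)       # mean of q_phi(z|x)
        self.log_var = nn.Linear(hidden_dim, latent_dim)  # log-variance of q_phi(z|x)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        return self.mu(h), self.log_var(h)

class Decoder(nn.Module):
    """p_theta(x|z): reconstructs the input x from a latent vector z."""
    def __init__(self, latent_dim=2, hidden_dim=256, output_dim=784):
        super().__init__()
        self.hidden = nn.Linear(latent_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = torch.relu(self.hidden(z))
        return torch.sigmoid(self.out(h))  # pixel probabilities for a Bernoulli p_theta(x|z)

# Usage: sample z from the Gaussian q_phi(z|x) and reconstruct x
encoder, decoder = Encoder(), Decoder()
x = torch.rand(32, 784)                                    # dummy batch standing in for real data
mu, log_var = encoder(x)
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # sample from q_phi(z|x)
x_reconstructed = decoder(z)

Note that the encoder outputs the parameters (mean and log-variance) of the Gaussian rather than a single point, which is exactly what makes $q_\phi(z \mid x)$ a distribution over the latent space.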

The following diagram should give you a clearer picture of the notation and of what we have seen so far:
