The loss function

We just learned that we use $q_\phi(z|x)$ to approximate $p_\theta(z|x)$. Thus, the estimated value of $q_\phi(z|x)$ should be close to $p_\theta(z|x)$. Since both of these are distributions, we use the KL divergence to measure how $q_\phi(z|x)$ diverges from $p_\theta(z|x)$, and we need to minimize this divergence.

The KL divergence between $q_\phi(z|x)$ and $p_\theta(z|x)$ is given as follows:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] = \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log \frac{q_\phi(z|x)}{p_\theta(z|x)}\right]$$
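To make this definition concrete, here is a small numerical sketch (an illustration, not part of the derivation; the two Gaussians are arbitrarily chosen) that estimates the KL divergence by sampling from $q$ and compares the result against the known closed form for two univariate Gaussians:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative choice: q = N(1.0, 0.5^2) approximating p = N(0, 1)
mu_q, sigma_q = 1.0, 0.5
mu_p, sigma_p = 0.0, 1.0

def log_normal_pdf(x, mu, sigma):
    """Log density of a univariate Gaussian N(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Monte Carlo estimate of D_KL[q || p] = E_{z~q}[log q(z) - log p(z)]
z = rng.normal(mu_q, sigma_q, size=200_000)
kl_mc = np.mean(log_normal_pdf(z, mu_q, sigma_q) - log_normal_pdf(z, mu_p, sigma_p))

# Closed form for the KL divergence between two univariate Gaussians
kl_exact = (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2) - 0.5)

print(f"Monte Carlo: {kl_mc:.4f}, exact: {kl_exact:.4f}")  # both are ~0.818
```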

From Bayes' rule, we know that $p_\theta(z|x) = \frac{p_\theta(x|z)\,p_\theta(z)}{p_\theta(x)}$. Substituting this in the preceding equation, we can write the following:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] = \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log \frac{q_\phi(z|x)\,p_\theta(x)}{p_\theta(x|z)\,p_\theta(z)}\right]$$

Since we know that $\log(a/b) = \log(a) - \log(b)$ and $\log(ab) = \log(a) + \log(b)$, we can expand the preceding equation as follows:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] = \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log q_\phi(z|x) + \log p_\theta(x) - \log p_\theta(x|z) - \log p_\theta(z)\right]$$

We can take $\log p_\theta(x)$ outside the expectation, since it has no dependency on $z$:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] = \log p_\theta(x) + \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log q_\phi(z|x) - \log p_\theta(x|z) - \log p_\theta(z)\right]$$

Grouping $\log q_\phi(z|x) - \log p_\theta(z)$ back into a single log ratio, we can rewrite the preceding equation as follows:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] = \log p_\theta(x) + \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log \frac{q_\phi(z|x)}{p_\theta(z)} - \log p_\theta(x|z)\right] \quad \text{--- (1)}$$

We know that the KL divergence between $q_\phi(z|x)$ and the prior $p_\theta(z)$ can be given as:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z)\right] = \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log \frac{q_\phi(z|x)}{p_\theta(z)}\right] \quad \text{--- (2)}$$

Substituting equation (2) in equation (1), we can write:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] = \log p_\theta(x) + D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z)\right] - \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right]$$

Rearranging the left- and right-hand sides of the equation, we can write the following:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] - \log p_\theta(x) = D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z)\right] - \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right]$$

Multiplying both sides by $-1$ and rearranging the terms, our final equation can be given as follows:

$$\log p_\theta(x) - D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right] = \mathbb{E}_{z\sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z)\right]$$

What does the above equation imply?

The left-hand side of the equation is also known as the variational lower bound, or the evidence lower bound (ELBO). The first term on the left-hand side, $\log p_\theta(x)$, is the log probability of the input $x$, which we want to maximize, and $D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right]$ is the KL divergence between the estimated and the real distribution, which we want to minimize.

The loss function can therefore be derived from the following objective:

$$\max\left(\log p_\theta(x) - D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right]\right)$$

In this equation, you will notice the following:

  • Maximizing $\log p_\theta(x)$ implies we are maximizing the probability of the input; we can convert the maximization problem into a minimization problem by simply adding a negative sign; thus, we can write $\min\left(-\log p_\theta(x)\right)$
  • Maximizing $-D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right]$ implies we are maximizing the negative of the KL divergence between the estimated and the real distribution; this is the same as minimizing the KL divergence itself, so we can write $\min D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right]$

However, we cannot compute $D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z|x)\right]$ directly, because the true posterior $p_\theta(z|x)$ is intractable; instead, we minimize the negative of the equivalent right-hand side of our final equation. Thus, our loss function becomes the following:

$$L = -\mathbb{E}_{z\sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right] + D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z)\right]$$

If you look at this equation, $\mathbb{E}_{z\sim q_\phi(z|x)}\left[\log p_\theta(x|z)\right]$ basically implies the reconstruction of the input; that is, the decoder takes the latent vector $z$ and reconstructs the input $x$.

Thus, our final loss function is the sum of the reconstruction loss and the KL divergence:

$$L = \text{reconstruction loss} + D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z)\right]$$

When the encoder's distribution is a Gaussian, $q_\phi(z|x) = \mathcal{N}(\mu, \sigma^2)$, and the prior is a standard normal, $p_\theta(z) = \mathcal{N}(0, 1)$, the value of the KL divergence simplifies to the following closed form:

$$D_{KL}\left[q_\phi(z|x)\,\|\,p_\theta(z)\right] = -\frac{1}{2}\sum_{j=1}^{J}\left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$

Here, $J$ is the dimensionality of the latent vector $z$.
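To see where this closed form comes from, here is a brief per-dimension sketch, assuming $q = \mathcal{N}(\mu_j, \sigma_j^2)$ and $p = \mathcal{N}(0, 1)$ (this is the standard result, summarized here for reference):

```latex
\begin{align*}
D_{KL}\left[\mathcal{N}(\mu_j,\sigma_j^2)\,\|\,\mathcal{N}(0,1)\right]
  &= \mathbb{E}_{z\sim\mathcal{N}(\mu_j,\sigma_j^2)}\left[\log\frac{q(z)}{p(z)}\right] \\
  &= \mathbb{E}\left[-\tfrac{1}{2}\log\sigma_j^2
       - \frac{(z-\mu_j)^2}{2\sigma_j^2} + \frac{z^2}{2}\right] \\
  % Using E[(z - mu_j)^2] = sigma_j^2 and E[z^2] = mu_j^2 + sigma_j^2:
  &= -\tfrac{1}{2}\log\sigma_j^2 - \tfrac{1}{2}
       + \tfrac{1}{2}\left(\mu_j^2 + \sigma_j^2\right) \\
  &= -\tfrac{1}{2}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
\end{align*}
```

Summing this quantity over all $J$ latent dimensions gives the formula above.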

Thus, minimizing the preceding loss function implies that we are minimizing the reconstruction loss and also minimizing the KL divergence between the approximate posterior $q_\phi(z|x)$ and the prior $p_\theta(z)$.
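Putting the two terms together, here is a minimal NumPy sketch of this loss (my own illustration, not the book's code; the function name, the use of binary cross-entropy as the reconstruction loss, and the assumption that the encoder outputs $\mu$ and $\log \sigma^2$ are all illustrative choices):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    """VAE loss: reconstruction loss plus the closed-form KL divergence.

    x, x_hat   : arrays of shape (batch, features), with values in [0, 1]
    mu, log_var: encoder outputs of shape (batch, latent_dim)
    """
    eps = 1e-8  # guards against log(0)
    # Reconstruction loss: binary cross-entropy, summed over features
    recon = -np.sum(x * np.log(x_hat + eps)
                    + (1 - x) * np.log(1 - x_hat + eps), axis=1)
    # D_KL[N(mu, sigma^2) || N(0, 1)] = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    kl = -0.5 * np.sum(1 + log_var - np.square(mu) - np.exp(log_var), axis=1)
    return np.mean(recon + kl)  # average over the batch

# Toy usage with random arrays standing in for network outputs
rng = np.random.default_rng(0)
x, x_hat = rng.uniform(size=(4, 8)), rng.uniform(size=(4, 8))
mu, log_var = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
print(vae_loss(x, x_hat, mu, log_var))
```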
