We face a problem while training a VAE through gradient descent. Recall that we perform a sampling operation to generate the latent vector. Since a sampling operation is not differentiable, we cannot compute its gradients. That is, while backpropagating through the network to minimize the error, we cannot calculate the gradients of the sampling operation, as shown in the following diagram:
To combat this, we use the reparameterization trick. We introduce a new parameter called epsilon, which we randomly sample from a unit Gaussian, given as follows: ε ~ N(0, 1).
Now we can rewrite our latent vector as z = μ + σ ⊙ ε, where μ and σ are the mean and standard deviation of the latent distribution produced by the encoder, and ⊙ denotes element-wise multiplication. The randomness is pushed into ε, so z becomes a deterministic, differentiable function of μ and σ.
The reparameterization trick is shown in the following diagram:
Thus, with the reparameterization trick, we can train the VAE with the gradient descent algorithm.
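The trick above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming hypothetical encoder outputs mu and log_var (predicting the log-variance is a common convention to keep the standard deviation positive; it is an assumption here, not something stated in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a 2-dimensional latent space.
mu = np.array([0.5, -1.2])       # mean of the latent distribution
log_var = np.array([0.1, -0.3])  # log-variance, so sigma = exp(log_var / 2) > 0
sigma = np.exp(0.5 * log_var)

# Reparameterization trick: sample epsilon from a unit Gaussian,
# then express the latent vector as a deterministic function of
# mu and sigma. The randomness lives entirely in epsilon.
epsilon = rng.standard_normal(mu.shape)
z = mu + sigma * epsilon

# Gradients of z with respect to the encoder outputs are now
# well-defined: dz/dmu = 1 and dz/dsigma = epsilon, so
# backpropagation can flow through mu and sigma to the encoder.
```

Because epsilon is sampled outside the computation path from the encoder to z, frameworks with automatic differentiation can backpropagate through mu and sigma as through any other deterministic operation.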