Let's look at the first term:

$$\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$$
Here, $x \sim p_{data}(x)$ implies that we are sampling the input $x$ from the real data distribution, $p_{data}(x)$, so $x$ is a real image.
$D(x)$ implies that we are feeding the input image $x$ to the discriminator $D$, and the discriminator will return the probability of the input image being real. As $x$ is sampled from the real data distribution $p_{data}(x)$, we know that $x$ is a real image. So, we need to maximize the probability $D(x)$:

$$\max D(x)$$
But instead of maximizing raw probabilities, we maximize log probabilities, as we learned in Chapter 7, Learning Text Representations, so we can write the following:

$$\max \log D(x)$$
So, our final equation becomes the following:

$$\max \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$$
Thus, $\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)]$ denotes the expectation of the log likelihood of input images sampled from the real data distribution being real.
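As a rough sketch of how this first term is estimated in practice, the expectation is approximated by a Monte Carlo average over a batch of real images: feed each image to the discriminator, take the log of the returned probability, and average. The `discriminator` function below is a hypothetical stand-in (not the book's actual network), used only to make the computation concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x):
    # Toy stand-in for D: squashes each image's mean pixel value
    # through a sigmoid to produce a probability of being real.
    return 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2))))

# A batch of "real" images, i.e. samples x ~ p_data(x).
real_images = rng.normal(loc=0.5, scale=0.1, size=(64, 28, 28))

# D(x) for each image, then log D(x), then the batch mean as an
# estimate of E_{x ~ p_data(x)}[log D(x)].
d_real = discriminator(real_images)   # probabilities in (0, 1)
first_term = np.mean(np.log(d_real))  # the quantity D maximizes

print(first_term)
```

Because $D(x)$ is a probability in $(0, 1)$, the log is negative, so this term is at most $0$; the discriminator pushes it toward $0$ by assigning probabilities close to $1$ to real images.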