What is the Wasserstein distance?

The Wasserstein distance, also known as the Earth Mover's (EM) distance, is one of the most widely used distance measures in optimal transport problems, where we need to move things from one configuration to another.

So, when we have two distributions, $P_r$ and $P_g$, the Wasserstein distance $W(P_r, P_g)$ tells us how much work is required to make the probability distribution $P_r$ match the probability distribution $P_g$.

Let’s try to understand the intuition behind the EM distance. We can view a probability distribution as a collection of mass. Our goal is to convert one probability distribution into another. There are many possible ways to do this, but the Wasserstein metric seeks the optimal one, that is, the conversion with the least cost.

The cost of conversion can be given as a distance multiplied by the mass.

The amount of mass moved from point x to point y is given as $\gamma(x, y)$. It is called a transport plan. It tells us how much mass we need to transport from x to y, and the distance between x and y is given as $\|x - y\|$.

So, the cost is given as follows:

$$\text{cost} = \gamma(x, y) \, \|x - y\|$$

We have many (x,y) pairs, so the expected cost across all (x,y) pairs is given as follows:

$$\mathbb{E}_{(x, y) \sim \gamma}\big[\|x - y\|\big] = \sum_{x, y} \gamma(x, y) \, \|x - y\|$$
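To make the idea of a transport plan concrete, here is a minimal sketch that sums mass times distance over all (x,y) pairs for one hand-written plan. The points, the two distributions `p_r` and `p_g`, and the plan `gamma` are invented toy values chosen for illustration, not taken from the text.

```python
# Hypothetical toy example: two distributions on the points 0, 1, 2.
points = [0.0, 1.0, 2.0]
p_r = [0.5, 0.5, 0.0]   # source distribution (assumed values)
p_g = [0.0, 0.5, 0.5]   # target distribution (assumed values)

# One valid transport plan: gamma[i][j] is the mass moved from
# points[i] to points[j]. Here all mass is shifted one step to the right.
gamma = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]


def plan_cost(gamma, points):
    """Sum of mass * distance over all (x, y) pairs."""
    n = len(points)
    return sum(
        gamma[i][j] * abs(points[i] - points[j])
        for i in range(n)
        for j in range(n)
    )


def marginals(gamma):
    """Row sums (mass leaving each point) and column sums (mass arriving)."""
    rows = [sum(row) for row in gamma]
    cols = [sum(col) for col in zip(*gamma)]
    return rows, cols


cost = plan_cost(gamma, points)
rows, cols = marginals(gamma)
print("cost of this plan:", cost)
print("row sums match p_r:", rows == p_r)
print("column sums match p_g:", cols == p_g)
```

The row and column sums are checked because a plan only counts as valid if it moves exactly the mass that the source holds and delivers exactly the mass that the target needs.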

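This is the total cost of moving mass from x to y under one particular transport plan. There are many possible transport plans, but we are interested only in the optimal one, that is, the one with minimum cost, so we rewrite our preceding equation as follows:

$$\inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\|x - y\|\big]$$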

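Here, inf denotes the infimum, the greatest lower bound, which we can think of as the minimum value. $\Pi(P_r, P_g)$ is the set of all possible joint distributions whose marginals are the two distributions $P_r$ and $P_g$.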

So, out of all the possible joint distributions between $P_r$ and $P_g$, we are finding the minimum cost required to make one distribution look like the other.
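The search over joint distributions can be sketched on a case small enough to enumerate. Assume two distributions on just the points 0 and 1 (the values below are made up for illustration). Every joint distribution with the right marginals is then fixed by a single number, the entry `gamma[0][0]`, so we can scan it over a grid and keep the cheapest plan. This is only a brute-force illustration for a 2x2 case, not a general method.

```python
# Toy setup (assumed values): both distributions live on the points 0 and 1.
points = [0.0, 1.0]
p_r = [0.7, 0.3]
p_g = [0.4, 0.6]


def coupling(a, p, q):
    """2x2 joint distribution with row sums p and column sums q,
    parameterised by a = gamma[0][0]."""
    return [
        [a, p[0] - a],
        [q[0] - a, p[1] - q[0] + a],
    ]


def plan_cost(gamma, points):
    """Sum of mass * distance over all (x, y) pairs."""
    n = len(points)
    return sum(
        gamma[i][j] * abs(points[i] - points[j])
        for i in range(n)
        for j in range(n)
    )


# gamma[0][0] may only take values that keep every entry non-negative.
lo = max(0.0, p_r[0] + p_g[0] - 1.0)
hi = min(p_r[0], p_g[0])

steps = 100
candidates = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
costs = [plan_cost(coupling(a, p_r, p_g), points) for a in candidates]

best_cost = min(costs)    # approximately 0.3
worst_cost = max(costs)   # approximately 0.9
print("cheapest plan costs about", round(best_cost, 6))
print("most expensive plan costs about", round(worst_cost, 6))
```

The cheapest plan leaves as much mass in place as possible and moves only the surplus 0.3 from point 0 to point 1, which is why its cost is about 0.3.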

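Our final equation can be given as follows:

$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\|x - y\|\big]$$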

However, calculating the Wasserstein distance is not a simple task because it is intractable to search over all possible joint distributions, $\Pi(P_r, P_g)$, and finding the infimum turns into another optimization problem.

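In order to avoid that, we use the Kantorovich-Rubinstein duality. It converts our equation into a simple maximization problem, as follows:

$$W(P_r, P_g) = \frac{1}{K} \sup_{\|f\|_L \le K} \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)]$$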
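As a rough sanity check of the dual form with K = 1, the sketch below evaluates the difference of expectations for a few hand-picked functions whose slope never exceeds 1 in magnitude. The distributions and the candidate functions are assumptions made up for this illustration. Trying a handful of functions only gives a lower bound in general. In this toy case one of them reaches 0.3, which is the cost of the cheapest plan here (move the surplus 0.3 of mass from point 0 to point 1).

```python
# Toy setup (assumed values): both distributions live on the points 0 and 1.
points = [0.0, 1.0]
p_r = [0.7, 0.3]
p_g = [0.4, 0.6]


def expectation(f, probs, points):
    """Expected value of f under a discrete distribution."""
    return sum(w * f(x) for w, x in zip(probs, points))


def dual_objective(f):
    """E_{x ~ P_r}[f(x)] - E_{x ~ P_g}[f(x)] for a candidate function f."""
    return expectation(f, p_r, points) - expectation(f, p_g, points)


# A few hand-picked candidates, each with slope at most 1 in magnitude.
candidates = {
    "f(x) = x": lambda x: x,
    "f(x) = -x": lambda x: -x,
    "f(x) = -0.5 * x": lambda x: -0.5 * x,
    "f(x) = |x - 0.5|": lambda x: abs(x - 0.5),
}

values = {name: dual_objective(f) for name, f in candidates.items()}
best_name = max(values, key=values.get)
best_value = values[best_name]

for name, value in values.items():
    print(f"{name:18s} -> {value:+.3f}")
print("largest value:", round(best_value, 6), "from", best_name)
```

In practice the maximizing function is not picked by hand but searched for, which is what makes the dual form a maximization problem.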

Okay, but what does the above equation mean? We are basically taking the supremum over all K-Lipschitz functions. Wait. What is a Lipschitz function, and what is the supremum? Let's discuss that in the next section.
