Complexity of any comparison-based sorting

Now that we have seen two algorithms for sorting that are more efficient than the ones described in the previous chapter, how do we know that they are as efficient as sorting can get? Can we make algorithms that are even faster? We will see in this section that we have reached the asymptotic limit of efficiency; that is, any comparison-based sorting algorithm has a worst-case time complexity of at least Θ(m lg m), where m is the number of elements.

Suppose we start with an array of m elements. For the time being, let's assume they are all distinct; after all, if such an array is a possible input, we need to consider this case as well. The number of different arrangements possible with these elements is m!. One of these arrangements is the correct sorted one. Any algorithm that sorts this array using comparisons has to be able to distinguish this particular arrangement from all the others using only comparisons between pairs of elements. Any comparison divides the set of arrangements into two subsets: one in which the two compared values are inverted with respect to each other and one in which they are not. That is to say, given any two values a and b from the array, a comparison that returns a < b divides the set of arrangements into two partitions; the first set contains all the arrangements where b comes before a, and the second set contains all the arrangements where a comes before b. The sorted arrangement is, of course, a member of the second set. Any algorithm that sorts based on comparisons has to perform enough of them to pin down the single correct arrangement, that is, the sorted arrangement. Basically, it first performs a comparison, chooses the correct subset, then performs another comparison and chooses the correct subset of that subset, and so on, until it reaches a set containing just one arrangement. This particular arrangement is the sorted version of the array. What is the minimum number of comparisons required to find one particular arrangement out of m! arrangements? This is the same as asking how many times you have to halve a set of m! elements to reach a set of only one element, which is, of course, lg(m!). This is a rough estimate; in fact, the number of comparisons required can be even higher, because the two subsets that each comparison creates may not be equal in size. But we know that at least lg(m!) comparisons are required.
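To get a feel for how large this bound is, here is a small Java program (only a sketch; the class LowerBoundDemo and its method are made up for this illustration and do not come from any earlier chapter) that computes lg(m!) as the sum lg 1 + lg 2 + ... + lg m and prints it next to m lg m for a few values of m. The two quantities grow at the same rate, which is exactly what the rest of this section proves.

public class LowerBoundDemo {
    // lg(m!) computed as lg(1) + lg(2) + ... + lg(m), which avoids
    // computing the huge value of m! itself
    static double lgFactorial(int m) {
        double sum = 0;
        for (int k = 2; k <= m; k++) {
            sum += Math.log(k) / Math.log(2);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int m : new int[]{4, 16, 256, 4096}) {
            System.out.printf("m = %5d   lg(m!) = %10.1f   m lg m = %10.1f%n",
                    m, lgFactorial(m), m * Math.log(m) / Math.log(2));
        }
    }
}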

Now, how large is lg(m!)? Well, it is (ln(m!))(lg e), where ln(x) is the natural logarithm of x. We will find a simpler asymptotic form for the function ln(m!). It requires a little bit of calculus.
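This is just the change-of-base rule for logarithms; for reference, it can be written as follows:

\lg x = \log_2 x = \frac{\ln x}{\ln 2} = (\ln x)(\lg e), \qquad \text{so} \qquad \lg(m!) = (\ln(m!))(\lg e)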

Let's look at the following figure:

Figure: Area under the curve y = ln x

The diagram shows the curve y = ln x along with the horizontal lines y = ln a and y = ln b. We know that an integral measures the area under the curve of a function. Now, the area under the line y = ln b between a and b is (b - a) ln b, and the area under the line y = ln a in the same interval is (b - a) ln a. The area under the curve y = ln x in the same interval is as follows:

\int_a^b \ln x \, dx = \big[ x \ln x - x \big]_a^b = b \ln b - a \ln a - (b - a)

From the graph in the preceding figure, the following is clear:

(b - a) \ln a \le \int_a^b \ln x \, dx \le (b - a) \ln b

In particular, setting b = a + 1, we get the following:

\ln a \le \int_a^{a+1} \ln x \, dx \le \ln (a + 1)

So, writing this inequality for each value of a from 1 up to m - 1, we get the following set of inequalities:

\ln 1 \le \int_1^2 \ln x \, dx \le \ln 2

\ln 2 \le \int_2^3 \ln x \, dx \le \ln 3

\vdots

\ln (m-1) \le \int_{m-1}^{m} \ln x \, dx \le \ln m

Adding up the respective sides of all these inequalities, we get the following:

\ln 1 + \ln 2 + \dots + \ln (m-1) \le \int_1^m \ln x \, dx \le \ln 2 + \ln 3 + \dots + \ln m

that is,

\ln \big( (m-1)! \big) \le \int_1^m \ln x \, dx \le \ln (m!)

Now, of course we know the following:

\int_1^m \ln x \, dx = \big[ x \ln x - x \big]_1^m = m \ln m - m + 1

So, we have the following inequalities:

\ln \big( (m-1)! \big) \le m \ln m - m + 1 \le \ln (m!)

In the left inequality, if we put m in place of m - 1 (that is, apply it with m + 1 instead of m), we will have the following:

\ln (m!) \le (m+1) \ln (m+1) - (m+1) + 1 = (m+1) \ln (m+1) - m

This, combined with the right inequality, gives the following relations:

m \ln m - m + 1 \le \ln (m!) \le (m+1) \ln (m+1) - m

This gives us pretty good upper and lower bounds on the value of ln(m!). Both the upper bound and the lower bound are Θ(m ln m), so we can conclude that ln(m!) = Θ(m ln m). This means that lg(m!) = (ln(m!))(lg e) is also Θ(m ln m), which is the same as Θ(m lg m), because lg m = (ln m)(lg e) and constant factors do not affect asymptotic complexity.
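If you would rather check these bounds numerically than redo the calculus, the following short Java program (again only a sketch; the class FactorialLogBounds is invented for this illustration) computes ln(m!) as a sum of natural logarithms and prints it between the lower bound m ln m - m + 1 and the upper bound (m + 1) ln(m + 1) - m for a few values of m.

public class FactorialLogBounds {
    // ln(m!) computed as ln(1) + ln(2) + ... + ln(m)
    static double lnFactorial(int m) {
        double sum = 0;
        for (int k = 2; k <= m; k++) {
            sum += Math.log(k);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int m : new int[]{10, 100, 1000, 10000}) {
            double lower = m * Math.log(m) - m + 1;        // m ln m - m + 1
            double actual = lnFactorial(m);                // ln(m!)
            double upper = (m + 1) * Math.log(m + 1) - m;  // (m+1) ln(m+1) - m
            System.out.printf("m = %6d   %14.2f <= %14.2f <= %14.2f%n",
                    m, lower, actual, upper);
        }
    }
}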

So, the time complexity of any comparison-based sorting algorithm has to be at least Θ(m lg m), simply because of the minimum number of comparisons needed to sort. Therefore, mergesort and the typical case of quicksort are asymptotically optimal.
