Time for action – partial sorting via selection for a fast median with the partition() function

The partition() function does partial sorting, which should be faster than full sorting, because it's less work.

Note

For more information, please refer to http://en.wikipedia.org/wiki/Partial_sorting. A common use case is getting the top 10 elements of a collection. Partial sorting doesn't guarantee the correct order within the group of top elements itself.

The first argument of the function is the array to partially sort. The second argument is an integer or a sequence of integers corresponding to indices of array elements. The partition() function sorts elements in those indices correctly. With one specified index, we get two partitions; with multiple indices, we get more than one partition. The sorting algorithm makes sure that elements in partitions, which are smaller than a correctly sorted element, come before this element. Otherwise, they are placed behind this element. Let's illustrate this explanation with an example. Start a Python or IPython shell and import NumPy:

$ ipython
In [1]: import numpy as np

Create an array with random elements to sort:

In [2]: np.random.seed(20)

In [3]: a = np.random.random_integers(0, 9, 9)

In [4]: a
Out[4]: array([3, 9, 4, 6, 7, 2, 0, 6, 8])

Partially sort the array by partitioning it in two roughly equal parts:

In [5]: np.partition(a, 4)
Out[5]: array([0, 2, 3, 4, 6, 6, 7, 9, 8])

We get an almost perfect sorting except for the last two elements.

What just happened?

We partially sorted a nine-element array. The sorting only guaranteed that one element in the middle at index 4 is at the correct position. This corresponds to trying to get the top five elements of the array without caring about the order within the top five group. Since the correctly sorted element is in the middle, this also gives the median of the array.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset