The partition()
function does partial sorting, which should be faster than full sorting, because it's less work.
For more information, please refer to http://en.wikipedia.org/wiki/Partial_sorting. A common use case is getting the top 10 elements of a collection. Partial sorting doesn't guarantee the correct order within the group of top elements itself.
The first argument of the function is the array to partially sort. The second argument is an integer or a sequence of integers corresponding to indices of array elements. The partition()
function sorts elements in those indices correctly. With one specified index, we get two partitions; with multiple indices, we get more than one partition. The sorting algorithm makes sure that elements in partitions, which are smaller than a correctly sorted element, come before this element. Otherwise, they are placed behind this element. Let's illustrate this explanation with an example. Start a Python or IPython shell and import NumPy:
$ ipython In [1]: import numpy as np
Create an array with random elements to sort:
In [2]: np.random.seed(20) In [3]: a = np.random.random_integers(0, 9, 9) In [4]: a Out[4]: array([3, 9, 4, 6, 7, 2, 0, 6, 8])
Partially sort the array by partitioning it in two roughly equal parts:
In [5]: np.partition(a, 4) Out[5]: array([0, 2, 3, 4, 6, 6, 7, 9, 8])
We get an almost perfect sorting except for the last two elements.
We partially sorted a nine-element array. The sorting only guaranteed that one element in the middle at index 4 is at the correct position. This corresponds to trying to get the top five elements of the array without caring about the order within the top five group. Since the correctly sorted element is in the middle, this also gives the median of the array.