< Previous Page | Home Page | Next Page >
Up to this point we have been concerned mainly with tools to access and operate on array data with NumPy. This section covers algorithms (such as insertion sorts, selection sorts, merge sorts, quick sorts, bubble sorts, and many, many more) related to sorting values in NumPy arrays.
All of the above mentioned algorithms are means of accomplishing a similar task: sorting the values in a list or array.
For example, a simple selection sort repeatedly finds the minimum value from a list, and makes swaps until the list is sorted. We can code this in just a few lines of Python:
import numpy as np
def selection_sort(x):
for i in range(len(x)):
swap = i + np.argmin(x[i:])
(x[i], x[swap]) = (x[swap], x[i])
return x
x = np.array([2, 1, 4, 3, 5])
selection_sort(x)
The selection sort is useful for its simplicity, but is much too slow to be useful for larger arrays.
Even selection sort, though, is much better than our all-time favorite sorting algorithms, the bogosort:
def bogosort(x):
while np.any(x[:-1] > x[1:]):
np.random.shuffle(x)
return x
x = np.array([2, 1, 4, 3, 5])
bogosort(x)
This silly sorting method relies on pure chance: it repeatedly applies a random shuffling of the array until the result happens to be sorted. With an average scaling of $\mathcal{O}[N \times N!]$, (that's N times N factorial) this should–quite obviously–never be used for any real computation.
Fortunately, Python contains built-in sorting algorithms that are much more efficient than either of the simplistic algorithms just shown. We'll start by looking at the Python built-ins, and then take a look at the routines included in NumPy and optimized for NumPy arrays.
np.sort
and np.argsort
¶Although Python has built-in sort
and sorted
functions to work with lists, we won't discuss them here because NumPy's np.sort
function turns out to be much more efficient and useful for our purposes.
By default np.sort
uses an $\mathcal{O}[N\log N]$, quicksort algorithm, though mergesort and heapsort are also available. For most applications, the default quicksort is more than sufficient.
To return a sorted version of the array without modifying the input, you can use np.sort
:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)
If you prefer to sort the array in-place, you can instead use the sort
method of arrays:
x.sort()
print(x)
A related function is argsort
, which instead returns the indices of the sorted elements:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)
The first element of this result gives the index of the smallest element, the second value gives the index of the second smallest, and so on. These indices can then be used (via fancy indexing) to construct the sorted array if desired:
x[i]
A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the axis
argument. For example:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)
# sort each column of X
np.sort(X, axis=0)
# sort each row of X
np.sort(X, axis=1)
Keep in mind that this treats each row or column as an independent array, and any relationships between the row or column values will be lost!
Sometimes we're not interested in sorting the entire array, but simply want to find the k smallest values in the array. NumPy provides this in the np.partition
function. np.partition
takes an array and a number K; the result is a new array with the smallest K values to the left of the partition, and the remaining values to the right, in arbitrary order:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)
Note that the first three values in the resulting array are the three smallest in the array, and the remaining array positions contain the remaining values. Within the two partitions, the elements have arbitrary order.
Similarly to sorting, we can partition along an arbitrary axis of a multidimensional array:
np.partition(X, 2, axis=1)
The result is an array where the first two slots in each row contain the smallest values from that row, with the remaining values filling the remaining slots.
Finally, just as there is a np.argsort
that computes indices of the sort, there is a np.argpartition
that computes indices of the partition.
We'll see this in action in the following section.
< Previous Page | Home Page | Next Page >