How to Sort Arrays in NumPy
Key Insights
- NumPy’s np.sort() returns a sorted copy while ndarray.sort() sorts in-place; choose based on whether you need to preserve the original array
- Use np.partition() instead of full sorting when you only need the k-smallest or k-largest elements, reducing complexity from O(n log n) to O(n)
- The axis parameter controls sorting direction in multi-dimensional arrays: axis=0 sorts columns independently, axis=1 sorts rows independently
Introduction to NumPy Sorting
Sorting is one of the most fundamental operations in data processing. Whether you’re ranking search results, organizing time-series data, or preprocessing features for machine learning, you’ll sort arrays constantly. NumPy provides optimized sorting functions that dramatically outperform Python’s built-in sorting on numerical data.
The performance difference isn’t trivial. NumPy’s sorting operates on contiguous memory blocks using compiled C code, while Python’s sorted() must handle Python objects with all their overhead. For a million-element array, NumPy sorting can be 10-50x faster depending on the data type and algorithm.
Beyond raw speed, NumPy offers specialized sorting functions that Python doesn’t: partial sorting, indirect sorting via indices, and multi-dimensional sorting along specific axes. Let’s explore each of these capabilities.
Basic Array Sorting with np.sort()
The np.sort() function is your primary tool for sorting arrays. By default, it returns a sorted copy of the input array in ascending order:
import numpy as np
# Basic 1D sorting
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
sorted_arr = np.sort(arr)
print(f"Original: {arr}")
print(f"Sorted: {sorted_arr}")
Output:
Original: [3 1 4 1 5 9 2 6]
Sorted: [1 1 2 3 4 5 6 9]
Notice that arr remains unchanged. This is the key difference from the method form ndarray.sort(), which sorts in-place and returns None:
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
# In-place sorting - modifies original array
result = arr.sort()
print(f"Array after sort: {arr}")
print(f"Return value: {result}")
Output:
Array after sort: [1 1 2 3 4 5 6 9]
Return value: None
Use np.sort() when you need to preserve the original array or chain operations. Use ndarray.sort() when memory is tight and you don’t need the original order.
For descending order, NumPy doesn’t have a reverse parameter. Instead, reverse the sorted array:
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
# Descending sort
descending = np.sort(arr)[::-1]
print(f"Descending: {descending}")
Output:
Descending: [9 6 5 4 3 2 1 1]
Sorting Multi-Dimensional Arrays
When sorting 2D arrays, the axis parameter determines the sorting direction. This trips up many developers, so let’s clarify with a visual example:
arr_2d = np.array([
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5]
])
print("Original array:")
print(arr_2d)
print()
# axis=0: Sort each column independently (values are ordered top-to-bottom)
sorted_axis0 = np.sort(arr_2d, axis=0)
print("Sorted along axis=0 (columns sorted independently):")
print(sorted_axis0)
print()
# axis=1: Sort each row independently (values are ordered left-to-right)
sorted_axis1 = np.sort(arr_2d, axis=1)
print("Sorted along axis=1 (rows sorted independently):")
print(sorted_axis1)
Output:
Original array:
[[3 1 4]
 [1 5 9]
 [2 6 5]]
Sorted along axis=0 (columns sorted independently):
[[1 1 4]
 [2 5 5]
 [3 6 9]]
Sorted along axis=1 (rows sorted independently):
[[1 3 4]
 [1 5 9]
 [2 5 6]]
With axis=0, each column is sorted top-to-bottom. With axis=1, each row is sorted left-to-right. The default is axis=-1, which sorts along the last axis.
To sort the entire array as if it were flattened, use axis=None:
flattened_sort = np.sort(arr_2d, axis=None)
print(f"Flattened and sorted: {flattened_sort}")
Output:
Flattened and sorted: [1 1 2 3 4 5 5 6 9]
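The reversal trick used for 1D descending order carries over to a single axis. The following is a minimal sketch that reuses the same values as arr_2d; the names rows_desc and cols_desc are only illustrative.

```python
import numpy as np

arr_2d = np.array([
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5],
])

# Sort each row ascending, then reverse along that same axis (axis=1)
rows_desc = np.sort(arr_2d, axis=1)[:, ::-1]

# Sort each column ascending, then reverse along axis=0
cols_desc = np.sort(arr_2d, axis=0)[::-1, :]

print(rows_desc)  # [[4 3 1], [9 5 1], [6 5 2]]
print(cols_desc)  # [[3 6 9], [2 5 5], [1 1 4]]
```

The reversed slices are views of the sorted copies, so no additional data is copied. np.flip(sorted_array, axis=...) should give the same result if you prefer a function call over slicing.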
Getting Sorted Indices with np.argsort()
Sometimes you need to know where elements would end up after sorting, not the sorted values themselves. np.argsort() returns the indices that would sort the array:
scores = np.array([85, 92, 78, 96, 88])
names = np.array(['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'])
# Get indices that would sort scores
sorted_indices = np.argsort(scores)
print(f"Sorted indices: {sorted_indices}")
# Use indices to sort both arrays consistently
sorted_scores = scores[sorted_indices]
sorted_names = names[sorted_indices]
print(f"Names by ascending score: {sorted_names}")
print(f"Corresponding scores: {sorted_scores}")
# For descending order, reverse the indices
desc_indices = np.argsort(scores)[::-1]
print(f"\nTop performers: {names[desc_indices]}")
print(f"Their scores: {scores[desc_indices]}")
Output:
Sorted indices: [2 0 4 1 3]
Names by ascending score: ['Charlie' 'Alice' 'Eve' 'Bob' 'Diana']
Corresponding scores: [78 85 88 92 96]
Top performers: ['Diana' 'Bob' 'Eve' 'Alice' 'Charlie']
Their scores: [96 92 88 85 78]
This pattern is essential when you have parallel arrays that must stay synchronized. It’s also a building block for ranking: sorted_indices[i] is the original index of the i-th smallest element, and applying argsort to that result yields each element’s rank in the sorted order.
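To make the difference between sorted indices and ranks concrete, here is a small sketch using the same scores. The variable names order, ranks, and ranks_alt are illustrative, and the example assumes there are no tied scores.

```python
import numpy as np

scores = np.array([85, 92, 78, 96, 88])

# order[i] is the index of the i-th smallest score
order = np.argsort(scores)

# ranks[j] is the position scores[j] would occupy after sorting (0 = lowest)
ranks = np.argsort(order)

# Equivalent formulation: invert the permutation by direct assignment
ranks_alt = np.empty_like(order)
ranks_alt[order] = np.arange(len(scores))

print(order)  # [2 0 4 1 3]
print(ranks)  # [1 3 0 4 2]
```

With tied values, the ranks assigned within a tie depend on the sort order of equal elements, so passing kind='stable' to the first argsort is a reasonable choice when ties are possible.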
Partial Sorting with np.partition()
Full sorting is O(n log n). But what if you only need the top 5 elements from a million-element array? np.partition() solves this in O(n) time by partially sorting the array around a pivot point:
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
# Partition around the 3rd smallest element (index 2)
partitioned = np.partition(arr, 2)
print(f"Original: {arr}")
print(f"Partitioned: {partitioned}")
print(f"3 smallest: {partitioned[:3]}")
Output (the order of elements within each partition is undefined and may differ on your system):
Original: [3 1 4 1 5 9 2 6 5 3 5]
Partitioned: [1 1 2 3 5 9 4 6 5 3 5]
3 smallest: [1 1 2]
After partitioning at index k, all elements before index k are smaller than or equal to the element at index k, and all elements after are greater than or equal. The elements within each partition aren’t sorted—that’s what makes it fast.
For the k-largest elements, use negative indexing:
# Get 3 largest elements
partitioned = np.partition(arr, -3)
print(f"3 largest: {partitioned[-3:]}")
Output (the order of the two largest values may differ on your system):
3 largest: [5 6 9]
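Often you need the positions of the top-k elements, in order, rather than just their values. One way to sketch this is to combine np.argpartition() with a sort of only the k candidates. The names k and top_idx are illustrative.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
k = 3

# Indices of the k largest values, in no guaranteed order
top_idx = np.argpartition(arr, -k)[-k:]

# Sort just those k candidates, largest first
top_idx = top_idx[np.argsort(arr[top_idx])[::-1]]

print(arr[top_idx])  # [9 6 5]
```

Because 5 appears three times in arr, which of those positions ends up in top_idx is not guaranteed; only the values are.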
Let’s benchmark the performance difference:
import time
large_arr = np.random.rand(1_000_000)
# Full sort
start = time.perf_counter()
sorted_arr = np.sort(large_arr)
top_100_sort = sorted_arr[-100:]
sort_time = time.perf_counter() - start
# Partition
start = time.perf_counter()
partitioned = np.partition(large_arr, -100)
top_100_partition = partitioned[-100:]
partition_time = time.perf_counter() - start
print(f"Full sort time: {sort_time:.4f}s")
print(f"Partition time: {partition_time:.4f}s")
print(f"Speedup: {sort_time/partition_time:.1f}x")
Output (typical):
Full sort time: 0.0523s
Partition time: 0.0089s
Speedup: 5.9x
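The speedup tends to increase with array size, since a full sort does O(n log n) work while partitioning does O(n). The exact factor depends on dtype, hardware, and NumPy version, so treat figures such as 10-20x for very large arrays as rough guidance rather than a guarantee.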
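A single perf_counter measurement can be noisy. The sketch below is one way to repeat the comparison with the standard-library timeit module and take the best of several runs. The helper names top_k_sort and top_k_partition are hypothetical, and the seed is arbitrary.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)  # seeded so repeated runs use the same data
large_arr = rng.random(1_000_000)
k = 100

def top_k_sort(a, k):
    """Top-k values via a full sort."""
    return np.sort(a)[-k:]

def top_k_partition(a, k):
    """Top-k values via partition, then sort only the k survivors."""
    return np.sort(np.partition(a, -k)[-k:])

# Best of 5 single runs for each approach
sort_best = min(timeit.repeat(lambda: top_k_sort(large_arr, k), number=1, repeat=5))
part_best = min(timeit.repeat(lambda: top_k_partition(large_arr, k), number=1, repeat=5))

print(f"Full sort: {sort_best:.4f}s")
print(f"Partition: {part_best:.4f}s")
```

Both helpers return the same sorted top-k values, so the timing compares like with like. The numbers you see will vary by machine.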
Sorting Structured Arrays and Custom Keys
Structured arrays let you sort records by specific fields. This is powerful for tabular data:
# Define a structured array with employee data
# ('U12' leaves room for 'Engineering', which 'U10' would silently truncate)
dtype = [('name', 'U10'), ('department', 'U12'), ('salary', 'i4')]
employees = np.array([
    ('Alice', 'Engineering', 95000),
    ('Bob', 'Sales', 72000),
    ('Charlie', 'Engineering', 85000),
    ('Diana', 'Sales', 78000),
    ('Eve', 'Engineering', 92000)
], dtype=dtype)
# Sort by salary
by_salary = np.sort(employees, order='salary')
print("Sorted by salary:")
for emp in by_salary:
    print(f" {emp['name']:10} {emp['department']:12} ${emp['salary']:,}")
print()
# Sort by department, then by salary (descending requires a workaround)
by_dept_salary = np.sort(employees, order=['department', 'salary'])
print("Sorted by department, then salary:")
for emp in by_dept_salary:
    print(f" {emp['name']:10} {emp['department']:12} ${emp['salary']:,}")
Output:
Sorted by salary:
 Bob        Sales        $72,000
 Diana      Sales        $78,000
 Charlie    Engineering  $85,000
 Eve        Engineering  $92,000
 Alice      Engineering  $95,000
Sorted by department, then salary:
 Charlie    Engineering  $85,000
 Eve        Engineering  $92,000
 Alice      Engineering  $95,000
 Bob        Sales        $72,000
 Diana      Sales        $78,000
For descending sort on numeric fields, negate the field before sorting:
# Sort by salary descending using argsort
indices = np.argsort(-employees['salary'])
by_salary_desc = employees[indices]
print("Sorted by salary (descending):")
for emp in by_salary_desc:
    print(f" {emp['name']:10} ${emp['salary']:,}")
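The same negation idea can be combined with np.lexsort() to mix directions across keys, for example department ascending and salary descending within each department. This is a sketch of one possible workaround rather than the only approach; the department field is declared as 'U12' here so that 'Engineering' fits.

```python
import numpy as np

dtype = [('name', 'U10'), ('department', 'U12'), ('salary', 'i4')]
employees = np.array([
    ('Alice', 'Engineering', 95000),
    ('Bob', 'Sales', 72000),
    ('Charlie', 'Engineering', 85000),
    ('Diana', 'Sales', 78000),
    ('Eve', 'Engineering', 92000),
], dtype=dtype)

# np.lexsort uses the LAST key as the primary key:
# primary = department (ascending), secondary = negated salary (descending)
idx = np.lexsort((-employees['salary'], employees['department']))
by_dept_salary_desc = employees[idx]

for emp in by_dept_salary_desc:
    print(f" {emp['name']:10} {emp['department']:12} ${emp['salary']:,}")
```

This prints Alice, Eve, Charlie for Engineering and then Diana, Bob for Sales. Negation only works for signed numeric fields; for unsigned integers or strings a different approach is needed.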
Performance Tips and Algorithm Selection
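The kind parameter accepts four values. In current NumPy releases, 'mergesort' and 'stable' select the same stable implementation, and 'quicksort' is implemented as an introsort: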
arr = np.random.rand(100_000)
algorithms = ['quicksort', 'mergesort', 'heapsort', 'stable']
for algo in algorithms:
    start = time.perf_counter()
    # Average over 10 runs, copying so each run sorts unsorted data
    for _ in range(10):
        np.sort(arr.copy(), kind=algo)
    elapsed = (time.perf_counter() - start) / 10
    print(f"{algo:12} {elapsed*1000:.2f}ms")
Typical output:
quicksort 5.23ms
mergesort 7.89ms
heapsort 12.45ms
stable 7.91ms
When to use each:
- quicksort (default): Fastest average case, but unstable. Use for most numeric sorting.
- mergesort/stable: Stable sorting preserves relative order of equal elements. Essential when sorting by multiple keys sequentially.
- heapsort: Guaranteed O(n log n) worst case, but slower in practice. Rarely needed.
Stability matters when you sort by multiple criteria:
# Unstable sort may scramble order of equal elements
data = np.array([(1, 'a'), (2, 'b'), (1, 'c'), (2, 'd')],
                dtype=[('num', 'i4'), ('letter', 'U1')])
# Stable sort preserves original order within equal groups
stable_sorted = np.sort(data, order='num', kind='stable')
print("Stable sort preserves letter order within equal nums:")
print(stable_sorted)
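One caveat: when np.sort() is given order= on a structured array, fields left out of order are still used to break ties, so the example above would give the same result with any kind. Stability is easier to see with argsort on plain arrays, sorting by the secondary key first and then by the primary key. The arrays nums and letters below are made-up illustration data.

```python
import numpy as np

nums = np.array([2, 1, 2, 1, 1])
letters = np.array(['b', 'c', 'a', 'a', 'b'])

# Step 1: order by the secondary key (letter)
idx = np.argsort(letters, kind='stable')

# Step 2: stable sort by the primary key (num);
# rows with equal nums keep the letter order from step 1
idx = idx[np.argsort(nums[idx], kind='stable')]

print(nums[idx])     # [1 1 1 2 2]
print(letters[idx])  # ['a' 'b' 'c' 'a' 'b']
```

For this data, np.lexsort((letters, nums)) produces the same index order in a single call. With an unstable kind in step 2, the letter order within each group would not be guaranteed.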
For maximum performance: ensure your arrays are contiguous in memory (np.ascontiguousarray()), use appropriate dtypes (smaller is faster), and consider np.partition() when you don’t need full sorting.
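As a rough sketch of how to check these properties before sorting, the snippet below inspects contiguity and memory footprint. The array names are illustrative, and whether either change speeds up your workload is something to measure rather than assume.

```python
import numpy as np

matrix = np.arange(12, dtype=np.float64).reshape(3, 4)

# A column slice is a strided view, not a contiguous block
column = matrix[:, 1]
print(column.flags['C_CONTIGUOUS'])  # False

# Copy it into contiguous memory
packed = np.ascontiguousarray(column)
print(packed.flags['C_CONTIGUOUS'])  # True

# A smaller dtype halves the memory that the sort has to move
values = np.linspace(0.0, 1.0, 1000)  # float64 by default
values32 = values.astype(np.float32)
print(values.nbytes, values32.nbytes)  # 8000 4000
```

Downcasting loses precision, so it only makes sense when the reduced precision is acceptable for your data.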
NumPy’s sorting functions are workhorses you’ll use constantly. Master the axis parameter for multi-dimensional arrays, reach for argsort() when tracking indices matters, and remember that partition() exists for those cases where full sorting is overkill.