NumPy - Sort Array (np.sort, np.argsort)


Key Insights

• NumPy provides multiple sorting functions with np.sort() returning sorted copies and np.argsort() returning indices, while in-place sorting via ndarray.sort() modifies arrays directly for memory efficiency
• Multi-dimensional array sorting supports axis-based operations and complex sorting scenarios including sorting by multiple columns using structured arrays or lexsort
• Understanding sorting algorithm selection (quicksort, mergesort, heapsort) and stability guarantees is critical for performance optimization in production systems

Basic Array Sorting with np.sort()

The np.sort() function returns a sorted copy of an array without modifying the original. This is the safest approach when you need to preserve the original data structure.

import numpy as np

# 1D array sorting
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
sorted_arr = np.sort(arr)
print(f"Original: {arr}")
print(f"Sorted: {sorted_arr}")
# Original: [3 1 4 1 5 9 2 6]
# Sorted: [1 1 2 3 4 5 6 9]

# Descending order
sorted_desc = np.sort(arr)[::-1]
print(f"Descending: {sorted_desc}")
# Descending: [9 6 5 4 3 2 1 1]

For in-place sorting where memory conservation is critical, use the ndarray.sort() method:

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
arr.sort()
print(f"In-place sorted: {arr}")
# In-place sorted: [1 1 2 3 4 5 6 9]
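
The difference between the two calls is easy to trip over, because ndarray.sort() returns None rather than the sorted array. The following is a minimal sketch of that behavior; the variable names are illustrative only.

```python
import numpy as np

original = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# np.sort() allocates and returns a new array; the input is untouched.
sorted_copy = np.sort(original)
copy_is_independent = not np.shares_memory(original, sorted_copy)

# ndarray.sort() reorders the existing buffer and returns None,
# so writing `arr = arr.sort()` would discard the array.
in_place = original.copy()
return_value = in_place.sort()

print(copy_is_independent)  # True
print(return_value)         # None
print(in_place)             # [1 1 2 3 4 5 6 9]
```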

Understanding np.argsort() for Index-Based Sorting

The np.argsort() function returns the indices that would sort an array. This is invaluable when you need to sort multiple related arrays based on one array’s values or track original positions.

# Basic argsort usage
values = np.array([30, 10, 40, 20])
indices = np.argsort(values)
print(f"Sorting indices: {indices}")
print(f"Sorted values: {values[indices]}")
# Sorting indices: [1 3 0 2]
# Sorted values: [10 20 30 40]

# Practical example: sorting parallel arrays
names = np.array(['Alice', 'Bob', 'Charlie', 'David'])
scores = np.array([85, 92, 78, 95])

# Sort names by scores
sorted_indices = np.argsort(scores)[::-1]  # Descending
print("Leaderboard:")
for idx in sorted_indices:
    print(f"{names[idx]}: {scores[idx]}")
# David: 95
# Bob: 92
# Alice: 85
# Charlie: 78
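
One detail of the [::-1] idiom is worth knowing when scores can tie: reversing an ascending order also reverses the relative order of tied entries. The sketch below uses made-up players and points to compare it with negating the keys, which assumes a signed numeric dtype (negating unsigned integers wraps around).

```python
import numpy as np

players = np.array(['Ann', 'Ben', 'Cal', 'Dee'])
points = np.array([90, 75, 90, 60])

# Reversing an ascending argsort also reverses the order of tied entries.
reversed_order = np.argsort(points, kind='stable')[::-1]

# Negating the keys and sorting stably keeps ties in input order.
stable_desc_order = np.argsort(-points, kind='stable')

print(players[reversed_order])     # ['Cal' 'Ann' 'Ben' 'Dee']
print(players[stable_desc_order])  # ['Ann' 'Cal' 'Ben' 'Dee']
```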

Multi-Dimensional Array Sorting

Sorting multi-dimensional arrays requires understanding axis parameters. By default, np.sort() sorts along the last axis.

# 2D array sorting
matrix = np.array([
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5]
])

# Sort along the last axis: each row is sorted independently
sorted_rows = np.sort(matrix, axis=1)
print("Sorted rows:")
print(sorted_rows)
# [[1 3 4]
#  [1 5 9]
#  [2 5 6]]

# Sort along the first axis: each column is sorted independently
sorted_cols = np.sort(matrix, axis=0)
print("\nSorted columns:")
print(sorted_cols)
# [[1 1 4]
#  [2 5 5]
#  [3 6 9]]

# Flatten and sort
sorted_flat = np.sort(matrix, axis=None)
print(f"\nFlattened sort: {sorted_flat}")
# [1 1 2 3 4 5 5 6 9]
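
np.argsort() accepts the same axis parameter, and np.take_along_axis() applies the resulting per-row indices to an array of the same shape. The sketch below uses a hypothetical labels array to show one row-wise ordering being reused on a second array.

```python
import numpy as np

grid = np.array([
    [3, 1, 4],
    [1, 5, 9],
    [2, 6, 5],
])

# Indices that sort each row independently.
row_order = np.argsort(grid, axis=1)

# take_along_axis applies those per-row indices along the same axis.
rows_sorted = np.take_along_axis(grid, row_order, axis=1)

# Reorder a second, same-shaped array using the first array's ordering.
labels = np.array([
    ['c', 'a', 'd'],
    ['a', 'e', 'i'],
    ['b', 'f', 'e'],
])
labels_sorted = np.take_along_axis(labels, row_order, axis=1)

print(rows_sorted)
print(labels_sorted)
```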

Sorting by Multiple Columns

For complex sorting scenarios involving multiple criteria, use structured arrays or np.lexsort().

# Method 1: Structured arrays
data = np.array([
    ('Alice', 25, 85000),
    ('Bob', 30, 75000),
    ('Charlie', 25, 90000),
    ('David', 30, 75000)
], dtype=[('name', 'U10'), ('age', 'i4'), ('salary', 'i4')])

# Sort by age, then by salary (both ascending; np.sort has no descending option)
sorted_data = np.sort(data, order=['age', 'salary'])
print("Sorted by age, then salary:")
for row in sorted_data:
    print(row)

# Method 2: np.lexsort() - the last key in the tuple is the primary sort key
ages = np.array([25, 30, 25, 30])
salaries = np.array([85000, 75000, 90000, 75000])
names = np.array(['Alice', 'Bob', 'Charlie', 'David'])

# Sort by age (primary), then salary (secondary)
indices = np.lexsort((salaries, ages))
print("\nLexsort result:")
for idx in indices:
    print(f"{names[idx]}: Age {ages[idx]}, Salary ${salaries[idx]}")
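
Both methods above sort every key in ascending order. One way to get a descending secondary key, sketched here with the same sample data, is to negate that key before passing it to np.lexsort(). This assumes a signed numeric key; it would not work for strings or unsigned integers.

```python
import numpy as np

ages = np.array([25, 30, 25, 30])
salaries = np.array([85000, 75000, 90000, 75000])
names = np.array(['Alice', 'Bob', 'Charlie', 'David'])

# Keys are listed from least to most significant: the last key is primary.
# Negating salaries makes the secondary key sort from high to low.
order = np.lexsort((-salaries, ages))

print(names[order])  # ['Charlie' 'Alice' 'Bob' 'David']
```

Bob and David tie on both keys, so they keep their input order.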

Sorting Algorithm Selection

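The kind parameter accepts 'quicksort' (default), 'mergesort', 'heapsort', and 'stable'. The choice affects performance and stability: only 'stable' and 'mergesort' guarantee that equal elements keep their relative order. In current NumPy releases those two names select the same stable implementation, which NumPy chooses based on the data type, and 'quicksort' is implemented as introsort.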

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Quicksort (default; usually the fastest choice, not stable)
quick_sorted = np.sort(arr, kind='quicksort')

# Mergesort (stable; an alias for kind='stable' in current NumPy)
merge_sorted = np.sort(arr, kind='mergesort')

# Heapsort (O(n log n) worst case, not stable)
heap_sorted = np.sort(arr, kind='heapsort')

# Stable sort (guaranteed stable; NumPy picks the algorithm per dtype)
stable_sorted = np.sort(arr, kind='stable')

Stability matters when sorting structured data:

# Demonstrate stability
records = np.array([
    (1, 'A'),
    (2, 'B'),
    (1, 'C'),
    (2, 'D')
], dtype=[('value', 'i4'), ('label', 'U1')])

# A stable sort preserves the relative order of equal elements
stable = np.sort(records, order='value', kind='stable')
print("Stable sort:")
print(stable)
# [(1, 'A') (1, 'C') (2, 'B') (2, 'D')]
# Note: 'A' still comes before 'C' for value=1. With `order`, fields that are
# not listed are still compared to break ties, so 'label' also puts 'A' first.
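
To observe stability without any other field breaking ties, sort on the key array alone and apply the resulting indices to the other columns. In this sketch the labels are deliberately out of alphabetical order, so the result can only reflect input order; the arrays are illustrative.

```python
import numpy as np

values = np.array([2, 1, 2, 1])
labels = np.array(['D', 'C', 'B', 'A'])

# Only `values` is compared, so ties can only be resolved by input order.
order = np.argsort(values, kind='stable')

print(values[order])  # [1 1 2 2]
print(labels[order])  # ['C' 'A' 'D' 'B']
```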

Performance Considerations

Two choices dominate sorting cost in practice: whether the sort makes a copy, and whether a full sort is needed at all:

import time

# Large array performance comparison
large_arr = np.random.randint(0, 1000000, size=1000000)

# Copy vs in-place (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
sorted_copy = np.sort(large_arr)
print(f"np.sort() time: {time.perf_counter() - start:.4f}s")

arr_copy = large_arr.copy()
start = time.perf_counter()
arr_copy.sort()
print(f"ndarray.sort() time: {time.perf_counter() - start:.4f}s")

# Partial sorting with partition (faster when you need k smallest)
arr = np.random.randint(0, 1000, size=10000)
k = 100

start = time.perf_counter()
smallest_100_sorted = np.sort(arr)[:k]
full_sort_time = time.perf_counter() - start

# np.partition puts the k smallest values first, but in arbitrary order
start = time.perf_counter()
smallest_100_partition = np.partition(arr, k)[:k]
partition_time = time.perf_counter() - start

print(f"\nFull sort for k smallest: {full_sort_time:.6f}s")
print(f"Partition for k smallest: {partition_time:.6f}s")
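
When the k smallest values are needed in sorted order, or their positions are needed, one possible approach is to combine np.argpartition() with a sort of just the k candidates. The sketch below assumes a fixed seed purely for reproducibility; the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
data = rng.integers(0, 1000, size=10_000)
k = 100

# Partition around position k - 1: the first k slots then hold the indices
# of the k smallest values, in no particular order.
candidates = np.argpartition(data, k - 1)[:k]

# Sort only those k candidates instead of the whole array.
smallest_k_indices = candidates[np.argsort(data[candidates])]
smallest_k = data[smallest_k_indices]

print(smallest_k[:5])
```

The values match a full sort followed by slicing, although the indices of tied values may differ.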

Advanced Sorting Patterns

A few argsort-based patterns cover most day-to-day needs: ordering rows by one column, selecting the top N values, and handling NaN:

# Sort rows by specific column
data = np.array([
    [3, 100, 'C'],
    [1, 200, 'A'],
    [2, 150, 'B']
], dtype=object)

# Sort by first column
sorted_by_col0 = data[data[:, 0].argsort()]
print("Sorted by column 0:")
print(sorted_by_col0)

# Reverse argsort for descending order
data_numeric = np.array([
    [3, 100],
    [1, 200],
    [2, 150]
])
sorted_desc = data_numeric[data_numeric[:, 1].argsort()[::-1]]
print("\nSorted by column 1 (descending):")
print(sorted_desc)

# Find the indices of the top N values
values = np.array([23, 45, 12, 67, 34, 89, 56])
top_3_indices = np.argsort(values)[-3:][::-1]
print(f"\nTop 3 values: {values[top_3_indices]}")
print(f"At indices: {top_3_indices}")

# Sorting with NaN handling
arr_with_nan = np.array([3.5, np.nan, 1.2, np.nan, 4.8, 2.1])
# NaNs go to the end
sorted_with_nan = np.sort(arr_with_nan)
print(f"\nSorted with NaN: {sorted_with_nan}")
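
Because NaNs are placed last in ascending order, reversing the whole result with [::-1] would move them to the front. A minimal sketch of one way to get a descending order that keeps NaNs at the end, using an illustrative readings array:

```python
import numpy as np

readings = np.array([3.5, np.nan, 1.2, np.nan, 4.8, 2.1])

ascending = np.sort(readings)  # NaNs are placed last
nan_count = np.count_nonzero(np.isnan(readings))
valid_count = readings.size - nan_count

# Reverse only the non-NaN part and leave the NaN tail where it is.
descending = np.concatenate((ascending[:valid_count][::-1],
                             ascending[valid_count:]))

print(descending)  # [4.8 3.5 2.1 1.2 nan nan]
```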

Sorting String and Custom Types

NumPy handles various data types including strings and datetime objects:

# String array sorting
names = np.array(['Zara', 'Alice', 'Bob', 'Charlie'])
sorted_names = np.sort(names)
print(f"Sorted names: {sorted_names}")
# ['Alice' 'Bob' 'Charlie' 'Zara']

# Case-insensitive sorting requires custom approach
names_mixed = np.array(['zara', 'Alice', 'bob', 'Charlie'])
indices = np.argsort([name.lower() for name in names_mixed])
case_insensitive_sorted = names_mixed[indices]
print(f"Case-insensitive: {case_insensitive_sorted}")

# DateTime sorting
dates = np.array(['2024-03-15', '2024-01-10', '2024-06-20'], dtype='datetime64')
sorted_dates = np.sort(dates)
print(f"Sorted dates: {sorted_dates}")
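
The case-insensitive example can also be written without the list comprehension by building the lowercase key array with np.char.lower(). This is a sketch of that alternative, with the default ordering shown for comparison.

```python
import numpy as np

names_mixed = np.array(['zara', 'Alice', 'bob', 'Charlie'])

# The default comparison is by code point, so uppercase letters sort first.
default_sorted = np.sort(names_mixed)

# np.char.lower returns a lowercase copy to use as the sort key.
order = np.argsort(np.char.lower(names_mixed), kind='stable')
case_insensitive = names_mixed[order]

print(default_sorted)    # ['Alice' 'Charlie' 'bob' 'zara']
print(case_insensitive)  # ['Alice' 'bob' 'Charlie' 'zara']
```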

NumPy’s sorting functions provide the foundation for data manipulation in scientific computing and data analysis pipelines. Choose np.sort() for non-destructive operations, ndarray.sort() for memory efficiency, and np.argsort() when you need to maintain relationships between multiple arrays. Always consider algorithm stability requirements and use appropriate axis parameters for multi-dimensional data.
