NumPy - np.argmin() and np.argmax()
Key Insights
• np.argmin() and np.argmax() return indices of minimum and maximum values, not the values themselves—critical for locating positions in arrays for further operations
• Both functions support axis-based operations on multidimensional arrays, enabling row-wise or column-wise index extraction without manual iteration
• Performance scales linearly, O(n), but the vectorized C implementation typically runs 10-100x faster than an equivalent Python loop, making these functions essential for large-scale data processing
Understanding Index Retrieval vs Value Retrieval
NumPy’s argmin() and argmax() functions locate positions of extreme values rather than returning the values themselves. This distinction matters when you need to perform subsequent operations at those locations.
import numpy as np
temperatures = np.array([22.5, 18.3, 25.7, 19.1, 23.4])
# Get index of minimum temperature
min_idx = np.argmin(temperatures)
print(f"Coldest reading at index: {min_idx}") # Output: 1
print(f"Temperature: {temperatures[min_idx]}°C") # Output: 18.3°C
# Get index of maximum temperature
max_idx = np.argmax(temperatures)
print(f"Hottest reading at index: {max_idx}") # Output: 2
print(f"Temperature: {temperatures[max_idx]}°C") # Output: 25.7°C
For simple value extraction, use np.min() or np.max(). Use argmin()/argmax() when you need the index for slicing, masking, or cross-referencing with other arrays.
# Practical use case: finding corresponding timestamp
timestamps = np.array(['09:00', '10:00', '11:00', '12:00', '13:00'])
peak_time = timestamps[max_idx]
print(f"Peak temperature occurred at: {peak_time}") # Output: 11:00
Working with Multidimensional Arrays
The axis parameter controls which dimension to search along. Understanding axis behavior prevents common indexing errors.
# 2D array: daily temperatures across 4 cities for 5 days
temps = np.array([
    [22, 25, 20, 23],  # Day 1
    [24, 26, 19, 22],  # Day 2
    [21, 24, 18, 21],  # Day 3
    [23, 27, 21, 24],  # Day 4
    [25, 28, 22, 25]   # Day 5
])
# axis=0: search down columns (across days for each city)
coldest_day_per_city = np.argmin(temps, axis=0)
print(f"Coldest day index for each city: {coldest_day_per_city}")
# Output: [2 2 2 2] - Day 3 (index 2) was coldest for all cities
# axis=1: search across rows (across cities for each day)
coldest_city_per_day = np.argmin(temps, axis=1)
print(f"Coldest city index for each day: {coldest_city_per_day}")
# Output: [2 2 2 2 2] - City 3 (index 2) was coldest every day
# axis=None (default): flatten and search entire array
overall_min_idx = np.argmin(temps)
print(f"Overall minimum at flat index: {overall_min_idx}") # Output: 10
print(f"Value: {temps.flat[overall_min_idx]}°C") # Output: 18°C
Convert flattened indices to multidimensional coordinates using np.unravel_index():
min_coords = np.unravel_index(overall_min_idx, temps.shape)
print(f"Minimum at day {min_coords[0]}, city {min_coords[1]}")
# Output: Minimum at day 2, city 2
print(f"Verification: {temps[min_coords]}°C") # Output: 18°C
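A common follow-up is pairing each axis-wise index with its value without a Python loop. One idiom is fancy indexing with an aligned np.arange, sketched here with the same temps array:

```python
import numpy as np

temps = np.array([
    [22, 25, 20, 23],  # Day 1
    [24, 26, 19, 22],  # Day 2
    [21, 24, 18, 21],  # Day 3
    [23, 27, 21, 24],  # Day 4
    [25, 28, 22, 25]   # Day 5
])

# Index of the coldest day for each city (one index per column)
idx = np.argmin(temps, axis=0)  # [2 2 2 2]

# Pair each column with its own minimum row via fancy indexing
vals = temps[idx, np.arange(temps.shape[1])]
print(vals)  # [21 24 18 21]
```

The same pairing generalizes to higher dimensions with np.take_along_axis.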
Handling Ties and Multiple Extrema
Both functions return the first occurrence when multiple elements share the extreme value. This deterministic behavior matters for reproducibility.
scores = np.array([85, 92, 78, 92, 88, 92])
max_idx = np.argmax(scores)
print(f"First maximum at index: {max_idx}") # Output: 1
# Find all occurrences of maximum value
max_value = scores[max_idx]
all_max_indices = np.where(scores == max_value)[0]
print(f"All maximum indices: {all_max_indices}") # Output: [1 3 5]
# Alternative: use argwhere for multidimensional arrays
all_max_coords = np.argwhere(scores == max_value)
print(f"Coordinates of all maxima: {all_max_coords.flatten()}")
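Since the scores example is one-dimensional, here is a small sketch (with a hypothetical grid array) of what argwhere returns for a 2-D input — one (row, col) pair per match:

```python
import numpy as np

grid = np.array([[3, 7, 7],
                 [7, 1, 5]])

# argwhere returns one coordinate row per matching element
max_coords = np.argwhere(grid == grid.max())
print(max_coords)
# [[0 1]
#  [0 2]
#  [1 0]]
```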
For finding top-k indices, combine with np.argsort() or np.argpartition():
# Top 3 scores
top_3_indices = np.argsort(scores)[-3:][::-1]
print(f"Top 3 indices: {top_3_indices}") # Output: [5 3 1] (tie order not guaranteed)
print(f"Top 3 scores: {scores[top_3_indices]}") # Output: [92 92 92]
# More efficient for large arrays: partial sort
top_3_partition = np.argpartition(scores, -3)[-3:]
print(f"Top 3 (unordered): {scores[top_3_partition]}")
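argpartition leaves the top-k slots unordered. When order matters but the array is large, a common pattern is to sort only those k indices by their values — sketched below with the scores array from above:

```python
import numpy as np

scores = np.array([85, 92, 78, 92, 88, 92])

# argpartition guarantees the 3 largest values occupy the last 3 slots,
# but in no particular order
top_3 = np.argpartition(scores, -3)[-3:]

# Sort just those 3 indices by value (descending) to recover order
top_3_sorted = top_3[np.argsort(scores[top_3])[::-1]]
print(scores[top_3_sorted])  # [92 92 92]
```

This keeps the expensive step at O(n) and sorts only k elements, instead of the O(n log n) full argsort.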
Performance Characteristics
Vectorized operations outperform Python loops significantly. Here’s a benchmark comparison:
import time
# Large dataset
data = np.random.randn(10_000_000)
# NumPy approach
start = time.perf_counter()
idx_np = np.argmin(data)
time_np = time.perf_counter() - start
# Pure Python approach
start = time.perf_counter()
idx_py = min(range(len(data)), key=lambda i: data[i])
time_py = time.perf_counter() - start
print(f"NumPy: {time_np:.4f}s")
print(f"Python: {time_py:.4f}s")
print(f"Speedup: {time_py/time_np:.1f}x")
# Typical output:
# NumPy: 0.0089s
# Python: 0.9234s
# Speedup: 103.7x
Memory efficiency matters for large multidimensional arrays: axis reductions return one index per reduced slice, not anything proportional to the full array:
# Memory-efficient axis operations
large_matrix = np.random.randn(10000, 5000)
# Returns array of 5000 indices (one per column)
min_indices = np.argmin(large_matrix, axis=0)
print(f"Memory for indices: {min_indices.nbytes / 1024:.2f} KB")
# Output: ~39 KB (int64 indices)
# Compare: storing the values instead takes the same space (float64 is also 8 bytes),
# so choose based on whether you need positions or values
min_values = np.min(large_matrix, axis=0)
print(f"Memory for values: {min_values.nbytes / 1024:.2f} KB")
# Output: ~39 KB (float64 values)
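When the indices from an axis reduction are used to pull values back out, the keepdims parameter (available for argmin/argmax since NumPy 1.22) pairs naturally with np.take_along_axis. A minimal sketch:

```python
import numpy as np

matrix = np.random.randn(100, 50)

# keepdims=True preserves the reduced axis with length 1,
# so the index array can be fed directly to take_along_axis
idx = np.argmin(matrix, axis=0, keepdims=True)    # shape (1, 50)
vals = np.take_along_axis(matrix, idx, axis=0)    # shape (1, 50)
print(np.array_equal(vals[0], matrix.min(axis=0)))  # True
```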
Real-World Applications
Portfolio optimization: Find best/worst performing assets.
# Daily returns for 5 stocks over 252 trading days
returns = np.random.randn(252, 5) * 0.02
# Cumulative returns
cumulative_returns = np.cumprod(1 + returns, axis=0) - 1
# Best performing stock
best_stock = np.argmax(cumulative_returns[-1, :])
print(f"Best stock: {best_stock}, Return: {cumulative_returns[-1, best_stock]:.2%}")
# Worst drawdown day for each stock
daily_max = np.maximum.accumulate(cumulative_returns, axis=0)
drawdowns = (cumulative_returns - daily_max) / (1 + daily_max)
worst_drawdown_day = np.argmin(drawdowns, axis=0)
for stock in range(5):
    day = worst_drawdown_day[stock]
    dd = drawdowns[day, stock]
    print(f"Stock {stock} worst drawdown: {dd:.2%} on day {day}")
Image processing: Locate brightest/darkest pixels.
# Grayscale image simulation
image = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
# Find brightest pixel
brightest_idx = np.argmax(image)
brightest_coords = np.unravel_index(brightest_idx, image.shape)
print(f"Brightest pixel at row {brightest_coords[0]}, col {brightest_coords[1]}")
# Find darkest pixel in each row (useful for shadow detection)
darkest_per_row = np.argmin(image, axis=1)
print(f"Darkest column per row (first 10): {darkest_per_row[:10]}")
Time series analysis: Detect anomalies and extrema.
# Sensor readings with anomaly
sensor_data = np.random.randn(1000) * 10 + 50
sensor_data[456] = 150 # Anomaly
# Rolling window maximum detection
window_size = 50
rolling_max_indices = np.array([
    np.argmax(sensor_data[i:i+window_size]) + i
    for i in range(len(sensor_data) - window_size)
])
# Detect persistent maxima (potential anomalies)
unique, counts = np.unique(rolling_max_indices, return_counts=True)
anomalies = unique[counts > window_size // 2]
print(f"Potential anomaly indices: {anomalies}") # Should include 456
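The Python-level list comprehension above can be replaced with a fully vectorized version using np.lib.stride_tricks.sliding_window_view (NumPy 1.20+); note this sketch produces one extra trailing window compared to the loop:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

sensor_data = np.random.randn(1000) * 10 + 50
sensor_data[456] = 150  # anomaly

window_size = 50
# Zero-copy view: each row is one rolling window
windows = sliding_window_view(sensor_data, window_size)

# Per-window argmax, shifted back to global indices
rolling_max_indices = np.argmax(windows, axis=1) + np.arange(len(windows))

unique, counts = np.unique(rolling_max_indices, return_counts=True)
anomalies = unique[counts > window_size // 2]
print(456 in anomalies)  # True
```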
Edge Cases and Gotchas
Empty arrays raise ValueError:
try:
    np.argmin(np.array([]))
except ValueError as e:
    print(f"Error: {e}") # Output: Error: attempt to get argmin of an empty sequence
NaN handling requires explicit treatment:
data_with_nan = np.array([1.5, np.nan, 2.3, 0.8, np.nan])
# Default behavior: NaN propagates
print(np.argmin(data_with_nan)) # Output: 1 (first NaN)
# Ignore NaN values
masked = np.ma.masked_array(data_with_nan, np.isnan(data_with_nan))
print(np.argmin(masked)) # Output: 3 (0.8 is actual minimum)
# Alternative: use nanargmin
print(np.nanargmin(data_with_nan)) # Output: 3
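One further gotcha: np.nanargmin and np.nanargmax raise ValueError when a slice contains only NaNs, so all-NaN inputs (or all-NaN rows in axis-based calls) need guarding:

```python
import numpy as np

all_nan = np.array([np.nan, np.nan])
try:
    np.nanargmin(all_nan)
except ValueError as e:
    print(f"Error: {e}")  # All-NaN slice encountered
```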
These functions provide the foundation for efficient array manipulation in scientific computing, data analysis, and machine learning pipelines. Their O(n) complexity and vectorized implementation make them indispensable for production systems processing millions of data points.