NumPy - Indexing Multi-Dimensional Arrays
Key Insights
- NumPy’s advanced indexing, also called fancy indexing, accepts integer arrays and boolean masks to select arbitrary elements from multi-dimensional arrays, enabling extraction patterns that basic slicing cannot express
- Understanding the difference between views (basic slicing) and copies (fancy indexing) is critical for memory efficiency and avoiding unintended side effects when modifying array subsets
- Broadcasting rules apply to index arrays, so index arrays of different shapes can be combined to select elements along several axes at once
Basic Multi-Dimensional Indexing
NumPy arrays support indexing along each dimension using comma-separated indices. Each index corresponds to an axis, starting from axis 0.
import numpy as np
# Create a 3D array
arr = np.arange(24).reshape(2, 3, 4)
print(arr)
# [[[ 0  1  2  3]
#   [ 4  5  6  7]
#   [ 8  9 10 11]]
#
#  [[12 13 14 15]
#   [16 17 18 19]
#   [20 21 22 23]]]
# Access single element
element = arr[1, 2, 3] # 23
# Access along specific axes
row = arr[0, 1] # [4 5 6 7]
column = arr[0, :, 2] # [2 6 10]
Omitting trailing indices selects all elements along the remaining axes. The colon : does the same explicitly.
# These are equivalent
slice1 = arr[0]
slice2 = arr[0, :]
slice3 = arr[0, :, :]
# Access last element using negative indexing
last = arr[-1, -1, -1] # 23
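An integer index removes the axis it selects, while a slice keeps that axis (possibly with length 1). The short check below, using the same array shape as above, compares the resulting shapes and shows that chained indexing reaches the same element as a single tuple index.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

by_int = arr[0]      # integer index: axis 0 is dropped
by_slice = arr[0:1]  # length-1 slice: axis 0 is kept

print(by_int.shape)    # (3, 4)
print(by_slice.shape)  # (1, 3, 4)

# Chained indexing gives the same element, but builds an intermediate view at each step
print(arr[1][2][3] == arr[1, 2, 3])  # True
```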
Slicing Multi-Dimensional Arrays
Slicing uses the start:stop:step syntax for each dimension. Basic slicing returns views, not copies, making it memory-efficient.
arr = np.arange(24).reshape(4, 6)
# Slice rows and columns
subset = arr[1:3, 2:5]
# [[8 9 10]
# [14 15 16]]
# Every other row, specific columns
subset = arr[::2, 1:4]
# [[1 2 3]
# [13 14 15]]
# Reverse rows
reversed_arr = arr[::-1, :]
# Slice with negative indices
subset = arr[-3:-1, -4:-1]
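Each start:stop:step inside the brackets is shorthand for a built-in slice object, and a comma-separated index is a tuple. Building that tuple explicitly is handy when the axis to slice is only known at runtime. The make_index function below is a hypothetical helper for illustration, not a NumPy API.

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)

# arr[1:3, 2:5] is sugar for indexing with a tuple of slice objects
idx = (slice(1, 3), slice(2, 5))
print(np.array_equal(arr[idx], arr[1:3, 2:5]))  # True

def make_index(ndim, axis, start, stop):
    """Hypothetical helper: slice one axis, take everything on the others."""
    index = [slice(None)] * ndim  # slice(None) is equivalent to ':'
    index[axis] = slice(start, stop)
    return tuple(index)

cols = arr[make_index(arr.ndim, axis=1, start=2, stop=5)]
print(cols.shape)  # (4, 3)
```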
Views share memory with the original array. Modifying a view modifies the original:
arr = np.arange(12).reshape(3, 4)
view = arr[1:, 2:]
view[0, 0] = 999
print(arr)
# [[ 0 1 2 3]
# [ 4 5 999 7]
# [ 8 9 10 11]]
Use .copy() to create independent arrays:
independent = arr[1:, 2:].copy()
independent[0, 0] = 0 # Original arr unchanged
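To check whether two arrays share data, np.shares_memory reports whether their memory overlaps, and the flags.owndata attribute shows whether an array owns its buffer. A minimal check of the view and copy from the examples above:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
view = arr[1:, 2:]
independent = arr[1:, 2:].copy()

# shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False

# A view does not own its data; a copy does
print(view.flags.owndata)         # False
print(independent.flags.owndata)  # True
```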
Integer Array Indexing
Integer array indexing (fancy indexing) selects arbitrary elements using arrays of indices. This creates copies, not views.
arr = np.arange(20).reshape(4, 5)
# Select specific rows
rows = arr[[0, 2, 3]]
# [[ 0 1 2 3 4]
# [10 11 12 13 14]
# [15 16 17 18 19]]
# Select specific elements from each row
row_indices = np.array([0, 1, 2, 3])
col_indices = np.array([0, 2, 4, 1])
elements = arr[row_indices, col_indices]
# [0 7 14 16]
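Because fancy indexing returns a copy, modifying the result leaves the original untouched. On the left side of an assignment, however, the same index writes into the original array in place. One caveat with repeated indices: arr[idx] += 1 applies each position only once, while np.add.at accumulates every occurrence. A short sketch:

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

picked = arr[[0, 2]]   # copy of rows 0 and 2
picked[:] = -1         # changes only the copy
print(arr[0, 0])       # 0

arr[[0, 2]] = -1       # assignment through the index writes into arr
print(arr[0])          # [-1 -1 -1 -1 -1]
print(arr[1])          # [5 6 7 8 9]

# With repeated indices, += applies each index once; np.add.at accumulates
counts = np.zeros(3, dtype=int)
idx = np.array([0, 0, 2])
counts[idx] += 1
print(counts)          # [1 0 1]
np.add.at(counts, idx, 1)
print(counts)          # [3 0 2]
```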
When using multiple index arrays, they must be broadcastable to the same shape. The result shape matches the broadcasted index array shape:
arr = np.arange(24).reshape(4, 6)
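# (2, 2) index arrays give a (2, 2) result; indices are paired element-wise:
# arr[0, 1], arr[1, 2], arr[2, 3], arr[3, 4]
rows = np.array([[0, 1], [2, 3]])
cols = np.array([[1, 2], [3, 4]])
paired = arr[rows, cols]
# [[ 1  8]
#  [15 22]]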
# Broadcasting: single row index with multiple columns
row = np.array([1])
cols = np.array([0, 2, 4])
result = arr[row, cols] # [6 8 10]
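The same broadcasting rule produces rectangular selections: a column of row indices with shape (n, 1) paired with column indices of shape (m,) broadcasts to (n, m), so every listed row meets every listed column. The sketch below checks the result against an explicit loop.

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)
rows = np.array([0, 2, 3])
cols = np.array([1, 5])

# (3, 1) and (2,) broadcast to (3, 2): every row paired with every column
grid = arr[rows[:, np.newaxis], cols]
print(grid.shape)  # (3, 2)

# Same result built element by element
expected = np.array([[arr[r, c] for c in cols] for r in rows])
print(np.array_equal(grid, expected))  # True
print(grid)
# [[ 1  5]
#  [13 17]
#  [19 23]]
```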
Boolean Indexing
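Boolean masks filter arrays based on conditions. A mask's shape must match the dimensions it indexes: either the full array shape, or the length of a single axis when the mask is placed at that axis.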
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Create boolean mask
mask = arr > 5
# [[False False False]
# [False False True]
# [ True True True]]
# Extract elements
filtered = arr[mask] # [6 7 8 9]
# Modify elements matching condition
arr[arr % 2 == 0] = 0
# [[1 0 3]
# [0 5 0]
# [7 0 9]]
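A full-shape boolean mask returns a 1-D array in row-major order, selecting the same elements as indexing with mask.nonzero(), which returns the coordinates of the True entries. The example recovers both the values and their positions:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = arr > 5

rows, cols = mask.nonzero()  # coordinates of True entries
print(rows)                  # [1 2 2 2]
print(cols)                  # [2 0 1 2]
print(np.array_equal(arr[mask], arr[rows, cols]))  # True
```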
Combine multiple conditions with the element-wise operators & (and), | (or), and ~ (not), wrapping each comparison in parentheses:
arr = np.arange(20).reshape(4, 5)
# Multiple conditions
mask = (arr > 5) & (arr < 15)
result = arr[mask]
# Complex filtering
arr[(arr % 3 == 0) | (arr % 5 == 0)] = -1
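The parentheses around each comparison in these examples are required: & and | bind more tightly than > and <, so arr > 5 & arr < 15 is parsed as the chained comparison arr > (5 & arr) < 15, which raises a ValueError. np.logical_and is an explicit alternative. A small demonstration:

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

raised = False
try:
    arr[arr > 5 & arr < 15]  # parsed as arr > (5 & arr) < 15
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Correct: parenthesize each comparison, or call the ufunc directly
a = arr[(arr > 5) & (arr < 15)]
b = arr[np.logical_and(arr > 5, arr < 15)]
print(np.array_equal(a, b))  # True
print(a)                     # [ 6  7  8  9 10 11 12 13 14]
```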
Boolean indexing along specific axes:
arr = np.random.randn(5, 4)
# Select rows where first column > 0
row_mask = arr[:, 0] > 0
selected_rows = arr[row_mask]
# Select columns where mean > 0
col_mask = arr.mean(axis=0) > 0
selected_cols = arr[:, col_mask]
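Row masks can also summarize entire rows with .all(axis=1) or .any(axis=1). The sketch uses a fixed array instead of random data so the output is reproducible.

```python
import numpy as np

arr = np.array([[ 1.0,  2.0, -1.0],
                [ 3.0,  0.5,  2.0],
                [-2.0, -1.0, -3.0]])

# Keep rows whose values are all positive
all_pos = arr[(arr > 0).all(axis=1)]
# Drop rows where every value is negative
not_all_neg = arr[(arr >= 0).any(axis=1)]

print(all_pos.shape)      # (1, 3)
print(not_all_neg.shape)  # (2, 3)
```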
Combining Indexing Methods
Mix slicing, integer indexing, and boolean indexing for complex selections:
arr = np.arange(60).reshape(3, 4, 5)
# Slice first dimension, index second, slice third
result = arr[0:2, [1, 3], 2:4]  # Shape: (2, 2, 2)
# Boolean mask on one axis, slice others
mask = np.array([True, False, True, False])
result = arr[:, mask, :]  # Shape: (3, 2, 5)
# Integer array with slicing
rows = np.array([0, 2])
result = arr[rows, :, 1:4]  # Shape: (2, 4, 3)
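Where the advanced indices sit matters. When they are adjacent, their broadcast dimensions replace the indexed axes in place; when a slice separates them, NumPy moves the broadcast dimensions to the front of the result. The shapes below show the difference for the same (3, 4, 5) array:

```python
import numpy as np

arr = np.arange(60).reshape(3, 4, 5)

# Adjacent advanced indices: broadcast shape (2,) replaces axes 1 and 2
adjacent = arr[:, [0, 1], [0, 2]]
print(adjacent.shape)   # (3, 2)

# Advanced indices separated by a slice: broadcast shape (2,) goes first
separated = arr[[0, 1], :, [0, 2]]
print(separated.shape)  # (2, 4)

# separated[k] is arr[i_k, :, j_k] for the k-th index pair
print(np.array_equal(separated[1], arr[1, :, 2]))  # True
```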
Using np.ix_ for Cartesian product indexing:
arr = np.arange(20).reshape(4, 5)
# Select rows 0,2 and columns 1,3,4
rows = np.array([0, 2])
cols = np.array([1, 3, 4])
result = arr[np.ix_(rows, cols)]
# [[ 1 3 4]
# [11 13 14]]
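np.ix_ builds open-mesh index arrays: for two inputs it returns arrays of shape (n, 1) and (1, m), which broadcast to the full grid, so it is equivalent to adding a new axis to the row indices by hand. It also accepts boolean masks, treating each as the positions of its True values. A sketch confirming both points:

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)
rows = np.array([0, 2])
cols = np.array([1, 3, 4])

r, c = np.ix_(rows, cols)
print(r.shape, c.shape)  # (2, 1) (1, 3)

# Equivalent manual broadcasting
print(np.array_equal(arr[r, c], arr[rows[:, np.newaxis], cols]))  # True

# A boolean mask is converted to the positions of its True values
row_mask = np.array([True, False, True, False])
print(np.array_equal(arr[np.ix_(row_mask, cols)], arr[r, c]))  # True
```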
Ellipsis and newaxis
The ellipsis ... represents all dimensions not explicitly indexed:
arr = np.arange(120).reshape(2, 3, 4, 5)
# These are equivalent
result1 = arr[0, :, :, 2]
result2 = arr[0, ..., 2]
# First element along the first and last axes; the ellipsis keeps axes 1 and 2
result = arr[0, ..., 0]  # Shape: (3, 4)
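Because ... expands to however many axes are needed, it lets one function handle arrays of any dimensionality. The name last_along_final_axis is a hypothetical helper used only for illustration. Note that on a 1-D input the ellipsis makes the result a 0-d array rather than a Python scalar.

```python
import numpy as np

def last_along_final_axis(x):
    """Hypothetical helper: last entry of the final axis, for any ndim."""
    return x[..., -1]

print(last_along_final_axis(np.arange(5)))                          # 4 (a 0-d array)
print(last_along_final_axis(np.arange(6).reshape(2, 3)))            # [2 5]
print(last_along_final_axis(np.arange(24).reshape(2, 3, 4)).shape)  # (2, 3)
```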
np.newaxis inserts a new axis of length 1 at the position where it appears:
arr = np.arange(12).reshape(3, 4)
# Add dimension at beginning
expanded = arr[np.newaxis, :, :] # Shape: (1, 3, 4)
# Add dimension in middle
expanded = arr[:, np.newaxis, :] # Shape: (3, 1, 4)
# Useful for broadcasting
vec = np.array([1, 2, 3])
result = arr + vec[:, np.newaxis]  # Shape (3, 1) broadcasts across columns: adds 1 to row 0, 2 to row 1, 3 to row 2
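np.newaxis is simply an alias for None, and np.expand_dims performs the same insertion by axis number. Pairing a column vector with a row vector is a common use: broadcasting then forms all pairwise products, matching np.outer.

```python
import numpy as np

print(np.newaxis is None)  # True

arr = np.arange(12).reshape(3, 4)
print(np.array_equal(arr[:, np.newaxis, :], np.expand_dims(arr, axis=1)))  # True

a = np.array([1, 2, 3])
b = np.array([10, 20])
outer = a[:, np.newaxis] * b[np.newaxis, :]  # (3, 1) * (1, 2) -> (3, 2)
print(outer)
# [[10 20]
#  [20 40]
#  [30 60]]
print(np.array_equal(outer, np.outer(a, b)))  # True
```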
Advanced Selection Patterns
Selecting diagonal elements:
arr = np.arange(16).reshape(4, 4)
# Main diagonal
diag = np.diag(arr) # [0 5 10 15]
# Custom diagonal using indices
rows = np.arange(4)
cols = np.arange(4)
diagonal = arr[rows, cols]
# Anti-diagonal
anti_diag = arr[rows, rows[::-1]] # [3 6 9 12]
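np.diag is convenient for reading a diagonal. To write one, assign through the same integer index arrays, or use np.fill_diagonal for the main diagonal. A sketch that modifies both diagonals of the 4x4 array in place:

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)
rows = np.arange(4)

# Assignment through integer index arrays writes into arr in place
arr[rows, rows[::-1]] = 0  # zero the anti-diagonal
np.fill_diagonal(arr, -1)  # set the main diagonal

print(arr)
# [[-1  1  2  0]
#  [ 4 -1  0  7]
#  [ 8  0 -1 11]
#  [ 0 13 14 -1]]
```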
Selecting blocks:
arr = np.arange(64).reshape(8, 8)
# Select the top-left 2x2 block with broadcast index arrays
block_rows = np.array([0, 1])
block_cols = np.array([0, 1])
top_left = arr[block_rows[:, None], block_cols]  # (2, 1) and (2,) broadcast to (2, 2)
# [[0 1]
#  [8 9]]
# Simpler: slicing needs no index arrays and returns a view
def get_block(arr, start_row, start_col, height, width):
    return arr[start_row:start_row+height, start_col:start_col+width]
block = get_block(arr, 2, 3, 3, 3)
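To split an array into all of its non-overlapping blocks at once, a reshape followed by swapaxes gives a 4-D array indexed by block position. This sketch assumes the block size divides each dimension evenly; bh and bw are illustrative names for the block height and width.

```python
import numpy as np

arr = np.arange(64).reshape(8, 8)
bh, bw = 2, 2  # block height and width; assumed to divide 8 evenly

# (8, 8) -> (4, 2, 4, 2) -> (4, 4, 2, 2): blocks[i, j] is one 2x2 tile
blocks = arr.reshape(8 // bh, bh, 8 // bw, bw).swapaxes(1, 2)
print(blocks.shape)  # (4, 4, 2, 2)

# blocks[i, j] equals the slice arr[2i:2i+2, 2j:2j+2]
print(np.array_equal(blocks[1, 2], arr[2:4, 4:6]))  # True
print(blocks[1, 2])
# [[20 21]
#  [28 29]]
```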
Performance Considerations
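Basic slicing is faster than fancy indexing because it returns a view and copies no data, whereas fancy indexing allocates a new array and copies every selected element. This comparison takes the same 100x100 region both ways:
import time
import numpy as np
arr = np.random.randn(1000, 1000)
row_idx = np.arange(100, 200)  # Built once so the loop times only the indexing
# Fast: view operation
start = time.perf_counter()
for _ in range(1000):
    view = arr[100:200, 200:300]
print(f"Slicing: {time.perf_counter() - start:.4f}s")
# Slower: copy operation
start = time.perf_counter()
for _ in range(1000):
    sub = arr[row_idx, 200:300]
print(f"Fancy indexing: {time.perf_counter() - start:.4f}s")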
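Boolean indexing itself is efficient; the cost lies in building the mask, since each comparison and each & allocates a temporary boolean array. np.where and np.extract evaluate the same condition, so they do not avoid those temporaries. Build the mask once and reuse it:
arr = np.arange(-10, 30).reshape(4, 10)
# Build the combined mask once
mask = (arr > 0) & (arr < 10) & (arr % 2 == 0)
result = arr[mask]  # [2 4 6 8]
# np.where returns the coordinates of True entries; useful when you need positions
indices = np.where(mask)
result = arr[indices]  # Same values, via integer indexing
# np.extract(condition, arr) returns the same 1-D result as arr[mask]
result = np.extract(mask, arr)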
Understanding these indexing patterns enables efficient data manipulation in scientific computing, image processing, and machine learning pipelines where selecting specific array regions is fundamental to algorithm implementation.