How to Index Arrays in NumPy
Key Insights
- NumPy offers three indexing paradigms (basic slicing, fancy indexing, and boolean masking), each with its own performance characteristics and use cases, so choose among them deliberately
- Basic indexing returns views that share memory with the original array, while fancy and boolean indexing always return copies; understanding this distinction prevents subtle bugs and memory issues
- Combining indexing methods unlocks powerful data selection patterns, but mixing slices with fancy indexing requires careful attention to avoid unexpected results
Introduction to NumPy Array Indexing
NumPy array indexing goes far beyond what Python lists offer. While Python lists give you basic slicing, NumPy provides a rich vocabulary for selecting, filtering, and reshaping data with minimal code and maximum performance.
Efficient indexing matters because data manipulation is the foundation of scientific computing, machine learning, and data analysis. Poor indexing choices lead to unnecessary memory copies, slow operations, and code that’s harder to maintain. Master NumPy indexing, and you’ll write cleaner, faster code.
Let’s work through the indexing system from fundamentals to advanced techniques.
Basic Indexing: Single Elements and Slices
NumPy uses zero-based indexing like Python, but extends it to multiple dimensions. The fundamental syntax follows array[start:stop:step] for slices, with negative indices counting from the end.
import numpy as np
# Create a 1D array
arr = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
# Single element access
print(arr[0]) # 10 (first element)
print(arr[-1]) # 90 (last element)
print(arr[4]) # 50 (fifth element)
# Slice notation: start:stop:step
print(arr[2:6]) # [30 40 50 60] (indices 2-5)
print(arr[::2]) # [10 30 50 70 90] (every second element)
print(arr[1::2]) # [20 40 60 80] (odd indices)
print(arr[::-1]) # [90 80 70 60 50 40 30 20 10] (reversed)
For multidimensional arrays, separate each dimension’s index with commas:
# Create a 2D array (3 rows, 4 columns)
matrix = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]
])
# Single element: [row, column]
print(matrix[0, 0]) # 1 (top-left)
print(matrix[2, 3]) # 12 (bottom-right)
print(matrix[-1, -1]) # 12 (same as above)
# Row and column slicing
print(matrix[0, :]) # [1 2 3 4] (first row)
print(matrix[:, 0]) # [1 5 9] (first column)
print(matrix[1:, :2]) # [[5 6] [9 10]] (rows 1-2, columns 0-1)
# Step-based slicing
print(matrix[::2, ::2]) # [[1 3] [9 11]] (every other row and column)
For 3D and higher arrays, the pattern continues. Each dimension gets its own slice specification:
# 3D array: 2 "pages" of 3x4 matrices
cube = np.arange(24).reshape(2, 3, 4)
print(cube[0, :, :]) # First page (3x4 matrix)
print(cube[:, 1, :]) # Second row from each page
print(cube[:, :, -1]) # Last column from each page
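Writing out every colon gets tedious as the dimension count grows. As a small sketch using the same `cube` shape, NumPy also accepts `...` (Ellipsis), which stands for as many full slices as are needed to cover the remaining dimensions:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

# `...` expands to however many ':' are needed for the remaining dimensions
last_col = cube[..., -1]     # same selection as cube[:, :, -1]
first_page = cube[0, ...]    # same selection as cube[0, :, :]

print(last_col.shape)    # (2, 3)
print(first_page.shape)  # (3, 4)
print(np.array_equal(last_col, cube[:, :, -1]))  # True
```

Only one `...` is allowed per index expression, since NumPy could not otherwise tell how many dimensions each one should absorb.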
Fancy Indexing with Integer Arrays
Fancy indexing uses arrays of integers to select arbitrary elements. This lets you grab non-contiguous elements, reorder arrays, and perform complex selections that slicing can’t achieve.
arr = np.array([10, 20, 30, 40, 50, 60, 70])
# Select specific indices
indices = [0, 2, 5]
print(arr[indices]) # [10 30 60]
# Reorder elements
order = [6, 4, 2, 0]
print(arr[order]) # [70 50 30 10]
# Duplicate selections are allowed
repeats = [0, 0, 1, 1, 2, 2]
print(arr[repeats]) # [10 10 20 20 30 30]
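One detail worth knowing: the result of fancy indexing takes the shape of the index array, not the shape of the source. The following sketch (the `idx` and `picked` names are only illustrative) indexes a 1D array with a 2D index array:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70])

# A 2x2 array of indices produces a 2x2 result
idx = np.array([[0, 1],
                [5, 6]])
picked = arr[idx]

print(picked.shape)  # (2, 2)
print(picked)
# [[10 20]
#  [60 70]]
```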
With 2D arrays, you can select specific rows or columns:
matrix = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]
])
# Select specific rows
row_indices = [0, 2, 3]
print(matrix[row_indices])
# [[ 1 2 3 4]
# [ 9 10 11 12]
# [13 14 15 16]]
# Select specific columns
col_indices = [1, 3]
print(matrix[:, col_indices])
# [[ 2 4]
# [ 6 8]
# [10 12]
# [14 16]]
# Select individual elements with paired indices
rows = [0, 1, 2]
cols = [0, 1, 2]
print(matrix[rows, cols]) # [1 6 11] (diagonal elements)
Boolean Indexing and Masking
Boolean indexing filters arrays based on conditions. You create a boolean array (mask) and use it to select elements where the condition is True.
data = np.array([15, 22, 8, 45, 12, 67, 3, 89, 34])
# Simple condition
mask = data > 20
print(mask) # [False True False True False True False True True]
print(data[mask]) # [22 45 67 89 34]
# Direct condition in brackets (most common usage)
print(data[data > 20]) # [22 45 67 89 34]
print(data[data % 2 == 0]) # [22 8 12 34] (even numbers)
Combine conditions with & (and), | (or), and ~ (not). Parentheses are required due to operator precedence:
# Multiple conditions
print(data[(data > 10) & (data < 50)]) # [15 22 45 12 34]
print(data[(data < 10) | (data > 60)]) # [8 67 3 89]
print(data[~(data > 20)]) # [15 8 12 3] (NOT greater than 20)
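To see why the parentheses matter, here is a minimal sketch of what happens without them. Because `&` binds more tightly than the comparison operators, Python reads the expression as a chained comparison and ends up asking for the truth value of a whole array. `np.logical_and` is shown as a spelled-out alternative:

```python
import numpy as np

data = np.array([15, 22, 8, 45, 12, 67, 3, 89, 34])

# Parsed as: data > (10 & data) < 50, a chained comparison that needs
# bool(array) and therefore raises ValueError
try:
    data[data > 10 & data < 50]
    failed = False
except ValueError:
    failed = True
print(failed)  # True

# Function form of the same combined condition; no precedence concerns
in_range = data[np.logical_and(data > 10, data < 50)]
print(in_range)  # [15 22 45 12 34]
```

Python's `and`, `or`, and `not` keywords fail for the same reason, so stick to `&`, `|`, `~` or the `np.logical_*` functions.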
np.where() provides conditional selection with replacement values:
# np.where(condition, value_if_true, value_if_false)
result = np.where(data > 20, data, 0)
print(result) # [ 0 22 0 45 0 67 0 89 34]
# Get indices where condition is true
indices = np.where(data > 50)
print(indices) # (array([5, 7]),)
print(data[indices]) # [67 89]
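The tuple returned by the single-argument form has one index array per dimension. A short sketch with a 2D array (the `grid` name is illustrative) shows how the pieces line up:

```python
import numpy as np

grid = np.array([
    [1, 5, 9],
    [2, 6, 10],
    [3, 7, 11]
])

# One array of row indices and one of column indices
rows, cols = np.where(grid > 8)
print(rows)  # [0 1 2]
print(cols)  # [2 2 2]

# The pair can be fed straight back in as a fancy index
print(grid[rows, cols])  # [ 9 10 11]
```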
Boolean indexing works naturally with 2D arrays:
matrix = np.array([
[1, 5, 9],
[2, 6, 10],
[3, 7, 11]
])
# Flatten selection (returns 1D array)
print(matrix[matrix > 5]) # [ 9 6 10 7 11] (row-major order)
# Modify elements matching condition
matrix[matrix % 2 == 0] = 0
print(matrix)
# [[ 1 5 9]
# [ 0 0 0]
# [ 3 7 11]]
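A mask does not have to match the full shape of the array. As a sketch, a 1D mask with one entry per row (or per column) keeps whole rows or columns and preserves the 2D structure; the `row_mask` and `col_mask` names are illustrative:

```python
import numpy as np

matrix = np.array([
    [1, 5, 9],
    [2, 6, 10],
    [3, 7, 11]
])

# Keep rows whose first element is greater than 1
row_mask = matrix[:, 0] > 1          # [False  True  True]
print(matrix[row_mask])
# [[ 2  6 10]
#  [ 3  7 11]]

# Keep columns whose sum exceeds 10 (column sums are 6, 18, 30)
col_mask = matrix.sum(axis=0) > 10   # [False  True  True]
print(matrix[:, col_mask])
# [[ 5  9]
#  [ 6 10]
#  [ 7 11]]
```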
Advanced Techniques: Combining Methods
Real-world data manipulation often requires combining indexing methods. Here’s where things get powerful—and where you need to pay attention.
np.ix_() creates an open mesh for cross-product indexing:
matrix = np.arange(16).reshape(4, 4)
print(matrix)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]
# [12 13 14 15]]
# Select rows 0, 2 and columns 1, 3 (subgrid)
rows = [0, 2]
cols = [1, 3]
# Without np.ix_(): selects elements at (0,1), (2,3)
print(matrix[rows, cols]) # [1 11]
# With np.ix_(): selects the 2x2 subgrid
print(matrix[np.ix_(rows, cols)])
# [[ 1 3]
# [ 9 11]]
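Under the hood, `np.ix_()` just reshapes the index arrays so that they broadcast against each other. The sketch below inspects those shapes and builds the same subgrid by hand with `reshape`:

```python
import numpy as np

matrix = np.arange(16).reshape(4, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# np.ix_ returns a column-shaped and a row-shaped index array
row_idx, col_idx = np.ix_(rows, cols)
print(row_idx.shape, col_idx.shape)  # (2, 1) (1, 2)

# Doing the reshape manually gives the same 2x2 subgrid
manual = matrix[rows.reshape(-1, 1), cols]
print(np.array_equal(manual, matrix[np.ix_(rows, cols)]))  # True
```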
np.newaxis adds dimensions for broadcasting:
arr = np.array([1, 2, 3, 4])
# Add dimension: (4,) -> (4, 1)
col_vector = arr[:, np.newaxis]
print(col_vector.shape) # (4, 1)
# Add dimension: (4,) -> (1, 4)
row_vector = arr[np.newaxis, :]
print(row_vector.shape) # (1, 4)
# Useful for broadcasting operations
outer_product = col_vector * row_vector
print(outer_product)
# [[ 1 2 3 4]
# [ 2 4 6 8]
# [ 3 6 9 12]
# [ 4 8 12 16]]
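Mixing slices with fancy indexing is where result shapes can surprise you. In this sketch, an integer and a list are separated by a slice; NumPy treats both as advanced indices and moves the dimension they produce to the front of the result:

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

# Integer and list indices separated by a slice:
# the fancy-index dimension comes first in the result
mixed = cube[0, :, [0, 1]]
print(mixed.shape)  # (2, 3), not (3, 2)

# Indexing in two steps keeps the layout most people expect
stepwise = cube[0][:, [0, 1]]
print(stepwise.shape)  # (3, 2)

# Same values, transposed layout
print(np.array_equal(mixed, stepwise.T))  # True
```

Two-step indexing is fine for reading like this. When in doubt, print `.shape` before building further logic on the result.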
Views vs. Copies: Memory Considerations
This is where many developers get burned. Basic indexing returns a view—a window into the original array’s memory. Fancy and boolean indexing return copies—independent arrays with their own memory.
original = np.array([1, 2, 3, 4, 5])
# Basic slicing returns a VIEW
view = original[1:4]
view[0] = 99
print(original) # [ 1 99 3 4 5] - original changed!
# Fancy indexing returns a COPY
original = np.array([1, 2, 3, 4, 5])
copy = original[[1, 2, 3]]
copy[0] = 99
print(original) # [1 2 3 4 5] - original unchanged
# Boolean indexing returns a COPY
original = np.array([1, 2, 3, 4, 5])
copy = original[original > 2]
copy[0] = 99
print(original) # [1 2 3 4 5] - original unchanged
When you need an explicit copy from basic indexing, use .copy():
original = np.array([1, 2, 3, 4, 5])
safe_copy = original[1:4].copy()
safe_copy[0] = 99
print(original) # [1 2 3 4 5] - original unchanged
Check if an array shares memory with another:
print(np.shares_memory(original, original[1:4])) # True (view)
print(np.shares_memory(original, original[[1, 2, 3]])) # False (copy)
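The copy behavior applies when you read through a fancy or boolean index. When the same index expression appears on the left side of an assignment, NumPy writes directly into the original array, as this short sketch shows:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

# Index expression on the left of '=': no temporary copy is involved
original[[0, 1]] = 0
original[original > 4] = -1

print(original)  # [ 0  0  3  4 -1]
```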
Common Pitfalls and Best Practices
Avoid chained indexing for assignment. If the first index produces a copy, the assignment lands on that temporary copy and is silently lost:
# Risky: works here only because matrix[0] is a view
matrix = np.zeros((3, 3))
matrix[0][1] = 5 # Silently lost if the first index is fancy or boolean
# Good: single indexing operation
matrix[0, 1] = 5
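To make the failure mode concrete, here is a small sketch where the first index is a boolean mask (the `mask` name is illustrative). The chained form writes into a temporary copy, while the single-operation form updates the array:

```python
import numpy as np

matrix = np.zeros((3, 3))
mask = np.array([True, False, True])

# Chained: matrix[mask] is a temporary copy, so this write is discarded
matrix[mask][:, 1] = 5
print(matrix.sum())  # 0.0

# Single indexing operation: the write reaches matrix itself
matrix[mask, 1] = 5
print(matrix)
# [[0. 5. 0.]
#  [0. 0. 0.]
#  [0. 5. 0.]]
```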
Watch out for off-by-one errors with slices. Remember that stop is exclusive:
arr = np.arange(10)
print(arr[0:5]) # [0 1 2 3 4] - NOT including index 5
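A related subtlety is that slices and single indices handle out-of-range values differently. The sketch below shows that slice bounds are clipped quietly, whereas an out-of-range integer index raises:

```python
import numpy as np

arr = np.arange(10)

# With an exclusive stop, the length of arr[start:stop] is stop - start
print(len(arr[2:7]))  # 5

# Slice bounds past the end are clipped without any error
print(arr[7:100])  # [7 8 9]

# A single out-of-range index raises IndexError
try:
    arr[100]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```

The silent clipping is convenient, but it can also hide a wrong bound, so check the length of the result when it matters.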
Use boolean indexing for filtering, fancy indexing for reordering. Each has its sweet spot:
data = np.array([3, 1, 4, 1, 5, 9, 2, 6])
# Filtering: boolean indexing
filtered = data[data > 3]
# Reordering: fancy indexing with argsort
sorted_data = data[np.argsort(data)]
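The argsort pattern is most useful when several arrays need the same ordering. Here is a minimal sketch with made-up `names` and `scores` arrays (both hypothetical) that ranks one array by the values of the other:

```python
import numpy as np

names = np.array(["ana", "bo", "cy", "di"])   # hypothetical labels
scores = np.array([72, 95, 60, 88])           # hypothetical values

# Indices that sort scores ascending, reversed for highest-first
order = np.argsort(scores)[::-1]

print(names[order])   # ['bo' 'di' 'ana' 'cy']
print(scores[order])  # [95 88 72 60]
```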
Prefer vectorized operations over loops. If you’re iterating over indices, you’re probably doing it wrong:
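# Sample input: 1000 values (seeded so the run is reproducible)
arr = np.random.default_rng(0).standard_normal(1000)
# Slow: loop-based
result = np.zeros(1000)
for i in range(1000):
    if arr[i] > 0:
        result[i] = arr[i] * 2
# Fast: vectorized with np.where and a boolean condition
result = np.where(arr > 0, arr * 2, 0)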
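If you want to convince yourself that a vectorized rewrite matches the loop, compare the results directly. This sketch uses seeded random sample data (the `values` name is illustrative) and also shows masked assignment as a third way to write the same transformation:

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.standard_normal(1000)   # hypothetical sample data

# Loop version
looped = np.zeros(values.size)
for i in range(values.size):
    if values[i] > 0:
        looped[i] = values[i] * 2

# Vectorized with np.where
vectorized = np.where(values > 0, values * 2, 0)

# Vectorized with boolean-mask assignment
masked = np.zeros_like(values)
positive = values > 0
masked[positive] = values[positive] * 2

print(np.array_equal(looped, vectorized))  # True
print(np.array_equal(looped, masked))      # True
```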
NumPy indexing is a skill that compounds. The more fluent you become, the more naturally you’ll express complex data transformations in clean, efficient code. Start with basic slicing, graduate to fancy indexing when you need it, and always keep the view-vs-copy distinction in mind.