How to Use Fancy Indexing in NumPy

Key Insights

  • Fancy indexing uses integer or boolean arrays to select non-contiguous elements, always returning copies rather than views—a critical distinction from basic slicing that affects both memory usage and mutation behavior.
  • Boolean masking is the most readable way to filter arrays based on conditions, but combining multiple conditions requires parentheses and bitwise operators (&, |, ~) instead of Python’s and, or, not.
  • When assigning values with duplicate indices, NumPy only applies the last value—use np.add.at() or similar ufunc methods when you need accumulation behavior.

Introduction to Fancy Indexing

NumPy’s basic slicing syntax (arr[1:5], arr[::2]) handles contiguous or regularly-spaced selections well. But real-world data analysis often requires grabbing arbitrary elements: specific rows from a dataset, values meeting certain criteria, or scattered points across a matrix.

Fancy indexing solves this by accepting arrays as indices instead of integers or slices. You pass an array of positions (integer indexing) or an array of True/False values (boolean indexing), and NumPy returns the corresponding elements.

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Basic slicing: contiguous elements
print(arr[1:4])  # [20 30 40]

# Fancy indexing: arbitrary elements
indices = np.array([0, 2, 4])
print(arr[indices])  # [10 30 50]

# Boolean indexing: conditional selection
mask = arr > 25
print(arr[mask])  # [30 40 50]

The key difference isn’t just syntax—it’s behavior. Basic slicing returns a view (a window into the original array), while fancy indexing returns a copy. This matters when you’re modifying data or managing memory in large-scale applications.
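The view/copy distinction can be checked directly. The following is a minimal sketch, assuming the same small array as above; the names view and picked are illustrative only, and np.shares_memory is used here as one convenient way to test whether two arrays overlap in memory.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Basic slicing: the result shares memory with arr (a view)
view = arr[1:4]
view[0] = -1
print(arr)                          # [10 -1 30 40 50]
print(np.shares_memory(arr, view))  # True

# Fancy indexing: the result holds its own data (a copy)
picked = arr[np.array([0, 2, 4])]
picked[0] = 999
print(arr)                            # [10 -1 30 40 50]
print(np.shares_memory(arr, picked))  # False
```

Writing through the view changes arr, while writing to the fancy-indexed result leaves arr untouched.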

Indexing with Integer Arrays

Integer array indexing lets you specify exactly which elements you want by their positions. The index array can be any shape, and the output will match that shape.

arr = np.array([100, 200, 300, 400, 500])

# Select elements at positions 0, 3, and 1
indices = np.array([0, 3, 1])
print(arr[indices])  # [100 400 200]

# Repeat indices to duplicate elements
indices = np.array([0, 0, 2, 2])
print(arr[indices])  # [100 100 300 300]

# 2D index array produces 2D output
indices = np.array([[0, 1], [3, 4]])
print(arr[indices])
# [[100 200]
#  [400 500]]

For 2D arrays, you can select entire rows or columns by indexing along one axis:

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12]
])

# Select rows 0, 2, and 3
row_indices = np.array([0, 2, 3])
print(matrix[row_indices])
# [[ 1  2  3]
#  [ 7  8  9]
#  [10 11 12]]

# Select columns 0 and 2
col_indices = np.array([0, 2])
print(matrix[:, col_indices])
# [[ 1  3]
#  [ 4  6]
#  [ 7  9]
#  [10 12]]

This pattern appears constantly in machine learning: selecting specific features from a dataset, reordering columns, or extracting a subset of samples.
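As a rough illustration of those use cases, the sketch below selects and reorders feature columns and then draws a random subset of samples. The array X, the chosen feature positions, and the seed are all hypothetical values picked for the example.

```python
import numpy as np

# Hypothetical dataset: 5 samples (rows) x 4 features (columns)
X = np.arange(20).reshape(5, 4)

# Keep features 3 and 0, in that order (selects and reorders columns)
feature_idx = np.array([3, 0])
X_sel = X[:, feature_idx]
print(X_sel.shape)  # (5, 2)
print(X_sel[0])     # [3 0]

# Draw a shuffled subset of 3 samples using a seeded permutation
rng = np.random.default_rng(0)
sample_idx = rng.permutation(X.shape[0])[:3]
X_sub = X[sample_idx]
print(X_sub.shape)  # (3, 4)
```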

Indexing with Boolean Arrays (Masking)

Boolean indexing uses an array of True/False values to filter elements. Any element where the mask is True gets included in the result.

data = np.array([15, 8, 22, 3, 17, 9, 31])

# Create a mask from a condition
mask = data > 10
print(mask)  # [ True False  True False  True False  True]

# Apply the mask
print(data[mask])  # [15 22 17 31]

# Common shorthand: inline the condition
print(data[data > 10])  # [15 22 17 31]

Combining conditions requires bitwise operators with parentheses—this trips up many Python developers:

# WRONG: Python's 'and' doesn't work with arrays
# data[(data > 10) and (data < 25)]  # ValueError!

# CORRECT: Use & for AND, | for OR, ~ for NOT
print(data[(data > 10) & (data < 25)])  # [15 22 17]
print(data[(data < 5) | (data > 20)])   # [22  3 31]
print(data[~(data > 10)])               # [8 3 9]

The parentheses aren’t optional. Bitwise operators have higher precedence than comparison operators, so data > 10 & data < 25 gets parsed as the chained comparison data > (10 & data) < 25. Chained comparisons use an implicit and, so this raises the same ValueError instead of filtering.
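The precedence problem is easy to reproduce. This sketch assumes the same data array as above and also shows np.logical_and, a function-call alternative that avoids operator precedence entirely; the variables raised and selected are illustrative names.

```python
import numpy as np

data = np.array([15, 8, 22, 3, 17, 9, 31])

# Without parentheses, & binds first and the chained comparison then
# needs the truth value of a whole array, which raises ValueError
try:
    data[data > 10 & data < 25]
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# np.logical_and takes the two conditions as ordinary arguments
selected = data[np.logical_and(data > 10, data < 25)]
print(selected)  # [15 22 17]
```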

Boolean indexing excels at data cleaning tasks:

# Replace negative values with zero
prices = np.array([10.5, -2.0, 15.3, -0.5, 8.7])
prices[prices < 0] = 0
print(prices)  # [10.5  0.  15.3  0.   8.7]

# Count elements meeting a condition
temperatures = np.array([72, 85, 91, 68, 88, 95, 77])
hot_days = np.sum(temperatures > 85)
print(f"Days above 85°F: {hot_days}")  # Days above 85°F: 3

Multi-dimensional Fancy Indexing

When you provide index arrays for multiple dimensions, NumPy pairs them element-wise to select specific coordinate pairs:

matrix = np.arange(20).reshape(4, 5)
print(matrix)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]

# Select elements at (0,1), (1,3), (2,0), (3,4)
rows = np.array([0, 1, 2, 3])
cols = np.array([1, 3, 0, 4])
print(matrix[rows, cols])  # [ 1  8 10 19]

This is different from selecting rows and columns independently. The output shape matches the shape of the index arrays (after broadcasting), not the Cartesian product:

# Selecting specific (row, col) pairs: returns 1D array of 4 elements
rows = np.array([0, 1, 2, 3])
cols = np.array([1, 3, 0, 4])
print(matrix[rows, cols].shape)  # (4,)

# Selecting rows and all their columns: returns 2D array
print(matrix[rows, :].shape)  # (4, 5)

# To get a submatrix (Cartesian product), use np.ix_
print(matrix[np.ix_(rows, cols)])
# [[ 1  3  0  4]
#  [ 6  8  5  9]
#  [11 13 10 14]
#  [16 18 15 19]]

The np.ix_ function creates open meshes that broadcast together, giving you the rectangular submatrix instead of paired coordinates.
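To see what "open meshes" means in practice, the sketch below inspects the arrays np.ix_ returns and reproduces the same submatrix by reshaping the row indices by hand. The smaller rows and cols arrays are assumptions chosen to keep the output short.

```python
import numpy as np

matrix = np.arange(20).reshape(4, 5)
rows = np.array([0, 2])
cols = np.array([1, 3, 4])

# np.ix_ returns one index array per axis, shaped to broadcast together
row_grid, col_grid = np.ix_(rows, cols)
print(row_grid.shape, col_grid.shape)  # (2, 1) (1, 3)

# Turning rows into a column by hand gives the same submatrix
sub_ix = matrix[np.ix_(rows, cols)]
sub_manual = matrix[rows[:, np.newaxis], cols]
print(sub_ix)
# [[ 1  3  4]
#  [11 13 14]]
print(np.array_equal(sub_ix, sub_manual))  # True
```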

Combining Fancy Indexing with Slicing

You can mix fancy indexing with slices in the same expression. This is useful when you want arbitrary selections along one axis but contiguous ranges along another:

data = np.arange(30).reshape(6, 5)
print(data)
# [[ 0  1  2  3  4]
#  [ 5  6  7  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]
#  [20 21 22 23 24]
#  [25 26 27 28 29]]

# Select rows 1, 3, 5 with columns 1 through 3
row_indices = np.array([1, 3, 5])
print(data[row_indices, 1:4])
# [[ 6  7  8]
#  [16 17 18]
#  [26 27 28]]

# Boolean mask on rows, slice on columns
row_mask = np.array([False, True, False, True, False, True])
print(data[row_mask, ::2])  # Every other column
# [[ 5  7  9]
#  [15 17 19]
#  [25 27 29]]

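A common pitfall: mixing integer arrays with slices can produce unexpected shapes. When two index arrays (or an index array and a scalar) are separated by a slice, NumPy places the fancy-indexed dimension first in the result rather than where it appeared in the expression. When in doubt, break complex indexing into multiple steps and verify intermediate shapes.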
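The shape surprise is easiest to see on a 3D array. This is a small sketch with a hypothetical array x of shape (3, 4, 5); only the resulting shapes matter here.

```python
import numpy as np

x = np.zeros((3, 4, 5))
idx = np.array([0, 1])

# Adjacent index arrays: the indexed dimension stays in place
print(x[:, idx, idx].shape)  # (3, 2)

# Index arrays separated by a slice: the indexed dimension moves to the front
print(x[idx, :, idx].shape)  # (2, 4)

# A scalar combined with an index array is treated the same way
print(x[0, :, idx].shape)    # (2, 4)

# Indexing in two steps keeps the axes in their original order
print(x[0][:, idx].shape)    # (4, 2)
```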

Modifying Arrays with Fancy Indexing

Fancy indexing works on the left side of assignments too:

arr = np.zeros(10, dtype=int)
indices = np.array([1, 3, 5, 7])
arr[indices] = 99
print(arr)  # [ 0 99  0 99  0 99  0 99  0  0]

# Assign different values to each position
arr[indices] = [10, 20, 30, 40]
print(arr)  # [ 0 10  0 20  0 30  0 40  0  0]

However, duplicate indices behave unexpectedly:

arr = np.zeros(5, dtype=int)
indices = np.array([1, 1, 1, 2])
arr[indices] += 1
print(arr)  # [0 1 1 0 0], NOT [0 3 1 0 0]!

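NumPy doesn’t accumulate the additions. arr[indices] += 1 is shorthand for arr[indices] = arr[indices] + 1: the right side is computed once from the original values, then each result is assigned independently. Index 1 is written three times with the same value, so it ends up as 1, not 3.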
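Spelled out step by step, the in-place update looks roughly like the sketch below. The intermediate names gathered and updated are illustrative, and the second part shows the "last write remains" behavior with distinct values on a separate hypothetical array.

```python
import numpy as np

arr = np.zeros(5, dtype=int)
indices = np.array([1, 1, 1, 2])

# arr[indices] += 1 behaves like these three steps
gathered = arr[indices]   # copy of the current values: [0 0 0 0]
updated = gathered + 1    # [1 1 1 1]
arr[indices] = updated    # index 1 is written three times with the value 1
print(arr)  # [0 1 1 0 0]

# With distinct values, the last write to a repeated index remains
other = np.zeros(5, dtype=int)
other[np.array([1, 1, 1])] = [10, 20, 30]
print(other)  # [ 0 30  0  0  0]
```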

For accumulation behavior, use ufunc .at() methods:

arr = np.zeros(5, dtype=int)
indices = np.array([1, 1, 1, 2])
np.add.at(arr, indices, 1)
print(arr)  # [0 3 1 0 0] (correct accumulation)

This pattern is essential for histogram-like operations or any scenario where indices repeat.
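A histogram-like accumulation might look like the following sketch. The bins and weights arrays are made-up inputs; np.bincount is shown only as a cross-check, since it computes the same per-bin sums when the indices are non-negative integers.

```python
import numpy as np

# Hypothetical bin assignments and per-item weights
bins = np.array([1, 1, 1, 2, 4])
weights = np.array([0.5, 1.5, 2.0, 3.0, 1.0])

# Unbuffered accumulation: every occurrence of a bin index contributes
totals = np.zeros(5)
np.add.at(totals, bins, weights)
print(totals)  # [0. 4. 3. 0. 1.]

# np.bincount gives the same sums for non-negative integer bins
totals_bc = np.bincount(bins, weights=weights, minlength=5)
print(np.allclose(totals, totals_bc))  # True
```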

Performance Considerations and Best Practices

Fancy indexing always creates a copy, which has memory and performance implications:

import time

large_arr = np.random.rand(10_000_000)

# Basic slicing (returns a view)
start = time.perf_counter()
for _ in range(1000):
    view = large_arr[::2]
slice_time = time.perf_counter() - start

# Fancy indexing (returns a copy)
indices = np.arange(0, 10_000_000, 2)
start = time.perf_counter()
for _ in range(1000):
    copy = large_arr[indices]
fancy_time = time.perf_counter() - start

print(f"Slicing: {slice_time:.4f}s")         # e.g. ~0.0003s (machine-dependent)
print(f"Fancy indexing: {fancy_time:.4f}s")  # e.g. ~15s (machine-dependent)

The difference is dramatic because slicing just creates a new array header pointing to existing memory, while fancy indexing allocates new memory and copies data.
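The "new header, same memory" idea can be inspected on a small array. This sketch assumes a 10-element float64 array and uses the base, strides, and nbytes attributes; the names view and picked are illustrative.

```python
import numpy as np

arr = np.arange(10, dtype=np.float64)

# The slice is a new header over arr's buffer with a doubled stride
view = arr[::2]
print(view.base is arr)  # True
print(arr.strides)       # (8,)
print(view.strides)      # (16,)

# The fancy-indexed result is a separate buffer holding 5 copied elements
picked = arr[np.arange(0, 10, 2)]
print(np.shares_memory(arr, picked))  # False
print(picked.nbytes)                  # 40
```

The slice costs the same regardless of array size, while the copy scales with the number of selected elements.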

Best practices:

  1. Use slicing when possible. If your access pattern is regular, stick with slices.
  2. Cache the results of repeated fancy indexing. If you’re selecting the same indices repeatedly, store the result instead of recomputing it.
  3. Use boolean indexing for filtering. It’s readable and NumPy optimizes it well.
  4. Remember the copy behavior. Modifications to fancy-indexed results don’t affect the original array.
  5. Use np.take() for repeated integer indexing. It can be faster and offers an out parameter for preallocated arrays.

# np.take() alternative with preallocated output
indices = np.array([0, 2, 4, 6, 8])
out = np.empty(5)
np.take(large_arr, indices, out=out)

Fancy indexing is one of NumPy’s most powerful features. Master it, and you’ll write cleaner, more expressive array manipulation code. Just stay aware of the copy semantics and reach for basic slicing when your access patterns allow it.
