NumPy - np.extract() - Extract Elements by Condition

Key Insights

  • np.extract() is a function-call alternative to boolean indexing for filtering array elements by a condition; it returns a flattened 1D array of the matching values
  • The function takes a condition array and a source array, flattens both, and pairs them position by position (no broadcasting), which suits pre-computed boolean masks and multi-condition filters
  • Unlike boolean indexing, which keeps the trailing dimensions when the mask covers only the leading axes, np.extract() always returns a 1D array, which can be both a feature and a limitation depending on your use case

Understanding np.extract() Fundamentals

The np.extract() function extracts elements from an array based on a boolean condition. It takes two primary arguments: a condition (boolean array or expression) and the array from which to extract elements.

import numpy as np

# Basic extraction
arr = np.array([10, 20, 30, 40, 50])
condition = arr > 25

result = np.extract(condition, arr)
print(result)  # Output: [30 40 50]

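np.extract() does not broadcast the condition. It flattens both the condition and the input array and pairs them position by position, so the condition should have the same number of elements as the array. Wherever the condition is True (or nonzero), the corresponding element is included in the output.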

# Direct condition in extract
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
result = np.extract(arr % 2 == 0, arr)
print(result)  # Output: [ 2  4  6  8 10]
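
The NumPy reference describes np.extract(condition, arr) as equivalent to np.compress(np.ravel(condition), np.ravel(arr)): both inputs are flattened and matched by position, and any nonzero entry in the condition counts as True. A minimal sketch that checks this on a small 2D array (the names and values are illustrative only):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
is_even = grid % 2 == 0

via_extract = np.extract(is_even, grid)
via_compress = np.compress(np.ravel(is_even), np.ravel(grid))
print(via_extract)                                # [2 4 6]
print(np.array_equal(via_extract, via_compress))  # True

# Non-boolean conditions work too: nonzero entries select, zeros skip
flags = np.array([0, 3, 0, -1])
values = np.array([10, 20, 30, 40])
picked = np.extract(flags, values)
print(picked)  # [20 40]
```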

np.extract() vs Boolean Indexing

For a boolean condition with the same shape as the array, np.extract(condition, arr) returns the same values as arr[condition]. The two differ in syntax, and in how they treat a mask that is smaller than the array.

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Boolean indexing
bool_result = arr[arr > 5]
print("Boolean indexing:", bool_result)  # [6 7 8 9]

# np.extract()
extract_result = np.extract(arr > 5, arr)
print("np.extract():", extract_result)  # [6 7 8 9]

# Both return flattened 1D arrays for this case
print("Same result:", np.array_equal(bool_result, extract_result))  # True

With a full-shape mask the choice is mostly one of style. np.extract() can read better when the condition is pre-computed or when an explicit function call fits the surrounding code:

# Pre-computed complex condition
condition1 = arr > 3
condition2 = arr < 8
combined_condition = condition1 & condition2

# More readable with extract
result = np.extract(combined_condition, arr)
print(result)  # [4 5 6 7]

# Equivalent boolean indexing
result2 = arr[combined_condition]
print(result2)  # [4 5 6 7]
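
One case where the two are not interchangeable is a mask that covers only the leading axis. Boolean indexing then selects whole rows and keeps the remaining dimensions, while np.extract() flattens both inputs and matches them by position. A small sketch (row_mask is an illustrative name, not taken from the examples above):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
row_mask = np.array([True, False, True])

# Boolean indexing: the 1D mask picks rows 0 and 2, and the result stays 2D
rows = arr[row_mask]
print(rows.shape)  # (2, 3)

# np.extract: the mask is matched against flattened positions 0, 1 and 2
flat = np.extract(row_mask, arr)
print(flat)  # [1 3]
```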

Working with Multi-Dimensional Arrays

np.extract() always returns a flattened 1D array, regardless of the input array’s dimensionality. This behavior is consistent and predictable.

# 2D array extraction
matrix = np.array([[10, 15, 20],
                   [25, 30, 35],
                   [40, 45, 50]])

# Extract values divisible by 5
result = np.extract(matrix % 5 == 0, matrix)
print(result)  # [10 15 20 25 30 35 40 45 50]
print(result.shape)  # (9,)

# 3D array extraction
cube = np.arange(27).reshape(3, 3, 3)
result = np.extract(cube > 20, cube)
print(result)  # [21 22 23 24 25 26]
print(result.shape)  # (6,)
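
Because the output is flat, the original positions of the matches are lost. The values come out in row-major (C) order, which is also the order in which np.argwhere() lists the True positions of a mask, so the two can be paired when coordinates matter. A sketch along those lines, reusing the 3x3 matrix values from above:

```python
import numpy as np

matrix = np.array([[10, 15, 20],
                   [25, 30, 35],
                   [40, 45, 50]])
mask = matrix > 30

values = np.extract(mask, matrix)
coords = np.argwhere(mask)  # one (row, col) pair per True entry

for (row, col), value in zip(coords, values):
    print(f"matrix[{row}, {col}] = {value}")
# matrix[1, 2] = 35
# matrix[2, 0] = 40
# matrix[2, 1] = 45
# matrix[2, 2] = 50
```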

Complex Conditional Extraction

You can combine multiple conditions using logical operators to create sophisticated filters.

data = np.array([12, 45, 67, 23, 89, 34, 56, 78, 90, 11])

# Multiple conditions with AND
result = np.extract((data > 20) & (data < 70), data)
print("Between 20 and 70:", result)  # [45 67 23 34 56]

# Multiple conditions with OR
result = np.extract((data < 15) | (data > 85), data)
print("Less than 15 OR greater than 85:", result)  # [12 89 90 11]

# Complex nested conditions
result = np.extract(((data % 2 == 0) & (data > 30)) | (data < 15), data)
print("Even and >30, OR <15:", result)  # [12 34 56 78 90 11]
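
The parentheses around each comparison are required because & and | bind more tightly than comparison operators in Python. The named functions np.logical_and() and np.logical_or() avoid that pitfall, and ~ inverts a boolean mask. A brief sketch using the same data:

```python
import numpy as np

data = np.array([12, 45, 67, 23, 89, 34, 56, 78, 90, 11])

# Same filter as (data > 20) & (data < 70), written with a named function
in_range = np.logical_and(data > 20, data < 70)
inside = np.extract(in_range, data)
outside = np.extract(~in_range, data)  # ~ flips every True/False in the mask

print(inside)   # [45 67 23 34 56]
print(outside)  # [12 89 78 90 11]
```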

Practical Use Cases

Filtering Sensor Data

# Simulated temperature sensor readings
temperatures = np.array([22.5, 23.1, 25.8, 28.3, 24.7, 21.9, 26.4, 29.1, 23.8])
timestamps = np.arange(len(temperatures))

# Extract only readings above threshold
threshold = 25.0
high_temps = np.extract(temperatures > threshold, temperatures)
high_temp_times = np.extract(temperatures > threshold, timestamps)

print("High temperature readings:", high_temps)
# [25.8 28.3 26.4 29.1]
print("Occurred at indices:", high_temp_times)
# [2 3 6 7]
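
The comparison above is evaluated twice. One possible variation computes the mask once and uses np.flatnonzero() to get the matching positions directly, without a separate timestamps array. The name is_high is illustrative:

```python
import numpy as np

temperatures = np.array([22.5, 23.1, 25.8, 28.3, 24.7, 21.9, 26.4, 29.1, 23.8])
is_high = temperatures > 25.0  # evaluate the comparison once

high_temps = np.extract(is_high, temperatures)
high_indices = np.flatnonzero(is_high)  # positions of the True entries

print(high_temps)    # [25.8 28.3 26.4 29.1]
print(high_indices)  # [2 3 6 7]
```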

Data Cleaning and Outlier Removal

# Dataset with outliers
measurements = np.array([45, 47, 46, 48, 150, 44, 49, 47, -20, 46, 48])

# Calculate mean and standard deviation
mean = np.mean(measurements)
std = np.std(measurements)

# Extract values within 2 standard deviations
lower_bound = mean - 2 * std
upper_bound = mean + 2 * std

cleaned_data = np.extract(
    (measurements >= lower_bound) & (measurements <= upper_bound),
    measurements
)

print("Original:", measurements)
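# Only 150 is removed: the outliers inflate the mean (50) and the std (about 37),
# so the bounds are roughly [-23.9, 123.9] and -20 still passes the filter
print("Cleaned:", cleaned_data)  # [45 47 46 48 44 49 47 -20 46 48]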
print(f"Removed {len(measurements) - len(cleaned_data)} outliers")
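
A mean and standard deviation filter can be pulled around by the very outliers it is meant to catch: with the measurements above the standard deviation is about 37, so -20 falls inside the 2-sigma band. One common alternative, shown here as a sketch and not as the only option, measures distance from the median in units of the median absolute deviation (MAD). The cutoff of 3 scaled MADs is an illustrative choice.

```python
import numpy as np

measurements = np.array([45, 47, 46, 48, 150, 44, 49, 47, -20, 46, 48])

median = np.median(measurements)                # 47.0
mad = np.median(np.abs(measurements - median))  # 1.0

# 1.4826 scales the MAD so it is comparable to a standard deviation for
# normally distributed data; the factor 3 is an assumption for this sketch
cutoff = 3 * 1.4826 * mad

robust_cleaned = np.extract(np.abs(measurements - median) <= cutoff, measurements)
print(robust_cleaned)  # [45 47 46 48 44 49 47 46 48]
```

If more than half of the values are identical the MAD is zero and this filter would reject everything else, so a real pipeline would need a fallback for that case.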

Financial Data Analysis

# Stock prices and volumes
prices = np.array([150.2, 152.8, 148.5, 155.3, 153.7, 149.2, 151.6])
volumes = np.array([1000, 1500, 800, 2000, 1200, 900, 1100])

# Extract high-volume trading days
high_volume_threshold = 1000
high_volume_prices = np.extract(volumes > high_volume_threshold, prices)
high_volumes = np.extract(volumes > high_volume_threshold, volumes)

print("Prices on high-volume days:", high_volume_prices)
# [152.8 155.3 153.7 151.6]
print("Corresponding volumes:", high_volumes)
# [1500 2000 1200 1100]

# Calculate average price on high-volume days
avg_high_volume_price = np.mean(high_volume_prices)
print(f"Average price on high-volume days: ${avg_high_volume_price:.2f}")

Broadcasting Conditions

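np.extract() does not broadcast the condition. It flattens both arguments and matches them position by position, so a lower-dimensional condition only reaches the first few flattened elements. To select whole rows or columns, broadcast the condition to the array's shape explicitly, for example with np.broadcast_to().

# 2D array
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

# Row-based condition, broadcast across columns explicitly
row_condition = np.array([True, False, True])[:, np.newaxis]
row_mask = np.broadcast_to(row_condition, matrix.shape)
result = np.extract(row_mask, matrix)
print("Rows 0 and 2:", result)  # [ 1  2  3  4  9 10 11 12]

# Column-based condition, broadcast across rows explicitly
col_condition = np.array([False, True, True, False])
col_mask = np.broadcast_to(col_condition, matrix.shape)
result = np.extract(col_mask, matrix)
print("Columns 1 and 2:", result)  # [ 2  3  6  7 10 11]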

Performance Considerations

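np.extract() is a thin wrapper: it flattens both inputs, finds the nonzero positions of the condition, and takes those elements. Its cost is therefore of the same order as boolean indexing, and in both cases building the condition array is a full pass over the data that can account for a large share of the runtime. The exact ratio depends on the NumPy version and the data, so measure it for your own workload.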

import time

# Large array
large_arr = np.random.randint(0, 100, size=10_000_000)

# Time np.extract (perf_counter is the clock intended for short intervals)
start = time.perf_counter()
result1 = np.extract(large_arr > 50, large_arr)
extract_time = time.perf_counter() - start

# Time boolean indexing
start = time.perf_counter()
result2 = large_arr[large_arr > 50]
indexing_time = time.perf_counter() - start

print(f"np.extract: {extract_time:.4f}s")
print(f"Boolean indexing: {indexing_time:.4f}s")
print(f"Results equal: {np.array_equal(result1, result2)}")

A single timed run like this is noisy, so treat the numbers as indicative only. Choose np.extract() when you need explicit function semantics or when working with pre-computed conditions. Use boolean indexing for more concise code in simple cases.
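
For a steadier comparison, the standard-library timeit module can repeat each call several times. The sketch below builds the mask once, outside the timed code, so only the extraction step is measured. The array size and repeat counts are arbitrary choices for illustration.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
sample = rng.integers(0, 100, size=1_000_000)
mask = sample > 50  # built once, outside the timed code

calls = 5
# Take the best of 3 batches to reduce the effect of background noise
extract_s = min(timeit.repeat(lambda: np.extract(mask, sample), number=calls, repeat=3))
index_s = min(timeit.repeat(lambda: sample[mask], number=calls, repeat=3))

print(f"np.extract:       {extract_s / calls:.5f}s per call")
print(f"Boolean indexing: {index_s / calls:.5f}s per call")
```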

Edge Cases and Gotchas

# Empty result
arr = np.array([1, 2, 3, 4, 5])
result = np.extract(arr > 10, arr)
print("Empty result:", result)  # []
print("Shape:", result.shape)  # (0,)

# All True condition
result = np.extract(arr > 0, arr)
print("All elements:", result)  # [1 2 3 4 5]

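# Mismatched sizes: a shorter condition is accepted silently
arr = np.array([1, 2, 3, 4])  # Length 4
condition = np.array([True, False])  # Length 2
print("Short condition:", np.extract(condition, arr))  # [1]

# A True entry past the end of the array raises IndexError
try:
    condition = np.array([False, False, False, False, True])  # Length 5
    np.extract(condition, arr)
except IndexError as e:
    print(f"Error: {e}")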
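
np.extract() itself does not verify that the condition and the array have matching shapes: it flattens both and uses whichever positions of the condition are nonzero. Where a silent mismatch would be a bug, a small guard can help. checked_extract below is a hypothetical helper, not part of NumPy, and requiring identical shapes is a deliberately strict choice.

```python
import numpy as np

def checked_extract(condition, arr):
    """Hypothetical wrapper: reject conditions whose shape differs from arr."""
    condition = np.asarray(condition)
    arr = np.asarray(arr)
    if condition.shape != arr.shape:
        raise ValueError(
            f"condition shape {condition.shape} does not match array shape {arr.shape}"
        )
    return np.extract(condition, arr)

values = np.array([1, 2, 3, 4])
print(checked_extract(values > 2, values))  # [3 4]

try:
    checked_extract(np.array([True, False]), values)
except ValueError as e:
    print(f"Error: {e}")
```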

The np.extract() function provides a clean, functional approach to conditional array filtering. Use it when you need explicit extraction semantics, pre-computed conditions, or when building data processing pipelines where function calls improve code clarity.
