NumPy - np.extract() - Extract Elements by Condition
Key Insights
- np.extract() provides a cleaner alternative to boolean indexing when you need to filter array elements based on conditions, returning a flattened 1D array of matching values
- The function accepts a condition array and a source array, making it particularly useful when working with pre-computed boolean masks or complex multi-condition filters
- Unlike boolean indexing, which can preserve part of the array shape (for example, when a 1D mask selects whole rows of a 2D array), np.extract() always returns a 1D array, which can be both a feature and a limitation depending on your use case
Understanding np.extract() Fundamentals
The np.extract() function returns the elements of an array for which a condition is True. It takes two arguments: a condition (a boolean array, or an expression that evaluates to one) and the array from which to extract elements.
import numpy as np
# Basic extraction
arr = np.array([10, 20, 30, 40, 50])
condition = arr > 25
result = np.extract(condition, arr)
print(result) # Output: [30 40 50]
The condition should have the same number of elements as the input array. np.extract() flattens both arguments and pairs them by position: wherever the condition is True (or nonzero), the corresponding element is included in the output.
# Direct condition in extract
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
result = np.extract(arr % 2 == 0, arr)
print(result) # Output: [ 2 4 6 8 10]
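The NumPy documentation describes np.extract() as equivalent to np.compress(ravel(condition), ravel(arr)), and notes that for a boolean condition it matches arr[condition]. A minimal sketch that checks this on a small array (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(1, 11)
condition = arr % 2 == 0

via_extract = np.extract(condition, arr)
# Documented equivalent: compress over the flattened inputs
via_compress = np.compress(np.ravel(condition), np.ravel(arr))
# For a boolean condition, plain boolean indexing gives the same values
via_indexing = arr[condition]

print(via_extract)                                  # [ 2  4  6  8 10]
print(np.array_equal(via_extract, via_compress))    # True
print(np.array_equal(via_extract, via_indexing))    # True
```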
np.extract() vs Boolean Indexing
While np.extract() and boolean indexing often produce similar results, they differ in syntax and output characteristics.
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Boolean indexing
bool_result = arr[arr > 5]
print("Boolean indexing:", bool_result) # [6 7 8 9]
# np.extract()
extract_result = np.extract(arr > 5, arr)
print("np.extract():", extract_result) # [6 7 8 9]
# Both return flattened 1D arrays for this case
print("Same result:", np.array_equal(bool_result, extract_result)) # True
The key difference emerges when working with pre-computed conditions or when you need explicit function calls for readability:
# Pre-computed complex condition
condition1 = arr > 3
condition2 = arr < 8
combined_condition = condition1 & condition2
# More readable with extract
result = np.extract(combined_condition, arr)
print(result) # [4 5 6 7]
# Equivalent boolean indexing
result2 = arr[combined_condition]
print(result2) # [4 5 6 7]
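One case where the two approaches visibly diverge is a mask with fewer dimensions than the array. Boolean indexing applies a 1D mask along the first axis and keeps whole rows, while np.extract() flattens both inputs and pairs them by position. A small sketch, with illustrative names:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
row_mask = np.array([True, False, True])

# Boolean indexing applies the 1D mask along axis 0 and keeps the row shape
rows = grid[row_mask]
print(rows.shape)   # (2, 3)

# np.extract() pairs the mask with the first three flattened elements instead
flat = np.extract(row_mask, grid)
print(flat)         # [1 3]
```

If the goal is to select whole rows, boolean indexing is usually the more direct tool.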
Working with Multi-Dimensional Arrays
np.extract() always returns a flattened 1D array, regardless of the input array’s dimensionality. This behavior is consistent and predictable.
# 2D array extraction
matrix = np.array([[10, 15, 20],
[25, 30, 35],
[40, 45, 50]])
# Extract values divisible by 5
result = np.extract(matrix % 5 == 0, matrix)
print(result) # [10 15 20 25 30 35 40 45 50]
print(result.shape) # (9,)
# 3D array extraction
cube = np.arange(27).reshape(3, 3, 3)
result = np.extract(cube > 20, cube)
print(result) # [21 22 23 24 25 26]
print(result.shape) # (6,)
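Because the result is flattened, the original positions are lost. If they matter, np.argwhere() on the same condition returns the coordinates in the same row-major order as the extracted values. A minimal sketch:

```python
import numpy as np

cube = np.arange(27).reshape(3, 3, 3)
condition = cube > 20

values = np.extract(condition, cube)
coords = np.argwhere(condition)  # one (i, j, k) row per True entry

# Pair each extracted value with the position it came from
for value, (i, j, k) in zip(values, coords):
    print(f"cube[{i}, {j}, {k}] = {value}")
```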
Complex Conditional Extraction
You can combine multiple conditions using logical operators to create sophisticated filters.
data = np.array([12, 45, 67, 23, 89, 34, 56, 78, 90, 11])
# Multiple conditions with AND
result = np.extract((data > 20) & (data < 70), data)
print("Between 20 and 70:", result) # [45 67 23 34 56]
# Multiple conditions with OR
result = np.extract((data < 15) | (data > 85), data)
print("Less than 15 OR greater than 85:", result) # [12 89 90 11]
# Complex nested conditions
result = np.extract(((data % 2 == 0) & (data > 30)) | (data < 15), data)
print("Even and >30, OR <15:", result) # [12 34 56 78 90 11]
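When the number of conditions grows, chained & operators can become hard to read. One option is to collect the masks in a list and combine them with np.logical_and.reduce (or np.logical_or.reduce for OR). A sketch using the same data; the list name and the specific bounds are illustrative:

```python
import numpy as np

data = np.array([12, 45, 67, 23, 89, 34, 56, 78, 90, 11])

conditions = [
    data > 20,       # lower bound
    data < 80,       # upper bound
    data % 2 == 0,   # even values only
]
# Element-wise AND across all masks in the list
mask = np.logical_and.reduce(conditions)

print(np.extract(mask, data))  # [34 56 78]
```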
Practical Use Cases
Filtering Sensor Data
# Simulated temperature sensor readings
temperatures = np.array([22.5, 23.1, 25.8, 28.3, 24.7, 21.9, 26.4, 29.1, 23.8])
timestamps = np.arange(len(temperatures))
# Extract only readings above threshold
threshold = 25.0
high_temps = np.extract(temperatures > threshold, temperatures)
high_temp_times = np.extract(temperatures > threshold, timestamps)
print("High temperature readings:", high_temps)
# [25.8 28.3 26.4 29.1]
print("Occurred at indices:", high_temp_times)
# [2 3 6 7]
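The example above evaluates temperatures > threshold twice. A small variation computes the mask once and uses np.flatnonzero() to get the matching indices directly, so no separate timestamps array is needed when the timestamps are just positions:

```python
import numpy as np

temperatures = np.array([22.5, 23.1, 25.8, 28.3, 24.7, 21.9, 26.4, 29.1, 23.8])
threshold = 25.0

mask = temperatures > threshold           # computed once, reused below
high_temps = np.extract(mask, temperatures)
high_temp_indices = np.flatnonzero(mask)  # positions where mask is True

print(high_temps)         # [25.8 28.3 26.4 29.1]
print(high_temp_indices)  # [2 3 6 7]
```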
Data Cleaning and Outlier Removal
# Dataset with outliers
measurements = np.array([45, 47, 46, 48, 150, 44, 49, 47, -20, 46, 48])
# Calculate mean and standard deviation
mean = np.mean(measurements)
std = np.std(measurements)
# Extract values within 2 standard deviations
lower_bound = mean - 2 * std
upper_bound = mean + 2 * std
cleaned_data = np.extract(
(measurements >= lower_bound) & (measurements <= upper_bound),
measurements
)
print("Original:", measurements)
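print("Cleaned:", cleaned_data) # [45 47 46 48 44 49 47 -20 46 48]
# Note: the mean is 50 and the std is about 36.95, so the bounds are roughly
# [-23.9, 123.9]. Only 150 is removed; -20 stays because the outliers
# themselves inflate the standard deviation.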
print(f"Removed {len(measurements) - len(cleaned_data)} outliers")
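The mean and standard deviation are themselves pulled by extreme values, so a 2-sigma rule can be too lenient on small samples. As a hedged alternative (a different technique from the one above, not something np.extract() prescribes), the sketch below builds the mask from the median and the median absolute deviation. The 0.6745 factor and the 3.5 cutoff are conventional choices for the modified z-score, and the code assumes the MAD is nonzero:

```python
import numpy as np

measurements = np.array([45, 47, 46, 48, 150, 44, 49, 47, -20, 46, 48])

median = np.median(measurements)
mad = np.median(np.abs(measurements - median))  # median absolute deviation

# Modified z-score; assumes mad != 0 (true for this sample, where mad == 1)
modified_z = 0.6745 * (measurements - median) / mad
cleaned = np.extract(np.abs(modified_z) <= 3.5, measurements)

print(cleaned)  # [45 47 46 48 44 49 47 46 48]
```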
Financial Data Analysis
# Stock prices and volumes
prices = np.array([150.2, 152.8, 148.5, 155.3, 153.7, 149.2, 151.6])
volumes = np.array([1000, 1500, 800, 2000, 1200, 900, 1100])
# Extract high-volume trading days
high_volume_threshold = 1000
high_volume_prices = np.extract(volumes > high_volume_threshold, prices)
high_volumes = np.extract(volumes > high_volume_threshold, volumes)
print("Prices on high-volume days:", high_volume_prices)
# [152.8 155.3 153.7 151.6]
print("Corresponding volumes:", high_volumes)
# [1500 2000 1200 1100]
# Calculate average price on high-volume days
avg_high_volume_price = np.mean(high_volume_prices)
print(f"Average price on high-volume days: ${avg_high_volume_price:.2f}")
Broadcasting Conditions
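np.extract() does not broadcast the condition. It flattens both arguments and pairs them element by element, so the condition needs the same number of elements as the source array. To apply a row-wise or column-wise pattern, expand the mask to the full shape first, for example with np.broadcast_to().
# 2D array
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])
# Row-based condition, explicitly broadcast across columns
row_mask = np.array([True, False, True])[:, np.newaxis]
row_condition = np.broadcast_to(row_mask, matrix.shape)
result = np.extract(row_condition, matrix)
print("Rows 0 and 2:", result) # [ 1 2 3 4 9 10 11 12]
# Column-based condition, explicitly broadcast across rows
col_mask = np.array([False, True, True, False])
col_condition = np.broadcast_to(col_mask, matrix.shape)
result = np.extract(col_condition, matrix)
print("Columns 1 and 2:", result) # [ 2 3 6 7 10 11]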
Performance Considerations
For large arrays, np.extract() and boolean indexing usually land in the same range. Both timings below include the cost of evaluating the condition, which can itself be a significant part of the total.
import time
# Large array
large_arr = np.random.randint(0, 100, size=10_000_000)
# Time np.extract (perf_counter is a high-resolution timer meant for benchmarking)
start = time.perf_counter()
result1 = np.extract(large_arr > 50, large_arr)
extract_time = time.perf_counter() - start
# Time boolean indexing
start = time.perf_counter()
result2 = large_arr[large_arr > 50]
indexing_time = time.perf_counter() - start
print(f"np.extract: {extract_time:.4f}s")
print(f"Boolean indexing: {indexing_time:.4f}s")
print(f"Results equal: {np.array_equal(result1, result2)}")
Both methods have similar performance characteristics. Choose np.extract() when you need explicit function semantics or when working with pre-computed conditions. Use boolean indexing for more concise code in simple cases.
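A single wall-clock measurement is noisy, so the numbers above can vary from run to run. A slightly more robust sketch uses the standard-library timeit module and keeps the best of several repeats. The array size and repeat counts here are arbitrary choices, and actual timings depend on the machine and the NumPy version:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
values = rng.integers(0, 100, size=1_000_000)

def with_extract():
    return np.extract(values > 50, values)

def with_indexing():
    return values[values > 50]

calls = 5
# Best of several repeats is less sensitive to background noise
extract_best = min(timeit.repeat(with_extract, number=calls, repeat=3))
indexing_best = min(timeit.repeat(with_indexing, number=calls, repeat=3))

print(f"np.extract:       {extract_best / calls:.4f}s per call")
print(f"Boolean indexing: {indexing_best / calls:.4f}s per call")
```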
Edge Cases and Gotchas
# Empty result
arr = np.array([1, 2, 3, 4, 5])
result = np.extract(arr > 10, arr)
print("Empty result:", result) # []
print("Shape:", result.shape) # (0,)
# All True condition
result = np.extract(arr > 0, arr)
print("All elements:", result) # [1 2 3 4 5]
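# Mismatched sizes are not validated: both inputs are flattened and paired by position
condition = np.array([True, False]) # Length 2
arr = np.array([1, 2, 3, 4]) # Length 4
print(np.extract(condition, arr)) # [1] (no error is raised)
# A True entry beyond the end of arr does raise an IndexError
try:
    np.extract(np.array([False, False, False, False, True]), arr)
except IndexError as e:
    print(f"Error: {e}")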
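Because np.extract() flattens both inputs and pairs them by position, a condition of the wrong size can go unnoticed. If that matters, a small wrapper can make the check explicit. The checked_extract name below is hypothetical and not part of NumPy:

```python
import numpy as np

def checked_extract(condition, arr):
    """Hypothetical wrapper: refuse inputs whose sizes differ."""
    condition = np.asarray(condition)
    arr = np.asarray(arr)
    if condition.size != arr.size:
        raise ValueError(
            f"condition has {condition.size} elements, arr has {arr.size}"
        )
    return np.extract(condition, arr)

print(checked_extract([True, False, True, False], [1, 2, 3, 4]))  # [1 3]

try:
    checked_extract([True, False], [1, 2, 3, 4])
except ValueError as e:
    print(f"Error: {e}")
```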
The np.extract() function provides a clean, functional approach to conditional array filtering. Use it when you need explicit extraction semantics, pre-computed conditions, or when building data processing pipelines where function calls improve code clarity.