NumPy - Comparison Operators (==, !=, <, >, <=, >=)
Key Insights
- NumPy comparison operators perform element-wise comparisons and return boolean arrays, enabling vectorized conditional logic that is often 10-100x faster than an equivalent Python loop (the exact factor depends on array size and hardware)
- Broadcasting rules allow comparing arrays of different but compatible shapes: missing or size-1 dimensions are virtually stretched to match, without copying the smaller operand
- Boolean arrays from comparisons integrate directly with indexing, np.where(), np.any(), and np.all() for powerful data filtering and conditional operations
Element-Wise Comparison Basics
NumPy’s comparison operators (==, !=, <, >, <=, >=) work element-by-element on arrays, returning boolean arrays of the same shape. Unlike Python’s built-in operators, which compare lists as whole objects and return a single boolean value, NumPy operators vectorize the comparison across all elements.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
# Element-wise comparisons
print(arr > 3) # [False False False True True]
print(arr == 3) # [False False True False False]
print(arr <= 2) # [ True True False False False]
print(arr != 4) # [ True True True False True]
This works with multi-dimensional arrays identically:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
result = matrix >= 5
print(result)
# [[False False False]
# [False True True]
# [ True True True]]
print(result.dtype) # bool
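The contrast with plain Python lists described above can be checked directly. This short sketch compares a list and an array built from the same values; the variable names are illustrative only.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
arr = np.array(py_list)

# Python compares the two lists as whole objects and returns one bool
list_result = py_list == [1, 2, 0, 4, 5]
print(list_result)  # False

# NumPy compares position by position and returns one bool per element
array_result = arr == np.array([1, 2, 0, 4, 5])
print(array_result)  # [ True  True False  True  True]

# Ordering a list against a number is not defined in Python 3
list_error = None
try:
    _ = py_list > 3
except TypeError as err:
    list_error = err
print("TypeError:", list_error)
```

Each operator also has a function form (np.equal, np.not_equal, np.less, np.less_equal, np.greater, np.greater_equal) that accepts plain sequences as well as arrays.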
Comparing Arrays with Arrays
When comparing two arrays, NumPy performs element-wise comparison at matching positions. Arrays must have compatible shapes (same shape or broadcastable).
arr1 = np.array([10, 20, 30, 40])
arr2 = np.array([15, 20, 25, 50])
print(arr1 > arr2) # [False False True False]
print(arr1 == arr2) # [False True False False]
# Multi-dimensional comparison
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[1, 3], [2, 4]])
print(mat1 < mat2)
# [[False True]
# [False False]]
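When shapes are neither equal nor broadcastable, the comparison fails instead of guessing. The sketch below uses `<` because older NumPy releases handled `==` on mismatched shapes differently (returning a scalar False with a warning). To ask "are these two arrays identical?" as a single yes/no question, np.array_equal() is the usual tool.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2])

# Shapes (3,) and (2,) cannot be broadcast together, so this raises
shape_error = None
try:
    _ = a < b
except ValueError as err:
    shape_error = err
print("ValueError:", shape_error)

# np.array_equal checks "same shape and same values" and returns one bool
print(np.array_equal(a, b))                    # False
print(np.array_equal(a, np.array([1, 2, 3])))  # True
```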
For floating-point comparisons with tolerance, use np.isclose() or np.allclose():
arr1 = np.array([1.0, 2.0, 3.0])
arr2 = np.array([1.0000001, 2.0, 3.0000001])
# Exact equality fails as soon as values differ, even by 1e-7
print(arr1 == arr2) # [False True False]
# Use isclose for tolerance-based comparison
print(np.isclose(arr1, arr2)) # [ True True True]
print(np.allclose(arr1, arr2)) # True (single boolean)
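A few details of np.isclose() matter in practice: it tests `|a - b| <= atol + rtol * |b|` with defaults `rtol=1e-05` and `atol=1e-08`, so near zero the absolute tolerance dominates, and NaN values only compare as close when `equal_nan=True` is passed. The sketch below shows the classic rounding case and both edge cases.

```python
import numpy as np

# 0.1, 0.2 and 0.3 have no exact binary floating-point representation
print(0.1 + 0.2 == 0.3)            # False
print(np.isclose(0.1 + 0.2, 0.3))  # True

# Near zero, atol decides the outcome (rtol * |b| is 0 when b is 0)
print(np.isclose(1e-9, 0.0))            # True
print(np.isclose(1e-9, 0.0, atol=0.0))  # False

# NaN never equals anything, itself included, unless equal_nan=True
print(np.nan == np.nan)                            # False
print(np.isclose(np.nan, np.nan))                  # False
print(np.isclose(np.nan, np.nan, equal_nan=True))  # True
```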
Broadcasting in Comparisons
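Broadcasting allows comparing arrays of different shapes. NumPy aligns the shapes starting from the trailing dimension; two dimensions are compatible when they are equal or one of them is 1 (or missing), and the smaller operand is virtually stretched to match. This eliminates explicit loops and avoids materializing tiled copies of the smaller array.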
# Compare 2D array with 1D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([2, 5, 8])
# Broadcasting: row is compared against each row of matrix
result = matrix > row
print(result)
# [[False False False]
# [ True False False]
# [ True True True]]
# Compare with column vector
col = np.array([[3], [6], [9]])
result = matrix < col
print(result)
# [[ True True False]
# [ True True False]
# [ True True False]]
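The column vector above does not have to be written out by hand: indexing a 1D array with np.newaxis (or calling reshape(-1, 1)) turns shape (3,) into (3, 1). Comparing a column against a row then broadcasts to a full "every a against every b" table. The arrays `limits`, `a`, and `b` are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
limits = np.array([3, 6, 9])

# (3,) -> (3, 1): equivalent to np.array([[3], [6], [9]])
col = limits[:, np.newaxis]
print(col.shape)  # (3, 1)
print(np.array_equal(matrix < col, matrix < limits.reshape(-1, 1)))  # True

# (3, 1) against (4,) broadcasts to (3, 4): each a compared with each b
a = np.array([1, 5, 9])
b = np.array([0, 4, 8, 12])
table = a[:, np.newaxis] > b
print(table.shape)  # (3, 4)
print(table)
# [[ True False False False]
#  [ True  True False False]
#  [ True  True  True False]]

# np.broadcast_shapes reports the result shape without computing anything
print(np.broadcast_shapes((3, 1), (4,)))  # (3, 4)
```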
Broadcasting with scalars is the most common pattern:
data = np.array([[10, 20, 30],
                 [40, 50, 60]])
# Scalar broadcasts to all elements
threshold = 35
mask = data > threshold
print(mask)
# [[False False False]
# [ True True True]]
Boolean Indexing with Comparison Results
Boolean arrays from comparisons serve as masks for filtering data, enabling concise conditional selection without explicit loops.
temperatures = np.array([72, 85, 91, 68, 77, 95, 88])
# Select temperatures above 80
hot_days = temperatures[temperatures > 80]
print(hot_days) # [85 91 95 88]
# Multiple conditions with logical operators
comfortable = temperatures[(temperatures >= 70) & (temperatures <= 85)]
print(comfortable) # [72 85 77]
# 2D boolean indexing
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
# Get all elements greater than 5
print(data[data > 5]) # [6 7 8 9]
# Replace values conditionally
data_copy = data.copy()
data_copy[data_copy < 5] = 0
print(data_copy)
# [[0 0 0]
# [0 5 6]
# [7 8 9]]
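Reading and writing through a mask behave differently: `data[mask]` returns a new 1D array holding copies of the selected elements, while `data[mask] = value` writes into the original array. That is why the example above works on `data_copy` when it wants to keep `data` intact. A minimal check:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Reading through a mask returns a flattened copy
selected = data[data > 5]
selected[:] = -1   # changes only the copy
print(data[2])     # [7 8 9]  (original unchanged)

# Assigning through a mask modifies the original array in place
data[data > 5] = -1
print(data)
# [[ 1  2  3]
#  [ 4  5 -1]
#  [-1 -1 -1]]
```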
Conditional Operations with np.where()
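np.where() provides vectorized if-else logic: for each element it takes the value from the second argument where the condition is True and from the third where it is False. Either value argument can be an array or a scalar.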
scores = np.array([45, 78, 92, 65, 88, 54])
# Ternary operation: pass if >= 60, fail otherwise
results = np.where(scores >= 60, 'Pass', 'Fail')
print(results)
# ['Fail' 'Pass' 'Pass' 'Pass' 'Pass' 'Fail']
# Numerical transformation
adjusted = np.where(scores < 60, scores + 10, scores)
print(adjusted) # [55 78 92 65 88 64]
# Multiple conditions using nested np.where()
grades = np.where(scores >= 90, 'A',
np.where(scores >= 80, 'B',
np.where(scores >= 70, 'C', 'D')))
print(grades) # ['D' 'C' 'A' 'D' 'B' 'D']
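Unlike boolean indexing, which returns only the selected elements as a 1D array, the three-argument form of np.where() keeps the (broadcast) shape of the condition. One caveat worth knowing: both value arguments are fully evaluated before the selection, so np.where() does not stop an expression such as a division by zero from being computed; the sketch silences that warning with np.errstate.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Keep values above 5, replace the rest with 0; the 3x3 shape is preserved
print(np.where(matrix > 5, matrix, 0))
# [[0 0 0]
#  [0 0 6]
#  [7 8 9]]

# Boolean indexing, by contrast, drops the 2D structure
print(matrix[matrix > 5])  # [6 7 8 9]

# 1.0 / denominators is computed for every element, including the zero
denominators = np.array([2.0, 0.0, 4.0])
with np.errstate(divide="ignore"):
    safe = np.where(denominators != 0, 1.0 / denominators, 0.0)
print(safe)  # [0.5  0.   0.25]
```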
For complex multi-condition scenarios, use np.select():
conditions = [
scores >= 90,
(scores >= 80) & (scores < 90),
(scores >= 70) & (scores < 80),
scores < 70
]
choices = ['A', 'B', 'C', 'D']
grades = np.select(conditions, choices, default='') # string default to match the string choices
print(grades) # ['D' 'C' 'A' 'D' 'B' 'D']
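np.select() checks the conditions in order and uses the choice of the first condition that is True; elements matching none get `default`. Because of this first-match rule, the upper bounds in the ranges above are redundant, and the lowest grade can come from `default` instead of a final condition. A shorter equivalent, as a sketch:

```python
import numpy as np

scores = np.array([45, 78, 92, 65, 88, 54])

# Ordered from highest to lowest: the first True condition wins
conditions = [scores >= 90, scores >= 80, scores >= 70]
choices = ['A', 'B', 'C']

# Scores that match no condition fall through to the default
grades = np.select(conditions, choices, default='D')
print(grades)  # ['D' 'C' 'A' 'D' 'B' 'D']
```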
Aggregating Boolean Arrays
Use np.any() and np.all() to reduce boolean arrays to single values, with optional axis specification for multi-dimensional arrays.
data = np.array([1, 2, 3, 4, 5])
print(np.any(data > 4)) # True (at least one element > 4)
print(np.all(data > 0)) # True (all elements > 0)
print(np.all(data > 2)) # False (not all elements > 2)
# Count True values
print(np.sum(data > 2)) # 3 (True=1, False=0)
print(np.count_nonzero(data > 2)) # 3 (alternative)
# Multi-dimensional aggregation
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
# Check along axes
print(np.any(matrix > 5, axis=0)) # [ True True True] (every column has a value > 5)
print(np.all(matrix > 0, axis=1)) # [ True True True]
# Find positions of True values
indices = np.where(matrix > 5)
print(indices) # (array([1, 2, 2, 2]), array([2, 0, 1, 2]))
print(list(zip(indices[0].tolist(), indices[1].tolist()))) # [(1, 2), (2, 0), (2, 1), (2, 2)]
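Because True counts as 1 and False as 0, a few other reductions fall out naturally: the mean of a mask is the fraction of matching elements, np.count_nonzero() accepts `axis=` for per-row or per-column counts, and np.argwhere() returns the matching positions as one (row, col) pair per row instead of separate index arrays.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Fraction of elements greater than 2 (3 out of 5)
print(np.mean(data > 2))  # 0.6

# Number of elements greater than 5 in each column
print(np.count_nonzero(matrix > 5, axis=0))  # [1 1 2]

# One (row, col) pair per True element
print(np.argwhere(matrix > 5))
# [[1 2]
#  [2 0]
#  [2 1]
#  [2 2]]
```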
Performance Considerations
NumPy comparisons vastly outperform Python loops because the element loop runs in compiled C code rather than in the interpreter, one Python object at a time.
import time
# Large dataset: one million random integers in [0, 100)
large_array = np.random.randint(0, 100, size=1000000)
large_list = large_array.tolist() # convert outside the timed region
# NumPy vectorized approach
start = time.perf_counter()
result_np = large_array > 50
numpy_time = time.perf_counter() - start
# Python list comprehension approach
start = time.perf_counter()
result_py = [x > 50 for x in large_list]
python_time = time.perf_counter() - start
print(f"NumPy: {numpy_time:.6f}s")
print(f"Python: {python_time:.6f}s")
print(f"Speedup: {python_time/numpy_time:.1f}x")
# Example output (varies by machine): NumPy: 0.001s, Python: 0.08s, Speedup: 80x
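A single measurement of a sub-millisecond operation is noisy. A sketch using the standard-library timeit module, which repeats each statement and lets you keep the best run, tends to give more stable numbers. The helper `best_seconds` and its repeat/loop counts are hypothetical choices for illustration, not a prescribed benchmark setup.

```python
import timeit

import numpy as np

large_array = np.random.randint(0, 100, size=1_000_000)
large_list = large_array.tolist()

def best_seconds(stmt, repeat=3, number=3):
    """Hypothetical helper: best average seconds per call over `repeat` rounds."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number)) / number

numpy_s = best_seconds(lambda: large_array > 50)
python_s = best_seconds(lambda: [x > 50 for x in large_list])

print(f"NumPy:   {numpy_s * 1e3:.3f} ms")
print(f"Python:  {python_s * 1e3:.3f} ms")
print(f"Speedup: {python_s / numpy_s:.0f}x")

# Sanity check: both approaches agree element by element
same = (large_array > 50).tolist() == [x > 50 for x in large_list]
print(same)  # True
```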
Memory efficiency matters for boolean indexing:
# Memory-efficient: boolean array is smaller than data
data = np.random.rand(1000000)
mask = data > 0.5 # bool array: 1 byte per element
filtered = data[mask] # Copies only the selected elements into a new array
# Less efficient: np.where() first builds an extra integer index array
# Avoid: data[np.where(data > 0.5)[0]]
# Prefer: data[data > 0.5]
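The memory claims above can be checked with the `nbytes` attribute. This sketch uses deterministic data instead of random numbers so the sizes are reproducible; the index array produced by np.where() has the platform integer type (int64 on typical 64-bit systems).

```python
import numpy as np

# One million float64 values in [0, 1)
data = np.arange(1_000_000) / 1_000_000
mask = data > 0.5

print(data.nbytes)  # 8000000  (8 bytes per float64)
print(mask.nbytes)  # 1000000  (1 byte per bool)

# np.where(...)[0] allocates an integer index array on top of the mask
idx = np.where(mask)[0]
print(idx.dtype, idx.nbytes)

# Both forms select exactly the same elements
print(np.array_equal(data[mask], data[idx]))  # True
```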
Combine comparisons with the element-wise logical operators &, |, and ~, wrapping each comparison in parentheses:
values = np.array([15, 25, 35, 45, 55])
# Correct: use parentheses with bitwise operators
valid = (values > 20) & (values < 50)
print(values[valid]) # [25 35 45]
# Wrong: & binds more tightly than comparisons, so this parses as the chained
# comparison values > (20 & values) < 50 and raises ValueError
# valid = values > 20 & values < 50
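The other operators follow the same pattern: | is element-wise OR, and ~ acts as element-wise NOT on boolean arrays (on integer arrays it is bitwise inversion). np.logical_and() and np.logical_or() sidestep the precedence issue entirely. Python's `and` keyword and chained comparisons such as `20 < values < 50` both need a single truth value, which a multi-element array cannot provide, so they raise ValueError.

```python
import numpy as np

values = np.array([15, 25, 35, 45, 55])

# OR and NOT on boolean masks
outside = (values < 20) | (values > 50)
print(values[outside])   # [15 55]
print(values[~outside])  # [25 35 45]

# Function form: no parentheses needed
inside = np.logical_and(values > 20, values < 50)
print(values[inside])    # [25 35 45]

# `and` and chained comparisons call bool() on an array and fail
errors = []
for attempt in (lambda: (values > 20) and (values < 50),
                lambda: 20 < values < 50):
    try:
        attempt()
    except ValueError as err:
        errors.append(err)
print(len(errors), "ValueErrors")  # 2 ValueErrors
```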
NumPy comparison operators form the foundation for data filtering, conditional transformations, and logical operations in scientific computing. Master these patterns to write efficient, readable array-processing code.