NumPy - Comparison Operators (==, !=, <, >, <=, >=)

Key Insights

NumPy comparison operators perform element-wise comparisons and return boolean arrays, enabling vectorized conditional logic that is often one to two orders of magnitude faster than equivalent Python loops
Broadcasting rules allow comparing arrays of different shapes, automatically expanding dimensions to match compatible arrays without copying data
Boolean arrays from comparisons integrate directly with indexing, np.where(), np.any(), and np.all() for powerful data filtering and conditional operations

Element-Wise Comparison Basics

NumPy’s comparison operators (==, !=, <, >, <=, >=) work element-by-element on arrays, returning boolean arrays of the same shape. Unlike Python’s built-in operators that return single boolean values for lists, NumPy operators vectorize the comparison across all elements.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Element-wise comparisons
print(arr > 3)        # [False False False  True  True]
print(arr == 3)       # [False False  True False False]
print(arr <= 2)       # [ True  True False False False]
print(arr != 4)       # [ True  True  True False  True]

This works with multi-dimensional arrays identically:

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

result = matrix >= 5
print(result)
# [[False False False]
#  [False  True  True]
#  [ True  True  True]]

print(result.dtype)  # bool
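To make the contrast with plain Python lists concrete, here is a minimal sketch that compares the same values as a list and as an array. The variable names are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4, 5]           # plain Python list
arr = np.array(values)             # same data as a NumPy array
other = [1, 2, 0, 4, 5]

# List equality collapses to one boolean for the whole list
list_equal = values == other
print(list_equal)                  # False

# Array equality reports a result for every position
array_equal = arr == np.array(other)
print(array_equal)                 # [ True  True False  True  True]

# Ordering a list against a scalar is not defined in Python 3
try:
    values > 3
    list_vs_scalar = "no error"
except TypeError:
    list_vs_scalar = "TypeError"
print(list_vs_scalar)              # TypeError
```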

Comparing Arrays with Arrays

When comparing two arrays, NumPy performs element-wise comparison at matching positions. Arrays must have compatible shapes (same shape or broadcastable).

arr1 = np.array([10, 20, 30, 40])
arr2 = np.array([15, 20, 25, 50])

print(arr1 > arr2)   # [False False  True False]
print(arr1 == arr2)  # [False  True False False]

# Multi-dimensional comparison
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[1, 3], [2, 4]])

print(mat1 < mat2)
# [[False  True]
#  [False False]]

For floating-point comparisons with tolerance, use np.isclose() or np.allclose():

arr1 = np.array([1.0, 2.0, 3.0])
arr2 = np.array([1.0000001, 2.0, 3.0000001])

# Exact comparison fails wherever the values differ, however slightly
print(arr1 == arr2)  # [False  True False]

# Use isclose for tolerance-based comparison
print(np.isclose(arr1, arr2))  # [ True  True  True]
print(np.allclose(arr1, arr2))  # True (single boolean)
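The tolerance used by np.isclose() is controlled by its rtol and atol parameters; the documented test is |a - b| <= atol + rtol * |b|. The sketch below uses made-up measurement values to show how each parameter affects the result, and why rtol alone does not help when the reference value is zero.

```python
import numpy as np

# Illustrative values, not taken from any real data set
measured = np.array([1.0, 100.0, 1e-7])
expected = np.array([1.001, 100.1, 0.0])

# Defaults (rtol=1e-05, atol=1e-08) reject differences of about 0.1%
default_result = np.isclose(measured, expected)
print(default_result)    # [False False False]

# A 1% relative tolerance accepts the first two pairs, but not the
# comparison against zero, where rtol * |b| contributes nothing
loose_result = np.isclose(measured, expected, rtol=1e-2)
print(loose_result)      # [ True  True False]

# The same test written out by hand
manual = np.abs(measured - expected) <= 1e-8 + 1e-2 * np.abs(expected)

# Raising atol handles values that should be treated as zero
with_atol = np.isclose(measured, expected, rtol=1e-2, atol=1e-6)
print(with_atol)         # [ True  True  True]
```

Note that the formula is not symmetric in a and b, because only |b| scales the relative tolerance.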

Broadcasting in Comparisons

Broadcasting allows comparing arrays of different shapes by virtually stretching the smaller operand along its missing or length-1 dimensions. This removes the need for explicit loops and avoids materializing the expanded operand in memory.

# Compare 2D array with 1D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

row = np.array([2, 5, 8])

# Broadcasting: row is compared against each row of matrix
result = matrix > row
print(result)
# [[False False False]
#  [ True False False]
#  [ True  True  True]]

# Compare with column vector
col = np.array([[3], [6], [9]])
result = matrix < col
print(result)
# [[ True  True False]
#  [ True  True False]
#  [ True  True False]]

Broadcasting with scalars is the most common pattern:

data = np.array([[10, 20, 30],
                 [40, 50, 60]])

# Scalar broadcasts to all elements
threshold = 35
mask = data > threshold
print(mask)
# [[False False False]
#  [ True  True  True]]
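Broadcasting only works when the shapes are compatible. As a rough sketch of what happens otherwise, and of how an extra axis turns two 1D arrays into an all-pairs comparison, consider the following. The arrays a and b are illustrative.

```python
import numpy as np

a = np.array([1, 5, 9])        # shape (3,)
b = np.array([2, 4, 6, 8])     # shape (4,)

# Shapes (3,) and (4,) cannot be broadcast against each other
try:
    a < b
    mismatch = "no error"
except ValueError:
    mismatch = "ValueError"
print(mismatch)                # ValueError

# With an added axis the shapes are (3, 1) and (4,), which broadcast to (3, 4):
# every element of a is compared with every element of b
pairwise = a[:, np.newaxis] < b
print(pairwise.shape)          # (3, 4)
print(pairwise)
# [[ True  True  True  True]
#  [False False  True  True]
#  [False False False False]]
```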

Boolean Indexing with Comparison Results

Boolean arrays from comparisons serve as masks for filtering data, enabling concise conditional selection without explicit loops.

temperatures = np.array([72, 85, 91, 68, 77, 95, 88])

# Select temperatures above 80
hot_days = temperatures[temperatures > 80]
print(hot_days)  # [85 91 95 88]

# Multiple conditions with logical operators
comfortable = temperatures[(temperatures >= 70) & (temperatures <= 85)]
print(comfortable)  # [72 85 77]

# 2D boolean indexing
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Get all elements greater than 5
print(data[data > 5])  # [6 7 8 9]

# Replace values conditionally
data_copy = data.copy()
data_copy[data_copy < 5] = 0
print(data_copy)
# [[0 0 0]
#  [0 5 6]
#  [7 8 9]]
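Masks can also select whole rows, and they combine with | (or) and ~ (not) as well as &. The following sketch uses a small made-up readings array to illustrate both ideas.

```python
import numpy as np

# Hypothetical readings: one row per site, one column per day
readings = np.array([[72, 85, 91],
                     [68, 70, 66],
                     [95, 88, 79]])

# One boolean per row: does the row contain any value above 90?
row_mask = (readings > 90).any(axis=1)
print(row_mask)                 # [ True False  True]

# A 1D mask applied to the first axis keeps entire rows
hot_rows = readings[row_mask]
print(hot_rows)
# [[72 85 91]
#  [95 88 79]]

# | selects elements matching either condition
extremes = readings[(readings < 70) | (readings > 90)]
print(extremes)                 # [91 68 66 95]

# ~ inverts a mask
moderate = readings[~((readings < 70) | (readings > 90))]
print(moderate)                 # [72 85 70 88 79]
```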

Conditional Operations with np.where()

np.where() provides vectorized if-else logic, selecting values from two arrays based on a condition.

scores = np.array([45, 78, 92, 65, 88, 54])

# Ternary operation: pass if >= 60, fail otherwise
results = np.where(scores >= 60, 'Pass', 'Fail')
print(results)
# ['Fail' 'Pass' 'Pass' 'Pass' 'Pass' 'Fail']

# Numerical transformation
adjusted = np.where(scores < 60, scores + 10, scores)
print(adjusted)  # [55 78 92 65 88 64]

# Multiple conditions using nested np.where()
grades = np.where(scores >= 90, 'A',
                  np.where(scores >= 80, 'B',
                          np.where(scores >= 70, 'C', 'D')))
print(grades)  # ['D' 'C' 'A' 'D' 'B' 'D']

For complex multi-condition scenarios, use np.select():

conditions = [
    scores >= 90,
    (scores >= 80) & (scores < 90),
    (scores >= 70) & (scores < 80),
    scores < 70
]
choices = ['A', 'B', 'C', 'D']

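# Pass a string default: the implicit default of 0 cannot be combined
# with string choices on recent NumPy versions
grades = np.select(conditions, choices, default='')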
print(grades)  # ['D' 'C' 'A' 'D' 'B' 'D']
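Because np.select() uses the first condition that matches each element, the upper bounds in the conditions can be dropped, and the default parameter can stand in for the final catch-all branch. A minimal sketch of that variant, reusing the same scores:

```python
import numpy as np

scores = np.array([45, 78, 92, 65, 88, 54])

# Conditions are checked in order and the first match wins,
# so each branch only needs its lower bound
conditions = [scores >= 90, scores >= 80, scores >= 70]
choices = ['A', 'B', 'C']

# default fills every position where no condition matched
grades = np.select(conditions, choices, default='D')
print(grades)  # ['D' 'C' 'A' 'D' 'B' 'D']
```

This relies on ordering the conditions from most to least restrictive; reversing them would assign 'C' to every score of 70 or more.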

Aggregating Boolean Arrays

Use np.any() and np.all() to reduce boolean arrays to single values, with optional axis specification for multi-dimensional arrays.

data = np.array([1, 2, 3, 4, 5])

print(np.any(data > 4))   # True (at least one element > 4)
print(np.all(data > 0))   # True (all elements > 0)
print(np.all(data > 2))   # False (not all elements > 2)

# Count True values
print(np.sum(data > 2))   # 3 (True=1, False=0)
print(np.count_nonzero(data > 2))  # 3 (alternative)

# Multi-dimensional aggregation
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Check along axes
print(np.any(matrix > 5, axis=0))  # [ True  True  True]
print(np.all(matrix > 0, axis=1))  # [ True  True  True]

# Find positions of True values
indices = np.where(matrix > 5)
print(indices)  # (array([1, 2, 2, 2]), array([2, 0, 1, 2]))
# tolist() converts NumPy integers to plain Python ints for a clean printout
print(list(zip(indices[0].tolist(), indices[1].tolist())))  # [(1, 2), (2, 0), (2, 1), (2, 2)]
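Since True counts as 1, the same mask can be summed along an axis or averaged to get a proportion. As a short sketch, with np.argwhere() shown as one alternative way to list coordinates:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
mask = matrix > 5

# Count matches per row and per column
per_row = mask.sum(axis=1)
per_col = mask.sum(axis=0)
print(per_row)              # [0 1 3]
print(per_col)              # [1 1 2]

# The mean of a boolean array is the fraction of True values
fraction = mask.mean()
print(round(fraction, 3))   # 0.444

# argwhere returns one (row, column) pair per True value
coords = np.argwhere(mask)
print(coords.tolist())      # [[1, 2], [2, 0], [2, 1], [2, 2]]
```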

Performance Considerations

NumPy comparisons vastly outperform Python loops due to vectorization and C-level implementation.

import time

# Large dataset
large_array = np.random.randint(0, 100, size=1000000)

# NumPy vectorized approach
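start = time.perf_counter()  # perf_counter is the clock intended for short timings
result_np = large_array > 50
numpy_time = time.perf_counter() - start

# Python list comprehension approach (the timing includes the tolist() conversion)
start = time.perf_counter()
result_py = [x > 50 for x in large_array.tolist()]
python_time = time.perf_counter() - start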

print(f"NumPy: {numpy_time:.6f}s")
print(f"Python: {python_time:.6f}s")
print(f"Speedup: {python_time/numpy_time:.1f}x")
# Example output (varies by machine and NumPy version): NumPy: 0.001s, Python: 0.08s, Speedup: 80x
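A single timed run can be noisy. One possible refinement, sketched here with the standard-library timeit module, repeats each measurement and keeps the best time. The array size, seed, and repeat counts are arbitrary choices for illustration, and the resulting speedup will differ between machines.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
large_array = rng.integers(0, 100, size=100_000)
as_list = large_array.tolist()   # converted once, outside the timed code

runs = 20  # calls per measurement

# Take the minimum over several repeats to reduce noise from other processes
numpy_time = min(timeit.repeat(lambda: large_array > 50, number=runs, repeat=5))
python_time = min(timeit.repeat(lambda: [x > 50 for x in as_list], number=runs, repeat=5))

print(f"NumPy:  {numpy_time / runs:.6f}s per call")
print(f"Python: {python_time / runs:.6f}s per call")
print(f"Speedup: {python_time / numpy_time:.1f}x")
```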

Memory efficiency matters for boolean indexing:

# Memory-efficient: boolean array is smaller than data
data = np.random.rand(1000000)
mask = data > 0.5  # bool array: 1 byte per element
filtered = data[mask]  # Creates new array only for True values

# Less efficient: creating intermediate arrays
# Avoid: data[np.where(data > 0.5)[0]]
# Prefer: data[data > 0.5]
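The size difference can be checked directly with the nbytes attribute. The sketch below assumes the default float64 dtype for the data; the size of the index array depends on the platform's integer width.

```python
import numpy as np

data = np.random.default_rng(0).random(1_000_000)   # float64 values in [0, 1)
mask = data > 0.5

print(data.dtype, data.nbytes)    # float64 8000000
print(mask.dtype, mask.nbytes)    # bool 1000000

# The index-array route builds an extra integer array, one entry per True value
index_array = np.flatnonzero(mask)
print(index_array.dtype, index_array.nbytes)

# Both routes select the same elements
filtered = data[mask]
same = np.array_equal(filtered, data[index_array])
print(same)                       # True
```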

Combine comparisons with the element-wise operators (&, |, ~), wrapping each comparison in parentheses:

values = np.array([15, 25, 35, 45, 55])

# Correct: use parentheses with bitwise operators
valid = (values > 20) & (values < 50)
print(values[valid])  # [25 35 45]

# Wrong: & binds more tightly than the comparison operators, so this parses as
# values > (20 & values) < 50 and raises ValueError
# valid = values > 20 & values < 50
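To see the failure in practice, the sketch below runs the unparenthesized expression inside a try block. In Python, & has higher precedence than the comparison operators, so the expression becomes a chained comparison that needs a single truth value for a whole array. np.logical_and() is shown as an alternative spelling that sidesteps the precedence question.

```python
import numpy as np

values = np.array([15, 25, 35, 45, 55])

# Parsed as: values > (20 & values) < 50, a chained comparison that
# calls bool() on an array and therefore fails
try:
    values > 20 & values < 50
    outcome = "no error"
except ValueError:
    outcome = "ValueError"
print(outcome)                 # ValueError

# Parenthesized form and the function form give the same mask
with_parens = (values > 20) & (values < 50)
with_function = np.logical_and(values > 20, values < 50)
print(values[with_function])   # [25 35 45]
```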

NumPy comparison operators form the foundation for data filtering, conditional transformations, and logical operations in scientific computing. Master these patterns to write efficient, readable array-processing code.