NumPy - np.array_equal() - Compare Arrays
Key Insights
• np.array_equal() performs element-wise comparison and returns a single boolean, unlike == which returns an array of booleans
• By default, NaN values compare as unequal, just as in standard floating-point equality; pass equal_nan=True to treat NaN values in matching positions as equal
• Broadcasting rules don’t apply; arrays must have identical shapes for array_equal() to return True
Understanding np.array_equal() vs Standard Comparison
NumPy’s array_equal() function provides a cleaner way to compare entire arrays than using the equality operator. When you use == between two arrays, you get an element-wise comparison that returns a boolean array. With array_equal(), you get a single True or False indicating whether the arrays are identical.
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([1, 2, 3, 4])
arr3 = np.array([1, 2, 3, 5])
# Standard equality operator
print(arr1 == arr2) # [True True True True]
print(arr1 == arr3) # [True True True False]
# Using np.array_equal()
print(np.array_equal(arr1, arr2)) # True
print(np.array_equal(arr1, arr3)) # False
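The common alternative, (arr1 == arr2).all(), gives the same answer when the shapes match, but it is more verbose and relies on broadcasting. With mismatched but broadcast-compatible shapes it can return True, and with incompatible shapes it fails or warns, depending on the NumPy version. The array_equal() function checks the shapes first and states the intent of the comparison more clearly.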
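The difference between the two approaches shows up when the shapes differ but are broadcast-compatible. The following is a minimal sketch with array values chosen purely for illustration:

```python
import numpy as np

a = np.array([1, 1, 1])
b = np.array([1])  # shape (1,), broadcasts against shape (3,)

# Broadcasting stretches b to [1, 1, 1] before comparing, so every element matches.
broadcast_result = bool((a == b).all())

# array_equal() compares the shapes first, so the mismatch is reported.
strict_result = np.array_equal(a, b)

print(broadcast_result)  # True
print(strict_result)     # False
```

If a shape mismatch should count as "not equal", array_equal() is the safer of the two.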
Shape and Dtype Considerations
Arrays must have identical shapes for array_equal() to return True. Unlike broadcasting operations, this function performs strict shape checking before comparing values.
import numpy as np
# Different shapes
arr1 = np.array([1, 2, 3])
arr2 = np.array([[1, 2, 3]])
print(arr1.shape) # (3,)
print(arr2.shape) # (1, 3)
print(np.array_equal(arr1, arr2)) # False
# Same values, different shapes
arr3 = np.array([[1, 2], [3, 4]])
arr4 = np.array([1, 2, 3, 4])
print(np.array_equal(arr3, arr4)) # False
Dtypes themselves are not compared. Values are compared after NumPy applies its standard type promotion rules, so arrays with different dtypes can still be equal:
import numpy as np
arr_int = np.array([1, 2, 3], dtype=np.int32)
arr_float = np.array([1.0, 2.0, 3.0], dtype=np.float64)
# Values are equal after type promotion
print(np.array_equal(arr_int, arr_float)) # True
# But with different values
arr_float2 = np.array([1.1, 2.0, 3.0], dtype=np.float64)
print(np.array_equal(arr_int, arr_float2)) # False
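When the dtype must match as well, it has to be checked separately. The sketch below uses a hypothetical helper, array_equal_strict, which is not part of NumPy and is shown only as one possible approach:

```python
import numpy as np

def array_equal_strict(a, b):
    """Hypothetical helper: equal values, equal shapes, and equal dtypes."""
    a, b = np.asarray(a), np.asarray(b)
    return a.dtype == b.dtype and np.array_equal(a, b)

ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([1.0, 2.0, 3.0], dtype=np.float64)

print(np.array_equal(ints, floats))           # True
print(array_equal_strict(ints, floats))       # False
print(array_equal_strict(ints, ints.copy()))  # True
```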
Handling NaN Values
NaN (Not a Number) values create special cases in equality comparisons. By default, NaN != NaN in floating-point arithmetic, which means arrays containing NaN values won’t be equal even if NaN appears in the same positions.
import numpy as np
arr1 = np.array([1.0, 2.0, np.nan, 4.0])
arr2 = np.array([1.0, 2.0, np.nan, 4.0])
# Default behavior: NaN != NaN
print(np.array_equal(arr1, arr2)) # False
# Treat NaN as equal
print(np.array_equal(arr1, arr2, equal_nan=True)) # True
This behavior is critical when working with real-world datasets that contain missing values:
import numpy as np
# Simulating sensor data with missing readings
sensor1 = np.array([23.5, 24.1, np.nan, 25.3, np.nan])
sensor2 = np.array([23.5, 24.1, np.nan, 25.3, np.nan])
# Verify sensors recorded identical patterns including missing data
if np.array_equal(sensor1, sensor2, equal_nan=True):
    print("Sensors show identical patterns")
else:
    print("Sensors diverge")
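Note that equal_nan=True only treats NaN values as equal when they occupy the same positions. A short sketch with made-up values (the equal_nan parameter requires NumPy 1.19 or later):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])
c = np.array([np.nan, 1.0, 3.0])  # same values, NaN in a different position

same_positions = np.array_equal(a, b, equal_nan=True)
different_positions = np.array_equal(a, c, equal_nan=True)

print(same_positions)       # True
print(different_positions)  # False
```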
Multidimensional Array Comparisons
The function works seamlessly with arrays of any dimensionality, comparing all elements regardless of the array structure:
import numpy as np
# 2D arrays
matrix1 = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])
matrix2 = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])
matrix3 = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 0]])
print(np.array_equal(matrix1, matrix2)) # True
print(np.array_equal(matrix1, matrix3)) # False
# 3D arrays
tensor1 = np.random.rand(3, 4, 5)
tensor2 = tensor1.copy()
tensor3 = np.random.rand(3, 4, 5)
print(np.array_equal(tensor1, tensor2)) # True
print(np.array_equal(tensor1, tensor3)) # False
Performance Characteristics
For arrays of the same shape, array_equal() does not stop at the first mismatch. In current NumPy implementations it returns early only when the shapes differ; otherwise it evaluates the full element-wise comparison and then reduces the result with all(). The position of the first difference therefore has little effect on run time, which the following timing illustrates:
import numpy as np
import time
# Large arrays with early difference
size = 10_000_000
arr1 = np.ones(size)
arr2 = np.ones(size)
arr2[0] = 0 # Difference at the start
start = time.perf_counter()
result = np.array_equal(arr1, arr2)
elapsed = time.perf_counter() - start
print(f"Early difference: {elapsed:.6f} seconds") # Timing depends on the machine
# Large arrays with late difference
arr3 = np.ones(size)
arr4 = np.ones(size)
arr4[-1] = 0 # Difference at the end
start = time.perf_counter()
result = np.array_equal(arr3, arr4)
elapsed = time.perf_counter() - start
print(f"Late difference: {elapsed:.6f} seconds") # Typically in the same range as the early case
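When an early exit matters for very large arrays, one option is to compare in chunks so that the work stops at the first block that differs. The sketch below is an illustration only: array_equal_chunked and its chunk_size parameter are hypothetical names, not part of NumPy.

```python
import numpy as np

def array_equal_chunked(a, b, chunk_size=1_000_000):
    """Hypothetical helper: exact equality with an early exit per chunk."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return False
    # reshape(-1) gives a 1-D view when possible (a copy otherwise).
    flat_a, flat_b = a.reshape(-1), b.reshape(-1)
    for start in range(0, flat_a.size, chunk_size):
        stop = start + chunk_size
        if not np.array_equal(flat_a[start:stop], flat_b[start:stop]):
            return False  # stop at the first chunk that differs
    return True

x = np.arange(10)
y = x.copy()
y[0] = -1  # difference in the first chunk

print(array_equal_chunked(x, x.copy(), chunk_size=3))  # True
print(array_equal_chunked(x, y, chunk_size=3))         # False
```

Whether this is faster in practice depends on array size, chunk size, and where the arrays differ, so it is worth measuring before adopting it.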
Practical Use Cases
Validating Test Results
import numpy as np
def matrix_multiply(A, B):
    """Custom matrix multiplication implementation"""
    return A @ B
# Test against known result
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
expected = np.array([[19, 22], [43, 50]])
result = matrix_multiply(A, B)
assert np.array_equal(result, expected), "Matrix multiplication failed"
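In test suites, np.testing.assert_array_equal is a related option: on failure it raises an AssertionError whose message describes the mismatch. Its comparison rules differ slightly from array_equal() (for example, it treats NaN values in matching positions as equal), so the sketch below is an alternative rather than a drop-in replacement. The wrong array is made up for illustration:

```python
import numpy as np

expected = np.array([[19, 22], [43, 50]])
wrong = np.array([[19, 22], [43, 51]])  # one element deliberately off

# Passes silently when the arrays match.
np.testing.assert_array_equal(expected, expected.copy())

# On a mismatch, an AssertionError with a descriptive message is raised.
try:
    np.testing.assert_array_equal(wrong, expected)
    raised = False
except AssertionError as exc:
    raised = True
    message = str(exc)

print(raised)  # True
```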
Checking Data Integrity
import numpy as np
def verify_data_integrity(original, processed):
    """Ensure processing didn't corrupt data structure"""
    # Shapes are plain tuples, so they can be compared directly
    if original.shape != processed.shape:
        raise ValueError("Shape mismatch after processing")
    # Check for unexpected NaN introduction
    original_nan_mask = np.isnan(original)
    processed_nan_mask = np.isnan(processed)
    if not np.array_equal(original_nan_mask, processed_nan_mask):
        raise ValueError("NaN pattern changed during processing")
# Example usage
data = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])
processed_data = data * 2 # Some processing
verify_data_integrity(data, processed_data)
Caching and Memoization
import numpy as np
class ArrayCache:
    def __init__(self):
        # Arrays are not hashable, so store (key_array, result) pairs in a list
        self.entries = []
    def get(self, key_array):
        """Retrieve cached result for array key (linear scan over entries)"""
        for cached_key, result in self.entries:
            if np.array_equal(key_array, cached_key, equal_nan=True):
                return result
        return None
    def set(self, key_array, result):
        """Store result with a copy of the array key"""
        # Copy the key so later in-place changes to the input don't alter the cache
        self.entries.append((key_array.copy(), result))
# Usage
cache = ArrayCache()
input_data = np.array([1, 2, 3, 4, 5])
# First computation
result = input_data.sum()
cache.set(input_data, result)
# Retrieve from cache
cached_result = cache.get(input_data)
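If lookups need to be faster than a linear scan, one alternative is to derive a hashable key from the array and use an ordinary dictionary. The sketch below assumes a hypothetical helper, array_key, that combines shape, dtype, and raw bytes:

```python
import numpy as np

def array_key(arr):
    """Hypothetical helper: build a hashable key from shape, dtype, and raw bytes."""
    arr = np.asarray(arr)
    return (arr.shape, arr.dtype.str, arr.tobytes())

cache = {}
data = np.array([1, 2, 3, 4, 5])
cache[array_key(data)] = data.sum()

# An array with the same shape, dtype, and values maps to the same key.
hit = cache.get(array_key(np.array([1, 2, 3, 4, 5])))
# A different value produces a different key.
miss = cache.get(array_key(np.array([1, 2, 3, 4, 6])))

print(hit)   # 15
print(miss)  # None
```

This key is stricter than array_equal(): arrays with equal values but different dtypes produce different keys, as do values that compare equal but have different byte patterns, such as 0.0 and -0.0.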
Comparison with Alternatives
While array_equal() is the standard choice for exact equality, NumPy provides related functions for different comparison needs:
import numpy as np
arr1 = np.array([1.0, 2.0, 3.0])
arr2 = np.array([1.0000001, 2.0, 3.0])
# Exact equality
print(np.array_equal(arr1, arr2)) # False
# Close equality (floating-point tolerance)
print(np.allclose(arr1, arr2)) # True (default rtol=1e-5)
# Array equivalence (allows broadcasting)
print(np.array_equiv(arr1, arr2)) # False
# Element-wise comparison with tolerance
print(np.isclose(arr1, arr2)) # [True True True]
Choose array_equal() when you need exact equality, allclose() when floating-point rounding error makes exact equality too strict, and array_equiv() when broadcasting semantics are acceptable.
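The broadcasting behavior of array_equiv() is easiest to see with arrays of different shapes. A minimal sketch with illustrative values:

```python
import numpy as np

row = np.array([1, 2])
stacked = np.array([[1, 2], [1, 2]])  # the same row repeated twice

# Shapes differ, so exact equality fails.
equal_result = np.array_equal(row, stacked)
# row broadcasts to match stacked, so the arrays are equivalent.
equiv_result = np.array_equiv(row, stacked)

print(equal_result)  # False
print(equiv_result)  # True
```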