NumPy - np.nonzero() - Find Non-Zero Elements
Key Insights
• np.nonzero() returns a tuple of arrays containing indices where elements are non-zero, with one array per dimension
• The function treats any non-zero value (including negative numbers, NaN, and infinity) as True, making it useful for conditional indexing
• Combining np.nonzero() with boolean conditions enables powerful filtering operations without explicit loops
Understanding np.nonzero() Fundamentals
The np.nonzero() function identifies the positions of all non-zero elements in an array. It returns a tuple of arrays, where each array contains the indices for that dimension. For a 1D array, you get a single array of indices. For 2D arrays, you get two arrays: one for row indices and one for column indices.
import numpy as np
# 1D array example
arr_1d = np.array([0, 5, 0, 3, 0, 8])
indices = np.nonzero(arr_1d)
print(f"Non-zero indices: {indices}")
print(f"Non-zero values: {arr_1d[indices]}")
Output:
Non-zero indices: (array([1, 3, 5]),)
Non-zero values: [5 3 8]
Notice the result is a tuple containing one array. This consistent tuple structure allows uniform handling regardless of array dimensions.
Working with Multi-Dimensional Arrays
For 2D and higher-dimensional arrays, np.nonzero() returns one index array per dimension. This becomes particularly useful when working with matrices or image data.
# 2D array example
matrix = np.array([
    [0, 2, 0],
    [4, 0, 6],
    [0, 0, 9]
])
rows, cols = np.nonzero(matrix)
print(f"Row indices: {rows}")
print(f"Column indices: {cols}")
print(f"Non-zero values: {matrix[rows, cols]}")
# Pair up coordinates
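# .tolist() converts NumPy integers to plain Python ints so the printed
# pairs look the same across NumPy versions
coordinates = list(zip(rows.tolist(), cols.tolist()))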
print(f"Coordinates: {coordinates}")
Output:
Row indices: [0 1 1 2]
Column indices: [1 0 2 2]
Non-zero values: [2 4 6 9]
Coordinates: [(0, 1), (1, 0), (1, 2), (2, 2)]
The paired indices directly map to positions in the original array, making it straightforward to extract or modify specific elements.
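If coordinate pairs are the main goal, np.argwhere() may be a more direct route than zipping the tuple by hand. The sketch below reuses the same matrix and checks that both approaches agree; the variable names coords and same are illustrative only.

```python
import numpy as np

matrix = np.array([
    [0, 2, 0],
    [4, 0, 6],
    [0, 0, 9],
])

# np.argwhere returns one (row, col) pair per non-zero element,
# as a 2D array of shape (n_nonzero, ndim).
coords = np.argwhere(matrix)
print(coords.tolist())  # [[0, 1], [1, 0], [1, 2], [2, 2]]

# The same pairs can be built by transposing the np.nonzero() result.
same = np.transpose(np.nonzero(matrix))
print(np.array_equal(coords, same))  # True
```

Note that the argwhere result is grouped by element rather than by dimension, so it cannot be used directly as an index; keep the np.nonzero() tuple when you need to index back into the array.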
Conditional Filtering with np.nonzero()
The real power of np.nonzero() emerges when combined with boolean conditions. You can pass any boolean array to find where conditions are True.
data = np.array([15, 22, 8, 31, 19, 5, 28])
# Find indices where values > 20
high_value_indices = np.nonzero(data > 20)
print(f"Values > 20 at indices: {high_value_indices[0]}")
print(f"Actual values: {data[high_value_indices]}")
# Multiple conditions
moderate_indices = np.nonzero((data >= 10) & (data <= 25))
print(f"Values between 10-25: {data[moderate_indices]}")
Output:
Values > 20 at indices: [1 3 6]
Actual values: [22 31 28]
Values between 10-25: [15 22 19]
This approach eliminates the need for explicit loops and provides cleaner, more efficient code.
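For 1D data, np.flatnonzero() returns the index array directly instead of a one-element tuple, which can make follow-up steps a little tidier. The following is a minimal sketch on the same data array; capping values at 20 is an arbitrary example operation, not something the original workflow requires.

```python
import numpy as np

data = np.array([15, 22, 8, 31, 19, 5, 28])

# np.flatnonzero returns a plain index array rather than a tuple.
high_idx = np.flatnonzero(data > 20)
print(high_idx.tolist())  # [1, 3, 6]

# The indices can be reused to modify a copy in place,
# here capping every value above 20 at 20.
capped = data.copy()
capped[high_idx] = 20
print(capped.tolist())  # [15, 20, 8, 20, 19, 5, 20]
```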
Practical Example: Sparse Matrix Operations
Sparse matrices contain mostly zeros, so the positions of the non-zero entries carry most of the information, and np.nonzero() extracts exactly those positions. Here’s a realistic scenario working with adjacency matrices in graph analysis.
# Adjacency matrix representing a directed graph
# 1 indicates an edge from node i to node j
adjacency = np.array([
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0]
])
# Find all edges
from_nodes, to_nodes = np.nonzero(adjacency)
print("Edge list:")
for src, dst in zip(from_nodes, to_nodes):
    print(f" {src} -> {dst}")
# Count outgoing edges per node
unique, counts = np.unique(from_nodes, return_counts=True)
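# .tolist() yields plain Python ints, so the dict prints identically
# across NumPy versions
out_degrees = dict(zip(unique.tolist(), counts.tolist()))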
print(f"\nOut-degrees: {out_degrees}")
Output:
Edge list:
0 -> 1
0 -> 3
1 -> 2
2 -> 3
2 -> 4
3 -> 4
Out-degrees: {0: 2, 1: 1, 2: 2, 3: 1}
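The np.unique() approach above only reports nodes that have at least one outgoing edge, so node 4 is missing from the result. If every node should appear, one option is np.bincount() with minlength, sketched below on the same adjacency matrix; n_nodes and in_degrees are names introduced here for illustration.

```python
import numpy as np

adjacency = np.array([
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
])

from_nodes, to_nodes = np.nonzero(adjacency)
n_nodes = adjacency.shape[0]

# minlength pads the counts so nodes without edges show up as 0.
out_degrees = np.bincount(from_nodes, minlength=n_nodes)
in_degrees = np.bincount(to_nodes, minlength=n_nodes)

print(out_degrees.tolist())  # [2, 1, 2, 1, 0]
print(in_degrees.tolist())   # [0, 1, 1, 2, 2]
```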
Image Processing Applications
In image processing, np.nonzero() helps identify regions of interest, detect edges, or locate specific features.
# Simulated binary image (0 = background, 1 = object)
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0]
])
# Find bounding box of the object
rows, cols = np.nonzero(image)
if rows.size > 0:
    min_row, max_row = rows.min(), rows.max()
    min_col, max_col = cols.min(), cols.max()
    print(f"Bounding box: ({min_row}, {min_col}) to ({max_row}, {max_col})")
    print(f"Object dimensions: {max_row - min_row + 1}x{max_col - min_col + 1}")
    # Extract region (only valid when at least one non-zero pixel exists)
    region = image[min_row:max_row+1, min_col:max_col+1]
    print(f"\nExtracted region:\n{region}")
Output:
Bounding box: (1, 1) to (3, 3)
Object dimensions: 3x3
Extracted region:
[[1 1 0]
 [1 1 1]
 [0 1 0]]
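The bounding-box logic can be wrapped in a small helper that also covers the empty-image case. This is only a sketch: bounding_box is a hypothetical function name, and returning None for an all-zero mask is one possible convention among several.

```python
import numpy as np

def bounding_box(mask):
    """Return (min_row, min_col, max_row, max_col), or None if mask is all zeros."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    # int() converts NumPy integers to plain Python ints for clean printing.
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])

print(bounding_box(image))                        # (1, 1, 3, 3)
print(bounding_box(np.zeros((4, 4), dtype=int)))  # None
```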
Performance Comparison: np.nonzero() vs Alternatives
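Understanding when to use np.nonzero() versus alternatives like np.where() or boolean indexing matters for performance. Note that np.where() called with only a condition returns the same result as np.nonzero(), so the meaningful comparison is between index-based selection and boolean indexing.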
import time
# Large array for benchmarking
large_array = np.random.randint(-100, 100, size=1000000)
# time.perf_counter() is a monotonic, high-resolution clock,
# which suits short measurements better than time.time()
# Method 1: np.nonzero()
start = time.perf_counter()
indices = np.nonzero(large_array > 0)[0]
values = large_array[indices]
time_nonzero = time.perf_counter() - start
# Method 2: np.where()
start = time.perf_counter()
indices = np.where(large_array > 0)[0]
values = large_array[indices]
time_where = time.perf_counter() - start
# Method 3: Boolean indexing
start = time.perf_counter()
values = large_array[large_array > 0]
time_boolean = time.perf_counter() - start
print(f"np.nonzero(): {time_nonzero:.6f}s")
print(f"np.where(): {time_where:.6f}s")
print(f"Boolean idx: {time_boolean:.6f}s")
For simple filtering where you only need values, boolean indexing typically performs best. Use np.nonzero() when you specifically need indices for subsequent operations.
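A single timed run can be noisy, so the numbers may shift from one execution to the next. One way to get steadier measurements, sketched here with the standard-library timeit module, is to repeat each method several times and keep the best run. The function names, the seed, and the repeat counts are arbitrary choices for illustration, and actual timings depend on hardware and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
large_array = rng.integers(-100, 100, size=1_000_000)

def via_nonzero():
    # Index-based selection: compute indices, then gather values.
    return large_array[np.nonzero(large_array > 0)[0]]

def via_boolean():
    # Boolean indexing: select values directly with the mask.
    return large_array[large_array > 0]

# Both methods should select exactly the same values.
results_match = np.array_equal(via_nonzero(), via_boolean())

timings = {}
for name, fn in [("np.nonzero()", via_nonzero), ("boolean idx", via_boolean)]:
    # Best of 3 repeats, each averaging over 5 calls.
    timings[name] = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{name:>13}: {timings[name]:.6f}s per call")
```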
Handling Edge Cases
np.nonzero() treats various non-zero values consistently, but understanding edge cases prevents bugs.
# Different "non-zero" values
# Mixed inputs are upcast to float64, so False becomes 0.0 and True becomes 1.0
edge_cases = np.array([0, -5, 0.0, np.nan, np.inf, -np.inf, False, True])
indices = np.nonzero(edge_cases)
print(f"Original array: {edge_cases}")
print(f"Non-zero indices: {indices[0]}")
print(f"Non-zero values: {edge_cases[indices]}")
# Empty result
empty_array = np.zeros(5)
empty_indices = np.nonzero(empty_array)
print(f"\nEmpty result: {empty_indices}")
print(f"Number of non-zero elements: {len(empty_indices[0])}")
Output:
Original array: [  0.  -5.   0.  nan  inf -inf   0.   1.]
Non-zero indices: [1 3 4 5 7]
Non-zero values: [ -5.  nan  inf -inf   1.]
Empty result: (array([], dtype=int64),)
Number of non-zero elements: 0
Note that NaN and infinity are treated as non-zero. Always validate your data if these values might appear unexpectedly.
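When NaN or infinite entries should not count as matches, one option is to combine the non-zero test with np.isfinite(). The sketch below uses a small made-up array; the variable names are illustrative.

```python
import numpy as np

values = np.array([0.0, -5.0, np.nan, np.inf, 2.5, 0.0])

# Keep only elements that are both finite and non-zero.
finite_nonzero = np.flatnonzero(np.isfinite(values) & (values != 0))
print(finite_nonzero.tolist())  # [1, 4]

# np.count_nonzero counts non-zero entries without building index arrays.
# NaN and inf are included in this count.
total_nonzero = np.count_nonzero(values)
print(total_nonzero)  # 4
```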
Converting Between Formats
np.nonzero() facilitates conversion between dense and sparse representations, crucial for memory-efficient storage.
# Dense to sparse (COO format simulation)
dense = np.array([
    [0, 0, 3],
    [4, 0, 0],
    [0, 5, 6]
])
rows, cols = np.nonzero(dense)
values = dense[rows, cols]
sparse_representation = {
    'rows': rows,
    'cols': cols,
    'values': values,
    'shape': dense.shape
}
print("Sparse representation:")
print(f" Rows: {sparse_representation['rows']}")
print(f" Cols: {sparse_representation['cols']}")
print(f" Values: {sparse_representation['values']}")
# Reconstruct dense array
reconstructed = np.zeros(sparse_representation['shape'], dtype=int)
reconstructed[sparse_representation['rows'], sparse_representation['cols']] = sparse_representation['values']
print(f"\nReconstructed:\n{reconstructed}")
print(f"Arrays equal: {np.array_equal(dense, reconstructed)}")
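This is essentially the coordinate (COO) layout used by sparse matrix libraries such as scipy.sparse. It saves memory only when the array is sparse enough that the three index and value arrays together are smaller than the dense grid.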
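To get a feel for the memory trade-off, the sketch below compares the byte size of a mostly-zero dense array with the three arrays of its coordinate representation. The shape, the number of non-zero entries, and the seed are arbitrary assumptions, and the exact byte counts depend on the platform's index dtype.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1000x1000 float array with 1000 randomly placed non-zero entries.
dense = np.zeros((1000, 1000), dtype=np.float64)
flat_positions = rng.choice(dense.size, size=1000, replace=False)
r, c = np.unravel_index(flat_positions, dense.shape)
dense[r, c] = 1.0

# Coordinate representation: row indices, column indices, values.
rows, cols = np.nonzero(dense)
values = dense[rows, cols]

dense_bytes = dense.nbytes
coo_bytes = rows.nbytes + cols.nbytes + values.nbytes

print(f"Dense:      {dense_bytes} bytes")
print(f"Coordinate: {coo_bytes} bytes")
print(f"Ratio:      {dense_bytes / coo_bytes:.1f}x")
```

As the fraction of non-zero entries grows, the advantage shrinks, since each stored element costs two indices plus a value.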