NumPy - Create Array of Zeros (np.zeros)

The `np.zeros()` function creates a new NumPy array of a specified shape filled with zeros. This guide covers basic usage, dtype selection, performance characteristics, and common pitfalls.

Key Insights

  • np.zeros() creates arrays of any shape filled with zeros, with dtype control for memory optimization: use dtype=bool for masks or dtype=np.float32 when reduced precision suffices
  • Pre-allocating arrays with np.zeros() avoids the overhead of appending to Python lists and converting afterward, making it a staple of numerical computing and image processing workflows
  • np.zeros_like() matches the shape and dtype of an existing array, keeping operations consistent and preventing broadcasting errors in complex calculations

Basic Array Creation with np.zeros()

The np.zeros() function creates a new array of specified shape filled with zeros. The most basic usage requires only the shape parameter:

import numpy as np

# Create 1D array of 5 zeros
arr_1d = np.zeros(5)
print(arr_1d)
# Output: [0. 0. 0. 0. 0.]

# Create 2D array (3 rows, 4 columns)
arr_2d = np.zeros((3, 4))
print(arr_2d)
# Output:
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

# Create 3D array
arr_3d = np.zeros((2, 3, 4))
print(arr_3d.shape)
# Output: (2, 3, 4)

Notice that for multi-dimensional arrays, the shape must be passed as a tuple. The default data type is float64, which is why you see decimal points in the output.
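
The tuple requirement is a common tripping point: if you pass the dimensions as separate arguments, the second one lands in the dtype slot and raises an error.

```python
import numpy as np

# Passing dimensions separately: 4 is interpreted as the dtype argument
try:
    np.zeros(3, 4)
except TypeError as e:
    print(f"TypeError: {e}")

# Correct: wrap the dimensions in a tuple
arr = np.zeros((3, 4))
print(arr.shape, arr.dtype)
# Output: (3, 4) float64
```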

Controlling Data Types with dtype

The dtype parameter controls the data type of array elements, directly impacting memory usage and computational performance:

# Integer zeros (platform default int: typically int64 on Linux/macOS, int32 on Windows)
int_zeros = np.zeros(5, dtype=int)
print(int_zeros)
# Output: [0 0 0 0 0]

# Explicitly specify integer precision
int8_zeros = np.zeros(1000, dtype=np.int8)
int64_zeros = np.zeros(1000, dtype=np.int64)
print(f"int8 size: {int8_zeros.nbytes} bytes")
print(f"int64 size: {int64_zeros.nbytes} bytes")
# Output:
# int8 size: 1000 bytes
# int64 size: 8000 bytes

# Float precision control
float32_zeros = np.zeros((100, 100), dtype=np.float32)
float64_zeros = np.zeros((100, 100), dtype=np.float64)
print(f"float32 size: {float32_zeros.nbytes} bytes")
print(f"float64 size: {float64_zeros.nbytes} bytes")
# Output:
# float32 size: 40000 bytes
# float64 size: 80000 bytes

# Complex numbers
complex_zeros = np.zeros(5, dtype=complex)
print(complex_zeros)
# Output: [0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j]

# Boolean arrays
bool_zeros = np.zeros(5, dtype=bool)
print(bool_zeros)
# Output: [False False False False False]

Choosing the right dtype matters for large-scale applications. A 4K RGB image (3840×2160×3 channels) takes about 24.9 MB as uint8, 99.5 MB as float32, and 199 MB as float64.
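
As a quick check, `nbytes` reports the exact footprint of a 4K frame at each dtype:

```python
import numpy as np

# Memory footprint of a 4K RGB image (2160 x 3840 x 3) per dtype
shape = (2160, 3840, 3)
for dt in (np.uint8, np.float32, np.float64):
    mb = np.zeros(shape, dtype=dt).nbytes / 1e6
    print(f"{np.dtype(dt).name}: {mb:.1f} MB")
# Output:
# uint8: 24.9 MB
# float32: 99.5 MB
# float64: 199.1 MB
```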

Practical Applications in Image Processing

Pre-allocating arrays with np.zeros() is fundamental in computer vision and image manipulation:

# Create blank RGB image (height, width, channels)
height, width = 480, 640
blank_image = np.zeros((height, width, 3), dtype=np.uint8)

# Create image with colored regions
red_channel = np.zeros((height, width), dtype=np.uint8)
green_channel = np.zeros((height, width), dtype=np.uint8)
blue_channel = np.zeros((height, width), dtype=np.uint8)

# Fill specific region with color
red_channel[100:200, 100:200] = 255
rgb_image = np.stack([red_channel, green_channel, blue_channel], axis=2)

print(f"Image shape: {rgb_image.shape}")
print(f"Image dtype: {rgb_image.dtype}")
# Output:
# Image shape: (480, 640, 3)
# Image dtype: uint8

# Create alpha channel mask
alpha_mask = np.zeros((height, width), dtype=np.float32)
alpha_mask[150:350, 200:400] = 1.0  # Fully opaque region
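
As a sketch of how such a mask gets used, here is a plain alpha blend (result = alpha·foreground + (1−alpha)·background), not tied to any particular imaging library:

```python
import numpy as np

height, width = 480, 640
alpha = np.zeros((height, width, 1), dtype=np.float32)  # trailing axis broadcasts over RGB
alpha[150:350, 200:400] = 1.0  # Fully opaque rectangle

foreground = np.full((height, width, 3), 255.0, dtype=np.float32)  # white
background = np.zeros((height, width, 3), dtype=np.float32)        # black

# Standard alpha blend
composite = (alpha * foreground + (1 - alpha) * background).astype(np.uint8)
print(composite[200, 300], composite[0, 0])
# Output: [255 255 255] [0 0 0]
```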

Performance: Pre-allocation vs Dynamic Growth

Pre-allocating with np.zeros() dramatically outperforms dynamic list operations:

import time

# Bad: Dynamic list growth
start = time.time()
result_list = []
for i in range(100000):
    result_list.append(i * 2)
result_array = np.array(result_list)
list_time = time.time() - start

# Good: Pre-allocated array
start = time.time()
result_zeros = np.zeros(100000, dtype=int)
for i in range(100000):
    result_zeros[i] = i * 2
zeros_time = time.time() - start

# Best: Vectorized operation
start = time.time()
result_vectorized = np.arange(100000) * 2
vectorized_time = time.time() - start

print(f"List append: {list_time:.4f}s")
print(f"Pre-allocated: {zeros_time:.4f}s")
print(f"Vectorized: {vectorized_time:.6f}s")
# Typical output:
# List append: 0.0234s
# Pre-allocated: 0.0089s
# Vectorized: 0.000312s

While vectorized operations are fastest, pre-allocation with np.zeros() is essential when calculations depend on previous iterations.
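
A recurrence is the classic case: each element depends on the previous one, so no single vectorized expression applies. A minimal sketch of the pattern (the recurrence itself is made up for illustration):

```python
import numpy as np

# x[i] = 0.5 * x[i-1] + 1 cannot be computed without a loop,
# so pre-allocate with np.zeros() and fill in place
n = 10
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.5 * x[i - 1] + 1  # x[0] stays 0.0

print(x[-1])
# Output: 1.99609375
```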

Using np.zeros_like() for Shape Matching

np.zeros_like() creates a zero array matching another array’s shape and dtype:

# Original array
original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)

# Create matching zero array
zeros_matched = np.zeros_like(original)
print(zeros_matched)
print(f"Shape: {zeros_matched.shape}, dtype: {zeros_matched.dtype}")
# Output:
# [[0. 0. 0.]
#  [0. 0. 0.]]
# Shape: (2, 3), dtype: float32

# Override dtype if needed
zeros_int = np.zeros_like(original, dtype=int)
print(zeros_int.dtype)
# Output: int64 (on most platforms; int32 on Windows)

# Practical example: Gradient computation
def compute_gradient(image):
    # Pre-allocate gradient arrays with same shape as input
    gradient_x = np.zeros_like(image, dtype=np.float32)
    gradient_y = np.zeros_like(image, dtype=np.float32)
    
    # Compute gradients (simplified)
    gradient_x[:, 1:] = image[:, 1:] - image[:, :-1]
    gradient_y[1:, :] = image[1:, :] - image[:-1, :]
    
    return gradient_x, gradient_y

test_image = np.random.rand(100, 100)
gx, gy = compute_gradient(test_image)
print(f"Gradient shapes: {gx.shape}, {gy.shape}")
# Output: Gradient shapes: (100, 100), (100, 100)

Memory Order: C vs Fortran

The order parameter controls how multi-dimensional arrays are stored in memory:

# C-order (row-major, default)
c_order = np.zeros((1000, 1000), order='C')

# Fortran-order (column-major)
f_order = np.zeros((1000, 1000), order='F')

# Performance difference: column access on each layout
start = time.time()
for i in range(1000):
    _ = c_order[:, i].sum()  # Strided access in C-order
c_col_time = time.time() - start

start = time.time()
for i in range(1000):
    _ = f_order[:, i].sum()  # Contiguous access in F-order
f_col_time = time.time() - start

print(f"C-order column access: {c_col_time:.4f}s")
print(f"F-order column access: {f_col_time:.4f}s")
# Column access is faster on the F-order array

Use C-order for most applications. Use Fortran-order when interfacing with Fortran libraries or when column-wise operations dominate.
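
You can inspect which layout an array uses via its flags:

```python
import numpy as np

c = np.zeros((3, 4), order='C')  # row-major
f = np.zeros((3, 4), order='F')  # column-major
print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])
print(f.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])
# Output:
# True False
# False True
```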

Advanced Patterns: Batch Processing

Pre-allocate arrays for batch operations in machine learning pipelines:

# Batch processing setup
batch_size = 32
image_height, image_width = 224, 224
num_channels = 3

# Pre-allocate batch array
batch_images = np.zeros((batch_size, image_height, image_width, num_channels), 
                        dtype=np.float32)

# Pre-allocate labels
batch_labels = np.zeros(batch_size, dtype=np.int64)

# Simulate loading batch
for i in range(batch_size):
    # In real scenario, load and preprocess image
    batch_images[i] = np.random.rand(image_height, image_width, num_channels)
    batch_labels[i] = np.random.randint(0, 10)

print(f"Batch shape: {batch_images.shape}")
print(f"Memory usage: {batch_images.nbytes / 1024 / 1024:.2f} MB")
# Output:
# Batch shape: (32, 224, 224, 3)
# Memory usage: 18.38 MB

# Pre-allocate results for inference
num_classes = 10
predictions = np.zeros((batch_size, num_classes), dtype=np.float32)

Common Pitfalls and Solutions

Pitfall 1: Expecting slices to be copies

# Basic slicing returns a view, not a copy
arr = np.zeros((3, 4))
row_view = arr[0]
row_view[:] = 1  # Modifies the original array
print(arr[0])
# Output: [1. 1. 1. 1.]

# Use .copy() when you need an independent array
row_copy = arr[1].copy()
row_copy[:] = 2  # Original array is unchanged
print(arr[1])
# Output: [0. 0. 0. 0.]

Pitfall 2: Silent truncation in integer arrays

# An array's dtype is fixed at creation; assigned values are cast to it
float_zeros = np.zeros(5, dtype=float)
float_zeros[0] = 5 / 2  # Stored as 2.5
int_zeros = np.zeros(5, dtype=int)
int_zeros[0] = 5 / 2  # Truncated to 2

print(float_zeros[0], int_zeros[0])
# Output: 2.5 2

Pitfall 3: Memory allocation for large arrays

# Warn before allocating very large arrays
def safe_zeros(shape, dtype=float):
    required_bytes = np.prod(shape, dtype=np.int64) * np.dtype(dtype).itemsize
    required_gb = required_bytes / (1024**3)
    
    if required_gb > 0.5:  # Example threshold
        print(f"Warning: Allocating {required_gb:.2f} GB")
        print(f"Warning: Allocating {required_gb:.2f} GB")
    
    return np.zeros(shape, dtype=dtype)

# Use for large arrays
large_array = safe_zeros((10000, 10000), dtype=np.float64)
# Output: Warning: Allocating 0.75 GB

The np.zeros() function is a foundational tool for numerical computing in Python. Master dtype selection for memory efficiency, leverage pre-allocation for performance, and use np.zeros_like() to maintain consistency across array operations.
