NumPy - Create Empty Array (np.empty)

The `np.empty()` function creates a new array without initializing entries to any particular value. Unlike `np.zeros()` or `np.ones()`, it simply allocates memory and returns whatever values happen to exist at those memory locations.

Key Insights

  • np.empty() allocates memory without initializing values, making it the fastest array creation method when you plan to immediately overwrite all elements
  • Empty arrays contain arbitrary values from memory (garbage data), so never use them when you need zeros or specific initial values
  • For performance-critical applications processing large datasets, np.empty() can be noticeably faster than np.zeros() (roughly 2-3x in the benchmark below), provided every element is immediately assigned

Understanding np.empty() Fundamentals

The np.empty() function creates a new array without initializing entries to any particular value. Unlike np.zeros() or np.ones(), it simply allocates memory and returns whatever values happen to exist at those memory locations.

import numpy as np

# Create a 1D empty array with 5 elements
arr_1d = np.empty(5)
print(arr_1d)
# Output: [6.23042070e-307 4.67296746e-307 1.69121096e-306 ...] (garbage values)

# Create a 2D empty array
arr_2d = np.empty((3, 4))
print(arr_2d)
# Output: a 3x4 array of arbitrary (uninitialized) values

# Create a 3D empty array
arr_3d = np.empty((2, 3, 4))
print(arr_3d.shape)
# Output: (2, 3, 4)

The function signature is straightforward: np.empty(shape, dtype=float, order='C'). The shape parameter accepts an integer for 1D arrays or a tuple for multidimensional arrays.
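Each parameter of that signature can be exercised directly; this short sketch shows an integer shape, a tuple shape, and the dtype and order keywords together:

```python
import numpy as np

# shape as an integer creates a 1D array
a = np.empty(3)

# shape as a tuple creates a multidimensional array;
# dtype and order can be set in the same call
b = np.empty((2, 3), dtype=np.int64, order='F')

print(a.shape)                   # (3,)
print(b.shape)                   # (2, 3)
print(b.dtype)                   # int64
print(b.flags['F_CONTIGUOUS'])   # True
```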

Specifying Data Types

Control memory allocation precisely by specifying the dtype parameter. This is critical for memory-constrained applications or when interfacing with C libraries.

import numpy as np

# Integer array
int_arr = np.empty(5, dtype=np.int32)
print(int_arr.dtype)  # int32
print(int_arr.itemsize)  # 4 bytes per element

# Float64 (default)
float64_arr = np.empty(5, dtype=np.float64)
print(float64_arr.itemsize)  # 8 bytes per element

# Float32 for reduced memory footprint
float32_arr = np.empty(5, dtype=np.float32)
print(float32_arr.itemsize)  # 4 bytes per element

# Complex numbers
complex_arr = np.empty(3, dtype=np.complex128)
print(complex_arr.dtype)  # complex128

# Boolean array
bool_arr = np.empty(5, dtype=bool)
print(bool_arr)  # Arbitrary mix of True/False values, e.g. [False  True False  True False]

Performance Comparison: empty vs zeros vs ones

The performance advantage of np.empty() can become significant with large arrays, though exact numbers vary by platform and allocator. Here’s a practical benchmark:

import numpy as np
import time

size = (10000, 10000)

# Benchmark np.empty()
start = time.perf_counter()
arr_empty = np.empty(size)
arr_empty.fill(1.0)  # Fill with actual values
time_empty = time.perf_counter() - start

# Benchmark np.zeros()
start = time.perf_counter()
arr_zeros = np.zeros(size)
arr_zeros += 1.0  # Modify to match empty operation
time_zeros = time.perf_counter() - start

# Benchmark np.ones()
start = time.perf_counter()
arr_ones = np.ones(size)
time_ones = time.perf_counter() - start

print(f"np.empty() + fill: {time_empty:.4f}s")
print(f"np.zeros() + add:  {time_zeros:.4f}s")
print(f"np.ones():         {time_ones:.4f}s")
print(f"Speedup: {time_zeros/time_empty:.2f}x")

# Typical output:
# np.empty() + fill: 0.0521s
# np.zeros() + add:  0.1243s
# np.ones():         0.0987s
# Speedup: 2.39x

Practical Use Cases

Pre-allocating Arrays for Computed Results

When you know you’ll immediately overwrite all values, np.empty() is ideal:

import numpy as np

def process_data(input_data):
    """Process input and store results in pre-allocated array."""
    n = len(input_data)
    results = np.empty(n, dtype=np.float64)
    
    for i in range(n):
        # Every element will be assigned
        results[i] = input_data[i] ** 2 + 2 * input_data[i] + 1
    
    return results

data = np.array([1, 2, 3, 4, 5])
output = process_data(data)
print(output)  # [  4.  9. 16. 25. 36.]
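The element-wise loop above exists only to make explicit that every slot is written. In practice, the same result comes from a single vectorized expression, which needs no pre-allocation at all (a sketch, not part of the original function):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# One vectorized expression writes every element at once,
# so no separate pre-allocated buffer is needed here
results = data.astype(np.float64) ** 2 + 2 * data + 1
print(results)  # [ 4.  9. 16. 25. 36.]
```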

Building Arrays from File I/O

When reading data from files where you’ll populate every element:

import numpy as np

def load_matrix_from_file(filename, rows, cols):
    """Load matrix data from file into pre-allocated array."""
    matrix = np.empty((rows, cols), dtype=np.float32)
    
    with open(filename, 'r') as f:
        for i in range(rows):
            line = f.readline().strip().split()
            matrix[i, :] = [float(x) for x in line]
    
    return matrix

# Simulate file content
with open('data.txt', 'w') as f:
    f.write("1.0 2.0 3.0\n")
    f.write("4.0 5.0 6.0\n")

matrix = load_matrix_from_file('data.txt', 2, 3)
print(matrix)
# [[1. 2. 3.]
#  [4. 5. 6.]]
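For plain whitespace-delimited files like this one, NumPy’s np.loadtxt reads the whole matrix in a single call; the hand-rolled reader above is mainly useful when you need the pre-allocated buffer or custom per-line parsing:

```python
import numpy as np

# Same file contents as the example above
with open('data.txt', 'w') as f:
    f.write("1.0 2.0 3.0\n")
    f.write("4.0 5.0 6.0\n")

# np.loadtxt infers the shape from the file itself
matrix = np.loadtxt('data.txt', dtype=np.float32)
print(matrix.shape)  # (2, 3)
print(matrix)
```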

Vectorized Operations with Immediate Assignment

Combine np.empty() with NumPy’s vectorized operations:

import numpy as np

def generate_grid_coordinates(width, height):
    """Generate coordinate grid efficiently."""
    coords = np.empty((height, width, 2), dtype=np.float32)
    
    # Use broadcasting to fill coordinates
    coords[:, :, 0] = np.arange(width)
    coords[:, :, 1] = np.arange(height)[:, np.newaxis]
    
    return coords

grid = generate_grid_coordinates(4, 3)
print(grid)
# [[[0. 0.] [1. 0.] [2. 0.] [3. 0.]]
#  [[0. 1.] [1. 1.] [2. 1.] [3. 1.]]
#  [[0. 2.] [1. 2.] [2. 2.] [3. 2.]]]
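As a cross-check, np.indices produces the same information with the axes ordered (row, column); stacking its outputs in (x, y) order reproduces the broadcast-filled grid (a sketch using the same 4x3 dimensions):

```python
import numpy as np

height, width = 3, 4

# ys holds the row index of each cell, xs the column index
ys, xs = np.indices((height, width))

# Stack as (x, y) pairs to match the layout built with np.empty above
coords = np.stack([xs, ys], axis=-1).astype(np.float32)
print(coords.shape)  # (3, 4, 2)
```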

Common Pitfalls and Anti-patterns

Never Use Empty Arrays Without Immediate Assignment

This is the most common mistake:

import numpy as np

# WRONG: Using empty array values directly
arr = np.empty(5)
result = arr.sum()  # Meaningless result from garbage data
print(result)  # Could be anything

# CORRECT: Initialize before use
arr = np.empty(5)
arr[:] = 0  # Or use np.zeros() directly
result = arr.sum()
print(result)  # 0.0

Don’t Use empty() for Partial Initialization

import numpy as np

# WRONG: Only some elements assigned
arr = np.empty(10)
arr[0:5] = [1, 2, 3, 4, 5]  # Elements 5-9 contain garbage
print(arr)  # [1. 2. 3. 4. 5. garbage...]

# CORRECT: Use zeros or full initialization
arr = np.zeros(10)
arr[0:5] = [1, 2, 3, 4, 5]
print(arr)  # [1. 2. 3. 4. 5. 0. 0. 0. 0. 0.]
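When the unassigned tail should hold a sentinel rather than zero, np.full gives the same safety with an explicit fill value (NaN here is just an illustrative choice):

```python
import numpy as np

# Pre-fill with NaN so any slot you forget to assign is clearly marked
arr = np.full(10, np.nan)
arr[0:5] = [1, 2, 3, 4, 5]
print(arr)  # [ 1.  2.  3.  4.  5. nan nan nan nan nan]
```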

Memory Order: C vs Fortran

The order parameter controls memory layout, critical for performance when interfacing with C or Fortran libraries:

import numpy as np

# C-contiguous (row-major, default)
arr_c = np.empty((1000, 1000), order='C')
print(arr_c.flags['C_CONTIGUOUS'])  # True
print(arr_c.flags['F_CONTIGUOUS'])  # False

# Fortran-contiguous (column-major)
arr_f = np.empty((1000, 1000), order='F')
print(arr_f.flags['C_CONTIGUOUS'])  # False
print(arr_f.flags['F_CONTIGUOUS'])  # True

# Performance impact on row vs column access
import time

# Row access on C-order (fast)
start = time.perf_counter()
for i in range(1000):
    arr_c[i, :] = i
time_c_row = time.perf_counter() - start

# Column access on F-order (fast)
start = time.perf_counter()
for j in range(1000):
    arr_f[:, j] = j
time_f_col = time.perf_counter() - start

print(f"C-order row access: {time_c_row:.4f}s")
print(f"F-order col access: {time_f_col:.4f}s")

Integration with NumPy Ecosystem

Use np.empty_like() to create arrays matching existing array properties:

import numpy as np

# Reference array
reference = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# Create empty array with same shape and dtype
result = np.empty_like(reference)
result[:] = reference * 2

print(result)
# [[ 2  4  6]
#  [ 8 10 12]]

print(result.dtype)  # int32
print(result.shape)  # (2, 3)
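np.empty_like also accepts a dtype override, which is handy when the result should match the reference’s shape but hold a different element type (a brief sketch):

```python
import numpy as np

reference = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# Same shape as the reference, but a float64 result buffer
out = np.empty_like(reference, dtype=np.float64)
out[:] = reference / 2

print(out.dtype)   # float64
print(out.shape)   # (2, 3)
```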

The np.empty() function is a specialized tool for performance optimization. Use it when you have complete control over subsequent array population and need maximum speed. For all other cases, prefer np.zeros(), np.ones(), or np.full() to avoid subtle bugs from uninitialized memory.
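A quick side-by-side of those initialized alternatives, all of which guarantee defined contents, unlike np.empty:

```python
import numpy as np

z = np.zeros((2, 2))        # every element 0.0
o = np.ones((2, 2))         # every element 1.0
f = np.full((2, 2), 7.5)    # every element 7.5

print(z.sum(), o.sum(), f.sum())  # 0.0 4.0 30.0
```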
