NumPy - Create Random Array (np.random)
Key Insights
- NumPy’s random module provides two APIs: the legacy `np.random` functions and the modern Generator-based approach using `np.random.default_rng()`, with the latter offering better statistical properties and independent random streams
- Random array generation covers multiple distributions (uniform, normal, binomial) and data types, with precise control over shape, range, and reproducibility through seed management
- Performance optimization requires understanding vectorization benefits, choosing appropriate distributions, and leveraging NumPy’s C-based implementations over Python loops for generating large random datasets
Legacy Random Functions vs Modern Generator API
NumPy offers two approaches for random number generation. The legacy np.random module functions remain widely used but are considered superseded by the Generator-based API introduced in NumPy 1.17.
import numpy as np
# Legacy approach (still functional but not recommended for new code)
legacy_array = np.random.rand(3, 4)
print("Legacy random array:\n", legacy_array)
# Modern Generator approach (recommended)
rng = np.random.default_rng(seed=42)
modern_array = rng.random((3, 4))
print("\nModern random array:\n", modern_array)
The Generator API provides better statistical properties, independent streams, and future-proof code. Each Generator instance maintains its own state, enabling parallel random number generation without interference.
# Multiple independent generators
rng1 = np.random.default_rng(seed=100)
rng2 = np.random.default_rng(seed=200)
# These won't interfere with each other
arr1 = rng1.integers(0, 10, size=5)
arr2 = rng2.integers(0, 10, size=5)
print("Independent stream 1:", arr1)
print("Independent stream 2:", arr2)
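When many parallel streams are needed, picking seeds by hand gets fragile. A sketch using NumPy's SeedSequence, which can spawn statistically independent child seeds from a single root seed:

```python
import numpy as np

# Spawn statistically independent child seeds from one root seed
root = np.random.SeedSequence(42)
child_seeds = root.spawn(4)
generators = [np.random.default_rng(s) for s in child_seeds]

# Each generator draws from its own independent stream
streams = [g.random(3) for g in generators]
for i, s in enumerate(streams):
    print(f"Stream {i}:", s)
```

Each worker in a parallel job can receive one child generator, so no two workers share or collide on a stream.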
Uniform Distribution Random Arrays
The uniform distribution generates values where each number in the range has equal probability. Use random() for floats in [0.0, 1.0), or uniform() for custom ranges.
rng = np.random.default_rng(seed=42)
# Random floats in [0.0, 1.0)
uniform_01 = rng.random((3, 3))
print("Uniform [0, 1):\n", uniform_01)
# Random floats in custom range [5.0, 10.0)
uniform_custom = rng.uniform(5.0, 10.0, size=(3, 3))
print("\nUniform [5, 10):\n", uniform_custom)
# Random integers (exclusive upper bound)
integers = rng.integers(low=0, high=100, size=(2, 5))
print("\nRandom integers [0, 100):\n", integers)
# Include endpoint
integers_inclusive = rng.integers(low=1, high=6, size=10, endpoint=True)
print("\nDice rolls [1, 6]:", integers_inclusive)
Legacy equivalents still work but lack the Generator’s advantages:
# Legacy functions (avoid in new code)
np.random.seed(42)
legacy_uniform = np.random.random((3, 3))
legacy_integers = np.random.randint(0, 100, size=(2, 5))
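The Generator methods also accept a dtype argument, which matters for large arrays where memory is a concern. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# 32-bit floats halve memory for large uniform arrays
floats32 = rng.random((2, 3), dtype=np.float32)
print("Float dtype:", floats32.dtype)

# Compact integer dtypes for byte-sized values
bytes_arr = rng.integers(0, 256, size=1000, dtype=np.uint8)
print("Integer dtype:", bytes_arr.dtype)
```

random() supports float32 and float64; integers() supports any integer dtype whose range covers [low, high).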
Normal (Gaussian) Distribution
Normal distributions are essential for statistical simulations and machine learning initialization. The normal() method accepts mean (μ) and standard deviation (σ) parameters.
rng = np.random.default_rng(seed=42)
# Standard normal distribution (μ=0, σ=1)
standard_normal = rng.standard_normal(size=(1000,))
print(f"Mean: {standard_normal.mean():.4f}, Std: {standard_normal.std():.4f}")
# Custom normal distribution (μ=100, σ=15)
custom_normal = rng.normal(loc=100, scale=15, size=(1000,))
print(f"Mean: {custom_normal.mean():.4f}, Std: {custom_normal.std():.4f}")
# Multi-dimensional normal array
normal_2d = rng.normal(loc=0, scale=1, size=(3, 4))
print("\n2D normal array:\n", normal_2d)
For weight initialization in neural networks:
# Xavier/Glorot initialization for a layer with 128 inputs, 64 outputs
fan_in, fan_out = 128, 64
limit = np.sqrt(6 / (fan_in + fan_out))
weights = rng.uniform(-limit, limit, size=(fan_in, fan_out))
print(f"Weight matrix shape: {weights.shape}")
print(f"Weight range: [{weights.min():.4f}, {weights.max():.4f}]")
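For correlated Gaussian variables, the Generator also offers multivariate_normal(), which takes a mean vector and a covariance matrix. A sketch with two variables at correlation 0.8:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Two correlated Gaussian variables: means [0, 5], correlation 0.8
mean = [0.0, 5.0]
cov = [[1.0, 0.8],
       [0.8, 1.0]]
samples = rng.multivariate_normal(mean, cov, size=10_000)

print("Shape:", samples.shape)
print("Empirical means:", samples.mean(axis=0))
print("Empirical correlation:", np.corrcoef(samples.T)[0, 1])
```

With 10,000 samples, the empirical means and correlation land close to the specified parameters.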
Other Common Distributions
NumPy supports numerous probability distributions for specialized use cases.
rng = np.random.default_rng(seed=42)
# Binomial distribution (coin flips, A/B testing)
# 10 trials, 50% success probability, 1000 experiments
binomial = rng.binomial(n=10, p=0.5, size=1000)
print(f"Binomial - Mean successes: {binomial.mean():.2f}")
# Poisson distribution (event counts, arrival rates)
# Average rate of 3 events per interval
poisson = rng.poisson(lam=3.0, size=1000)
print(f"Poisson - Mean events: {poisson.mean():.2f}")
# Exponential distribution (time between events)
# Scale parameter (1/lambda)
exponential = rng.exponential(scale=2.0, size=1000)
print(f"Exponential - Mean time: {exponential.mean():.2f}")
# Multinomial distribution (counts across categories)
# Counts per face over 20 rolls of a fair 6-sided die, 5 experiments
multinomial = rng.multinomial(n=20, pvals=[1/6]*6, size=5)
print("Multinomial (dice rolls):\n", multinomial)
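A quick way to confirm you picked the right distribution is to compare empirical moments against theory. For a Poisson distribution, mean and variance both equal lambda, so a sanity check might look like:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# For Poisson(lam), mean and variance both equal lam
lam = 3.0
samples = rng.poisson(lam=lam, size=100_000)
print(f"Theoretical mean and variance: {lam}")
print(f"Empirical mean: {samples.mean():.3f}")
print(f"Empirical variance: {samples.var():.3f}")
```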
Random Sampling and Shuffling
Generate random samples from existing arrays or create permutations without replacement.
rng = np.random.default_rng(seed=42)
# Sample from existing array without replacement
population = np.arange(100)
sample = rng.choice(population, size=10, replace=False)
print("Random sample:", sample)
# Sample with replacement and custom probabilities
options = ['A', 'B', 'C', 'D']
probabilities = [0.1, 0.2, 0.3, 0.4]
weighted_sample = rng.choice(options, size=20, p=probabilities)
print("Weighted sample:", weighted_sample)
# Shuffle array in-place
deck = np.arange(52)
rng.shuffle(deck)
print("Shuffled deck (first 10):", deck[:10])
# Permutation (returns shuffled copy)
original = np.arange(10)
permuted = rng.permutation(original)
print("Original:", original)
print("Permuted:", permuted)
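For multi-dimensional arrays, it is worth distinguishing two operations: permutation() with an axis argument reorders whole slices, while permuted() (available in NumPy 1.20 and later) shuffles independently along an axis. A sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

matrix = np.arange(12).reshape(3, 4)

# permutation(x, axis=0) reorders whole rows;
# permuted(x, axis=1) shuffles within each row independently (NumPy >= 1.20)
rows_reordered = rng.permutation(matrix, axis=0)
within_rows = rng.permuted(matrix, axis=1)

print("Rows reordered:\n", rows_reordered)
print("Within-row shuffle:\n", within_rows)
```

Confusing the two is a common bug when shuffling feature matrices: reordering rows keeps samples intact, while a within-row shuffle scrambles each sample's features.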
Reproducibility with Seeds
Seed management ensures reproducible results across runs, critical for debugging and scientific reproducibility.
# Same seed produces identical sequences
rng1 = np.random.default_rng(seed=12345)
rng2 = np.random.default_rng(seed=12345)
arr1 = rng1.random(5)
arr2 = rng2.random(5)
print("Identical:", np.array_equal(arr1, arr2))
# Different seeds produce different sequences
rng3 = np.random.default_rng(seed=67890)
arr3 = rng3.random(5)
print("Different:", not np.array_equal(arr1, arr3))
# Save and restore generator state
rng = np.random.default_rng(seed=42)
state = rng.bit_generator.state
arr_before = rng.random(3)
rng.bit_generator.state = state # Restore state
arr_after = rng.random(3)
print("Reproduced:", np.array_equal(arr_before, arr_after))
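In larger codebases, a common pattern is to pass a Generator into functions rather than creating one inside them, so callers control reproducibility. A sketch with a hypothetical noisy_measurement function:

```python
import numpy as np

def noisy_measurement(values, rng):
    """Add Gaussian noise; taking rng as a parameter keeps it testable."""
    return values + rng.normal(0, 0.1, size=len(values))

values = np.array([1.0, 2.0, 3.0])

# Same seed -> identical output on every run
out1 = noisy_measurement(values, np.random.default_rng(seed=7))
out2 = noisy_measurement(values, np.random.default_rng(seed=7))
print("Deterministic:", np.array_equal(out1, out2))
```

Tests can then assert exact outputs by fixing the seed, without touching any global state.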
Performance Considerations
Vectorized random generation significantly outperforms loops for large arrays.
import time
rng = np.random.default_rng(seed=42)
n = 1_000_000
# Vectorized approach (fast)
start = time.perf_counter()
vectorized = rng.random(n)
vectorized_time = time.perf_counter() - start
# Loop approach (slow - avoid)
start = time.perf_counter()
looped = np.array([rng.random() for _ in range(n)])
loop_time = time.perf_counter() - start
print(f"Vectorized: {vectorized_time:.4f}s")
print(f"Loop: {loop_time:.4f}s")
print(f"Speedup: {loop_time/vectorized_time:.1f}x")
For memory-efficient generation of large arrays:
# Generate in chunks to manage memory
def generate_large_array(rng, total_size, chunk_size=1_000_000):
    """Generate large random array in chunks"""
    result = np.empty(total_size)
    for i in range(0, total_size, chunk_size):
        end = min(i + chunk_size, total_size)
        result[i:end] = rng.random(end - i)
    return result
rng = np.random.default_rng(seed=42)
large_array = generate_large_array(rng, total_size=10_000_000)
print(f"Generated array shape: {large_array.shape}")
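When the same buffer is filled repeatedly, random() also accepts an out parameter, avoiding a fresh allocation on every call. A sketch:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Fill one preallocated buffer repeatedly instead of allocating new arrays
buffer = np.empty(1_000_000)
for batch in range(3):
    rng.random(out=buffer)  # fills in place, no new allocation
    print(f"Batch {batch} mean: {buffer.mean():.4f}")
```

This is useful in streaming or simulation loops where allocation pressure would otherwise dominate.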
Practical Application: Data Augmentation
Random arrays power data augmentation pipelines in machine learning.
def augment_image_data(image_array, rng):
    """Apply random transformations to image data"""
    augmented = image_array.copy()
    # Random brightness adjustment
    brightness_factor = rng.uniform(0.8, 1.2)
    augmented = augmented * brightness_factor
    # Random noise injection
    noise = rng.normal(0, 0.01, size=image_array.shape)
    augmented = augmented + noise
    # Random horizontal flip (50% probability)
    if rng.random() > 0.5:
        augmented = np.fliplr(augmented)
    return np.clip(augmented, 0, 1)
# Simulate image batch (10 images, 28x28 pixels)
rng = np.random.default_rng(seed=42)
images = rng.random((10, 28, 28))
augmented_images = np.array([augment_image_data(img, rng) for img in images])
print(f"Augmented batch shape: {augmented_images.shape}")
The modern Generator API provides robust, efficient random number generation for scientific computing, machine learning, and simulation tasks. Always prefer default_rng() over legacy functions for new projects to ensure better statistical properties and maintainable code.