NumPy - Random Module (np.random) Complete Guide
• NumPy's random module provides two APIs: the legacy `np.random` functions and the modern Generator-based approach with `np.random.default_rng()`, which offers better statistical properties and...
Key Insights
• NumPy’s random module provides two APIs: the legacy np.random functions and the modern Generator-based approach with np.random.default_rng(), which offers better statistical properties and performance
• Understanding the difference between sampling distributions (uniform, normal, binomial) and manipulating arrays (shuffle, choice, permutation) is critical for scientific computing and data analysis
• Reproducibility through seed management is essential for debugging and testing, but the implementation differs significantly between legacy and modern APIs
Legacy vs Modern Random Number Generation
NumPy offers two distinct approaches to random number generation. The legacy API uses global state through functions like np.random.rand(), while the modern approach uses Generator instances for better encapsulation and thread safety.
import numpy as np
# Legacy approach (not recommended for new code)
np.random.seed(42)
legacy_random = np.random.rand(5)
print(f"Legacy: {legacy_random}")
# Modern approach (recommended)
rng = np.random.default_rng(42)
modern_random = rng.random(5)
print(f"Modern: {modern_random}")
# Multiple independent generators
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(123)
print(f"Generator 1: {rng1.random(3)}")
print(f"Generator 2: {rng2.random(3)}")
The Generator approach provides better statistical quality through the PCG64 bit generator and eliminates issues with global state in multithreaded applications.
Uniform Distribution Sampling
Uniform distributions generate values where each number in the range has equal probability. This is the foundation for most random sampling operations.
rng = np.random.default_rng(42)
# Random floats in [0.0, 1.0)
uniform_0_1 = rng.random(10)
print(f"Uniform [0,1): {uniform_0_1}")
# Random floats in custom range [low, high)
uniform_custom = rng.uniform(low=-5, high=5, size=(3, 4))
print(f"Uniform [-5,5):\n{uniform_custom}")
# Random integers (exclusive upper bound)
random_ints = rng.integers(low=0, high=100, size=10)
print(f"Random integers [0,100): {random_ints}")
# Random integers (inclusive upper bound)
dice_rolls = rng.integers(low=1, high=6, size=20, endpoint=True)
print(f"Dice rolls: {dice_rolls}")
The integers() method replaces the legacy randint() function and provides clearer semantics with the endpoint parameter.
Normal and Other Continuous Distributions
Normal (Gaussian) distributions are ubiquitous in statistical applications, machine learning, and simulation work.
rng = np.random.default_rng(42)
# Standard normal (mean=0, std=1)
standard_normal = rng.standard_normal(1000)
print(f"Mean: {standard_normal.mean():.4f}, Std: {standard_normal.std():.4f}")
# Normal with custom parameters
custom_normal = rng.normal(loc=100, scale=15, size=1000)
print(f"Mean: {custom_normal.mean():.2f}, Std: {custom_normal.std():.2f}")
# Exponential distribution (e.g., time between events)
exponential = rng.exponential(scale=2.0, size=1000)
print(f"Exponential mean: {exponential.mean():.2f}")
# Beta distribution (bounded between 0 and 1)
beta = rng.beta(a=2, b=5, size=1000)
print(f"Beta range: [{beta.min():.3f}, {beta.max():.3f}]")
# Gamma distribution
gamma = rng.gamma(shape=2.0, scale=2.0, size=1000)
print(f"Gamma mean: {gamma.mean():.2f}")
These distributions model different real-world phenomena: normal for measurement errors, exponential for waiting times, beta for probabilities, and gamma for positive-valued processes.
Discrete Distributions
Discrete distributions generate integer values following specific probability patterns, essential for modeling count data and categorical outcomes.
rng = np.random.default_rng(42)
# Binomial: number of successes in n trials
# Example: 10 coin flips, 1000 times
coin_flips = rng.binomial(n=10, p=0.5, size=1000)
print(f"Binomial (10 flips): mean={coin_flips.mean():.2f}")
# Poisson: events in fixed interval
# Example: customer arrivals (average 5 per hour)
arrivals = rng.poisson(lam=5, size=1000)
print(f"Poisson arrivals: mean={arrivals.mean():.2f}")
# Multinomial: rolling a die 100 times
die_rolls = rng.multinomial(n=100, pvals=[1/6]*6, size=10)
print(f"Die rolls (10 experiments of 100 rolls):\n{die_rolls}")
# Geometric: number of trials until first success
trials_to_success = rng.geometric(p=0.3, size=1000)
print(f"Geometric mean trials: {trials_to_success.mean():.2f}")
Array Manipulation and Sampling
Beyond generating random numbers, NumPy provides powerful functions for randomizing existing arrays and sampling from populations.
rng = np.random.default_rng(42)
# Shuffle in-place
arr = np.arange(10)
rng.shuffle(arr)
print(f"Shuffled: {arr}")
# Permutation (returns new array)
original = np.arange(10)
permuted = rng.permutation(original)
print(f"Original: {original}")
print(f"Permuted: {permuted}")
# Permutation with 2D arrays (shuffles along first axis)
matrix = np.arange(20).reshape(4, 5)
shuffled_rows = rng.permutation(matrix)
print(f"Shuffled rows:\n{shuffled_rows}")
# Choice: sample with or without replacement
population = np.array(['A', 'B', 'C', 'D', 'E'])
# Without replacement (default)
sample_without = rng.choice(population, size=3, replace=False)
print(f"Sample without replacement: {sample_without}")
# With replacement
sample_with = rng.choice(population, size=10, replace=True)
print(f"Sample with replacement: {sample_with}")
# Weighted sampling
weights = np.array([0.1, 0.1, 0.2, 0.3, 0.3])
weighted_sample = rng.choice(population, size=100, p=weights)
unique, counts = np.unique(weighted_sample, return_counts=True)
print(f"Weighted distribution: {dict(zip(unique, counts))}")
Reproducibility and Seed Management
Proper seed management ensures reproducible results across runs, critical for debugging, testing, and scientific reproducibility.
# Reproducible sequence with same seed
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)
print(f"Same seed: {np.array_equal(rng1.random(5), rng2.random(5))}")
# Different seeds produce different sequences
rng3 = np.random.default_rng(123)
print(f"Different seed: {np.array_equal(rng1.random(5), rng3.random(5))}")
# Spawning independent generators (for parallel processing)
parent_rng = np.random.default_rng(42)
child_rngs = parent_rng.spawn(4)
for i, child in enumerate(child_rngs):
print(f"Child {i}: {child.random(3)}")
# Using SeedSequence for advanced control
from numpy.random import SeedSequence, default_rng
ss = SeedSequence(12345)
child_seeds = ss.spawn(3)
generators = [default_rng(s) for s in child_seeds]
for i, gen in enumerate(generators):
print(f"Generator {i}: {gen.integers(0, 100, 3)}")
Practical Application: Monte Carlo Simulation
Here’s a complete example demonstrating multiple random functions in a realistic scenario—estimating π using Monte Carlo methods.
rng = np.random.default_rng(42)
def estimate_pi(n_samples):
# Generate random points in [0,1] x [0,1] square
x = rng.uniform(0, 1, n_samples)
y = rng.uniform(0, 1, n_samples)
# Check if points fall inside quarter circle
inside_circle = (x**2 + y**2) <= 1
# π/4 = (area of quarter circle) / (area of square)
pi_estimate = 4 * inside_circle.sum() / n_samples
return pi_estimate
# Run with increasing sample sizes
for n in [1000, 10000, 100000, 1000000]:
pi_est = estimate_pi(n)
error = abs(pi_est - np.pi)
print(f"n={n:7d}: π ≈ {pi_est:.6f}, error={error:.6f}")
Performance Considerations
The Generator API offers significant performance improvements, especially for large arrays.
import time
# Legacy API timing
start = time.time()
np.random.seed(42)
for _ in range(1000):
_ = np.random.rand(1000)
legacy_time = time.time() - start
# Modern API timing
start = time.time()
rng = np.random.default_rng(42)
for _ in range(1000):
_ = rng.random(1000)
modern_time = time.time() - start
print(f"Legacy API: {legacy_time:.4f}s")
print(f"Modern API: {modern_time:.4f}s")
print(f"Speedup: {legacy_time/modern_time:.2f}x")
# Vectorization is crucial
start = time.time()
result = rng.random((1000, 1000)) # One call
vectorized_time = time.time() - start
start = time.time()
result = np.array([rng.random(1000) for _ in range(1000)]) # Loop
loop_time = time.time() - start
print(f"\nVectorized: {vectorized_time:.4f}s")
print(f"Loop: {loop_time:.4f}s")
print(f"Speedup: {loop_time/vectorized_time:.2f}x")
The modern Generator API combined with proper vectorization delivers optimal performance for random number generation in NumPy. Always prefer default_rng() for new code and leverage NumPy’s array operations to avoid Python loops.