How to Generate Random Numbers in NumPy
Key Insights
- NumPy’s modern Generator API (introduced in NumPy 1.17) offers better statistical properties and performance than the legacy numpy.random functions—always use default_rng() for new code.
- Seeding your random number generator with default_rng(seed=42) ensures reproducible results, which is essential for debugging, testing, and scientific reproducibility.
- The size parameter transforms any random function into an array generator, letting you create multi-dimensional random data in a single, vectorized call.
Introduction to NumPy’s Random Module
NumPy’s random module is the workhorse of random number generation in scientific Python. While Python’s built-in random module works fine for simple tasks, it falls short when you need to generate millions of random numbers or fill arrays with random data efficiently.
The difference comes down to vectorization. NumPy generates random numbers in compiled C code and returns them as arrays, avoiding Python’s interpreter overhead. When you need 10 million random floats, NumPy delivers them 50-100x faster than a Python loop calling random.random().
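You can verify the speed difference yourself. Here's a rough sketch using the standard library's timeit; the exact speedup varies by machine and array size, so treat the numbers as illustrative:

```python
import random
import timeit

import numpy as np

n = 1_000_000
rng = np.random.default_rng()

# One pass each: a pure-Python loop vs. a single vectorized call
loop_time = timeit.timeit(lambda: [random.random() for _ in range(n)], number=1)
vec_time = timeit.timeit(lambda: rng.random(n), number=1)

print(f"Python loop:      {loop_time:.4f}s")
print(f"NumPy vectorized: {vec_time:.4f}s")
print(f"Speedup:          {loop_time / vec_time:.0f}x")
```

On a typical laptop the vectorized call wins by well over an order of magnitude, because the work happens in compiled C rather than the Python interpreter.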
Beyond performance, NumPy provides a comprehensive suite of statistical distributions—from uniform and normal to more exotic distributions like Dirichlet and Zipf. If you’re doing Monte Carlo simulations, machine learning, or statistical analysis, NumPy’s random module is non-negotiable.
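To make the Monte Carlo point concrete, here's a minimal sketch (my own example, not part of any NumPy API) that estimates π by sampling points in the unit square:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Draw 1 million (x, y) points uniformly in the unit square
n = 1_000_000
points = rng.random((n, 2))

# The fraction of points inside the quarter circle approximates pi/4
inside = (points ** 2).sum(axis=1) <= 1.0
pi_estimate = 4 * inside.mean()
print(f"pi is approximately {pi_estimate:.4f}")
```

One vectorized draw, one vectorized comparison—no Python-level loop anywhere.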
The New Generator API vs. Legacy API
NumPy 1.17 introduced a completely redesigned random number generation system. You’ll encounter two approaches in the wild:
Legacy API (avoid for new code):
import numpy as np
# Old way - still works, but not recommended for new projects
np.random.seed(42)
values = np.random.randn(5)
Modern Generator API (use this):
import numpy as np
# New way - better randomness, cleaner design
rng = np.random.default_rng(seed=42)
values = rng.standard_normal(5)
The modern API improves on the legacy system in several ways. It uses the PCG64 algorithm by default, which has better statistical properties than the old Mersenne Twister. The API design is cleaner—you create a Generator object and call methods on it, rather than relying on global state. This makes your code more predictable and thread-safe.
Here’s how to create a Generator:
import numpy as np
# Create with automatic seeding (uses system entropy)
rng = np.random.default_rng()
# Create with explicit seed for reproducibility
rng = np.random.default_rng(seed=12345)
# You can also use different bit generators if needed
from numpy.random import Generator, PCG64, MT19937
rng_pcg = Generator(PCG64(seed=42)) # Default algorithm
rng_mt = Generator(MT19937(seed=42)) # Legacy Mersenne Twister
For the rest of this article, I’ll use the modern Generator API exclusively. If you’re maintaining legacy code, the method names are similar but not identical—check the migration guide in NumPy’s documentation.
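As a quick orientation before we move on, here's a sketch of how a few common legacy calls map to their Generator equivalents. The comment table below covers only the functions used in this article; consult NumPy's migration guide for the full mapping:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Legacy global function        ->  Generator method
# np.random.random_sample(5)    ->  rng.random(5)
# np.random.randn(5)            ->  rng.standard_normal(5)
# np.random.randint(0, 10, 5)   ->  rng.integers(0, 10, size=5)
# np.random.choice(a, 5)        ->  rng.choice(a, size=5)
# np.random.shuffle(a)          ->  rng.shuffle(a)

floats = rng.random(5)
normals = rng.standard_normal(5)
ints = rng.integers(0, 10, size=5)
print(floats.shape, normals.shape, ints.shape)  # (5,) (5,) (5,)
```

Note that the seeds are not interchangeable across the two APIs: seeding the legacy Mersenne Twister with 42 and a PCG64 Generator with 42 produces entirely different streams.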
Generating Basic Random Numbers
Let’s cover the fundamental operations you’ll use daily.
Random floats between 0 and 1:
rng = np.random.default_rng(seed=42)
# Single value
single_float = rng.random()
print(single_float) # 0.7739560485559633
# Array of 5 values
float_array = rng.random(5)
print(float_array)
# [0.43887844 0.85859792 0.69736803 0.09417735 0.97562235]
Random integers:
rng = np.random.default_rng(seed=42)
# Single integer from 0 to 9 (exclusive upper bound)
single_int = rng.integers(10)
print(single_int) # 0
# Array of integers from 1 to 100
int_array = rng.integers(1, 101, size=5)
print(int_array) # [64 14 21 42 89]
# Include the endpoint with endpoint=True
dice_rolls = rng.integers(1, 6, size=10, endpoint=True)
print(dice_rolls) # [3 5 2 5 6 3 2 1 2 2]
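When you generate integers like this, it's worth sanity-checking that the faces come up with roughly equal frequency. A quick sketch using np.bincount (the thresholds below are just loose illustrative bounds):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Roll 60,000 dice; each face should appear roughly 10,000 times
rolls = rng.integers(1, 6, size=60_000, endpoint=True)
counts = np.bincount(rolls)[1:]  # drop the unused 0 bin
print(counts)
```

If one face were badly over- or under-represented at this sample size, that would signal a bug (for example, forgetting endpoint=True and never rolling a 6).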
Random floats in a custom range:
rng = np.random.default_rng(seed=42)
# Uniform distribution between -5 and 5
uniform_values = rng.uniform(-5, 5, size=5)
print(uniform_values)
# five floats drawn uniformly from [-5, 5)
# Useful for initializing neural network weights
weights = rng.uniform(-0.5, 0.5, size=(3, 4))
print(weights)
Random Sampling from Distributions
Real-world data rarely follows a uniform distribution. NumPy provides dozens of statistical distributions out of the box.
Normal (Gaussian) distribution:
rng = np.random.default_rng(seed=42)
# Standard normal (mean=0, std=1)
standard = rng.standard_normal(5)
print(standard)
# [ 0.30471708 -1.03998411  0.7504512   0.94056472 -1.95103519]
# Custom mean and standard deviation
heights = rng.normal(loc=170, scale=10, size=1000) # Heights in cm
print(f"Mean: {heights.mean():.1f}, Std: {heights.std():.1f}")
# Mean: 170.2, Std: 9.9
Other useful distributions:
rng = np.random.default_rng(seed=42)
# Binomial: number of successes in n trials with probability p
coin_flips = rng.binomial(n=10, p=0.5, size=5) # Flip 10 coins, 5 times
print(coin_flips) # [4 3 5 5 7]
# Poisson: events per interval (good for count data)
customer_arrivals = rng.poisson(lam=5, size=10) # Average 5 per hour
print(customer_arrivals) # [6 6 4 5 7 5 3 5 4 5]
# Exponential: time between events
wait_times = rng.exponential(scale=2.0, size=5) # Average wait of 2 units
print(wait_times) # [0.51 3.91 0.72 4.72 0.12]
Sampling from existing data:
rng = np.random.default_rng(seed=42)
# Random choice from an array
colors = ['red', 'green', 'blue', 'yellow']
picks = rng.choice(colors, size=5)
print(picks) # ['blue' 'yellow' 'yellow' 'green' 'yellow']
# Weighted sampling
weighted_picks = rng.choice(colors, size=1000, p=[0.5, 0.3, 0.15, 0.05])
# 'red' appears ~50% of the time
# Sampling without replacement
deck = np.arange(52)
hand = rng.choice(deck, size=5, replace=False)
print(hand) # [37 1 10 20 45] - 5 unique cards
# Shuffle an array in-place
arr = np.arange(10)
rng.shuffle(arr)
print(arr) # [7 3 9 0 5 1 6 8 2 4]
# Get a shuffled copy without modifying original
arr = np.arange(10)
shuffled = rng.permutation(arr)
print(arr) # [0 1 2 3 4 5 6 7 8 9] - unchanged
print(shuffled) # a shuffled copy of 0-9
Controlling Reproducibility with Seeds
Reproducibility is critical for debugging, testing, and scientific integrity. Seeds let you generate the exact same “random” sequence every time.
import numpy as np
# Same seed = same results
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
print(rng1.random(3)) # [0.77395605 0.43887844 0.85859792]
print(rng2.random(3)) # [0.77395605 0.43887844 0.85859792] - identical!
# Different seed = different results
rng3 = np.random.default_rng(seed=123)
print(rng3.random(3)) # [0.68235186 0.05382102 0.22035987]
A common pattern for testing is to seed at the start of your script or test function:
def run_simulation(seed=None):
    """Run Monte Carlo simulation with optional reproducibility."""
    rng = np.random.default_rng(seed=seed)
    # All random operations use this rng
    samples = rng.normal(0, 1, size=10000)
    noise = rng.uniform(-0.1, 0.1, size=10000)
    return samples + noise
# Reproducible run for debugging
result1 = run_simulation(seed=42)
result2 = run_simulation(seed=42)
assert np.allclose(result1, result2) # Always true
# Production run with true randomness
result3 = run_simulation() # No seed = different each time
Generating Random Arrays and Matrices
The size parameter is your key to generating multi-dimensional random data efficiently.
rng = np.random.default_rng(seed=42)
# 1D array
vector = rng.random(5)
# 2D array (3 rows, 4 columns)
matrix = rng.random((3, 4))
print(matrix.shape) # (3, 4)
# 3D array (2 batches of 3x4 matrices)
tensor = rng.random((2, 3, 4))
print(tensor.shape) # (2, 3, 4)
# Random integers in a matrix
int_matrix = rng.integers(0, 100, size=(5, 5))
print(int_matrix)
Practical example—initializing neural network weights:
rng = np.random.default_rng(seed=42)
def initialize_layer(input_size, output_size, rng):
    """Xavier initialization for neural network weights."""
    scale = np.sqrt(2.0 / (input_size + output_size))
    weights = rng.normal(0, scale, size=(input_size, output_size))
    biases = np.zeros(output_size)
    return weights, biases
# Initialize a simple network
W1, b1 = initialize_layer(784, 256, rng) # Input to hidden
W2, b2 = initialize_layer(256, 10, rng) # Hidden to output
print(f"W1 shape: {W1.shape}, mean: {W1.mean():.4f}, std: {W1.std():.4f}")
# W1 shape: (784, 256), mean: 0.0001, std: 0.0438
Generating synthetic datasets:
rng = np.random.default_rng(seed=42)
# Create a synthetic classification dataset
n_samples = 1000
n_features = 20
# Features with some correlation structure
X = rng.normal(0, 1, size=(n_samples, n_features))
# Binary labels based on a linear combination plus noise
true_weights = rng.normal(0, 1, size=n_features)
logits = X @ true_weights + rng.normal(0, 0.5, size=n_samples)
y = (logits > 0).astype(int)
print(f"Features shape: {X.shape}")
print(f"Labels distribution: {np.bincount(y)}")
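Once you have a synthetic dataset, a common next step is a shuffled train/test split. Here's a minimal sketch using rng.permutation (the 80/20 ratio and the array shapes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Stand-in dataset with the same shapes as the example above
n_samples = 1000
X = rng.normal(0, 1, size=(n_samples, 20))
y = rng.integers(0, 2, size=n_samples)

# Shuffle the indices once, then slice into 80% train / 20% test
indices = rng.permutation(n_samples)
split = int(0.8 * n_samples)
train_idx, test_idx = indices[:split], indices[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```

Permuting indices rather than the arrays themselves keeps X and y aligned, since both are sliced by the same shuffled index array.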
Practical Tips and Performance Considerations
Reuse your Generator instance. Creating a new Generator has overhead. Create one at the start and pass it around:
# Bad - creates new Generator each call
def bad_random_function():
    rng = np.random.default_rng()
    return rng.random(100)

# Good - reuse the Generator
class Simulation:
    def __init__(self, seed=None):
        self.rng = np.random.default_rng(seed)

    def step(self):
        return self.rng.random(100)
Generate in bulk, not in loops:
rng = np.random.default_rng(seed=42)
# Bad - slow Python loop
result_slow = [rng.random() for _ in range(100000)]
# Good - single vectorized call
result_fast = rng.random(100000)
Thread safety matters. Each thread should have its own Generator. Use spawn() to create independent child generators:
rng = np.random.default_rng(seed=42)
# Create independent generators for parallel work
child_rngs = rng.spawn(4) # 4 independent generators
# Each can be used in a separate thread/process
for i, child_rng in enumerate(child_rngs):
    print(f"Child {i}: {child_rng.random(3)}")
Avoid the legacy global state. Functions like np.random.rand() use hidden global state that’s hard to reason about and not thread-safe. Always prefer explicit Generator objects.
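The practical payoff of explicit Generators is testability: a function that accepts an rng parameter can be made deterministic from the outside. A small before/after sketch (jitter is a hypothetical helper, not a NumPy function):

```python
import numpy as np

# Implicit global state: hidden dependency, hard to make deterministic
def jitter_global(values):
    return values + np.random.uniform(-0.1, 0.1, size=len(values))

# Explicit Generator: the caller controls seeding and isolation
def jitter(values, rng):
    return values + rng.uniform(-0.1, 0.1, size=len(values))

rng = np.random.default_rng(seed=42)
print(jitter(np.zeros(5), rng))

# Tests can pin the seed for repeatable assertions
a = jitter(np.zeros(5), np.random.default_rng(0))
b = jitter(np.zeros(5), np.random.default_rng(0))
assert np.allclose(a, b)
```

With the explicit version, two calls seeded identically always agree, and parallel callers can pass independent generators instead of fighting over one global stream.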
The NumPy random module is mature, fast, and well-designed. Master default_rng(), understand the size parameter, and seed consistently—you’ll handle 99% of random number generation tasks with confidence.