NumPy - Random Float (np.random.rand, random_sample)
Key Insights
- NumPy provides multiple methods for generating random floats, with np.random.rand() and np.random.random_sample() being functionally identical for uniform distributions in [0.0, 1.0)
- The legacy numpy.random module remains widely used, but numpy.random.Generator offers better statistical properties, performance, and is the recommended approach for new code
- Understanding how to scale and shift random floats enables generation of custom ranges, normal distributions, and reproducible randomness through seeding
Understanding NumPy’s Random Float Generation
NumPy offers several approaches to generate random floating-point numbers. The most common methods—np.random.rand() and np.random.random_sample()—both produce uniformly distributed floats in the half-open interval [0.0, 1.0), meaning values can be 0.0 but never quite reach 1.0.
import numpy as np
# Both draw uniform floats from [0.0, 1.0); the printed values differ because each call advances the generator
rand_values = np.random.rand(5)
sample_values = np.random.random_sample(5)
print(f"rand(): {rand_values}")
print(f"random_sample(): {sample_values}")
# Output example:
# rand(): [0.5488135 0.71518937 0.60276338 0.54488318 0.4236548 ]
# random_sample(): [0.64589411 0.43758721 0.891773 0.96366276 0.38344152]
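As a quick check that the two functions are interchangeable, the sketch below reseeds the legacy global state before each call and compares the results. The seed value 0 is an arbitrary choice for illustration.

```python
import numpy as np

# Reseed the legacy global state before each call so both start from the same point
np.random.seed(0)
from_rand = np.random.rand(5)

np.random.seed(0)
from_random_sample = np.random.random_sample(5)

# Same seed, same stream: the two functions return the same values
same_values = np.array_equal(from_rand, from_random_sample)

# Every value lies in the half-open interval [0.0, 1.0)
in_range = bool(np.all((from_rand >= 0.0) & (from_rand < 1.0)))

print(f"Same values: {same_values}")
print(f"All in [0.0, 1.0): {in_range}")
```

When the starting state is the same, the only difference between the two calls is how the shape is passed.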
Legacy vs Modern Random Number Generation
NumPy's random module has evolved significantly. The legacy numpy.random functions share a single global RandomState instance based on the Mersenne Twister (MT19937), while the modern API, introduced in NumPy 1.17, uses explicit Generator objects that default to the PCG64 bit generator.
# Legacy approach (still works, widely used)
legacy_random = np.random.rand(3, 3)
# Modern approach (recommended)
rng = np.random.default_rng(seed=42)
modern_random = rng.random((3, 3))
print("Legacy:\n", legacy_random)
print("\nModern:\n", modern_random)
The modern Generator approach offers:
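- Better statistical properties with the default PCG64 bit generator
- Independent random streams, since each Generator carries its own state instead of sharing a global one
- Safer use across threads and processes when each worker gets its own Generator
- Often faster generation for large arrays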
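One way to obtain independent streams is to spawn child seeds from a single SeedSequence and build one Generator per worker. This is a minimal sketch; the worker count of three and the root seed of 42 are assumptions made for illustration.

```python
import numpy as np

# Assumption: three workers, each needing its own stream (count and seed are illustrative)
root = np.random.SeedSequence(42)
child_seeds = root.spawn(3)

# One Generator per worker, each carrying its own state
worker_rngs = [np.random.default_rng(s) for s in child_seeds]
samples = [rng.random(4) for rng in worker_rngs]

for i, values in enumerate(samples):
    print(f"worker {i}: {values}")
```

Because every worker derives its seed from the same root, the whole set of streams can be reproduced from one number while each stream stays distinct.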
Generating Multi-Dimensional Arrays
Both methods support creating arrays of any shape. The syntax differs slightly between rand() and random_sample().
# np.random.rand() takes dimensions as separate arguments
matrix_2d = np.random.rand(3, 4)
tensor_3d = np.random.rand(2, 3, 4)
# np.random.random_sample() takes a tuple
matrix_2d_alt = np.random.random_sample((3, 4))
tensor_3d_alt = np.random.random_sample((2, 3, 4))
print(f"2D shape: {matrix_2d.shape}")
print(f"3D shape: {tensor_3d.shape}")
# Output:
# 2D shape: (3, 4)
# 3D shape: (2, 3, 4)
# Modern Generator approach
rng = np.random.default_rng()
modern_matrix = rng.random((3, 4))
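Beyond the shape, Generator.random() also accepts a dtype and an out array, which can matter for large multi-dimensional arrays. The sketch below assumes a fixed seed and small shapes purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for illustration

# float64 is the default; float32 halves memory use for large arrays
matrix_f64 = rng.random((3, 4))
matrix_f32 = rng.random((3, 4), dtype=np.float32)

# Fill a preallocated float64 array in place instead of creating a new one
buffer = np.empty((2, 3, 4))
rng.random(out=buffer)

print(matrix_f64.dtype, matrix_f32.dtype, buffer.shape)
```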
Scaling Random Floats to Custom Ranges
The default [0.0, 1.0) range is rarely what you need in practice. Scale and shift these values to any range using basic arithmetic.
# Generate floats in range [min, max)
def random_range(min_val, max_val, size):
    return np.random.rand(*size) * (max_val - min_val) + min_val
# Examples
temps_celsius = random_range(-10, 35, (10,))
prices = random_range(9.99, 99.99, (5,))
percentages = random_range(0, 100, (3, 3))
print(f"Temperatures: {temps_celsius}")
print(f"Prices: {prices}")
print(f"\nPercentages:\n{percentages}")
# Using modern Generator with uniform method (cleaner)
rng = np.random.default_rng()
temps_modern = rng.uniform(-10, 35, size=10)
print(f"\nModern uniform: {temps_modern}")
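The scale-and-shift arithmetic can also be written against a Generator and sanity-checked. The helper name scaled_uniform is hypothetical, and the seed and sample count are arbitrary; this is a sketch rather than a replacement for rng.uniform().

```python
import numpy as np

def scaled_uniform(rng, low, high, size):
    """Map draws u from [0.0, 1.0) to [low, high) via low + (high - low) * u."""
    return low + (high - low) * rng.random(size)

rng = np.random.default_rng(seed=42)
temps = scaled_uniform(rng, -10, 35, 100_000)

# For a uniform distribution the mean should sit near the midpoint (low + high) / 2
print(f"min: {temps.min():.3f}, max: {temps.max():.3f}")
print(f"mean: {temps.mean():.3f} (midpoint is {(-10 + 35) / 2})")
```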
Reproducible Random Numbers with Seeds
Seeding makes random output repeatable, which is critical for debugging, testing, and reproducible scientific results.
# Legacy seeding
np.random.seed(42)
result1 = np.random.rand(5)
np.random.seed(42)
result2 = np.random.rand(5)
print(f"First run: {result1}")
print(f"Second run: {result2}")
print(f"Identical: {np.array_equal(result1, result2)}")
# Output: Identical: True
# Modern seeding (preferred)
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
modern1 = rng1.random(5)
modern2 = rng2.random(5)
print(f"\nModern identical: {np.array_equal(modern1, modern2)}")
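Besides seeding at construction, a Generator's position in its stream can be captured and restored through the state of its bit generator. The following sketch assumes you want to replay a specific set of draws in the middle of a longer run.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
rng.random(10)  # advance the stream, as earlier parts of an experiment would

# Capture the bit generator state (a plain dict) before the draws to replay
saved_state = rng.bit_generator.state
first_pass = rng.random(5)

# Restore the snapshot and draw again
rng.bit_generator.state = saved_state
second_pass = rng.random(5)

replayed = np.array_equal(first_pass, second_pass)
print(f"Replayed identically: {replayed}")
```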
Performance Comparison and Best Practices
For large-scale random number generation, performance matters. The modern Generator is often faster than the legacy functions, but the margin depends on the NumPy version and hardware, so measure on your own system.
import time
# Benchmark legacy vs modern (a single run; results vary between machines)
size = (10000, 1000)
# Legacy
start = time.perf_counter()
legacy_large = np.random.rand(*size)
legacy_time = time.perf_counter() - start
# Modern
rng = np.random.default_rng()
start = time.perf_counter()
modern_large = rng.random(size)
modern_time = time.perf_counter() - start
print(f"Legacy time: {legacy_time:.4f}s")
print(f"Modern time: {modern_time:.4f}s")
print(f"Speedup: {legacy_time/modern_time:.2f}x")
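A single timed run is noisy. One possible refinement, sketched here with the standard-library timeit module, repeats each measurement and keeps the best time. The array size and repeat counts are assumptions chosen to keep the run short.

```python
import timeit
import numpy as np

size = (1000, 1000)  # smaller than above to keep the repeated runs short
rng = np.random.default_rng()
runs_per_repeat = 5

# Taking the best of several repeats reduces noise from other processes
legacy_best = min(
    timeit.repeat(lambda: np.random.rand(*size), number=runs_per_repeat, repeat=3)
) / runs_per_repeat
modern_best = min(
    timeit.repeat(lambda: rng.random(size), number=runs_per_repeat, repeat=3)
) / runs_per_repeat

print(f"Legacy best: {legacy_best:.5f}s per call")
print(f"Modern best: {modern_best:.5f}s per call")
```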
Common Distributions Beyond Uniform
While rand() and random_sample() generate uniform distributions, NumPy provides many other distributions.
rng = np.random.default_rng(seed=42)
# Normal (Gaussian) distribution
normal_data = rng.normal(loc=0, scale=1, size=1000)
# Exponential distribution
exponential_data = rng.exponential(scale=2.0, size=1000)
# Beta distribution
beta_data = rng.beta(a=2, b=5, size=1000)
# Log-normal distribution
lognormal_data = rng.lognormal(mean=0, sigma=1, size=1000)
print(f"Normal mean: {normal_data.mean():.3f}, std: {normal_data.std():.3f}")
print(f"Exponential mean: {exponential_data.mean():.3f}")
print(f"Beta mean: {beta_data.mean():.3f}")
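The same scale-and-shift idea used for uniform ranges applies to the normal distribution: a standard normal draw z becomes mu + sigma * z. The sketch below compares that against rng.normal(); the values of mu, sigma, and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
mu, sigma = 5.0, 2.0  # illustrative parameters

# Shift and scale standard normal draws by hand
shifted = mu + sigma * rng.standard_normal(100_000)

# Let NumPy apply loc and scale directly
direct = rng.normal(loc=mu, scale=sigma, size=100_000)

print(f"Manual: mean={shifted.mean():.3f}, std={shifted.std():.3f}")
print(f"Direct: mean={direct.mean():.3f}, std={direct.std():.3f}")
```

Both arrays are separate draws, so their sample statistics agree only approximately.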
Practical Application: Monte Carlo Simulation
Random floats power Monte Carlo simulations. Here’s a simple example estimating π.
def estimate_pi(n_samples):
    rng = np.random.default_rng(seed=42)
    # Generate random points in the unit square [0, 1) x [0, 1)
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    # Check which points fall inside the quarter of the unit circle in that square
    inside_circle = (x**2 + y**2) <= 1
    # The area ratio is π/4, so π ≈ 4 * (points inside / total points)
    pi_estimate = 4 * np.sum(inside_circle) / n_samples
    return pi_estimate
# Test with increasing sample sizes
for n in [1000, 10000, 100000, 1000000]:
    estimate = estimate_pi(n)
    error = abs(estimate - np.pi)
    print(f"n={n:7d}: π ≈ {estimate:.6f}, error = {error:.6f}")
# Output example:
# n=   1000: π ≈ 3.144000, error = 0.002407
# n=  10000: π ≈ 3.150800, error = 0.009207
# n= 100000: π ≈ 3.142920, error = 0.001327
# n=1000000: π ≈ 3.141273, error = 0.000320
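The error does not shrink steadily from one sample size to the next because each estimate is itself random. A rough way to quantify that is the binomial standard error of the hit fraction. The function name estimate_pi_with_error is hypothetical, and this is a sketch under the assumption that the points are independent.

```python
import numpy as np

def estimate_pi_with_error(n_samples, rng):
    """Return a π estimate and its approximate standard error."""
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    # Fraction of points inside the quarter circle; its expected value is π/4
    p_hat = np.mean(x**2 + y**2 <= 1.0)
    estimate = 4.0 * p_hat
    # Binomial standard error of p_hat, scaled by the same factor of 4
    std_error = 4.0 * np.sqrt(p_hat * (1.0 - p_hat) / n_samples)
    return estimate, std_error

rng = np.random.default_rng(seed=42)
estimate, std_error = estimate_pi_with_error(100_000, rng)
print(f"π ≈ {estimate:.5f} ± {std_error:.5f}")
```

The standard error falls in proportion to 1/sqrt(n), so each extra decimal digit of accuracy costs roughly 100 times more samples.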
Avoiding Common Pitfalls
Several mistakes frequently trip up developers working with random floats.
# WRONG: Reusing the same seed globally
np.random.seed(42)
data1 = np.random.rand(100)
np.random.seed(42) # Don't reset the seed like this
data2 = np.random.rand(100)
# data1 and data2 are identical - probably not intended
# RIGHT: Use separate Generator instances
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=43)
data1 = rng1.random(100)
data2 = rng2.random(100)
# WRONG: Inefficient generation in loops
results = []
for i in range(1000):
    results.append(np.random.rand())  # Slow
results = np.array(results)
# RIGHT: Vectorized generation
results = np.random.rand(1000) # Much faster
# WRONG: Assuming 1.0 can be returned (the interval [0.0, 1.0) is half-open)
max_val = np.random.rand(1000000).max()
print(f"Max value: {max_val}") # Never exactly 1.0
print(f"Equals 1.0: {max_val == 1.0}") # Always False
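A related habit that avoids hidden global state is to pass a Generator into any function that needs randomness. The helper noisy_measurements and its constants are hypothetical, used here only to illustrate the pattern.

```python
import numpy as np

def noisy_measurements(n, rng):
    """Hypothetical helper: n readings around 20.0 with uniform noise in [-0.5, 0.5)."""
    return 20.0 + (rng.random(n) - 0.5)

# The caller decides the seed, so the function itself never touches global state
run_a = noisy_measurements(5, np.random.default_rng(seed=1))
run_b = noisy_measurements(5, np.random.default_rng(seed=1))
run_c = noisy_measurements(5, np.random.default_rng(seed=2))

print(f"Same seed, same data: {np.array_equal(run_a, run_b)}")
print(f"Different seed, same data: {np.array_equal(run_a, run_c)}")
```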
Migration Path from Legacy to Modern
If you’re maintaining legacy code, migrate incrementally to the modern API.
# Legacy code
np.random.seed(42)
old_data = np.random.rand(5, 5)
old_normal = np.random.randn(5, 5)
old_range = np.random.uniform(0, 10, (5, 5))
# Modern equivalent (same distributions, but the same seed yields different values than the legacy API)
rng = np.random.default_rng(seed=42)
new_data = rng.random((5, 5))
new_normal = rng.standard_normal((5, 5))
new_range = rng.uniform(0, 10, (5, 5))
# For a drop-in replacement, create a module-level Generator
_rng = np.random.default_rng()
def rand(*args):
    return _rng.random(args if args else None)
# Now rand() accepts the same arguments as legacy np.random.rand()
modern_result = rand(3, 3)
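One thing to keep in mind while migrating is that the legacy and modern APIs use different algorithms, so the same seed does not reproduce the same numbers. If old results must stay bit-for-bit reproducible, a local np.random.RandomState instance keeps the legacy stream without relying on global state. The seed 42 below is just the value used throughout this article.

```python
import numpy as np

# Local legacy generator: same stream as np.random.seed(42), but no global state
legacy_rs = np.random.RandomState(42)
legacy_values = legacy_rs.rand(5)

np.random.seed(42)
global_values = np.random.rand(5)

# Modern Generator with the same seed
modern_values = np.random.default_rng(seed=42).random(5)

matches_global = np.array_equal(legacy_values, global_values)
matches_modern = np.array_equal(legacy_values, modern_values)

print(f"RandomState matches seeded global functions: {matches_global}")
print(f"RandomState matches Generator: {matches_modern}")
```

Tests that hard-code expected random values therefore need new expected values after switching to default_rng().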
The choice between np.random.rand() and np.random.random_sample() is purely stylistic: they draw from the same generator and differ only in how the shape is passed. However, the choice between legacy and modern APIs impacts code quality, performance, and maintainability. For new projects, use np.random.default_rng() and its Generator methods.