NumPy - Random Uniform Distribution
Key Insights
- NumPy’s random.uniform() generates samples from a continuous uniform distribution where all values within a specified range have equal probability of being selected
- Uniform distributions are essential for Monte Carlo simulations, random sampling, initialization of neural network weights, and generating test data with known statistical properties
- Understanding the difference between random.uniform(), random.rand(), and random.random() prevents common mistakes in scientific computing and ensures reproducible results
Understanding Uniform Distribution
A uniform distribution represents the simplest probability distribution where every value within a defined interval [a, b] has equal likelihood of occurring. The probability density function (PDF) is constant across the range, making it fundamental for random number generation and statistical sampling.
NumPy provides numpy.random.uniform() to generate random numbers from this distribution. The function accepts three primary parameters: low (lower boundary), high (upper boundary), and size (output shape).
import numpy as np
# Single random value between 0 and 1
value = np.random.uniform(0, 1)
print(f"Single value: {value}")
# Array of 5 values between 0 and 10
values = np.random.uniform(0, 10, size=5)
print(f"Array: {values}")
# 2D array (3x4) between -1 and 1
matrix = np.random.uniform(-1, 1, size=(3, 4))
print(f"Matrix:\n{matrix}")
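Because the density is flat, a histogram computed with density=True should sit near 1/(high - low) in every bin. A quick empirical check (the interval [2, 7] and the seed here are arbitrary choices for illustration):

```python
import numpy as np

# Sketch: verify the PDF is flat at 1 / (high - low).
# For [2, 7], every bin's density should hover near 1/5 = 0.2.
rng = np.random.default_rng(0)
samples = rng.uniform(2, 7, size=200_000)
density, edges = np.histogram(samples, bins=10, range=(2, 7), density=True)
print(np.round(density, 3))  # each bin close to 0.2
```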
Basic Usage Patterns
The uniform() function’s flexibility makes it suitable for various scenarios. Here are practical examples demonstrating different use cases:
import numpy as np
# Generate random prices between $10 and $100
prices = np.random.uniform(10, 100, size=1000)
print(f"Mean price: ${prices.mean():.2f}")
print(f"Expected mean: ${(10 + 100) / 2:.2f}")
# Random angles in radians (0 to 2π)
angles = np.random.uniform(0, 2 * np.pi, size=500)
# Random coordinates in 2D space
x_coords = np.random.uniform(-10, 10, size=100)
y_coords = np.random.uniform(-10, 10, size=100)
coordinates = np.column_stack((x_coords, y_coords))
print(f"First 5 coordinates:\n{coordinates[:5]}")
Seeding for Reproducibility
Reproducibility is critical in scientific computing, testing, and debugging. NumPy’s random number generator uses a seed value to initialize its state. Setting a seed ensures identical results across multiple runs.
import numpy as np
# Using legacy random seed
np.random.seed(42)
result1 = np.random.uniform(0, 1, size=5)
np.random.seed(42)
result2 = np.random.uniform(0, 1, size=5)
print(f"First run: {result1}")
print(f"Second run: {result2}")
print(f"Identical: {np.array_equal(result1, result2)}")
# Modern approach with Generator (recommended)
rng = np.random.default_rng(seed=42)
result3 = rng.uniform(0, 1, size=5)
rng = np.random.default_rng(seed=42)
result4 = rng.uniform(0, 1, size=5)
print(f"\nGenerator first run: {result3}")
print(f"Generator second run: {result4}")
print(f"Identical: {np.array_equal(result3, result4)}")
The default_rng() approach is preferred in modern NumPy code because it provides better statistical properties and supports parallel random number generation.
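The parallel-generation claim can be illustrated with SeedSequence.spawn(), which derives statistically independent child seeds from one parent seed. A minimal sketch (the stream count and sample sizes are arbitrary):

```python
import numpy as np

# Sketch: independent random streams for parallel work.
# Each spawned child seed produces a Generator whose output does not
# overlap with its siblings, yet the whole setup stays reproducible.
seed_seq = np.random.SeedSequence(42)
child_seeds = seed_seq.spawn(3)
streams = [np.random.default_rng(s) for s in child_seeds]

samples = [rng.uniform(0, 1, size=4) for rng in streams]
for i, s in enumerate(samples):
    print(f"Stream {i}: {s}")
```

Each stream could be handed to a separate worker process; re-running with the same parent seed reproduces every stream exactly.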
Monte Carlo Simulation Example
Monte Carlo methods rely heavily on uniform random sampling. Here’s a practical example estimating π by randomly sampling points in a unit square:
import numpy as np
def estimate_pi(n_samples, seed=None):
    """Estimate π using the Monte Carlo method."""
    rng = np.random.default_rng(seed=seed)
    # Generate random points in [0, 1] x [0, 1]
    x = rng.uniform(0, 1, size=n_samples)
    y = rng.uniform(0, 1, size=n_samples)
    # Calculate distance from origin
    distances = np.sqrt(x**2 + y**2)
    # Count points inside quarter circle
    inside_circle = np.sum(distances <= 1)
    # Estimate π (area of quarter circle / area of square * 4)
    pi_estimate = 4 * inside_circle / n_samples
    return pi_estimate
# Run simulation with different sample sizes
sample_sizes = [1000, 10000, 100000, 1000000]
for n in sample_sizes:
    estimate = estimate_pi(n, seed=42)
    error = abs(estimate - np.pi)
    print(f"Samples: {n:>7} | Estimate: {estimate:.6f} | Error: {error:.6f}")
Comparing Uniform Distribution Functions
NumPy offers multiple functions for generating uniform random numbers. Understanding their differences prevents confusion:
import numpy as np
rng = np.random.default_rng(seed=42)
# random.uniform(low, high, size) - explicit range
uniform_explicit = rng.uniform(5, 15, size=5)
# random.random(size) - always [0, 1)
random_default = rng.random(size=5)
# Scale random() to match uniform range
random_scaled = 5 + random_default * (15 - 5)
print(f"uniform(5, 15): {uniform_explicit}")
print(f"random() scaled: {random_scaled}")
print(f"random() [0, 1): {random_default}")
# Legacy numpy.random.rand() - [0, 1) with shape arguments
np.random.seed(42)
rand_legacy = np.random.rand(5)
print(f"rand() legacy: {rand_legacy}")
Neural Network Weight Initialization
Uniform distribution is commonly used for initializing neural network weights. Here’s an implementation of Xavier/Glorot uniform initialization:
import numpy as np
def xavier_uniform_init(n_inputs, n_outputs, seed=None):
    """
    Initialize weights using the Xavier/Glorot uniform distribution.
    Range: [-limit, limit] where limit = sqrt(6 / (n_inputs + n_outputs))
    """
    rng = np.random.default_rng(seed=seed)
    limit = np.sqrt(6 / (n_inputs + n_outputs))
    weights = rng.uniform(-limit, limit, size=(n_inputs, n_outputs))
    return weights

def he_uniform_init(n_inputs, n_outputs, seed=None):
    """
    Initialize weights using the He uniform distribution (for ReLU).
    Range: [-limit, limit] where limit = sqrt(6 / n_inputs)
    """
    rng = np.random.default_rng(seed=seed)
    limit = np.sqrt(6 / n_inputs)
    weights = rng.uniform(-limit, limit, size=(n_inputs, n_outputs))
    return weights
# Example: 784 input features, 128 hidden units
xavier_weights = xavier_uniform_init(784, 128, seed=42)
he_weights = he_uniform_init(784, 128, seed=42)
print(f"Xavier weights shape: {xavier_weights.shape}")
print(f"Xavier range: [{xavier_weights.min():.4f}, {xavier_weights.max():.4f}]")
print(f"Xavier std: {xavier_weights.std():.4f}")
print(f"\nHe weights shape: {he_weights.shape}")
print(f"He range: [{he_weights.min():.4f}, {he_weights.max():.4f}]")
print(f"He std: {he_weights.std():.4f}")
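The printed standard deviations can be checked against theory: Uniform(-limit, limit) has standard deviation limit / sqrt(3), so Xavier-uniform weights should have std ≈ sqrt(2 / (n_inputs + n_outputs)). A small verification sketch (layer sizes match the example above):

```python
import numpy as np

# Sketch: Uniform(-limit, limit) has std = limit / sqrt(3), so the
# empirical std of Xavier weights should match sqrt(2 / (n_in + n_out)).
n_inputs, n_outputs = 784, 128
limit = np.sqrt(6 / (n_inputs + n_outputs))
theoretical_std = limit / np.sqrt(3)  # equals sqrt(2 / (n_in + n_out))

rng = np.random.default_rng(seed=42)
weights = rng.uniform(-limit, limit, size=(n_inputs, n_outputs))
print(f"Theoretical std: {theoretical_std:.4f}")
print(f"Empirical std:   {weights.std():.4f}")
```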
Statistical Validation
Verifying that generated samples follow the expected uniform distribution is important for quality assurance:
import numpy as np
def validate_uniform_distribution(samples, low, high, n_bins=50):
    """Validate uniformity using the chi-square test concept."""
    # Create histogram
    hist, bin_edges = np.histogram(samples, bins=n_bins, range=(low, high))
    # Expected frequency per bin for a uniform distribution
    expected_freq = len(samples) / n_bins
    # Calculate chi-square statistic
    chi_square = np.sum((hist - expected_freq)**2 / expected_freq)
    # Calculate summary statistics
    mean = np.mean(samples)
    expected_mean = (low + high) / 2
    variance = np.var(samples)
    expected_variance = ((high - low)**2) / 12
    return {
        'mean': mean,
        'expected_mean': expected_mean,
        'variance': variance,
        'expected_variance': expected_variance,
        'chi_square': chi_square
    }
# Generate and validate
rng = np.random.default_rng(seed=42)
samples = rng.uniform(0, 10, size=10000)
stats = validate_uniform_distribution(samples, 0, 10)
print("Distribution Validation:")
print(f"Mean: {stats['mean']:.4f} (expected: {stats['expected_mean']:.4f})")
print(f"Variance: {stats['variance']:.4f} (expected: {stats['expected_variance']:.4f})")
print(f"Chi-square: {stats['chi_square']:.2f}")
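Beyond mean and variance, a Kolmogorov–Smirnov check compares the empirical CDF against the theoretical one, (x - low) / (high - low). This is a minimal pure-NumPy sketch; ks_statistic_uniform is a helper defined here for illustration, not a NumPy function:

```python
import numpy as np

# Sketch: KS statistic = largest gap between the empirical CDF and the
# theoretical uniform CDF. Small values (roughly 1/sqrt(n)) indicate a good fit.
def ks_statistic_uniform(samples, low, high):
    x = np.sort(samples)
    n = len(x)
    cdf = (x - low) / (high - low)        # theoretical CDF at each sample
    ecdf_hi = np.arange(1, n + 1) / n     # ECDF just after each point
    ecdf_lo = np.arange(0, n) / n         # ECDF just before each point
    return max(np.max(ecdf_hi - cdf), np.max(cdf - ecdf_lo))

rng = np.random.default_rng(seed=42)
samples = rng.uniform(0, 10, size=10_000)
d = ks_statistic_uniform(samples, 0, 10)
print(f"KS statistic: {d:.4f}")
```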
Performance Considerations
When generating large arrays, understanding performance characteristics helps optimize code:
import numpy as np
import time
rng = np.random.default_rng(seed=42)
# Compare vectorized vs loop approach
n_samples = 1000000
# Vectorized (efficient)
start = time.time()
vectorized = rng.uniform(0, 100, size=n_samples)
vectorized_time = time.time() - start
# Loop (inefficient - for demonstration only)
start = time.time()
looped = np.array([rng.uniform(0, 100) for _ in range(10000)])
looped_time = time.time() - start
print(f"Vectorized (1M samples): {vectorized_time:.4f} seconds")
print(f"Loop (10K samples): {looped_time:.4f} seconds")
# Scale looped_time by 100 so both figures reflect the same sample count (1M vs 10K)
print(f"Per-sample speedup: ~{(looped_time * 100) / vectorized_time:.1f}x")
# Memory-efficient generation for very large datasets
def generate_large_uniform(low, high, total_size, chunk_size=1000000, seed=None):
    """Generate uniform samples in chunks to manage memory."""
    rng = np.random.default_rng(seed=seed)
    n_chunks = total_size // chunk_size
    remainder = total_size % chunk_size
    for _ in range(n_chunks):
        yield rng.uniform(low, high, size=chunk_size)
    if remainder > 0:
        yield rng.uniform(low, high, size=remainder)
# Process 10 million samples in chunks
total = 0
for chunk in generate_large_uniform(0, 1, 10000000):
    total += chunk.sum()
print(f"\nProcessed 10M samples, sum: {total:.2f}")
NumPy’s uniform distribution functions provide the foundation for random sampling in scientific computing. Proper understanding of seeding, performance characteristics, and statistical validation ensures robust implementations in production systems.