NumPy - Random Uniform Distribution
Key Insights
- NumPy’s random.uniform() generates samples from a continuous uniform distribution where all values within a specified range have equal probability of being selected
- Uniform distributions are essential for Monte Carlo simulations, random sampling, initialization of neural network weights, and generating test data with known statistical properties
- Understanding the difference between random.uniform(), random.rand(), and random.random() prevents common mistakes in scientific computing and ensures reproducible results
Understanding Uniform Distribution
A uniform distribution represents the simplest probability distribution where every value within a defined interval [a, b] has equal likelihood of occurring. The probability density function (PDF) is constant across the range, making it fundamental for random number generation and statistical sampling.
NumPy provides numpy.random.uniform() to generate random numbers from this distribution. The function accepts three primary parameters: low (lower boundary), high (upper boundary), and size (output shape).
import numpy as np
# Single random value between 0 and 1
value = np.random.uniform(0, 1)
print(f"Single value: {value}")
# Array of 5 values between 0 and 10
values = np.random.uniform(0, 10, size=5)
print(f"Array: {values}")
# 2D array (3x4) between -1 and 1
matrix = np.random.uniform(-1, 1, size=(3, 4))
print(f"Matrix:\n{matrix}")
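Because the density is flat, a histogram computed with density=True should sit near 1/(high - low) in every bin. A quick empirical check (the interval [2, 7] and the seed here are arbitrary choices for illustration):

```python
import numpy as np

# Sketch: verify the PDF is flat at 1 / (high - low).
# For [2, 7], every bin's density should hover near 1/5 = 0.2.
rng = np.random.default_rng(0)
samples = rng.uniform(2, 7, size=200_000)
density, edges = np.histogram(samples, bins=10, range=(2, 7), density=True)
print(np.round(density, 3))  # each bin close to 0.2
```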
Basic Usage Patterns
The uniform() function’s flexibility makes it suitable for various scenarios. Here are practical examples demonstrating different use cases:
import numpy as np
# Generate random prices between $10 and $100
prices = np.random.uniform(10, 100, size=1000)
print(f"Mean price: ${prices.mean():.2f}")
print(f"Expected mean: ${(10 + 100) / 2:.2f}")
# Random angles in radians (0 to 2π)
angles = np.random.uniform(0, 2 * np.pi, size=500)
# Random coordinates in 2D space
x_coords = np.random.uniform(-10, 10, size=100)
y_coords = np.random.uniform(-10, 10, size=100)
coordinates = np.column_stack((x_coords, y_coords))
print(f"First 5 coordinates:\n{coordinates[:5]}")
Seeding for Reproducibility
Reproducibility is critical in scientific computing, testing, and debugging. NumPy’s random number generator uses a seed value to initialize its state. Setting a seed ensures identical results across multiple runs.
import numpy as np
# Using legacy random seed
np.random.seed(42)
result1 = np.random.uniform(0, 1, size=5)
np.random.seed(42)
result2 = np.random.uniform(0, 1, size=5)
print(f"First run: {result1}")
print(f"Second run: {result2}")
print(f"Identical: {np.array_equal(result1, result2)}")
# Modern approach with Generator (recommended)
rng = np.random.default_rng(seed=42)
result3 = rng.uniform(0, 1, size=5)
rng = np.random.default_rng(seed=42)
result4 = rng.uniform(0, 1, size=5)
print(f"\nGenerator first run: {result3}")
print(f"Generator second run: {result4}")
print(f"Identical: {np.array_equal(result3, result4)}")
The default_rng() approach is preferred in modern NumPy code because it provides better statistical properties and supports parallel random number generation.
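The parallel-generation claim can be illustrated with SeedSequence.spawn(), which derives statistically independent child seeds from one parent seed. A minimal sketch (the stream count and sample sizes are arbitrary):

```python
import numpy as np

# Sketch: independent random streams for parallel work.
# Each spawned child seed produces a Generator whose output does not
# overlap with its siblings, yet the whole setup stays reproducible.
seed_seq = np.random.SeedSequence(42)
child_seeds = seed_seq.spawn(3)
streams = [np.random.default_rng(s) for s in child_seeds]

samples = [rng.uniform(0, 1, size=4) for rng in streams]
for i, s in enumerate(samples):
    print(f"Stream {i}: {s}")
```

Each stream could be handed to a separate worker process; re-running with the same parent seed reproduces every stream exactly.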
Monte Carlo Simulation Example
Monte Carlo methods rely heavily on uniform random sampling. Here’s a practical example estimating π by randomly sampling points in a unit square:
import numpy as np
def estimate_pi(n_samples, seed=None):
    """Estimate π using the Monte Carlo method."""
    rng = np.random.default_rng(seed=seed)
    # Generate random points in [0, 1] x [0, 1]
    x = rng.uniform(0, 1, size=n_samples)
    y = rng.uniform(0, 1, size=n_samples)
    # Calculate distance from origin
    distances = np.sqrt(x**2 + y**2)
    # Count points inside quarter circle
    inside_circle = np.sum(distances <= 1)
    # Estimate π (area of quarter circle / area of square * 4)
    pi_estimate = 4 * inside_circle / n_samples
    return pi_estimate
# Run simulation with different sample sizes
sample_sizes = [1000, 10000, 100000, 1000000]
for n in sample_sizes:
    estimate = estimate_pi(n, seed=42)
    error = abs(estimate - np.pi)
    print(f"Samples: {n:>7} | Estimate: {estimate:.6f} | Error: {error:.6f}")
Comparing Uniform Distribution Functions
NumPy offers multiple functions for generating uniform random numbers. Understanding their differences prevents confusion:
import numpy as np
rng = np.random.default_rng(seed=42)
# random.uniform(low, high, size) - explicit range
uniform_explicit = rng.uniform(5, 15, size=5)
# random.random(size) - always [0, 1)
random_default = rng.random(size=5)
# Scale random() to match uniform range
random_scaled = 5 + random_default * (15 - 5)
print(f"uniform(5, 15): {uniform_explicit}")
print(f"random() scaled: {random_scaled}")
print(f"random() [0, 1): {random_default}")
# Legacy numpy.random.rand() - [0, 1) with shape arguments
np.random.seed(42)
rand_legacy = np.random.rand(5)
print(f"rand() legacy: {rand_legacy}")
Neural Network Weight Initialization
Uniform distribution is commonly used for initializing neural network weights. Here’s an implementation of Xavier/Glorot uniform initialization:
import numpy as np
def xavier_uniform_init(n_inputs, n_outputs, seed=None):
    """
    Initialize weights using the Xavier/Glorot uniform distribution.
    Range: [-limit, limit] where limit = sqrt(6 / (n_inputs + n_outputs))
    """
    rng = np.random.default_rng(seed=seed)
    limit = np.sqrt(6 / (n_inputs + n_outputs))
    weights = rng.uniform(-limit, limit, size=(n_inputs, n_outputs))
    return weights

def he_uniform_init(n_inputs, n_outputs, seed=None):
    """
    Initialize weights using the He uniform distribution (for ReLU).
    Range: [-limit, limit] where limit = sqrt(6 / n_inputs)
    """
    rng = np.random.default_rng(seed=seed)
    limit = np.sqrt(6 / n_inputs)
    weights = rng.uniform(-limit, limit, size=(n_inputs, n_outputs))
    return weights
# Example: 784 input features, 128 hidden units
xavier_weights = xavier_uniform_init(784, 128, seed=42)
he_weights = he_uniform_init(784, 128, seed=42)
print(f"Xavier weights shape: {xavier_weights.shape}")
print(f"Xavier range: [{xavier_weights.min():.4f}, {xavier_weights.max():.4f}]")
print(f"Xavier std: {xavier_weights.std():.4f}")
print(f"\nHe weights shape: {he_weights.shape}")
print(f"He range: [{he_weights.min():.4f}, {he_weights.max():.4f}]")
print(f"He std: {he_weights.std():.4f}")
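The printed standard deviations can be checked against theory: Uniform(-limit, limit) has standard deviation limit / sqrt(3), so Xavier-uniform weights should have std ≈ sqrt(2 / (n_inputs + n_outputs)). A small verification sketch (layer sizes match the example above):

```python
import numpy as np

# Sketch: Uniform(-limit, limit) has std = limit / sqrt(3), so the
# empirical std of Xavier weights should match sqrt(2 / (n_in + n_out)).
n_inputs, n_outputs = 784, 128
limit = np.sqrt(6 / (n_inputs + n_outputs))
theoretical_std = limit / np.sqrt(3)  # equals sqrt(2 / (n_in + n_out))

rng = np.random.default_rng(seed=42)
weights = rng.uniform(-limit, limit, size=(n_inputs, n_outputs))
print(f"Theoretical std: {theoretical_std:.4f}")
print(f"Empirical std:   {weights.std():.4f}")
```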
Statistical Validation
Verifying that generated samples follow the expected uniform distribution is important for quality assurance:
import numpy as np
def validate_uniform_distribution(samples, low, high, n_bins=50):
    """Validate uniformity using the chi-square test concept."""
    # Create histogram
    hist, bin_edges = np.histogram(samples, bins=n_bins, range=(low, high))
    # Expected frequency per bin for a uniform distribution
    expected_freq = len(samples) / n_bins
    # Calculate chi-square statistic
    chi_square = np.sum((hist - expected_freq)**2 / expected_freq)
    # Calculate summary statistics
    mean = np.mean(samples)
    expected_mean = (low + high) / 2
    variance = np.var(samples)
    expected_variance = ((high - low)**2) / 12
    return {
        'mean': mean,
        'expected_mean': expected_mean,
        'variance': variance,
        'expected_variance': expected_variance,
        'chi_square': chi_square
    }
# Generate and validate
rng = np.random.default_rng(seed=42)
samples = rng.uniform(0, 10, size=10000)
stats = validate_uniform_distribution(samples, 0, 10)
print("Distribution Validation:")
print(f"Mean: {stats['mean']:.4f} (expected: {stats['expected_mean']:.4f})")
print(f"Variance: {stats['variance']:.4f} (expected: {stats['expected_variance']:.4f})")
print(f"Chi-square: {stats['chi_square']:.2f}")
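Beyond mean and variance, a Kolmogorov–Smirnov check compares the empirical CDF against the theoretical one, (x - low) / (high - low). This is a minimal pure-NumPy sketch; ks_statistic_uniform is a helper defined here for illustration, not a NumPy function:

```python
import numpy as np

# Sketch: KS statistic = largest gap between the empirical CDF and the
# theoretical uniform CDF. Small values (roughly 1/sqrt(n)) indicate a good fit.
def ks_statistic_uniform(samples, low, high):
    x = np.sort(samples)
    n = len(x)
    cdf = (x - low) / (high - low)        # theoretical CDF at each sample
    ecdf_hi = np.arange(1, n + 1) / n     # ECDF just after each point
    ecdf_lo = np.arange(0, n) / n         # ECDF just before each point
    return max(np.max(ecdf_hi - cdf), np.max(cdf - ecdf_lo))

rng = np.random.default_rng(seed=42)
samples = rng.uniform(0, 10, size=10_000)
d = ks_statistic_uniform(samples, 0, 10)
print(f"KS statistic: {d:.4f}")
```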
Performance Considerations
When generating large arrays, understanding performance characteristics helps optimize code:
import numpy as np
import time
rng = np.random.default_rng(seed=42)
# Compare vectorized vs loop approach
n_samples = 1000000
# Vectorized (efficient)
start = time.time()
vectorized = rng.uniform(0, 100, size=n_samples)
vectorized_time = time.time() - start
# Loop (inefficient - for demonstration only)
start = time.time()
looped = np.array([rng.uniform(0, 100) for _ in range(10000)])
looped_time = time.time() - start
print(f"Vectorized (1M samples): {vectorized_time:.4f} seconds")
print(f"Loop (10K samples): {looped_time:.4f} seconds")
# Scale looped_time by 100 so both figures reflect the same sample count (1M vs 10K)
print(f"Per-sample speedup: ~{(looped_time * 100) / vectorized_time:.1f}x")
# Memory-efficient generation for very large datasets
def generate_large_uniform(low, high, total_size, chunk_size=1000000, seed=None):
    """Generate uniform samples in chunks to manage memory."""
    rng = np.random.default_rng(seed=seed)
    n_chunks = total_size // chunk_size
    remainder = total_size % chunk_size
    for _ in range(n_chunks):
        yield rng.uniform(low, high, size=chunk_size)
    if remainder > 0:
        yield rng.uniform(low, high, size=remainder)
# Process 10 million samples in chunks
total = 0
for chunk in generate_large_uniform(0, 1, 10000000):
    total += chunk.sum()
print(f"\nProcessed 10M samples, sum: {total:.2f}")
NumPy’s uniform distribution functions provide the foundation for random sampling in scientific computing. Proper understanding of seeding, performance characteristics, and statistical validation ensures robust implementations in production systems.