NumPy - Random Normal Distribution (np.random.randn/normal)


Key Insights

  • np.random.randn() generates standard normal samples (mean=0, std=1) using positional arguments for shape, while np.random.normal() offers explicit control over mean and standard deviation with named parameters
  • NumPy’s legacy random functions are being superseded by the Generator API (default_rng()), which provides better statistical properties, performance, and reproducibility across different systems
  • Understanding the difference between standard normal and custom normal distributions is critical for statistical simulations, Monte Carlo methods, and machine learning applications requiring specific data characteristics
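As a quick illustration of the first two bullets, the same distribution can be sampled with any of the three APIs. The parameters here (mean 10, standard deviation 2) are arbitrary choices for the demo:

```python
import numpy as np
from numpy.random import default_rng

# Three ways to draw 4 samples from N(10, 2); all follow the same distribution.
a = np.random.randn(4) * 2 + 10          # legacy standard normal, transformed
b = np.random.normal(10, 2, size=4)      # legacy, explicit loc/scale
c = default_rng().normal(10, 2, size=4)  # modern Generator API

print(a, b, c, sep="\n")
```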

Standard Normal Distribution with np.random.randn()

The np.random.randn() function generates samples from the standard normal distribution (Gaussian distribution with mean 0 and standard deviation 1). The function accepts dimensions as separate arguments rather than a tuple.

import numpy as np

# Single random value
single_value = np.random.randn()
print(f"Single value: {single_value}")

# 1D array of 5 values
array_1d = np.random.randn(5)
print(f"1D array: {array_1d}")

# 2D array (3x4 matrix)
array_2d = np.random.randn(3, 4)
print(f"2D array:\n{array_2d}")

# 3D array
array_3d = np.random.randn(2, 3, 4)
print(f"3D array shape: {array_3d.shape}")

Verify the distribution properties:

# Generate large sample to verify distribution
large_sample = np.random.randn(100000)

print(f"Mean: {np.mean(large_sample):.4f}")  # ~0
print(f"Std Dev: {np.std(large_sample):.4f}")  # ~1
print(f"Min: {np.min(large_sample):.4f}")
print(f"Max: {np.max(large_sample):.4f}")
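Beyond the mean and standard deviation, the shape of the sample can be checked against the 68-95-99.7 rule for normal distributions:

```python
import numpy as np

# Empirical check of the 68-95-99.7 rule for the standard normal
sample = np.random.randn(100_000)
for k in (1, 2, 3):
    frac = np.mean(np.abs(sample) < k)
    print(f"Within {k} std dev: {frac:.3f}")
# Expected fractions: roughly 0.683, 0.954, 0.997
```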

Custom Normal Distribution with np.random.normal()

When you need control over the mean (loc) and standard deviation (scale), use np.random.normal(). Its size parameter accepts the output shape as an integer or a tuple of integers.

# Generate samples with custom parameters
mean = 100
std_dev = 15

# Single value
custom_value = np.random.normal(mean, std_dev)
print(f"Custom value: {custom_value}")

# Array with specific shape
custom_array = np.random.normal(mean, std_dev, size=10)
print(f"Custom array: {custom_array}")

# 2D array
custom_2d = np.random.normal(mean, std_dev, size=(3, 4))
print(f"Custom 2D:\n{custom_2d}")

# Verify properties
large_custom = np.random.normal(mean, std_dev, size=100000)
print(f"Mean: {np.mean(large_custom):.2f}")  # ~100
print(f"Std Dev: {np.std(large_custom):.2f}")  # ~15

Converting Between randn() and normal()

Understanding the mathematical relationship helps you choose the right function:

# These are equivalent
mean, std = 50, 10

# Method 1: Using np.random.normal()
method1 = np.random.normal(mean, std, size=5)

# Method 2: Transform standard normal from randn()
method2 = mean + std * np.random.randn(5)

print(f"Method 1 (normal): {method1}")
print(f"Method 2 (randn): {method2}")

# Both follow the same distribution
samples1 = np.random.normal(mean, std, size=100000)
samples2 = mean + std * np.random.randn(100000)

print(f"\nMethod 1 - Mean: {np.mean(samples1):.2f}, Std: {np.std(samples1):.2f}")
print(f"Method 2 - Mean: {np.mean(samples2):.2f}, Std: {np.std(samples2):.2f}")

Modern Approach: Generator API

NumPy’s legacy random functions use a global state, which causes issues with reproducibility and parallel processing. The Generator API provides better alternatives:

from numpy.random import default_rng

# Create a generator with a seed for reproducibility
rng = default_rng(seed=42)

# Standard normal
standard = rng.standard_normal(size=5)
print(f"Standard normal: {standard}")

# Custom normal distribution
custom = rng.normal(loc=100, scale=15, size=5)
print(f"Custom normal: {custom}")

# Multiple generators for parallel processing
rng1 = default_rng(seed=1)
rng2 = default_rng(seed=2)

# Independent random streams
stream1 = rng1.standard_normal(1000)
stream2 = rng2.standard_normal(1000)
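Rather than hand-picking seeds for each worker, NumPy's SeedSequence can derive statistically independent child seeds from a single parent seed, which keeps parallel runs reproducible from one number:

```python
import numpy as np
from numpy.random import default_rng, SeedSequence

# Spawn independent child seeds from one parent seed
parent = SeedSequence(42)
child_seeds = parent.spawn(4)
rngs = [default_rng(s) for s in child_seeds]

# Each generator produces its own independent stream
streams = [r.standard_normal(1000) for r in rngs]
print([f"{s.mean():.3f}" for s in streams])
```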

Reproducibility and Seeding

Controlling randomness is essential for debugging and scientific reproducibility:

# Legacy approach (global state)
np.random.seed(42)
legacy1 = np.random.randn(5)
np.random.seed(42)
legacy2 = np.random.randn(5)
print(f"Legacy match: {np.array_equal(legacy1, legacy2)}")  # True

# Generator approach (better isolation)
rng = default_rng(42)
gen1 = rng.standard_normal(5)

rng = default_rng(42)
gen2 = rng.standard_normal(5)
print(f"Generator match: {np.array_equal(gen1, gen2)}")  # True

# Generators with the same seed produce identical streams
rng_a = default_rng(42)
rng_b = default_rng(42)
print(f"Gen A: {rng_a.standard_normal(3)}")
print(f"Gen B: {rng_b.standard_normal(3)}")  # identical to Gen A

Practical Applications

Monte Carlo Simulation

# Simulate stock price movements (Geometric Brownian Motion)
def simulate_stock_price(S0, mu, sigma, T, steps, n_simulations):
    dt = T / steps
    rng = default_rng(42)
    
    # Generate all random samples at once (vectorized)
    random_shocks = rng.normal(0, 1, size=(n_simulations, steps))
    
    # Calculate price movements
    drift = (mu - 0.5 * sigma**2) * dt
    diffusion = sigma * np.sqrt(dt) * random_shocks
    
    returns = drift + diffusion
    price_paths = S0 * np.exp(np.cumsum(returns, axis=1))
    
    return price_paths

# Simulate 1000 paths
S0 = 100  # Initial price
mu = 0.1  # Expected return (10%)
sigma = 0.2  # Volatility (20%)
T = 1  # 1 year
steps = 252  # Trading days

paths = simulate_stock_price(S0, mu, sigma, T, steps, 1000)
print(f"Final prices - Mean: {np.mean(paths[:, -1]):.2f}, Std: {np.std(paths[:, -1]):.2f}")
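The simulated distribution of final prices also supports simple risk estimates. The sketch below regenerates final prices inline with a one-step GBM (same parameters as above, so it stands alone) and reads off an illustrative 5th-percentile estimate:

```python
import numpy as np
from numpy.random import default_rng

# One-step GBM over a year: final price depends only on one normal draw
rng = default_rng(42)
S0, mu, sigma, T = 100, 0.1, 0.2, 1.0
z = rng.standard_normal(10_000)
final_prices = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)

# 5th percentile of final prices -> a simple value-at-risk style estimate
var_5 = np.percentile(final_prices, 5)
prob_loss = np.mean(final_prices < S0)

print(f"5th-percentile final price: {var_5:.2f}")
print(f"P(final < initial): {prob_loss:.2%}")
```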

Adding Noise to Data

# Add Gaussian noise to clean signal
def add_gaussian_noise(signal, noise_level, seed=None):
    rng = default_rng(seed)
    noise = rng.normal(0, noise_level, size=signal.shape)
    return signal + noise

# Clean signal
t = np.linspace(0, 2*np.pi, 100)
clean_signal = np.sin(t)

# Add noise
noisy_signal = add_gaussian_noise(clean_signal, noise_level=0.2, seed=42)

print(f"SNR: {10 * np.log10(np.var(clean_signal) / np.var(noisy_signal - clean_signal)):.2f} dB")

Generating Synthetic Datasets

# Create synthetic dataset for testing ML models
def generate_regression_data(n_samples, n_features, noise_std=1.0, seed=None):
    rng = default_rng(seed)
    
    # Feature matrix
    X = rng.normal(0, 1, size=(n_samples, n_features))
    
    # True coefficients
    true_coef = rng.normal(0, 1, size=n_features)
    
    # Generate target with noise
    y = X @ true_coef + rng.normal(0, noise_std, size=n_samples)
    
    return X, y, true_coef

X_train, y_train, coef = generate_regression_data(1000, 5, noise_std=0.5, seed=42)
print(f"X shape: {X_train.shape}")
print(f"y shape: {y_train.shape}")
print(f"True coefficients: {coef}")
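A quick sanity check on this recipe: with modest noise, an ordinary least-squares fit should recover the true coefficients closely. The sketch below regenerates the data inline (same recipe as the function above) so it stands alone:

```python
import numpy as np
from numpy.random import default_rng

# Regenerate synthetic regression data inline
rng = default_rng(42)
X = rng.normal(0, 1, size=(1000, 5))
true_coef = rng.normal(0, 1, size=5)
y = X @ true_coef + rng.normal(0, 0.5, size=1000)

# Least-squares fit should land close to the true coefficients
est_coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"Max coefficient error: {np.max(np.abs(est_coef - true_coef)):.3f}")
```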

Performance Considerations

import time

# Compare performance
n = 10_000_000

# Legacy randn
start = time.time()
_ = np.random.randn(n)
legacy_time = time.time() - start

# Generator standard_normal
rng = default_rng()
start = time.time()
_ = rng.standard_normal(n)
generator_time = time.time() - start

print(f"Legacy randn: {legacy_time:.4f}s")
print(f"Generator standard_normal: {generator_time:.4f}s")
print(f"Speedup: {legacy_time/generator_time:.2f}x")

The Generator API typically offers 10-30% performance improvements while providing better statistical properties and thread safety. For new code, always prefer default_rng() over the legacy np.random functions. Use standard_normal() when you need the standard distribution and normal() when you need custom parameters.
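For migration, the common legacy calls covered in this article map onto Generator methods roughly as follows (a minimal sketch):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(42)

# Legacy call                  ->  Generator equivalent
# np.random.seed(42)           ->  rng = default_rng(42)
# np.random.randn(3, 4)        ->  rng.standard_normal((3, 4))
# np.random.normal(0, 1, 10)   ->  rng.normal(0, 1, size=10)

a = rng.standard_normal((3, 4))
b = rng.normal(100, 15, size=10)
print(a.shape, b.shape)
```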
