NumPy - Random Exponential Distribution

The exponential distribution describes the time between events in a process where events occur continuously and independently at a constant average rate. In NumPy, you generate exponentially...

Key Insights

  • The exponential distribution models time between events in a Poisson process, making it essential for simulating system failures, service times, and radioactive decay with NumPy’s random.exponential() function
  • Understanding the relationship between the scale parameter (β) and rate parameter (λ) is critical: β = 1/λ, where larger β values produce longer wait times between events
  • Real-world applications require combining exponential distributions with statistical analysis, visualization, and integration into Monte Carlo simulations for reliability engineering and queueing theory

Understanding the Exponential Distribution

The exponential distribution describes the time between events in a process where events occur continuously and independently at a constant average rate. In NumPy, you generate exponentially distributed random numbers using numpy.random.exponential(), which takes a scale parameter β (beta).

import numpy as np
import matplotlib.pyplot as plt

# Generate 1000 samples with scale parameter β = 2.0
samples = np.random.exponential(scale=2.0, size=1000)

print(f"Mean: {np.mean(samples):.2f}")
print(f"Standard deviation: {np.std(samples):.2f}")
print(f"Min: {np.min(samples):.2f}, Max: {np.max(samples):.2f}")

The scale parameter β represents the mean of the distribution. For an exponential distribution, the mean equals the standard deviation, both equal to β. This is a defining characteristic you can verify empirically with large sample sizes.

Scale vs Rate Parameter

Many statistical texts use the rate parameter λ (lambda) instead of scale. The relationship is inverse: β = 1/λ. If events occur at a rate of 5 per hour (λ = 5), the average time between events is 0.2 hours (β = 0.2).

# Simulating server requests
# Average 10 requests per minute (λ = 10)
rate = 10
scale = 1 / rate

# Generate inter-arrival times for 100 requests
inter_arrival_times = np.random.exponential(scale=scale, size=100)

# Calculate actual arrival times
arrival_times = np.cumsum(inter_arrival_times)

print(f"First 10 arrival times (minutes): {arrival_times[:10]}")
print(f"Average requests per minute: {1/np.mean(inter_arrival_times):.2f}")

This conversion is crucial when working with domain experts who think in terms of rates rather than average intervals.

Generating Random Samples

NumPy provides multiple ways to generate exponential random numbers. The modern approach uses the Generator interface for better statistical properties and reproducibility.

# Legacy approach (still works)
legacy_samples = np.random.exponential(scale=1.5, size=1000)

# Modern approach with Generator
rng = np.random.default_rng(seed=42)
modern_samples = rng.exponential(scale=1.5, size=1000)

# Multi-dimensional arrays
matrix_samples = rng.exponential(scale=2.0, size=(100, 50))
print(f"Matrix shape: {matrix_samples.shape}")

# Different scales for different columns (broadcasting)
scales = np.array([1.0, 2.0, 3.0, 4.0])
varied_samples = rng.exponential(scale=scales, size=(1000, 4))
print(f"Column means: {np.mean(varied_samples, axis=0)}")

Using default_rng() with a seed ensures reproducibility across runs, critical for testing and debugging simulations.

Probability Density and Cumulative Distribution

The probability density function (PDF) and cumulative distribution function (CDF) are essential for understanding and validating your simulations.

def exponential_pdf(x, scale):
    """Calculate exponential PDF: f(x) = (1/β)e^(-x/β)"""
    return (1/scale) * np.exp(-x/scale)

def exponential_cdf(x, scale):
    """Calculate exponential CDF: F(x) = 1 - e^(-x/β)"""
    return 1 - np.exp(-x/scale)

# Generate samples and compare with theoretical distributions
rng = np.random.default_rng(seed=42)
scale = 2.5
samples = rng.exponential(scale=scale, size=10000)

# Create comparison plot
x = np.linspace(0, 15, 1000)
theoretical_pdf = exponential_pdf(x, scale)

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(samples, bins=50, density=True, alpha=0.7, label='Samples')
plt.plot(x, theoretical_pdf, 'r-', linewidth=2, label='Theoretical PDF')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.title('Probability Density Function')

plt.subplot(1, 2, 2)
empirical_cdf = np.arange(1, len(samples)+1) / len(samples)
sorted_samples = np.sort(samples)
plt.plot(sorted_samples, empirical_cdf, label='Empirical CDF')
plt.plot(x, exponential_cdf(x, scale), 'r-', linewidth=2, label='Theoretical CDF')
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.legend()
plt.title('Cumulative Distribution Function')

plt.tight_layout()
plt.savefig('exponential_distribution.png', dpi=150)

This validation step confirms your random number generator produces statistically correct samples.

Simulating System Reliability

Exponential distributions model component failure times in reliability engineering. The memoryless property means the probability of failure in the next time interval is independent of how long the component has already operated.

def simulate_system_lifetime(component_mtbf, n_components, n_simulations=10000):
    """
    Simulate system with components in series (system fails when any component fails)
    
    Args:
        component_mtbf: Mean time between failures for each component
        n_components: Number of components in series
        n_simulations: Number of Monte Carlo simulations
    
    Returns:
        Array of system lifetimes
    """
    rng = np.random.default_rng(seed=42)
    
    # Generate failure times for all components in all simulations
    failure_times = rng.exponential(
        scale=component_mtbf, 
        size=(n_simulations, n_components)
    )
    
    # System fails when first component fails
    system_lifetimes = np.min(failure_times, axis=1)
    
    return system_lifetimes

# Example: System with 5 components, each with MTBF of 1000 hours
component_mtbf = 1000
n_components = 5

lifetimes = simulate_system_lifetime(component_mtbf, n_components)

print(f"Component MTBF: {component_mtbf} hours")
print(f"System MTBF: {np.mean(lifetimes):.2f} hours")
print(f"Theoretical system MTBF: {component_mtbf/n_components:.2f} hours")
print(f"Probability of failure within 100 hours: {np.mean(lifetimes < 100):.4f}")

# Calculate reliability at different time points
time_points = np.array([50, 100, 200, 500])
for t in time_points:
    reliability = np.mean(lifetimes > t)
    print(f"Reliability at {t} hours: {reliability:.4f}")

This simulation demonstrates how system reliability decreases dramatically with multiple components in series.

Queueing Theory Applications

Exponential distributions are fundamental in queueing theory, modeling both arrival processes and service times.

def simulate_mm1_queue(arrival_rate, service_rate, simulation_time, seed=42):
    """
    Simulate M/M/1 queue (exponential arrivals, exponential service, 1 server)
    
    Args:
        arrival_rate: Average arrivals per time unit (λ)
        service_rate: Average services per time unit (μ)
        simulation_time: Total simulation duration
    
    Returns:
        Dictionary with queue statistics
    """
    rng = np.random.default_rng(seed=seed)
    
    # Generate inter-arrival times
    arrival_scale = 1 / arrival_rate
    service_scale = 1 / service_rate
    
    arrivals = []
    current_time = 0
    
    while current_time < simulation_time:
        inter_arrival = rng.exponential(scale=arrival_scale)
        current_time += inter_arrival
        if current_time < simulation_time:
            arrivals.append(current_time)
    
    # Generate service times
    service_times = rng.exponential(scale=service_scale, size=len(arrivals))
    
    # Simulate queue
    departure_times = np.zeros(len(arrivals))
    wait_times = np.zeros(len(arrivals))
    
    for i, (arrival, service) in enumerate(zip(arrivals, service_times)):
        if i == 0:
            service_start = arrival
        else:
            service_start = max(arrival, departure_times[i-1])
        
        wait_times[i] = service_start - arrival
        departure_times[i] = service_start + service
    
    queue_lengths = []
    for arrival in arrivals:
        in_system = np.sum((np.array(arrivals) <= arrival) & 
                          (departure_times > arrival))
        queue_lengths.append(in_system)
    
    return {
        'avg_wait_time': np.mean(wait_times),
        'avg_queue_length': np.mean(queue_lengths),
        'max_queue_length': np.max(queue_lengths),
        'utilization': (arrival_rate / service_rate),
        'total_customers': len(arrivals)
    }

# Example: Help desk with 5 calls/hour arrival, 6 calls/hour service capacity
results = simulate_mm1_queue(arrival_rate=5, service_rate=6, simulation_time=480)

print("M/M/1 Queue Simulation Results (8-hour day):")
for key, value in results.items():
    print(f"{key}: {value:.2f}")

# Theoretical values for validation
rho = results['utilization']
theoretical_wait = rho / (6 * (1 - rho))  # in hours
print(f"\nTheoretical average wait time: {theoretical_wait*60:.2f} minutes")

This simulation provides practical insights into system capacity planning and service level optimization.

Testing for Exponential Distribution

When analyzing real-world data, verify whether it follows an exponential distribution using statistical tests and visual methods.

from scipy import stats

def test_exponential_fit(data, alpha=0.05):
    """Test if data fits exponential distribution"""
    
    # Estimate scale parameter using maximum likelihood (MLE = sample mean)
    scale_estimate = np.mean(data)
    
    # Kolmogorov-Smirnov test
    ks_statistic, ks_pvalue = stats.kstest(
        data, 
        lambda x: exponential_cdf(x, scale_estimate)
    )
    
    # Q-Q plot data
    theoretical_quantiles = np.linspace(0.01, 0.99, len(data))
    theoretical_values = -scale_estimate * np.log(1 - theoretical_quantiles)
    sample_quantiles = np.sort(data)
    
    print(f"Estimated scale parameter: {scale_estimate:.4f}")
    print(f"KS test statistic: {ks_statistic:.4f}")
    print(f"KS test p-value: {ks_pvalue:.4f}")
    print(f"Reject null hypothesis (not exponential): {ks_pvalue < alpha}")
    
    return {
        'scale': scale_estimate,
        'ks_statistic': ks_statistic,
        'ks_pvalue': ks_pvalue,
        'theoretical_quantiles': theoretical_values,
        'sample_quantiles': sample_quantiles
    }

# Test with true exponential data
rng = np.random.default_rng(seed=42)
true_exponential = rng.exponential(scale=3.0, size=1000)
results = test_exponential_fit(true_exponential)

# Test with non-exponential data (normal distribution)
non_exponential = rng.normal(loc=3.0, scale=1.0, size=1000)
results_normal = test_exponential_fit(np.abs(non_exponential))

The Kolmogorov-Smirnov test quantifies the goodness-of-fit, with p-values above 0.05 suggesting the data is consistent with an exponential distribution.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.