NumPy - Random Exponential Distribution
The exponential distribution describes the time between events in a process where events occur continuously and independently at a constant average rate. In NumPy, you generate exponentially...
Key Insights
- The exponential distribution models time between events in a Poisson process, making it essential for simulating system failures, service times, and radioactive decay with NumPy’s
random.exponential()function - Understanding the relationship between the scale parameter (β) and rate parameter (λ) is critical: β = 1/λ, where larger β values produce longer wait times between events
- Real-world applications require combining exponential distributions with statistical analysis, visualization, and integration into Monte Carlo simulations for reliability engineering and queueing theory
Understanding the Exponential Distribution
The exponential distribution describes the time between events in a process where events occur continuously and independently at a constant average rate. In NumPy, you generate exponentially distributed random numbers using numpy.random.exponential(), which takes a scale parameter β (beta).
import numpy as np
import matplotlib.pyplot as plt
# Generate 1000 samples with scale parameter β = 2.0
samples = np.random.exponential(scale=2.0, size=1000)
print(f"Mean: {np.mean(samples):.2f}")
print(f"Standard deviation: {np.std(samples):.2f}")
print(f"Min: {np.min(samples):.2f}, Max: {np.max(samples):.2f}")
The scale parameter β represents the mean of the distribution. For an exponential distribution, the mean equals the standard deviation, both equal to β. This is a defining characteristic you can verify empirically with large sample sizes.
Scale vs Rate Parameter
Many statistical texts use the rate parameter λ (lambda) instead of scale. The relationship is inverse: β = 1/λ. If events occur at a rate of 5 per hour (λ = 5), the average time between events is 0.2 hours (β = 0.2).
# Simulating server requests
# Average 10 requests per minute (λ = 10)
rate = 10
scale = 1 / rate
# Generate inter-arrival times for 100 requests
inter_arrival_times = np.random.exponential(scale=scale, size=100)
# Calculate actual arrival times
arrival_times = np.cumsum(inter_arrival_times)
print(f"First 10 arrival times (minutes): {arrival_times[:10]}")
print(f"Average requests per minute: {1/np.mean(inter_arrival_times):.2f}")
This conversion is crucial when working with domain experts who think in terms of rates rather than average intervals.
Generating Random Samples
NumPy provides multiple ways to generate exponential random numbers. The modern approach uses the Generator interface for better statistical properties and reproducibility.
# Legacy approach (still works)
legacy_samples = np.random.exponential(scale=1.5, size=1000)
# Modern approach with Generator
rng = np.random.default_rng(seed=42)
modern_samples = rng.exponential(scale=1.5, size=1000)
# Multi-dimensional arrays
matrix_samples = rng.exponential(scale=2.0, size=(100, 50))
print(f"Matrix shape: {matrix_samples.shape}")
# Different scales for different columns (broadcasting)
scales = np.array([1.0, 2.0, 3.0, 4.0])
varied_samples = rng.exponential(scale=scales, size=(1000, 4))
print(f"Column means: {np.mean(varied_samples, axis=0)}")
Using default_rng() with a seed ensures reproducibility across runs, critical for testing and debugging simulations.
Probability Density and Cumulative Distribution
The probability density function (PDF) and cumulative distribution function (CDF) are essential for understanding and validating your simulations.
def exponential_pdf(x, scale):
"""Calculate exponential PDF: f(x) = (1/β)e^(-x/β)"""
return (1/scale) * np.exp(-x/scale)
def exponential_cdf(x, scale):
"""Calculate exponential CDF: F(x) = 1 - e^(-x/β)"""
return 1 - np.exp(-x/scale)
# Generate samples and compare with theoretical distributions
rng = np.random.default_rng(seed=42)
scale = 2.5
samples = rng.exponential(scale=scale, size=10000)
# Create comparison plot
x = np.linspace(0, 15, 1000)
theoretical_pdf = exponential_pdf(x, scale)
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.hist(samples, bins=50, density=True, alpha=0.7, label='Samples')
plt.plot(x, theoretical_pdf, 'r-', linewidth=2, label='Theoretical PDF')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.title('Probability Density Function')
plt.subplot(1, 2, 2)
empirical_cdf = np.arange(1, len(samples)+1) / len(samples)
sorted_samples = np.sort(samples)
plt.plot(sorted_samples, empirical_cdf, label='Empirical CDF')
plt.plot(x, exponential_cdf(x, scale), 'r-', linewidth=2, label='Theoretical CDF')
plt.xlabel('Value')
plt.ylabel('Cumulative Probability')
plt.legend()
plt.title('Cumulative Distribution Function')
plt.tight_layout()
plt.savefig('exponential_distribution.png', dpi=150)
This validation step confirms your random number generator produces statistically correct samples.
Simulating System Reliability
Exponential distributions model component failure times in reliability engineering. The memoryless property means the probability of failure in the next time interval is independent of how long the component has already operated.
def simulate_system_lifetime(component_mtbf, n_components, n_simulations=10000):
"""
Simulate system with components in series (system fails when any component fails)
Args:
component_mtbf: Mean time between failures for each component
n_components: Number of components in series
n_simulations: Number of Monte Carlo simulations
Returns:
Array of system lifetimes
"""
rng = np.random.default_rng(seed=42)
# Generate failure times for all components in all simulations
failure_times = rng.exponential(
scale=component_mtbf,
size=(n_simulations, n_components)
)
# System fails when first component fails
system_lifetimes = np.min(failure_times, axis=1)
return system_lifetimes
# Example: System with 5 components, each with MTBF of 1000 hours
component_mtbf = 1000
n_components = 5
lifetimes = simulate_system_lifetime(component_mtbf, n_components)
print(f"Component MTBF: {component_mtbf} hours")
print(f"System MTBF: {np.mean(lifetimes):.2f} hours")
print(f"Theoretical system MTBF: {component_mtbf/n_components:.2f} hours")
print(f"Probability of failure within 100 hours: {np.mean(lifetimes < 100):.4f}")
# Calculate reliability at different time points
time_points = np.array([50, 100, 200, 500])
for t in time_points:
reliability = np.mean(lifetimes > t)
print(f"Reliability at {t} hours: {reliability:.4f}")
This simulation demonstrates how system reliability decreases dramatically with multiple components in series.
Queueing Theory Applications
Exponential distributions are fundamental in queueing theory, modeling both arrival processes and service times.
def simulate_mm1_queue(arrival_rate, service_rate, simulation_time, seed=42):
"""
Simulate M/M/1 queue (exponential arrivals, exponential service, 1 server)
Args:
arrival_rate: Average arrivals per time unit (λ)
service_rate: Average services per time unit (μ)
simulation_time: Total simulation duration
Returns:
Dictionary with queue statistics
"""
rng = np.random.default_rng(seed=seed)
# Generate inter-arrival times
arrival_scale = 1 / arrival_rate
service_scale = 1 / service_rate
arrivals = []
current_time = 0
while current_time < simulation_time:
inter_arrival = rng.exponential(scale=arrival_scale)
current_time += inter_arrival
if current_time < simulation_time:
arrivals.append(current_time)
# Generate service times
service_times = rng.exponential(scale=service_scale, size=len(arrivals))
# Simulate queue
departure_times = np.zeros(len(arrivals))
wait_times = np.zeros(len(arrivals))
for i, (arrival, service) in enumerate(zip(arrivals, service_times)):
if i == 0:
service_start = arrival
else:
service_start = max(arrival, departure_times[i-1])
wait_times[i] = service_start - arrival
departure_times[i] = service_start + service
queue_lengths = []
for arrival in arrivals:
in_system = np.sum((np.array(arrivals) <= arrival) &
(departure_times > arrival))
queue_lengths.append(in_system)
return {
'avg_wait_time': np.mean(wait_times),
'avg_queue_length': np.mean(queue_lengths),
'max_queue_length': np.max(queue_lengths),
'utilization': (arrival_rate / service_rate),
'total_customers': len(arrivals)
}
# Example: Help desk with 5 calls/hour arrival, 6 calls/hour service capacity
results = simulate_mm1_queue(arrival_rate=5, service_rate=6, simulation_time=480)
print("M/M/1 Queue Simulation Results (8-hour day):")
for key, value in results.items():
print(f"{key}: {value:.2f}")
# Theoretical values for validation
rho = results['utilization']
theoretical_wait = rho / (6 * (1 - rho)) # in hours
print(f"\nTheoretical average wait time: {theoretical_wait*60:.2f} minutes")
This simulation provides practical insights into system capacity planning and service level optimization.
Testing for Exponential Distribution
When analyzing real-world data, verify whether it follows an exponential distribution using statistical tests and visual methods.
from scipy import stats
def test_exponential_fit(data, alpha=0.05):
"""Test if data fits exponential distribution"""
# Estimate scale parameter using maximum likelihood (MLE = sample mean)
scale_estimate = np.mean(data)
# Kolmogorov-Smirnov test
ks_statistic, ks_pvalue = stats.kstest(
data,
lambda x: exponential_cdf(x, scale_estimate)
)
# Q-Q plot data
theoretical_quantiles = np.linspace(0.01, 0.99, len(data))
theoretical_values = -scale_estimate * np.log(1 - theoretical_quantiles)
sample_quantiles = np.sort(data)
print(f"Estimated scale parameter: {scale_estimate:.4f}")
print(f"KS test statistic: {ks_statistic:.4f}")
print(f"KS test p-value: {ks_pvalue:.4f}")
print(f"Reject null hypothesis (not exponential): {ks_pvalue < alpha}")
return {
'scale': scale_estimate,
'ks_statistic': ks_statistic,
'ks_pvalue': ks_pvalue,
'theoretical_quantiles': theoretical_values,
'sample_quantiles': sample_quantiles
}
# Test with true exponential data
rng = np.random.default_rng(seed=42)
true_exponential = rng.exponential(scale=3.0, size=1000)
results = test_exponential_fit(true_exponential)
# Test with non-exponential data (normal distribution)
non_exponential = rng.normal(loc=3.0, scale=1.0, size=1000)
results_normal = test_exponential_fit(np.abs(non_exponential))
The Kolmogorov-Smirnov test quantifies the goodness-of-fit, with p-values above 0.05 suggesting the data is consistent with an exponential distribution.