NumPy - Random Poisson Distribution

The Poisson distribution describes the probability of a given number of events occurring in a fixed interval when these events happen independently at a constant average rate. The distribution is...

Key Insights

  • The Poisson distribution models discrete events occurring independently over a fixed interval, making it essential for analyzing event rates like server requests, customer arrivals, or defect counts in manufacturing systems
  • NumPy’s random.poisson() function generates random samples from a Poisson distribution using a single parameter λ (lambda), which represents both the mean and variance of the distribution
  • Understanding when to use Poisson versus other distributions is critical—Poisson works best for rare events with known average rates, while binomial distributions suit fixed-trial scenarios and normal distributions apply to continuous data

Understanding the Poisson Distribution

The Poisson distribution describes the probability of a given number of events occurring in a fixed interval when these events happen independently at a constant average rate. The distribution is defined by a single parameter λ (lambda), which represents the expected number of events in the interval.

The probability mass function is: P(X = k) = (λ^k * e^-λ) / k!

Where k is the number of events, λ is the average rate, and e is Euler’s number.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Generate Poisson-distributed random samples
lambda_param = 5  # Average number of events
samples = np.random.poisson(lam=lambda_param, size=1000)

print(f"Mean: {np.mean(samples):.2f}")
print(f"Variance: {np.var(samples):.2f}")
print(f"Expected (λ): {lambda_param}")

# Output:
# Mean: 5.02
# Variance: 4.98
# Expected (λ): 5

The mean and variance of a Poisson distribution are both equal to λ, which you can verify empirically with large sample sizes.

Generating Poisson Random Variables

NumPy provides random.poisson() to generate random samples. You can control the shape of the output array to match your application needs.

# Single value
single_value = np.random.poisson(lam=3)
print(f"Single value: {single_value}")

# 1D array
array_1d = np.random.poisson(lam=3, size=10)
print(f"1D array: {array_1d}")

# 2D array (useful for simulations)
array_2d = np.random.poisson(lam=3, size=(5, 4))
print(f"2D array:\n{array_2d}")

# Multiple lambda values (broadcasting)
lambdas = np.array([1, 5, 10, 20])
samples_multi = np.random.poisson(lam=lambdas, size=(1000, 4))
print(f"Means for each lambda: {np.mean(samples_multi, axis=0)}")

The size parameter determines the output shape. When passing an array of lambda values, NumPy broadcasts the operation across all values simultaneously, which is more efficient than loops.

Real-World Application: Server Request Modeling

Model incoming server requests to determine capacity requirements and predict overload scenarios.

# Simulate hourly server requests over a week
hours_per_week = 24 * 7
avg_requests_per_hour = 45

# Generate request counts
requests = np.random.poisson(lam=avg_requests_per_hour, size=hours_per_week)

# Analysis
print(f"Total requests: {np.sum(requests)}")
print(f"Average per hour: {np.mean(requests):.2f}")
print(f"Max in any hour: {np.max(requests)}")
print(f"Hours with >60 requests: {np.sum(requests > 60)}")

# Calculate percentiles for capacity planning
percentiles = np.percentile(requests, [50, 75, 90, 95, 99])
print(f"\nPercentiles (50, 75, 90, 95, 99): {percentiles}")

# Probability of exceeding capacity
capacity = 70
prob_exceed = np.sum(requests > capacity) / len(requests)
print(f"Probability of exceeding {capacity} requests: {prob_exceed:.2%}")

This simulation helps architects determine server capacity. If 99th percentile requests reach 60 but your capacity is 50, you’ll experience issues during peak times.

Comparing Multiple Poisson Processes

When analyzing systems with different event rates, compare multiple Poisson distributions to understand behavior patterns.

# Simulate three different systems
lambda_low = 3    # Low-traffic system
lambda_med = 10   # Medium-traffic system
lambda_high = 25  # High-traffic system

samples_per_system = 10000

low_traffic = np.random.poisson(lam=lambda_low, size=samples_per_system)
med_traffic = np.random.poisson(lam=lambda_med, size=samples_per_system)
high_traffic = np.random.poisson(lam=lambda_high, size=samples_per_system)

# Statistical comparison
systems = [
    ("Low", low_traffic, lambda_low),
    ("Medium", med_traffic, lambda_med),
    ("High", high_traffic, lambda_high)
]

for name, data, expected in systems:
    print(f"\n{name} Traffic System (λ={expected}):")
    print(f"  Mean: {np.mean(data):.2f}")
    print(f"  Std Dev: {np.std(data):.2f}")
    print(f"  P(X=0): {np.sum(data == 0) / len(data):.4f}")
    print(f"  P(X>2λ): {np.sum(data > 2*expected) / len(data):.4f}")

The probability of zero events decreases exponentially as λ increases. For low-traffic systems (λ=3), you’ll see zero events about 5% of the time. For high-traffic systems (λ=25), zero events become virtually impossible.

Time-Series Simulation with Varying Rates

Real systems rarely have constant event rates. Model time-varying Poisson processes for more realistic simulations.

# Simulate daily customer arrivals with time-of-day variation
hours = np.arange(24)

# Define lambda as a function of time (higher during business hours)
base_rate = 5
time_multiplier = 1 + 2 * np.sin((hours - 6) * np.pi / 12)
time_multiplier = np.maximum(time_multiplier, 0.1)  # Avoid negative rates
lambda_by_hour = base_rate * time_multiplier

# Simulate 30 days
days = 30
arrivals = np.zeros((days, 24))

for day in range(days):
    for hour in range(24):
        arrivals[day, hour] = np.random.poisson(lam=lambda_by_hour[hour])

# Analysis
print(f"Total arrivals: {np.sum(arrivals):.0f}")
print(f"Average per hour: {np.mean(arrivals):.2f}")
print(f"Peak hour average: {np.mean(arrivals[:, 14]):.2f}")  # 2 PM
print(f"Off-peak average: {np.mean(arrivals[:, 3]):.2f}")    # 3 AM

# Find hours requiring extra capacity
threshold = 15
high_demand_hours = np.where(np.mean(arrivals, axis=0) > threshold)[0]
print(f"Hours needing extra capacity: {high_demand_hours}")

This pattern reflects real-world scenarios where event rates vary predictably. E-commerce sites see traffic spikes during lunch hours and evenings, while overnight traffic drops significantly.

Statistical Testing and Validation

Verify that your simulated data follows the expected Poisson distribution using statistical tests.

from scipy.stats import chisquare, poisson

# Generate sample data
lambda_true = 7
sample_size = 10000
observed_data = np.random.poisson(lam=lambda_true, size=sample_size)

# Estimate lambda from data
lambda_estimated = np.mean(observed_data)

# Create histogram bins
unique_values, counts = np.unique(observed_data, return_counts=True)

# Calculate expected frequencies
expected_freq = poisson.pmf(unique_values, lambda_estimated) * sample_size

# Chi-square goodness of fit test
chi_stat, p_value = chisquare(counts, expected_freq)

print(f"Estimated λ: {lambda_estimated:.2f}")
print(f"True λ: {lambda_true}")
print(f"Chi-square statistic: {chi_stat:.2f}")
print(f"P-value: {p_value:.4f}")

if p_value > 0.05:
    print("Data is consistent with Poisson distribution (fail to reject H0)")
else:
    print("Data may not follow Poisson distribution (reject H0)")

A high p-value (>0.05) indicates the data fits the Poisson distribution well. This validation is crucial when making business decisions based on simulated data.

Practical Considerations and Common Pitfalls

Seed Management for Reproducibility

# Set seed for reproducible results
np.random.seed(42)
sample1 = np.random.poisson(lam=5, size=10)

np.random.seed(42)
sample2 = np.random.poisson(lam=5, size=10)

print(f"Samples identical: {np.array_equal(sample1, sample2)}")  # True

Always set seeds in production code for debugging and testing. Use different seeds for different simulation runs to avoid correlation artifacts.

Lambda Parameter Constraints

# Lambda must be non-negative
valid_lambda = 5
invalid_lambda = -5

valid_sample = np.random.poisson(lam=valid_lambda, size=10)
# invalid_sample = np.random.poisson(lam=invalid_lambda, size=10)  # Raises ValueError

# Zero lambda produces only zeros
zero_lambda = np.random.poisson(lam=0, size=10)
print(f"Zero lambda result: {zero_lambda}")  # All zeros

Large Lambda Approximation

For large λ values (typically λ > 20), the Poisson distribution approximates a normal distribution. Consider using np.random.normal() for better performance.

lambda_large = 100
size = 10000

# Poisson approach
poisson_samples = np.random.poisson(lam=lambda_large, size=size)

# Normal approximation (mean=λ, variance=λ)
normal_samples = np.random.normal(loc=lambda_large, 
                                  scale=np.sqrt(lambda_large), 
                                  size=size)
normal_samples = np.round(normal_samples).astype(int)
normal_samples = np.maximum(normal_samples, 0)  # Ensure non-negative

print(f"Poisson mean: {np.mean(poisson_samples):.2f}")
print(f"Normal approx mean: {np.mean(normal_samples):.2f}")
print(f"Difference: {abs(np.mean(poisson_samples) - np.mean(normal_samples)):.2f}")

The Poisson distribution is fundamental for modeling discrete event systems. Master these patterns to build accurate simulations for capacity planning, performance testing, and system design validation.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.