How to Generate Random Numbers from a Poisson Distribution in Python
The Poisson distribution models the probability of a given number of events occurring in a fixed interval of time or space. The key assumption: these events occur independently at a constant average...
Key Insights
- NumPy’s
random.poisson()is the fastest and most practical method for generating Poisson random numbers in Python, handling both single values and large arrays efficiently. - SciPy’s
scipy.stats.poissonprovides an object-oriented interface with additional statistical methods (PMF, CDF, moments) that prove invaluable when you need more than just random samples. - Understanding the Poisson distribution’s single parameter λ (lambda) is crucial—it represents both the mean and variance, making it uniquely suited for modeling count data in fixed intervals.
Understanding the Poisson Distribution
The Poisson distribution models the probability of a given number of events occurring in a fixed interval of time or space. The key assumption: these events occur independently at a constant average rate.
You’ll encounter Poisson distributions everywhere in real systems:
- Customer arrivals: How many customers enter a store per hour
- Server requests: HTTP requests hitting your API per second
- Manufacturing defects: Number of defects per batch of products
- Call centers: Incoming calls per 15-minute window
- Natural phenomena: Radioactive decay events, meteor strikes
The distribution has a single parameter: λ (lambda), representing the average rate of occurrence. This simplicity makes it powerful—λ is simultaneously the mean and the variance of the distribution.
Using NumPy’s random.poisson()
NumPy provides the most straightforward and performant method for generating Poisson random numbers. This should be your default choice for most applications.
import numpy as np
# Generate a single Poisson random number with λ=5
single_value = np.random.poisson(lam=5)
print(f"Single value: {single_value}")
# Generate 10 random numbers from Poisson(λ=5)
samples = np.random.poisson(lam=5, size=10)
print(f"Ten samples: {samples}")
# Generate a 3x4 matrix of Poisson random numbers
matrix = np.random.poisson(lam=5, size=(3, 4))
print(f"Matrix:\n{matrix}")
Output might look like:
Single value: 7
Ten samples: [4 6 5 3 8 4 5 6 2 7]
Matrix:
[[5 4 6 3]
[7 5 4 8]
[6 3 5 4]]
For modern NumPy (1.17+), use the new random Generator API for better statistical properties and thread safety:
import numpy as np
# Create a Generator with a specific seed for reproducibility
rng = np.random.default_rng(seed=42)
# Generate Poisson samples using the Generator
samples = rng.poisson(lam=5, size=1000)
print(f"Mean: {samples.mean():.3f}") # Should be close to 5
print(f"Variance: {samples.var():.3f}") # Should also be close to 5
The Generator API is preferred because it uses the PCG64 algorithm by default, which has better statistical properties than the legacy Mersenne Twister.
Using scipy.stats.poisson
SciPy’s stats module provides an object-oriented interface that bundles random number generation with statistical analysis tools. This approach shines when you need to work with the distribution’s properties beyond just sampling.
from scipy.stats import poisson
# Create a Poisson distribution object with λ=5
dist = poisson(mu=5)
# Generate random variates
samples = dist.rvs(size=1000)
# Access distribution properties
print(f"Theoretical mean: {dist.mean()}")
print(f"Theoretical variance: {dist.var()}")
print(f"Sample mean: {samples.mean():.3f}")
# Calculate probabilities
print(f"P(X=3): {dist.pmf(3):.4f}") # Probability mass function
print(f"P(X≤3): {dist.cdf(3):.4f}") # Cumulative distribution function
# Find quantiles
print(f"Median: {dist.ppf(0.5)}") # Percent point function (inverse CDF)
You can also use the functional interface without creating a distribution object:
from scipy.stats import poisson
# Direct function calls
samples = poisson.rvs(mu=5, size=1000, random_state=42)
probability = poisson.pmf(k=3, mu=5)
cumulative = poisson.cdf(k=3, mu=5)
The random_state parameter accepts either an integer seed or a NumPy Generator, giving you full control over reproducibility.
Using Python’s Built-in random Module
Python’s standard library random module doesn’t include a Poisson generator. However, you can implement one using inverse transform sampling. This approach is useful when you can’t use external dependencies.
import random
import math
def poisson_random(lam):
"""
Generate a single Poisson random number using inverse transform sampling.
Based on Knuth's algorithm.
"""
L = math.exp(-lam)
k = 0
p = 1.0
while p > L:
k += 1
p *= random.random()
return k - 1
def poisson_samples(lam, size):
"""Generate multiple Poisson random numbers."""
return [poisson_random(lam) for _ in range(size)]
# Usage
random.seed(42)
samples = poisson_samples(lam=5, size=1000)
print(f"Mean: {sum(samples)/len(samples):.3f}")
Warning: This implementation has O(λ) time complexity per sample, making it impractical for large λ values. For λ > 30, consider using rejection sampling or the normal approximation. In practice, just use NumPy.
Visualizing Poisson-Distributed Data
Visualization helps verify your generated data matches the theoretical distribution. Here’s how to create a histogram comparing samples against the theoretical probability mass function:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson
# Parameters
lam = 7
n_samples = 10000
# Generate samples
rng = np.random.default_rng(seed=42)
samples = rng.poisson(lam=lam, size=n_samples)
# Create the plot
fig, ax = plt.subplots(figsize=(10, 6))
# Histogram of samples (normalized to show probability)
max_val = samples.max()
bins = np.arange(-0.5, max_val + 1.5, 1)
ax.hist(samples, bins=bins, density=True, alpha=0.7,
label=f'Samples (n={n_samples})', color='steelblue', edgecolor='black')
# Theoretical PMF
x = np.arange(0, max_val + 1)
pmf = poisson.pmf(x, mu=lam)
ax.plot(x, pmf, 'ro-', markersize=8, linewidth=2,
label=f'Theoretical PMF (λ={lam})')
ax.set_xlabel('Number of Events', fontsize=12)
ax.set_ylabel('Probability', fontsize=12)
ax.set_title('Poisson Distribution: Samples vs. Theory', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('poisson_distribution.png', dpi=150)
plt.show()
To demonstrate convergence, compare different sample sizes:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson
lam = 5
sample_sizes = [100, 1000, 10000]
rng = np.random.default_rng(seed=42)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, n in zip(axes, sample_sizes):
samples = rng.poisson(lam=lam, size=n)
bins = np.arange(-0.5, 15.5, 1)
ax.hist(samples, bins=bins, density=True, alpha=0.7,
color='steelblue', edgecolor='black')
x = np.arange(0, 15)
ax.plot(x, poisson.pmf(x, mu=lam), 'ro-', markersize=6)
ax.set_title(f'n = {n}')
ax.set_xlabel('k')
ax.set_ylabel('Probability')
plt.tight_layout()
plt.savefig('poisson_convergence.png', dpi=150)
plt.show()
Practical Application: Simulating Server Traffic
Let’s build a realistic simulation of server request arrivals over a day, with varying traffic rates:
import numpy as np
import matplotlib.pyplot as plt
def simulate_daily_traffic(base_rate=100, hours=24, seed=42):
"""
Simulate server requests over a day with time-varying rates.
Args:
base_rate: Average requests per hour during normal periods
hours: Number of hours to simulate
seed: Random seed for reproducibility
Returns:
Tuple of (hours array, requests per hour array)
"""
rng = np.random.default_rng(seed=seed)
# Define hourly rate multipliers (traffic pattern)
# Low at night, peak during business hours
hour_multipliers = np.array([
0.2, 0.1, 0.1, 0.1, 0.2, 0.4, # 00:00 - 05:00
0.7, 1.2, 1.8, 2.0, 1.9, 1.8, # 06:00 - 11:00
1.5, 1.7, 1.9, 2.0, 1.8, 1.5, # 12:00 - 17:00
1.2, 1.0, 0.8, 0.6, 0.4, 0.3 # 18:00 - 23:00
])
# Calculate lambda for each hour
hourly_rates = base_rate * hour_multipliers
# Generate Poisson samples for each hour
requests = rng.poisson(lam=hourly_rates)
return np.arange(hours), requests, hourly_rates
# Run simulation
hours, requests, expected_rates = simulate_daily_traffic(base_rate=100)
# Visualize
fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(hours, requests, alpha=0.7, label='Actual Requests', color='steelblue')
ax.plot(hours, expected_rates, 'r-', linewidth=2, marker='o',
label='Expected Rate (λ)', markersize=6)
ax.set_xlabel('Hour of Day', fontsize=12)
ax.set_ylabel('Number of Requests', fontsize=12)
ax.set_title('Simulated Server Traffic Over 24 Hours', fontsize=14)
ax.set_xticks(hours)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('server_traffic.png', dpi=150)
plt.show()
# Summary statistics
print(f"Total requests: {requests.sum()}")
print(f"Peak hour: {hours[requests.argmax()]}:00 with {requests.max()} requests")
print(f"Quietest hour: {hours[requests.argmin()]}:00 with {requests.min()} requests")
Performance Considerations and Best Practices
When working with Poisson distributions at scale, performance matters. Here’s a benchmark comparing approaches:
import numpy as np
from scipy.stats import poisson
import time
def benchmark(func, name, iterations=100):
start = time.perf_counter()
for _ in range(iterations):
func()
elapsed = time.perf_counter() - start
print(f"{name}: {elapsed:.4f}s ({elapsed/iterations*1000:.2f}ms per call)")
n_samples = 100000
lam = 10
# NumPy legacy
benchmark(lambda: np.random.poisson(lam, n_samples), "NumPy legacy")
# NumPy Generator
rng = np.random.default_rng()
benchmark(lambda: rng.poisson(lam, n_samples), "NumPy Generator")
# SciPy
benchmark(lambda: poisson.rvs(lam, size=n_samples), "SciPy")
Typical results show NumPy methods are 2-3x faster than SciPy for pure random number generation.
Best practices to follow:
- Always set seeds for reproducibility in research and testing
- Use NumPy’s Generator API for new code—it’s the future
- Batch your generation when possible; generating 10,000 numbers at once is far faster than generating 1 number 10,000 times
- Validate λ values: Poisson requires λ > 0; for very large λ (>1000), consider the normal approximation N(λ, λ)
- Use SciPy when you need statistical methods beyond random sampling; use NumPy for pure generation speed
The Poisson distribution is a workhorse for modeling count data. With these tools, you can simulate realistic scenarios, validate statistical models, and build robust applications that account for the inherent randomness in real-world events.