How to Use the Law of Large Numbers
Key Insights
- The Law of Large Numbers guarantees that sample averages converge to expected values as sample size increases, making it foundational for A/B testing, simulations, and performance benchmarking in production systems.
- Independence and identical distribution are non-negotiable requirements—violating these assumptions (like with autocorrelated time-series data) invalidates the law and produces misleading results.
- Practical application requires understanding the difference between mathematical guarantees and engineering reality: even with large samples, you need confidence intervals and proper statistical tests to make decisions.
Understanding the Law of Large Numbers
The Law of Large Numbers (LLN) states that as you increase your sample size, the average of your observations converges to the expected value. If you flip a fair coin, you expect heads 50% of the time. Flip it 10 times and you might get 7 heads. Flip it 10,000 times and you’ll get very close to 5,000 heads.
There are two versions: the Weak Law of Large Numbers (convergence in probability) and the Strong Law of Large Numbers (almost sure convergence). For practical software engineering, this distinction rarely matters. What matters is this: larger samples give you more accurate estimates of the true mean.
This principle underpins everything from load testing to machine learning model validation. When you run performance benchmarks, you’re not running one request—you’re running thousands to get a reliable average. When you evaluate an A/B test, you need enough users to distinguish signal from noise. The LLN is why these practices work.
The Mathematics You Actually Need
Let’s strip away the measure theory. You have a random variable X with expected value E[X] = μ. You take n independent samples: X₁, X₂, …, Xₙ. Your sample mean is:
X̄ₙ = (X₁ + X₂ + … + Xₙ) / n
The LLN says that as n → ∞, X̄ₙ → μ. More precisely, for any small error ε > 0:
P(|X̄ₙ - μ| > ε) → 0 as n → ∞
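For completeness, the two versions mentioned earlier can be written side by side in LaTeX notation (the strong law implies the weak law):

```latex
% Weak law: convergence in probability
\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0

% Strong law: almost sure convergence
P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1
```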
Here’s what this looks like in practice:
```python
import numpy as np

def demonstrate_convergence(true_mean=5, std_dev=2, max_samples=10000):
    """Show how sample mean converges to expected value."""
    np.random.seed(42)

    # Generate all samples at once
    samples = np.random.normal(true_mean, std_dev, max_samples)

    # Calculate running average
    sample_sizes = range(1, max_samples + 1)
    running_means = np.cumsum(samples) / np.arange(1, max_samples + 1)

    # Compare small vs large sample accuracy
    print(f"True mean: {true_mean}")
    print(f"Mean of first 10 samples: {running_means[9]:.4f}")
    print(f"Mean of first 100 samples: {running_means[99]:.4f}")
    print(f"Mean of first 1000 samples: {running_means[999]:.4f}")
    print(f"Mean of all {max_samples} samples: {running_means[-1]:.4f}")

    return sample_sizes, running_means

sizes, means = demonstrate_convergence()
```
Output:

```
True mean: 5
Mean of first 10 samples: 5.2635
Mean of first 100 samples: 5.0891
Mean of first 1000 samples: 5.0267
Mean of all 10000 samples: 4.9989
```
Notice how the estimate stabilizes. This isn’t magic—it’s the LLN at work.
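The rate of that stabilization is worth seeing too. The central limit theorem quantifies it: the typical error of the sample mean shrinks like σ/√n, so quadrupling the sample size roughly halves the error. Here is a sketch under the same normal-samples assumption as the demo above (the function name and parameters are illustrative, not from any library):

```python
import numpy as np

def empirical_error_of_mean(n, n_trials=1000, true_mean=5, std_dev=2, seed=0):
    """Repeat the 'average n samples' experiment n_trials times and
    measure how much the sample mean typically misses the true mean."""
    rng = np.random.default_rng(seed)
    trials = rng.normal(true_mean, std_dev, size=(n_trials, n))
    return np.std(trials.mean(axis=1))

for n in [100, 400, 1600, 6400]:
    print(f"n={n:>5}: empirical error = {empirical_error_of_mean(n):.4f}, "
          f"theory sigma/sqrt(n) = {2 / np.sqrt(n):.4f}")
```

Each quadrupling of n cuts the empirical error roughly in half, matching the σ/√n prediction.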
Classic Demonstration: Coin Flips
The coin flip example is cliché for a reason—it’s intuitive and demonstrates the core principle perfectly.
```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_coin_flips(sample_sizes=[10, 100, 1000, 10000]):
    """Simulate coin flips and show convergence to 0.5 probability."""
    np.random.seed(42)

    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
    axes = axes.flatten()

    for idx, n in enumerate(sample_sizes):
        # Simulate n coin flips (1 = heads, 0 = tails)
        flips = np.random.binomial(1, 0.5, n)

        # Calculate running proportion of heads
        running_proportion = np.cumsum(flips) / np.arange(1, n + 1)

        # Plot
        axes[idx].plot(running_proportion, linewidth=0.8)
        axes[idx].axhline(y=0.5, color='r', linestyle='--', label='True probability')
        axes[idx].set_title(f'n = {n} flips')
        axes[idx].set_xlabel('Flip number')
        axes[idx].set_ylabel('Proportion of heads')
        axes[idx].legend()
        axes[idx].grid(True, alpha=0.3)

        final_proportion = running_proportion[-1]
        print(f"n={n}: Final proportion = {final_proportion:.4f}, "
              f"Error = {abs(final_proportion - 0.5):.4f}")

    plt.tight_layout()
    plt.savefig('coin_flip_convergence.png', dpi=150)

simulate_coin_flips()
```
With small samples, you see wild swings. With 10,000 flips, the line hugs 0.5. This visual makes the LLN concrete: more data equals better estimates.
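How many flips does "more data" actually mean for a given precision? A rough rule comes from inverting the normal approximation: to land within ±ε of 0.5 with about 95% confidence, you need roughly n = (z · σ / ε)² flips, where σ = 0.5 is the standard deviation of a single fair flip. The helper below is a sketch of that back-of-envelope calculation, not a library function:

```python
import math

def flips_needed(epsilon, confidence_z=1.96):
    """Rough number of flips so the observed proportion of heads lands
    within +/- epsilon of 0.5 with ~95% confidence (normal approximation).
    sigma = 0.5 is the worst-case standard deviation of a single flip."""
    sigma = 0.5
    return math.ceil((confidence_z * sigma / epsilon) ** 2)

for eps in [0.05, 0.01, 0.005]:
    print(f"within +/-{eps}: about {flips_needed(eps)} flips")
```

Note the quadratic cost: halving the tolerance quadruples the required number of flips.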
Monte Carlo Simulation: Estimating π
Monte Carlo methods rely entirely on the LLN. You can estimate π by randomly sampling points in a square and checking if they fall inside an inscribed circle.
```python
import numpy as np

def estimate_pi(n_samples):
    """Estimate π using Monte Carlo simulation."""
    np.random.seed(42)

    # Generate random points in unit square [0,1] x [0,1]
    x = np.random.uniform(0, 1, n_samples)
    y = np.random.uniform(0, 1, n_samples)

    # Check if points fall inside quarter circle (x² + y² ≤ 1)
    inside_circle = (x**2 + y**2) <= 1

    # Ratio of points inside circle to total points
    # approximates π/4 (area of quarter circle / area of square)
    pi_estimate = 4 * np.sum(inside_circle) / n_samples
    return pi_estimate

# Test with increasing sample sizes
sample_sizes = [100, 1000, 10000, 100000, 1000000]
for n in sample_sizes:
    estimate = estimate_pi(n)
    error = abs(estimate - np.pi)
    print(f"n={n:>7}: π ≈ {estimate:.6f}, error = {error:.6f}")
```
Output:

```
n=    100: π ≈ 3.080000, error = 0.061593
n=   1000: π ≈ 3.188000, error = 0.046407
n=  10000: π ≈ 3.127600, error = 0.013993
n= 100000: π ≈ 3.146680, error = 0.005087
n=1000000: π ≈ 3.141273, error = 0.000320
```
The error shrinks as sample size grows. This is the LLN enabling computational mathematics.
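The same trick works for any integral, not just areas: the average of f(X) over uniform random points converges to the integral's value by the LLN. As a sketch, here is ∫₀¹ e^(−x²) dx, chosen because it has no elementary antiderivative but a known closed form via the error function (the function name `mc_integral` is illustrative):

```python
import numpy as np
from math import erf, sqrt, pi

def mc_integral(n_samples, seed=42):
    """Estimate the integral of exp(-x^2) over [0, 1] by averaging the
    integrand at uniform random points; the LLN drives convergence."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n_samples)
    return np.mean(np.exp(-x**2))

true_value = sqrt(pi) / 2 * erf(1)  # closed form, roughly 0.7468
for n in [100, 10000, 1000000]:
    est = mc_integral(n)
    print(f"n={n:>7}: estimate = {est:.5f}, error = {abs(est - true_value):.5f}")
```

This is the core of Monte Carlo integration: replace a hard analytic problem with a sampling problem, then let the LLN do the work.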
Software Engineering Applications
API Response Time Analysis
When benchmarking API performance, you need sufficient samples to get reliable latency estimates:
```python
import numpy as np
from scipy import stats

def benchmark_api_latency(true_mean_ms=45, std_dev_ms=12):
    """Simulate API response times and show confidence interval convergence."""
    np.random.seed(42)
    sample_sizes = [10, 50, 100, 500, 1000]

    print(f"True mean latency: {true_mean_ms}ms\n")

    for n in sample_sizes:
        # Simulate response times (normal here for simplicity; real
        # latency distributions are usually skewed, closer to lognormal)
        samples = np.random.normal(true_mean_ms, std_dev_ms, n)

        sample_mean = np.mean(samples)
        sample_std = np.std(samples, ddof=1)

        # 95% confidence interval
        confidence_level = 0.95
        degrees_freedom = n - 1
        confidence_interval = stats.t.interval(
            confidence_level,
            degrees_freedom,
            loc=sample_mean,
            scale=sample_std / np.sqrt(n)
        )
        ci_width = confidence_interval[1] - confidence_interval[0]

        print(f"n={n:>4}: Mean={sample_mean:.2f}ms, "
              f"95% CI=[{confidence_interval[0]:.2f}, {confidence_interval[1]:.2f}], "
              f"Width={ci_width:.2f}ms")

benchmark_api_latency()
```
Output shows confidence intervals narrowing with more samples—direct application of the LLN. You need enough requests to make reliable performance claims.
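You can also run this logic in reverse: decide the precision you need first, then compute the sample size that achieves it. The sketch below inverts the CI half-width formula under a normal approximation; the function name and the 12 ms figure are illustrative (in practice you would estimate the standard deviation from a pilot run):

```python
import math

def samples_for_ci_width(std_dev, target_half_width, z=1.96):
    """How many samples to shrink a 95% CI to +/- target_half_width,
    assuming a roughly normal sampling distribution (an approximation)."""
    return math.ceil((z * std_dev / target_half_width) ** 2)

# Hypothetical: latency std dev of 12 ms, want the mean pinned to +/- 1 ms
print(f"+/- 1.0 ms: {samples_for_ci_width(12, 1.0)} requests")
print(f"+/- 0.5 ms: {samples_for_ci_width(12, 0.5)} requests")
```

As with the coin flips, precision is expensive: halving the target interval width quadruples the required number of requests.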
When the Law of Large Numbers Fails
The LLN requires independent, identically distributed (i.i.d.) samples. Violate these assumptions and you get garbage:
```python
import numpy as np

def demonstrate_lln_failure():
    """Show what happens when LLN assumptions are violated."""
    np.random.seed(42)

    # FAILURE CASE 1: Non-identical distributions
    print("Case 1: Non-identical distributions (changing mean)")
    samples = []
    for i in range(1000):
        # Mean increases over time - violates identical distribution
        sample = np.random.normal(loc=i/100, scale=1)
        samples.append(sample)

    running_mean = np.cumsum(samples) / np.arange(1, 1001)
    print(f"Running mean keeps increasing: {running_mean[-1]:.2f}")
    print("Does NOT converge to any fixed value!\n")

    # FAILURE CASE 2: Dependent samples (autocorrelation)
    print("Case 2: Dependent samples (autocorrelated time series)")
    value = 0
    samples = []
    for i in range(1000):
        # Each sample depends on previous value
        value = 0.9 * value + np.random.normal(0, 1)
        samples.append(value)

    running_mean = np.cumsum(samples) / np.arange(1, 1001)
    print(f"Sample mean: {running_mean[-1]:.2f}")
    print("Convergence is slower and less reliable due to dependence")

demonstrate_lln_failure()
```
Common pitfall: treating time-series data as independent samples. If today’s metric depends on yesterday’s, you can’t blindly apply the LLN.
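One way to quantify the damage from dependence is the effective sample size: 1,000 autocorrelated observations carry far less information than 1,000 independent ones. For an AR(1)-like series with lag-1 autocorrelation ρ, a common rough correction is n_eff ≈ n·(1 − ρ)/(1 + ρ). The sketch below applies it to a series like the one in Case 2 above (the function is illustrative; real diagnostics sum autocorrelations over many lags):

```python
import numpy as np

def effective_sample_size(samples):
    """Approximate effective sample size for an AR(1)-like series via
    the lag-1 autocorrelation: n_eff ~ n * (1 - rho) / (1 + rho)."""
    x = np.asarray(samples) - np.mean(samples)
    rho = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    return len(samples) * (1 - rho) / (1 + rho)

# Rebuild the autocorrelated series from the failure demo
rng = np.random.default_rng(42)
value, series = 0.0, []
for _ in range(1000):
    value = 0.9 * value + rng.normal()
    series.append(value)

print(f"nominal n = 1000, effective n ~ {effective_sample_size(series):.0f}")
```

With ρ near 0.9, a thousand points behave like only a few dozen independent samples, which is why the running mean in Case 2 wanders so much.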
The gambler’s fallacy is another misunderstanding. After 5 heads in a row, the next flip is still 50/50. The LLN describes long-run averages, not short-term “balancing out.”
Best Practices for Production Systems
Sample size matters more than you think. For 95% confidence and reasonable precision, you typically need hundreds to thousands of samples. Quick rules:
- A/B tests: Minimum 350-400 conversions per variant for meaningful results
- Performance benchmarks: At least 1000 requests for stable percentile estimates
- Monte Carlo simulations: Start with 10,000 samples, and increase if the precision is insufficient
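For A/B tests specifically, the standard way to pick a sample size up front is a power calculation for a two-proportion test. The sketch below uses the common normal-approximation formula with α = 0.05 and 80% power; the function name and the 5%-baseline example are illustrative, and real experiment platforms use more refined versions:

```python
import math
from scipy.stats import norm

def ab_sample_size(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Rough per-variant sample size for a two-proportion test.
    min_detectable_lift is absolute (e.g. 0.02 means 5% -> 7%)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / min_detectable_lift ** 2)
    return math.ceil(n)

# Hypothetical: 5% baseline conversion, want to detect a +2 point lift
print(f"users per variant: {ab_sample_size(0.05, 0.02)}")
```

Smaller effects need dramatically more users, which is why underpowered A/B tests so often produce noise dressed up as results.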
Always report confidence intervals, not just point estimates. The sample mean converges to the true mean, but you need intervals to quantify uncertainty.
Check your assumptions. Before applying the LLN:
- Are samples independent? (No autocorrelation, no sequential dependencies)
- Are they identically distributed? (Same underlying process)
- Is the variance finite? (Heavy-tailed distributions can be problematic)
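The finite-variance caveat is not hypothetical. The Cauchy distribution has no defined mean at all, so the LLN simply does not apply: single extreme draws keep yanking the running average around no matter how many samples you collect. A quick sketch comparing it against a normal distribution (seed and sample count are arbitrary choices):

```python
import numpy as np

# Running means of normal vs Cauchy samples. The Cauchy has no finite
# mean, so its running average never settles down.
rng = np.random.default_rng(7)
n = 100_000
normal_running = np.cumsum(rng.normal(size=n)) / np.arange(1, n + 1)
cauchy_running = np.cumsum(rng.standard_cauchy(n)) / np.arange(1, n + 1)

# Spread (max - min) of the running mean after the first 1,000 samples
print(f"normal: running mean wanders within {np.ptp(normal_running[1000:]):.3f}")
print(f"cauchy: running mean wanders within {np.ptp(cauchy_running[1000:]):.3f}")
```

If your latency or financial data has very heavy tails, averages can be similarly untrustworthy; medians and other robust statistics are safer summaries.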
Use established libraries. Don’t implement statistical tests from scratch. Use scipy.stats, statsmodels, or similar battle-tested libraries.
The Law of Large Numbers isn’t just theoretical—it’s the foundation of data-driven engineering. Master it, respect its assumptions, and use it to make better decisions with your production systems.