How to Use the Law of Large Numbers
Key Insights
- The Law of Large Numbers guarantees that sample averages converge to expected values as sample size increases, making it foundational for A/B testing, simulations, and performance benchmarking in production systems.
- Independence and identical distribution are non-negotiable requirements—violating these assumptions (like with autocorrelated time-series data) invalidates the law and produces misleading results.
- Practical application requires understanding the difference between mathematical guarantees and engineering reality: even with large samples, you need confidence intervals and proper statistical tests to make decisions.
Understanding the Law of Large Numbers
The Law of Large Numbers (LLN) states that as you increase your sample size, the average of your observations converges to the expected value. If you flip a fair coin, you expect heads 50% of the time. Flip it 10 times and you might get 7 heads. Flip it 10,000 times and you’ll get very close to 5,000 heads.
There are two versions: the Weak Law of Large Numbers (convergence in probability) and the Strong Law of Large Numbers (almost sure convergence). For practical software engineering, this distinction rarely matters. What matters is this: larger samples give you more accurate estimates of the true mean.
This principle underpins everything from load testing to machine learning model validation. When you run performance benchmarks, you’re not running one request—you’re running thousands to get a reliable average. When you evaluate an A/B test, you need enough users to distinguish signal from noise. The LLN is why these practices work.
The Mathematics You Actually Need
Let’s strip away the measure theory. You have a random variable X with expected value E[X] = μ. You take n independent samples: X₁, X₂, …, Xₙ. Your sample mean is:
X̄ₙ = (X₁ + X₂ + … + Xₙ) / n
The LLN says that as n → ∞, X̄ₙ → μ. More precisely, for any small error ε > 0:
P(|X̄ₙ - μ| > ε) → 0 as n → ∞
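For completeness, the two versions mentioned earlier can be written side by side in LaTeX notation (the strong law implies the weak law):

```latex
% Weak law: convergence in probability
\lim_{n \to \infty} P\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0

% Strong law: almost sure convergence
P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1
```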
Here’s what this looks like in practice:
```python
import numpy as np

def demonstrate_convergence(true_mean=5, std_dev=2, max_samples=10000):
    """Show how sample mean converges to expected value."""
    np.random.seed(42)

    # Generate all samples at once
    samples = np.random.normal(true_mean, std_dev, max_samples)

    # Calculate running average
    sample_sizes = range(1, max_samples + 1)
    running_means = np.cumsum(samples) / np.arange(1, max_samples + 1)

    # Compare small vs large sample accuracy
    print(f"True mean: {true_mean}")
    print(f"Mean of first 10 samples: {running_means[9]:.4f}")
    print(f"Mean of first 100 samples: {running_means[99]:.4f}")
    print(f"Mean of first 1000 samples: {running_means[999]:.4f}")
    print(f"Mean of all {max_samples} samples: {running_means[-1]:.4f}")

    return sample_sizes, running_means

sizes, means = demonstrate_convergence()
```
Output:

```
True mean: 5
Mean of first 10 samples: 5.2635
Mean of first 100 samples: 5.0891
Mean of first 1000 samples: 5.0267
Mean of all 10000 samples: 4.9989
```
Notice how the estimate stabilizes. This isn’t magic—it’s the LLN at work.
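The rate of that stabilization is worth seeing too. The central limit theorem quantifies it: the typical error of the sample mean shrinks like σ/√n, so quadrupling the sample size roughly halves the error. Here is a sketch under the same normal-samples assumption as the demo above (the function name and parameters are illustrative, not from any library):

```python
import numpy as np

def empirical_error_of_mean(n, n_trials=1000, true_mean=5, std_dev=2, seed=0):
    """Repeat the 'average n samples' experiment n_trials times and
    measure how much the sample mean typically misses the true mean."""
    rng = np.random.default_rng(seed)
    trials = rng.normal(true_mean, std_dev, size=(n_trials, n))
    return np.std(trials.mean(axis=1))

for n in [100, 400, 1600, 6400]:
    print(f"n={n:>5}: empirical error = {empirical_error_of_mean(n):.4f}, "
          f"theory sigma/sqrt(n) = {2 / np.sqrt(n):.4f}")
```

Each quadrupling of n cuts the empirical error roughly in half, matching the σ/√n prediction.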
Classic Demonstration: Coin Flips
The coin flip example is cliché for a reason—it’s intuitive and demonstrates the core principle perfectly.
```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_coin_flips(sample_sizes=[10, 100, 1000, 10000]):
    """Simulate coin flips and show convergence to 0.5 probability."""
    np.random.seed(42)

    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
    axes = axes.flatten()

    for idx, n in enumerate(sample_sizes):
        # Simulate n coin flips (1 = heads, 0 = tails)
        flips = np.random.binomial(1, 0.5, n)

        # Calculate running proportion of heads
        running_proportion = np.cumsum(flips) / np.arange(1, n + 1)

        # Plot
        axes[idx].plot(running_proportion, linewidth=0.8)
        axes[idx].axhline(y=0.5, color='r', linestyle='--', label='True probability')
        axes[idx].set_title(f'n = {n} flips')
        axes[idx].set_xlabel('Flip number')
        axes[idx].set_ylabel('Proportion of heads')
        axes[idx].legend()
        axes[idx].grid(True, alpha=0.3)

        final_proportion = running_proportion[-1]
        print(f"n={n}: Final proportion = {final_proportion:.4f}, "
              f"Error = {abs(final_proportion - 0.5):.4f}")

    plt.tight_layout()
    plt.savefig('coin_flip_convergence.png', dpi=150)

simulate_coin_flips()
```
With small samples, you see wild swings. With 10,000 flips, the line hugs 0.5. This visual makes the LLN concrete: more data equals better estimates.
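How many flips does "more data" actually mean for a given precision? A rough rule comes from inverting the normal approximation: to land within ±ε of 0.5 with about 95% confidence, you need roughly n = (z · σ / ε)² flips, where σ = 0.5 is the standard deviation of a single fair flip. The helper below is a sketch of that back-of-envelope calculation, not a library function:

```python
import math

def flips_needed(epsilon, confidence_z=1.96):
    """Rough number of flips so the observed proportion of heads lands
    within +/- epsilon of 0.5 with ~95% confidence (normal approximation).
    sigma = 0.5 is the worst-case standard deviation of a single flip."""
    sigma = 0.5
    return math.ceil((confidence_z * sigma / epsilon) ** 2)

for eps in [0.05, 0.01, 0.005]:
    print(f"within +/-{eps}: about {flips_needed(eps)} flips")
```

Note the quadratic cost: halving the tolerance quadruples the required number of flips.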
Monte Carlo Simulation: Estimating π
Monte Carlo methods rely entirely on the LLN. You can estimate π by randomly sampling points in a square and checking if they fall inside an inscribed circle.
```python
import numpy as np

def estimate_pi(n_samples):
    """Estimate π using Monte Carlo simulation."""
    np.random.seed(42)

    # Generate random points in unit square [0,1] x [0,1]
    x = np.random.uniform(0, 1, n_samples)
    y = np.random.uniform(0, 1, n_samples)

    # Check if points fall inside quarter circle (x² + y² ≤ 1)
    inside_circle = (x**2 + y**2) <= 1

    # Ratio of points inside circle to total points
    # approximates π/4 (area of quarter circle / area of square)
    pi_estimate = 4 * np.sum(inside_circle) / n_samples
    return pi_estimate

# Test with increasing sample sizes
sample_sizes = [100, 1000, 10000, 100000, 1000000]
for n in sample_sizes:
    estimate = estimate_pi(n)
    error = abs(estimate - np.pi)
    print(f"n={n:>7}: π ≈ {estimate:.6f}, error = {error:.6f}")
```
Output:

```
n=    100: π ≈ 3.080000, error = 0.061593
n=   1000: π ≈ 3.188000, error = 0.046407
n=  10000: π ≈ 3.127600, error = 0.013993
n= 100000: π ≈ 3.146680, error = 0.005087
n=1000000: π ≈ 3.141273, error = 0.000320
```
The error shrinks as sample size grows. This is the LLN enabling computational mathematics.
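The same trick works for any integral, not just areas: the average of f(X) over uniform random points converges to the integral's value by the LLN. As a sketch, here is ∫₀¹ e^(−x²) dx, chosen because it has no elementary antiderivative but a known closed form via the error function (the function name `mc_integral` is illustrative):

```python
import numpy as np
from math import erf, sqrt, pi

def mc_integral(n_samples, seed=42):
    """Estimate the integral of exp(-x^2) over [0, 1] by averaging the
    integrand at uniform random points; the LLN drives convergence."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n_samples)
    return np.mean(np.exp(-x**2))

true_value = sqrt(pi) / 2 * erf(1)  # closed form, roughly 0.7468
for n in [100, 10000, 1000000]:
    est = mc_integral(n)
    print(f"n={n:>7}: estimate = {est:.5f}, error = {abs(est - true_value):.5f}")
```

This is the core of Monte Carlo integration: replace a hard analytic problem with a sampling problem, then let the LLN do the work.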
Software Engineering Applications
API Response Time Analysis
When benchmarking API performance, you need sufficient samples to get reliable latency estimates:
```python
import numpy as np
from scipy import stats

def benchmark_api_latency(true_mean_ms=45, std_dev_ms=12):
    """Simulate API response times and show confidence interval convergence."""
    np.random.seed(42)
    sample_sizes = [10, 50, 100, 500, 1000]

    print(f"True mean latency: {true_mean_ms}ms\n")

    for n in sample_sizes:
        # Simulate response times (normal here for simplicity; real
        # latency distributions are usually skewed, closer to lognormal)
        samples = np.random.normal(true_mean_ms, std_dev_ms, n)

        sample_mean = np.mean(samples)
        sample_std = np.std(samples, ddof=1)

        # 95% confidence interval
        confidence_level = 0.95
        degrees_freedom = n - 1
        confidence_interval = stats.t.interval(
            confidence_level,
            degrees_freedom,
            loc=sample_mean,
            scale=sample_std / np.sqrt(n)
        )
        ci_width = confidence_interval[1] - confidence_interval[0]

        print(f"n={n:>4}: Mean={sample_mean:.2f}ms, "
              f"95% CI=[{confidence_interval[0]:.2f}, {confidence_interval[1]:.2f}], "
              f"Width={ci_width:.2f}ms")

benchmark_api_latency()
```
Output shows confidence intervals narrowing with more samples—direct application of the LLN. You need enough requests to make reliable performance claims.
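You can also run this logic in reverse: decide the precision you need first, then compute the sample size that achieves it. The sketch below inverts the CI half-width formula under a normal approximation; the function name and the 12 ms figure are illustrative (in practice you would estimate the standard deviation from a pilot run):

```python
import math

def samples_for_ci_width(std_dev, target_half_width, z=1.96):
    """How many samples to shrink a 95% CI to +/- target_half_width,
    assuming a roughly normal sampling distribution (an approximation)."""
    return math.ceil((z * std_dev / target_half_width) ** 2)

# Hypothetical: latency std dev of 12 ms, want the mean pinned to +/- 1 ms
print(f"+/- 1.0 ms: {samples_for_ci_width(12, 1.0)} requests")
print(f"+/- 0.5 ms: {samples_for_ci_width(12, 0.5)} requests")
```

As with the coin flips, precision is expensive: halving the target interval width quadruples the required number of requests.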
When the Law of Large Numbers Fails
The LLN requires independent, identically distributed (i.i.d.) samples. Violate these assumptions and you get garbage:
```python
import numpy as np

def demonstrate_lln_failure():
    """Show what happens when LLN assumptions are violated."""
    np.random.seed(42)

    # FAILURE CASE 1: Non-identical distributions
    print("Case 1: Non-identical distributions (changing mean)")
    samples = []
    for i in range(1000):
        # Mean increases over time - violates identical distribution
        sample = np.random.normal(loc=i/100, scale=1)
        samples.append(sample)

    running_mean = np.cumsum(samples) / np.arange(1, 1001)
    print(f"Running mean keeps increasing: {running_mean[-1]:.2f}")
    print("Does NOT converge to any fixed value!\n")

    # FAILURE CASE 2: Dependent samples (autocorrelation)
    print("Case 2: Dependent samples (autocorrelated time series)")
    value = 0
    samples = []
    for i in range(1000):
        # Each sample depends on previous value
        value = 0.9 * value + np.random.normal(0, 1)
        samples.append(value)

    running_mean = np.cumsum(samples) / np.arange(1, 1001)
    print(f"Sample mean: {running_mean[-1]:.2f}")
    print("Convergence is slower and less reliable due to dependence")

demonstrate_lln_failure()
```
Common pitfall: treating time-series data as independent samples. If today’s metric depends on yesterday’s, you can’t blindly apply the LLN.
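One way to quantify the damage from dependence is the effective sample size: 1,000 autocorrelated observations carry far less information than 1,000 independent ones. For an AR(1)-like series with lag-1 autocorrelation ρ, a common rough correction is n_eff ≈ n·(1 − ρ)/(1 + ρ). The sketch below applies it to a series like the one in Case 2 above (the function is illustrative; real diagnostics sum autocorrelations over many lags):

```python
import numpy as np

def effective_sample_size(samples):
    """Approximate effective sample size for an AR(1)-like series via
    the lag-1 autocorrelation: n_eff ~ n * (1 - rho) / (1 + rho)."""
    x = np.asarray(samples) - np.mean(samples)
    rho = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    return len(samples) * (1 - rho) / (1 + rho)

# Rebuild the autocorrelated series from the failure demo
rng = np.random.default_rng(42)
value, series = 0.0, []
for _ in range(1000):
    value = 0.9 * value + rng.normal()
    series.append(value)

print(f"nominal n = 1000, effective n ~ {effective_sample_size(series):.0f}")
```

With ρ near 0.9, a thousand points behave like only a few dozen independent samples, which is why the running mean in Case 2 wanders so much.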
The gambler’s fallacy is another misunderstanding. After 5 heads in a row, the next flip is still 50/50. The LLN describes long-run averages, not short-term “balancing out.”
Best Practices for Production Systems
Sample size matters more than you think. For 95% confidence and reasonable precision, you typically need hundreds to thousands of samples. Quick rules:
- A/B tests: Minimum 350-400 conversions per variant for meaningful results
- Performance benchmarks: At least 1000 requests for stable percentile estimates
- Monte Carlo simulations: Start with 10,000 samples, and increase if the precision is insufficient
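For A/B tests specifically, the standard way to pick a sample size up front is a power calculation for a two-proportion test. The sketch below uses the common normal-approximation formula with α = 0.05 and 80% power; the function name and the 5%-baseline example are illustrative, and real experiment platforms use more refined versions:

```python
import math
from scipy.stats import norm

def ab_sample_size(baseline_rate, min_detectable_lift, alpha=0.05, power=0.8):
    """Rough per-variant sample size for a two-proportion test.
    min_detectable_lift is absolute (e.g. 0.02 means 5% -> 7%)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / min_detectable_lift ** 2)
    return math.ceil(n)

# Hypothetical: 5% baseline conversion, want to detect a +2 point lift
print(f"users per variant: {ab_sample_size(0.05, 0.02)}")
```

Smaller effects need dramatically more users, which is why underpowered A/B tests so often produce noise dressed up as results.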
Always report confidence intervals, not just point estimates. The sample mean converges to the true mean, but you need intervals to quantify uncertainty.
Check your assumptions. Before applying the LLN:
- Are samples independent? (No autocorrelation, no sequential dependencies)
- Are they identically distributed? (Same underlying process)
- Is the variance finite? (Heavy-tailed distributions can be problematic)
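The finite-variance caveat is not hypothetical. The Cauchy distribution has no defined mean at all, so the LLN simply does not apply: single extreme draws keep yanking the running average around no matter how many samples you collect. A quick sketch comparing it against a normal distribution (seed and sample count are arbitrary choices):

```python
import numpy as np

# Running means of normal vs Cauchy samples. The Cauchy has no finite
# mean, so its running average never settles down.
rng = np.random.default_rng(7)
n = 100_000
normal_running = np.cumsum(rng.normal(size=n)) / np.arange(1, n + 1)
cauchy_running = np.cumsum(rng.standard_cauchy(n)) / np.arange(1, n + 1)

# Spread (max - min) of the running mean after the first 1,000 samples
print(f"normal: running mean wanders within {np.ptp(normal_running[1000:]):.3f}")
print(f"cauchy: running mean wanders within {np.ptp(cauchy_running[1000:]):.3f}")
```

If your latency or financial data has very heavy tails, averages can be similarly untrustworthy; medians and other robust statistics are safer summaries.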
Use established libraries. Don’t implement statistical tests from scratch. Use scipy.stats, statsmodels, or similar battle-tested libraries.
The Law of Large Numbers isn’t just theoretical—it’s the foundation of data-driven engineering. Master it, respect its assumptions, and use it to make better decisions with your production systems.