Law of Large Numbers: Formula and Examples

Key Insights

• The Law of Large Numbers guarantees that sample averages converge to expected values as sample size increases, forming the mathematical foundation for statistical inference and Monte Carlo methods
• The weak law proves convergence in probability while the strong law proves almost sure convergence—a subtle but important distinction that affects how we reason about long-run behavior
• Understanding LLN prevents critical errors like the gambler’s fallacy and helps engineers determine appropriate sample sizes for A/B tests, simulations, and machine learning experiments

Introduction to the Law of Large Numbers

The Law of Large Numbers (LLN) is one of the most fundamental theorems in probability theory, yet it’s frequently misunderstood by practitioners. At its core, the law states that as you increase your sample size, the sample mean converges to the expected value (theoretical mean) of the distribution. This isn’t just mathematical curiosity—it’s the principle that makes statistical inference possible.

There are two forms: the weak law and the strong law. Both guarantee convergence, but they differ in the type of convergence they prove. The weak law states that the probability of the sample mean deviating from the expected value by more than any fixed amount approaches zero. The strong law makes an even stronger claim: with probability one, the sample mean will eventually converge to the expected value.

For software engineers and data scientists, LLN underpins everything from A/B testing to Monte Carlo simulations. It tells us why running more experiments gives us better estimates and why casinos always profit in the long run.

Mathematical Foundation and Formula

The weak law of large numbers can be formally stated as follows:

Let X₁, X₂, …, Xₙ be independent and identically distributed random variables with expected value μ and finite variance σ². Define the sample mean as X̄ₙ = (X₁ + X₂ + … + Xₙ) / n.

Then for any ε > 0:

lim P(|X̄ₙ - μ| > ε) = 0
n→∞

This reads: “The probability that the sample mean differs from the expected value by more than epsilon approaches zero as n approaches infinity.”

The key terms:

  • Sample mean (X̄ₙ): The average of your observed samples
  • Expected value (μ): The theoretical mean of the distribution
  • Convergence in probability: The probability of being far from the target shrinks to zero
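When the variance is finite, Chebyshev's inequality turns this into a one-line proof of the weak law, with a quantitative bound on how fast the deviation probability shrinks:

```latex
% Var(X̄ₙ) = σ²/n for i.i.d. samples, so Chebyshev's inequality gives
P\left(|\bar{X}_n - \mu| > \varepsilon\right)
  \;\le\; \frac{\mathrm{Var}(\bar{X}_n)}{\varepsilon^2}
  \;=\; \frac{\sigma^2}{n\,\varepsilon^2}
  \;\xrightarrow{\;n \to \infty\;}\; 0
```

The bound decays like 1/n: doubling the sample size at a fixed ε at least halves the bound on the deviation probability.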

Here’s a basic implementation of sample mean calculation:

import numpy as np

def calculate_sample_mean(samples):
    """Calculate sample mean from a list of observations."""
    return np.sum(samples) / len(samples)

# Example: Rolling a fair die
die_rolls = np.random.randint(1, 7, size=1000)
sample_mean = calculate_sample_mean(die_rolls)
expected_value = 3.5

print(f"Sample mean: {sample_mean:.4f}")
print(f"Expected value: {expected_value}")
print(f"Difference: {abs(sample_mean - expected_value):.4f}")

Weak vs. Strong Law of Large Numbers

The distinction between weak and strong laws is subtle but important. The weak law guarantees convergence in probability—for any fixed deviation threshold, the probability of exceeding it eventually becomes negligible. The strong law guarantees almost sure convergence—the sample mean will actually equal the expected value in the limit, except on a set of outcomes with probability zero.

Think of it this way: the weak law says “it becomes increasingly unlikely to be wrong,” while the strong law says “you will eventually be right and stay right.”

For practical purposes, both laws give us the same confidence in large samples. The strong law is theoretically more powerful, yet for i.i.d. sequences it holds under a remarkably mild condition: Kolmogorov's version requires only a finite first moment. Finite variance is what the elementary Chebyshev-based proof of the weak law uses, not a requirement of the strong law.
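In symbols, the two statements differ in where the limit sits relative to the probability:

```latex
% Weak law: convergence in probability
\lim_{n \to \infty} P\left(|\bar{X}_n - \mu| > \varepsilon\right) = 0
  \quad \text{for every } \varepsilon > 0
% Strong law: almost sure convergence
P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1
```

The weak law takes a limit of probabilities of individual deviations; the strong law assigns probability one to the single event that the entire sequence of sample means converges.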

import numpy as np
import matplotlib.pyplot as plt

def demonstrate_convergence(n_trials=10000, n_paths=5):
    """Show convergence for multiple simulation paths."""
    np.random.seed(42)
    
    # Simulate coin flips (1 for heads, 0 for tails)
    # Expected value = 0.5
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    for path in range(n_paths):
        flips = np.random.binomial(1, 0.5, n_trials)
        running_mean = np.cumsum(flips) / np.arange(1, n_trials + 1)
        
        ax1.plot(running_mean, alpha=0.6, linewidth=0.8)
        ax2.semilogx(running_mean, alpha=0.6, linewidth=0.8)
    
    for ax in [ax1, ax2]:
        ax.axhline(y=0.5, color='r', linestyle='--', label='Expected Value')
        ax.set_xlabel('Number of Trials')
        ax.set_ylabel('Sample Mean')
        ax.legend()
    
    ax1.set_title('Linear Scale')
    ax2.set_title('Log Scale (shows early convergence)')
    plt.tight_layout()
    plt.savefig('lln_convergence.png', dpi=150)
    plt.close()

demonstrate_convergence()

Practical Demonstration: Coin Flip Simulation

Let’s walk through the classic coin flip example with increasing sample sizes. We’ll see how the proportion of heads converges to 0.5.

import numpy as np
import matplotlib.pyplot as plt

def coin_flip_convergence():
    """Demonstrate LLN with coin flips at different sample sizes."""
    np.random.seed(42)
    
    sample_sizes = [10, 100, 1000, 10000, 100000, 1000000]
    results = []
    
    for n in sample_sizes:
        flips = np.random.binomial(1, 0.5, n)
        proportion_heads = np.mean(flips)
        results.append({
            'n': n,
            'proportion': proportion_heads,
            'deviation': abs(proportion_heads - 0.5)
        })
        print(f"n={n:>7}: proportion={proportion_heads:.6f}, "
              f"deviation={abs(proportion_heads - 0.5):.6f}")
    
    # Visualize convergence
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Show running average for largest sample
    large_sample = np.random.binomial(1, 0.5, 100000)
    running_avg = np.cumsum(large_sample) / np.arange(1, 100001)
    
    ax1.plot(running_avg, linewidth=0.5)
    ax1.axhline(y=0.5, color='r', linestyle='--', label='Expected Value')
    ax1.set_xlabel('Number of Flips')
    ax1.set_ylabel('Proportion of Heads')
    ax1.set_title('Convergence Over Time')
    ax1.legend()
    
    # Show deviation vs sample size
    ns = [r['n'] for r in results]
    devs = [r['deviation'] for r in results]
    ax2.loglog(ns, devs, 'o-', linewidth=2, markersize=8)
    ax2.set_xlabel('Sample Size (log scale)')
    ax2.set_ylabel('Absolute Deviation (log scale)')
    ax2.set_title('Deviation Decreases with Sample Size')
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('coin_flip_convergence.png', dpi=150)
    plt.close()

coin_flip_convergence()

Real-World Applications

Monte Carlo Simulations: LLN is the theoretical justification for Monte Carlo methods. By sampling random scenarios, we can estimate complex integrals and expectations.

import numpy as np

def estimate_pi(n_samples):
    """Estimate π using Monte Carlo integration."""
    # Generate random points in unit square
    x = np.random.uniform(-1, 1, n_samples)
    y = np.random.uniform(-1, 1, n_samples)
    
    # Count points inside unit circle
    inside_circle = (x**2 + y**2) <= 1
    pi_estimate = 4 * np.mean(inside_circle)
    
    return pi_estimate

# Show improvement with sample size
for n in [100, 1000, 10000, 100000, 1000000]:
    estimate = estimate_pi(n)
    error = abs(estimate - np.pi)
    print(f"n={n:>7}: π≈{estimate:.6f}, error={error:.6f}")

A/B Testing: When running experiments, LLN tells us that larger sample sizes give more reliable estimates of true conversion rates. This is why you need to wait for sufficient traffic before declaring a winner.
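To make "sufficient traffic" concrete, the standard error of an estimated proportion is √(p(1−p)/n). A quick sketch (the 5% baseline conversion rate below is a made-up example):

```python
import math

# 95% confidence-interval half-width for an estimated conversion rate.
# The standard error of a proportion is sqrt(p * (1 - p) / n).
p = 0.05  # hypothetical baseline conversion rate
for n in [100, 1_000, 10_000, 100_000]:
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    print(f"n={n:>7}: estimate = {p:.1%} ± {half_width:.2%}")
```

At n = 100 the uncertainty is almost as large as the rate itself; declaring a winner at that point is reading noise.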

Insurance and Risk Modeling: Insurance companies rely on LLN to predict aggregate claims. While individual claims are unpredictable, the average claim across thousands of policies converges to the expected value, allowing accurate premium pricing.
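A toy version of that averaging effect (the claim frequency, severity distribution, and parameters below are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_average_claim(n_policies):
    """Average payout per policy: 5% of policies claim a
    lognormally distributed amount, the rest claim nothing."""
    has_claim = rng.random(n_policies) < 0.05
    amounts = rng.lognormal(mean=8, sigma=1, size=n_policies)
    return np.mean(has_claim * amounts)

# E[payout] = P(claim) * E[lognormal amount] = 0.05 * exp(8 + 1/2)
expected = 0.05 * np.exp(8.5)
for n in [100, 10_000, 1_000_000]:
    avg = simulate_average_claim(n)
    print(f"n={n:>9}: average claim = {avg:10.2f} (expected {expected:.2f})")
```

Any single policy is all-or-nothing, but the portfolio-wide average stabilizes near the expected payout as the book of policies grows.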

Casino Profitability: Casinos use LLN to guarantee profits. Each game has a negative expected value for the player. Over millions of bets, the casino’s actual profit converges to this expected value.
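The same effect in code, using the single-number bet in European roulette (payout 35:1, win probability 1/37):

```python
import numpy as np

rng = np.random.default_rng(1)

def player_ev_per_bet(n_bets):
    """Simulated average player profit per unit bet."""
    wins = rng.random(n_bets) < 1 / 37
    profit = np.where(wins, 35.0, -1.0)  # win 35 units, or lose the stake
    return profit.mean()

theoretical = 35 / 37 - 36 / 37  # = -1/37, about -0.027 per unit bet
for n in [1_000, 100_000, 1_000_000]:
    print(f"n={n:>9}: simulated EV = {player_ev_per_bet(n):+.4f} "
          f"(theory {theoretical:+.4f})")
```

Individual sessions swing wildly (the per-bet standard deviation is about 5.8 units), but over millions of bets the average locks onto −1/37, which is the casino's edge.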

Common Misconceptions and Pitfalls

The most dangerous misconception is the gambler’s fallacy—the belief that past results influence future independent events. If a fair coin lands heads five times in a row, many people incorrectly believe tails is “due.” LLN doesn’t work this way.

import numpy as np

def gamblers_fallacy_demo():
    """Demonstrate that past results don't affect future probabilities."""
    np.random.seed(42)
    
    # Simulate scenarios where we've seen 5 heads in a row
    n_simulations = 10000
    next_flip_heads = 0
    
    for _ in range(n_simulations):
        # We don't even need to simulate the first 5 flips
        # The next flip is independent
        next_flip = np.random.binomial(1, 0.5)
        next_flip_heads += next_flip
    
    proportion_heads = next_flip_heads / n_simulations
    print(f"After 5 heads in a row, next flip is heads: {proportion_heads:.4f}")
    print("Expected: 0.5000 (completely independent!)")

gamblers_fallacy_demo()

Convergence Rate: LLN doesn’t specify how fast convergence happens. The rate depends on the variance of the distribution. High variance requires larger samples for the same accuracy.
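The central limit theorem quantifies the rate: the typical deviation of the sample mean is about σ/√n. A quick empirical check (normal samples with σ = 2, chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)

sigma = 2.0
for n in [100, 1_000, 10_000]:
    # 500 independent sample means, each computed from n observations
    means = rng.normal(0, sigma, size=(500, n)).mean(axis=1)
    print(f"n={n:>6}: empirical std of mean = {means.std():.4f}, "
          f"sigma/sqrt(n) = {sigma / np.sqrt(n):.4f}")
```

Quadrupling n only halves the typical error, which is why "just collect a bit more data" rarely rescues a genuinely noisy estimate.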

Sample Size Requirements: “Large” is relative. For low-variance distributions, hundreds of samples might suffice. For heavy-tailed distributions, you might need millions.
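Heavy tails in action: a Pareto distribution with shape α = 1.5 has a finite mean (3, for a minimum value of 1) but infinite variance, so the LLN still applies while convergence is slow and occasionally derailed by a single huge observation:

```python
import numpy as np

rng = np.random.default_rng(3)

alpha = 1.5
true_mean = alpha / (alpha - 1)  # = 3 for minimum value 1

for n in [1_000, 100_000, 1_000_000]:
    samples = rng.pareto(alpha, n) + 1  # numpy's pareto starts at 0; shift to 1
    print(f"n={n:>9}: sample mean = {samples.mean():.4f} (true {true_mean})")
```

Sample sizes that pin a coin-flip mean down tightly can still leave a heavy-tailed mean visibly wandering around its target.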

Implementation Considerations for Engineers

When implementing systems that rely on LLN, consider these practical aspects:

Memory Efficiency: Don’t store all samples when you only need the mean. Use streaming algorithms:

import numpy as np

class StreamingMean:
    """Calculate running mean without storing all samples."""
    
    def __init__(self):
        self.n = 0
        self.mean = 0.0
    
    def update(self, value):
        """Update mean with new observation."""
        self.n += 1
        # Incremental mean formula: μₙ = μₙ₋₁ + (xₙ - μₙ₋₁)/n
        self.mean += (value - self.mean) / self.n
    
    def get_mean(self):
        return self.mean

# Example usage
stream = StreamingMean()
for value in np.random.normal(10, 2, 1000000):
    stream.update(value)

print(f"Streaming mean: {stream.get_mean():.4f}")
print(f"Expected value: 10.0000")

Sample Size Determination: Use power analysis to determine required sample sizes for A/B tests. Don’t just “wait and see”—calculate upfront how much data you need for statistical significance.
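A minimal version of that upfront calculation, using the standard two-proportion approximation (the baseline and target rates are made up; a real test would also account for peeking at the data before the planned sample size is reached):

```python
from math import ceil
from statistics import NormalDist

def samples_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-variant sample size for a two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Detecting a lift from a 10% to a 12% conversion rate
print(samples_per_variant(0.10, 0.12), "users per variant")
```

Smaller effects blow the requirement up quadratically: halving the detectable lift roughly quadruples the sample size.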

Numerical Stability: When dealing with very large samples, be aware of floating-point precision issues. The streaming mean formula above is more numerically stable than summing all values and dividing.
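One way to see the difference is a deliberately extreme float32 toy (production code would typically accumulate in float64 anyway): the naive running sum leaves float32's range entirely, while the incremental update never strays from the data's own scale.

```python
import numpy as np

values = np.full(100_000, 1e35, dtype=np.float32)

# Naive approach: the running sum overflows float32 partway through
total = np.float32(0.0)
with np.errstate(over='ignore'):
    for v in values:
        total = total + v
naive_mean = total / len(values)  # inf divided by n is still inf

# Streaming approach: intermediate values stay near the data's scale
mean = np.float32(0.0)
for i, v in enumerate(values, start=1):
    mean = mean + (v - mean) / np.float32(i)

print(f"naive mean:     {naive_mean}")
print(f"streaming mean: {mean:.3e}")
```

The naive mean comes out as inf; the streaming mean stays at roughly 1e35, the correct answer.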

The Law of Large Numbers isn’t just a theoretical result—it’s a practical tool that tells us when we can trust our estimates and how much data we need to collect. Understanding it deeply prevents costly mistakes in production systems and experimental design.
