Bernoulli Distribution in Python: Complete Guide
Key Insights
- The Bernoulli distribution is the foundation of binary outcome modeling—master it and you’ll understand everything from A/B testing to logistic regression
- SciPy’s `bernoulli` class provides a clean API, but NumPy’s binomial with n=1 offers 10-50x better performance for large-scale sampling
- Always set random seeds at the start of your script, not before each random call, and use NumPy’s modern `Generator` API instead of the legacy `np.random.seed()`
Introduction to Bernoulli Distribution
The Bernoulli distribution is the simplest probability distribution you’ll encounter, yet it underpins much of statistical modeling. It describes any random experiment with exactly two outcomes: success (1) or failure (0). That’s it.
You encounter Bernoulli trials constantly: a user clicks or doesn’t click, a patient recovers or doesn’t, a manufactured part passes inspection or fails, a coin lands heads or tails. Each of these is a single draw from a Bernoulli distribution.
The distribution has one parameter: p, the probability of success. Everything else derives from this single value. This simplicity makes Bernoulli distributions an excellent starting point for understanding probability in code.
```python
import random

def coin_flip(p_heads=0.5):
    """Simulate a single coin flip as a Bernoulli trial."""
    return 1 if random.random() < p_heads else 0

# Simulate 10 flips of a fair coin
random.seed(42)
flips = [coin_flip() for _ in range(10)]
print(f"Flips: {flips}")
print(f"Heads: {sum(flips)}, Tails: {10 - sum(flips)}")
# Output: Flips: [0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
# Output: Heads: 6, Tails: 4
```
This manual implementation works, but Python’s scientific stack offers far more powerful tools.
Mathematical Properties
Before diving into libraries, understand what you’re computing. The Bernoulli distribution’s probability mass function (PMF) is straightforward:
- P(X = 1) = p
- P(X = 0) = 1 - p
Or more compactly: P(X = k) = p^k × (1-p)^(1-k) for k ∈ {0, 1}
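The compact form is easy to sanity-check against SciPy's `pmf` method; a quick sketch, using p = 0.7 as an illustrative value:

```python
import scipy.stats as stats

p = 0.7
dist = stats.bernoulli(p)

for k in (0, 1):
    # Compact form: p^k * (1-p)^(1-k) for k in {0, 1}
    manual = p**k * (1 - p)**(1 - k)
    print(f"P(X={k}): manual = {manual:.2f}, scipy = {dist.pmf(k):.2f}")
```
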
The mean (expected value) equals p. If you flip a coin with 70% chance of heads thousands of times, you’ll average about 0.7 heads per flip.
The variance equals p(1-p). This peaks at p=0.5 (maximum uncertainty) and approaches zero as p approaches 0 or 1 (near certainty).
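A small sketch makes the variance claim concrete: scanning p over a grid, the product p(1-p) peaks at p = 0.5 and collapses toward zero near the endpoints.

```python
import numpy as np

# Bernoulli variance p(1-p) across a grid of p values
p_grid = np.linspace(0, 1, 101)
variances = p_grid * (1 - p_grid)

# The maximum sits at p = 0.5, where the variance is 0.25
p_at_max = p_grid[np.argmax(variances)]
print(f"Variance peaks at p = {p_at_max} with value {variances.max():.2f}")
print(f"Variance at p = 0.01: {0.01 * 0.99:.4f}")
```
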
The Bernoulli distribution is actually a special case of the binomial distribution where n=1. A binomial distribution models the number of successes in n independent Bernoulli trials.
```python
import scipy.stats as stats

# Define our probability of success
p = 0.7

# Manual calculations
manual_mean = p
manual_variance = p * (1 - p)
manual_std = manual_variance ** 0.5

print(f"Manual calculations for p={p}:")
print(f"  Mean: {manual_mean}")
print(f"  Variance: {manual_variance}")
print(f"  Std Dev: {manual_std:.4f}")

# Verify with scipy
bernoulli_dist = stats.bernoulli(p)
scipy_mean, scipy_variance = bernoulli_dist.stats(moments='mv')

print(f"\nSciPy verification:")
print(f"  Mean: {scipy_mean}")
print(f"  Variance: {scipy_variance}")

# They match exactly
assert manual_mean == scipy_mean
assert manual_variance == scipy_variance
```
Implementing Bernoulli with SciPy
SciPy’s scipy.stats.bernoulli provides a full-featured distribution object. You create it once with your probability parameter, then call methods for sampling, PMF calculation, CDF values, and more.
```python
import scipy.stats as stats

# Create a Bernoulli distribution with p=0.65
p = 0.65
dist = stats.bernoulli(p)

# Generate random samples (random_state keeps the draw reproducible)
samples = dist.rvs(size=1000, random_state=42)

# Calculate empirical probabilities
empirical_p_success = samples.mean()
empirical_p_failure = 1 - empirical_p_success

# Theoretical probabilities from PMF
theoretical_p_success = dist.pmf(1)
theoretical_p_failure = dist.pmf(0)

print(f"Sample size: {len(samples)}")
print(f"\nP(X=1) - Success:")
print(f"  Theoretical: {theoretical_p_success:.4f}")
print(f"  Empirical:   {empirical_p_success:.4f}")
print(f"  Difference:  {abs(theoretical_p_success - empirical_p_success):.4f}")
print(f"\nP(X=0) - Failure:")
print(f"  Theoretical: {theoretical_p_failure:.4f}")
print(f"  Empirical:   {empirical_p_failure:.4f}")

# CDF values
print(f"\nCDF values:")
print(f"  P(X <= 0): {dist.cdf(0):.4f}")
print(f"  P(X <= 1): {dist.cdf(1):.4f}")
```
The empirical probabilities converge to theoretical values as sample size increases—a demonstration of the law of large numbers.
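That convergence is easy to watch directly; a small sketch (reusing p = 0.65, with NumPy's `Generator` for sampling) tracks the empirical mean at growing sample sizes:

```python
import numpy as np

p = 0.65
rng = np.random.default_rng(0)
samples = rng.binomial(n=1, p=p, size=100_000)

# The running mean drifts toward p as more samples accumulate
for n in (100, 1_000, 10_000, 100_000):
    mean_n = samples[:n].mean()
    print(f"n = {n:>7,}: empirical mean = {mean_n:.4f}, error = {abs(mean_n - p):.4f}")
```
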
Visualization Techniques
Visualizing Bernoulli distributions helps build intuition about how the probability parameter affects outcomes. Since we only have two possible values, bar charts work best.
```python
import matplotlib.pyplot as plt
import scipy.stats as stats

# Compare three different probability values
p_values = [0.3, 0.5, 0.7]
outcomes = [0, 1]

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)

for ax, p in zip(axes, p_values):
    dist = stats.bernoulli(p)
    probabilities = [dist.pmf(k) for k in outcomes]
    bars = ax.bar(outcomes, probabilities, color=['#e74c3c', '#2ecc71'],
                  edgecolor='black', linewidth=1.5, width=0.5)
    ax.set_xlabel('Outcome', fontsize=11)
    ax.set_ylabel('Probability' if p == 0.3 else '', fontsize=11)
    ax.set_title(f'Bernoulli(p={p})', fontsize=12, fontweight='bold')
    ax.set_xticks([0, 1])
    ax.set_xticklabels(['Failure (0)', 'Success (1)'])
    ax.set_ylim(0, 1)
    # Add probability labels on bars
    for bar, prob in zip(bars, probabilities):
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                f'{prob:.2f}', ha='center', fontsize=10)

plt.tight_layout()
plt.savefig('bernoulli_comparison.png', dpi=150, bbox_inches='tight')
plt.show()
```
Notice how p=0.5 shows equal bar heights (maximum variance), while p=0.3 and p=0.7 are mirror images of each other.
Practical Applications
The Bernoulli distribution shines in A/B testing. When comparing two variants, each user’s conversion is a Bernoulli trial. Here’s a complete simulation:
```python
import scipy.stats as stats
import numpy as np

def run_ab_test(p_control, p_treatment, n_samples, alpha=0.05):
    """
    Simulate an A/B test using Bernoulli distributions.
    Returns test results with statistical significance.
    """
    np.random.seed(42)
    # Generate conversion data
    control = stats.bernoulli(p_control).rvs(n_samples)
    treatment = stats.bernoulli(p_treatment).rvs(n_samples)

    # Calculate observed conversion rates
    control_rate = control.mean()
    treatment_rate = treatment.mean()

    # Two-proportion z-test
    pooled_rate = (control.sum() + treatment.sum()) / (2 * n_samples)
    se = np.sqrt(2 * pooled_rate * (1 - pooled_rate) / n_samples)
    z_stat = (treatment_rate - control_rate) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

    return {
        'control_rate': control_rate,
        'treatment_rate': treatment_rate,
        'lift': (treatment_rate - control_rate) / control_rate * 100,
        'z_statistic': z_stat,
        'p_value': p_value,
        'significant': p_value < alpha
    }

# Simulate: Control converts at 10%, Treatment at 12%
results = run_ab_test(p_control=0.10, p_treatment=0.12, n_samples=5000)

print("A/B Test Results")
print("=" * 40)
print(f"Control conversion rate: {results['control_rate']:.2%}")
print(f"Treatment conversion rate: {results['treatment_rate']:.2%}")
print(f"Relative lift: {results['lift']:.1f}%")
print(f"Z-statistic: {results['z_statistic']:.3f}")
print(f"P-value: {results['p_value']:.4f}")
print(f"Statistically significant: {results['significant']}")
```
This pattern applies to any binary outcome comparison: click-through rates, email open rates, defect rates in manufacturing.
NumPy Alternative Implementation
For high-performance applications, NumPy’s random number generation outperforms SciPy significantly. The trick: use binomial(n=1, p), which is mathematically equivalent to a Bernoulli distribution.
```python
import numpy as np
import scipy.stats as stats
import time

def benchmark_sampling(n_samples, p, n_iterations=100):
    """Compare scipy.stats.bernoulli vs numpy.random.Generator."""
    # SciPy approach
    scipy_dist = stats.bernoulli(p)
    start = time.perf_counter()
    for _ in range(n_iterations):
        _ = scipy_dist.rvs(size=n_samples)
    scipy_time = (time.perf_counter() - start) / n_iterations

    # NumPy Generator approach (modern API)
    rng = np.random.default_rng(42)
    start = time.perf_counter()
    for _ in range(n_iterations):
        _ = rng.binomial(n=1, p=p, size=n_samples)
    numpy_time = (time.perf_counter() - start) / n_iterations

    return scipy_time, numpy_time

# Benchmark with increasing sample sizes
sample_sizes = [1_000, 10_000, 100_000, 1_000_000]

print(f"{'Samples':<12} {'SciPy (ms)':<12} {'NumPy (ms)':<12} {'Speedup':<10}")
print("-" * 46)
for n in sample_sizes:
    scipy_t, numpy_t = benchmark_sampling(n, p=0.5)
    speedup = scipy_t / numpy_t
    print(f"{n:<12} {scipy_t*1000:<12.3f} {numpy_t*1000:<12.3f} {speedup:<10.1f}x")
```
NumPy typically achieves 10-50x speedups for large sample sizes. Use SciPy when you need the full distribution API (PMF, CDF, moments); use NumPy when you’re generating millions of samples.
Common Pitfalls and Best Practices
Edge cases matter. When p=0 or p=1, you have a degenerate distribution—every trial produces the same outcome. This can cause division-by-zero errors in variance calculations or break downstream statistical tests.
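A short sketch of the degenerate case, using SciPy's `bernoulli` with p = 1.0: every draw is identical, so the sample variance is exactly zero, and any statistic that divides by it needs a guard.

```python
import scipy.stats as stats

# Degenerate distribution: p = 1 means every trial succeeds
samples = stats.bernoulli(1.0).rvs(size=100, random_state=42)

print(f"All ones: {bool(samples.all())}")
print(f"Sample variance: {samples.var()}")

# Guard before any division by the sample variance
if samples.var() == 0:
    print("Degenerate sample: skip tests that divide by the variance")
```
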
```python
import numpy as np
import scipy.stats as stats

# Proper seed management with NumPy's modern API
def reproducible_experiment():
    """Demonstrate proper random seed handling."""
    # Create a Generator instance with a seed
    rng = np.random.default_rng(seed=12345)
    # All random operations use this generator
    sample1 = rng.binomial(n=1, p=0.5, size=10)
    sample2 = rng.binomial(n=1, p=0.7, size=10)
    return sample1, sample2

# Running twice produces identical results
result_a = reproducible_experiment()
result_b = reproducible_experiment()

print(f"First run:  {result_a[0]}")
print(f"Second run: {result_b[0]}")
print(f"Identical:  {np.array_equal(result_a[0], result_b[0])}")

# For scipy, pass random_state to rvs instead of touching the global seed
scipy_samples = stats.bernoulli(0.5).rvs(10, random_state=42)
print(f"\nSciPy with seed: {scipy_samples}")
```
When to use Bernoulli vs. Binomial: Use Bernoulli when modeling individual trials or when you need the raw 0/1 outcomes. Use Binomial when you only care about the count of successes across n trials—it’s more memory efficient than storing n individual Bernoulli outcomes.
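To make the memory trade-off concrete, a rough sketch (the sizes here are illustrative) comparing raw Bernoulli outcomes with a single binomial count for the same experiment:

```python
import numpy as np

rng = np.random.default_rng(7)
n_trials = 1_000_000
p = 0.3

# Bernoulli view: store every raw 0/1 outcome (one byte each as uint8)
raw_outcomes = rng.binomial(n=1, p=p, size=n_trials).astype(np.uint8)

# Binomial view: store only the total number of successes
success_count = rng.binomial(n=n_trials, p=p)

print(f"Raw outcomes: {raw_outcomes.nbytes:,} bytes")
print(f"Count only: one integer ({success_count:,} successes)")
```
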
Avoid the legacy API. Don’t use np.random.seed() and np.random.binomial() in new code. The Generator API (np.random.default_rng()) offers better statistical properties and thread safety.
The Bernoulli distribution’s simplicity is deceptive. Master it thoroughly—understand its properties, know when SciPy’s convenience beats NumPy’s speed, and always handle edge cases—and you’ll have a solid foundation for more complex probabilistic modeling.