Beta Distribution in Python: Complete Guide
Key Insights
- The beta distribution models probabilities themselves, making it the natural choice for representing uncertainty about rates, proportions, and conversion metrics in A/B testing and Bayesian inference.
- With just two shape parameters (α and β), you can represent everything from uniform uncertainty to highly confident beliefs—α represents “successes seen” and β represents “failures seen” in the intuitive Bayesian interpretation.
- Python’s scipy.stats.beta provides a complete toolkit for working with beta distributions, from generating samples to fitting parameters to real data.
Introduction to the Beta Distribution
The beta distribution answers a question that comes up constantly in data science: “I know something is a probability between 0 and 1, but how certain am I about its exact value?”
Unlike the normal distribution, which spans negative infinity to positive infinity, the beta distribution lives exclusively on the interval [0, 1]. This makes it perfect for modeling proportions, rates, and probabilities. Think conversion rates, click-through rates, success probabilities, or any metric bounded between zero and one.
The distribution has two shape parameters: alpha (α) and beta (β). Here’s the intuition that will save you hours of confusion: think of α as “successes plus one” and β as “failures plus one.” If you’ve seen 10 conversions and 90 non-conversions, a Beta(11, 91) distribution represents your updated belief about the true conversion rate.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
x = np.linspace(0, 1, 200)

params = [
    (1, 1, "α=1, β=1 (Uniform)"),
    (5, 5, "α=5, β=5 (Symmetric)"),
    (2, 8, "α=2, β=8 (Right-skewed)"),
    (0.5, 0.5, "α=0.5, β=0.5 (U-shaped)")
]

for ax, (a, b, title) in zip(axes.flat, params):
    y = stats.beta.pdf(x, a, b)
    ax.plot(x, y, 'b-', linewidth=2)
    ax.fill_between(x, y, alpha=0.3)
    ax.set_title(title)
    ax.set_xlabel('x')
    ax.set_ylabel('Density')

plt.tight_layout()
plt.savefig('beta_shapes.png', dpi=150)
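The "successes plus one, failures plus one" intuition from above can be checked numerically. As a quick sketch, the Beta(11, 91) belief that follows from 10 conversions and 90 non-conversions (under a uniform Beta(1, 1) starting point) has its mean right near the observed 10% rate:

```python
from scipy import stats

# Belief after 10 conversions and 90 non-conversions,
# starting from a uniform Beta(1, 1) prior
belief = stats.beta(11, 91)

# Mean of Beta(a, b) is a / (a + b) = 11 / 102, about 0.108
print(f"Mean: {belief.mean():.4f}")
print(f"Std dev: {belief.std():.4f}")  # remaining uncertainty after 100 observations
```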
Working with Beta Distribution in SciPy
SciPy’s scipy.stats.beta gives you everything you need. The API follows SciPy’s consistent pattern for all distributions, so once you learn it for beta, you’ve learned it for dozens of others.
from scipy import stats
import numpy as np
# Create a beta distribution object
alpha, beta_param = 10, 30
dist = stats.beta(alpha, beta_param)
# Core methods
x = 0.25
print(f"PDF at x={x}: {dist.pdf(x):.4f}") # Probability density
print(f"CDF at x={x}: {dist.cdf(x):.4f}") # P(X <= x)
print(f"PPF at p=0.95: {dist.ppf(0.95):.4f}") # 95th percentile
# Generate random samples
samples = dist.rvs(size=10000, random_state=42)
# Summary statistics
print(f"\nTheoretical mean: {dist.mean():.4f}")
print(f"Sample mean: {samples.mean():.4f}")
print(f"Theoretical variance: {dist.var():.4f}")
print(f"Sample variance: {samples.var():.4f}")
# Credible interval (Bayesian equivalent of confidence interval)
lower, upper = dist.ppf(0.025), dist.ppf(0.975)
print(f"95% credible interval: [{lower:.4f}, {upper:.4f}]")
The ppf method (percent point function) is the inverse of cdf: given a probability p, it returns the value x such that P(X ≤ x) = p. It’s essential for computing credible intervals. If you want the range that contains 95% of the probability mass, grab the 2.5th and 97.5th percentiles.
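To make the inverse relationship concrete, here is a small round-trip check, reusing the Beta(10, 30) distribution from the snippet above:

```python
from scipy import stats

dist = stats.beta(10, 30)

# ppf undoes cdf: the value whose CDF equals p is ppf(p)
p = dist.cdf(0.25)
print(f"cdf(0.25) = {p:.4f}, ppf back = {dist.ppf(p):.4f}")  # recovers 0.25

# An equal-tailed 90% interval comes from the 5th and 95th percentiles
low, high = dist.ppf(0.05), dist.ppf(0.95)
print(f"90% interval: [{low:.4f}, {high:.4f}]")
```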
Parameter Estimation and Fitting
Real-world data rarely comes with parameters attached. You need to estimate α and β from observations. SciPy offers two approaches.
Method of moments uses the sample mean and variance to solve for parameters. It’s fast but can be less accurate:
def beta_mom_estimate(data):
    """Estimate beta parameters using method of moments."""
    mean = np.mean(data)
    var = np.var(data)
    # Derived from the beta distribution's mean and variance formulas
    common = mean * (1 - mean) / var - 1
    alpha = mean * common
    beta = (1 - mean) * common
    return alpha, beta
# Simulate some proportion data
true_alpha, true_beta = 5, 15
data = stats.beta.rvs(true_alpha, true_beta, size=500, random_state=42)
mom_alpha, mom_beta = beta_mom_estimate(data)
print(f"True: α={true_alpha}, β={true_beta}")
print(f"MoM estimate: α={mom_alpha:.2f}, β={mom_beta:.2f}")
Maximum likelihood estimation via beta.fit() is more robust:
# MLE estimation (SciPy's fit method)
# floc=0, fscale=1 fixes the support to [0,1]
mle_alpha, mle_beta, _, _ = stats.beta.fit(data, floc=0, fscale=1)
print(f"MLE estimate: α={mle_alpha:.2f}, β={mle_beta:.2f}")
# Compare fitted distribution to data
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(data, bins=30, density=True, alpha=0.7, label='Data')
x = np.linspace(0, 1, 200)
ax.plot(x, stats.beta.pdf(x, mle_alpha, mle_beta), 'r-',
        linewidth=2, label=f'Fitted (α={mle_alpha:.1f}, β={mle_beta:.1f})')
ax.plot(x, stats.beta.pdf(x, true_alpha, true_beta), 'g--',
        linewidth=2, label=f'True (α={true_alpha}, β={true_beta})')
ax.legend()
ax.set_xlabel('Value')
ax.set_ylabel('Density')
plt.savefig('beta_fit.png', dpi=150)
Bayesian Inference with Beta-Binomial Model
Here’s where the beta distribution shines. It’s the conjugate prior for binomial data, meaning when you combine a beta prior with binomial observations, you get a beta posterior. The math stays clean.
The update rule is simple: if your prior is Beta(α, β) and you observe k successes in n trials, your posterior is Beta(α + k, β + n - k).
def bayesian_update(prior_alpha, prior_beta, successes, trials):
    """Update a beta prior with binomial observations."""
    posterior_alpha = prior_alpha + successes
    posterior_beta = prior_beta + (trials - successes)
    return posterior_alpha, posterior_beta
# A/B testing scenario
# Prior belief: conversion rate around 10%, but uncertain
prior_a, prior_b = 2, 18 # Mean = 2/20 = 0.10
# Observed data: 45 conversions out of 500 visitors
conversions, visitors = 45, 500
# Update beliefs
post_a, post_b = bayesian_update(prior_a, prior_b, conversions, visitors)
prior_dist = stats.beta(prior_a, prior_b)
posterior_dist = stats.beta(post_a, post_b)
print(f"Prior mean: {prior_dist.mean():.3f}")
print(f"Posterior mean: {posterior_dist.mean():.3f}")
print(f"Observed rate: {conversions/visitors:.3f}")
print(f"95% credible interval: [{posterior_dist.ppf(0.025):.3f}, {posterior_dist.ppf(0.975):.3f}]")
Practical Applications
Let’s build a complete Bayesian A/B test calculator. This is something you’ll actually use:
class BayesianABTest:
    """Simple Bayesian A/B test using the beta-binomial model."""

    def __init__(self, prior_alpha=1, prior_beta=1):
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta

    def get_posterior(self, conversions, visitors):
        """Return the posterior beta distribution."""
        post_a = self.prior_alpha + conversions
        post_b = self.prior_beta + (visitors - conversions)
        return stats.beta(post_a, post_b)

    def prob_b_beats_a(self, a_conversions, a_visitors,
                       b_conversions, b_visitors, n_samples=100000):
        """Monte Carlo estimate of P(B > A)."""
        post_a = self.get_posterior(a_conversions, a_visitors)
        post_b = self.get_posterior(b_conversions, b_visitors)
        samples_a = post_a.rvs(n_samples)
        samples_b = post_b.rvs(n_samples)
        return (samples_b > samples_a).mean()

    def expected_loss(self, a_conversions, a_visitors,
                      b_conversions, b_visitors, n_samples=100000):
        """Expected loss if we choose B but A is actually better."""
        post_a = self.get_posterior(a_conversions, a_visitors)
        post_b = self.get_posterior(b_conversions, b_visitors)
        samples_a = post_a.rvs(n_samples)
        samples_b = post_b.rvs(n_samples)
        loss = np.maximum(samples_a - samples_b, 0).mean()
        return loss
# Run the test
test = BayesianABTest(prior_alpha=1, prior_beta=1)
# Variant A: 120 conversions, 1000 visitors
# Variant B: 145 conversions, 1000 visitors
prob_b_wins = test.prob_b_beats_a(120, 1000, 145, 1000)
loss = test.expected_loss(120, 1000, 145, 1000)
print(f"P(B beats A): {prob_b_wins:.1%}")
print(f"Expected loss if choosing B: {loss:.4f}")
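The Monte Carlo estimate can be cross-checked deterministically. For two independent posteriors, P(B > A) equals the integral over x of B's density times A's CDF. A sketch of that check, using the same uniform Beta(1, 1) prior and the same counts as above:

```python
from scipy import stats
from scipy.integrate import quad

# Posteriors under a uniform Beta(1, 1) prior
post_a = stats.beta(1 + 120, 1 + 1000 - 120)   # A: 120/1000
post_b = stats.beta(1 + 145, 1 + 1000 - 145)   # B: 145/1000

# P(B > A) = integral of pdf_B(x) * P(A < x) over [0, 1]
prob_exact, _ = quad(lambda x: post_b.pdf(x) * post_a.cdf(x), 0, 1)
print(f"P(B beats A), numerical integration: {prob_exact:.1%}")
```

The value should agree with the Monte Carlo figure to within sampling noise, which shrinks as n_samples grows.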
Visualization and Interpretation
Visualizing prior-to-posterior updates helps build intuition:
def plot_ab_test_posteriors(a_conv, a_vis, b_conv, b_vis, prior_a=1, prior_b=1):
    """Visualize A/B test posterior distributions."""
    fig, ax = plt.subplots(figsize=(10, 6))
    x = np.linspace(0, 0.25, 500)
    # Posteriors under the shared prior
    post_a = stats.beta(prior_a + a_conv, prior_b + a_vis - a_conv)
    post_b = stats.beta(prior_a + b_conv, prior_b + b_vis - b_conv)
    ax.fill_between(x, post_a.pdf(x), alpha=0.4, label=f'A: {a_conv}/{a_vis}')
    ax.fill_between(x, post_b.pdf(x), alpha=0.4, label=f'B: {b_conv}/{b_vis}')
    ax.axvline(post_a.mean(), color='C0', linestyle='--', alpha=0.7)
    ax.axvline(post_b.mean(), color='C1', linestyle='--', alpha=0.7)
    ax.set_xlabel('Conversion Rate')
    ax.set_ylabel('Density')
    ax.set_title('Posterior Distributions for A/B Test')
    ax.legend()
    return fig

fig = plot_ab_test_posteriors(120, 1000, 145, 1000)
plt.savefig('ab_test_posteriors.png', dpi=150)
fig = plot_ab_test_posteriors(120, 1000, 145, 1000)
plt.savefig('ab_test_posteriors.png', dpi=150)
Tips, Edge Cases, and Best Practices
Handling α, β < 1: When both parameters are less than 1, you get a U-shaped distribution. This is valid but represents a belief that the true value is likely near 0 or 1, not in the middle. Use this when you expect extreme probabilities.
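A quick way to see the U shape is to compare the density near the edges with the density in the middle. A small sketch using Beta(0.5, 0.5), which is also the Jeffreys prior for a binomial proportion:

```python
from scipy import stats

# Beta(0.5, 0.5) piles probability mass at the extremes
u = stats.beta(0.5, 0.5)
for x in (0.05, 0.50, 0.95):
    print(f"pdf({x:.2f}) = {u.pdf(x):.3f}")
# The density at 0.05 and 0.95 is noticeably higher than at 0.5
```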
Numerical stability: For very large α and β, the beta distribution approaches a normal distribution. If you’re dealing with millions of observations, consider using normal approximation for speed.
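As a sketch of how good the approximation gets, you can match a normal distribution to the beta's own mean and standard deviation and compare quantiles (the sample sizes here are illustrative):

```python
from scipy import stats

# For large alpha and beta, Beta(a, b) is close to a normal with the
# same mean a/(a+b) and variance a*b / ((a+b)**2 * (a+b+1))
a, b = 5000, 45000  # e.g. ~10% rate over 50,000 observations
beta_dist = stats.beta(a, b)
normal_approx = stats.norm(loc=beta_dist.mean(), scale=beta_dist.std())

for q in (0.025, 0.5, 0.975):
    print(f"q={q}: beta={beta_dist.ppf(q):.5f}, "
          f"normal={normal_approx.ppf(q):.5f}")
```

At this scale the quantiles agree to roughly four decimal places, and the normal ppf is cheaper to evaluate in bulk.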
When not to use beta: If you’re modeling multiple categories (not just success/failure), use the Dirichlet distribution—it’s the multivariate generalization. For continuous data that happens to be bounded, consider a truncated normal instead.
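The Dirichlet update works exactly like the beta-binomial rule, just with one pseudo-count per category. A minimal sketch with three hypothetical categories (the counts here are made up for illustration):

```python
import numpy as np
from scipy import stats

# Prior pseudo-counts per category (uniform over the 3-simplex),
# updated by observed category counts
prior = np.array([1.0, 1.0, 1.0])
counts = np.array([45, 30, 25])
posterior = stats.dirichlet(prior + counts)

print("Posterior mean proportions:", posterior.mean())
samples = posterior.rvs(size=5, random_state=0)  # each row sums to 1
```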
def beta_summary(alpha, beta_param):
    """Utility function for a quick beta distribution summary."""
    dist = stats.beta(alpha, beta_param)
    return {
        'mean': dist.mean(),
        'median': dist.median(),
        # The closed-form mode only exists when both parameters exceed 1
        'mode': ((alpha - 1) / (alpha + beta_param - 2)
                 if alpha > 1 and beta_param > 1 else None),
        'variance': dist.var(),
        'ci_95': (dist.ppf(0.025), dist.ppf(0.975))
    }

def combine_beta_posteriors(posteriors, n_samples=100000):
    """Sample from multiple beta posteriors, e.g. for hierarchical models."""
    samples = [stats.beta(a, b).rvs(n_samples) for a, b in posteriors]
    return np.column_stack(samples)
# Example usage
summary = beta_summary(45 + 1, 500 - 45 + 1)
print(f"Conversion rate estimate: {summary['mean']:.3f}")
print(f"95% CI: [{summary['ci_95'][0]:.3f}, {summary['ci_95'][1]:.3f}]")
The beta distribution is one of those tools that, once you understand it, you’ll reach for constantly. It bridges the gap between frequentist point estimates and full Bayesian uncertainty quantification without requiring complex MCMC machinery. Start with simple A/B tests, and you’ll find yourself applying it everywhere proportions appear in your data.