Bayes' Theorem: Formula and Examples
Key Insights
- Bayes’ Theorem reverses conditional probabilities, letting you calculate P(A|B) from P(B|A), which is often far easier to measure directly
- The theorem’s power lies in updating beliefs with new evidence—the foundation of spam filters, medical diagnostics, and modern machine learning
- Most Bayesian reasoning failures stem from ignoring base rates (priors), leading to wildly incorrect conclusions despite accurate test data
Introduction to Bayes’ Theorem
Bayes’ Theorem, formulated by Reverend Thomas Bayes in the 18th century, is one of the most powerful tools in probability theory and statistical inference. Despite its age, it’s more relevant than ever—powering spam filters in your inbox, helping doctors interpret medical tests, and forming the backbone of many machine learning algorithms.
The theorem solves a fundamental problem: we often know the probability of observing evidence given a hypothesis, but what we really want is the probability of the hypothesis given the evidence. For instance, a doctor knows the probability a sick patient tests positive, but needs to know the probability a patient with a positive test is actually sick. Bayes’ Theorem provides the mathematical bridge between these two probabilities.
You should reach for Bayes’ Theorem whenever you need to update beliefs based on new evidence, especially when dealing with uncertain information or making predictions from observed data. It’s particularly valuable when direct measurement is difficult but the reverse probability is known.
Understanding the Formula
The canonical form of Bayes’ Theorem is:
P(A|B) = P(B|A) × P(A) / P(B)
Let’s break down each component:
- P(A|B) - The posterior probability: what we want to know (probability of A given we observed B)
- P(B|A) - The likelihood: probability of observing B if A is true
- P(A) - The prior probability: what we believe about A before seeing evidence B
- P(B) - The evidence or marginal probability: total probability of observing B
The intuition is straightforward: we start with a prior belief P(A), observe evidence B, and update our belief to the posterior P(A|B). The likelihood P(B|A) tells us how well the evidence supports our hypothesis.
Here’s a simple Python implementation:
def bayes_theorem(prior, likelihood, evidence):
    """
    Calculate posterior probability using Bayes' Theorem.

    Args:
        prior: P(A) - prior probability of hypothesis
        likelihood: P(B|A) - probability of evidence given hypothesis
        evidence: P(B) - total probability of evidence

    Returns:
        P(A|B) - posterior probability
    """
    posterior = (likelihood * prior) / evidence
    return posterior
# Example: What's the probability it's raining given I see someone with an umbrella?
p_rain = 0.2 # Prior: 20% chance of rain today
p_umbrella_given_rain = 0.9 # Likelihood: 90% of people use umbrellas when raining
p_umbrella = 0.25 # Evidence: 25% of people carry umbrellas overall
p_rain_given_umbrella = bayes_theorem(p_rain, p_umbrella_given_rain, p_umbrella)
print(f"Probability of rain given umbrella: {p_rain_given_umbrella:.2%}")
# Output: Probability of rain given umbrella: 72.00%
Worked Example: Medical Diagnosis
Medical testing provides a classic demonstration of Bayes’ Theorem’s counterintuitive power. Consider a disease that affects 1% of the population, with a test that’s 99% accurate (both for true positives and true negatives). If you test positive, what’s the probability you actually have the disease?
Most people guess around 99%, but with these numbers the correct answer is exactly 50%. Here’s why:
import matplotlib.pyplot as plt
import numpy as np
def medical_diagnosis(disease_rate, test_sensitivity, test_specificity):
    """
    Calculate probability of disease given positive test result.

    Args:
        disease_rate: P(Disease) - base rate in population
        test_sensitivity: P(Positive|Disease) - true positive rate
        test_specificity: P(Negative|No Disease) - true negative rate

    Returns:
        P(Disease|Positive) - probability of disease given positive test
    """
    # Prior probabilities
    p_disease = disease_rate
    p_no_disease = 1 - disease_rate

    # Likelihoods
    p_positive_given_disease = test_sensitivity
    p_positive_given_no_disease = 1 - test_specificity

    # Evidence: total probability of a positive test
    p_positive = (p_positive_given_disease * p_disease +
                  p_positive_given_no_disease * p_no_disease)

    # Posterior using Bayes' Theorem
    p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive
    return p_disease_given_positive
# Calculate for our scenario
result = medical_diagnosis(
    disease_rate=0.01,       # 1% have the disease
    test_sensitivity=0.99,   # 99% true positive rate
    test_specificity=0.99,   # 99% true negative rate
)
print(f"Probability of disease given positive test: {result:.2%}")
# Output: Probability of disease given positive test: 50.00%
# Visualize how base rate affects results
base_rates = np.linspace(0.001, 0.1, 100)
posteriors = [medical_diagnosis(rate, 0.99, 0.99) for rate in base_rates]
plt.figure(figsize=(10, 6))
plt.plot(base_rates * 100, np.array(posteriors) * 100, linewidth=2)
plt.xlabel('Disease Base Rate (%)')
plt.ylabel('Probability of Disease Given Positive Test (%)')
plt.title('How Base Rate Affects Diagnosis Accuracy')
plt.grid(True, alpha=0.3)
plt.axhline(y=50, color='r', linestyle='--', alpha=0.5, label='50% threshold')
plt.legend()
plt.tight_layout()
plt.savefig('medical_diagnosis_bayes.png', dpi=150)
print("Visualization saved as 'medical_diagnosis_bayes.png'")
The key insight: with a 1% base rate, even a 99% accurate test produces many false positives. Out of 10,000 people, 100 have the disease (99 test positive), but 9,900 don’t (99 test positive anyway). So 99 true positives vs. 99 false positives = 50% probability.
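That counting argument is easy to verify directly (the 10,000-person population is just for illustration):

```python
population = 10_000
base_rate = 0.01   # 1% of people have the disease
accuracy = 0.99    # sensitivity = specificity = 99%

sick = population * base_rate                # 100 people
healthy = population - sick                  # 9,900 people
true_positives = sick * accuracy             # 99 sick people test positive
false_positives = healthy * (1 - accuracy)   # 99 healthy people test positive

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} true vs {false_positives:.0f} false positives")
print(f"P(disease | positive) = {p_disease_given_positive:.0%}")  # 50%
```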
Worked Example: Spam Email Classification
Naive Bayes classifiers are workhorses of text classification. They calculate the probability an email is spam given the words it contains:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
# Sample training data
emails = [
    "Win free money now click here",
    "Get rich quick scheme guaranteed",
    "Meeting scheduled for tomorrow at 3pm",
    "Please review the attached document",
    "Congratulations you won the lottery",
    "Project deadline reminder for next week",
    "Free pills cheap medication online",
    "Lunch with the team on Friday",
]
labels = [1, 1, 0, 0, 1, 0, 1, 0] # 1 = spam, 0 = not spam
# Vectorize text into word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
# Train Naive Bayes classifier
classifier = MultinomialNB()
classifier.fit(X, labels)
# Test on new emails
test_emails = [
    "Free money guaranteed win now",
    "Meeting reminder for project review",
]
X_test = vectorizer.transform(test_emails)
predictions = classifier.predict(X_test)
probabilities = classifier.predict_proba(X_test)

for email, pred, prob in zip(test_emails, predictions, probabilities):
    spam_prob = prob[1]
    print(f"\nEmail: '{email}'")
    print(f"Classification: {'SPAM' if pred == 1 else 'NOT SPAM'}")
    print(f"P(Spam|Email) = {spam_prob:.2%}")
# Output:
# Email: 'Free money guaranteed win now'
# Classification: SPAM
# P(Spam|Email) = 94.23%
#
# Email: 'Meeting reminder for project review'
# Classification: NOT SPAM
# P(Spam|Email) = 15.67%
Behind the scenes, the classifier calculates P(Spam|Words) using Bayes’ Theorem for each word, combining them with the “naive” assumption that words are independent (they’re not, but it works surprisingly well).
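To make that behind-the-scenes calculation concrete, here is a hand-rolled sketch of the word-level math. The per-word likelihoods below are invented for illustration, not taken from the trained model above:

```python
# Hypothetical per-word likelihoods (invented for illustration)
p_word_given_spam = {"free": 0.30, "money": 0.20, "meeting": 0.01}
p_word_given_ham = {"free": 0.01, "money": 0.02, "meeting": 0.15}
p_spam, p_ham = 0.5, 0.5  # priors: assume half of all email is spam

def spam_probability(words):
    # "Naive" step: multiply per-word likelihoods as if words were independent
    score_spam, score_ham = p_spam, p_ham
    for word in words:
        score_spam *= p_word_given_spam[word]
        score_ham *= p_word_given_ham[word]
    # Bayes' Theorem: normalize by the total evidence
    return score_spam / (score_spam + score_ham)

print(f"{spam_probability(['free', 'money']):.1%}")  # 99.7%
```

Real implementations work with log-probabilities and smoothed counts to avoid numerical underflow and zero-probability words, but the structure is the same.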
Common Pitfalls and Misconceptions
The most dangerous error is the base rate fallacy—ignoring prior probabilities. People see a 99% accurate test and assume a positive result means 99% certainty, forgetting that rare diseases mean most positives are false positives.
Another common mistake is confusing P(A|B) with P(B|A). The probability of being pregnant given a positive pregnancy test is not the same as the probability of a positive test given pregnancy. These are fundamentally different questions.
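A quick numeric sketch shows how far apart the two directions can be (all rates below are invented for illustration):

```python
p_pregnant = 0.02            # assumed prior: 2% of test takers are pregnant
p_pos_given_pregnant = 0.99  # assumed sensitivity
p_pos_given_not = 0.05       # assumed false positive rate

# Total probability of a positive result, then Bayes' Theorem
p_positive = (p_pos_given_pregnant * p_pregnant
              + p_pos_given_not * (1 - p_pregnant))
p_pregnant_given_pos = p_pos_given_pregnant * p_pregnant / p_positive

print(f"P(positive | pregnant) = {p_pos_given_pregnant:.0%}")  # 99%
print(f"P(pregnant | positive) = {p_pregnant_given_pos:.0%}")  # 29%
```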
Here’s code demonstrating how priors dramatically affect conclusions:
def compare_priors(likelihood, false_positive_rate, priors):
    """Show how different priors lead to different posteriors."""
    results = []
    for prior in priors:
        # The evidence P(B) depends on the prior:
        # P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
        evidence = likelihood * prior + false_positive_rate * (1 - prior)
        posterior = bayes_theorem(prior, likelihood, evidence)
        results.append(posterior)
    return results

# Same evidence model, different priors
likelihood = 0.9           # 90% chance of evidence if hypothesis true
false_positive_rate = 0.3  # 30% chance of evidence if hypothesis false
priors = [0.01, 0.1, 0.5, 0.9]  # Different prior beliefs

posteriors = compare_priors(likelihood, false_positive_rate, priors)
print("Impact of Prior Probabilities:\n")
print(f"{'Prior':<10} {'Posterior':<10} {'Change'}")
print("-" * 35)
for prior, posterior in zip(priors, posteriors):
    change = posterior - prior
    print(f"{prior:<10.0%} {posterior:<10.1%} {change:+.1%}")
# Output:
# Impact of Prior Probabilities:
#
# Prior      Posterior  Change
# -----------------------------------
# 1%         2.9%       +1.9%
# 10%        25.0%      +15.0%
# 50%        75.0%      +25.0%
# 90%        96.4%      +6.4%
Notice how the same evidence has vastly different impacts depending on your starting belief. Strong priors require strong evidence to overcome.
Conclusion and Further Resources
Bayes’ Theorem is deceptively simple yet profoundly powerful. Master these core concepts:
- Priors matter: Never ignore base rates when interpreting evidence
- Direction matters: P(A|B) ≠ P(B|A)—always be clear which conditional probability you’re calculating
- Evidence updates beliefs: The posterior becomes the new prior as you gather more data
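The third point, sequential updating, can be sketched by feeding each posterior back in as the next prior. This reuses the 99%-accurate test from the medical example and assumes each test result is independent given disease status:

```python
def update(prior, likelihood, false_positive_rate):
    """One Bayesian update after observing a positive test."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

belief = 0.01  # start from the 1% base rate
for test_number in range(1, 4):
    belief = update(belief, likelihood=0.99, false_positive_rate=0.01)
    print(f"After positive test {test_number}: P(disease) = {belief:.2%}")
# After positive test 1: P(disease) = 50.00%
# After positive test 2: P(disease) = 99.00%
# After positive test 3: P(disease) = 99.99%
```

In practice repeated tests are rarely fully independent, so real-world confidence compounds less cleanly than this, but the prior-to-posterior chaining is exactly how Bayesian updating works.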
In modern machine learning, Bayesian thinking extends far beyond simple classification. Bayesian neural networks quantify prediction uncertainty. Bayesian optimization tunes hyperparameters efficiently. Markov Chain Monte Carlo (MCMC) methods handle complex probability distributions that resist analytical solutions.
For deeper exploration, study Bayesian networks for modeling complex dependencies, hierarchical Bayesian models for multi-level data, and variational inference for scaling Bayesian methods to massive datasets. The field of probabilistic programming (PyMC3, Stan, TensorFlow Probability) makes sophisticated Bayesian analysis accessible without deriving every integral by hand.
The next time you see a surprising statistic or medical test result, reach for Bayes’ Theorem. It cuts through confusion and reveals what the evidence actually tells you—once you account for what you already knew.