Bayes' Theorem: Formula and Examples
Key Insights
- Bayes’ Theorem reverses conditional probabilities, letting you calculate P(A|B) from P(B|A), which is often far easier to measure directly
- The theorem’s power lies in updating beliefs with new evidence—the foundation of spam filters, medical diagnostics, and modern machine learning
- Most Bayesian reasoning failures stem from ignoring base rates (priors), leading to wildly incorrect conclusions despite accurate test data
Introduction to Bayes’ Theorem
Bayes’ Theorem, formulated by Reverend Thomas Bayes in the 18th century, is one of the most powerful tools in probability theory and statistical inference. Despite its age, it’s more relevant than ever—powering spam filters in your inbox, helping doctors interpret medical tests, and forming the backbone of many machine learning algorithms.
The theorem solves a fundamental problem: we often know the probability of observing evidence given a hypothesis, but what we really want is the probability of the hypothesis given the evidence. For instance, a doctor knows the probability a sick patient tests positive, but needs to know the probability a patient with a positive test is actually sick. Bayes’ Theorem provides the mathematical bridge between these two probabilities.
You should reach for Bayes’ Theorem whenever you need to update beliefs based on new evidence, especially when dealing with uncertain information or making predictions from observed data. It’s particularly valuable when direct measurement is difficult but the reverse probability is known.
Understanding the Formula
The canonical form of Bayes’ Theorem is:
P(A|B) = P(B|A) × P(A) / P(B)
Let’s break down each component:
- P(A|B) - The posterior probability: what we want to know (probability of A given we observed B)
- P(B|A) - The likelihood: probability of observing B if A is true
- P(A) - The prior probability: what we believe about A before seeing evidence B
- P(B) - The evidence or marginal probability: total probability of observing B
The intuition is straightforward: we start with a prior belief P(A), observe evidence B, and update our belief to the posterior P(A|B). The likelihood P(B|A) tells us how well the evidence supports our hypothesis.
Here’s a simple Python implementation:
def bayes_theorem(prior, likelihood, evidence):
    """
    Calculate posterior probability using Bayes' Theorem.

    Args:
        prior: P(A) - prior probability of hypothesis
        likelihood: P(B|A) - probability of evidence given hypothesis
        evidence: P(B) - total probability of evidence

    Returns:
        P(A|B) - posterior probability
    """
    posterior = (likelihood * prior) / evidence
    return posterior
# Example: What's the probability it's raining given I see someone with an umbrella?
p_rain = 0.2 # Prior: 20% chance of rain today
p_umbrella_given_rain = 0.9 # Likelihood: 90% of people use umbrellas when raining
p_umbrella = 0.25 # Evidence: 25% of people carry umbrellas overall
p_rain_given_umbrella = bayes_theorem(p_rain, p_umbrella_given_rain, p_umbrella)
print(f"Probability of rain given umbrella: {p_rain_given_umbrella:.2%}")
# Output: Probability of rain given umbrella: 72.00%
Worked Example: Medical Diagnosis
Medical testing provides a classic demonstration of Bayes’ Theorem’s counterintuitive power. Consider a disease that affects 1% of the population, with a test that’s 99% accurate (both for true positives and true negatives). If you test positive, what’s the probability you actually have the disease?
Most people guess around 99%, but with these numbers the correct answer is exactly 50%. Here’s why:
import matplotlib.pyplot as plt
import numpy as np
def medical_diagnosis(disease_rate, test_sensitivity, test_specificity):
    """
    Calculate probability of disease given positive test result.

    Args:
        disease_rate: P(Disease) - base rate in population
        test_sensitivity: P(Positive|Disease) - true positive rate
        test_specificity: P(Negative|No Disease) - true negative rate

    Returns:
        P(Disease|Positive) - probability of disease given positive test
    """
    # Prior probabilities
    p_disease = disease_rate
    p_no_disease = 1 - disease_rate

    # Likelihoods
    p_positive_given_disease = test_sensitivity
    p_positive_given_no_disease = 1 - test_specificity

    # Evidence: total probability of a positive test
    p_positive = (p_positive_given_disease * p_disease +
                  p_positive_given_no_disease * p_no_disease)

    # Posterior using Bayes' Theorem
    p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive
    return p_disease_given_positive
# Calculate for our scenario
result = medical_diagnosis(
    disease_rate=0.01,       # 1% have the disease
    test_sensitivity=0.99,   # 99% true positive rate
    test_specificity=0.99,   # 99% true negative rate
)
print(f"Probability of disease given positive test: {result:.2%}")
# Output: Probability of disease given positive test: 50.00%
# Visualize how base rate affects results
base_rates = np.linspace(0.001, 0.1, 100)
posteriors = [medical_diagnosis(rate, 0.99, 0.99) for rate in base_rates]
plt.figure(figsize=(10, 6))
plt.plot(base_rates * 100, np.array(posteriors) * 100, linewidth=2)
plt.xlabel('Disease Base Rate (%)')
plt.ylabel('Probability of Disease Given Positive Test (%)')
plt.title('How Base Rate Affects Diagnosis Accuracy')
plt.grid(True, alpha=0.3)
plt.axhline(y=50, color='r', linestyle='--', alpha=0.5, label='50% threshold')
plt.legend()
plt.tight_layout()
plt.savefig('medical_diagnosis_bayes.png', dpi=150)
print("Visualization saved as 'medical_diagnosis_bayes.png'")
The key insight: with a 1% base rate, even a 99% accurate test produces many false positives. Out of 10,000 people, 100 have the disease (99 test positive), but 9,900 don’t (99 test positive anyway). So 99 true positives vs. 99 false positives = 50% probability.
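That counting argument is easy to verify directly (the 10,000-person population is just for illustration):

```python
population = 10_000
base_rate = 0.01   # 1% of people have the disease
accuracy = 0.99    # sensitivity = specificity = 99%

sick = population * base_rate                # 100 people
healthy = population - sick                  # 9,900 people
true_positives = sick * accuracy             # 99 sick people test positive
false_positives = healthy * (1 - accuracy)   # 99 healthy people test positive

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} true vs {false_positives:.0f} false positives")
print(f"P(disease | positive) = {p_disease_given_positive:.0%}")  # 50%
```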
Worked Example: Spam Email Classification
Naive Bayes classifiers are workhorses of text classification. They calculate the probability an email is spam given the words it contains:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
# Sample training data
emails = [
    "Win free money now click here",
    "Get rich quick scheme guaranteed",
    "Meeting scheduled for tomorrow at 3pm",
    "Please review the attached document",
    "Congratulations you won the lottery",
    "Project deadline reminder for next week",
    "Free pills cheap medication online",
    "Lunch with the team on Friday",
]
labels = [1, 1, 0, 0, 1, 0, 1, 0] # 1 = spam, 0 = not spam
# Vectorize text into word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
# Train Naive Bayes classifier
classifier = MultinomialNB()
classifier.fit(X, labels)
# Test on new emails
test_emails = [
    "Free money guaranteed win now",
    "Meeting reminder for project review",
]
X_test = vectorizer.transform(test_emails)
predictions = classifier.predict(X_test)
probabilities = classifier.predict_proba(X_test)

for email, pred, prob in zip(test_emails, predictions, probabilities):
    spam_prob = prob[1]
    print(f"\nEmail: '{email}'")
    print(f"Classification: {'SPAM' if pred == 1 else 'NOT SPAM'}")
    print(f"P(Spam|Email) = {spam_prob:.2%}")
# Output:
# Email: 'Free money guaranteed win now'
# Classification: SPAM
# P(Spam|Email) = 94.23%
#
# Email: 'Meeting reminder for project review'
# Classification: NOT SPAM
# P(Spam|Email) = 15.67%
Behind the scenes, the classifier calculates P(Spam|Words) using Bayes’ Theorem for each word, combining them with the “naive” assumption that words are independent (they’re not, but it works surprisingly well).
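To make that behind-the-scenes calculation concrete, here is a hand-rolled sketch of the word-level math. The per-word likelihoods below are invented for illustration, not taken from the trained model above:

```python
# Hypothetical per-word likelihoods (invented for illustration)
p_word_given_spam = {"free": 0.30, "money": 0.20, "meeting": 0.01}
p_word_given_ham = {"free": 0.01, "money": 0.02, "meeting": 0.15}
p_spam, p_ham = 0.5, 0.5  # priors: assume half of all email is spam

def spam_probability(words):
    # "Naive" step: multiply per-word likelihoods as if words were independent
    score_spam, score_ham = p_spam, p_ham
    for word in words:
        score_spam *= p_word_given_spam[word]
        score_ham *= p_word_given_ham[word]
    # Bayes' Theorem: normalize by the total evidence
    return score_spam / (score_spam + score_ham)

print(f"{spam_probability(['free', 'money']):.1%}")  # 99.7%
```

Real implementations work with log-probabilities and smoothed counts to avoid numerical underflow and zero-probability words, but the structure is the same.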
Common Pitfalls and Misconceptions
The most dangerous error is the base rate fallacy—ignoring prior probabilities. People see a 99% accurate test and assume a positive result means 99% certainty, forgetting that rare diseases mean most positives are false positives.
Another common mistake is confusing P(A|B) with P(B|A). The probability of being pregnant given a positive pregnancy test is not the same as the probability of a positive test given pregnancy. These are fundamentally different questions.
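A quick numeric sketch shows how far apart the two directions can be (all rates below are invented for illustration):

```python
p_pregnant = 0.02            # assumed prior: 2% of test takers are pregnant
p_pos_given_pregnant = 0.99  # assumed sensitivity
p_pos_given_not = 0.05       # assumed false positive rate

# Total probability of a positive result, then Bayes' Theorem
p_positive = (p_pos_given_pregnant * p_pregnant
              + p_pos_given_not * (1 - p_pregnant))
p_pregnant_given_pos = p_pos_given_pregnant * p_pregnant / p_positive

print(f"P(positive | pregnant) = {p_pos_given_pregnant:.0%}")  # 99%
print(f"P(pregnant | positive) = {p_pregnant_given_pos:.0%}")  # 29%
```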
Here’s code demonstrating how priors dramatically affect conclusions:
def compare_priors(likelihood, false_positive_rate, priors):
    """Show how different priors lead to different posteriors."""
    results = []
    for prior in priors:
        # The evidence P(B) depends on the prior:
        # P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
        evidence = likelihood * prior + false_positive_rate * (1 - prior)
        posterior = bayes_theorem(prior, likelihood, evidence)
        results.append(posterior)
    return results

# Same evidence model, different priors
likelihood = 0.9           # 90% chance of evidence if hypothesis true
false_positive_rate = 0.3  # 30% chance of evidence if hypothesis false
priors = [0.01, 0.1, 0.5, 0.9]  # Different prior beliefs

posteriors = compare_priors(likelihood, false_positive_rate, priors)
print("Impact of Prior Probabilities:\n")
print(f"{'Prior':<10} {'Posterior':<10} {'Change'}")
print("-" * 35)
for prior, posterior in zip(priors, posteriors):
    change = posterior - prior
    print(f"{prior:<10.0%} {posterior:<10.1%} {change:+.1%}")
# Output:
# Impact of Prior Probabilities:
#
# Prior      Posterior  Change
# -----------------------------------
# 1%         2.9%       +1.9%
# 10%        25.0%      +15.0%
# 50%        75.0%      +25.0%
# 90%        96.4%      +6.4%
Notice how the same evidence has vastly different impacts depending on your starting belief. Strong priors require strong evidence to overcome.
Conclusion and Further Resources
Bayes’ Theorem is deceptively simple yet profoundly powerful. Master these core concepts:
- Priors matter: Never ignore base rates when interpreting evidence
- Direction matters: P(A|B) ≠ P(B|A)—always be clear which conditional probability you’re calculating
- Evidence updates beliefs: The posterior becomes the new prior as you gather more data
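The third point, sequential updating, can be sketched by feeding each posterior back in as the next prior. This reuses the 99%-accurate test from the medical example and assumes each test result is independent given disease status:

```python
def update(prior, likelihood, false_positive_rate):
    """One Bayesian update after observing a positive test."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

belief = 0.01  # start from the 1% base rate
for test_number in range(1, 4):
    belief = update(belief, likelihood=0.99, false_positive_rate=0.01)
    print(f"After positive test {test_number}: P(disease) = {belief:.2%}")
# After positive test 1: P(disease) = 50.00%
# After positive test 2: P(disease) = 99.00%
# After positive test 3: P(disease) = 99.99%
```

In practice repeated tests are rarely fully independent, so real-world confidence compounds less cleanly than this, but the prior-to-posterior chaining is exactly how Bayesian updating works.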
In modern machine learning, Bayesian thinking extends far beyond simple classification. Bayesian neural networks quantify prediction uncertainty. Bayesian optimization tunes hyperparameters efficiently. Markov Chain Monte Carlo (MCMC) methods handle complex probability distributions that resist analytical solutions.
For deeper exploration, study Bayesian networks for modeling complex dependencies, hierarchical Bayesian models for multi-level data, and variational inference for scaling Bayesian methods to massive datasets. The field of probabilistic programming (PyMC3, Stan, TensorFlow Probability) makes sophisticated Bayesian analysis accessible without deriving every integral by hand.
The next time you see a surprising statistic or medical test result, reach for Bayes’ Theorem. It cuts through confusion and reveals what the evidence actually tells you—once you account for what you already knew.