Multinomial Distribution in Python: Complete Guide
Key Insights
- The multinomial distribution generalizes the binomial to k categories, making it essential for modeling scenarios like dice rolls, text classification, and A/B/n testing where outcomes fall into more than two buckets.
- NumPy's `random.multinomial()` handles sampling while SciPy's `stats.multinomial` provides probability calculations; use them together for complete analysis workflows.
- Always validate that your probability vector sums to 1.0, and use log-probabilities when working with many categories to avoid numerical underflow.
Introduction to the Multinomial Distribution
The multinomial distribution answers a fundamental question: if you run n independent trials where each trial can result in one of k possible outcomes, what’s the probability of observing a specific combination of results?
Think of it as the binomial distribution’s more versatile sibling. While the binomial handles coin flips (two outcomes), the multinomial handles dice rolls (six outcomes), survey responses (multiple choices), or any scenario with discrete categories.
Real-world applications include:
- Text classification: Modeling word frequencies in documents
- A/B/n testing: Analyzing conversion rates across multiple variants
- Genetics: Modeling allele frequencies in populations
- Quality control: Categorizing defects into multiple types
Understanding this distribution gives you a powerful tool for modeling categorical data across statistics, machine learning, and simulation.
Mathematical Foundation
The multinomial distribution has three key parameters:
- n: Total number of trials
- k: Number of possible categories
- p: Probability vector (p₁, p₂, …, pₖ) where Σpᵢ = 1
The probability mass function (PMF) gives the probability of observing exactly x₁ outcomes in category 1, x₂ in category 2, and so on:
P(X₁=x₁, X₂=x₂, ..., Xₖ=xₖ) = (n! / (x₁! × x₂! × ... × xₖ!)) × p₁^x₁ × p₂^x₂ × ... × pₖ^xₖ
The expected value for category i is E[Xᵢ] = n × pᵢ, and the variance is Var(Xᵢ) = n × pᵢ × (1 - pᵢ).
Let’s calculate a multinomial probability both manually and with SciPy:
import numpy as np
from scipy.stats import multinomial
from scipy.special import factorial
# Scenario: Roll a fair die 10 times
# What's P(exactly 2 ones, 2 twos, 1 three, 2 fours, 2 fives, 1 six)?
n = 10
probs = [1/6] * 6 # Fair die
observed = [2, 2, 1, 2, 2, 1]
# Manual calculation
def multinomial_pmf_manual(n, probs, observed):
    """Calculate multinomial PMF from scratch."""
    # Multinomial coefficient: n! / (x1! * x2! * ... * xk!)
    coefficient = factorial(n, exact=True)
    for x in observed:
        coefficient //= factorial(x, exact=True)
    # Probability term: p1^x1 * p2^x2 * ... * pk^xk
    prob_term = 1.0
    for p, x in zip(probs, observed):
        prob_term *= p ** x
    return coefficient * prob_term
manual_result = multinomial_pmf_manual(n, probs, observed)
print(f"Manual calculation: {manual_result:.6f}")
# SciPy calculation
scipy_result = multinomial.pmf(observed, n=n, p=probs)
print(f"SciPy calculation: {scipy_result:.6f}")
# Output:
# Manual calculation: 0.003751
# SciPy calculation: 0.003751
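The manual PMF above multiplies many small factors, which can underflow for large n. A log-space variant stays stable; this is a sketch using `scipy.special.gammaln` (the helper name `multinomial_logpmf_manual` is made up for illustration):

```python
import numpy as np
from scipy.special import gammaln

def multinomial_logpmf_manual(n, probs, observed):
    """Log multinomial PMF via log-gamma; stable for large n."""
    observed = np.asarray(observed)
    probs = np.asarray(probs)
    # gammaln(k + 1) == log(k!), so this is log n! - sum of log x_i!
    log_coeff = gammaln(n + 1) - gammaln(observed + 1).sum()
    # sum of x_i * log p_i (assumes every p_i > 0)
    return log_coeff + observed @ np.log(probs)

log_p = multinomial_logpmf_manual(10, [1/6] * 6, [2, 2, 1, 2, 2, 1])
print(np.exp(log_p))  # matches the direct calculation above
```

Exponentiating only at the very end keeps every intermediate quantity in a comfortable floating-point range.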
Implementing with NumPy and SciPy
NumPy handles random sampling while SciPy provides statistical functions. Here’s how to use both effectively:
import numpy as np
from scipy.stats import multinomial
# Set seed for reproducibility
rng = np.random.default_rng(42)
# Parameters: 100 trials, 4 categories with different probabilities
n_trials = 100
probs = [0.4, 0.3, 0.2, 0.1]
# Generate random samples
# Single sample: counts for each category
single_sample = rng.multinomial(n_trials, probs)
print(f"Single sample: {single_sample}")
# Output: Single sample: [44 27 18 11]
# Multiple samples: useful for Monte Carlo simulations
n_simulations = 10000
samples = rng.multinomial(n_trials, probs, size=n_simulations)
print(f"Samples shape: {samples.shape}")
# Output: Samples shape: (10000, 4)
# Verify expected values match theory
print(f"Empirical means: {samples.mean(axis=0)}")
print(f"Theoretical means: {np.array(probs) * n_trials}")
# Output:
# Empirical means: [39.98 30.02 19.97 10.03]
# Theoretical means: [40. 30. 20. 10.]
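The per-category variances can be checked the same way; this sketch recreates the samples array so it runs standalone:

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 100
probs = np.array([0.4, 0.3, 0.2, 0.1])
samples = rng.multinomial(n_trials, probs, size=10_000)

# Var(X_i) = n * p_i * (1 - p_i)
empirical_var = samples.var(axis=0)
theoretical_var = n_trials * probs * (1 - probs)
print(np.round(empirical_var, 2))
print(theoretical_var)  # 24, 21, 16, 9
```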
For probability calculations and log-likelihoods:
from scipy.stats import multinomial
# Create a frozen distribution object
dist = multinomial(n=100, p=[0.4, 0.3, 0.2, 0.1])
# Calculate PMF for specific outcome
outcome = [40, 30, 20, 10]
pmf = dist.pmf(outcome)
print(f"P(X = {outcome}): {pmf:.6f}")
# Log-probability (crucial for numerical stability)
log_pmf = dist.logpmf(outcome)
print(f"Log P(X = {outcome}): {log_pmf:.4f}")
# Calculate log-likelihood for observed data
# Useful for model comparison and MLE
observed_data = [38, 32, 18, 12]
log_likelihood = dist.logpmf(observed_data)
print(f"Log-likelihood of observed data: {log_likelihood:.4f}")
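The frozen object also supports sampling and reports the full covariance matrix, whose off-diagonal entries are the negative covariances -n·pᵢ·pⱼ. A short sketch:

```python
import numpy as np
from scipy.stats import multinomial

dist = multinomial(n=100, p=[0.4, 0.3, 0.2, 0.1])

# Sample directly from the frozen distribution
draws = dist.rvs(size=5, random_state=0)
print(draws.shape)  # (5, 4)

# Covariance matrix: diagonal n*p_i*(1-p_i), off-diagonal -n*p_i*p_j
print(dist.cov())
```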
Visualization Techniques
Visualizing multinomial distributions requires different approaches depending on the number of categories:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
rng = np.random.default_rng(42)
# Generate samples from two different distributions
n_trials = 50
n_samples = 1000
fair_probs = [0.25, 0.25, 0.25, 0.25]
biased_probs = [0.4, 0.3, 0.2, 0.1]
fair_samples = rng.multinomial(n_trials, fair_probs, size=n_samples)
biased_samples = rng.multinomial(n_trials, biased_probs, size=n_samples)
# Create visualization
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
# Bar chart: Compare mean outcomes
categories = ['A', 'B', 'C', 'D']
x = np.arange(len(categories))
width = 0.35
axes[0].bar(x - width/2, fair_samples.mean(axis=0), width,
            label='Fair', alpha=0.8)
axes[0].bar(x + width/2, biased_samples.mean(axis=0), width,
            label='Biased', alpha=0.8)
axes[0].set_xlabel('Category')
axes[0].set_ylabel('Mean Count')
axes[0].set_title('Mean Outcomes by Category')
axes[0].set_xticks(x)
axes[0].set_xticklabels(categories)
axes[0].legend()
# Heatmap: Correlation between categories (biased distribution)
corr_matrix = np.corrcoef(biased_samples.T)
sns.heatmap(corr_matrix, annot=True, fmt='.2f',
            xticklabels=categories, yticklabels=categories,
            cmap='coolwarm', center=0, ax=axes[1])
axes[1].set_title('Category Correlations (Biased)')
# Distribution of single category
axes[2].hist(fair_samples[:, 0], bins=20, alpha=0.6,
             label='Fair', density=True)
axes[2].hist(biased_samples[:, 0], bins=20, alpha=0.6,
             label='Biased', density=True)
axes[2].set_xlabel('Count for Category A')
axes[2].set_ylabel('Density')
axes[2].set_title('Distribution of Category A Counts')
axes[2].legend()
plt.tight_layout()
plt.savefig('multinomial_visualization.png', dpi=150)
plt.show()
Notice the negative correlations in the heatmap—this is a key property of the multinomial. When one category increases, others must decrease since they sum to n.
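That negative dependence even has a closed form: Corr(Xᵢ, Xⱼ) = -sqrt(pᵢpⱼ / ((1 - pᵢ)(1 - pⱼ))). A quick sketch compares it to the empirical correlation for the first two biased categories:

```python
import numpy as np

rng = np.random.default_rng(42)
probs = np.array([0.4, 0.3, 0.2, 0.1])
samples = rng.multinomial(50, probs, size=100_000)

p_i, p_j = probs[0], probs[1]
# Closed-form correlation between categories i and j
theoretical = -np.sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))
empirical = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
print(f"theoretical: {theoretical:.3f}, empirical: {empirical:.3f}")
```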
Practical Applications
Simulating Dice Rolls and Testing Fairness
import numpy as np
from scipy.stats import chisquare
rng = np.random.default_rng(42)
# Simulate 600 rolls of a potentially loaded die
n_rolls = 600
# Slightly biased toward 6
loaded_probs = [0.15, 0.15, 0.15, 0.15, 0.15, 0.25]
observed_counts = rng.multinomial(n_rolls, loaded_probs)
print(f"Observed counts: {observed_counts}")
# Expected counts under fair die hypothesis
expected_counts = np.array([n_rolls / 6] * 6)
# Chi-square test for fairness
chi2, p_value = chisquare(observed_counts, expected_counts)
print(f"Chi-square statistic: {chi2:.2f}")
print(f"P-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} fair die hypothesis")
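To gauge how often this test actually catches the bias at 600 rolls, you can repeat the simulation many times and count rejections; this is a standard Monte Carlo power sketch, not part of the original experiment:

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
n_rolls = 600
loaded_probs = [0.15, 0.15, 0.15, 0.15, 0.15, 0.25]
expected = np.full(6, n_rolls / 6)

# Simulate many loaded-die experiments and count significant results
n_sims = 2000
rejections = 0
for _ in range(n_sims):
    counts = rng.multinomial(n_rolls, loaded_probs)
    _, p = chisquare(counts, expected)
    rejections += p < 0.05

print(f"Estimated power: {rejections / n_sims:.2f}")
```

With this much bias and 600 rolls, the test rejects nearly every time; shrink the bias or the sample size to watch the power drop.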
Document Word Frequency Analysis
import numpy as np
from scipy.stats import multinomial
# Simplified topic modeling: estimate topic from word frequencies
# Vocabulary: ['data', 'model', 'patient', 'treatment', 'algorithm', 'disease']
# Topic probability distributions over vocabulary
topics = {
    'machine_learning': [0.30, 0.35, 0.05, 0.05, 0.20, 0.05],
    'medicine': [0.05, 0.10, 0.25, 0.30, 0.05, 0.25]
}
# Observed word counts in a document
observed_words = [8, 12, 2, 1, 7, 0] # 30 words total
n_words = sum(observed_words)
# Calculate likelihood under each topic
for topic_name, probs in topics.items():
    log_likelihood = multinomial.logpmf(observed_words, n=n_words, p=probs)
    print(f"{topic_name}: log-likelihood = {log_likelihood:.2f}")
# Output:
# machine_learning: log-likelihood = -7.63
# medicine: log-likelihood = -41.70
# Conclusion: Document more likely from machine_learning topic
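With an equal prior on the two topics, those log-likelihoods convert to posterior probabilities via a softmax; the log-sum-exp step below keeps the computation numerically stable. A sketch:

```python
import numpy as np
from scipy.stats import multinomial
from scipy.special import logsumexp

topics = {
    'machine_learning': [0.30, 0.35, 0.05, 0.05, 0.20, 0.05],
    'medicine': [0.05, 0.10, 0.25, 0.30, 0.05, 0.25],
}
observed_words = [8, 12, 2, 1, 7, 0]
n_words = sum(observed_words)

names = list(topics)
log_liks = np.array([
    multinomial.logpmf(observed_words, n=n_words, p=topics[t]) for t in names
])
# Uniform prior: posterior is softmax of the log-likelihoods
posteriors = np.exp(log_liks - logsumexp(log_liks))
for name, post in zip(names, posteriors):
    print(f"{name}: {post:.4f}")
```

The large gap in log-likelihood translates into a posterior that is essentially 1 for the machine_learning topic.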
A/B/n Conversion Testing
import numpy as np
from scipy.stats import multinomial
rng = np.random.default_rng(42)
# Three landing page variants, tracking: no_action, signup, purchase
# True conversion rates (unknown in practice)
variant_a = [0.70, 0.20, 0.10] # Control
variant_b = [0.65, 0.25, 0.10] # Better signup
variant_c = [0.60, 0.22, 0.18] # Better purchase
n_visitors = 1000
# Simulate observed data
results = {
    'A': rng.multinomial(n_visitors, variant_a),
    'B': rng.multinomial(n_visitors, variant_b),
    'C': rng.multinomial(n_visitors, variant_c)
}
# Estimate probabilities (MLE = observed proportions)
for variant, counts in results.items():
    estimated_probs = counts / n_visitors
    revenue = counts[1] * 10 + counts[2] * 50  # $10 signup, $50 purchase
    print(f"Variant {variant}: {counts} -> Est. probs: {estimated_probs.round(3)}, Revenue: ${revenue}")
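To test whether the variants differ at all, a chi-square test of independence on the 3x3 contingency table of counts is a natural follow-up; here is a sketch using `scipy.stats.chi2_contingency`:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
n_visitors = 1000
# Rows: variants A, B, C; columns: no_action, signup, purchase
table = np.array([
    rng.multinomial(n_visitors, [0.70, 0.20, 0.10]),
    rng.multinomial(n_visitors, [0.65, 0.25, 0.10]),
    rng.multinomial(n_visitors, [0.60, 0.22, 0.18]),
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```

A small p-value says the outcome distribution depends on the variant; it does not by itself say which variant is best.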
Statistical Inference and Hypothesis Testing
The chi-square goodness-of-fit test is the standard tool for testing multinomial hypotheses:
import numpy as np
from scipy.stats import chi2, chisquare
# Customer satisfaction survey results
# Categories: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied
observed = np.array([45, 65, 100, 150, 140])
n_total = observed.sum()
# Null hypothesis: uniform distribution
expected_uniform = np.array([n_total / 5] * 5)
# Alternative hypothesis: specific expected distribution
expected_target = np.array([0.05, 0.10, 0.20, 0.35, 0.30]) * n_total
# Test against uniform
chi2_uniform, p_uniform = chisquare(observed, expected_uniform)
print(f"Test vs Uniform: χ² = {chi2_uniform:.2f}, p = {p_uniform:.4f}")
# Test against target distribution
chi2_target, p_target = chisquare(observed, expected_target)
print(f"Test vs Target: χ² = {chi2_target:.2f}, p = {p_target:.4f}")
# Calculate confidence intervals for proportions (Wald interval)
def multinomial_ci(counts, z=1.96):
    """Wald confidence intervals for multinomial proportions (default 95%)."""
    n = counts.sum()
    props = counts / n
    intervals = []
    for p in props:
        se = np.sqrt(p * (1 - p) / n)
        intervals.append((max(0, p - z * se), min(1, p + z * se)))
    return intervals
cis = multinomial_ci(observed)
categories = ['Very Dissat.', 'Dissatisfied', 'Neutral', 'Satisfied', 'Very Sat.']
for cat, (low, high) in zip(categories, cis):
print(f"{cat}: [{low:.3f}, {high:.3f}]")
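The Wald intervals above are computed one category at a time; with five categories examined at once, a Bonferroni correction widens them to control the overall error rate. A sketch that swaps z = 1.96 for the adjusted normal quantile:

```python
import numpy as np
from scipy.stats import norm

observed = np.array([45, 65, 100, 150, 140])
n = observed.sum()
props = observed / n
k = len(observed)

# Bonferroni: split alpha = 0.05 across the k intervals
z = norm.ppf(1 - 0.05 / (2 * k))
se = np.sqrt(props * (1 - props) / n)
lower = np.clip(props - z * se, 0, 1)
upper = np.clip(props + z * se, 0, 1)
for p, lo, hi in zip(props, lower, upper):
    print(f"{p:.3f}: [{lo:.3f}, {hi:.3f}]")
```

Bonferroni is conservative; Goodman's simultaneous intervals are a sharper alternative when k is large.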
Summary and Best Practices
The multinomial distribution is your go-to tool for modeling categorical outcomes. Here are the key takeaways:
Use the right tool for the job:
- `numpy.random.multinomial()` for simulation and sampling
- `scipy.stats.multinomial` for probability calculations
- Chi-square tests for hypothesis testing
Common pitfalls to avoid:
- Probabilities must sum to exactly 1.0; use `probs = np.array(probs) / sum(probs)` to normalize
- Use log-probabilities (`logpmf`) when working with many categories or large n to prevent underflow
- Remember that multinomial counts are negatively correlated: increases in one category force decreases in others
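A minimal guard that covers the first pitfall before any multinomial call might look like this (the `validate_probs` helper is a sketch, not a library function):

```python
import numpy as np

def validate_probs(probs):
    """Normalize a probability vector and fail loudly on bad input."""
    p = np.asarray(probs, dtype=float)
    if np.any(p < 0):
        raise ValueError("probabilities must be non-negative")
    total = p.sum()
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return p / total  # now sums to 1.0 up to floating-point error

print(validate_probs([2, 1, 1]))  # 0.5, 0.25, 0.25
```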
Related distributions worth knowing:
- Categorical: Single trial multinomial (n=1)
- Dirichlet: Conjugate prior for multinomial probabilities in Bayesian analysis
- Dirichlet-Multinomial: When category probabilities themselves vary
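The Dirichlet conjugacy makes the Bayesian update a one-liner: with prior Dirichlet(α) and observed counts x, the posterior is Dirichlet(α + x). A sketch using the survey counts from the inference section:

```python
import numpy as np

rng = np.random.default_rng(42)
observed = np.array([45, 65, 100, 150, 140])

# Uniform Dirichlet(1, ..., 1) prior; posterior is Dirichlet(1 + counts)
alpha_post = 1 + observed

# Posterior mean and Monte Carlo draws of the category probabilities
posterior_mean = alpha_post / alpha_post.sum()
draws = rng.dirichlet(alpha_post, size=10_000)
print(posterior_mean.round(3))
print(draws.mean(axis=0).round(3))  # close to the posterior mean
```

The draws give you full uncertainty over the proportions, which is especially useful when some categories have very few observations.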
When sample sizes are small relative to the number of categories, consider exact tests or Bayesian approaches instead of chi-square approximations. For large-scale applications like text modeling, look into sparse representations and specialized libraries like Gensim.