Multinomial Distribution in Python: Complete Guide

Key Insights

  • The multinomial distribution generalizes the binomial to k categories, making it essential for modeling scenarios like dice rolls, text classification, and A/B/n testing where outcomes fall into more than two buckets.
  • NumPy’s random.multinomial() handles sampling while SciPy’s stats.multinomial provides probability calculations—use them together for complete analysis workflows.
  • Always validate that your probability vector sums to 1.0 and use log-probabilities when working with many categories to avoid numerical underflow.

Introduction to the Multinomial Distribution

The multinomial distribution answers a fundamental question: if you run n independent trials where each trial can result in one of k possible outcomes, what’s the probability of observing a specific combination of results?

Think of it as the binomial distribution’s more versatile sibling. While the binomial handles coin flips (two outcomes), the multinomial handles dice rolls (six outcomes), survey responses (multiple choices), or any scenario with discrete categories.
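This reduction is easy to verify in code: with k = 2 categories, the multinomial PMF collapses exactly to the binomial PMF. A quick sanity check, using arbitrary illustrative values for n, p, and x:

```python
from scipy.stats import binom, multinomial

# Arbitrary illustration: 10 trials, success probability 0.3, 4 successes
n, p, x = 10, 0.3, 4

binom_pmf = binom.pmf(x, n, p)
# Two-category multinomial: counts [successes, failures], probs [p, 1 - p]
multi_pmf = multinomial.pmf([x, n - x], n=n, p=[p, 1 - p])

print(f"Binomial:    {binom_pmf:.6f}")  # 0.200121
print(f"Multinomial: {multi_pmf:.6f}")  # 0.200121
```

The two agree to floating-point precision, which is why everything below carries over directly from intuition you may already have about the binomial.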

Real-world applications include:

  • Text classification: Modeling word frequencies in documents
  • A/B/n testing: Analyzing conversion rates across multiple variants
  • Genetics: Modeling allele frequencies in populations
  • Quality control: Categorizing defects into multiple types

Understanding this distribution gives you a powerful tool for modeling categorical data across statistics, machine learning, and simulation.

Mathematical Foundation

The multinomial distribution has three key parameters:

  • n: Total number of trials
  • k: Number of possible categories
  • p: Probability vector (p₁, p₂, …, pₖ) where Σpᵢ = 1

The probability mass function (PMF) gives the probability of observing exactly x₁ outcomes in category 1, x₂ in category 2, and so on:

P(X₁=x₁, X₂=x₂, ..., Xₖ=xₖ) = (n! / (x₁! × x₂! × ... × xₖ!)) × p₁^x₁ × p₂^x₂ × ... × pₖ^xₖ

The expected value for category i is E[Xᵢ] = n × pᵢ, and the variance is Var(Xᵢ) = n × pᵢ × (1 - pᵢ).
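These moment formulas are straightforward to verify by simulation. A quick sketch, using an arbitrary three-category example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example: 60 trials over three categories
n = 60
probs = np.array([0.5, 0.3, 0.2])
samples = rng.multinomial(n, probs, size=200_000)

# Theory: E[X_i] = n * p_i and Var(X_i) = n * p_i * (1 - p_i)
print("Empirical means:      ", samples.mean(axis=0).round(2))
print("Theoretical means:    ", n * probs)                        # [30. 18. 12.]
print("Empirical variances:  ", samples.var(axis=0).round(2))
print("Theoretical variances:", (n * probs * (1 - probs)).round(2))  # [15. 12.6 9.6]
```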

Let’s calculate a multinomial probability both manually and with SciPy:

import numpy as np
from scipy.stats import multinomial
from scipy.special import factorial

# Scenario: Roll a fair die 10 times
# What's P(exactly 2 ones, 2 twos, 1 three, 2 fours, 2 fives, 1 six)?

n = 10
probs = [1/6] * 6  # Fair die
observed = [2, 2, 1, 2, 2, 1]

# Manual calculation
def multinomial_pmf_manual(n, probs, observed):
    """Calculate multinomial PMF from scratch."""
    # Multinomial coefficient: n! / (x1! * x2! * ... * xk!)
    coefficient = factorial(n, exact=True)
    for x in observed:
        coefficient //= factorial(x, exact=True)
    
    # Probability term: p1^x1 * p2^x2 * ... * pk^xk
    prob_term = 1.0
    for p, x in zip(probs, observed):
        prob_term *= p ** x
    
    return coefficient * prob_term

manual_result = multinomial_pmf_manual(n, probs, observed)
print(f"Manual calculation: {manual_result:.6f}")

# SciPy calculation
scipy_result = multinomial.pmf(observed, n=n, p=probs)
print(f"SciPy calculation: {scipy_result:.6f}")

# Output:
# Manual calculation: 0.003751
# SciPy calculation: 0.003751

Implementing with NumPy and SciPy

NumPy handles random sampling while SciPy provides statistical functions. Here’s how to use both effectively:

import numpy as np
from scipy.stats import multinomial

# Set seed for reproducibility
rng = np.random.default_rng(42)

# Parameters: 100 trials, 4 categories with different probabilities
n_trials = 100
probs = [0.4, 0.3, 0.2, 0.1]

# Generate random samples
# Single sample: counts for each category
single_sample = rng.multinomial(n_trials, probs)
print(f"Single sample: {single_sample}")
# Output: Single sample: [44 27 18 11]

# Multiple samples: useful for Monte Carlo simulations
n_simulations = 10000
samples = rng.multinomial(n_trials, probs, size=n_simulations)
print(f"Samples shape: {samples.shape}")
# Output: Samples shape: (10000, 4)

# Verify expected values match theory
print(f"Empirical means: {samples.mean(axis=0)}")
print(f"Theoretical means: {np.array(probs) * n_trials}")
# Output:
# Empirical means: [39.98 30.02 19.97 10.03]
# Theoretical means: [40. 30. 20. 10.]

For probability calculations and log-likelihoods:

from scipy.stats import multinomial

# Create a frozen distribution object
dist = multinomial(n=100, p=[0.4, 0.3, 0.2, 0.1])

# Calculate PMF for specific outcome
outcome = [40, 30, 20, 10]
pmf = dist.pmf(outcome)
print(f"P(X = {outcome}): {pmf:.6f}")

# Log-probability (crucial for numerical stability)
log_pmf = dist.logpmf(outcome)
print(f"Log P(X = {outcome}): {log_pmf:.4f}")

# Calculate log-likelihood for observed data
# Useful for model comparison and MLE
observed_data = [38, 32, 18, 12]
log_likelihood = dist.logpmf(observed_data)
print(f"Log-likelihood of observed data: {log_likelihood:.4f}")

Visualization Techniques

Visualizing multinomial distributions requires different approaches depending on the number of categories:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(42)

# Generate samples from two different distributions
n_trials = 50
n_samples = 1000

fair_probs = [0.25, 0.25, 0.25, 0.25]
biased_probs = [0.4, 0.3, 0.2, 0.1]

fair_samples = rng.multinomial(n_trials, fair_probs, size=n_samples)
biased_samples = rng.multinomial(n_trials, biased_probs, size=n_samples)

# Create visualization
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Bar chart: Compare mean outcomes
categories = ['A', 'B', 'C', 'D']
x = np.arange(len(categories))
width = 0.35

axes[0].bar(x - width/2, fair_samples.mean(axis=0), width, 
            label='Fair', alpha=0.8)
axes[0].bar(x + width/2, biased_samples.mean(axis=0), width, 
            label='Biased', alpha=0.8)
axes[0].set_xlabel('Category')
axes[0].set_ylabel('Mean Count')
axes[0].set_title('Mean Outcomes by Category')
axes[0].set_xticks(x)
axes[0].set_xticklabels(categories)
axes[0].legend()

# Heatmap: Correlation between categories (biased distribution)
corr_matrix = np.corrcoef(biased_samples.T)
sns.heatmap(corr_matrix, annot=True, fmt='.2f', 
            xticklabels=categories, yticklabels=categories,
            cmap='coolwarm', center=0, ax=axes[1])
axes[1].set_title('Category Correlations (Biased)')

# Distribution of single category
axes[2].hist(fair_samples[:, 0], bins=20, alpha=0.6, 
             label='Fair', density=True)
axes[2].hist(biased_samples[:, 0], bins=20, alpha=0.6, 
             label='Biased', density=True)
axes[2].set_xlabel('Count for Category A')
axes[2].set_ylabel('Density')
axes[2].set_title('Distribution of Category A Counts')
axes[2].legend()

plt.tight_layout()
plt.savefig('multinomial_visualization.png', dpi=150)
plt.show()

Notice the negative correlations in the heatmap—this is a key property of the multinomial. When one category increases, others must decrease since they sum to n.
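That dependence has a closed form: Cov(Xᵢ, Xⱼ) = −n·pᵢ·pⱼ, which gives Corr(Xᵢ, Xⱼ) = −√(pᵢpⱼ / ((1−pᵢ)(1−pⱼ))). A quick check against fresh samples drawn with the same biased probabilities:

```python
import numpy as np

rng = np.random.default_rng(42)
probs = np.array([0.4, 0.3, 0.2, 0.1])
samples = rng.multinomial(50, probs, size=100_000)

# Corr(X_i, X_j) = -sqrt(p_i * p_j / ((1 - p_i) * (1 - p_j)))
p_a, p_b = probs[0], probs[1]
theory = -np.sqrt(p_a * p_b / ((1 - p_a) * (1 - p_b)))
empirical = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]

print(f"Theoretical Corr(A, B): {theory:.3f}")   # -0.535
print(f"Empirical Corr(A, B):   {empirical:.3f}")
```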

Practical Applications

Simulating Dice Rolls and Testing Fairness

import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(42)

# Simulate 600 rolls of a potentially loaded die
n_rolls = 600
# Slightly biased toward 6
loaded_probs = [0.15, 0.15, 0.15, 0.15, 0.15, 0.25]

observed_counts = rng.multinomial(n_rolls, loaded_probs)
print(f"Observed counts: {observed_counts}")

# Expected counts under fair die hypothesis
expected_counts = np.array([n_rolls / 6] * 6)

# Chi-square test for fairness
chi2, p_value = chisquare(observed_counts, expected_counts)
print(f"Chi-square statistic: {chi2:.2f}")
print(f"P-value: {p_value:.4f}")
print(f"Conclusion: {'Reject' if p_value < 0.05 else 'Fail to reject'} fair die hypothesis")

Document Word Frequency Analysis

import numpy as np
from scipy.stats import multinomial

# Simplified topic modeling: estimate topic from word frequencies
# Vocabulary: ['data', 'model', 'patient', 'treatment', 'algorithm', 'disease']

# Topic probability distributions over vocabulary
topics = {
    'machine_learning': [0.30, 0.35, 0.05, 0.05, 0.20, 0.05],
    'medicine': [0.05, 0.10, 0.25, 0.30, 0.05, 0.25]
}

# Observed word counts in a document
observed_words = [8, 12, 2, 1, 7, 0]  # 30 words total
n_words = sum(observed_words)

# Calculate likelihood under each topic
for topic_name, probs in topics.items():
    log_likelihood = multinomial.logpmf(observed_words, n=n_words, p=probs)
    print(f"{topic_name}: log-likelihood = {log_likelihood:.2f}")

# Output:
# machine_learning: log-likelihood = -7.63
# medicine: log-likelihood = -41.70
# Conclusion: Document more likely from machine_learning topic
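To turn log-likelihoods like these into a probability over topics, assume a prior (uniform here, purely as an illustration) and normalize in log space with logsumexp rather than exponentiating the raw values directly:

```python
import numpy as np
from scipy.stats import multinomial
from scipy.special import logsumexp

topics = {
    'machine_learning': [0.30, 0.35, 0.05, 0.05, 0.20, 0.05],
    'medicine': [0.05, 0.10, 0.25, 0.30, 0.05, 0.25],
}
observed_words = [8, 12, 2, 1, 7, 0]

log_liks = np.array([
    multinomial.logpmf(observed_words, n=sum(observed_words), p=p)
    for p in topics.values()
])

# Uniform prior -> posterior is proportional to the likelihood;
# subtracting logsumexp normalizes without leaving log space
posterior = np.exp(log_liks - logsumexp(log_liks))
for name, prob in zip(topics, posterior):
    print(f"P({name} | document) = {prob:.4f}")
```

With the log-likelihoods this far apart, the posterior puts essentially all its mass on the machine_learning topic.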

A/B/n Conversion Testing

import numpy as np
from scipy.stats import multinomial

rng = np.random.default_rng(42)

# Three landing page variants, tracking: no_action, signup, purchase
# True conversion rates (unknown in practice)
variant_a = [0.70, 0.20, 0.10]  # Control
variant_b = [0.65, 0.25, 0.10]  # Better signup
variant_c = [0.60, 0.22, 0.18]  # Better purchase

n_visitors = 1000

# Simulate observed data
results = {
    'A': rng.multinomial(n_visitors, variant_a),
    'B': rng.multinomial(n_visitors, variant_b),
    'C': rng.multinomial(n_visitors, variant_c)
}

# Estimate probabilities (MLE = observed proportions)
for variant, counts in results.items():
    estimated_probs = counts / n_visitors
    revenue = counts[1] * 10 + counts[2] * 50  # $10 signup, $50 purchase
    print(f"Variant {variant}: {counts} -> Est. probs: {estimated_probs.round(3)}, Revenue: ${revenue}")

Statistical Inference and Hypothesis Testing

The chi-square goodness-of-fit test is the standard tool for testing multinomial hypotheses:

import numpy as np
from scipy.stats import chisquare

# Customer satisfaction survey results
# Categories: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied
observed = np.array([45, 65, 100, 150, 140])
n_total = observed.sum()

# Null hypothesis: uniform distribution
expected_uniform = np.array([n_total / 5] * 5)

# Alternative hypothesis: specific expected distribution
expected_target = np.array([0.05, 0.10, 0.20, 0.35, 0.30]) * n_total

# Test against uniform
chi2_uniform, p_uniform = chisquare(observed, expected_uniform)
print(f"Test vs Uniform: χ² = {chi2_uniform:.2f}, p = {p_uniform:.4f}")

# Test against target distribution
chi2_target, p_target = chisquare(observed, expected_target)
print(f"Test vs Target: χ² = {chi2_target:.2f}, p = {p_target:.4f}")

# Calculate confidence intervals for proportions (Wald interval)
def multinomial_ci(counts, alpha=0.05):
    """Calculate confidence intervals for multinomial proportions."""
    n = counts.sum()
    props = counts / n
    z = 1.96  # 95% CI
    
    intervals = []
    for p in props:
        se = np.sqrt(p * (1 - p) / n)
        intervals.append((max(0, p - z * se), min(1, p + z * se)))
    return intervals

cis = multinomial_ci(observed)
categories = ['Very Dissat.', 'Dissatisfied', 'Neutral', 'Satisfied', 'Very Sat.']
for cat, (low, high) in zip(categories, cis):
    print(f"{cat}: [{low:.3f}, {high:.3f}]")
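Under the hood, chisquare computes the statistic Σ(O − E)²/E and compares it against a chi-square distribution with k − 1 degrees of freedom. A manual sketch of the uniform test above:

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([45, 65, 100, 150, 140])
expected = np.full(5, observed.sum() / 5)

# Chi-square statistic: sum of (observed - expected)^2 / expected
stat = ((observed - expected) ** 2 / expected).sum()

# p-value from the survival function, df = k - 1 = 4
p_value = chi2.sf(stat, df=len(observed) - 1)
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")  # chi2 = 83.50
```

This reproduces the chisquare() result exactly, which is useful when you need the intermediate per-category contributions for diagnostics.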

Summary and Best Practices

The multinomial distribution is your go-to tool for modeling categorical outcomes. Here are the key takeaways:

Use the right tool for the job:

  • numpy.random.multinomial() for simulation and sampling
  • scipy.stats.multinomial for probability calculations
  • Chi-square tests for hypothesis testing

Common pitfalls to avoid:

  • Probabilities must sum to exactly 1.0—use probs = np.array(probs) / sum(probs) to normalize
  • Use log-probabilities (logpmf) when working with many categories or large n to prevent underflow
  • Remember that multinomial counts are negatively correlated—increases in one category force decreases in others
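Both the normalization trick and the underflow problem are quick to demonstrate (the weights and the extreme outcome below are illustrative):

```python
import numpy as np
from scipy.stats import multinomial

# Normalize raw weights into a valid probability vector
weights = np.array([2.0, 3.0, 5.0])
probs = weights / weights.sum()   # [0.2, 0.3, 0.5]

# An extreme outcome: all 10,000 rolls of a fair die come up 1
n = 10_000
outcome = [n, 0, 0, 0, 0, 0]
fair = [1 / 6] * 6

print(multinomial.pmf(outcome, n=n, p=fair))     # 0.0 -- underflows
print(multinomial.logpmf(outcome, n=n, p=fair))  # about -17917.6, still usable
```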

Related distributions worth knowing:

  • Categorical: Single trial multinomial (n=1)
  • Dirichlet: Conjugate prior for multinomial probabilities in Bayesian analysis
  • Dirichlet-Multinomial: When category probabilities themselves vary
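The first two are a one-liner away in the same libraries. A sketch of sampling both (the probability and concentration values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = [0.4, 0.3, 0.2, 0.1]

# Categorical: a single multinomial trial yields a one-hot count vector
one_hot = rng.multinomial(1, probs)
print(f"Categorical draw (one-hot): {one_hot}")

# Dirichlet: sample a random probability vector to feed a multinomial
alpha = [2.0, 2.0, 2.0, 2.0]  # concentration parameters (illustrative)
random_probs = rng.dirichlet(alpha)
print(f"Dirichlet draw (sums to 1): {random_probs.round(3)}")
```

Chaining the two, sampling probabilities from a Dirichlet and then counts from a multinomial, is exactly the Dirichlet-multinomial model mentioned above.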

When sample sizes are small relative to the number of categories, consider exact tests or Bayesian approaches instead of chi-square approximations. For large-scale applications like text modeling, look into sparse representations and specialized libraries like Gensim.
