Moment Generating Functions: Formula and Examples
Key Insights
- Moment generating functions uniquely characterize probability distributions and simplify calculating moments through differentiation rather than integration
- The product property of MGFs makes finding distributions of sums of independent random variables trivial compared to convolution-based approaches
- MGFs don’t exist for all distributions (heavy-tailed cases), making characteristic functions the more general tool, but MGFs are computationally simpler when they do exist
Introduction to Moment Generating Functions
A moment generating function (MGF) is a mathematical transform that encodes all moments of a probability distribution into a single function. If you’ve ever needed to find the mean, variance, or higher moments of a random variable, you’ve likely done it through direct integration or summation. MGFs provide an alternative: encode the distribution once, then extract any moment through simple differentiation.
The MGF exists because of a beautiful property of exponential functions: their Taylor series expansion naturally produces moments as coefficients. This makes MGFs particularly powerful for two tasks: uniquely identifying distributions and finding the distribution of sums of independent random variables.
Understanding MGFs is essential for anyone working with probability theory, statistical inference, or stochastic processes. They appear throughout mathematical statistics, from proving the Central Limit Theorem to analyzing queuing systems and portfolio risk.
The MGF Formula and Mathematical Properties
The moment generating function of a random variable X is defined as:
M_X(t) = E[e^(tX)]
For discrete random variables: M_X(t) = Σ e^(tx) P(X = x)
For continuous random variables: M_X(t) = ∫ e^(tx) f_X(x) dx
The parameter t is real-valued, and the MGF exists only if this expectation is finite in some neighborhood around t = 0. This is a critical limitation—distributions with heavy tails (like Cauchy) don’t have MGFs.
Three properties make MGFs invaluable:
- Uniqueness: If two random variables have the same MGF, they have the same distribution
- Moment extraction: The nth moment equals the nth derivative evaluated at t = 0
- Independence: For independent X and Y, M_(X+Y)(t) = M_X(t) × M_Y(t)
Here’s a Python implementation for computing MGFs numerically for discrete and continuous distributions:

```python
import numpy as np
from scipy.integrate import quad

def mgf_discrete(values, probabilities, t):
    """
    Compute the MGF of a discrete distribution at t.

    Args:
        values: array of possible values
        probabilities: corresponding probabilities
        t: parameter value

    Returns:
        MGF value at t
    """
    return np.sum(np.exp(t * values) * probabilities)

def mgf_continuous(pdf, t, lower=-np.inf, upper=np.inf):
    """
    Compute the MGF of a continuous distribution at t.

    Args:
        pdf: probability density function
        t: parameter value
        lower, upper: integration bounds

    Returns:
        MGF value at t
    """
    integrand = lambda x: np.exp(t * x) * pdf(x)
    result, _ = quad(integrand, lower, upper)
    return result

# Example: fair die
die_values = np.array([1, 2, 3, 4, 5, 6])
die_probs = np.array([1/6] * 6)

# Compute MGF at t = 0.5
mgf_value = mgf_discrete(die_values, die_probs, 0.5)
print(f"MGF at t=0.5: {mgf_value:.4f}")
```
Deriving Moments from MGFs
The power of MGFs lies in their relationship to moments. The nth moment of X equals:
E[X^n] = M_X^(n)(0)
where M_X^(n)(0) denotes the nth derivative of M_X(t) evaluated at t = 0.
Why does this work? Expand e^(tX) as a Taylor series:
e^(tX) = 1 + tX + (t²X²)/2! + (t³X³)/3! + …
Taking expectations:
M_X(t) = 1 + tE[X] + (t²E[X²])/2! + (t³E[X³])/3! + …
Differentiating once and setting t = 0 gives E[X]. Differentiating twice gives E[X²], and so on.
For variance, we use: Var(X) = E[X²] - (E[X])² = M''(0) - (M'(0))²
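As a quick worked example, the MGF of the fair die used earlier can be differentiated symbolically to recover its mean and variance:

```python
from sympy import symbols, exp, diff, Rational

t = symbols('t')

# MGF of a fair six-sided die: M(t) = (1/6) * sum of e^(kt), k = 1..6
M = sum(Rational(1, 6) * exp(k * t) for k in range(1, 7))

mean = diff(M, t).subs(t, 0)       # M'(0)  = E[X]
second = diff(M, t, 2).subs(t, 0)  # M''(0) = E[X^2]
variance = second - mean**2

print(mean)      # 7/2
print(second)    # 91/6
print(variance)  # 35/12
```

The variance 35/12 ≈ 2.9167 matches the textbook value for a fair die, confirming Var(X) = M''(0) - (M'(0))².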
Here’s a practical implementation using symbolic differentiation:
```python
from sympy import symbols, diff

def extract_moments(mgf_expr, t_symbol, num_moments=4):
    """
    Extract moments from a symbolic MGF expression.

    Args:
        mgf_expr: SymPy expression for the MGF
        t_symbol: SymPy symbol for t
        num_moments: number of moments to compute

    Returns:
        List of moments E[X], E[X^2], ...
    """
    moments = []
    for n in range(1, num_moments + 1):
        derivative = diff(mgf_expr, t_symbol, n)
        moment = derivative.subs(t_symbol, 0)
        moments.append(float(moment))
    return moments

# Example: exponential distribution with rate λ = 2
t = symbols('t')
lam = 2
mgf_exp = lam / (lam - t)

moments = extract_moments(mgf_exp, t, num_moments=3)
print(f"Mean: {moments[0]}")                      # 1/λ = 0.5
print(f"Second moment: {moments[1]}")             # 2/λ² = 0.5
print(f"Variance: {moments[1] - moments[0]**2}")  # 1/λ² = 0.25
```
Common Distribution MGFs
Here are MGFs for standard distributions:
| Distribution | Parameters | MGF |
|---|---|---|
| Exponential | λ | λ/(λ - t), t < λ |
| Normal | μ, σ² | exp(μt + σ²t²/2) |
| Poisson | λ | exp(λ(e^t - 1)) |
| Binomial | n, p | (1 - p + pe^t)^n |
Let’s derive the exponential distribution MGF. For X ~ Exp(λ):
M_X(t) = ∫₀^∞ e^(tx) λe^(-λx) dx = λ ∫₀^∞ e^((t-λ)x) dx
This integral converges only when t < λ, giving:
M_X(t) = λ/(λ - t)
Here’s an implementation with verification:
```python
import numpy as np
from scipy import stats

class DistributionMGF:
    """MGFs for common distributions with moment verification."""

    @staticmethod
    def exponential(t, lam):
        """Exponential distribution MGF, defined for t < λ."""
        if t >= lam:
            return np.inf
        return lam / (lam - t)

    @staticmethod
    def normal(t, mu, sigma):
        """Normal distribution MGF."""
        return np.exp(mu * t + 0.5 * sigma**2 * t**2)

    @staticmethod
    def poisson(t, lam):
        """Poisson distribution MGF."""
        return np.exp(lam * (np.exp(t) - 1))

    @staticmethod
    def binomial(t, n, p):
        """Binomial distribution MGF."""
        return (1 - p + p * np.exp(t))**n

# Verify exponential distribution moments
lam = 3.0
mgf = DistributionMGF()

# Forward difference approximates M'(0), which equals the mean
h = 1e-8
mean_from_mgf = (mgf.exponential(h, lam) - mgf.exponential(0, lam)) / h
theoretical_mean = 1 / lam

print(f"Mean from MGF: {mean_from_mgf:.6f}")
print(f"Theoretical mean: {theoretical_mean:.6f}")

# Verify using scipy
X = stats.expon(scale=1/lam)
print(f"Scipy mean: {X.mean():.6f}")
```
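The same finite-difference check works for the other table entries. As a sketch (the step size h and λ = 4 below are illustrative choices), central differences on the Poisson MGF recover its mean and variance, both equal to λ:

```python
import numpy as np

def poisson_mgf(t, lam):
    """Poisson distribution MGF: exp(λ(e^t - 1))."""
    return np.exp(lam * (np.exp(t) - 1))

lam = 4.0
h = 1e-4

# Central difference approximates M'(0) = E[X] = λ
mean_est = (poisson_mgf(h, lam) - poisson_mgf(-h, lam)) / (2 * h)

# Second central difference approximates M''(0) = E[X²] = λ² + λ
second_est = (poisson_mgf(h, lam) - 2 * poisson_mgf(0, lam)
              + poisson_mgf(-h, lam)) / h**2

print(f"Estimated mean: {mean_est:.4f}")                       # ≈ 4.0
print(f"Estimated variance: {second_est - mean_est**2:.4f}")   # ≈ 4.0
```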
MGFs for Sums of Independent Random Variables
The most powerful property of MGFs is how they handle sums. If X and Y are independent:
M_(X+Y)(t) = M_X(t) × M_Y(t)
This is far simpler than convolving densities. Summing n independent exponential random variables by convolution requires n - 1 convolution integrals, while the MGF approach is a simple product of n factors.
Example: Sum of two independent normal distributions. If X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²):
M_(X+Y)(t) = exp(μ₁t + σ₁²t²/2) × exp(μ₂t + σ₂²t²/2) = exp((μ₁+μ₂)t + (σ₁²+σ₂²)t²/2)
This is the MGF of N(μ₁+μ₂, σ₁²+σ₂²), proving that the sum is also normal.
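A quick simulation confirms the algebra (a sketch; the parameters and sample size below are arbitrary): the sum of normal samples has mean μ₁+μ₂ and variance σ₁²+σ₂².

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, s1 = 1.0, 2.0
mu2, s2 = -0.5, 1.5
n = 200_000

# Sum of two independent normals
total = rng.normal(mu1, s1, n) + rng.normal(mu2, s2, n)

print(f"Sample mean: {total.mean():.3f}  (theory: {mu1 + mu2:.3f})")
print(f"Sample var:  {total.var():.3f}  (theory: {s1**2 + s2**2:.3f})")
```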
```python
import numpy as np

def compare_sum_approaches(n_samples=10000):
    """Compare simulation vs MGF for the sum of two random variables."""
    # Two independent exponential RVs
    lam1, lam2 = 2.0, 3.0

    # Approach 1: direct simulation (convolution in practice)
    X1 = np.random.exponential(1/lam1, n_samples)
    X2 = np.random.exponential(1/lam2, n_samples)
    sum_samples = X1 + X2
    empirical_mean = np.mean(sum_samples)
    empirical_var = np.var(sum_samples)

    # Approach 2: MGF-based calculation
    # M_X1(t) = lam1/(lam1 - t), M_X2(t) = lam2/(lam2 - t)
    # Their product is the MGF of the sum; differentiating at t = 0 gives:
    theoretical_mean = 1/lam1 + 1/lam2
    theoretical_var = 1/lam1**2 + 1/lam2**2

    print("Sum of two exponentials:")
    print(f"Empirical mean: {empirical_mean:.4f}")
    print(f"MGF-derived mean: {theoretical_mean:.4f}")
    print(f"Empirical variance: {empirical_var:.4f}")
    print(f"MGF-derived variance: {theoretical_var:.4f}")

compare_sum_approaches()
```
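The product property also identifies sums of Poisson variables. Multiplying two Poisson MGFs from the table amounts to adding their exponents, which match the exponent of a Poisson(λ₁+λ₂) MGF; a symbolic sketch with SymPy:

```python
from sympy import symbols, exp, expand

t, lam1, lam2 = symbols('t lam1 lam2')

# Exponents of the two Poisson MGFs; multiplying MGFs adds exponents
exp1 = lam1 * (exp(t) - 1)
exp2 = lam2 * (exp(t) - 1)
exp_sum = expand(exp1 + exp2)

# Exponent of a Poisson(λ1 + λ2) MGF
exp_target = expand((lam1 + lam2) * (exp(t) - 1))

print(exp_sum == exp_target)  # True: the sum is Poisson(λ1 + λ2)
```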
Practical Applications and Limitations
MGFs excel in theoretical derivations and when working with sums of independent variables. In risk analysis, if you model individual losses as random variables, the MGF of total loss is simply the product of individual MGFs (assuming independence).
However, MGFs have important limitations:
When MGFs don’t exist: Heavy-tailed distributions like Cauchy or Pareto often lack MGFs. Use characteristic functions instead—they always exist.
Numerical stability: For large t or extreme parameter values, MGFs can overflow. The log-MGF (cumulant generating function) is often more stable.
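To illustrate the stability point (an illustrative sketch, with arbitrary parameter values): for a normal distribution the MGF overflows float64 at moderate t, while the cumulant generating function K(t) = log M(t) = μt + σ²t²/2 stays perfectly representable.

```python
import numpy as np

mu, sigma = 0.0, 10.0
t = 50.0

# Direct MGF evaluation overflows in float64
with np.errstate(over='ignore'):
    mgf = np.exp(mu * t + 0.5 * sigma**2 * t**2)
print(mgf)  # inf

# The cumulant generating function K(t) = log M(t) is stable
cgf = mu * t + 0.5 * sigma**2 * t**2
print(cgf)  # 125000.0
```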
When to use MGFs:
- Proving theoretical results about distributions
- Finding distributions of sums
- Calculating moments when integration is difficult
When to use alternatives:
- Heavy-tailed distributions → characteristic functions
- Numerical computation → direct moment calculation
- High-precision work → cumulant generating functions
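The Cauchy case can be made concrete numerically (a sketch; the value t = 0.1 and the truncation cutoffs are illustrative choices). Truncating the MGF integral at growing cutoffs shows it blowing up instead of converging, while the characteristic function φ(t) = e^(-|t|) exists for every t:

```python
import numpy as np
from scipy.integrate import quad

# Standard Cauchy density: heavy 1/x² tails
cauchy_pdf = lambda x: 1.0 / (np.pi * (1 + x**2))

# E[e^{tX}] diverges for any t ≠ 0: the truncated integral keeps growing
vals = []
for upper in [10, 100, 1000]:
    val, _ = quad(lambda x: np.exp(0.1 * x) * cauchy_pdf(x), -upper, upper)
    vals.append(val)
    print(f"Integral over [-{upper}, {upper}]: {val:.3e}")

# The characteristic function, by contrast, is finite everywhere
print(f"phi(1) = {np.exp(-1.0):.4f}")
```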
Here’s a complete workflow demonstrating MGF application:
```python
import numpy as np
from scipy import stats

def portfolio_risk_mgf(returns, weights, confidence=0.95):
    """
    Analyze portfolio risk using MGF properties.
    Assumes returns are normally distributed and independent.
    """
    # Estimate parameters for each asset
    means = np.array([np.mean(r) for r in returns])
    stds = np.array([np.std(r) for r in returns])

    # Portfolio moments via the MGF product property
    portfolio_mean = np.dot(weights, means)
    portfolio_var = np.dot(weights**2, stds**2)  # independence: covariances are zero
    portfolio_std = np.sqrt(portfolio_var)

    # Value at Risk from the normal quantile
    z_score = stats.norm.ppf(1 - confidence)
    var = portfolio_mean + z_score * portfolio_std

    print(f"Portfolio mean return: {portfolio_mean:.4f}")
    print(f"Portfolio std dev: {portfolio_std:.4f}")
    print(f"Value at Risk ({confidence:.0%}): {var:.4f}")
    return portfolio_mean, portfolio_std, var

# Simulate asset returns
np.random.seed(42)
asset1 = np.random.normal(0.08, 0.15, 252)  # 8% mean, 15% vol
asset2 = np.random.normal(0.06, 0.10, 252)  # 6% mean, 10% vol
asset3 = np.random.normal(0.10, 0.20, 252)  # 10% mean, 20% vol

returns = [asset1, asset2, asset3]
weights = np.array([0.4, 0.3, 0.3])
portfolio_risk_mgf(returns, weights)
```
Conclusion
Moment generating functions are a fundamental tool in probability theory that transform the problem of computing moments into simple differentiation. Their uniqueness property makes them perfect for identifying distributions, while the product property for independent sums makes otherwise intractable calculations trivial.
Reach for MGFs when you need to derive theoretical results, work with sums of independent random variables, or calculate moments without direct integration. Remember their limitations: they don’t exist for all distributions, and numerical computation can be unstable for extreme values.
For practitioners, MGFs sit in that sweet spot between pure theory and computational statistics. They’re not always the fastest numerical method, but understanding them deepens your intuition about probability distributions and often reveals elegant solutions to complex problems.
Master MGFs, and you’ll find yourself recognizing patterns in probability problems that others miss—patterns that lead to simpler, more insightful solutions.