Moment Generating Functions: Formula and Examples

Key Insights

  • Moment generating functions uniquely characterize probability distributions and simplify calculating moments through differentiation rather than integration
  • The product property of MGFs makes finding distributions of sums of independent random variables trivial compared to convolution-based approaches
  • MGFs don’t exist for all distributions (heavy-tailed cases), making characteristic functions the more general tool, but MGFs are computationally simpler when they do exist

Introduction to Moment Generating Functions

A moment generating function (MGF) is a mathematical transform that encodes all moments of a probability distribution into a single function. If you’ve ever needed to find the mean, variance, or higher moments of a random variable, you’ve likely done it through direct integration or summation. MGFs provide an alternative: encode the distribution once, then extract any moment through simple differentiation.

The MGF exists because of a beautiful property of exponential functions: their Taylor series expansion naturally produces moments as coefficients. This makes MGFs particularly powerful for two tasks: uniquely identifying distributions and finding the distribution of sums of independent random variables.

Understanding MGFs is essential for anyone working with probability theory, statistical inference, or stochastic processes. They appear throughout mathematical statistics, from proving the Central Limit Theorem to analyzing queuing systems and portfolio risk.

The MGF Formula and Mathematical Properties

The moment generating function of a random variable X is defined as:

M_X(t) = E[e^(tX)]

For discrete random variables: M_X(t) = Σ e^(tx) P(X = x)

For continuous random variables: M_X(t) = ∫ e^(tx) f_X(x) dx

The parameter t is real-valued, and the MGF exists only if this expectation is finite in some neighborhood around t = 0. This is a critical limitation—distributions with heavy tails (like Cauchy) don’t have MGFs.

Three properties make MGFs invaluable:

  1. Uniqueness: If two random variables have the same MGF, they have the same distribution
  2. Moment extraction: The nth moment equals the nth derivative evaluated at t = 0
  3. Independence: For independent X and Y, M_(X+Y)(t) = M_X(t) × M_Y(t)

Here’s a Python implementation for computing MGFs numerically for discrete distributions:

import numpy as np
from scipy.integrate import quad

def mgf_discrete(values, probabilities, t):
    """
    Compute MGF for a discrete distribution.
    
    Args:
        values: array of possible values
        probabilities: corresponding probabilities
        t: parameter value
    
    Returns:
        MGF value at t
    """
    return np.sum(np.exp(t * values) * probabilities)

# Example: Fair die
die_values = np.array([1, 2, 3, 4, 5, 6])
die_probs = np.array([1/6] * 6)

# Compute MGF at t = 0.5
mgf_value = mgf_discrete(die_values, die_probs, 0.5)
print(f"MGF at t=0.5: {mgf_value:.4f}")

def mgf_continuous(pdf, t, lower=-np.inf, upper=np.inf):
    """
    Compute MGF for a continuous distribution.
    
    Args:
        pdf: probability density function
        t: parameter value
        lower, upper: integration bounds
    
    Returns:
        MGF value at t
    """
    integrand = lambda x: np.exp(t * x) * pdf(x)
    result, _ = quad(integrand, lower, upper)
    return result

Deriving Moments from MGFs

The power of MGFs lies in their relationship to moments. The nth moment of X equals:

E[X^n] = M_X^(n)(0)

where M_X^(n)(0) denotes the nth derivative of M_X(t) evaluated at t = 0.

Why does this work? Expand e^(tX) as a Taylor series:

e^(tX) = 1 + tX + (t²X²)/2! + (t³X³)/3! + …

Taking expectations:

M_X(t) = 1 + tE[X] + (t²E[X²])/2! + (t³E[X³])/3! + …

Differentiating once and setting t = 0 gives E[X]. Differentiating twice gives E[X²], and so on.

For variance, we use: Var(X) = E[X²] - (E[X])² = M_X''(0) - (M_X'(0))²

Here’s a practical implementation using symbolic differentiation:

from sympy import symbols, exp, diff, lambdify
import sympy as sp

def extract_moments(mgf_expr, t_symbol, num_moments=4):
    """
    Extract moments from a symbolic MGF expression.
    
    Args:
        mgf_expr: SymPy expression for MGF
        t_symbol: SymPy symbol for t
        num_moments: number of moments to compute
    
    Returns:
        List of moments
    """
    moments = []
    for n in range(1, num_moments + 1):
        derivative = diff(mgf_expr, t_symbol, n)
        moment = derivative.subs(t_symbol, 0)
        moments.append(float(moment))
    return moments

# Example: Exponential distribution with rate λ = 2
t = symbols('t')
lam = 2
mgf_exp = lam / (lam - t)

moments = extract_moments(mgf_exp, t, num_moments=3)
print(f"Mean: {moments[0]}")
print(f"Second moment: {moments[1]}")
print(f"Variance: {moments[1] - moments[0]**2}")

Common Distribution MGFs

Here are MGFs for standard distributions:

Distribution    Parameters    MGF
Exponential     λ             λ/(λ - t), for t < λ
Normal          μ, σ²         exp(μt + σ²t²/2)
Poisson         λ             exp(λ(e^t - 1))
Binomial        n, p          (1 - p + pe^t)^n

Let’s derive the exponential distribution MGF. For X ~ Exp(λ):

M_X(t) = ∫₀^∞ e^(tx) λe^(-λx) dx = λ ∫₀^∞ e^((t-λ)x) dx

This integral converges only when t < λ, giving:

M_X(t) = λ/(λ - t)
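
We can confirm this derivation symbolically (an addition using SymPy; the substitution t = λ - s with s > 0 encodes the convergence condition t < λ):

```python
import sympy as sp

x, lam, s = sp.symbols('x lam s', positive=True)

# Integrand from the derivation: lam * e^((t - lam) x).
# With t = lam - s (s > 0) the exponent is -s * x, so the integral converges.
integral = sp.integrate(lam * sp.exp(-s * x), (x, 0, sp.oo))
print(integral)  # lam/s, i.e. lam/(lam - t) after substituting back s = lam - t
```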

Here’s an implementation with verification:

import numpy as np
from scipy import stats

class DistributionMGF:
    """MGFs for common distributions with moment verification."""
    
    @staticmethod
    def exponential(t, lam):
        """Exponential distribution MGF."""
        if t >= lam:
            return np.inf
        return lam / (lam - t)
    
    @staticmethod
    def normal(t, mu, sigma):
        """Normal distribution MGF."""
        return np.exp(mu * t + 0.5 * sigma**2 * t**2)
    
    @staticmethod
    def poisson(t, lam):
        """Poisson distribution MGF."""
        return np.exp(lam * (np.exp(t) - 1))
    
    @staticmethod
    def binomial(t, n, p):
        """Binomial distribution MGF."""
        return (1 - p + p * np.exp(t))**n

# Verify exponential distribution moments
lam = 3.0
mgf = DistributionMGF()

# Numerical differentiation for mean
h = 1e-8
mean_from_mgf = (mgf.exponential(h, lam) - mgf.exponential(0, lam)) / h
theoretical_mean = 1 / lam

print(f"Mean from MGF: {mean_from_mgf:.6f}")
print(f"Theoretical mean: {theoretical_mean:.6f}")

# Verify using scipy
X = stats.expon(scale=1/lam)
print(f"Scipy mean: {X.mean():.6f}")

MGFs for Sums of Independent Random Variables

The most powerful property of MGFs is how they handle sums. If X and Y are independent:

M_(X+Y)(t) = M_X(t) × M_Y(t)

This is far simpler than computing the convolution of probability distributions. Consider summing n independent exponential random variables: the convolution approach requires n - 1 convolution integrals, while the MGF approach is a single product of n factors.
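
To make that concrete, multiplying n copies of the exponential MGF gives (λ/(λ - t))ⁿ, which is the MGF of a Gamma (Erlang) distribution. Here's a SymPy sketch (the choice n = 4 is illustrative, not from the text above):

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
n = 4  # number of i.i.d. Exp(lam) terms, an illustrative choice

# MGF of one Exp(lam) variable, valid for t < lam
single = lam / (lam - t)

# Product property: MGF of the sum of n independent copies
summed = sp.simplify(single**n)  # (lam/(lam - t))**n, the Erlang(n, lam) MGF

# Mean of the sum via the first derivative at t = 0
mean = sp.diff(summed, t).subs(t, 0)
print(sp.simplify(mean))  # n/lam, i.e. 4/lam
```

The first derivative at zero recovers the expected mean n/λ, matching the fact that each Exp(λ) term contributes 1/λ.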

Example: Sum of two independent normal distributions. If X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²):

M_(X+Y)(t) = exp(μ₁t + σ₁²t²/2) × exp(μ₂t + σ₂²t²/2) = exp((μ₁+μ₂)t + (σ₁²+σ₂²)t²/2)

This is the MGF of N(μ₁+μ₂, σ₁²+σ₂²), proving that the sum is also normal.
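
That algebra can be checked symbolically as well (an added sketch, not part of the original article):

```python
import sympy as sp

t, mu1, mu2 = sp.symbols('t mu1 mu2', real=True)
s1, s2 = sp.symbols('sigma1 sigma2', positive=True)

def normal_mgf(t, mu, sigma):
    """Closed-form normal MGF: exp(mu*t + sigma^2 * t^2 / 2)."""
    return sp.exp(mu * t + sigma**2 * t**2 / 2)

# Product of the two individual MGFs
product = normal_mgf(t, mu1, s1) * normal_mgf(t, mu2, s2)

# MGF of N(mu1 + mu2, sigma1^2 + sigma2^2)
target = normal_mgf(t, mu1 + mu2, sp.sqrt(s1**2 + s2**2))

# The difference simplifies to zero, so the sum is indeed normal
print(sp.simplify(product - target))  # 0
```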

def compare_sum_approaches(n_samples=10000):
    """Compare convolution vs MGF for sum of random variables."""
    
    # Two independent exponential RVs
    lam1, lam2 = 2.0, 3.0
    
    # Approach 1: Direct simulation (convolution in practice)
    X1 = np.random.exponential(1/lam1, n_samples)
    X2 = np.random.exponential(1/lam2, n_samples)
    sum_samples = X1 + X2
    
    empirical_mean = np.mean(sum_samples)
    empirical_var = np.var(sum_samples)
    
    # Approach 2: MGF-based calculation
    # M_X1(t) = lam1/(lam1-t), M_X2(t) = lam2/(lam2-t)
    # Product gives MGF of sum
    # Mean from MGF: d/dt[product]|_{t=0}
    
    theoretical_mean = 1/lam1 + 1/lam2
    theoretical_var = 1/lam1**2 + 1/lam2**2
    
    print("Sum of two exponentials:")
    print(f"Empirical mean: {empirical_mean:.4f}")
    print(f"MGF-derived mean: {theoretical_mean:.4f}")
    print(f"Empirical variance: {empirical_var:.4f}")
    print(f"MGF-derived variance: {theoretical_var:.4f}")

compare_sum_approaches()

Practical Applications and Limitations

MGFs excel in theoretical derivations and when working with sums of independent variables. In risk analysis, if you model individual losses as random variables, the MGF of total loss is simply the product of individual MGFs (assuming independence).

However, MGFs have important limitations:

When MGFs don’t exist: Heavy-tailed distributions like Cauchy or Pareto often lack MGFs. Use characteristic functions instead—they always exist.
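
You can see the failure directly by truncating the defining integral for the Cauchy distribution (an illustrative sketch): as the integration window widens, the value grows without bound instead of converging.

```python
import numpy as np
from scipy.integrate import quad

def cauchy_pdf(x):
    """Standard Cauchy density: 1 / (pi * (1 + x^2))."""
    return 1.0 / (np.pi * (1.0 + x**2))

def truncated_mgf(t, L):
    """Integral of e^(t x) * pdf(x) over [-L, L] only."""
    result, _ = quad(lambda x: np.exp(t * x) * cauchy_pdf(x), -L, L)
    return result

# Widening the window makes the value blow up: no finite limit exists
for L in (10, 50, 100):
    print(L, truncated_mgf(0.5, L))
```

The exponential growth of e^(tx) overwhelms the polynomial tail decay of the Cauchy density, so E[e^(tX)] is infinite for every t ≠ 0.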

Numerical stability: For large t or extreme parameter values, MGFs can overflow. The log-MGF (cumulant generating function) is often more stable.
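
Here's one way to compute the log-MGF stably for a discrete distribution (an added sketch using scipy.special.logsumexp, not code from the article): working in log space avoids ever exponentiating a huge number.

```python
import numpy as np
from scipy.special import logsumexp

def log_mgf_discrete(values, probabilities, t):
    """Cumulant generating function K(t) = log M(t), computed stably.

    logsumexp evaluates log(sum(exp(t*x + log p))) without ever
    forming the overflowing intermediate terms exp(t*x).
    """
    return logsumexp(t * values + np.log(probabilities))

values = np.arange(1, 7)  # fair die
probs = np.full(6, 1 / 6)

t = 500.0  # large enough that exp(t * 6) overflows a float64
naive = np.sum(np.exp(t * values) * probs)   # overflows to inf
stable = log_mgf_discrete(values, probs, t)  # finite log-MGF

print(naive, stable)
```

The naive computation returns inf, while the log-space version returns a finite value close to 500 · 6 + log(1/6), which is what dominates the sum.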

When to use MGFs:

  • Proving theoretical results about distributions
  • Finding distributions of sums
  • Calculating moments when integration is difficult

When to use alternatives:

  • Heavy-tailed distributions → characteristic functions
  • Numerical computation → direct moment calculation
  • High-precision work → cumulant generating functions

Here’s a complete workflow demonstrating MGF application:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def portfolio_risk_mgf(returns, weights, confidence=0.95):
    """
    Analyze portfolio risk using MGF approach.
    Assumes returns are normally distributed.
    """
    # Estimate parameters for each asset
    means = np.array([np.mean(r) for r in returns])
    stds = np.array([np.std(r) for r in returns])
    
    # Portfolio moments using MGF properties
    portfolio_mean = np.dot(weights, means)
    portfolio_var = np.dot(weights**2, stds**2)  # Assumes independence
    portfolio_std = np.sqrt(portfolio_var)
    
    # Value at Risk using normal MGF
    z_score = stats.norm.ppf(1 - confidence)
    var = portfolio_mean + z_score * portfolio_std
    
    print(f"Portfolio mean return: {portfolio_mean:.4f}")
    print(f"Portfolio std dev: {portfolio_std:.4f}")
    print(f"Value at Risk ({confidence:.0%}): {var:.4f}")
    
    return portfolio_mean, portfolio_std, var

# Simulate asset returns
np.random.seed(42)
asset1 = np.random.normal(0.08, 0.15, 252)  # 8% mean, 15% vol
asset2 = np.random.normal(0.06, 0.10, 252)  # 6% mean, 10% vol
asset3 = np.random.normal(0.10, 0.20, 252)  # 10% mean, 20% vol

returns = [asset1, asset2, asset3]
weights = np.array([0.4, 0.3, 0.3])

portfolio_risk_mgf(returns, weights)

Conclusion

Moment generating functions are a fundamental tool in probability theory that transform the problem of computing moments into simple differentiation. Their uniqueness property makes them perfect for identifying distributions, while the product property for independent sums makes otherwise intractable calculations trivial.

Reach for MGFs when you need to derive theoretical results, work with sums of independent random variables, or calculate moments without direct integration. Remember their limitations: they don’t exist for all distributions, and numerical computation can be unstable for extreme values.

For practitioners, MGFs sit in that sweet spot between pure theory and computational statistics. They’re not always the fastest numerical method, but understanding them deepens your intuition about probability distributions and often reveals elegant solutions to complex problems.

Master MGFs, and you’ll find yourself recognizing patterns in probability problems that others miss—patterns that lead to simpler, more insightful solutions.
