How to Perform the Wald Test in Python
Key Insights
- The Wald test is the default method statsmodels uses to assess coefficient significance in regression models—those z-scores and p-values in your summary output are Wald statistics
- Use the .wald_test() method to test complex hypotheses, like whether multiple coefficients are simultaneously zero or whether two coefficients are equal to each other
- Wald tests can be unreliable with small samples or when parameters are near boundary values; consider the likelihood ratio test as a more robust alternative in these situations
Introduction to the Wald Test
The Wald test is one of the three classical approaches to hypothesis testing in statistical models, alongside the likelihood ratio test and the score test. Named after statistician Abraham Wald, it’s the go-to method for testing whether model parameters are significantly different from a hypothesized value—usually zero.
Here’s the practical reality: if you’ve ever looked at a regression summary in Python and checked whether a coefficient’s p-value is below 0.05, you’ve already used the Wald test. It’s baked into the standard output of most statistical modeling libraries.
The test works by comparing your estimated parameter to its hypothesized value, scaled by the standard error of that estimate. When your sample size is large enough, this ratio follows a known distribution, letting you calculate p-values and make inference decisions.
You’ll reach for the Wald test when you need to:
- Determine if individual predictors are significant in logistic or linear regression
- Test whether multiple coefficients are jointly significant
- Check if coefficients satisfy specific constraints (like being equal to each other)
The test pairs naturally with maximum likelihood estimation because MLE provides both the parameter estimates and the standard errors needed for the calculation.
Mathematical Foundation
The Wald statistic has a straightforward formula. For a single parameter:
$$W = \frac{(\hat{\theta} - \theta_0)^2}{\text{Var}(\hat{\theta})}$$
Where $\hat{\theta}$ is your estimated parameter, $\theta_0$ is the hypothesized value (typically zero), and $\text{Var}(\hat{\theta})$ is the variance of the estimate.
For testing a single coefficient, you’ll often see this expressed as a z-statistic:
$$z = \frac{\hat{\theta} - \theta_0}{\text{SE}(\hat{\theta})}$$
Under the null hypothesis, this z-statistic follows a standard normal distribution asymptotically. The squared z-statistic follows a chi-squared distribution with one degree of freedom.
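The equivalence between the two forms is easy to confirm numerically. A quick check with an arbitrary illustrative z value:

```python
from scipy import stats

# Arbitrary illustrative z-statistic (not from any fitted model)
z = 2.392

p_from_z = 2 * stats.norm.sf(abs(z))       # two-tailed p-value from the normal
p_from_chi2 = stats.chi2.sf(z ** 2, df=1)  # p-value for the squared statistic

# The two p-values agree to floating-point precision
print(p_from_z, p_from_chi2)
```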
When testing multiple parameters simultaneously, the Wald statistic generalizes to:
$$W = (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})^T [\mathbf{R} \hat{\mathbf{V}} \mathbf{R}^T]^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})$$
Where $\mathbf{R}$ is a restriction matrix, $\hat{\boldsymbol{\beta}}$ is your coefficient vector, $\mathbf{r}$ is the vector of hypothesized values, and $\hat{\mathbf{V}}$ is the estimated covariance matrix. This statistic follows a chi-squared distribution with degrees of freedom equal to the number of restrictions.
Don’t let the matrix notation intimidate you—Python handles all of this automatically.
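Still, the quadratic form maps directly onto numpy operations. A minimal sketch with made-up toy numbers (no fitted model involved):

```python
import numpy as np
from scipy import stats

# Toy values, purely illustrative
beta_hat = np.array([0.5, 0.8, -0.3])       # estimated coefficients
V_hat = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])      # estimated covariance matrix
R = np.array([[0, 1, 0],
              [0, 0, 1]])                   # restrict the 2nd and 3rd coefficients
r = np.zeros(2)                             # hypothesized values (both zero)

diff = R @ beta_hat - r
W = diff @ np.linalg.inv(R @ V_hat @ R.T) @ diff
p_value = stats.chi2.sf(W, df=R.shape[0])   # df = number of restrictions
print(W, p_value)
```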
Wald Test in Logistic Regression with statsmodels
Let’s start with the most common scenario: interpreting Wald statistics from a logistic regression model. The summary output gives you everything you need.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Generate sample data
np.random.seed(42)
n = 500
age = np.random.normal(45, 12, n)
income = np.random.normal(50000, 15000, n)
education_years = np.random.normal(14, 3, n)

# Create binary outcome with known relationships
log_odds = -5 + 0.03 * age + 0.00004 * income + 0.15 * education_years
prob = 1 / (1 + np.exp(-log_odds))
purchased = np.random.binomial(1, prob)

# Prepare data
df = pd.DataFrame({
    'age': age,
    'income': income,
    'education_years': education_years,
    'purchased': purchased
})
X = sm.add_constant(df[['age', 'income', 'education_years']])
y = df['purchased']

# Fit logistic regression
model = sm.Logit(y, X)
result = model.fit()
print(result.summary())
```
The output includes a table with columns for coefficient estimates, standard errors, z-values, and p-values:
```
                        Logit Regression Results
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
const              -4.8732      0.892     -5.463      0.000      -6.622      -3.124
age                 0.0287      0.012      2.392      0.017       0.005       0.052
income              0.0000      0.000      2.847      0.004       0.000       0.000
education_years     0.1423      0.048      2.965      0.003       0.048       0.236
===================================================================================
```
Those z-values are Wald statistics. The p-values tell you whether each coefficient is significantly different from zero. In this output, all predictors are significant at the 0.05 level.
You can also access these values programmatically:
```python
# Extract Wald statistics
print("Wald z-statistics:", result.tvalues)
print("P-values:", result.pvalues)
print("Standard errors:", result.bse)
```
Testing Linear Restrictions with wald_test()
The real power of the Wald test emerges when you need to test hypotheses beyond “is this coefficient zero?” The .wald_test() method handles complex restrictions elegantly.
Testing if multiple coefficients are jointly zero:
```python
# Test if age and income coefficients are both zero
# This tests the joint significance of these predictors
hypothesis = '(age = 0), (income = 0)'
wald_result = result.wald_test(hypothesis)

print(f"Wald statistic: {wald_result.statistic[0][0]:.4f}")
print(f"P-value: {wald_result.pvalue:.4f}")
print(f"Degrees of freedom: {wald_result.df_denom}")
```
Testing coefficient equality:
```python
# Test if the effect of age equals the effect of education
# (after appropriate scaling consideration)
hypothesis_equality = 'age = education_years'
wald_equality = result.wald_test(hypothesis_equality)

print(f"Wald statistic: {wald_equality.statistic[0][0]:.4f}")
print(f"P-value: {wald_equality.pvalue:.4f}")
```
Using restriction matrices directly:
For more control, specify restrictions as matrices:
```python
# Test that age coefficient equals 0.03
# R @ params = r format
R = np.array([[0, 1, 0, 0]])  # Select age coefficient
r = np.array([0.03])          # Hypothesized value

wald_matrix = result.wald_test((R, r))
print("Testing if age coefficient = 0.03")
print(f"Wald statistic: {wald_matrix.statistic[0][0]:.4f}")
print(f"P-value: {wald_matrix.pvalue:.4f}")
```
Manual Wald Test Implementation
Understanding the manual calculation helps you verify results and implement custom tests. Here’s how to compute the Wald statistic from scratch:
```python
def manual_wald_test(estimate, std_error, null_value=0):
    """
    Calculate Wald statistic and p-value for a single parameter.

    Parameters
    ----------
    estimate : float
        The estimated parameter value
    std_error : float
        Standard error of the estimate
    null_value : float
        Hypothesized value under H0 (default: 0)

    Returns
    -------
    dict with z_statistic, wald_statistic, and p_value
    """
    from scipy import stats

    z_stat = (estimate - null_value) / std_error
    wald_stat = z_stat ** 2
    # Two-tailed p-value from normal distribution
    p_value_z = 2 * (1 - stats.norm.cdf(abs(z_stat)))
    # Equivalent p-value from chi-squared distribution
    p_value_chi2 = 1 - stats.chi2.cdf(wald_stat, df=1)
    return {
        'z_statistic': z_stat,
        'wald_statistic': wald_stat,
        'p_value': p_value_z,
        'p_value_chi2': p_value_chi2
    }

# Verify against statsmodels output
age_coef = result.params['age']
age_se = result.bse['age']
manual_result = manual_wald_test(age_coef, age_se)

print(f"Manual z-statistic: {manual_result['z_statistic']:.4f}")
print(f"Statsmodels z-statistic: {result.tvalues['age']:.4f}")
print(f"Manual p-value: {manual_result['p_value']:.4f}")
print(f"Statsmodels p-value: {result.pvalues['age']:.4f}")
```
For multivariate tests:
```python
def multivariate_wald_test(R, params, cov_matrix, r=None):
    """
    Calculate Wald statistic for linear restrictions R @ params = r.
    """
    from scipy import stats

    if r is None:
        r = np.zeros(R.shape[0])
    diff = R @ params - r
    var_diff = R @ cov_matrix @ R.T
    wald_stat = diff.T @ np.linalg.inv(var_diff) @ diff
    df = R.shape[0]
    p_value = 1 - stats.chi2.cdf(wald_stat, df=df)
    return {'wald_statistic': wald_stat, 'df': df, 'p_value': p_value}

# Test age and income jointly equal zero
R = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0]])
params = result.params.values
cov = result.cov_params().values
manual_joint = multivariate_wald_test(R, params, cov)
print(f"Manual joint Wald statistic: {manual_joint['wald_statistic']:.4f}")
```
Wald Test with scipy.stats
When working outside statsmodels or building custom testing procedures, scipy.stats provides the distribution functions you need:
```python
from scipy import stats

# Given a Wald statistic and degrees of freedom, calculate p-value
wald_statistic = 8.52
degrees_of_freedom = 2
p_value = 1 - stats.chi2.cdf(wald_statistic, df=degrees_of_freedom)
print(f"Chi-squared p-value: {p_value:.4f}")

# For single-parameter tests, use the normal distribution
z_statistic = 2.39
p_value_two_tailed = 2 * stats.norm.sf(abs(z_statistic))
print(f"Two-tailed p-value from z: {p_value_two_tailed:.4f}")

# Calculate critical values
alpha = 0.05
critical_chi2 = stats.chi2.ppf(1 - alpha, df=degrees_of_freedom)
critical_z = stats.norm.ppf(1 - alpha / 2)
print(f"Chi-squared critical value (df=2): {critical_chi2:.4f}")
print(f"Z critical value (two-tailed): {critical_z:.4f}")
```
This approach is particularly useful when implementing Wald tests for custom estimators or when you need fine-grained control over the testing procedure.
Limitations and Alternatives
The Wald test has known weaknesses you should understand:
Small sample problems: The Wald test relies on asymptotic theory. With small samples, the actual distribution of the test statistic may differ substantially from the assumed chi-squared distribution, leading to incorrect p-values.
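One way to see the size of the gap: the asymptotic normal critical value the Wald z-test uses is noticeably more lenient than the small-sample t critical value it approximates (a df of 10 is used here purely for illustration):

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)      # asymptotic (Wald) cutoff
t_crit = stats.t.ppf(1 - alpha / 2, df=10)  # small-sample cutoff

# The Wald cutoff (about 1.96) is smaller than the t cutoff (about 2.23),
# so in small samples it rejects the null more often than it should
print(round(z_crit, 3), round(t_crit, 3))
```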
Boundary parameter issues: When true parameter values are near the boundary of the parameter space (like probabilities near 0 or 1), Wald tests become unreliable. The standard errors can be poorly estimated in these regions.
Parameterization dependence: Unlike the likelihood ratio test, Wald test results can change depending on how you parameterize your model. Testing $\theta = 0$ gives different results than testing $e^\theta = 1$, even though these are mathematically equivalent hypotheses.
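A minimal sketch of this effect, using made-up numbers: testing $\theta = 0$ directly versus testing $e^\theta = 1$ with a delta-method standard error produces different z-statistics, even though the hypotheses are equivalent:

```python
import numpy as np
from scipy import stats

# Made-up estimate and standard error, purely for illustration
theta_hat, se = 0.5, 0.25

# Wald z for H0: theta = 0
z_direct = theta_hat / se

# Wald z for the equivalent H0: exp(theta) = 1,
# with SE(exp(theta_hat)) = exp(theta_hat) * se from the delta method
g = np.exp(theta_hat)
z_transformed = (g - 1) / (g * se)

p_direct = 2 * stats.norm.sf(abs(z_direct))
p_transformed = 2 * stats.norm.sf(abs(z_transformed))
print(round(z_direct, 3), round(z_transformed, 3))  # different statistics
```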
Alternatives to consider:
The likelihood ratio test compares the fit of nested models directly and is generally more reliable with smaller samples. In statsmodels:
```python
from scipy import stats

# Fit restricted model (without age)
X_restricted = sm.add_constant(df[['income', 'education_years']])
result_restricted = sm.Logit(y, X_restricted).fit()

# Likelihood ratio test: twice the log-likelihood difference,
# chi-squared with df = number of restrictions (here 1)
lr_stat = 2 * (result.llf - result_restricted.llf)
lr_pvalue = 1 - stats.chi2.cdf(lr_stat, df=1)
print(f"LR test for age: statistic={lr_stat:.4f}, p-value={lr_pvalue:.4f}")
```
The score test (Lagrange multiplier test) only requires fitting the restricted model, making it computationally efficient when testing many restrictions.
As a rule of thumb: use Wald tests for quick coefficient significance checks in large samples. Switch to likelihood ratio tests when sample sizes are modest, when you’re testing parameters that might be near boundaries, or when the stakes of your inference are high.