How to Perform White's Test for Heteroscedasticity in Python
Key Insights
- White’s test detects heteroscedasticity without assuming a specific functional form, making it more general than Breusch-Pagan when you don’t know how variance relates to your predictors.
- A significant p-value (typically < 0.05) rejects the null hypothesis of homoscedasticity, meaning your OLS standard errors are unreliable and hypothesis tests are invalid.
- When heteroscedasticity is present, robust standard errors (HC3 recommended for small samples) provide valid inference without requiring you to correctly specify the variance structure.
Introduction to Heteroscedasticity
Heteroscedasticity occurs when the variance of regression residuals changes across levels of your independent variables. This violates a core assumption of ordinary least squares (OLS) regression: that error terms have constant variance (homoscedasticity).
The consequences are serious. While your coefficient estimates remain unbiased, your standard errors become incorrect. This means confidence intervals are wrong, t-tests are unreliable, and you might declare effects significant when they’re not—or miss real effects entirely.
Let’s visualize the difference between well-behaved residuals and problematic ones:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
n = 200
# Homoscedastic data: constant variance
x_homo = np.linspace(1, 10, n)
y_homo = 2 + 3 * x_homo + np.random.normal(0, 2, n)
# Heteroscedastic data: variance increases with x
x_hetero = np.linspace(1, 10, n)
y_hetero = 2 + 3 * x_hetero + np.random.normal(0, 0.5 * x_hetero, n)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].scatter(x_homo, y_homo, alpha=0.6)
axes[0].set_title('Homoscedastic Residuals')
axes[0].set_xlabel('X')
axes[0].set_ylabel('Y')
axes[1].scatter(x_hetero, y_hetero, alpha=0.6)
axes[1].set_title('Heteroscedastic Residuals')
axes[1].set_xlabel('X')
axes[1].set_ylabel('Y')
plt.tight_layout()
plt.savefig('heteroscedasticity_comparison.png', dpi=150)
plt.show()
The left plot shows constant spread around the regression line. The right plot shows the classic “funnel” pattern where variance increases with X. Your eyes can often spot severe heteroscedasticity, but formal tests provide objective evidence.
Understanding White’s Test
White’s test, developed by Halbert White in 1980, tests the null hypothesis that residuals are homoscedastic. The alternative hypothesis is that some form of heteroscedasticity exists.
The test works by regressing squared residuals on the original regressors, their squares, and their cross-products. If this auxiliary regression has significant explanatory power, heteroscedasticity is present.
Why choose White’s test over the Breusch-Pagan test? Breusch-Pagan assumes heteroscedasticity follows a specific linear form with respect to your predictors. White’s test makes no such assumption—it’s a general test that detects any form of heteroscedasticity related to your model’s regressors.
The test statistic follows a chi-squared distribution under the null hypothesis:
LM = n * R² (from auxiliary regression)
Where n is sample size and R² comes from regressing squared residuals on the expanded set of regressors. The degrees of freedom equal the number of regressors in the auxiliary regression (excluding the constant).
Setting Up Your Environment
You’ll need statsmodels for regression and diagnostic tests, plus the usual scientific Python stack:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white
import matplotlib.pyplot as plt
# Create a realistic dataset with heteroscedasticity
np.random.seed(123)
n = 500
# Predictors
experience = np.random.uniform(0, 20, n)
education = np.random.uniform(12, 22, n)
# Salary with heteroscedastic errors (spread increases with experience)
base_salary = 30000 + 2000 * experience + 1500 * education
error_std = 1000 + 500 * experience  # Standard deviation grows with experience
errors = np.random.normal(0, error_std, n)
salary = base_salary + errors
# Build DataFrame
df = pd.DataFrame({
    'salary': salary,
    'experience': experience,
    'education': education
})
print(df.describe())
This dataset simulates salary data where prediction uncertainty increases with experience—a realistic scenario where early-career salaries are more predictable than senior-level compensation.
Running White’s Test with Statsmodels
Here’s the complete workflow from regression to diagnostic testing:
# Define features and target
X = df[['experience', 'education']]
y = df['salary']
# Add constant for intercept
X_const = sm.add_constant(X)
# Fit OLS model
model = sm.OLS(y, X_const)
results = model.fit()
print("=== OLS Regression Results ===")
print(results.summary())
# Perform White's test
white_test = het_white(results.resid, X_const)
# Unpack results
lm_stat, lm_pvalue, f_stat, f_pvalue = white_test
print("\n=== White's Test Results ===")
print(f"LM Statistic: {lm_stat:.4f}")
print(f"LM p-value: {lm_pvalue:.4f}")
print(f"F-Statistic: {f_stat:.4f}")
print(f"F p-value: {f_pvalue:.4f}")
The het_white() function returns four values:
- LM Statistic: The Lagrange Multiplier test statistic (n × R² from auxiliary regression)
- LM p-value: Probability of observing this LM statistic under homoscedasticity
- F-Statistic: An F-test version of the same hypothesis
- F p-value: Probability for the F-test
Both tests evaluate the same hypothesis. The LM test is asymptotically valid, while the F-test may perform better in smaller samples.
Interpreting Results and Making Decisions
The interpretation is straightforward: a small p-value means you reject homoscedasticity.
def interpret_white_test(lm_pvalue, alpha=0.05):
    """
    Interpret White's test results and provide recommendations.

    Parameters
    ----------
    lm_pvalue : float
        P-value from White's test LM statistic
    alpha : float
        Significance level (default 0.05)

    Returns
    -------
    dict
        Interpretation and recommendations
    """
    result = {
        'p_value': lm_pvalue,
        'alpha': alpha,
        'reject_null': lm_pvalue < alpha,
        'interpretation': '',
        'recommendation': ''
    }
    if lm_pvalue < alpha:
        result['interpretation'] = (
            f"Heteroscedasticity detected (p={lm_pvalue:.4f} < {alpha}). "
            "OLS standard errors are unreliable."
        )
        result['recommendation'] = (
            "Use heteroscedasticity-robust standard errors (HC3 recommended) "
            "or consider weighted least squares if you can model the variance structure."
        )
    else:
        result['interpretation'] = (
            f"No significant heteroscedasticity detected (p={lm_pvalue:.4f} >= {alpha}). "
            "OLS assumptions appear satisfied."
        )
        result['recommendation'] = (
            "Standard OLS inference is appropriate. Consider robust standard errors "
            "anyway as a safeguard, especially with smaller samples."
        )
    return result
# Apply to our test results
interpretation = interpret_white_test(lm_pvalue)
print(f"\nInterpretation: {interpretation['interpretation']}")
print(f"Recommendation: {interpretation['recommendation']}")
A few practical notes on threshold selection:
- α = 0.05 is conventional but arbitrary. In exploratory analysis, you might use 0.10 to be more conservative about assuming homoscedasticity.
- Very large samples will detect trivial heteroscedasticity. Consider the practical significance, not just statistical significance.
- Very small samples have low power. Failing to reject doesn’t prove the errors are homoscedastic.
Remediation Strategies
When White’s test indicates heteroscedasticity, you have several options. Robust standard errors are the most practical for most applications.
Robust Standard Errors
Statsmodels supports several heteroscedasticity-consistent (HC) covariance estimators:
# Compare standard and robust standard errors
print("=== Standard OLS Results ===")
print(results.summary())
# Refit with cov_type=... — this keeps the pandas labels on params/bse
# (results.get_robustcov_results returns plain arrays, so name-based
# indexing like bse['experience'] would fail)
# HC0: White's original estimator
results_hc0 = model.fit(cov_type='HC0')
# HC1: HC0 with a degrees-of-freedom correction
results_hc1 = model.fit(cov_type='HC1')
# HC2: Accounts for leverage
results_hc2 = model.fit(cov_type='HC2')
# HC3: Best for small samples (more conservative)
results_hc3 = model.fit(cov_type='HC3')
print("\n=== Comparison of Standard Errors ===")
print(f"{'Estimator':<12} {'experience SE':<15} {'education SE':<15}")
print("-" * 42)
print(f"{'OLS':<12} {results.bse['experience']:<15.4f} {results.bse['education']:<15.4f}")
print(f"{'HC0':<12} {results_hc0.bse['experience']:<15.4f} {results_hc0.bse['education']:<15.4f}")
print(f"{'HC1':<12} {results_hc1.bse['experience']:<15.4f} {results_hc1.bse['education']:<15.4f}")
print(f"{'HC2':<12} {results_hc2.bse['experience']:<15.4f} {results_hc2.bse['education']:<15.4f}")
print(f"{'HC3':<12} {results_hc3.bse['experience']:<15.4f} {results_hc3.bse['education']:<15.4f}")
# Use HC3 for inference
print("\n=== Robust Regression Results (HC3) ===")
print(results_hc3.summary())
Which estimator to use? HC3 is generally recommended for samples under 250 observations. It’s more conservative and provides better coverage in finite samples. For large samples, the differences become negligible.
Weighted Least Squares
If you can model how variance changes with your predictors, weighted least squares (WLS) is more efficient:
# Estimate variance function (simplified approach)
# WLS weights should be inverse variances. The error standard deviation
# grows roughly linearly with experience here, so the variance grows with
# its square; add 1 to avoid division by zero
weights = 1 / (df['experience'] + 1) ** 2
# Fit WLS model
wls_model = sm.WLS(y, X_const, weights=weights)
wls_results = wls_model.fit()
print("=== Weighted Least Squares Results ===")
print(wls_results.summary())
# Verify: White's test on WLS residuals
white_test_wls = het_white(wls_results.resid, X_const)
print(f"\nWhite's test on WLS residuals: p-value = {white_test_wls[1]:.4f}")
WLS requires correctly specifying the variance structure. If you get it wrong, you may introduce bias. Robust standard errors are safer when you’re uncertain.
Data Transformations
Log transformations often stabilize variance when dealing with positive, right-skewed data:
# Log transformation (common for salary data)
df['log_salary'] = np.log(df['salary'])
X_log = sm.add_constant(df[['experience', 'education']])
model_log = sm.OLS(df['log_salary'], X_log)
results_log = model_log.fit()
# Test transformed model
white_test_log = het_white(results_log.resid, X_log)
print(f"White's test on log-transformed model: p-value = {white_test_log[1]:.4f}")
Be aware that log transformation changes your model’s interpretation. A coefficient b now means a one-unit increase in the predictor multiplies the expected outcome by exp(b)—approximately a 100·b percent change when b is small—rather than shifting it by a fixed amount.
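As a quick sketch of that conversion (the coefficient value here is illustrative, not taken from the fitted model):

```python
import numpy as np

# In a log-linear model log(y) = a + b*x, a one-unit increase in x
# multiplies y by exp(b). For small b, exp(b) - 1 ≈ b, so b is roughly
# the proportional change; the exact percentage effect is:
b = 0.045  # illustrative coefficient, e.g. on experience
pct_change = 100 * (np.exp(b) - 1)
print(f"One extra unit of x -> {pct_change:.2f}% change in y")
```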
Conclusion
White’s test belongs in your standard regression diagnostic toolkit. Here’s the workflow:
- Fit your OLS model
- Run White’s test on the residuals
- If p-value < 0.05, use robust standard errors (HC3 for small samples, HC0/HC1 for large)
- Report robust standard errors in your results
Don’t skip diagnostics just because your coefficients look reasonable. Heteroscedasticity doesn’t bias your estimates, but it does invalidate your inference. A “significant” result with wrong standard errors is meaningless.
For production code, consider always using robust standard errors. The cost is minimal (slightly wider confidence intervals when homoscedasticity holds), and the protection against misspecification is valuable. Many econometricians now recommend this as default practice.