How to Perform White's Test for Heteroscedasticity in R


Key Insights

  • White’s test detects heteroscedasticity without assuming a specific functional form, making it more general than Breusch-Pagan but potentially less powerful for detecting specific patterns
  • The test works by regressing squared residuals on all regressors, their squares, and cross-products, then checking if this auxiliary regression explains significant variance
  • When heteroscedasticity is detected, use heteroscedasticity-consistent (HC) standard errors rather than transforming your data—this preserves coefficient interpretability while correcting inference

Introduction to Heteroscedasticity

Heteroscedasticity occurs when the variance of residuals in a regression model is not constant across observations. This violates a core assumption of ordinary least squares (OLS) regression: that errors have constant variance (homoscedasticity). While heteroscedasticity doesn’t bias your coefficient estimates, it wreaks havoc on standard errors, making your hypothesis tests and confidence intervals unreliable.

In practice, heteroscedasticity appears constantly. Income data shows more variance at higher levels. Firm-level data exhibits greater variability for larger companies. Time series often display volatility clustering. Ignoring these patterns leads to overconfident conclusions.

Let’s visualize the difference between homoscedastic and heteroscedastic residuals:

set.seed(42)
n <- 200

# Homoscedastic data
x_homo <- runif(n, 1, 10)
y_homo <- 2 + 3 * x_homo + rnorm(n, 0, 2)

# Heteroscedastic data (variance increases with x)
x_hetero <- runif(n, 1, 10)
y_hetero <- 2 + 3 * x_hetero + rnorm(n, 0, 0.5 * x_hetero)

par(mfrow = c(1, 2))

# Plot homoscedastic
plot(x_homo, y_homo, main = "Homoscedastic Residuals",
     xlab = "X", ylab = "Y", pch = 19, col = rgb(0, 0, 0, 0.5))
abline(lm(y_homo ~ x_homo), col = "red", lwd = 2)

# Plot heteroscedastic
plot(x_hetero, y_hetero, main = "Heteroscedastic Residuals",
     xlab = "X", ylab = "Y", pch = 19, col = rgb(0, 0, 0, 0.5))
abline(lm(y_hetero ~ x_hetero), col = "red", lwd = 2)

The heteroscedastic plot shows the classic “fan” or “cone” shape—residual spread increases as X grows. This pattern invalidates standard OLS inference.

Understanding White’s Test

White’s test, developed by Halbert White in 1980, provides a general test for heteroscedasticity. Unlike the Breusch-Pagan test, which assumes a specific linear relationship between variance and regressors, White’s test makes no such assumption. It can detect many forms of heteroscedasticity, including nonlinear patterns.

The test works through an auxiliary regression:

  1. Fit your original model and obtain residuals
  2. Square the residuals
  3. Regress squared residuals on all original regressors, their squares, and all cross-products
  4. Test whether this auxiliary regression has explanatory power using an LM (Lagrange Multiplier) statistic

The null and alternative hypotheses are:

  • H₀: Homoscedasticity (constant variance)
  • H₁: Heteroscedasticity of unknown form

The test statistic follows a chi-squared distribution with degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the intercept).
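As a concrete check, with k original regressors the auxiliary regression has k linear terms, k squared terms, and k(k−1)/2 cross-products. A one-line helper (the name white_df is made up for illustration) computes the degrees of freedom:

```r
# Degrees of freedom of White's test for k original regressors:
# k linear terms + k squared terms + k*(k-1)/2 cross-products
white_df <- function(k) {
  2 * k + choose(k, 2)
}

white_df(2)  # two regressors -> 5 degrees of freedom
white_df(3)  # three regressors -> 9
```

With two regressors (as in the example below), the auxiliary regression has 5 terms beyond the intercept, so the test statistic is compared against a chi-squared distribution with 5 degrees of freedom.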

When to use White’s test vs. Breusch-Pagan:

Use White’s test when you have no prior belief about the form of heteroscedasticity. Use Breusch-Pagan when you suspect variance is linearly related to specific variables. White’s test is more general but less powerful for detecting specific patterns—it trades specificity for flexibility.

Setting Up Your R Environment

You’ll need several packages. The skedastic package provides the cleanest implementation of White’s test. The lmtest package offers the Breusch-Pagan test for comparison. The sandwich package provides robust standard errors when heteroscedasticity is detected.

# Install packages if needed
install.packages(c("skedastic", "lmtest", "sandwich"))

# Load packages
library(skedastic)
library(lmtest)
library(sandwich)

Performing White’s Test Step-by-Step

Let’s work through a complete example using simulated data with known heteroscedasticity:

# Generate heteroscedastic data
set.seed(123)
n <- 300

x1 <- rnorm(n, 50, 10)
x2 <- rnorm(n, 30, 5)

# Error variance depends on x1 (heteroscedastic)
error_sd <- 0.1 * x1
y <- 10 + 0.5 * x1 + 0.8 * x2 + rnorm(n, 0, error_sd)

# Create data frame
df <- data.frame(y = y, x1 = x1, x2 = x2)

# Fit the linear model
model <- lm(y ~ x1 + x2, data = df)
summary(model)

Now perform White’s test using the skedastic package:

# White's test using skedastic (include cross-product terms)
white_result <- white(model, interactions = TRUE)
print(white_result)

The skedastic::white() function handles the auxiliary regression automatically. By default it includes the original regressors and their squares; setting interactions = TRUE adds the cross-products as well, matching White’s original formulation.

For educational purposes, here’s the manual implementation:

# Manual White's test implementation
residuals_sq <- residuals(model)^2

# Create auxiliary regression variables
aux_data <- data.frame(
  resid_sq = residuals_sq,
  x1 = x1,
  x2 = x2,
  x1_sq = x1^2,
  x2_sq = x2^2,
  x1_x2 = x1 * x2
)

# Fit auxiliary regression
aux_model <- lm(resid_sq ~ x1 + x2 + x1_sq + x2_sq + x1_x2, data = aux_data)

# Calculate test statistic (n * R-squared)
r_squared <- summary(aux_model)$r.squared
test_stat <- n * r_squared
df_aux <- length(coef(aux_model)) - 1  # Degrees of freedom

# P-value from chi-squared distribution
p_value <- 1 - pchisq(test_stat, df_aux)

cat("White's Test (Manual)\n")
cat("Test statistic:", round(test_stat, 4), "\n")
cat("Degrees of freedom:", df_aux, "\n")
cat("P-value:", round(p_value, 4), "\n")

Interpreting Results

Let’s examine typical output from White’s test:

# Run White's test and examine output
white_result <- white(model, interactions = TRUE)
print(white_result)

# Example output (exact format depends on your skedastic version):
# White's Test for Heteroscedasticity
# 
# data:  model
# Test statistic = 45.234, df = 5, p-value = 1.234e-08

Interpretation guidelines:

  • Test statistic: The LM statistic (n × R² from auxiliary regression). Larger values indicate stronger evidence of heteroscedasticity.
  • Degrees of freedom: Number of terms in auxiliary regression minus one.
  • P-value: Probability of observing this test statistic under homoscedasticity.

Decision rules:

# Decision framework
alpha <- 0.05  # Significance level

if (white_result$p.value < alpha) {
  cat("Reject H0: Evidence of heteroscedasticity detected.\n")
  cat("Standard errors from OLS are unreliable.\n")
  cat("Consider robust standard errors or WLS.\n")
} else {
  cat("Fail to reject H0: No significant evidence of heteroscedasticity.\n")
  cat("OLS standard errors are likely reliable.\n")
}

A common mistake is treating a non-significant result as proof of homoscedasticity. It’s not—it simply means you lack sufficient evidence to reject homoscedasticity. With small samples, White’s test has limited power.
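To see that limited power concretely, here is a small base-R simulation sketch (no packages required; the single-regressor data-generating process and the sample sizes are illustrative choices). It computes the White LM statistic manually and estimates the rejection rate at two sample sizes:

```r
set.seed(99)

# Manual White LM test for a single-regressor model: TRUE if H0 is rejected
white_reject <- function(n, alpha = 0.05) {
  x <- runif(n, 1, 10)
  y <- 2 + 3 * x + rnorm(n, 0, 0.5 * x)   # error SD grows with x
  e2 <- residuals(lm(y ~ x))^2
  aux <- lm(e2 ~ x + I(x^2))              # auxiliary regression (df = 2)
  stat <- n * summary(aux)$r.squared
  pchisq(stat, df = 2, lower.tail = FALSE) < alpha
}

power_small <- mean(replicate(200, white_reject(30)))
power_large <- mean(replicate(200, white_reject(300)))

cat("Estimated power, n = 30: ", power_small, "\n")
cat("Estimated power, n = 300:", power_large, "\n")
```

Even with fairly strong heteroscedasticity in the data-generating process, the rejection rate at n = 30 is well below the rate at n = 300, so a non-significant result in a small sample is weak evidence at best.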

Comparing with Breusch-Pagan:

# Breusch-Pagan test for comparison
bp_result <- bptest(model)
print(bp_result)

cat("\nComparison:\n")
cat("White's test p-value:", white_result$p.value, "\n")
cat("Breusch-Pagan p-value:", bp_result$p.value, "\n")

When both tests agree, you have stronger evidence. When they disagree, White’s test may be detecting nonlinear heteroscedasticity that Breusch-Pagan misses.

Addressing Heteroscedasticity

Once you’ve detected heteroscedasticity, you have three main options. I recommend robust standard errors as the default approach.

Option 1: Robust Standard Errors (Recommended)

Heteroscedasticity-consistent (HC) standard errors correct inference without changing coefficient estimates:

# Original summary with potentially invalid SEs
summary(model)

# Robust standard errors using sandwich package
robust_se <- vcovHC(model, type = "HC3")

# Coefficient test with robust SEs
coeftest(model, vcov = robust_se)

# Compare standard errors
cat("\nStandard Error Comparison:\n")
cat("Original SEs:", round(sqrt(diag(vcov(model))), 4), "\n")
cat("Robust SEs:  ", round(sqrt(diag(robust_se)), 4), "\n")

The type argument specifies which HC estimator to use:

  • HC0: Original White estimator (downward biased in small samples)
  • HC1: Degrees-of-freedom correction
  • HC3: Recommended for small to moderate samples (more conservative)
  • HC4: Better for high-leverage observations

Use HC3 as your default.
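If you want to see what vcovHC(model, type = "HC3") is doing under the hood, here is a minimal base-R sketch of the HC3 sandwich estimator: (X′X)⁻¹ wrapped around X′ diag(eᵢ²/(1−hᵢᵢ)²) X, where hᵢᵢ are the leverages. The helper name hc3_vcov is made up for illustration; it should agree with the sandwich package up to numerical precision.

```r
# Manual HC3 covariance: (X'X)^-1  X' diag(e_i^2 / (1 - h_ii)^2) X  (X'X)^-1
hc3_vcov <- function(model) {
  X <- model.matrix(model)
  e <- residuals(model)
  h <- hatvalues(model)                 # leverages
  w <- (e / (1 - h))^2                  # HC3 weights
  bread <- solve(crossprod(X))          # (X'X)^-1
  meat  <- crossprod(X, X * w)          # X' diag(w) X
  bread %*% meat %*% bread
}

# Example on a built-in dataset
fit <- lm(dist ~ speed, data = cars)
sqrt(diag(hc3_vcov(fit)))               # robust standard errors
sqrt(diag(vcov(fit)))                   # classical OLS SEs, for comparison
```

The weights shrink toward HC0 when leverage is low but inflate the contribution of high-leverage points, which is why HC3 is the more conservative choice in small samples.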

Option 2: Weighted Least Squares

If you know the variance structure, WLS can be more efficient:

# Estimate weights (inverse of estimated variance)
# Here we assume variance proportional to x1^2
weights <- 1 / (x1^2)

# Fit weighted least squares
wls_model <- lm(y ~ x1 + x2, data = df, weights = weights)
summary(wls_model)

# Check if heteroscedasticity is resolved
white(wls_model, interactions = TRUE)

WLS requires knowing or correctly specifying the variance function. If you specify it incorrectly, you may introduce bias. This is why robust standard errors are often preferred.
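One middle ground is feasible GLS: estimate the variance function from a first-stage OLS fit, then use its inverse as weights. The sketch below assumes (as an illustrative choice) that the log of the error variance is roughly linear in the fitted values, and regenerates the same simulated data used above so it runs standalone:

```r
set.seed(123)
n <- 300
x1 <- rnorm(n, 50, 10)
x2 <- rnorm(n, 30, 5)
y <- 10 + 0.5 * x1 + 0.8 * x2 + rnorm(n, 0, 0.1 * x1)
df <- data.frame(y, x1, x2)

# Step 1: OLS, then model the log of squared residuals on the fitted values
ols <- lm(y ~ x1 + x2, data = df)
var_fit <- lm(log(residuals(ols)^2) ~ fitted(ols))

# Step 2: reweight by the inverse of the estimated error variances
est_var <- exp(fitted(var_fit))
fgls <- lm(y ~ x1 + x2, data = df, weights = 1 / est_var)
summary(fgls)
```

Feasible GLS is still only as good as the assumed variance model, so it inherits the same misspecification risk as WLS; robust standard errors remain the safer default.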

Option 3: Data Transformations

Log transformations can stabilize variance when the response is strictly positive:

# Log transformation (only if y > 0)
if (all(y > 0)) {
  log_model <- lm(log(y) ~ x1 + x2, data = df)
  white(log_model, interactions = TRUE)
}

Be cautious with transformations—they change coefficient interpretation. A coefficient in a log-linear model represents percentage change, not unit change.
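For example, in a log-linear model a coefficient b corresponds to an exact percentage change of 100 × (exp(b) − 1) per unit increase in x, which only approximately equals 100 × b:

```r
# Exact percentage interpretation of a log-linear coefficient
b <- 0.05
exact_pct <- 100 * (exp(b) - 1)
round(exact_pct, 2)  # 5.13: one unit of x raises y by about 5.13%, not 5%
```

The approximation "b ≈ percentage change / 100" is fine for small coefficients but degrades quickly as |b| grows.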

Complete Workflow Example

# Full diagnostic and correction workflow
diagnostic_workflow <- function(model) {
  cat("=== Heteroscedasticity Diagnostic Workflow ===\n\n")
  
  # 1. Visual inspection
  par(mfrow = c(1, 2))
  plot(fitted(model), residuals(model),
       main = "Residuals vs Fitted",
       xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, col = "red", lty = 2)
  
  plot(fitted(model), sqrt(abs(residuals(model))),
       main = "Scale-Location Plot",
       xlab = "Fitted values", ylab = "√|Residuals|")
  
  # 2. Formal tests
  cat("White's Test:\n")
  print(white(model, interactions = TRUE))
  
  cat("\nBreusch-Pagan Test:\n")
  print(bptest(model))
  
  # 3. Robust inference
  cat("\n=== Robust Standard Errors (HC3) ===\n")
  print(coeftest(model, vcov = vcovHC(model, type = "HC3")))
}

# Run workflow
diagnostic_workflow(model)

Conclusion

White’s test provides a flexible way to detect heteroscedasticity without assuming a particular variance structure. The workflow is straightforward: fit your model, run skedastic::white(), interpret the p-value, and apply robust standard errors if needed.

Best practices:

  1. Always visualize residuals before running formal tests
  2. Use White’s test when you don’t know the form of heteroscedasticity
  3. Default to HC3 robust standard errors when heteroscedasticity is detected
  4. Report both original and robust standard errors for transparency
  5. Don’t over-interpret non-significant results—they don’t prove homoscedasticity

Common pitfalls to avoid:

  • Running White’s test on small samples (low power leads to false negatives)
  • Using HC0 instead of HC3 in small samples
  • Applying WLS with an incorrectly specified variance function
  • Transforming data without considering interpretation changes

Heteroscedasticity is common in real data. Detecting and addressing it properly ensures your statistical inference remains valid.
