How to Calculate the Durbin-Watson Statistic in Python
Key Insights
- The Durbin-Watson statistic ranges from 0 to 4, where values near 2 indicate no autocorrelation, values below 2 suggest positive autocorrelation, and values above 2 suggest negative autocorrelation in regression residuals.
- Python’s statsmodels library provides a one-line solution with durbin_watson(), but understanding the manual calculation helps you grasp what the test actually measures.
- When autocorrelation is detected, don’t just report it—fix it using techniques like Newey-West standard errors or generalized least squares to ensure valid statistical inference.
Introduction to the Durbin-Watson Statistic
The Durbin-Watson statistic is a diagnostic test that every regression practitioner should have in their toolkit. It detects autocorrelation in the residuals of a regression model—a violation of the classical linear regression assumption that errors are independent of each other.
Why does this matter? When residuals are autocorrelated, your standard errors become unreliable. Confidence intervals shrink artificially, t-statistics inflate, and you end up making incorrect inferences about your coefficients. This problem is especially prevalent in time series data, where today’s error term often correlates with yesterday’s.
The statistic ranges from 0 to 4:
- DW ≈ 2: No autocorrelation (ideal)
- DW < 2: Positive autocorrelation (errors trend together)
- DW > 2: Negative autocorrelation (errors alternate in sign)
A rough rule of thumb: values between 1.5 and 2.5 are generally acceptable, though formal critical values depend on your sample size and number of predictors.
The Mathematical Foundation
The Durbin-Watson statistic has an elegant formula that directly measures how much consecutive residuals differ from each other:
$$DW = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2}$$
The numerator sums the squared differences between consecutive residuals. The denominator is simply the sum of squared residuals. When consecutive residuals are similar (positive autocorrelation), the numerator is small relative to the denominator, pushing DW toward 0. When they alternate dramatically (negative autocorrelation), the numerator grows large, pushing DW toward 4.
There’s a useful approximation relating DW to the first-order autocorrelation coefficient (ρ):
$$DW \approx 2(1 - \rho)$$
This means DW = 2 when ρ = 0 (no autocorrelation), DW = 0 when ρ = 1 (perfect positive autocorrelation), and DW = 4 when ρ = -1 (perfect negative autocorrelation).
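Before the full implementation, this approximation is easy to sanity-check numerically. The sketch below (all values synthetic and illustrative) simulates an AR(1) residual series, computes DW from its definition, and compares it against 2(1 − ρ̂) using the sample first-order autocorrelation:

```python
import numpy as np

# Simulate an AR(1) residual series with moderate positive autocorrelation
rng = np.random.default_rng(0)
n = 5000
rho_true = 0.5
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho_true * e[t - 1] + rng.normal(0, 1)

# Durbin-Watson straight from its definition
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# First-order sample autocorrelation of the residuals
rho_hat = np.corrcoef(e[:-1], e[1:])[0, 1]

print(f"DW = {dw:.3f}, 2*(1 - rho_hat) = {2 * (1 - rho_hat):.3f}")
```

With a large sample the two numbers agree to a few decimal places; the gap comes only from the end terms in the numerator, which shrink as n grows.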
Let’s implement the manual calculation:
```python
import numpy as np


def durbin_watson_manual(residuals: np.ndarray) -> float:
    """
    Calculate the Durbin-Watson statistic manually.

    Parameters
    ----------
    residuals : np.ndarray
        Array of regression residuals (in time order)

    Returns
    -------
    float
        Durbin-Watson statistic (range 0-4)
    """
    residuals = np.asarray(residuals)
    # Numerator: sum of squared differences between consecutive residuals
    diff_squared = np.sum(np.diff(residuals) ** 2)
    # Denominator: sum of squared residuals
    resid_squared = np.sum(residuals ** 2)
    return diff_squared / resid_squared


# Example with synthetic residuals
np.random.seed(42)
independent_residuals = np.random.normal(0, 1, 100)
print(f"Independent residuals DW: {durbin_watson_manual(independent_residuals):.4f}")

# Create positively autocorrelated residuals
autocorrelated_residuals = np.zeros(100)
autocorrelated_residuals[0] = np.random.normal(0, 1)
for t in range(1, 100):
    autocorrelated_residuals[t] = 0.8 * autocorrelated_residuals[t-1] + np.random.normal(0, 0.5)
print(f"Autocorrelated residuals DW: {durbin_watson_manual(autocorrelated_residuals):.4f}")
```
```
Independent residuals DW: 1.8631
Autocorrelated residuals DW: 0.4127
```
The independent residuals produce a DW near 2, while the autocorrelated series gives a value well below 1.5, signaling a problem.
Using Statsmodels for Quick Calculation
In practice, you’ll use statsmodels rather than rolling your own implementation. The library provides the durbin_watson() function that handles edge cases and integrates seamlessly with regression workflows:
```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Generate sample data with a time trend
np.random.seed(123)
n = 100
time = np.arange(n)
X = sm.add_constant(time)

# True relationship with autocorrelated errors
true_beta = [5, 0.3]
autocorr_errors = np.zeros(n)
autocorr_errors[0] = np.random.normal(0, 1)
for t in range(1, n):
    autocorr_errors[t] = 0.7 * autocorr_errors[t-1] + np.random.normal(0, 1)
y = X @ true_beta + autocorr_errors

# Fit OLS model
model = sm.OLS(y, X)
results = model.fit()

# Calculate Durbin-Watson statistic
dw_stat = durbin_watson(results.resid)
print(f"Durbin-Watson statistic: {dw_stat:.4f}")
print("\nOLS Summary (truncated):")
print(f"R-squared: {results.rsquared:.4f}")
print(f"Coefficient on time: {results.params[1]:.4f} (SE: {results.bse[1]:.4f})")
```
```
Durbin-Watson statistic: 0.6892

OLS Summary (truncated):
R-squared: 0.9147
Coefficient on time: 0.3102 (SE: 0.0108)
```
That DW of 0.69 is a red flag. The residuals are clearly autocorrelated, which means that standard error of 0.0108 is probably too small.
Interpreting the Results
The Durbin-Watson test has an unusual property: there’s an inconclusive region between the lower (dL) and upper (dU) critical values. This requires a three-zone decision rule:
```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class DurbinWatsonResult:
    statistic: float
    interpretation: Literal["positive_autocorr", "negative_autocorr",
                            "no_autocorr", "inconclusive"]
    message: str


def interpret_durbin_watson(
    dw: float,
    d_lower: float = 1.5,
    d_upper: float = 1.7,
    alpha: float = 0.05
) -> DurbinWatsonResult:
    """
    Interpret the Durbin-Watson statistic using critical values.

    Parameters
    ----------
    dw : float
        Durbin-Watson statistic
    d_lower : float
        Lower critical value (depends on n and k)
    d_upper : float
        Upper critical value (depends on n and k)
    alpha : float
        Significance level (for message only)

    Returns
    -------
    DurbinWatsonResult
        Interpretation of the test result
    """
    if dw < d_lower:
        return DurbinWatsonResult(
            statistic=dw,
            interpretation="positive_autocorr",
            message=f"DW={dw:.4f} < dL={d_lower}: Reject H0. "
                    f"Evidence of positive autocorrelation at α={alpha}"
        )
    elif dw > 4 - d_lower:
        return DurbinWatsonResult(
            statistic=dw,
            interpretation="negative_autocorr",
            message=f"DW={dw:.4f} > 4-dL={4-d_lower:.2f}: Reject H0. "
                    f"Evidence of negative autocorrelation at α={alpha}"
        )
    elif d_lower <= dw <= d_upper:
        return DurbinWatsonResult(
            statistic=dw,
            interpretation="inconclusive",
            message=f"DW={dw:.4f} in [{d_lower}, {d_upper}]: Test is inconclusive"
        )
    elif 4 - d_upper <= dw <= 4 - d_lower:
        return DurbinWatsonResult(
            statistic=dw,
            interpretation="inconclusive",
            message=f"DW={dw:.4f} in [{4-d_upper:.2f}, {4-d_lower:.2f}]: "
                    f"Test is inconclusive"
        )
    else:
        return DurbinWatsonResult(
            statistic=dw,
            interpretation="no_autocorr",
            message=f"DW={dw:.4f}: Fail to reject H0. "
                    f"No evidence of autocorrelation at α={alpha}"
        )


# Test with our previous result
result = interpret_durbin_watson(0.6892, d_lower=1.65, d_upper=1.69)
print(result.message)
```
```
DW=0.6892 < dL=1.65: Reject H0. Evidence of positive autocorrelation at α=0.05
```
Critical values vary by sample size (n) and number of predictors (k); you can find dL/dU tables in most econometrics textbooks, as statsmodels does not ship them. What statsmodels does offer is an axis parameter on durbin_watson(), which lets you compute the statistic for several residual series at once—useful, for example, with panel data stored column-wise.
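As a brief illustration of that axis argument (the data here is made-up white noise, purely for demonstration):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Three independent residual series stored column-wise, as in panel data
rng = np.random.default_rng(1)
resid_matrix = rng.normal(size=(100, 3))

# axis=0 computes one Durbin-Watson statistic per column
dw_per_series = durbin_watson(resid_matrix, axis=0)
print(dw_per_series)  # three values, each near 2 for white noise
```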
Practical Example: Time Series Regression
Let’s work through a complete example using simulated economic data. We’ll model GDP growth as a function of interest rates—a classic setup where autocorrelation often appears:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulate quarterly economic data
np.random.seed(456)
n_quarters = 80  # 20 years of quarterly data

# Interest rates with some persistence
interest_rate = np.zeros(n_quarters)
interest_rate[0] = 5.0
for t in range(1, n_quarters):
    interest_rate[t] = 0.9 * interest_rate[t-1] + np.random.normal(0, 0.5)

# GDP growth with autocorrelated shocks (business cycle persistence)
gdp_shocks = np.zeros(n_quarters)
gdp_shocks[0] = np.random.normal(0, 0.5)
for t in range(1, n_quarters):
    gdp_shocks[t] = 0.6 * gdp_shocks[t-1] + np.random.normal(0, 0.5)
gdp_growth = 3.0 - 0.4 * interest_rate + gdp_shocks

# Create DataFrame
df = pd.DataFrame({
    'quarter': pd.date_range('2004-01-01', periods=n_quarters, freq='Q'),
    'gdp_growth': gdp_growth,
    'interest_rate': interest_rate
})

# Fit regression
X = sm.add_constant(df['interest_rate'])
model = sm.OLS(df['gdp_growth'], X)
results = model.fit()

# Calculate and interpret DW
dw = durbin_watson(results.resid)
print("Regression Results:")
print(f"Intercept: {results.params[0]:.4f}")
print(f"Interest Rate Coefficient: {results.params[1]:.4f}")
print(f"\nDurbin-Watson: {dw:.4f}")

# Visualize residuals over time
fig, axes = plt.subplots(2, 1, figsize=(10, 6))

# Residuals over time
axes[0].plot(df['quarter'], results.resid, 'b-', linewidth=0.8)
axes[0].axhline(y=0, color='r', linestyle='--', alpha=0.7)
axes[0].set_xlabel('Quarter')
axes[0].set_ylabel('Residual')
axes[0].set_title(f'Residuals Over Time (DW = {dw:.3f})')

# Residual autocorrelation plot
axes[1].scatter(results.resid[:-1], results.resid[1:], alpha=0.6)
axes[1].set_xlabel('Residual at t')
axes[1].set_ylabel('Residual at t+1')
axes[1].set_title('Residual Lag Plot')

# Add correlation coefficient
lag_corr = np.corrcoef(results.resid[:-1], results.resid[1:])[0, 1]
axes[1].annotate(f'ρ = {lag_corr:.3f}', xy=(0.05, 0.95),
                 xycoords='axes fraction', fontsize=11)

plt.tight_layout()
plt.savefig('residual_diagnostics.png', dpi=150)
plt.show()
```
The lag plot reveals the problem clearly: when one residual is positive, the next one tends to be positive too. This clustering pattern is the visual signature of positive autocorrelation.
What to Do When Autocorrelation Is Detected
Detecting autocorrelation is only half the battle. Here’s how to address it:
Option 1: Newey-West (HAC) Standard Errors
This approach keeps your coefficient estimates but corrects the standard errors to account for autocorrelation:
```python
import numpy as np

# Original OLS results
print("Original OLS Standard Errors:")
print(f"Interest Rate SE: {results.bse[1]:.4f}")
print(f"t-statistic: {results.tvalues[1]:.4f}")

# Newey-West corrected standard errors
# Use maxlags based on sample size (common rule: int(4*(n/100)**(2/9)))
maxlags = int(4 * (len(df) / 100) ** (2/9))
results_hac = results.get_robustcov_results(cov_type='HAC', maxlags=maxlags)

print(f"\nNewey-West HAC Standard Errors (maxlags={maxlags}):")
print(f"Interest Rate SE: {results_hac.bse[1]:.4f}")
print(f"t-statistic: {results_hac.tvalues[1]:.4f}")

# Compare confidence intervals. np.asarray smooths over the fact that
# get_robustcov_results returns conf_int() as a plain array rather than
# a DataFrame, so .iloc indexing would fail on it.
ci_ols = np.asarray(results.conf_int())
ci_hac = np.asarray(results_hac.conf_int())
print(f"\nOriginal 95% CI: [{ci_ols[1, 0]:.4f}, {ci_ols[1, 1]:.4f}]")
print(f"HAC 95% CI: [{ci_hac[1, 0]:.4f}, {ci_hac[1, 1]:.4f}]")
```
```
Original OLS Standard Errors:
Interest Rate SE: 0.0423
t-statistic: -8.9217

Newey-West HAC Standard Errors (maxlags=3):
Interest Rate SE: 0.0891
t-statistic: -4.2342

Original 95% CI: [-0.4620, -0.2931]
HAC 95% CI: [-0.5556, -0.1995]
```
Notice how the HAC standard errors are roughly twice as large. The original analysis would have overstated our confidence in the interest rate effect.
Option 2: Generalized Least Squares (GLS)
When you have a specific autocorrelation structure in mind, GLS can provide more efficient estimates:
```python
# Feasible GLS with AR(1) errors
from statsmodels.regression.linear_model import GLSAR

model_gls = GLSAR(df['gdp_growth'], X, rho=1)  # rho=1 sets an AR(1) error structure
results_gls = model_gls.iterative_fit(maxiter=10)

print("FGLS with AR(1) errors:")
print(f"Interest Rate Coefficient: {results_gls.params[1]:.4f}")
print(f"Standard Error: {results_gls.bse[1]:.4f}")
print(f"Estimated ρ: {model_gls.rho[0]:.4f}")  # rho is stored as an array of AR coefficients
```
Conclusion
The Durbin-Watson statistic is a straightforward diagnostic that should be part of every time series regression analysis. The key points to remember:
- Always check it after fitting a regression to time-ordered data. A DW far from 2 signals trouble.
- Use statsmodels for production code—durbin_watson(results.resid) gives you the answer in one line.
- Act on the results. When autocorrelation is present, use HAC standard errors or GLS to ensure valid inference.
- Know its limitations. The DW test only detects first-order autocorrelation. For higher-order patterns, consider the Breusch-Godfrey test or examining the autocorrelation function of residuals directly.
Autocorrelation doesn’t invalidate your model—it just means you need to be more careful about how you compute standard errors and confidence intervals. The tools exist; use them.