How to Perform the Augmented Dickey-Fuller Test in Python

Key Insights

  • The ADF test checks whether your time series has a unit root (non-stationary); rejecting the null hypothesis (p-value < 0.05) means your data is stationary and safe for modeling
  • Always combine visual inspection with statistical testing—plot rolling statistics before running the test, and consider using KPSS alongside ADF for confirmation
  • Choose your regression parameter carefully: use 'c' for most economic data, 'ct' when you suspect a deterministic trend, and let autolag='AIC' handle lag selection

Introduction to Stationarity and Why It Matters

Stationarity is the foundation of time series analysis. A stationary series has statistical properties—mean, variance, and autocorrelation—that remain constant over time. The data fluctuates around a fixed level without trending upward or downward indefinitely.

Why should you care? Non-stationary data breaks your models. When you regress one non-stationary series on another, you get spurious regression: high R-squared values and significant coefficients that mean absolutely nothing. Your ARIMA model will produce garbage forecasts. Your confidence intervals will be wrong.

Before fitting ARIMA, VAR, or any autoregressive model, you need to verify stationarity. The Augmented Dickey-Fuller (ADF) test is the standard tool for this job. It’s not perfect, but it’s widely understood, well-implemented in Python, and gets the job done in most practical scenarios.

What the Augmented Dickey-Fuller Test Does

The ADF test is a hypothesis test with a specific structure:

  • Null hypothesis (H₀): The series has a unit root (it’s non-stationary)
  • Alternative hypothesis (H₁): The series is stationary

This framing matters. You’re trying to reject the null hypothesis. A low p-value (typically < 0.05) gives you evidence that the series is stationary. A high p-value means you fail to reject the null—the series might be non-stationary.

The “augmented” part refers to how the test handles autocorrelation. The basic Dickey-Fuller test assumes errors are white noise, which is unrealistic for most real data. The ADF test adds lagged difference terms to the regression equation, soaking up the autocorrelation and producing valid test statistics.
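
Written out, the regression the ADF test estimates (shown here with both constant and trend terms; which deterministic terms appear depends on the regression option covered later) is:

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t
```

The lagged-difference terms are the augmentation. The test statistic is the t-ratio on γ, which is zero under the unit-root null and negative under the stationary alternative.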

The test outputs a test statistic that you compare against critical values. More negative test statistics provide stronger evidence against the null hypothesis. If your test statistic is more negative than the critical value at your chosen significance level, you reject the null.

Running the ADF Test with statsmodels

The statsmodels library provides the adfuller() function. Here’s the basic usage:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Generate sample stationary data
np.random.seed(42)
stationary_data = np.random.randn(500)

# Run the ADF test
result = adfuller(stationary_data)

print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')
print(f'Lags Used: {result[2]}')
print(f'Number of Observations: {result[3]}')
print('Critical Values:')
for key, value in result[4].items():
    print(f'   {key}: {value}')

The function returns a tuple with several values:

  • result[0]: The ADF test statistic
  • result[1]: The p-value
  • result[2]: Number of lags used in the regression
  • result[3]: Number of observations used
  • result[4]: Dictionary of critical values at the 1%, 5%, and 10% significance levels
  • result[5]: The maximized information criterion (present when autolag is not None)

For cleaner code, wrap this in a helper function:

def adf_test(series, name='Series'):
    """
    Perform ADF test and print formatted results.
    
    Returns True if series is stationary (p < 0.05), False otherwise.
    """
    result = adfuller(series, autolag='AIC')
    
    print(f'ADF Test Results for {name}')
    print('=' * 50)
    print(f'Test Statistic:    {result[0]:.4f}')
    print(f'p-value:           {result[1]:.4f}')
    print(f'Lags Used:         {result[2]}')
    print(f'Observations:      {result[3]}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'   {key}: {value:.4f}')
    
    is_stationary = result[1] < 0.05
    print(f'\nConclusion: {"Stationary" if is_stationary else "Non-stationary"} (α=0.05)')
    
    return is_stationary

Choosing Test Parameters

The adfuller() function has two parameters you need to understand: regression and autolag.

Regression Options

The regression parameter controls what deterministic terms are included:

  • 'c' (default): Constant only. Use this for most economic and financial data.
  • 'ct': Constant and linear trend. Use when you suspect a deterministic trend.
  • 'ctt': Constant, linear trend, and quadratic trend. Rarely needed.
  • 'n': No constant or trend. Use only when you’re certain the series has zero mean.

Lag Selection

The autolag parameter determines how many lagged differences to include:

  • 'AIC' (default): Minimizes Akaike Information Criterion
  • 'BIC': Minimizes Bayesian Information Criterion (tends to select fewer lags)
  • 't-stat': Starts with maxlag and removes lags until the last one is significant
  • None: Uses maxlag directly

Here’s how different settings affect results:

# Generate non-stationary data (random walk)
np.random.seed(42)
random_walk = np.cumsum(np.random.randn(500))

# Compare different regression settings
print("Testing Random Walk with Different Settings\n")

for regression in ['c', 'ct', 'n']:
    result = adfuller(random_walk, regression=regression, autolag='AIC')
    print(f"Regression='{regression}': Statistic={result[0]:.3f}, p-value={result[1]:.3f}")

print("\nComparing lag selection methods:\n")

for autolag in ['AIC', 'BIC', 't-stat']:
    result = adfuller(random_walk, regression='c', autolag=autolag)
    print(f"Autolag='{autolag}': Lags={result[2]}, Statistic={result[0]:.3f}")

In practice, stick with regression='c' and autolag='AIC' unless you have specific reasons to change them. If your data clearly trends upward or downward over time, use 'ct'.

Practical Example: Testing Real-World Data

Let’s work through a complete example using stock price data:

import matplotlib.pyplot as plt
import yfinance as yf

# Download Apple stock data
data = yf.download('AAPL', start='2020-01-01', end='2024-01-01', progress=False)
prices = data['Close']

# Visualize with rolling statistics
def plot_rolling_stats(series, window=30, title=''):
    """Plot series with rolling mean and standard deviation."""
    rolling_mean = series.rolling(window=window).mean()
    rolling_std = series.rolling(window=window).std()
    
    fig, axes = plt.subplots(2, 1, figsize=(12, 8))
    
    axes[0].plot(series, label='Original')
    axes[0].plot(rolling_mean, label=f'Rolling Mean ({window}d)', color='red')
    axes[0].set_title(f'{title} - Price and Rolling Mean')
    axes[0].legend()
    
    axes[1].plot(rolling_std, label=f'Rolling Std ({window}d)', color='orange')
    axes[1].set_title('Rolling Standard Deviation')
    axes[1].legend()
    
    plt.tight_layout()
    plt.show()

# Visualize original prices
plot_rolling_stats(prices, title='AAPL Stock Price')

# Test original prices
print("Testing Original Price Series:")
adf_test(prices.dropna(), 'AAPL Prices')

Stock prices are almost always non-stationary. The rolling mean will trend with the price, and the ADF test will fail to reject the null hypothesis.

Now let’s difference the series and re-test:

# First difference (returns)
returns = prices.diff().dropna()

# Visualize differenced series
plot_rolling_stats(returns, title='AAPL Daily Returns')

# Test differenced series
print("\nTesting Differenced Series (Returns):")
adf_test(returns, 'AAPL Returns')

The differenced series (daily returns) should be stationary. The rolling mean will hover around zero, and the ADF test should reject the null hypothesis with a very low p-value.

Common Pitfalls and Best Practices

Structural Breaks

The ADF test assumes the data-generating process is consistent throughout the sample. If your series contains structural breaks—sudden shifts in mean or trend due to policy changes, crises, or regime shifts—the test loses power. You might fail to reject the null even when the series is stationary within each regime.

Consider using the Zivot-Andrews test or splitting your data at known break points.

Sample Size

The ADF test needs sufficient data to work reliably. With fewer than 50-100 observations, the test has low power and wide confidence intervals. For very short series, visual inspection and domain knowledge become more important than formal testing.

Complementary Tests

Never rely on a single test. The ADF test has known weaknesses, particularly low power against alternatives close to a unit root. The KPSS test provides a useful complement because it flips the hypotheses:

  • KPSS null hypothesis: The series is stationary
  • KPSS alternative: The series has a unit root

Running both tests gives you four possible outcomes:

from statsmodels.tsa.stattools import kpss

def stationarity_tests(series, name='Series'):
    """Run both ADF and KPSS tests for robust stationarity checking."""
    
    # ADF Test
    adf_result = adfuller(series, autolag='AIC')
    adf_stationary = adf_result[1] < 0.05
    
    # KPSS Test (using 'c' for level stationarity)
    kpss_result = kpss(series, regression='c', nlags='auto')
    kpss_stationary = kpss_result[1] > 0.05  # Note: opposite interpretation
    
    print(f'Stationarity Tests for {name}')
    print('=' * 50)
    print(f'ADF:  Statistic={adf_result[0]:.4f}, p-value={adf_result[1]:.4f}')
    print(f'KPSS: Statistic={kpss_result[0]:.4f}, p-value={kpss_result[1]:.4f}')
    print()
    
    if adf_stationary and kpss_stationary:
        print('Conclusion: Series is STATIONARY (both tests agree)')
    elif not adf_stationary and not kpss_stationary:
        print('Conclusion: Series is NON-STATIONARY (both tests agree)')
    elif adf_stationary and not kpss_stationary:
        print('Conclusion: DIFFERENCE-STATIONARY (difference the series)')
    else:
        print('Conclusion: TREND-STATIONARY (detrend the series)')

# Test both original and differenced series
stationarity_tests(prices.dropna(), 'AAPL Prices')
print()
stationarity_tests(returns, 'AAPL Returns')

When both tests agree, you have strong evidence. When they disagree, you likely need differencing or detrending.

Conclusion

Testing for stationarity follows a straightforward workflow:

  1. Visualize your series with rolling mean and standard deviation
  2. Test using adfuller() with appropriate parameters
  3. Transform if necessary (differencing, log transformation)
  4. Re-test to confirm stationarity
  5. Validate with KPSS for robustness

If your series passes the stationarity tests after transformation, proceed with modeling. If the tests give conflicting results or you suspect structural breaks, investigate further before building your model. A few extra hours spent understanding your data’s properties will save you from building models that produce meaningless results.
