How to Perform Cointegration Test in Python

Key Insights

Cointegration identifies pairs of non-stationary time series that maintain a stable long-term relationship, making it essential for pairs trading strategies and avoiding spurious regression in econometric models.
Always test for stationarity first using the ADF test—cointegration only applies to non-stationary series integrated of the same order, typically I(1) variables.
The Engle-Granger method works for two-variable systems while the Johansen test handles multiple series, with both available in Python’s statsmodels library for production-ready implementations.

Introduction to Cointegration

Cointegration is a statistical property of time series data that reveals when two or more non-stationary variables share a stable, long-term equilibrium relationship. While correlation measures how variables move together in the short term, cointegration tells you whether they’re bound by an economic or fundamental relationship that persists over time.

This matters enormously in quantitative finance and econometrics. In pairs trading, hedge funds exploit cointegrated stock pairs by betting on mean reversion when the spread widens. Economic modelers use cointegration to identify valid long-run relationships between variables like GDP and consumption. Portfolio managers apply it to construct hedged positions that reduce systematic risk.

The key distinction from correlation: two random walks can be highly correlated by chance, but cointegrated series have a statistically significant relationship that pulls them back together after temporary divergences. This prevents spurious regression—the dangerous phenomenon where unrelated non-stationary variables appear significantly related purely due to shared trends.

Understanding the Theory

Time series are either stationary or non-stationary. Stationary series have constant mean, variance, and autocovariance over time. Non-stationary series exhibit trends, changing variance, or other time-dependent statistical properties. Most financial and economic data is non-stationary—stock prices follow random walks, GDP grows over time, inflation rates drift.

The spurious regression problem occurs when you regress one non-stationary series on another. Even if they’re completely unrelated, you’ll often get high R-squared values and significant t-statistics. This is statistical noise masquerading as signal.

Cointegration solves this. If two I(1) series (integrated of order 1, meaning they become stationary after first differencing) are cointegrated, a linear combination of them is I(0) (stationary). Mathematically, if y_t and x_t are both I(1), but (y_t - βx_t) is I(0), they’re cointegrated with cointegrating coefficient β.

Let’s visualize the difference:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
n = 500

# Two cointegrated series
error = np.random.normal(0, 1, n)
x = np.cumsum(np.random.normal(0, 1, n))  # Random walk
y = 2 * x + error  # Cointegrated with x

# Two non-cointegrated random walks
rw1 = np.cumsum(np.random.normal(0, 1, n))
rw2 = np.cumsum(np.random.normal(0, 1, n))

fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Cointegrated series
axes[0, 0].plot(x, label='X')
axes[0, 0].plot(y, label='Y = 2X + error')
axes[0, 0].set_title('Cointegrated Series')
axes[0, 0].legend()

# Spread is stationary
axes[0, 1].plot(y - 2*x)
axes[0, 1].set_title('Spread (Stationary)')
axes[0, 1].axhline(y=0, color='r', linestyle='--')

# Non-cointegrated series
axes[1, 0].plot(rw1, label='RW1')
axes[1, 0].plot(rw2, label='RW2')
axes[1, 0].set_title('Non-Cointegrated Series')
axes[1, 0].legend()

# Spread is non-stationary
axes[1, 1].plot(rw1 - rw2)
axes[1, 1].set_title('Spread (Non-Stationary)')

plt.tight_layout()
plt.show()

Notice how the cointegrated series move together with a stable spread, while the random walks diverge indefinitely.

Testing for Stationarity (Prerequisite)

Before testing cointegration, verify that your series are non-stationary and integrated of the same order. The Augmented Dickey-Fuller (ADF) test is the standard approach. It tests the null hypothesis that a unit root is present (non-stationary).

from statsmodels.tsa.stattools import adfuller
import yfinance as yf

# Download real financial data
data = yf.download(['GLD', 'GDX'], start='2015-01-01', end='2023-12-31')['Adj Close']

def adf_test(series, name=''):
    result = adfuller(series.dropna(), autolag='AIC')
    print(f'\n{name} ADF Test Results:')
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print(f'Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.3f}')
    
    if result[1] <= 0.05:
        print("Result: Reject null hypothesis - Series is stationary")
    else:
        print("Result: Fail to reject null - Series is non-stationary")
    
    return result[1]

# Test both series
adf_test(data['GLD'], 'GLD (Gold ETF)')
adf_test(data['GDX'], 'GDX (Gold Miners ETF)')

For most price series, you’ll fail to reject the null (p-value > 0.05), confirming non-stationarity. This is expected and required for cointegration testing.

Engle-Granger Two-Step Method

The Engle-Granger approach is straightforward for two variables:

Regress y on x to estimate the cointegrating relationship
Test the residuals for stationarity using ADF

If residuals are stationary, the series are cointegrated.

from statsmodels.regression.linear_model import OLS
from statsmodels.tsa.stattools import coint

# Manual implementation
model = OLS(data['GDX'], data['GLD']).fit()
residuals = model.resid

print(f'Cointegrating coefficient: {model.params[0]:.4f}')
adf_test(residuals, 'Residuals')

# Using statsmodels built-in function
score, pvalue, _ = coint(data['GLD'], data['GDX'])
print(f'\nEngle-Granger Test:')
print(f'Test Statistic: {score:.4f}')
print(f'p-value: {pvalue:.4f}')

if pvalue < 0.05:
    print('Series are cointegrated')
else:
    print('No evidence of cointegration')

The coint() function is production-ready and handles the testing automatically. A p-value below 0.05 indicates cointegration at the 5% significance level.

Johansen Test for Multiple Series

When analyzing more than two series, the Johansen test is superior. It can identify multiple cointegrating relationships in a system and doesn’t require arbitrary choice of dependent variable.

The test produces two statistics:

Trace statistic: Tests whether the number of cointegrating vectors is r or fewer
Eigenvalue statistic: Tests whether exactly r cointegrating vectors exist

from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Download multiple related stocks
tickers = ['XLE', 'XOP', 'OIH']  # Energy sector ETFs
energy_data = yf.download(tickers, start='2015-01-01', end='2023-12-31')['Adj Close']

def johansen_test(df, significance_level=0.05):
    result = coint_johansen(df, det_order=0, k_ar_diff=1)
    
    traces = result.lr1  # Trace statistics
    cvts = result.cvt    # Critical values for trace
    
    print('Johansen Cointegration Test Results:')
    print('\nTrace Statistic:')
    for i, (trace, cvt) in enumerate(zip(traces, cvts)):
        print(f'r <= {i}: {trace:.4f} | Critical value (95%): {cvt[1]:.4f}', end='')
        if trace > cvt[1]:
            print(' -> Reject null (cointegration exists)')
        else:
            print(' -> Cannot reject null')
    
    return result

johansen_result = johansen_test(energy_data.dropna())

The test output shows how many cointegrating relationships exist. For trading, even one cointegrating vector among three assets provides opportunities.

Practical Application: Pairs Trading Strategy

Here’s a complete workflow for identifying and trading cointegrated pairs:

import yfinance as yf
from scipy import stats

# Download potential pair
pair_data = yf.download(['KO', 'PEP'], start='2020-01-01', end='2023-12-31')['Adj Close']

# Test cointegration
score, pvalue, _ = coint(pair_data['KO'], pair_data['PEP'])
print(f'Cointegration p-value: {pvalue:.4f}')

if pvalue < 0.05:
    # Calculate spread
    model = OLS(pair_data['PEP'], pair_data['KO']).fit()
    hedge_ratio = model.params[0]
    spread = pair_data['PEP'] - hedge_ratio * pair_data['KO']
    
    # Calculate z-score for trading signals
    spread_mean = spread.mean()
    spread_std = spread.std()
    z_score = (spread - spread_mean) / spread_std
    
    # Generate signals
    signals = pd.DataFrame(index=pair_data.index)
    signals['spread'] = spread
    signals['z_score'] = z_score
    signals['long_entry'] = z_score < -2   # Buy spread when undervalued
    signals['short_entry'] = z_score > 2   # Sell spread when overvalued
    signals['exit'] = np.abs(z_score) < 0.5
    
    print(f'\nHedge Ratio: {hedge_ratio:.4f}')
    print(f'Mean Spread: {spread_mean:.4f}')
    print(f'Spread Std Dev: {spread_std:.4f}')
    
    # Plot trading signals
    plt.figure(figsize=(14, 7))
    plt.subplot(2, 1, 1)
    plt.plot(pair_data.index, pair_data['KO'], label='KO')
    plt.plot(pair_data.index, pair_data['PEP'], label='PEP')
    plt.legend()
    plt.title('Price Series')
    
    plt.subplot(2, 1, 2)
    plt.plot(signals.index, signals['z_score'])
    plt.axhline(2, color='r', linestyle='--', label='Short Entry')
    plt.axhline(-2, color='g', linestyle='--', label='Long Entry')
    plt.axhline(0, color='k', linestyle='-', alpha=0.3)
    plt.legend()
    plt.title('Spread Z-Score')
    plt.tight_layout()
    plt.show()

This code identifies entry points when the spread deviates significantly from its mean and exit points when it reverts.

Interpretation and Best Practices

P-value interpretation: Use 0.05 as your threshold. Values below indicate cointegration at 95% confidence. Don’t p-hack by testing hundreds of pairs and cherry-picking—this inflates false positives.

Sample size matters: You need sufficient data for reliable results. Aim for at least 100-200 observations. Daily data over 6-12 months is minimum; 2-3 years is better.

Stationarity of residuals: The ADF test on residuals should show strong stationarity (p-value well below 0.05). Marginal results (0.04-0.05) are unreliable.

Regime changes: Cointegration relationships can break down due to structural changes—mergers, regulatory shifts, technological disruption. Continuously monitor your pairs and re-test periodically.

When not to use cointegration: Don’t apply it to stationary series (use correlation instead). Don’t use it for short-term trading without understanding the fundamental relationship. And don’t assume past cointegration guarantees future mean reversion.

Rolling windows: For production systems, implement rolling cointegration tests to detect relationship breakdowns:

window = 252  # One trading year
rolling_pvalues = []

for i in range(window, len(pair_data)):
    window_data = pair_data.iloc[i-window:i]
    _, pval, _ = coint(window_data['KO'], window_data['PEP'])
    rolling_pvalues.append(pval)

# Monitor when p-value exceeds threshold
relationship_stable = pd.Series(rolling_pvalues) < 0.05

Cointegration is powerful but not magic. Use it as one tool in a broader analytical framework, always grounded in economic intuition about why two series should share a long-term relationship.