How to Perform Granger Causality Test for Time Series in Python

Key Insights

Granger causality tests whether past values of one time series help predict another. It measures predictive power, not true causation, so read a significant result as “X helps forecast Y” rather than “X causes Y.”
Both time series must be stationary before testing; skipping a stationarity check such as the ADF test can produce misleading results that suggest relationships where none exist.
The test is sensitive to lag selection—always test multiple lag values and use information criteria (AIC/BIC) to avoid both underfitting and overfitting your temporal relationships.

Introduction to Granger Causality

Granger causality is a statistical hypothesis test of whether one time series helps predict another. Developed by Nobel laureate Clive Granger, the test asks: “Does including past values of variable X improve the prediction of variable Y beyond what we could achieve using only past values of Y?”
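One common way to formalize this question compares two nested regressions that share the same number of lags. The notation below (the coefficients α and β, lag count p, usable sample size T, and residual sums of squares SSR) is chosen here for illustration and is not taken from a specific library:

```latex
\begin{aligned}
\text{Restricted:}          \quad & y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \varepsilon_t \\
\text{Unrestricted:}        \quad & y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i\, y_{t-i} + \sum_{j=1}^{p} \beta_j\, x_{t-j} + u_t \\
H_0:                        \quad & \beta_1 = \beta_2 = \dots = \beta_p = 0 \\
F\text{-statistic:}         \quad & F = \frac{(\mathrm{SSR}_r - \mathrm{SSR}_u)/p}{\mathrm{SSR}_u/(T - 2p - 1)}
\end{aligned}
```

Rejecting the null hypothesis means the lags of X reduce the residual sum of squares by more than chance alone would explain. Under the usual regression assumptions, F is compared against an F distribution with p and T − 2p − 1 degrees of freedom, where T counts the observations left after the first p are used up as lags.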

Despite its name, Granger causality doesn’t establish true causation. It identifies predictive relationships. If X “Granger-causes” Y, it means X’s historical values contain information useful for forecasting Y. This distinction matters in practice—finding that ice cream sales Granger-cause drowning deaths doesn’t mean ice cream is dangerous; both are likely driven by temperature.

Use Granger causality when you need to understand temporal precedence between variables: Does consumer sentiment predict stock prices? Does advertising spend precede sales increases? Does one sensor’s readings forecast another’s in industrial systems?

Prerequisites and Setup

You’ll need statsmodels for the Granger test, pandas and numpy for data manipulation, and matplotlib and seaborn for visualization. Install these if you haven’t already:

pip install statsmodels pandas numpy matplotlib seaborn

Let’s work with a practical example using economic data—specifically, examining whether changes in unemployment rates help predict consumer spending patterns:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import grangercausalitytests, adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings
warnings.filterwarnings('ignore')

# Simulated data stands in for real economic series (e.g., from FRED)
# In practice, load your own CSV or API data
np.random.seed(42)
# 'MS' (month start) works across pandas versions; the 'M' alias is deprecated in recent releases
dates = pd.date_range(start='2010-01-01', end='2023-12-31', freq='MS')

# Simulate two random walks as stand-ins for economic series
unemployment = np.cumsum(np.random.randn(len(dates))) * 0.5 + 6
consumer_spending = np.cumsum(np.random.randn(len(dates))) * 2 + 100
# Add lagged relationship: unemployment 3 months earlier lowers spending
consumer_spending[3:] -= 0.3 * unemployment[:-3]

df = pd.DataFrame({
    'unemployment': unemployment,
    'consumer_spending': consumer_spending
}, index=dates)

print(df.head())
print(f"\nDataset shape: {df.shape}")

Data Preparation and Stationarity Testing

Granger causality requires stationary time series—data where statistical properties like mean and variance remain constant over time. Non-stationary data produces spurious results.

Test for stationarity using the Augmented Dickey-Fuller (ADF) test. The null hypothesis is that the series has a unit root (non-stationary). A p-value below 0.05 rejects that null, which is taken as evidence of stationarity:

def check_stationarity(series, name=''):
    """Perform ADF test and print results"""
    result = adfuller(series.dropna())
    print(f'\n{name} ADF Test Results:')
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print(f'Critical Values:')
    for key, value in result[4].items():
        print(f'   {key}: {value:.3f}')
    
    if result[1] <= 0.05:
        print(f"=> {name} is stationary (reject null hypothesis)")
        return True
    else:
        print(f"=> {name} is non-stationary (fail to reject null hypothesis)")
        return False

# Test original series
unemployment_stationary = check_stationarity(df['unemployment'], 'Unemployment')
spending_stationary = check_stationarity(df['consumer_spending'], 'Consumer Spending')

If either series is non-stationary, apply differencing. First-order differencing usually suffices:

def make_stationary(df, columns):
    """Apply differencing to make series stationary"""
    df_stationary = df.copy()
    
    for col in columns:
        if not check_stationarity(df[col], col):
            # Apply first-order differencing
            df_stationary[f'{col}_diff'] = df[col].diff()
            print(f"\nApplied differencing to {col}")
            check_stationarity(df_stationary[f'{col}_diff'].dropna(), 
                             f'{col}_diff')
    
    return df_stationary

# Make series stationary
df_stationary = make_stationary(df, ['unemployment', 'consumer_spending'])

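# Use differenced series for Granger test.
# Note: a _diff column exists only for a series that failed the ADF test above;
# if a series was already stationary, select its original column here instead.
test_data = df_stationary[['unemployment_diff', 'consumer_spending_diff']].dropna()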

Performing the Granger Causality Test

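The grangercausalitytests function tests whether the series in the second column Granger-causes the series in the first column. The maxlag argument sets how many past periods to include; given an integer, the function runs a separate test for every lag length from 1 up to that value: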

def perform_granger_test(data, x_col, y_col, max_lag=12):
    """
    Test if x_col Granger-causes y_col
    Returns p-values for each lag
    """
    print(f"\n{'='*60}")
    print(f"Testing: Does '{x_col}' Granger-cause '{y_col}'?")
    print(f"{'='*60}")
    
    # Prepare data (y_col first, x_col second for statsmodels)
    test_df = data[[y_col, x_col]].dropna()
    
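    # Perform test (verbose=False suppresses the printed tables; newer
    # statsmodels versions flag this argument as deprecated)
    results = grangercausalitytests(test_df, max_lag, verbose=False)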
    
    # Extract p-values
    p_values = []
    for lag in range(1, max_lag + 1):
        # Using F-test p-value (first test in the results)
        p_value = results[lag][0]['ssr_ftest'][1]
        p_values.append(p_value)
        
        significance = "***" if p_value < 0.01 else "**" if p_value < 0.05 else "*" if p_value < 0.1 else ""
        print(f"Lag {lag:2d}: p-value = {p_value:.4f} {significance}")
    
    return p_values

# Test both directions
p_values_unemp_to_spending = perform_granger_test(
    test_data, 
    'unemployment_diff', 
    'consumer_spending_diff',
    max_lag=8
)

p_values_spending_to_unemp = perform_granger_test(
    test_data,
    'consumer_spending_diff',
    'unemployment_diff',
    max_lag=8
)

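Interpret p-values conservatively. A value below 0.05 at lag k suggests that lags 1 through k of the predictor, taken jointly, improve the forecast; it does not isolate an effect at exactly k periods. A relationship that operates with a 3-month delay will therefore tend to show up at lag 3 and at longer lags, but not at lags 1 or 2. Because every lag length is a separate test, an isolated significant value among many lags can also arise by chance.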

Visualizing Results

Visual analysis helps identify patterns and validate statistical findings:

# Plot 1: Time series comparison
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

axes[0].plot(df.index, df['unemployment'], label='Unemployment Rate', color='red')
axes[0].set_ylabel('Unemployment Rate (%)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(df.index, df['consumer_spending'], label='Consumer Spending', color='blue')
axes[1].set_ylabel('Consumer Spending Index')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('time_series_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

# Plot 2: P-value heatmap
fig, ax = plt.subplots(figsize=(10, 6))

p_value_matrix = pd.DataFrame({
    'Unemployment → Spending': p_values_unemp_to_spending,
    'Spending → Unemployment': p_values_spending_to_unemp
}, index=range(1, len(p_values_unemp_to_spending) + 1))  # one row per tested lag

sns.heatmap(p_value_matrix.T, annot=True, fmt='.3f', cmap='RdYlGn_r', 
            center=0.05, vmin=0, vmax=0.2, ax=ax, cbar_kws={'label': 'p-value'})
ax.set_xlabel('Lag (months)')
ax.set_title('Granger Causality Test P-Values\n(Green = Significant)')
plt.tight_layout()
plt.savefig('granger_pvalue_heatmap.png', dpi=300, bbox_inches='tight')
plt.show()

Practical Example: Stock Market Application

Here’s a complete workflow analyzing whether trading volume predicts stock price movements:

# Simulate stock data
np.random.seed(123)
dates = pd.date_range(start='2020-01-01', end='2023-12-31', freq='D')

price_returns = np.random.randn(len(dates)) * 0.02
volume = np.random.randn(len(dates)) * 1000000 + 5000000
# Add predictive relationship: high volume 2 days ago predicts returns
price_returns[2:] += 0.15 * (volume[:-2] - volume.mean()) / volume.std()

stock_df = pd.DataFrame({
    'returns': price_returns,
    'volume': volume
}, index=dates)

# Check stationarity (returns are typically stationary)
returns_stat = check_stationarity(stock_df['returns'], 'Returns')
volume_stat = check_stationarity(stock_df['volume'], 'Volume')

# If volume is non-stationary, difference it
if not volume_stat:
    stock_df['volume_diff'] = stock_df['volume'].diff()
    volume_col = 'volume_diff'
else:
    volume_col = 'volume'

# Perform Granger test
analysis_df = stock_df[['returns', volume_col]].dropna()
granger_results = perform_granger_test(
    analysis_df,
    volume_col,
    'returns',
    max_lag=5
)

# Interpretation: the test at lag k jointly covers lags 1..k
significant_lags = [i+1 for i, p in enumerate(granger_results) if p < 0.05]
if significant_lags:
    print(f"\n✓ Volume Granger-causes returns when testing up to lag: {significant_lags}")
    print(f"  Interpretation: Significance first appears at {min(significant_lags)} day(s),")
    print(f"  so volume from that far back helps predict today's returns.")
else:
    print("\n✗ No significant Granger causality found")

Limitations and Best Practices

Critical Limitations:

Not true causation: Granger causality identifies prediction, not mechanism. Always combine with domain knowledge.
Stationarity requirement: Non-stationary data creates false positives. Always test and transform.
Linear relationships only: The test assumes linear relationships. Non-linear dynamics require different approaches.
Lag selection sensitivity: Too few lags miss relationships; too many introduce noise. Use AIC/BIC criteria or test multiple values.
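Sample size matters: Small datasets (roughly fewer than 50 observations, as a rule of thumb) produce unreliable results, especially at longer lags, because every added lag uses up observations and degrees of freedom.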

Best Practices:

Test both directions (X→Y and Y→X) to understand bidirectional relationships
Use multiple lag values and look for consistent patterns
Validate findings with out-of-sample data
Consider confounding variables—use multivariate Granger tests when appropriate
Document your stationarity transformations clearly

When Not to Use Granger Causality:

When you need to prove causation (use experimental designs or causal inference methods)
With high-frequency data showing non-linear dynamics (consider transfer entropy)
When contemporaneous relationships matter more than lagged ones (use correlation or regression)
With very short time series (< 30 observations)

Alternatives: Consider VAR models for multivariate systems, transfer entropy for non-linear relationships, or cross-correlation functions for exploratory analysis.

Granger causality remains valuable for understanding temporal precedence in time series data. Used correctly with proper stationarity checks and lag selection, it reveals which variables contain predictive information about others—a crucial insight for forecasting, trading strategies, and understanding complex systems.