How to Plot the Autocorrelation Function (ACF) in Python

Key Insights

  • Autocorrelation measures how a time series correlates with lagged versions of itself, revealing patterns like seasonality, trends, and mean reversion that are invisible in standard plots.
  • The statsmodels library provides the fastest path to ACF plotting, but understanding manual calculation with pandas gives you flexibility for custom analysis and debugging.
  • ACF patterns tell specific stories: exponential decay indicates autoregressive behavior, sharp cutoffs suggest moving average processes, and slow decay warns of non-stationarity requiring differencing.

Introduction to Autocorrelation

Autocorrelation measures the correlation between a time series and lagged versions of itself. If your data at time t correlates strongly with data at time t-1, t-2, or t-k, you have autocorrelation at those lags. This isn’t just an academic curiosity—it’s fundamental to time series analysis.

Why does this matter? First, autocorrelation reveals hidden patterns in your data. A dataset that looks like noise might have strong weekly seasonality that only becomes apparent in the ACF plot. Second, it guides model selection. ARIMA models, for instance, require specific ACF patterns to determine the order of autoregressive and moving average components. Third, it validates assumptions. Many statistical tests assume independence between observations; autocorrelation tells you when that assumption fails.

The autocorrelation function (ACF) plots these correlations across different lags, giving you a visual signature of your time series behavior.

Understanding the ACF Plot

An ACF plot has a simple structure but rich information. The x-axis shows the lag—how many time steps back you’re comparing. The y-axis shows the correlation coefficient, ranging from -1 to 1. A value of 1 means perfect positive correlation; -1 means perfect negative correlation; 0 means no correlation.

The blue shaded region represents the confidence interval, typically set at 95%. Correlations outside this band are statistically significant. If all your autocorrelations fall within the confidence bounds, your series is likely white noise—no exploitable patterns.

Key patterns to recognize: A slow, gradual decay suggests non-stationarity or a strong trend. Sharp drops to zero indicate a moving average process. Sinusoidal waves point to seasonality. Exponential decay characterizes autoregressive processes. Learning to read these patterns transforms ACF from a diagnostic tool into a decision-making instrument.

Using statsmodels for ACF Plotting

The statsmodels library provides the most straightforward approach to ACF plotting. Here’s how to use it:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Generate synthetic time series with autocorrelation
np.random.seed(42)
n = 200
ar_coef = 0.7
data = np.zeros(n)
data[0] = np.random.normal()

for t in range(1, n):
    data[t] = ar_coef * data[t-1] + np.random.normal()

# Create ACF plot
fig, ax = plt.subplots(figsize=(10, 5))
plot_acf(data, lags=40, ax=ax, alpha=0.05)
ax.set_title('Autocorrelation Function', fontsize=14)
ax.set_xlabel('Lag', fontsize=12)
ax.set_ylabel('Autocorrelation', fontsize=12)
plt.tight_layout()
plt.show()

The lags parameter controls how many lags to display. For daily data, you might use 30-40 lags; for hourly data, perhaps 168 (one week). The alpha parameter sets the confidence interval—0.05 gives you 95% confidence bounds.

You can customize the plot appearance:

# Customize ACF plot
fig, ax = plt.subplots(figsize=(12, 6))
plot_acf(data, lags=40, ax=ax, alpha=0.05, 
         bartlett_confint=True, missing='drop')
ax.grid(True, alpha=0.3)
ax.set_title('ACF with Bartlett Confidence Intervals', fontsize=14, fontweight='bold')
plt.show()

The bartlett_confint=True option (the default in recent statsmodels releases) uses Bartlett’s formula for the confidence intervals, which widens the bands at higher lags to account for the autocorrelation already estimated at lower lags. The missing='drop' parameter handles missing values by removing them before computation.

Manual ACF Calculation and Plotting with Pandas/Matplotlib

Understanding the mechanics behind ACF helps when you need custom implementations or want to debug unexpected results:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Convert data to pandas Series
ts = pd.Series(data)

# Calculate autocorrelation manually for different lags
max_lag = 40
acf_values = [ts.autocorr(lag=i) for i in range(max_lag + 1)]

# Calculate confidence interval
conf_interval = 1.96 / np.sqrt(len(ts))

# Create custom ACF plot
fig, ax = plt.subplots(figsize=(12, 6))
ax.stem(range(max_lag + 1), acf_values, basefmt=' ')
ax.axhline(y=0, color='black', linewidth=0.8)
ax.axhline(y=conf_interval, color='blue', linestyle='--', linewidth=1, alpha=0.5)
ax.axhline(y=-conf_interval, color='blue', linestyle='--', linewidth=1, alpha=0.5)
ax.fill_between(range(max_lag + 1), -conf_interval, conf_interval, alpha=0.2)
ax.set_xlabel('Lag', fontsize=12)
ax.set_ylabel('Autocorrelation', fontsize=12)
ax.set_title('Manual ACF Calculation', fontsize=14)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

This manual approach gives you complete control over the visualization and helps you understand that ACF is just correlation calculated at different lags.

Real-World Example: Stock Price Analysis

Let’s apply ACF to actual financial data to detect patterns:

import yfinance as yf

# Download stock data
ticker = yf.Ticker("AAPL")
stock_data = ticker.history(period="1y")
returns = stock_data['Close'].pct_change().dropna()

# Plot both price and returns ACF
fig, axes = plt.subplots(2, 1, figsize=(12, 10))

# ACF of price
plot_acf(stock_data['Close'].dropna(), lags=40, ax=axes[0], alpha=0.05)
axes[0].set_title('ACF of AAPL Price', fontsize=14, fontweight='bold')

# ACF of returns
plot_acf(returns, lags=40, ax=axes[1], alpha=0.05)
axes[1].set_title('ACF of AAPL Daily Returns', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

Stock prices typically show strong autocorrelation with slow decay—a sign of non-stationarity. Returns, however, often show little to no autocorrelation, appearing as white noise. If you see significant autocorrelation in returns, you’ve potentially found a tradable pattern (though this is rare in efficient markets).

Common Patterns and Interpretation

Different time series processes produce distinctive ACF signatures. Here’s how to generate and recognize them:

np.random.seed(42)
n = 300

# AR(1) process - exponential decay
ar1 = np.zeros(n)
ar1[0] = np.random.normal()
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t-1] + np.random.normal()

# MA(1) process - sharp cutoff after lag 1
ma_noise = np.random.normal(size=n + 1)
ma1 = ma_noise[1:] + 0.8 * ma_noise[:-1]

# Seasonal process - sinusoidal pattern
seasonal = np.sin(np.arange(n) * 2 * np.pi / 12) + np.random.normal(0, 0.3, n)

# Non-stationary - slow decay
non_stationary = np.cumsum(np.random.normal(size=n))

# Plot all ACFs
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
processes = [
    (ar1, 'AR(1) Process - Exponential Decay'),
    (ma1, 'MA(1) Process - Sharp Cutoff'),
    (seasonal, 'Seasonal Process - Sinusoidal'),
    (non_stationary, 'Non-Stationary - Slow Decay')
]

for ax, (series, title) in zip(axes.flat, processes):
    plot_acf(series, lags=30, ax=ax, alpha=0.05)
    ax.set_title(title, fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

AR processes show exponential decay. MA processes cut off sharply after q lags (where q is the MA order). Seasonal data shows spikes at seasonal lags. Non-stationary series decay very slowly, staying significant for many lags.

Best Practices and Troubleshooting

Always check for stationarity before interpreting ACF. Non-stationary series produce misleading ACF plots. Differencing often solves this:

# Compare ACF before and after differencing
fig, axes = plt.subplots(2, 1, figsize=(12, 10))

# Original non-stationary series
plot_acf(non_stationary, lags=40, ax=axes[0], alpha=0.05)
axes[0].set_title('ACF Before Differencing (Non-Stationary)', fontsize=12, fontweight='bold')

# After first-order differencing
differenced = np.diff(non_stationary)
plot_acf(differenced, lags=40, ax=axes[1], alpha=0.05)
axes[1].set_title('ACF After Differencing (Stationary)', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

For lag selection, use domain knowledge. Daily data might need 30-40 lags to capture monthly patterns. Hourly data needs 168 lags for weekly seasonality. Don’t blindly use default values.

When ACF shows significant autocorrelation but you need independent observations, consider using the Partial Autocorrelation Function (PACF) instead. PACF removes indirect correlations, showing only direct relationships at each lag.

Handle missing data carefully. The missing='drop' parameter removes gaps, but this can create artificial patterns if your missing data isn’t random. Sometimes it’s better to impute values or work with the gaps explicitly.
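
As a sketch of the imputation route, pandas can fill short gaps by interpolation before computing the ACF — a reasonable assumption only when gaps are brief and the series varies smoothly:

```python
import numpy as np
import pandas as pd

# Smooth seasonal series with a few observations knocked out
np.random.seed(7)
ts = pd.Series(np.sin(np.arange(120) * 2 * np.pi / 12)
               + np.random.normal(0, 0.1, 120))
ts_missing = ts.copy()
ts_missing.iloc[[10, 11, 50]] = np.nan

# Fill the gaps by linear interpolation before any ACF computation
ts_filled = ts_missing.interpolate(method='linear')
print(ts_missing.isna().sum())  # 3 gaps before filling
print(ts_filled.isna().sum())   # 0 gaps after interpolation
```

The filled series can then go straight into `plot_acf` without relying on `missing='drop'`, which would otherwise splice distant observations together as if they were adjacent.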

The ACF is your first diagnostic tool when approaching any time series problem. Master its interpretation, and you’ll make better modeling decisions, catch data quality issues earlier, and understand your temporal data at a deeper level.
