How to Plot the Partial Autocorrelation Function (PACF) in Python

Key Insights

  • The Partial Autocorrelation Function (PACF) measures direct correlation between time series observations at different lags while controlling for intermediate values, making it essential for identifying the order of AR models.
  • Python’s statsmodels library provides plot_pacf() with flexible parameters for confidence intervals, lag counts, and calculation methods—understanding these options prevents misinterpretation of your time series structure.
  • Always check for stationarity before PACF analysis; non-stationary series produce misleading correlation patterns that can lead to incorrect model specifications.

Introduction to PACF

The Partial Autocorrelation Function (PACF) is a fundamental tool in time series analysis that measures the direct relationship between an observation and its lag, after removing the effects of intermediate lags. Unlike the standard autocorrelation function (ACF), which shows all correlations including indirect effects, PACF isolates the unique contribution of each lag.

This distinction matters tremendously when building autoregressive (AR) models. If you’re trying to predict tomorrow’s stock price, you need to know whether yesterday’s price adds predictive value beyond what you already know from two days ago. PACF answers exactly this question.

The primary use case for PACF is determining the order parameter (p) in AR(p) models. When you plot PACF, significant spikes at specific lags tell you which past observations directly influence the current value. After the cutoff lag, the PACF should drop to near-zero, indicating those lags don’t add unique information.

Setting Up Your Environment

You’ll need four core libraries for PACF analysis in Python. Install them if you haven’t already:

pip install statsmodels matplotlib pandas numpy

Here’s the standard import setup:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf, plot_acf
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import pacf

# Set random seed for reproducibility
np.random.seed(42)

Let’s generate a sample AR(2) process to work with. An AR(2) model means the current value depends on the two previous values:

# Define AR(2) coefficients: y_t = 0.6*y_{t-1} - 0.3*y_{t-2} + error
ar_params = np.array([1, -0.6, 0.3])  # lag-polynomial convention: leading 1, AR signs flipped
ma_params = np.array([1])

# Create the AR(2) process
ar2_process = ArmaProcess(ar_params, ma_params)

# Generate 500 observations
n_samples = 500
ar2_data = ar2_process.generate_sample(n_samples)

# Convert to pandas Series for better handling
ts_data = pd.Series(ar2_data)

Creating Basic PACF Plots

The simplest way to create a PACF plot is using plot_pacf() from statsmodels. This function handles the calculation and visualization in one step:

# Create basic PACF plot
fig, ax = plt.subplots(figsize=(10, 5))
plot_pacf(ts_data, lags=40, ax=ax)
plt.title('Partial Autocorrelation Function - AR(2) Process')
plt.xlabel('Lag')
plt.ylabel('Partial Autocorrelation')
plt.tight_layout()
plt.show()

This produces a stem plot where each vertical line represents the partial autocorrelation at that lag. The shaded region shows the 95% confidence interval—spikes extending beyond this region are statistically significant.

For our AR(2) process, you should see significant spikes at lags 1 and 2, then values dropping into the confidence band. This pattern clearly indicates an AR(2) structure, exactly what we simulated.

Customizing PACF Visualizations

The default parameters work for quick exploration, but you’ll often need customization for analysis and publication. Here are the key parameters:

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Default method (Yule-Walker)
plot_pacf(ts_data, lags=30, ax=axes[0, 0], method='ywm')
axes[0, 0].set_title('Method: Yule-Walker (default)')

# OLS method
plot_pacf(ts_data, lags=30, ax=axes[0, 1], method='ols')
axes[0, 1].set_title('Method: OLS')

# Custom confidence interval (99%)
plot_pacf(ts_data, lags=30, ax=axes[1, 0], alpha=0.01)
axes[1, 0].set_title('99% Confidence Interval (alpha=0.01)')

# Fewer lags for clarity
plot_pacf(ts_data, lags=15, ax=axes[1, 1], alpha=0.05)
axes[1, 1].set_title('Focused View (15 lags)')

plt.tight_layout()
plt.show()

The method parameter determines how partial autocorrelations are calculated:

  • 'ywm' (alias 'ywmle'): Yule-Walker without bias adjustment—the default, fast and stable for stationary series
  • 'yw' (alias 'ywadjusted'): Yule-Walker with a small-sample bias adjustment; can produce values outside [-1, 1] on short series
  • 'ols': fits a separate lag regression for each lag—often more accurate, but slower

The alpha parameter controls confidence interval width. Lower alpha (like 0.01) creates wider bands, making the test more conservative.
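The bands follow from the large-sample result that, when the true partial autocorrelation is zero, each estimate is approximately normal with standard error 1/sqrt(n). A quick sketch of the implied cutoffs (assuming n = 500, matching the series simulated above):

```python
import numpy as np
from scipy.stats import norm

n = 500  # sample size of the simulated series

for alpha in (0.05, 0.01):
    z = norm.ppf(1 - alpha / 2)  # two-sided critical value
    band = z / np.sqrt(n)        # approximate significance cutoff
    print(f"alpha={alpha}: +/- {band:.4f}")
```

For alpha=0.05 this gives roughly +/- 0.088, which is why spikes smaller than that are treated as noise for a series of this length.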

Comparing ACF and PACF Side-by-Side

Understanding the difference between ACF and PACF is crucial for model identification. ACF shows total correlation (direct + indirect), while PACF shows only direct correlation.

For AR processes, PACF cuts off after lag p, while ACF decays gradually. For MA processes, the pattern reverses—ACF cuts off while PACF decays. Here’s how to visualize both:

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot ACF
plot_acf(ts_data, lags=40, ax=axes[0])
axes[0].set_title('Autocorrelation Function (ACF)')

# Plot PACF
plot_pacf(ts_data, lags=40, ax=axes[1])
axes[1].set_title('Partial Autocorrelation Function (PACF)')

plt.tight_layout()
plt.show()

For our AR(2) data, the ACF shows a gradual decay (exponential or sinusoidal), while the PACF shows a sharp cutoff after lag 2. This signature tells you immediately that you're dealing with an AR process rather than an MA or ARMA process.

Practical Application: Identifying AR Model Order

Let’s walk through a complete workflow using real-world-style data. We’ll generate data, analyze it, and select an appropriate model:

from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_squared_error

# Generate an AR(3) process: y_t = 0.5*y_{t-1} - 0.3*y_{t-2} + 0.2*y_{t-3} + error
ar3_params = np.array([1, -0.5, 0.3, -0.2])  # lag-polynomial convention: signs flipped
ma_params = np.array([1])
ar3_process = ArmaProcess(ar3_params, ma_params)
data = ar3_process.generate_sample(300)

# Split into train/test
train_size = int(len(data) * 0.8)
train, test = data[:train_size], data[train_size:]

# Visualize PACF to determine order
fig, ax = plt.subplots(figsize=(10, 5))
plot_pacf(train, lags=20, ax=ax)
plt.title('PACF Analysis for Model Order Selection')
plt.show()

# Based on PACF, test different AR orders
orders_to_test = [1, 2, 3, 4, 5]
results = {}

for p in orders_to_test:
    model = AutoReg(train, lags=p)
    fitted_model = model.fit()
    
    # Forecast on test set
    predictions = fitted_model.predict(start=len(train), 
                                       end=len(train)+len(test)-1)
    mse = mean_squared_error(test, predictions)
    results[p] = {'mse': mse, 'aic': fitted_model.aic}
    
    print(f"AR({p}) - MSE: {mse:.4f}, AIC: {fitted_model.aic:.2f}")

# Best model by AIC (lower is better)
best_order = min(results, key=lambda x: results[x]['aic'])
print(f"\nBest model order by AIC: AR({best_order})")

The PACF plot should show significant spikes at lags 1, 2, and 3, then drop off. The AIC comparison confirms AR(3) as the optimal choice, validating our PACF-based identification.

Common Pitfalls and Best Practices

The most frequent mistake is applying PACF to non-stationary data. Non-stationary series (trending, seasonal) produce spurious correlations that mislead model selection. Always check stationarity first using the Augmented Dickey-Fuller test or visual inspection.

Here’s a comparison showing why this matters:

from statsmodels.tsa.stattools import adfuller

# Create non-stationary data (random walk with drift)
non_stationary = np.cumsum(np.random.randn(500)) + 0.5 * np.arange(500)

# Create stationary version (first difference)
stationary = np.diff(non_stationary)

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Non-stationary series
axes[0, 0].plot(non_stationary)
axes[0, 0].set_title('Non-Stationary Series (Random Walk + Trend)')

# PACF of non-stationary
plot_pacf(non_stationary, lags=40, ax=axes[0, 1])
axes[0, 1].set_title('PACF: Non-Stationary (Misleading!)')

# Stationary series
axes[1, 0].plot(stationary)
axes[1, 0].set_title('Stationary Series (First Difference)')

# PACF of stationary
plot_pacf(stationary, lags=40, ax=axes[1, 1])
axes[1, 1].set_title('PACF: Stationary (Reliable)')

plt.tight_layout()
plt.show()

# Statistical test for stationarity
adf_result = adfuller(non_stationary)
print(f"Non-stationary ADF p-value: {adf_result[1]:.4f}")

adf_result_diff = adfuller(stationary)
print(f"Stationary ADF p-value: {adf_result_diff[1]:.4f}")

Additional best practices:

Use sufficient data: You need at least 50-100 observations for reliable PACF estimates. With fewer points, confidence intervals become too wide.

Consider seasonality: For seasonal data, look at PACF values at seasonal lags (12 for monthly data, 7 for daily). You might need seasonal differencing before analysis.
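Seasonal differencing itself is a one-liner. A minimal sketch with a synthetic 12-period (monthly-style) cycle:

```python
import numpy as np

np.random.seed(42)

# Synthetic monthly series: a 12-period cycle plus noise
n = 120
t = np.arange(n)
series = 10 * np.sin(2 * np.pi * t / 12) + np.random.randn(n)

# Differencing at the seasonal lag removes the repeating cycle
seasonal_diff = series[12:] - series[:-12]

print(f"std before: {series.std():.2f}, after: {seasonal_diff.std():.2f}")
```

The large drop in standard deviation shows the cycle is gone; the PACF of the differenced series now reflects only the non-seasonal structure.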

Don’t over-interpret: A single spike just outside the confidence band might be random. Look for clear patterns and multiple significant lags.

Validate with multiple methods: Combine PACF analysis with information criteria (AIC, BIC) and out-of-sample testing. PACF guides initial model selection, but validation confirms it.

The PACF is your first line of defense against model misspecification in time series analysis. Master its interpretation, respect its assumptions about stationarity, and you’ll build better forecasting models with less trial and error.
