How to Choose ARIMA Parameters (p, d, q) in Python

Key Insights

  • The differencing parameter (d) should be determined first using stationarity tests like ADF, while p and q are identified through ACF/PACF plot analysis or grid search with AIC/BIC metrics
  • Auto ARIMA automates parameter selection but understanding manual methods gives you better intuition for model behavior and helps debug poor forecasts
  • Always validate your chosen parameters through residual diagnostics—a model with optimal AIC/BIC is worthless if residuals show patterns or fail normality tests

Understanding ARIMA Parameters

ARIMA models require three integer parameters that fundamentally shape how the model learns from your time series data. The p parameter controls the autoregressive component—how many historical values influence the current prediction. The d parameter specifies differencing order to achieve stationarity. The q parameter defines the moving average component—how many lagged forecast errors factor into predictions.

Choosing these parameters incorrectly leads to either underfitting (missing important patterns) or overfitting (learning noise). Let’s work through systematic methods to identify optimal values.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
import warnings
warnings.filterwarnings('ignore')

Determining d Through Stationarity Testing

Start by finding the differencing parameter. ARIMA models require stationary data—constant mean, variance, and autocorrelation over time. Most real-world series aren’t stationary, so we difference them.

The Augmented Dickey-Fuller (ADF) test is your primary tool. The null hypothesis states the series has a unit root (non-stationary). A p-value below 0.05 rejects this, indicating stationarity.

def check_stationarity(series, name='Series'):
    """Perform ADF test and return results"""
    result = adfuller(series.dropna())
    print(f'\n{name} ADF Test Results:')
    print(f'ADF Statistic: {result[0]:.6f}')
    print(f'p-value: {result[1]:.6f}')
    print(f'Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.3f}')
    
    if result[1] <= 0.05:
        print("Result: Stationary (reject null hypothesis)")
        return True
    else:
        print("Result: Non-stationary (fail to reject null hypothesis)")
        return False

# Example with synthetic data
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=200, freq='D')
trend = np.linspace(100, 150, 200)
seasonal = 10 * np.sin(np.linspace(0, 8*np.pi, 200))
noise = np.random.normal(0, 3, 200)
data = trend + seasonal + noise
ts = pd.Series(data, index=dates)

# Test original series
is_stationary = check_stationarity(ts, 'Original')

# If not stationary, difference once
if not is_stationary:
    ts_diff1 = ts.diff().dropna()
    is_stationary = check_stationarity(ts_diff1, 'First Difference')
    d = 1
    
    # If still not stationary, difference again
    if not is_stationary:
        ts_diff2 = ts_diff1.diff().dropna()
        check_stationarity(ts_diff2, 'Second Difference')
        d = 2
else:
    d = 0

print(f'\nRecommended d parameter: {d}')

Rarely do you need d > 2. If second differencing doesn’t achieve stationarity, you likely have structural breaks or need a different modeling approach.

The KPSS test provides confirmation—its null hypothesis is stationarity (opposite of ADF). Use both tests to avoid contradictory results.

Using ACF and PACF for p and q

Once you have stationary data, ACF and PACF plots reveal the AR and MA orders.

ACF (Autocorrelation Function) shows correlation between the series and its lags. A sharp cutoff after lag q suggests MA(q). Gradual decay suggests AR components.

PACF (Partial Autocorrelation Function) shows correlation at each lag after removing effects of shorter lags. A sharp cutoff after lag p suggests AR(p). Gradual decay suggests MA components.

# Use the differenced series from the previous step
# Note: ts.diff(d) takes a single lag-d difference, which is NOT the same
# as differencing d times -- apply .diff() once per order instead
stationary_series = ts.copy()
for _ in range(d):
    stationary_series = stationary_series.diff().dropna()

fig, axes = plt.subplots(1, 2, figsize=(14, 4))

# ACF plot
plot_acf(stationary_series, lags=40, ax=axes[0])
axes[0].set_title('Autocorrelation Function (ACF)')
axes[0].set_xlabel('Lag')

# PACF plot  
plot_pacf(stationary_series, lags=40, ax=axes[1])
axes[1].set_title('Partial Autocorrelation Function (PACF)')
axes[1].set_xlabel('Lag')

plt.tight_layout()
plt.show()

Reading the plots:

  • Significant spikes extend beyond the shaded confidence interval
  • ACF cutoff at lag 2, PACF decay: Try MA(2), so q=2, p=0
  • PACF cutoff at lag 3, ACF decay: Try AR(3), so p=3, q=0
  • Both decay gradually: Try ARMA with small p and q values (1-3 each)
  • Both cut off: Try specific p and q at cutoff points

This interpretation requires practice. When uncertain, use grid search.

Grid Search with Information Criteria

Grid search tests multiple parameter combinations and ranks them by AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion). Lower values indicate better models.

AIC balances fit quality against model complexity. BIC penalizes complexity more heavily than AIC, preferring simpler models.

def grid_search_arima(series, p_range, d_range, q_range):
    """
    Perform grid search over ARIMA parameters
    Returns DataFrame sorted by AIC
    """
    results = []
    
    for p in p_range:
        for d in d_range:
            for q in q_range:
                try:
                    model = ARIMA(series, order=(p, d, q))
                    fitted = model.fit()
                    results.append({
                        'p': p, 'd': d, 'q': q,
                        'AIC': fitted.aic,
                        'BIC': fitted.bic
                    })
                except Exception:
                    # Some (p, d, q) combinations fail to converge; skip them
                    continue
    
    results_df = pd.DataFrame(results)
    return results_df.sort_values('AIC').reset_index(drop=True)

# Search over reasonable parameter space
p_values = range(0, 4)
d_values = range(0, 3)
q_values = range(0, 4)

results = grid_search_arima(ts, p_values, d_values, q_values)
print("\nTop 10 models by AIC:")
print(results.head(10))

best_params = results.iloc[0]
print(f"\nBest parameters: p={int(best_params['p'])}, "
      f"d={int(best_params['d'])}, q={int(best_params['q'])}")

Start with ranges 0-3 for p and q, 0-2 for d. Expand if the optimal values sit at range boundaries.
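A tiny helper (hypothetical, not part of any library) makes the boundary check explicit so it isn't forgotten:

```python
def hits_boundary(best_order, p_max, q_max):
    """Return True when the selected (p, d, q) sits on the edge of the
    searched grid -- a sign the range should be widened and rerun."""
    p, _, q = best_order
    return p == p_max or q == q_max

at_edge = hits_boundary((3, 1, 2), p_max=3, q_max=3)
inside = hits_boundary((1, 1, 1), p_max=3, q_max=3)
print(at_edge, inside)
```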

Auto ARIMA for Automated Selection

The pmdarima library automates this entire process. It’s excellent for quick prototyping or when dealing with many time series.

from pmdarima import auto_arima

# Fit auto ARIMA
auto_model = auto_arima(
    ts,
    start_p=0, max_p=5,
    start_q=0, max_q=5,
    d=None,  # Let it determine d
    seasonal=False,
    stepwise=True,  # Faster than exhaustive search
    suppress_warnings=True,
    error_action='ignore',
    trace=True  # Print search progress
)

print(f"\nAuto ARIMA selected: {auto_model.order}")
print(auto_model.summary())

Set seasonal=True for seasonal ARIMA (SARIMA) if you have clear seasonal patterns. The stepwise=True parameter uses a smart search algorithm instead of brute force, making it much faster.

Use auto ARIMA when you need quick results or lack domain knowledge. Use manual methods when you need to understand model behavior or debug forecasting issues.

Model Validation Through Diagnostics

Optimal parameters mean nothing if the model produces poor forecasts. Always validate through residual analysis.

# Fit model with chosen parameters
p, d, q = 2, 1, 2  # Example parameters
model = ARIMA(ts, order=(p, d, q))
fitted_model = model.fit()

# Get residuals
residuals = fitted_model.resid

# Create diagnostic plots
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Residuals over time
axes[0, 0].plot(residuals)
axes[0, 0].set_title('Residuals Over Time')
axes[0, 0].axhline(y=0, color='r', linestyle='--')

# Residuals histogram
axes[0, 1].hist(residuals, bins=30, edgecolor='black')
axes[0, 1].set_title('Residuals Distribution')

# ACF of residuals
plot_acf(residuals, lags=40, ax=axes[1, 0])
axes[1, 0].set_title('Residuals ACF')

# Q-Q plot
from scipy import stats
stats.probplot(residuals, dist="norm", plot=axes[1, 1])
axes[1, 1].set_title('Q-Q Plot')

plt.tight_layout()
plt.show()

# Ljung-Box test for residual autocorrelation
from statsmodels.stats.diagnostic import acorr_ljungbox
lb_test = acorr_ljungbox(residuals, lags=[10], return_df=True)
print("\nLjung-Box Test:")
print(lb_test)
print("p-value > 0.05 indicates no significant autocorrelation (good)")

Good residuals should:

  • Fluctuate randomly around zero
  • Show no patterns or trends
  • Follow normal distribution (Q-Q plot near diagonal)
  • Have no significant autocorrelation (ACF within confidence bands, Ljung-Box p > 0.05)

If diagnostics fail, try different parameters or consider alternative models.

Complete Practical Example

Here’s an end-to-end workflow with validation:

# Train/test split
train_size = int(len(ts) * 0.8)
train, test = ts[:train_size], ts[train_size:]

# Fit model on training data
model = ARIMA(train, order=(2, 1, 2))
fitted = model.fit()

# Forecast on test period with model-based confidence intervals
# (get_forecast produces intervals that widen with the horizon, unlike
# a constant band built from the in-sample residual std)
forecast_res = fitted.get_forecast(steps=len(test))
forecast = forecast_res.predicted_mean
conf_int = forecast_res.conf_int(alpha=0.05)

# Calculate RMSE
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(test, forecast))
print(f"Test RMSE: {rmse:.2f}")

# Visualize results
plt.figure(figsize=(12, 6))
plt.plot(train.index, train, label='Training Data')
plt.plot(test.index, test, label='Actual Test Data')
plt.plot(test.index, forecast, label='Forecast', color='red')
plt.fill_between(test.index,
                 conf_int.iloc[:, 0],
                 conf_int.iloc[:, 1],
                 alpha=0.2, color='red', label='95% Confidence')
plt.legend()
plt.title('ARIMA Forecast vs Actual')
plt.show()

This validation approach reveals whether your parameters generalize to unseen data. A model that fits training data perfectly but fails on test data needs different parameters or more features.

Start with stationarity testing for d, examine ACF/PACF for initial p and q estimates, validate with grid search, and always check residuals. This systematic approach beats random parameter guessing every time.
