SARIMA Model Explained

Key Insights

  • SARIMA extends ARIMA by adding seasonal components (P,D,Q,m) to capture repeating patterns at fixed intervals, making it essential for forecasting data with periodic fluctuations like retail sales or energy demand.
  • The model requires seven parameters: non-seasonal (p,d,q) control short-term dependencies, while seasonal (P,D,Q,m) handle patterns that repeat every m periods—proper parameter selection through ACF/PACF analysis is critical for accuracy.
  • SARIMA works best for univariate time series with stable seasonal patterns, but struggles with multiple seasonalities or complex non-linear trends where modern alternatives like Prophet or deep learning models may perform better.

Introduction to Time Series and ARIMA

Time series forecasting predicts future values based on historical patterns. ARIMA (AutoRegressive Integrated Moving Average) models have been the workhorse of time series analysis for decades, combining three components: AR (autoregressive terms using past values), I (differencing to achieve stationarity), and MA (moving average of past errors).

However, ARIMA has a critical limitation—it cannot handle seasonal patterns. Real-world data often exhibits seasonality: retail sales spike during holidays, energy consumption varies by season, and website traffic follows weekly patterns. This is where SARIMA (Seasonal ARIMA) becomes essential.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.datasets import co2

# Load the weekly CO2 dataset and resample to monthly means so the
# yearly seasonal period is 12, then interpolate missing values
data = co2.load_pandas().data
data = data.resample('MS').mean().interpolate()

plt.figure(figsize=(12, 6))
plt.plot(data.index, data['co2'])
plt.title('CO2 Levels - Clear Seasonal Pattern')
plt.xlabel('Year')
plt.ylabel('CO2 (ppm)')
plt.grid(True)
plt.show()

This dataset shows atmospheric CO2 levels with obvious yearly seasonality overlaid on an upward trend—exactly the type of data where SARIMA excels.

Understanding Seasonality in Data

Seasonality refers to patterns that repeat at fixed, known intervals. Unlike trends (long-term directional movements) or cycles (fluctuations without fixed periods), seasonal patterns are predictable and regular.

Common seasonal periods include:

  • Daily: Hourly website traffic patterns
  • Weekly: Weekday vs. weekend sales differences
  • Monthly: Utility bills affected by weather
  • Quarterly: Business earnings reports
  • Yearly: Holiday shopping seasons
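
The period m follows directly from this pairing of sampling frequency and cycle. A quick lookup table (illustrative values, assuming evenly sampled data):

```python
# Typical seasonal period m for common (frequency, cycle) pairs
# (illustrative values; derive m from your own data's cadence)
SEASONAL_M = {
    ("hourly", "daily"): 24,      # 24 hours per day
    ("daily", "weekly"): 7,       # 7 days per week
    ("weekly", "yearly"): 52,     # ~52 weeks per year
    ("monthly", "yearly"): 12,    # 12 months per year
    ("quarterly", "yearly"): 4,   # 4 quarters per year
}

print(SEASONAL_M[("monthly", "yearly")])  # 12
```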

Decomposing a time series helps visualize these components separately.

from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose time series
decomposition = seasonal_decompose(data['co2'], model='additive', period=12)

fig, axes = plt.subplots(4, 1, figsize=(12, 10))

decomposition.observed.plot(ax=axes[0], title='Original')
decomposition.trend.plot(ax=axes[1], title='Trend')
decomposition.seasonal.plot(ax=axes[2], title='Seasonal')
decomposition.resid.plot(ax=axes[3], title='Residual')

plt.tight_layout()
plt.show()

This decomposition reveals the underlying structure: a strong upward trend, repeating seasonal pattern, and random residuals. SARIMA models all three components simultaneously.

SARIMA Components Breakdown

SARIMA extends ARIMA with seasonal parameters, resulting in seven total parameters: (p,d,q)(P,D,Q,m)

Non-seasonal components (p,d,q):

  • p: Number of autoregressive terms (lag observations)
  • d: Degree of differencing (to make data stationary)
  • q: Number of moving average terms (lag forecast errors)

Seasonal components (P,D,Q,m):

  • P: Seasonal autoregressive order
  • D: Seasonal differencing order
  • Q: Seasonal moving average order
  • m: Number of periods per season (12 for monthly data with yearly seasonality)

For example, SARIMA(1,1,1)(1,1,1,12) means:

  • One non-seasonal AR term, one differencing, one MA term
  • One seasonal AR term, one seasonal differencing, one seasonal MA term
  • Seasonality repeats every 12 periods

from statsmodels.tsa.statespace.sarimax import SARIMAX

# Define SARIMA model structure
order = (1, 1, 1)  # (p,d,q)
seasonal_order = (1, 1, 1, 12)  # (P,D,Q,m)

# Initialize model (not yet fitted)
model = SARIMAX(data['co2'], 
                order=order,
                seasonal_order=seasonal_order,
                enforce_stationarity=False,
                enforce_invertibility=False)

Building a SARIMA Model Step-by-Step

Building an effective SARIMA model requires systematic data preparation and parameter selection.

Step 1: Test for Stationarity

SARIMA requires stationary data (constant mean and variance). Use the Augmented Dickey-Fuller test:

from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    result = adfuller(timeseries.dropna())
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    
    if result[1] <= 0.05:
        print("Data is stationary")
    else:
        print("Data is non-stationary - differencing needed")

test_stationarity(data['co2'])

If p-value > 0.05, the data is non-stationary and requires differencing (d > 0 or D > 0).

Step 2: Analyze ACF and PACF Plots

Autocorrelation (ACF) and Partial Autocorrelation (PACF) plots help identify appropriate p and q values:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Difference the data once
diff_data = data['co2'].diff().dropna()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
plot_acf(diff_data, lags=40, ax=axes[0])
plot_pacf(diff_data, lags=40, ax=axes[1])
plt.tight_layout()
plt.show()

When interpreting these plots:

  • Significant spikes at seasonal lags (12, 24, 36) indicate seasonal components
  • Gradual decay in ACF suggests AR terms
  • Sharp cutoff in PACF helps determine the AR order

Step 3: Fit the Model

# Fit SARIMA model
model = SARIMAX(data['co2'], 
                order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))

results = model.fit(disp=False)
print(results.summary())

The summary provides coefficients, standard errors, and statistical significance for each parameter.

Model Evaluation and Diagnostics

Never trust a model without validating its assumptions. SARIMA assumes residuals are white noise (random, normally distributed, no autocorrelation).

# Diagnostic plots
results.plot_diagnostics(figsize=(14, 10))
plt.show()

# Calculate performance metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

# In-sample predictions
predictions = results.fittedvalues

# Calculate metrics, skipping the warm-up period (the first
# m + d = 13 fitted values precede a full seasonal cycle)
rmse = np.sqrt(mean_squared_error(data['co2'].iloc[13:], predictions.iloc[13:]))
mae = mean_absolute_error(data['co2'].iloc[13:], predictions.iloc[13:])

print(f'RMSE: {rmse:.4f}')
print(f'MAE: {mae:.4f}')
print(f'AIC: {results.aic:.2f}')
print(f'BIC: {results.bic:.2f}')

Good diagnostics show:

  • Residuals centered around zero with constant variance
  • Normal distribution in Q-Q plot
  • No significant autocorrelation in correlogram
  • Lower AIC/BIC values indicate better model fit (useful for comparing models)

Forecasting and Practical Applications

The ultimate goal is accurate forecasting. SARIMA provides point forecasts and confidence intervals:

# Split data for validation
train = data['co2'][:-24]
test = data['co2'][-24:]

# Fit model on training data
model_train = SARIMAX(train, 
                      order=(1, 1, 1),
                      seasonal_order=(1, 1, 1, 12))
results_train = model_train.fit(disp=False)

# Forecast 24 steps ahead with confidence intervals
forecast_obj = results_train.get_forecast(steps=24)
forecast = forecast_obj.predicted_mean
forecast_ci = forecast_obj.conf_int()

# Visualize predictions
plt.figure(figsize=(14, 6))
plt.plot(train.index, train, label='Training Data')
plt.plot(test.index, test, label='Actual', color='green')
plt.plot(test.index, forecast, label='Forecast', color='red')
plt.fill_between(test.index, 
                 forecast_ci.iloc[:, 0], 
                 forecast_ci.iloc[:, 1], 
                 color='pink', alpha=0.3, label='95% Confidence Interval')
plt.legend()
plt.title('SARIMA Forecast vs Actual')
plt.show()

Real-world applications:

  • Retail: Forecast inventory needs accounting for holiday seasons
  • Energy: Predict electricity demand considering weather seasonality
  • Finance: Model quarterly earnings patterns
  • Web Analytics: Anticipate traffic spikes during weekly or yearly events

Limitations and Alternatives

SARIMA is powerful but not universal. It struggles with:

  • Multiple seasonal patterns: Data with both weekly and yearly seasonality (e.g., electricity demand)
  • Non-linear trends: Exponential growth or regime changes
  • External factors: Events like promotions, holidays, or policy changes
  • Long-term dependencies: Very long seasonal periods may require excessive parameters

Alternatives to consider:

  • Prophet: Facebook’s library handles multiple seasonalities and holidays automatically
  • Exponential Smoothing (ETS): Simpler alternative for basic seasonal patterns
  • LSTM/GRU: Deep learning for complex, non-linear patterns
  • TBATS: Handles multiple seasonal periods and complex patterns
  • XGBoost with lag features: For data with many external predictors

SARIMA remains the go-to choice for univariate time series with clear, stable seasonal patterns. Its interpretability, statistical foundation, and computational efficiency make it indispensable for production forecasting systems. Master SARIMA first—it provides the conceptual foundation for understanding more advanced methods and often outperforms complex alternatives on well-behaved seasonal data.
