How to Perform Seasonal Adjustment in Python
Key Insights
- Seasonal adjustment removes predictable patterns from time series data, revealing underlying trends and making year-over-year comparisons meaningful—essential for economic indicators, retail analytics, and demand forecasting.
- STL decomposition offers the most flexibility for modern applications, handling outliers and changing seasonal patterns better than classical methods, while X-13ARIMA-SEATS remains the gold standard for official statistics.
- Always validate your seasonal adjustment by examining residuals for remaining patterns; a properly adjusted series should show no remaining autocorrelation at seasonal lags and approximately normally distributed residuals.
Introduction to Seasonal Adjustment
Time series data often contains predictable patterns that repeat at fixed intervals—monthly sales spikes during holidays, quarterly earnings cycles, or weekly traffic patterns. These seasonal effects obscure the underlying trends that matter for decision-making. When retail executives see a 20% jump in November sales, they need to know whether business is genuinely improving or if it’s just typical holiday shopping.
Seasonal adjustment separates these recurring patterns from the true signal. Without it, you’ll mistake normal seasonal variation for meaningful change, leading to poor forecasts and misguided business decisions. Government agencies seasonally adjust employment data, GDP figures, and inflation rates before publishing them. E-commerce companies adjust sales metrics to identify real growth. Energy providers adjust consumption data to plan capacity independent of weather-driven seasonal demand.
The goal is straightforward: isolate and remove the seasonal component so you can analyze the trend and irregular variations separately.
Understanding Seasonal Decomposition
Time series decomposition breaks data into three components: trend (long-term movement), seasonal (repeating patterns), and residual (irregular fluctuations). You combine these components using either an additive or multiplicative model.
Additive model: Y(t) = T(t) + S(t) + R(t)
Use this when seasonal variations remain constant in absolute terms regardless of the trend level. If sales always spike by $10,000 in December whether your baseline is $50,000 or $100,000, that’s additive.
Multiplicative model: Y(t) = T(t) × S(t) × R(t)
Use this when seasonal variations scale with the trend. If December sales are always 150% of the average month, that’s multiplicative. Most real-world data exhibits multiplicative seasonality—percentage changes matter more than absolute changes.
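One practical consequence: taking logs turns a multiplicative series into an additive one, since log(T × S × R) = log(T) + log(S) + log(R). That means you can handle multiplicative seasonality with additive tooling. A minimal sketch on synthetic data (all names here are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative multiplicative series: the seasonal swing scales with the trend
dates = pd.date_range('2018-01-01', periods=72, freq='MS')
trend = np.linspace(100, 200, 72)
seasonal_factor = 1 + 0.3 * np.sin(np.arange(72) * 2 * np.pi / 12)
y = pd.Series(trend * seasonal_factor, index=dates)

# Peak-to-trough range grows with the level in the raw series...
raw_first = y.iloc[:12].max() - y.iloc[:12].min()
raw_last = y.iloc[-12:].max() - y.iloc[-12:].min()

# ...but is roughly constant after a log transform (multiplicative -> additive)
log_y = np.log(y)
log_first = log_y.iloc[:12].max() - log_y.iloc[:12].min()
log_last = log_y.iloc[-12:].max() - log_y.iloc[-12:].min()

print(f"raw range ratio (last/first year): {raw_last / raw_first:.2f}")
print(f"log range ratio (last/first year): {log_last / log_first:.2f}")
```

After adjusting on the log scale, exponentiate to return to the original units.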
Let’s decompose monthly retail sales data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Generate sample retail sales data with trend and seasonality
np.random.seed(42)
dates = pd.date_range('2018-01-01', '2023-12-31', freq='MS')
trend = np.linspace(100, 150, len(dates))
seasonal = 10 * np.sin(np.arange(len(dates)) * 2 * np.pi / 12)
noise = np.random.normal(0, 3, len(dates))
sales = trend + seasonal + noise
df = pd.DataFrame({'sales': sales}, index=dates)
# Perform seasonal decomposition
decomposition = seasonal_decompose(df['sales'], model='additive', period=12)
# Plot components
fig, axes = plt.subplots(4, 1, figsize=(12, 10))
df['sales'].plot(ax=axes[0], title='Original Series')
decomposition.trend.plot(ax=axes[1], title='Trend')
decomposition.seasonal.plot(ax=axes[2], title='Seasonal')
decomposition.resid.plot(ax=axes[3], title='Residual')
plt.tight_layout()
plt.show()
# Extract seasonally adjusted series
adjusted_sales = df['sales'] - decomposition.seasonal
This basic decomposition uses moving averages internally. The trend line smooths out short-term fluctuations, the seasonal component captures the repeating pattern, and residuals show what’s left over.
Classical Seasonal Adjustment Methods
Classical decomposition uses centered moving averages to estimate the trend, then extracts seasonality from the detrended series. While simple, it has limitations: it loses observations at the beginning and end of the series, can’t handle missing data, and assumes constant seasonal patterns.
Here’s a manual implementation to understand the mechanics:
def classical_seasonal_adjustment(series, period=12):
    """
    Manually perform seasonal adjustment using centered moving averages.
    Assumes a monthly DatetimeIndex.
    """
    # Step 1: Estimate the trend with a centered moving average
    trend = series.rolling(window=period, center=True).mean()
    # Step 2: Detrend the series
    detrended = series - trend
    # Step 3: Average the detrended values for each calendar month,
    # then normalize the factors so they average to zero over a cycle
    seasonal = detrended.groupby(detrended.index.month).transform('mean')
    seasonal = seasonal - seasonal.mean()
    # Step 4: Remove the seasonal component
    adjusted = series - seasonal
    # Step 5: Residuals are what trend and seasonal leave unexplained
    residual = series - trend - seasonal
    return {
        'adjusted': adjusted,
        'trend': trend,
        'seasonal': seasonal,
        'residual': residual
    }
# Apply to our data
components = classical_seasonal_adjustment(df['sales'], period=12)
print("Original December value:", df['sales'].loc['2022-12-01'])
print("Seasonally adjusted December value:", components['adjusted'].loc['2022-12-01'])
print("Seasonal factor for December:", components['seasonal'].loc['2022-12-01'])
This approach works for stable, well-behaved data but breaks down with outliers or evolving seasonal patterns. That’s where modern methods excel.
STL Decomposition (Seasonal and Trend decomposition using Loess)
STL uses locally weighted regression (LOESS) to decompose time series, offering significant advantages: it handles any type of seasonality, is robust to outliers, and allows the seasonal component to change over time. It’s the best choice for most modern applications.
Key parameters:
- seasonal: Controls how quickly the seasonal component can change (odd integer)
- trend: Controls smoothness of the trend component
- robust: Whether to use robust fitting (recommended with outliers)
from statsmodels.tsa.seasonal import STL
# Load classic airline passenger data
passengers = pd.read_csv('airline_passengers.csv', index_col='Month', parse_dates=True)
# STL decomposition with different parameter settings
stl_default = STL(passengers['Passengers'], seasonal=13).fit()
stl_flexible = STL(passengers['Passengers'], seasonal=7, trend=15).fit()
stl_robust = STL(passengers['Passengers'], seasonal=13, robust=True).fit()
# Compare results
fig, axes = plt.subplots(3, 1, figsize=(12, 8))
axes[0].plot(passengers.index, passengers['Passengers'], label='Original')
axes[0].plot(passengers.index, stl_default.trend + stl_default.resid, label='STL Adjusted (default)')
axes[0].legend()
axes[0].set_title('Default STL (seasonal=13)')
axes[1].plot(passengers.index, stl_flexible.trend + stl_flexible.resid, label='STL Adjusted (flexible)')
axes[1].legend()
axes[1].set_title('Flexible STL (seasonal=7, trend=15)')
axes[2].plot(passengers.index, stl_robust.trend + stl_robust.resid, label='STL Adjusted (robust)')
axes[2].legend()
axes[2].set_title('Robust STL')
plt.tight_layout()
# Extract adjusted series
adjusted_passengers = passengers['Passengers'] - stl_default.seasonal
Lower seasonal values allow the seasonal pattern to change more quickly—useful for evolving businesses. Higher values assume more stable seasonality. For most applications, start with seasonal=13 for monthly data (must be odd) and adjust based on your domain knowledge.
X-13ARIMA-SEATS Method
X-13ARIMA-SEATS is the industry standard developed by the U.S. Census Bureau. It combines seasonal adjustment with ARIMA modeling for more sophisticated handling of calendar effects, outliers, and trading day adjustments. Government statistical agencies worldwide use it for official economic indicators.
The method automatically identifies and adjusts for:
- Moving holidays (Easter, Lunar New Year)
- Trading day effects (number of weekdays vs. weekends)
- Outliers and structural breaks
- Calendar effects (leap years, different month lengths)
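Trading-day effects are easy to see directly: the count of business days per month varies by more than 10%, which alone moves monthly totals. A quick pandas sketch of the raw ingredient of that adjustment:

```python
import pandas as pd

# Business days (Mon-Fri) per month in 2023: months with more weekdays
# mechanically record more activity
days = pd.date_range('2023-01-01', '2023-12-31', freq='D')
weekdays = pd.Series(1, index=days)[days.dayofweek < 5]
per_month = weekdays.resample('MS').sum()
print(per_month.min(), '-', per_month.max(), 'weekdays per month')
```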
import pandas as pd
# Load economic indicator data (e.g., unemployment rate)
unemployment = pd.read_csv('unemployment.csv', index_col='date', parse_dates=True)
# Perform X-13 seasonal adjustment
# Note: Requires X-13ARIMA-SEATS binary installed separately
try:
    from statsmodels.tsa.x13 import x13_arima_analysis

    # Run X-13 adjustment
    x13_result = x13_arima_analysis(
        unemployment['rate'],
        trading=True,        # Adjust for trading days
        outlier=True,        # Detect and adjust outliers
        forecast_periods=12  # Include forecasts
    )
    adjusted_unemployment = x13_result.seasadj
    # Compare original vs adjusted
    fig, ax = plt.subplots(figsize=(12, 6))
    unemployment['rate'].plot(ax=ax, label='Original', alpha=0.7)
    adjusted_unemployment.plot(ax=ax, label='Seasonally Adjusted', linewidth=2)
    ax.legend()
    ax.set_title('Unemployment Rate: Original vs. Seasonally Adjusted')
except Exception:  # ImportError, or X13NotFoundError if the binary is missing
    print("X-13 requires separate installation of Census Bureau software")
    # Fall back to STL for demonstration
    stl_unemp = STL(unemployment['rate'], seasonal=13, robust=True).fit()
    adjusted_unemployment = unemployment['rate'] - stl_unemp.seasonal
X-13 requires additional setup (installing the Census Bureau’s binary), which can be challenging. For most applications, STL provides 90% of the value with 10% of the complexity. Use X-13 when you need official-quality adjustments or must handle complex calendar effects.
Evaluating and Validating Results
Never trust seasonal adjustment blindly. Always validate that you’ve actually removed seasonality without distorting the underlying signal.
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
def validate_seasonal_adjustment(original, adjusted, seasonal_period=12):
    """
    Comprehensive validation of seasonal adjustment
    """
    # 1. Visual inspection
    fig, axes = plt.subplots(3, 2, figsize=(14, 10))
    # Original vs adjusted
    axes[0, 0].plot(original.index, original, label='Original', alpha=0.6)
    axes[0, 0].plot(adjusted.index, adjusted, label='Adjusted', linewidth=2)
    axes[0, 0].legend()
    axes[0, 0].set_title('Original vs. Adjusted Series')
    # Seasonal component removed
    seasonal = original - adjusted
    axes[0, 1].plot(seasonal.index, seasonal)
    axes[0, 1].set_title('Removed Seasonal Component')
    # ACF of original
    plot_acf(original.dropna(), lags=36, ax=axes[1, 0])
    axes[1, 0].set_title('ACF: Original Series')
    # ACF of adjusted
    plot_acf(adjusted.dropna(), lags=36, ax=axes[1, 1])
    axes[1, 1].set_title('ACF: Adjusted Series')
    # Residuals: detrend the adjusted series with a centered rolling mean
    residuals = adjusted - adjusted.rolling(window=seasonal_period, center=True).mean()
    axes[2, 0].hist(residuals.dropna(), bins=30, edgecolor='black')
    axes[2, 0].set_title('Distribution of Residuals')
    # Q-Q plot
    from scipy import stats
    stats.probplot(residuals.dropna(), dist="norm", plot=axes[2, 1])
    axes[2, 1].set_title('Q-Q Plot')
    plt.tight_layout()
    # 2. Statistical tests
    # Ljung-Box test on the residuals at the seasonal lag; testing the
    # adjusted series directly would flag the trend, not leftover seasonality
    lb_test = acorr_ljungbox(residuals.dropna(), lags=[seasonal_period], return_df=True)
    print("Validation Results:")
    print(f"Ljung-Box test p-value at lag {seasonal_period}: {lb_test['lb_pvalue'].values[0]:.4f}")
    print("(p > 0.05: no evidence of remaining seasonality)")
    return fig
# Validate our STL adjustment
validate_seasonal_adjustment(df['sales'], adjusted_sales, seasonal_period=12)
Good seasonal adjustment should show:
- No significant spikes at seasonal lags in the ACF plot of adjusted data
- Ljung-Box test p-value > 0.05 (no evidence of remaining autocorrelation)
- Normally distributed residuals
- Removed seasonal component showing clear, stable patterns
Practical Applications and Best Practices
Choose your method based on your needs:
- STL: Default choice for most applications. Flexible, robust, handles outliers.
- Classical decomposition: Quick exploratory analysis, educational purposes.
- X-13ARIMA-SEATS: Official statistics, complex calendar effects, regulatory requirements.
Handle edge cases:
For multiple seasonal patterns (daily + weekly + yearly), use MSTL (Multiple STL):
from statsmodels.tsa.seasonal import MSTL
# Example: hourly data with daily and weekly seasonality
mstl = MSTL(hourly_data, periods=[24, 168]).fit()  # 24 hours, 168 hours/week
# With multiple periods, mstl.seasonal is a DataFrame (one column per period)
adjusted_hourly = hourly_data - mstl.seasonal.sum(axis=1)
End-to-end pipeline:
def seasonal_adjustment_pipeline(data, target_col, seasonal_period=12, method='stl'):
    """
    Complete pipeline: adjust, forecast, compare
    """
    # 1. Seasonal adjustment (fit on the full series for simplicity;
    # in production, fit on the training window only to avoid leakage)
    if method == 'stl':
        decomposition = STL(data[target_col], seasonal=seasonal_period + 1, robust=True).fit()
        adjusted = data[target_col] - decomposition.seasonal
    # 2. Split train/test
    train_size = int(len(data) * 0.8)
    train_orig = data[target_col][:train_size]
    train_adj = adjusted[:train_size]
    test = data[target_col][train_size:]
    # 3. Simple forecast comparison (naive: repeat last year's values)
    from sklearn.metrics import mean_absolute_error
    # Forecast without adjustment: last year of raw values
    forecast_orig = train_orig.iloc[-seasonal_period:].values
    # Forecast with adjustment: last year of adjusted values, re-seasonalized
    # with the seasonal factors of the test period
    test_seasonal = decomposition.seasonal.iloc[train_size:train_size + seasonal_period].values
    forecast_adj = train_adj.iloc[-seasonal_period:].values + test_seasonal
    # 4. Evaluate on the first year of the test set
    mae_orig = mean_absolute_error(test[:seasonal_period], forecast_orig)
    mae_adj = mean_absolute_error(test[:seasonal_period], forecast_adj)
    print(f"MAE without adjustment: {mae_orig:.2f}")
    print(f"MAE with adjustment: {mae_adj:.2f}")
    print(f"Improvement: {((mae_orig - mae_adj) / mae_orig * 100):.1f}%")
    return adjusted, decomposition
# Run pipeline
adjusted_series, decomp = seasonal_adjustment_pipeline(df, 'sales', seasonal_period=12)
Seasonal adjustment is not optional for time series analysis—it’s fundamental. Start with STL, validate thoroughly, and always compare results with and without adjustment to understand its impact on your specific use case.