How to Perform Seasonal Decomposition in Python
Key Insights
- Seasonal decomposition separates time series data into trend, seasonal, and residual components, making patterns easier to analyze and forecast
- Use additive decomposition when seasonal fluctuations are constant, multiplicative when they scale with the trend level
- STL decomposition handles outliers and missing data better than classical methods, making it the preferred choice for real-world datasets
Introduction to Seasonal Decomposition
Time series data contains multiple patterns layered on top of each other. Seasonal decomposition breaks these patterns into three distinct components: trend (long-term direction), seasonality (repeating patterns), and residuals (random noise or anomalies). Understanding these components separately gives you better insights than analyzing the raw data alone.
Decomposition helps with forecasting by isolating predictable patterns, detecting anomalies by examining residuals, and understanding business drivers by quantifying seasonal effects. Whether you’re analyzing retail sales, website traffic, or energy consumption, decomposition is a fundamental technique in your time series toolkit.
Let’s start by loading a classic dataset and visualizing the raw time series:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.datasets import co2
# Load atmospheric CO2 data
# The raw series is weekly; resample to monthly averages so that the
# yearly cycle spans 12 observations (matching period=12 used later)
data = co2.load_pandas().data['co2']
data = data.resample('MS').mean().interpolate()  # Handle missing values
plt.figure(figsize=(12, 4))
plt.plot(data.index, data.values)
plt.title('Atmospheric CO2 Concentration')
plt.xlabel('Date')
plt.ylabel('CO2 (ppm)')
plt.tight_layout()
plt.show()
This dataset shows a clear upward trend and yearly seasonality, making it perfect for demonstrating decomposition techniques.
Understanding Decomposition Models
Choosing between additive and multiplicative models depends on how seasonality behaves relative to the trend. In additive models, seasonal fluctuations remain constant regardless of the trend level: Y(t) = T(t) + S(t) + R(t). In multiplicative models, seasonal fluctuations scale with the trend: Y(t) = T(t) × S(t) × R(t).
Use additive decomposition when seasonal variations stay roughly the same size throughout the series. Use multiplicative when seasonal swings grow or shrink proportionally with the trend level—common in business metrics like sales or revenue.
Here’s how the same dataset looks under both models:
from statsmodels.tsa.seasonal import seasonal_decompose
# Additive decomposition
additive = seasonal_decompose(data, model='additive', period=12)
# Multiplicative decomposition
multiplicative = seasonal_decompose(data, model='multiplicative', period=12)
# Plot comparison
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
# Additive
axes[0, 0].plot(data)
axes[0, 0].set_title('Original (Additive)')
axes[0, 1].plot(additive.trend)
axes[0, 1].set_title('Trend')
axes[0, 2].plot(additive.seasonal)
axes[0, 2].set_title('Seasonal')
axes[0, 3].plot(additive.resid)
axes[0, 3].set_title('Residual')
# Multiplicative
axes[1, 0].plot(data)
axes[1, 0].set_title('Original (Multiplicative)')
axes[1, 1].plot(multiplicative.trend)
axes[1, 1].set_title('Trend')
axes[1, 2].plot(multiplicative.seasonal)
axes[1, 2].set_title('Seasonal')
axes[1, 3].plot(multiplicative.resid)
axes[1, 3].set_title('Residual')
plt.tight_layout()
plt.show()
For the CO2 data, additive decomposition works well because seasonal fluctuations remain relatively constant. If you were analyzing retail sales that double during holiday seasons, multiplicative would be more appropriate.
Classical Decomposition with statsmodels
The seasonal_decompose() function implements classical decomposition using moving averages. The two critical parameters are model (additive or multiplicative) and period (the number of observations per seasonal cycle).
For monthly data with yearly seasonality, set period=12. For daily data with weekly patterns, use period=7. Getting the period wrong produces meaningless results, so understand your data’s natural cycles first.
from statsmodels.tsa.seasonal import seasonal_decompose
# Perform decomposition
result = seasonal_decompose(data, model='additive', period=12)
# Access individual components
trend = result.trend
seasonal = result.seasonal
residual = result.resid
# Create comprehensive visualization
fig, axes = plt.subplots(4, 1, figsize=(12, 10))
data.plot(ax=axes[0], title='Original Time Series')
axes[0].set_ylabel('CO2 (ppm)')
trend.plot(ax=axes[1], title='Trend Component')
axes[1].set_ylabel('Trend')
seasonal.plot(ax=axes[2], title='Seasonal Component')
axes[2].set_ylabel('Seasonality')
residual.plot(ax=axes[3], title='Residual Component')
axes[3].set_ylabel('Residuals')
plt.tight_layout()
plt.show()
# Print component statistics
print(f"Trend range: {trend.min():.2f} to {trend.max():.2f}")
print(f"Seasonal range: {seasonal.min():.2f} to {seasonal.max():.2f}")
print(f"Residual std: {residual.std():.2f}")
The trend component shows the long-term increase in CO2 levels. The seasonal component reveals the annual cycle driven by vegetation growth patterns. Residuals should look like random noise—patterns here indicate model inadequacy or anomalies.
STL Decomposition (Seasonal-Trend decomposition using LOESS)
Classical decomposition has limitations: it can’t handle missing data well, is sensitive to outliers, and assumes a constant seasonal pattern. STL (Seasonal-Trend decomposition using LOESS) addresses these issues using locally weighted regression.
STL offers several advantages: it handles outliers robustly, allows seasonal patterns to change over time, works with any type of seasonality, and provides more flexibility through tunable parameters.
from statsmodels.tsa.seasonal import STL
# Perform STL decomposition
stl = STL(data, seasonal=13, trend=15)
stl_result = stl.fit()
# Plot STL results
fig, axes = plt.subplots(4, 1, figsize=(12, 10))
data.plot(ax=axes[0], title='Original Time Series')
stl_result.trend.plot(ax=axes[1], title='STL Trend')
stl_result.seasonal.plot(ax=axes[2], title='STL Seasonal')
stl_result.resid.plot(ax=axes[3], title='STL Residual')
plt.tight_layout()
plt.show()
Now let’s compare STL versus classical decomposition on data with outliers:
# Create synthetic data with outliers
np.random.seed(42)
time = pd.date_range('2020-01-01', periods=365, freq='D')
trend_comp = np.linspace(100, 150, 365)
seasonal_comp = 10 * np.sin(2 * np.pi * np.arange(365) / 7)
noise = np.random.normal(0, 2, 365)
ts_data = trend_comp + seasonal_comp + noise
# Add outliers
ts_data[100] += 50
ts_data[200] -= 40
ts = pd.Series(ts_data, index=time)
# Classical decomposition
classical = seasonal_decompose(ts, model='additive', period=7)
# STL decomposition
stl = STL(ts, seasonal=7, robust=True)
stl_result = stl.fit()
# Compare residuals
fig, axes = plt.subplots(2, 1, figsize=(12, 6))
classical.resid.plot(ax=axes[0], title='Classical Residuals')
stl_result.resid.plot(ax=axes[1], title='STL Residuals (Robust)')
plt.tight_layout()
plt.show()
The STL residuals show cleaner separation of outliers because the robust fitting prevents outliers from distorting the trend and seasonal estimates.
Extracting and Using Decomposed Components
Decomposed components aren’t just for visualization—they’re inputs for further analysis. Common applications include creating seasonally adjusted series, forecasting using trend extrapolation, and detecting anomalies through residual analysis.
Here’s how to extract and use these components:
from statsmodels.tsa.seasonal import STL
# Decompose the data
stl = STL(data, seasonal=13)
result = stl.fit()
# Create seasonally adjusted series
seasonally_adjusted = data - result.seasonal
# Plot original vs adjusted
fig, axes = plt.subplots(2, 1, figsize=(12, 8))
data.plot(ax=axes[0], title='Original Series')
seasonally_adjusted.plot(ax=axes[1], title='Seasonally Adjusted Series')
plt.tight_layout()
plt.show()
# Anomaly detection using residuals
residuals = result.resid
threshold = 3 * residuals.std()
anomalies = residuals[np.abs(residuals) > threshold]
print(f"Detected {len(anomalies)} anomalies:")
print(anomalies)
# Visualize anomalies
plt.figure(figsize=(12, 4))
plt.plot(data.index, data.values, label='Original')
plt.scatter(anomalies.index, data.loc[anomalies.index],
            color='red', s=100, label='Anomalies', zorder=5)
plt.legend()
plt.title('Anomaly Detection Using Residual Analysis')
plt.tight_layout()
plt.show()
Seasonally adjusted data makes it easier to spot genuine changes in underlying trends without seasonal noise obscuring the signal. Residual-based anomaly detection works because after removing trend and seasonality, large residuals indicate unusual observations.
Practical Applications and Best Practices
Choosing the right period parameter is critical. For business data, common periods are 7 (weekly cycles in daily data), 12 (yearly cycles in monthly data), or 4 (yearly cycles in quarterly data). Inspect your data’s autocorrelation function (ACF) to identify dominant periodicities.
Handle missing data before decomposition. While STL tolerates some missing values, classical methods require complete data. Use interpolation for small gaps, but be cautious with large gaps that might introduce artificial patterns.
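One pattern for cautious gap handling with pandas: `interpolate` with a `limit` fills at most that many consecutive missing values, so short gaps close completely while most of a long gap stays missing for you to handle explicitly. A small sketch with invented data:

```python
import numpy as np
import pandas as pd

# Daily series with one short gap and one long gap
idx = pd.date_range('2023-01-01', periods=30, freq='D')
s = pd.Series(np.arange(30, dtype=float), index=idx)
s.iloc[5:7] = np.nan     # 2-day gap: safe to interpolate
s.iloc[15:25] = np.nan   # 10-day gap: filling it would invent a pattern

# limit=3 fills at most three consecutive NaNs per gap: the short gap
# closes completely, the long gap is only partially filled
filled = s.interpolate(method='time', limit=3)
print(f"Remaining missing values: {int(filled.isna().sum())}")
```

Note that `limit` fills the first few values of a long gap rather than skipping it entirely; if you want long gaps left fully untouched, mask them back to NaN after interpolation.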
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
import matplotlib.pyplot as plt
# Synthetic data mimicking real-world energy consumption patterns
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=730, freq='D')
base_consumption = 1000
trend = np.linspace(0, 100, 730)
seasonal_yearly = 200 * np.sin(2 * np.pi * np.arange(730) / 365.25)
seasonal_weekly = 50 * np.sin(2 * np.pi * np.arange(730) / 7)
noise = np.random.normal(0, 30, 730)
energy = base_consumption + trend + seasonal_yearly + seasonal_weekly + noise
energy_series = pd.Series(energy, index=dates)
# Complete workflow
# 1. Decompose with appropriate period
stl = STL(energy_series, seasonal=7, trend=91) # 7-day seasonality, ~3-month trend
result = stl.fit()
# 2. Create seasonally adjusted series
adjusted = energy_series - result.seasonal
# 3. Identify unusual consumption days
residual_threshold = 2.5 * result.resid.std()
high_consumption = result.resid[result.resid > residual_threshold]
low_consumption = result.resid[result.resid < -residual_threshold]
# 4. Calculate seasonal factors for forecasting
# Average the seasonal component by weekday so labels match the calendar
# (the series starts mid-week, so slicing the first seven values would misalign)
seasonal_by_day = result.seasonal.groupby(result.seasonal.index.dayofweek).mean()
print("Seasonal Factors (Day of Week):")
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
for day, factor in zip(days, seasonal_by_day):
    print(f"{day}: {factor:+.1f} kWh")
print(f"\nHigh consumption days: {len(high_consumption)}")
print(f"Low consumption days: {len(low_consumption)}")
# Visualize complete analysis
fig, axes = plt.subplots(3, 1, figsize=(14, 10))
energy_series.plot(ax=axes[0], label='Original', alpha=0.7)
adjusted.plot(ax=axes[0], label='Seasonally Adjusted', linewidth=2)
axes[0].legend()
axes[0].set_title('Energy Consumption: Original vs Adjusted')
result.trend.plot(ax=axes[1], linewidth=2)
axes[1].set_title('Long-term Trend')
result.resid.plot(ax=axes[2], alpha=0.5)
axes[2].axhline(residual_threshold, color='r', linestyle='--', label='Threshold')
axes[2].axhline(-residual_threshold, color='r', linestyle='--')
axes[2].scatter(high_consumption.index, high_consumption, color='red', s=50)
axes[2].scatter(low_consumption.index, low_consumption, color='blue', s=50)
axes[2].legend()
axes[2].set_title('Residuals with Anomalies')
plt.tight_layout()
plt.show()
Key limitations to remember: decomposition assumes patterns repeat regularly, requires sufficient historical data (at least two full seasonal cycles), and doesn’t extrapolate—it describes historical patterns but doesn’t forecast future values directly. Use decomposed components as inputs to forecasting models rather than treating decomposition itself as a forecasting method.
Start with STL decomposition unless you have specific reasons to use classical methods. STL’s robustness and flexibility make it suitable for most real-world scenarios where data quality isn’t perfect and patterns evolve over time.