How to Use Statsmodels for Time Series in Python
Key Insights
- Statsmodels provides rigorous statistical methods for time series analysis, including ARIMA models, stationarity tests, and diagnostic tools that give you mathematical confidence in your forecasts.
- Always test for stationarity using the Augmented Dickey-Fuller test before modeling—non-stationary data will produce unreliable forecasts and misleading results.
- Use ACF and PACF plots to systematically identify ARIMA parameters rather than guessing, and validate models with AIC/BIC scores combined with out-of-sample testing on held-out data.
Introduction to Time Series Analysis with Statsmodels
Statsmodels is Python’s go-to library for rigorous statistical modeling of time series data. Unlike machine learning libraries that treat time series as just another prediction problem, Statsmodels gives you classical statistical methods with proper hypothesis testing, confidence intervals, and interpretable parameters.
You’ll use Statsmodels when you need to forecast sales, analyze trends in financial data, predict resource utilization, or detect seasonality in user behavior. The library excels at ARIMA models, seasonal decomposition, and statistical tests that help you understand why your model works, not just that it works.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings('ignore')
# Load sample airline passenger data
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv'
df = pd.read_csv(url, parse_dates=['Month'], index_col='Month')
df.columns = ['Passengers']
print(df.head())
This dataset contains monthly airline passenger counts from 1949-1960, a classic time series with clear trend and seasonality.
Data Preparation and Exploratory Analysis
Time series analysis starts with proper data preparation. Your index must be a DatetimeIndex, and you need to understand the patterns in your data before modeling.
# Ensure datetime index with proper frequency
df.index = pd.to_datetime(df.index)
df = df.asfreq('MS') # Month Start frequency
# Handle missing values (forward fill; interpolation also works)
df = df.ffill()
# Basic visualization
plt.figure(figsize=(12, 4))
plt.plot(df.index, df['Passengers'])
plt.title('Airline Passengers Over Time')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.grid(True, alpha=0.3)
plt.show()
Decomposition separates your time series into trend, seasonal, and residual components. This reveals the underlying structure:
# Seasonal decomposition
decomposition = seasonal_decompose(df['Passengers'], model='multiplicative', period=12)
fig, axes = plt.subplots(4, 1, figsize=(12, 10))
decomposition.observed.plot(ax=axes[0], title='Original')
decomposition.trend.plot(ax=axes[1], title='Trend')
decomposition.seasonal.plot(ax=axes[2], title='Seasonality')
decomposition.resid.plot(ax=axes[3], title='Residuals')
plt.tight_layout()
plt.show()
Use model='additive' when seasonal variations are constant, model='multiplicative' when they grow with the trend. The airline data shows increasing seasonal swings, so multiplicative is correct.
Testing for Stationarity
Stationarity means your time series has constant mean and variance over time. Most statistical models assume stationarity, so this isn’t optional—it’s required.
The Augmented Dickey-Fuller test gives you a p-value. Its null hypothesis is that a unit root is present (the series is non-stationary), so if p < 0.05 you reject the null and treat the data as stationary.
def adf_test(series, name=''):
    result = adfuller(series.dropna())
    print(f'ADF Test for {name}')
    print(f'ADF Statistic: {result[0]:.4f}')
    print(f'p-value: {result[1]:.4f}')
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'  {key}: {value:.3f}')
    print(f'Stationary: {"Yes" if result[1] < 0.05 else "No"}\n')
# Test original series
adf_test(df['Passengers'], 'Original Series')
The original airline data fails the stationarity test. Fix this with differencing:
# First-order differencing removes trend
df['Passengers_diff'] = df['Passengers'].diff()
# Test again
adf_test(df['Passengers_diff'], 'First Difference')
# Log transformation stabilizes variance
df['Passengers_log'] = np.log(df['Passengers'])
df['Passengers_log_diff'] = df['Passengers_log'].diff()
adf_test(df['Passengers_log_diff'], 'Log + First Difference')
For data with exponential growth, apply a log transformation before differencing. The number of differences needed to reach stationarity becomes the d in ARIMA(p, d, q).
ARIMA Models: Building and Fitting
ARIMA models combine AutoRegression (p), Integration/differencing (d), and Moving Average (q) components. ACF and PACF plots tell you which parameters to use.
# Plot ACF and PACF on stationary data
fig, axes = plt.subplots(1, 2, figsize=(14, 4))
plot_acf(df['Passengers_log_diff'].dropna(), lags=40, ax=axes[0])
plot_pacf(df['Passengers_log_diff'].dropna(), lags=40, ax=axes[1])
plt.tight_layout()
plt.show()
Rules of thumb:
- PACF cuts off at lag p → Use AR(p)
- ACF cuts off at lag q → Use MA(q)
- Both decay gradually → Use both AR and MA terms
For systematic parameter selection, use grid search:
# Grid search for best ARIMA parameters
best_aic = np.inf
best_params = None
best_model = None
for p in range(0, 4):
    for d in range(0, 2):
        for q in range(0, 4):
            try:
                model = ARIMA(df['Passengers_log'], order=(p, d, q))
                fitted = model.fit()
                if fitted.aic < best_aic:
                    best_aic = fitted.aic
                    best_params = (p, d, q)
                    best_model = fitted
            except Exception:
                continue
print(f'Best ARIMA{best_params} with AIC: {best_aic:.2f}')
print(best_model.summary())
Lower AIC/BIC is better, but don’t overfit. Prefer simpler models when AIC values are close.
Forecasting and Model Evaluation
Never evaluate on training data alone. Split your series and test on unseen data:
# Train/test split (80/20)
train_size = int(len(df) * 0.8)
train = df['Passengers_log'][:train_size]
test = df['Passengers_log'][train_size:]
# Fit model on training data
model = ARIMA(train, order=(1, 1, 1))
fitted_model = model.fit()
# Forecast
forecast_steps = len(test)
forecast = fitted_model.forecast(steps=forecast_steps)
# Transform back from log scale
forecast_original = np.exp(forecast)
test_original = np.exp(test)
# Plot results
plt.figure(figsize=(12, 5))
plt.plot(df.index[:train_size], np.exp(train), label='Training Data')
plt.plot(test.index, test_original, label='Actual', color='blue')
plt.plot(test.index, forecast_original, label='Forecast', color='red', linestyle='--')
plt.legend()
plt.title('ARIMA Forecast vs Actual')
plt.show()
# Calculate error metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error
rmse = np.sqrt(mean_squared_error(test_original, forecast_original))
mae = mean_absolute_error(test_original, forecast_original)
mape = np.mean(np.abs((test_original - forecast_original) / test_original)) * 100
print(f'RMSE: {rmse:.2f}')
print(f'MAE: {mae:.2f}')
print(f'MAPE: {mape:.2f}%')
Get confidence intervals with get_forecast():
forecast_obj = fitted_model.get_forecast(steps=forecast_steps)
forecast_df = forecast_obj.summary_frame()
plt.figure(figsize=(12, 5))
plt.plot(test.index, test_original, label='Actual')
plt.plot(test.index, np.exp(forecast_df['mean']), label='Forecast', color='red')
plt.fill_between(test.index,
                 np.exp(forecast_df['mean_ci_lower']),
                 np.exp(forecast_df['mean_ci_upper']),
                 alpha=0.3, color='red', label='95% CI')
plt.legend()
plt.show()
Advanced Models: SARIMA and SARIMAX
SARIMA adds seasonal components with parameters (P, D, Q, s) where s is the seasonal period:
# SARIMA for seasonal data
# (p,d,q) x (P,D,Q,s)
sarima_model = SARIMAX(train,
                       order=(1, 1, 1),
                       seasonal_order=(1, 1, 1, 12),
                       enforce_stationarity=False,
                       enforce_invertibility=False)
sarima_fitted = sarima_model.fit(disp=False)
print(sarima_fitted.summary())
sarima_forecast = sarima_fitted.forecast(steps=forecast_steps)
SARIMAX incorporates exogenous variables (external predictors):
# Example with external regressor
# Create dummy external variable (e.g., marketing spend)
df['marketing'] = np.random.rand(len(df)) * 1000
train_exog = df['marketing'][:train_size].values.reshape(-1, 1)
test_exog = df['marketing'][train_size:].values.reshape(-1, 1)
sarimax_model = SARIMAX(train,
                        exog=train_exog,
                        order=(1, 1, 1),
                        seasonal_order=(1, 1, 1, 12))
sarimax_fitted = sarimax_model.fit(disp=False)
sarimax_forecast = sarimax_fitted.forecast(steps=forecast_steps, exog=test_exog)
Best Practices and Conclusion
Use Statsmodels when you need statistical rigor, interpretable parameters, and formal hypothesis testing. Switch to Prophet for automatic seasonality detection with minimal tuning, or LSTMs when you have massive datasets and complex non-linear patterns.
Common pitfalls to avoid:
- Skipping stationarity tests—always verify with ADF
- Overfitting with too many parameters—use AIC/BIC and validation sets
- Ignoring residual diagnostics—check that residuals are white noise
- Forgetting to transform forecasts back (if you used log/Box-Cox)
Here’s a complete workflow template:
# Complete ARIMA workflow
def arima_workflow(data, column, test_size=0.2):
    # 1. Prepare data
    series = data[column]
    train_size = int(len(series) * (1 - test_size))
    train, test = series[:train_size], series[train_size:]
    # 2. Check stationarity and transform if needed
    adf_test(train, 'Training Data')
    # 3. Grid search for parameters
    best_aic = np.inf
    best_order = None
    for p in range(3):
        for d in range(2):
            for q in range(3):
                try:
                    model = ARIMA(train, order=(p, d, q))
                    fitted = model.fit()
                    if fitted.aic < best_aic:
                        best_aic = fitted.aic
                        best_order = (p, d, q)
                except Exception:
                    continue
    # 4. Fit best model
    final_model = ARIMA(train, order=best_order).fit()
    # 5. Forecast and evaluate
    forecast = final_model.forecast(steps=len(test))
    rmse = np.sqrt(mean_squared_error(test, forecast))
    print(f'Best Order: {best_order}')
    print(f'RMSE: {rmse:.2f}')
    return final_model, forecast
# Run workflow
model, predictions = arima_workflow(df, 'Passengers')
Statsmodels gives you the statistical foundation for time series analysis. Master stationarity testing, ACF/PACF interpretation, and proper validation, and you’ll build forecasts you can actually trust.