How to Forecast Time Series Data in Python
Key Insights
- Time series forecasting requires specialized techniques that respect temporal dependencies—random train/test splits and standard k-fold cross-validation don’t work, because shuffling lets future data inform past predictions
- Start with simple statistical models like ARIMA before jumping to complex deep learning; classical methods often outperform neural networks on smaller datasets with clear seasonal patterns
- Model evaluation must use time-aware metrics and backtesting strategies that simulate real-world deployment where you only have historical data to predict future values
Introduction to Time Series Forecasting
Time series forecasting is fundamentally different from standard machine learning problems. Your data has an inherent temporal order that cannot be shuffled, and patterns like trend, seasonality, and autocorrelation must be explicitly modeled.
Every time series consists of three components: trend (long-term direction), seasonality (repeating patterns), and noise (random variation). Understanding these components determines which forecasting method will work best.
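These components are easy to see in a small synthetic series (a sketch with numpy; the length, coefficients, and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 48  # four years of monthly observations

trend = 0.5 * np.arange(n)                                # long-term direction
seasonality = 10 * np.sin(2 * np.pi * np.arange(n) / 12)  # repeating 12-month cycle
noise = rng.normal(0, 2, n)                               # random variation

series = trend + seasonality + noise  # additive combination of the three components
print(series[:6].round(2))
```

When the seasonal swings grow with the level of the series (as in the airline data below), a multiplicative combination `trend * seasonality * noise` is the better mental model.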
Common applications include demand forecasting for inventory management, financial market prediction, energy consumption planning, and website traffic estimation. Let’s work through a practical example using monthly airline passenger data.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.datasets import get_rdataset
# Load airline passengers dataset
df = get_rdataset('AirPassengers').data
df['time'] = pd.date_range(start='1949-01', periods=len(df), freq='M')  # 'M' = month-end; pandas >= 2.2 prefers 'ME'
df.set_index('time', inplace=True)
df.columns = ['passengers']
# Visualize the data
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['passengers'])
plt.title('Monthly Airline Passengers (1949-1960)')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
plt.grid(True)
plt.show()
print(f"Dataset shape: {df.shape}")
print(f"Date range: {df.index.min()} to {df.index.max()}")
This dataset clearly shows both an upward trend and annual seasonality—perfect for demonstrating various forecasting techniques.
Data Preparation and Exploration
Before building models, you must verify that your data meets the assumptions of your chosen method. Most statistical forecasting techniques require stationarity—a time series whose statistical properties don’t change over time.
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
# Test for stationarity using Augmented Dickey-Fuller test
def check_stationarity(timeseries):
    result = adfuller(timeseries.dropna())
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print(f'\t{key}: {value}')
    if result[1] <= 0.05:
        print("Series is stationary")
    else:
        print("Series is non-stationary")
check_stationarity(df['passengers'])
# Decompose time series into components
decomposition = seasonal_decompose(df['passengers'], model='multiplicative', period=12)
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(12, 10))
decomposition.observed.plot(ax=ax1, title='Observed')
decomposition.trend.plot(ax=ax2, title='Trend')
decomposition.seasonal.plot(ax=ax3, title='Seasonal')
decomposition.resid.plot(ax=ax4, title='Residual')
plt.tight_layout()
plt.show()
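If the ADF test reports non-stationarity (as it does for the raw passenger counts), a log transform plus differencing usually fixes it. A self-contained sketch on a synthetic stand-in series (the same three lines of transforms apply directly to `df['passengers']`):

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with exponential trend and annual seasonality,
# mimicking the shape of the airline data
idx = pd.date_range('1949-01', periods=144, freq='MS')
t = np.arange(144)
s = pd.Series(np.exp(0.01 * t) * (100 + 20 * np.sin(2 * np.pi * t / 12)), index=idx)

log_s = np.log(s)                     # stabilizes the growing variance
diff_s = log_s.diff().dropna()        # first difference removes the trend (d=1)
seas_diff = log_s.diff(12).dropna()   # seasonal difference removes the annual cycle (D=1, m=12)

print(f"std of log series:      {log_s.std():.4f}")
print(f"std after seasonal diff: {seas_diff.std():.4f}")
```

Rerunning the ADF test on the differenced series should now report stationarity; the number of differences needed corresponds to the `d` (and seasonal `D`) parameters of ARIMA below.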
For time series, never use random train/test splits. You must respect temporal order:
# Proper time series split: last 20% for testing
train_size = int(len(df) * 0.8)
train, test = df[:train_size], df[train_size:]
print(f"Training set: {train.index.min()} to {train.index.max()}")
print(f"Test set: {test.index.min()} to {test.index.max()}")
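For cross-validation, scikit-learn’s `TimeSeriesSplit` gives expanding-window folds that always train on the past and test on the immediate future (a sketch on a toy array; the number of splits is an arbitrary choice):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(24).reshape(-1, 1)  # stand-in for 24 monthly observations

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # each fold trains on everything before its test window
    print(f"Fold {fold}: train [0..{train_idx[-1]}], test [{test_idx[0]}..{test_idx[-1]}]")
```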
Classical Statistical Methods
ARIMA (AutoRegressive Integrated Moving Average) remains the workhorse of time series forecasting. It combines three components: AR (past values), I (differencing for stationarity), and MA (past forecast errors).
from statsmodels.tsa.arima.model import ARIMA
# Manual ARIMA with specified parameters (p, d, q)
model_arima = ARIMA(train['passengers'], order=(2, 1, 2))
fitted_arima = model_arima.fit()
# Forecast on test set
forecast_arima = fitted_arima.forecast(steps=len(test))
forecast_arima = pd.Series(forecast_arima.values, index=test.index)  # .values avoids reindexing surprises
print(fitted_arima.summary())
Parameter selection is tedious. Use auto ARIMA for automatic optimization:
from pmdarima import auto_arima
# Auto ARIMA finds optimal parameters
auto_model = auto_arima(train['passengers'],
                        seasonal=True,
                        m=12,  # seasonal period
                        suppress_warnings=True,
                        stepwise=True,
                        trace=True)
print(auto_model.summary())
# Generate forecasts
forecast_auto = auto_model.predict(n_periods=len(test))
forecast_auto = pd.Series(forecast_auto, index=test.index)
For data with strong seasonality, use exponential smoothing methods:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Holt-Winters with trend and seasonality
# (seasonal='mul' often fits this dataset better, since the seasonal swings grow with the trend)
model_hw = ExponentialSmoothing(train['passengers'],
                                seasonal_periods=12,
                                trend='add',
                                seasonal='add')
fitted_hw = model_hw.fit()
forecast_hw = fitted_hw.forecast(steps=len(test))
forecast_hw = pd.Series(forecast_hw.values, index=test.index)
Machine Learning Approaches
Facebook’s Prophet is designed for business forecasting with strong seasonal patterns and missing data:
from prophet import Prophet
# Prophet requires specific column names
df_prophet = df.reset_index().rename(columns={'time': 'ds', 'passengers': 'y'})
train_prophet = df_prophet[:train_size]
test_prophet = df_prophet[train_size:]
# Initialize and fit Prophet model
model_prophet = Prophet(yearly_seasonality=True,
                        weekly_seasonality=False,
                        daily_seasonality=False)
model_prophet.fit(train_prophet)
# Create future dataframe and predict
future = model_prophet.make_future_dataframe(periods=len(test), freq='M')
forecast_prophet = model_prophet.predict(future)
# Extract test period predictions
forecast_prophet_test = forecast_prophet.iloc[train_size:][['ds', 'yhat']].set_index('ds')
Traditional ML models need feature engineering to capture temporal patterns:
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
# Create lag features and rolling statistics
def create_features(df, lags=(1, 2, 3, 12)):
    df_features = df.copy()
    for lag in lags:
        df_features[f'lag_{lag}'] = df_features['passengers'].shift(lag)
    df_features['rolling_mean_3'] = df_features['passengers'].shift(1).rolling(window=3).mean()
    df_features['rolling_std_3'] = df_features['passengers'].shift(1).rolling(window=3).std()
    df_features['month'] = df_features.index.month
    return df_features.dropna()
df_ml = create_features(df)
# Split with features -- by date, so the test window matches the other models
# (create_features dropped the first rows, so position-based slicing would misalign)
train_ml = df_ml[df_ml.index < test.index[0]]
test_ml = df_ml[df_ml.index >= test.index[0]]
X_train = train_ml.drop('passengers', axis=1)
y_train = train_ml['passengers']
X_test = test_ml.drop('passengers', axis=1)
y_test = test_ml['passengers']
# Train XGBoost
model_xgb = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=5)
model_xgb.fit(X_train, y_train)
forecast_xgb = model_xgb.predict(X_test)
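Note that predicting `X_test` above uses the *actual* past values for the lag features, i.e. one-step-ahead forecasts. For a true multi-step forecast you must feed each prediction back in as the next step’s lag. A minimal sketch of that recursive loop, shown with a generic scikit-learn regressor and a single lag feature for brevity (the real feature set above would be rebuilt at every step, and `XGBRegressor` drops in the same way):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy series and a model trained on a single lag-1 feature
series = np.sin(np.arange(60) / 3) + np.arange(60) * 0.05
X_hist = series[:-1].reshape(-1, 1)  # lag-1 value
y_hist = series[1:]                  # next value
model = LinearRegression().fit(X_hist, y_hist)

# Recursive forecast: each prediction becomes the next step's lag feature
history = list(series)
preds = []
for _ in range(12):
    next_val = model.predict(np.array([[history[-1]]]))[0]
    preds.append(next_val)
    history.append(next_val)

print(np.round(preds, 2))
```

Errors compound as predictions are fed back in, which is why ML forecasts typically degrade faster than statistical ones at long horizons.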
Deep Learning for Time Series
LSTMs excel at capturing long-term dependencies in sequential data:
import tensorflow as tf
from tensorflow import keras
from sklearn.preprocessing import MinMaxScaler
# Normalize data -- fit the scaler on the training portion only,
# so test-set statistics don't leak into the model
scaler = MinMaxScaler()
scaler.fit(df[['passengers']].iloc[:int(len(df) * 0.8)])
scaled_data = scaler.transform(df[['passengers']])
# Create sequences for LSTM
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length])
    return np.array(X), np.array(y)
seq_length = 12
X, y = create_sequences(scaled_data, seq_length)
# Split data
train_size_lstm = int(len(X) * 0.8)
X_train_lstm, X_test_lstm = X[:train_size_lstm], X[train_size_lstm:]
y_train_lstm, y_test_lstm = y[:train_size_lstm], y[train_size_lstm:]
# Build LSTM model
model_lstm = keras.Sequential([
    keras.layers.LSTM(50, activation='relu', input_shape=(seq_length, 1)),
    keras.layers.Dense(25, activation='relu'),
    keras.layers.Dense(1)
])
model_lstm.compile(optimizer='adam', loss='mse')
model_lstm.fit(X_train_lstm, y_train_lstm, epochs=50, batch_size=16, verbose=0)
# Forecast
forecast_lstm_scaled = model_lstm.predict(X_test_lstm)
forecast_lstm = scaler.inverse_transform(forecast_lstm_scaled)
Model Evaluation and Selection
Time series metrics focus on forecast accuracy:
from sklearn.metrics import mean_absolute_error, mean_squared_error
def evaluate_forecast(actual, predicted, model_name):
    mae = mean_absolute_error(actual, predicted)
    rmse = np.sqrt(mean_squared_error(actual, predicted))
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    print(f"\n{model_name} Performance:")
    print(f"MAE: {mae:.2f}")
    print(f"RMSE: {rmse:.2f}")
    print(f"MAPE: {mape:.2f}%")
    return {'MAE': mae, 'RMSE': rmse, 'MAPE': mape}
# Compare all models (note: the LSTM test window is slightly offset
# because sequence creation shortens the series)
results = {}
results['ARIMA'] = evaluate_forecast(test['passengers'], forecast_arima, 'ARIMA')
results['Prophet'] = evaluate_forecast(test['passengers'], forecast_prophet_test['yhat'], 'Prophet')
results['XGBoost'] = evaluate_forecast(y_test, forecast_xgb, 'XGBoost')
results['LSTM'] = evaluate_forecast(scaler.inverse_transform(y_test_lstm).flatten(),
                                    forecast_lstm.flatten(), 'LSTM')
# Visualize forecasts
plt.figure(figsize=(14, 7))
plt.plot(train.index, train['passengers'], label='Training Data', color='blue')
plt.plot(test.index, test['passengers'], label='Actual', color='black', linewidth=2)
plt.plot(test.index, forecast_arima, label='ARIMA', linestyle='--')
plt.plot(test.index, forecast_prophet_test['yhat'], label='Prophet', linestyle='--')
plt.plot(test.index, forecast_xgb, label='XGBoost', linestyle='--')
plt.legend()
plt.title('Forecast Comparison')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.grid(True)
plt.show()
Conclusion and Best Practices
Choose your forecasting method based on data characteristics and constraints. For datasets with clear seasonality and trends under 10,000 observations, SARIMA or Prophet typically outperform complex models. They’re interpretable, require less data, and train faster.
Use gradient boosting or random forests when you have rich external features (promotions, holidays, weather) that influence your target. These models excel at capturing non-linear relationships that statistical methods miss.
Reserve deep learning for problems with massive datasets (100,000+ observations), complex patterns, or multivariate forecasting. LSTMs require significant tuning and computational resources but can model intricate temporal dependencies.
Always validate with proper backtesting—repeatedly train on historical data and test on subsequent periods to simulate production conditions. Never evaluate on data that came before your training set; this creates unrealistic performance estimates.
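A rolling-origin backtest can be sketched without any modeling library, using a seasonal-naive forecaster (repeat the value from 12 months earlier) as a stand-in for whatever model you’re validating; the horizon and initial window sizes below are arbitrary:

```python
import numpy as np

# Synthetic monthly series: linear trend + annual seasonality
n = 72
series = 100 + 0.8 * np.arange(n) + 15 * np.sin(2 * np.pi * np.arange(n) / 12)

horizon = 6         # forecast 6 months ahead at each origin
initial_train = 36  # minimum history before the first forecast

errors = []
for origin in range(initial_train, n - horizon, horizon):
    train = series[:origin]                          # only the past is visible
    actual = series[origin:origin + horizon]
    # Seasonal-naive: each month forecast as its value 12 months earlier
    forecast = train[origin - 12:origin - 12 + horizon]
    errors.append(np.mean(np.abs(actual - forecast)))

print(f"Mean backtest MAE over {len(errors)} origins: {np.mean(errors):.2f}")
```

Averaging the error across origins gives a far more honest estimate than a single train/test split, because it measures how the model performs across many different "presents".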
In production, monitor forecast accuracy continuously and retrain models as patterns shift. Time series data drifts more than static datasets, and yesterday’s best model may fail tomorrow. Build pipelines that automate retraining and alert you when performance degrades beyond acceptable thresholds.