How to Use Window Functions in Pandas
Window functions compute values across a 'window' of rows related to the current row. Unlike aggregation with `groupby()`, which collapses multiple rows into one, window functions preserve your...
Key Insights
- Window functions perform calculations across sliding subsets of rows without collapsing your data, unlike
groupby()which reduces rows to aggregated values. - Master three core methods—
rolling(),expanding(), andewm()—to handle most time series analysis tasks from simple moving averages to sophisticated financial indicators. - Combine window functions with
shift(),diff(), andpct_change()to create powerful lag features and period-over-period comparisons essential for forecasting and trend analysis.
Introduction to Window Functions
Window functions compute values across a “window” of rows related to the current row. Unlike aggregation with groupby(), which collapses multiple rows into one, window functions preserve your original row count while adding computed columns.
If you’ve used SQL window functions with OVER() clauses, pandas window functions serve the same purpose. They’re essential for time series analysis, financial calculations, and any scenario where you need context from surrounding rows.
Here’s the fundamental difference:
import pandas as pd
import numpy as np
# Sample data
df = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=10),
'sales': [100, 120, 90, 150, 130, 140, 160, 155, 170, 180]
})
# groupby: collapses to single value
total_sales = df['sales'].sum() # Returns: 1395
# rolling: preserves all rows, adds computed column
df['rolling_mean'] = df['sales'].rolling(window=3).mean()
print(df)
Output:
date sales rolling_mean
0 2024-01-01 100 NaN
1 2024-01-02 120 NaN
2 2024-01-03 90 103.333333
3 2024-01-04 150 120.000000
4 2024-01-05 130 123.333333
...
Use window functions when you need row-level results with contextual calculations. Use groupby() when you need summarized data.
Rolling Window Operations
The rolling() method is your workhorse for sliding window calculations. It moves a fixed-size window across your data, computing aggregations at each position.
Key parameters:
window: Size of the moving window (integer or offset string like ‘7D’)min_periods: Minimum observations required for a valid resultcenter: If True, labels are set at the center of the window
# Time series with daily data
df = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=30),
'revenue': np.random.randint(1000, 5000, 30)
})
df = df.set_index('date')
# 7-day rolling average
df['rolling_7d_avg'] = df['revenue'].rolling(window=7).mean()
# With minimum periods (get values sooner, with partial windows)
df['rolling_7d_min3'] = df['revenue'].rolling(window=7, min_periods=3).mean()
# Multiple aggregations at once
rolling_stats = df['revenue'].rolling(window=7).agg(['mean', 'std', 'min', 'max'])
print(rolling_stats.tail())
For datetime-indexed data, use offset strings for intuitive windows:
# Rolling window by time offset (handles irregular data)
df['rolling_1w'] = df['revenue'].rolling('7D').mean()
This approach handles missing dates gracefully—something integer windows can’t do.
Expanding Windows
While rolling() uses a fixed window size, expanding() grows the window from the start of the series to the current row. It’s perfect for cumulative calculations.
df = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=12, freq='M'),
'monthly_sales': [45000, 52000, 48000, 61000, 55000, 58000,
62000, 59000, 67000, 71000, 68000, 75000]
})
# Running total (cumulative sum)
df['ytd_sales'] = df['monthly_sales'].expanding().sum()
# Cumulative average
df['avg_to_date'] = df['monthly_sales'].expanding().mean()
# Cumulative max (useful for tracking peak performance)
df['peak_sales'] = df['monthly_sales'].expanding().max()
# Cumulative standard deviation
df['volatility'] = df['monthly_sales'].expanding().std()
print(df)
Expanding windows answer questions like “What’s the best we’ve ever done?” or “What’s our year-to-date total?"—queries that require all historical context.
Exponentially Weighted Windows (EWM)
Simple moving averages treat all observations equally. Exponentially weighted moving averages (EWM) give more weight to recent observations, making them more responsive to recent changes.
Three ways to specify decay:
span: Decay in terms of span (span=10 means recent values decay to ~37% after 10 periods)halflife: Time for observation to decay to half its weightalpha: Smoothing factor directly (0 < alpha ≤ 1)
# Stock price data
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=60)
prices = 100 + np.cumsum(np.random.randn(60) * 2)
df = pd.DataFrame({'date': dates, 'price': prices}).set_index('date')
# Simple 20-day moving average
df['sma_20'] = df['price'].rolling(window=20).mean()
# Exponential moving average (span=20 is roughly comparable)
df['ema_20'] = df['price'].ewm(span=20).mean()
# Shorter span = more responsive to recent prices
df['ema_10'] = df['price'].ewm(span=10).mean()
# Compare the responsiveness
print(df.tail(10))
EWM reacts faster to price changes than simple moving averages. Financial analysts prefer EWM for trend-following indicators because it reduces lag while still smoothing noise.
Custom Window Functions with apply()
When built-in aggregations aren’t enough, use apply() with custom functions:
def rolling_percentile(x, percentile=75):
"""Calculate rolling percentile."""
return np.percentile(x, percentile)
df = pd.DataFrame({
'value': np.random.randn(100)
})
# Rolling 75th percentile
df['rolling_p75'] = df['value'].rolling(window=20).apply(
lambda x: rolling_percentile(x, 75)
)
# For better performance with numpy operations, use raw=True
df['rolling_p75_fast'] = df['value'].rolling(window=20).apply(
lambda x: np.percentile(x, 75),
raw=True # Pass numpy array instead of Series
)
The raw=True parameter passes a numpy array instead of a pandas Series, which significantly speeds up computation for numpy-compatible functions.
For complex calculations:
def rolling_sharpe(returns, risk_free_rate=0.02/252):
"""Calculate rolling Sharpe ratio."""
excess_returns = returns - risk_free_rate
return np.mean(excess_returns) / np.std(excess_returns) if np.std(excess_returns) > 0 else 0
df['returns'] = df['value'].pct_change()
df['rolling_sharpe'] = df['returns'].rolling(window=20).apply(
rolling_sharpe, raw=True
)
Ranking and Shift Operations
These functions create lag features and period-over-period comparisons essential for time series analysis:
df = pd.DataFrame({
'date': pd.date_range('2024-01-01', periods=12, freq='M'),
'revenue': [100, 110, 105, 120, 115, 130, 125, 140, 135, 150, 145, 160]
})
# shift(): Access previous/future values
df['prev_month'] = df['revenue'].shift(1) # Lag by 1
df['next_month'] = df['revenue'].shift(-1) # Lead by 1
# diff(): Absolute change from previous period
df['mom_change'] = df['revenue'].diff() # Month-over-month change
df['qoq_change'] = df['revenue'].diff(3) # Quarter-over-quarter
# pct_change(): Percentage change
df['mom_growth'] = df['revenue'].pct_change() * 100
df['yoy_growth'] = df['revenue'].pct_change(12) * 100 # Year-over-year
# rank(): Rolling rank within window
df['rolling_rank'] = df['revenue'].rolling(window=6).apply(
lambda x: pd.Series(x).rank().iloc[-1]
)
print(df)
These operations are the foundation of feature engineering for forecasting models.
Practical Example: Financial Analysis Dashboard
Let’s combine everything into a real-world stock analysis with Bollinger Bands and a simplified RSI indicator:
import pandas as pd
import numpy as np
# Generate realistic stock data
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=252) # Trading year
returns = np.random.randn(252) * 0.02
prices = 100 * np.exp(np.cumsum(returns))
df = pd.DataFrame({
'date': dates,
'close': prices,
'volume': np.random.randint(1000000, 5000000, 252)
}).set_index('date')
# === Moving Averages ===
df['sma_20'] = df['close'].rolling(window=20).mean()
df['sma_50'] = df['close'].rolling(window=50).mean()
df['ema_12'] = df['close'].ewm(span=12).mean()
df['ema_26'] = df['close'].ewm(span=26).mean()
# === Bollinger Bands ===
df['bb_middle'] = df['close'].rolling(window=20).mean()
df['bb_std'] = df['close'].rolling(window=20).std()
df['bb_upper'] = df['bb_middle'] + (df['bb_std'] * 2)
df['bb_lower'] = df['bb_middle'] - (df['bb_std'] * 2)
# === RSI (Relative Strength Index) ===
delta = df['close'].diff()
gain = delta.where(delta > 0, 0)
loss = (-delta).where(delta < 0, 0)
avg_gain = gain.ewm(span=14, min_periods=14).mean()
avg_loss = loss.ewm(span=14, min_periods=14).mean()
rs = avg_gain / avg_loss
df['rsi'] = 100 - (100 / (1 + rs))
# === MACD ===
df['macd'] = df['ema_12'] - df['ema_26']
df['macd_signal'] = df['macd'].ewm(span=9).mean()
df['macd_histogram'] = df['macd'] - df['macd_signal']
# === Volume Analysis ===
df['volume_sma'] = df['volume'].rolling(window=20).mean()
df['volume_ratio'] = df['volume'] / df['volume_sma']
# === Performance Metrics ===
df['daily_return'] = df['close'].pct_change()
df['cumulative_return'] = (1 + df['daily_return']).expanding().apply(
lambda x: np.prod(x) - 1, raw=True
)
df['rolling_volatility'] = df['daily_return'].rolling(window=20).std() * np.sqrt(252)
# === Trading Signals ===
df['signal'] = np.where(
(df['close'] < df['bb_lower']) & (df['rsi'] < 30), 'BUY',
np.where(
(df['close'] > df['bb_upper']) & (df['rsi'] > 70), 'SELL', 'HOLD'
)
)
# Display summary
print(df[['close', 'sma_20', 'bb_upper', 'bb_lower', 'rsi', 'signal']].tail(10))
This analysis combines rolling windows (Bollinger Bands, SMA), exponential windows (EMA, RSI, MACD), and expanding windows (cumulative returns) into a comprehensive trading dashboard.
Window functions transform raw price data into actionable technical indicators. Master these patterns, and you’ll handle most time series analysis tasks with confidence.