Pandas - Window Functions (rolling, expanding)
Key Insights
- Window functions in Pandas enable calculations across sliding or expanding data windows without collapsing the DataFrame structure, essential for time-series analysis and moving statistics
- The `rolling()` method creates fixed-size moving windows, while `expanding()` grows the window from the first observation, each serving distinct analytical purposes
- Custom aggregation functions, multiple column operations, and proper handling of edge cases separate production-ready window function implementations from basic examples
Understanding Window Functions vs Group Operations
Window functions differ fundamentally from `groupby()` operations. While `groupby()` aggregates data into fewer rows, window functions maintain the original DataFrame shape while computing statistics across specified windows.
```python
import pandas as pd
import numpy as np

# Sample time-series data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({
    'date': dates,
    'sales': [100, 150, 120, 180, 160, 140, 200, 190, 170, 210],
    'returns': [5, 8, 6, 12, 10, 7, 15, 11, 9, 13]
})

# groupby collapses rows
monthly_avg = df.groupby(df['date'].dt.month)['sales'].mean()
print(f"Grouped result rows: {len(monthly_avg)}")  # 1 row

# rolling maintains shape
rolling_avg = df['sales'].rolling(window=3).mean()
print(f"Rolling result rows: {len(rolling_avg)}")  # 10 rows
```
Rolling Windows: Fixed-Size Analysis
The `rolling()` method creates a sliding window of fixed size. Each calculation includes only the specified number of observations.
```python
# Basic rolling calculations
df['sales_ma_3'] = df['sales'].rolling(window=3).mean()
df['sales_std_3'] = df['sales'].rolling(window=3).std()
df['sales_min_3'] = df['sales'].rolling(window=3).min()
df['sales_max_3'] = df['sales'].rolling(window=3).max()

print(df[['date', 'sales', 'sales_ma_3', 'sales_std_3']].head(5))
```
Output shows NaN for the first two rows because a 3-day window requires three observations:
```
        date  sales  sales_ma_3  sales_std_3
0 2024-01-01    100         NaN          NaN
1 2024-01-02    150         NaN          NaN
2 2024-01-03    120  123.333333    25.166115
3 2024-01-04    180  150.000000    30.000000
4 2024-01-05    160  153.333333    30.550505
```
Handling Edge Cases with min_periods
Control minimum observations required for valid calculations:
```python
# Require at least 2 observations instead of the full window
df['sales_ma_flexible'] = df['sales'].rolling(window=3, min_periods=2).mean()

# Compare with the strict full-window requirement
df['sales_ma_strict'] = df['sales'].rolling(window=3, min_periods=3).mean()

print(df[['sales', 'sales_ma_flexible', 'sales_ma_strict']].head(4))
```
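To make the difference concrete, here is a minimal, self-contained sketch on the same sales values. With `min_periods=2`, the second row already gets a value (the mean of the first two observations); with `min_periods=3` it stays NaN:

```python
import numpy as np
import pandas as pd

sales = pd.Series([100, 150, 120, 180, 160, 140, 200, 190, 170, 210])

# min_periods=2: a result appears as soon as two observations exist
flexible = sales.rolling(window=3, min_periods=2).mean()
# min_periods=3 (the default for a 3-wide window): full window required
strict = sales.rolling(window=3, min_periods=3).mean()

print(flexible.head(4).tolist())  # [nan, 125.0, 123.33..., 150.0]
print(strict.head(4).tolist())    # [nan, nan, 123.33..., 150.0]
```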
Time-Based Rolling Windows
For irregular time-series data, use time-based windows instead of observation counts:
```python
# Create an irregular time-series
irregular_dates = pd.to_datetime([
    '2024-01-01', '2024-01-02', '2024-01-04',
    '2024-01-07', '2024-01-08', '2024-01-12'
])
irregular_df = pd.DataFrame({
    'date': irregular_dates,
    'value': [10, 15, 20, 25, 30, 35]
})
irregular_df.set_index('date', inplace=True)

# 3-day rolling window (calendar days, not observations)
irregular_df['rolling_3d'] = irregular_df['value'].rolling('3D').mean()

# 7-day rolling window
irregular_df['rolling_7d'] = irregular_df['value'].rolling('7D').mean()
print(irregular_df)
```
This approach ensures consistent time-based analysis regardless of data frequency.
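One way to see the difference is to count how many observations each window actually covers. This small standalone sketch (assumed toy series, not the DataFrame above) contrasts a 3-observation window with a 3-calendar-day window; gaps in the index shrink the time-based window's effective sample:

```python
import pandas as pd

idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04', '2024-01-07'])
s = pd.Series([10, 15, 20, 25], index=idx)

# Observation-based: always looks back up to 3 rows
by_rows = s.rolling(3, min_periods=1).count()
# Time-based: looks back 3 calendar days (the interval (t - 3D, t])
by_time = s.rolling('3D').count()

print(by_rows.tolist())  # [1.0, 2.0, 3.0, 3.0]
print(by_time.tolist())  # [1.0, 2.0, 2.0, 1.0]
```

On 2024-01-07 the observation window still sees three rows, but the 3-day window contains only that single day, because the previous observation (2024-01-04) falls on the excluded left endpoint.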
Expanding Windows: Cumulative Analysis
The `expanding()` method creates windows that grow from the first observation, useful for cumulative statistics:
```python
# Cumulative calculations
df['cumulative_mean'] = df['sales'].expanding().mean()
df['cumulative_std'] = df['sales'].expanding().std()
df['cumulative_sum'] = df['sales'].expanding().sum()

# Running minimum and maximum
df['running_min'] = df['sales'].expanding().min()
df['running_max'] = df['sales'].expanding().max()

print(df[['date', 'sales', 'cumulative_mean', 'running_max']].head(6))
```
Practical Application: Cumulative Returns
```python
# Calculate cumulative returns for investment analysis
df['daily_return'] = df['sales'].pct_change()

# fillna(0) keeps the first row's NaN (pct_change has no prior value)
# from poisoning every expanding product
df['cumulative_return'] = (1 + df['daily_return'].fillna(0)).expanding().apply(
    lambda x: x.prod() - 1, raw=True
)
print(df[['date', 'sales', 'daily_return', 'cumulative_return']].head(6))
```
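Note that the expanding `apply` recomputes the full product for every window, which is quadratic in the series length. Since a cumulative return is just a running product, `cumprod()` gives the same result in one vectorized pass; a sketch on a small assumed return series:

```python
import pandas as pd

returns = pd.Series([0.10, -0.05, 0.02])

# O(n^2): every expanding window recomputes the whole product
via_expanding = (1 + returns).expanding().apply(lambda x: x.prod() - 1, raw=True)
# O(n): a single vectorized running product
via_cumprod = (1 + returns).cumprod() - 1

# Both give [0.10, 0.045, 0.0659] up to floating-point error
```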
Custom Aggregation Functions
Apply complex logic using custom functions:
```python
def calculate_volatility(window):
    """Calculate annualized volatility"""
    if len(window) < 2:
        return np.nan
    return window.std() * np.sqrt(252)  # Assuming daily data

def weighted_average(window):
    """Calculate linearly weighted average"""
    weights = np.arange(1, len(window) + 1)
    return np.average(window, weights=weights)

# Apply the custom functions
df['volatility_30d'] = df['sales'].rolling(window=30, min_periods=2).apply(
    calculate_volatility, raw=True
)
df['weighted_ma_5'] = df['sales'].rolling(window=5, min_periods=2).apply(
    weighted_average, raw=True
)
```
The `raw=True` parameter passes NumPy arrays instead of Series objects, significantly improving performance.
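As a quick sanity check on the weighted average (the function is restated here so the sketch runs standalone): on the window [1, 2, 3] the linear weights are [1, 2, 3], so the result is (1·1 + 2·2 + 3·3) / 6 = 14/6 ≈ 2.33:

```python
import numpy as np
import pandas as pd

def weighted_average(window):
    """Linearly weighted average: most recent observation weighted heaviest."""
    weights = np.arange(1, len(window) + 1)
    return np.average(window, weights=weights)

s = pd.Series([1.0, 2.0, 3.0])
result = s.rolling(window=3, min_periods=1).apply(weighted_average, raw=True)
print(result.tolist())  # [1.0, 1.666..., 2.333...]
```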
Multi-Column Window Operations
Perform calculations across multiple columns simultaneously:
```python
# Rolling correlation between sales and returns
df['sales_returns_corr'] = df['sales'].rolling(window=5).corr(df['returns'])

# Multiple aggregations at once
rolling_stats = df[['sales', 'returns']].rolling(window=3).agg({
    'sales': ['mean', 'std', 'min', 'max'],
    'returns': ['sum', 'mean']
})
print(rolling_stats.head(5))

# Flatten multi-level columns for easier access
rolling_stats.columns = ['_'.join(col).strip() for col in rolling_stats.columns]
```
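A minimal standalone check (assumed toy data) shows what the flattened column names look like:

```python
import pandas as pd

df = pd.DataFrame({'sales': [1, 2, 3, 4], 'returns': [0, 1, 0, 1]})

# A dict-of-lists agg produces a two-level column index...
stats = df.rolling(window=2).agg({'sales': ['mean', 'max'], 'returns': ['sum']})
# ...which flattens into single underscore-joined names
stats.columns = ['_'.join(col).strip() for col in stats.columns]

print(list(stats.columns))  # ['sales_mean', 'sales_max', 'returns_sum']
```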
Advanced Pattern: Exponentially Weighted Moving Average
While not strictly a window function, EWMA provides time-decay weighting:
```python
# Standard moving average vs EWMA
df['sma_5'] = df['sales'].rolling(window=5).mean()
df['ewma_5'] = df['sales'].ewm(span=5, adjust=False).mean()

# EWMA gives more weight to recent observations
print(df[['date', 'sales', 'sma_5', 'ewma_5']].tail(6))

# The adjust parameter controls the calculation method
df['ewma_adjusted'] = df['sales'].ewm(span=5, adjust=True).mean()
```
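With `adjust=False`, the EWMA follows a simple recursion: y₀ = x₀ and yₜ = (1 − α)yₜ₋₁ + αxₜ, where α = 2/(span + 1). A sketch on a small assumed series verifies this against `ewm()`:

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 15.0, 30.0])
alpha = 2 / (5 + 1)  # span=5  =>  alpha = 2 / (span + 1)

# adjust=False recursion: y[0] = x[0]; y[t] = (1 - alpha)*y[t-1] + alpha*x[t]
manual = [s.iloc[0]]
for x in s.iloc[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * x)

ewma = s.ewm(span=5, adjust=False).mean()
# manual and ewma agree element-wise
```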
Performance Optimization
Window functions can be computationally expensive. Optimize with these techniques:
```python
# Use raw=True for custom functions (%timeit is an IPython magic)
%timeit df['sales'].rolling(5).apply(lambda x: x.mean(), raw=False)  # slower: passes Series objects
%timeit df['sales'].rolling(5).apply(lambda x: x.mean(), raw=True)   # faster: passes NumPy arrays

# Vectorize when possible
df['manual_ma'] = df['sales'].rolling(3).mean()  # vectorized
# Better than: df['sales'].rolling(3).apply(np.mean)

# Use the engine parameter for Numba acceleration (pandas 1.0+, requires numba installed)
df['fast_custom'] = df['sales'].rolling(10).apply(
    calculate_volatility, engine='numba', raw=True
)
```
Combining Rolling and Expanding with GroupBy
Apply window functions within groups:
```python
# Add a category column
df['category'] = ['A', 'B'] * 5

# Rolling average per category
df['category_rolling_ma'] = df.groupby('category')['sales'].transform(
    lambda x: x.rolling(window=3, min_periods=1).mean()
)

# Expanding sum per category
df['category_cumsum'] = df.groupby('category')['sales'].transform(
    lambda x: x.expanding().sum()
)
print(df[['date', 'category', 'sales', 'category_rolling_ma', 'category_cumsum']])
```
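To confirm that `transform` keeps each category's window separate, a standalone sketch (assumed toy data) checks that category A's rows only ever see A's own history:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'sales': [10, 100, 20, 200, 30, 300],
})

# Rolling mean computed independently inside each category
df['ma'] = df.groupby('category')['sales'].transform(
    lambda x: x.rolling(window=2, min_periods=1).mean()
)

# Category A: 10, (10+20)/2, (20+30)/2 -- B's values never leak in
print(df.loc[df['category'] == 'A', 'ma'].tolist())  # [10.0, 15.0, 25.0]
```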
Practical Example: Bollinger Bands
Combine multiple window functions for technical analysis:
```python
def calculate_bollinger_bands(df, window=20, num_std=2):
    """Calculate Bollinger Bands for price data"""
    df = df.copy()
    # Middle band: simple moving average
    df['bb_middle'] = df['sales'].rolling(window=window).mean()
    # Rolling standard deviation
    rolling_std = df['sales'].rolling(window=window).std()
    # Upper and lower bands
    df['bb_upper'] = df['bb_middle'] + (rolling_std * num_std)
    df['bb_lower'] = df['bb_middle'] - (rolling_std * num_std)
    # Bandwidth indicator
    df['bb_width'] = (df['bb_upper'] - df['bb_lower']) / df['bb_middle']
    return df

df_with_bands = calculate_bollinger_bands(df, window=5, num_std=1.5)
print(df_with_bands[['date', 'sales', 'bb_lower', 'bb_middle', 'bb_upper']].tail())
```
Window functions in Pandas provide the foundation for time-series analysis, financial modeling, and statistical processing. Master the distinction between rolling and expanding windows, leverage custom functions for domain-specific logic, and optimize performance through vectorization and the `raw=True` parameter.