Pandas - Window Functions (rolling, expanding)

Window functions differ fundamentally from `groupby()` operations. While `groupby()` aggregates data into fewer rows, window functions maintain the original DataFrame shape while computing statistics...

Key Insights

  • Window functions in Pandas enable calculations across sliding or expanding data windows without collapsing the DataFrame structure, essential for time-series analysis and moving statistics
  • The rolling() method creates fixed-size moving windows, while expanding() grows the window from the first observation, each serving distinct analytical purposes
  • Custom aggregation functions, multiple column operations, and proper handling of edge cases separate production-ready window function implementations from basic examples

Understanding Window Functions vs Group Operations

Window functions differ fundamentally from groupby() operations. While groupby() aggregates data into fewer rows, window functions maintain the original DataFrame shape while computing statistics across specified windows.

import pandas as pd
import numpy as np

# Sample time-series data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({
    'date': dates,
    'sales': [100, 150, 120, 180, 160, 140, 200, 190, 170, 210],
    'returns': [5, 8, 6, 12, 10, 7, 15, 11, 9, 13]
})

# groupby collapses rows
monthly_avg = df.groupby(df['date'].dt.month)['sales'].mean()
print(f"Grouped result rows: {len(monthly_avg)}")  # 1 row

# rolling maintains shape
rolling_avg = df['sales'].rolling(window=3).mean()
print(f"Rolling result rows: {len(rolling_avg)}")  # 10 rows

Rolling Windows: Fixed-Size Analysis

The rolling() method creates a sliding window of fixed size. Each calculation includes only the specified number of observations.

# Basic rolling calculations
df['sales_ma_3'] = df['sales'].rolling(window=3).mean()
df['sales_std_3'] = df['sales'].rolling(window=3).std()
df['sales_min_3'] = df['sales'].rolling(window=3).min()
df['sales_max_3'] = df['sales'].rolling(window=3).max()

print(df[['date', 'sales', 'sales_ma_3', 'sales_std_3']].head(5))

Output shows NaN for the first two rows because a 3-day window requires three observations:

        date  sales  sales_ma_3  sales_std_3
0 2024-01-01    100         NaN          NaN
1 2024-01-02    150         NaN          NaN
2 2024-01-03    120  123.333333    25.166115
3 2024-01-04    180  150.000000    30.000000
4 2024-01-05    160  153.333333    30.550505

Handling Edge Cases with min_periods

Control minimum observations required for valid calculations:

# Require at least 2 observations instead of full window
df['sales_ma_flexible'] = df['sales'].rolling(window=3, min_periods=2).mean()

# Compare with strict window requirement
df['sales_ma_strict'] = df['sales'].rolling(window=3, min_periods=3).mean()

print(df[['sales', 'sales_ma_flexible', 'sales_ma_strict']].head(4))

Time-Based Rolling Windows

For irregular time-series data, use time-based windows instead of observation counts:

# Create irregular time-series
irregular_dates = pd.to_datetime([
    '2024-01-01', '2024-01-02', '2024-01-04', 
    '2024-01-07', '2024-01-08', '2024-01-12'
])
irregular_df = pd.DataFrame({
    'date': irregular_dates,
    'value': [10, 15, 20, 25, 30, 35]
})
irregular_df.set_index('date', inplace=True)

# 3-day rolling window (calendar days, not observations)
irregular_df['rolling_3d'] = irregular_df['value'].rolling('3D').mean()

# 7-day rolling window
irregular_df['rolling_7d'] = irregular_df['value'].rolling('7D').mean()

print(irregular_df)

This approach ensures consistent time-based analysis regardless of data frequency.

Expanding Windows: Cumulative Analysis

The expanding() method creates windows that grow from the first observation, useful for cumulative statistics:

# Cumulative calculations
df['cumulative_mean'] = df['sales'].expanding().mean()
df['cumulative_std'] = df['sales'].expanding().std()
df['cumulative_sum'] = df['sales'].expanding().sum()

# Running minimum and maximum
df['running_min'] = df['sales'].expanding().min()
df['running_max'] = df['sales'].expanding().max()

print(df[['date', 'sales', 'cumulative_mean', 'running_max']].head(6))

Practical Application: Cumulative Returns

# Calculate cumulative returns for investment analysis
df['daily_return'] = df['sales'].pct_change()
df['cumulative_return'] = (1 + df['daily_return']).expanding().apply(
    lambda x: x.prod() - 1, raw=True
)

print(df[['date', 'sales', 'daily_return', 'cumulative_return']].head(6))

Custom Aggregation Functions

Apply complex logic using custom functions:

def calculate_volatility(window):
    """Calculate annualized volatility"""
    if len(window) < 2:
        return np.nan
    return window.std() * np.sqrt(252)  # Assuming daily data

def weighted_average(window):
    """Calculate linearly weighted average"""
    weights = np.arange(1, len(window) + 1)
    return np.average(window, weights=weights)

# Apply custom functions
df['volatility_30d'] = df['sales'].rolling(window=30, min_periods=2).apply(
    calculate_volatility, raw=True
)

df['weighted_ma_5'] = df['sales'].rolling(window=5, min_periods=2).apply(
    weighted_average, raw=True
)

The raw=True parameter passes NumPy arrays instead of Series objects, significantly improving performance.

Multi-Column Window Operations

Perform calculations across multiple columns simultaneously:

# Rolling correlation between sales and returns
df['sales_returns_corr'] = df['sales'].rolling(window=5).corr(df['returns'])

# Multiple aggregations at once
rolling_stats = df[['sales', 'returns']].rolling(window=3).agg({
    'sales': ['mean', 'std', 'min', 'max'],
    'returns': ['sum', 'mean']
})

print(rolling_stats.head(5))

# Flatten multi-level columns for easier access
rolling_stats.columns = ['_'.join(col).strip() for col in rolling_stats.columns]

Advanced Pattern: Exponentially Weighted Moving Average

While not strictly a window function, EWMA provides time-decay weighting:

# Standard moving average vs EWMA
df['sma_5'] = df['sales'].rolling(window=5).mean()
df['ewma_5'] = df['sales'].ewm(span=5, adjust=False).mean()

# EWMA gives more weight to recent observations
print(df[['date', 'sales', 'sma_5', 'ewma_5']].tail(6))

# Adjust parameter controls calculation method
df['ewma_adjusted'] = df['sales'].ewm(span=5, adjust=True).mean()

Performance Optimization

Window functions can be computationally expensive. Optimize with these techniques:

# Use raw=True for custom functions
%timeit df['sales'].rolling(5).apply(lambda x: x.mean(), raw=False)
# Slower: passes Series objects

%timeit df['sales'].rolling(5).apply(lambda x: x.mean(), raw=True)
# Faster: passes NumPy arrays

# Vectorize when possible
df['manual_ma'] = df['sales'].rolling(3).mean()  # Vectorized
# Better than: df['sales'].rolling(3).apply(np.mean)

# Use engine parameter for Numba acceleration (Pandas 1.0+)
df['fast_custom'] = df['sales'].rolling(10).apply(
    calculate_volatility, engine='numba', raw=True
)

Combining Rolling and Expanding with GroupBy

Apply window functions within groups:

# Add category column
df['category'] = ['A', 'B'] * 5

# Rolling average per category
df['category_rolling_ma'] = df.groupby('category')['sales'].transform(
    lambda x: x.rolling(window=3, min_periods=1).mean()
)

# Expanding sum per category
df['category_cumsum'] = df.groupby('category')['sales'].transform(
    lambda x: x.expanding().sum()
)

print(df[['date', 'category', 'sales', 'category_rolling_ma', 'category_cumsum']])

Practical Example: Bollinger Bands

Combine multiple window functions for technical analysis:

def calculate_bollinger_bands(df, window=20, num_std=2):
    """Calculate Bollinger Bands for price data"""
    df = df.copy()
    
    # Middle band: Simple moving average
    df['bb_middle'] = df['sales'].rolling(window=window).mean()
    
    # Standard deviation
    rolling_std = df['sales'].rolling(window=window).std()
    
    # Upper and lower bands
    df['bb_upper'] = df['bb_middle'] + (rolling_std * num_std)
    df['bb_lower'] = df['bb_middle'] - (rolling_std * num_std)
    
    # Bandwidth indicator
    df['bb_width'] = (df['bb_upper'] - df['bb_lower']) / df['bb_middle']
    
    return df

df_with_bands = calculate_bollinger_bands(df, window=5, num_std=1.5)
print(df_with_bands[['date', 'sales', 'bb_lower', 'bb_middle', 'bb_upper']].tail())

Window functions in Pandas provide the foundation for time-series analysis, financial modeling, and statistical processing. Master the distinction between rolling and expanding windows, leverage custom functions for domain-specific logic, and optimize performance through vectorization and the raw parameter.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.