Pandas - Rolling Mean/Average

Key Insights

  • Rolling averages smooth time-series data by averaging over a sliding window, which is essential for trend analysis and noise reduction in financial, sensor, and business metrics
  • Pandas' rolling() supports flexible window specifications (fixed counts, time-based windows, custom weights) and handles edge cases such as missing data and minimum observation requirements
  • Built-in vectorized aggregations can process millions of rows efficiently, and understanding centered vs. trailing windows prevents look-ahead bias in predictive models

Basic Rolling Mean Calculation

The rolling() method creates a window object that slides across your data, calculating the mean at each position. The most common use case involves a fixed-size window.

import pandas as pd
import numpy as np

# Create sample data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
values = [100, 105, 103, 108, 107, 112, 115, 113, 118, 120]
df = pd.DataFrame({'date': dates, 'price': values})

# Calculate 3-day rolling mean
df['rolling_mean_3'] = df['price'].rolling(window=3).mean()

print(df)

Output shows NaN for the first two rows because the 3-observation window is not yet full:

        date  price  rolling_mean_3
0 2024-01-01    100             NaN
1 2024-01-02    105             NaN
2 2024-01-03    103       102.666667
3 2024-01-04    108       105.333333

The min_periods parameter controls how many non-null values are required for a calculation:

# Calculate rolling mean with minimum 1 observation
df['rolling_mean_min1'] = df['price'].rolling(window=3, min_periods=1).mean()

# First row now shows 100.0 instead of NaN

Time-Based Rolling Windows

For irregular time-series or when you need calendar-aware windows, use time-based specifications instead of fixed row counts.

# Create irregular time series
irregular_dates = pd.to_datetime([
    '2024-01-01', '2024-01-02', '2024-01-05', 
    '2024-01-08', '2024-01-09', '2024-01-15'
])
irregular_df = pd.DataFrame({
    'date': irregular_dates,
    'value': [10, 15, 20, 25, 30, 35]
})
irregular_df.set_index('date', inplace=True)

# 3-day rolling window based on datetime
irregular_df['rolling_3d'] = irregular_df['value'].rolling('3D').mean()

print(irregular_df)

This calculates the mean of all values within the previous 3 calendar days, regardless of how many observations exist:

            value  rolling_3d
date                        
2024-01-01     10        10.0
2024-01-02     15        12.5
2024-01-05     20        20.0
2024-01-08     25        25.0
2024-01-09     30        27.5
2024-01-15     35        35.0
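Note the window endpoints: with the default closed='right', an observation exactly 3 days earlier is excluded, which is why the window for 2024-01-05 contains only itself. If you want both endpoints included, the closed parameter controls this; a small sketch (rebuilding the same frame for self-containment):

```python
import pandas as pd

irregular_df = pd.DataFrame(
    {'value': [10, 15, 20, 25, 30, 35]},
    index=pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05',
                          '2024-01-08', '2024-01-09', '2024-01-15'])
)

# closed='both' includes both window endpoints, so 2024-01-02
# (exactly 3 days before 2024-01-05) now enters that window
inclusive = irregular_df['value'].rolling('3D', closed='both').mean()
print(inclusive)
```

With closed='both', the 2024-01-05 row averages 15 and 20 instead of standing alone.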

Centered vs. Trailing Windows

By default, rolling windows are trailing (backward-looking). Centered windows place the current observation in the middle, useful for smoothing historical data but invalid for real-time predictions.

data = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60, 70]
})

# Trailing window (default)
data['trailing'] = data['value'].rolling(window=3).mean()

# Centered window
data['centered'] = data['value'].rolling(window=3, center=True).mean()

print(data)

Output demonstrates the alignment difference:

   value  trailing  centered
0     10       NaN       NaN
1     20       NaN      20.0
2     30      20.0      30.0
3     40      30.0      40.0
4     50      40.0      50.0
5     60      50.0      60.0
6     70      60.0       NaN

Centered windows have NaN values at both ends, while trailing windows have them only at the beginning.
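When a rolling mean feeds a predictive model, a common pattern (a sketch, not part of the original code) is to keep the trailing window and shift it one step, so the feature at each row is computed from strictly past values:

```python
import pandas as pd

data = pd.DataFrame({'value': [10, 20, 30, 40, 50, 60, 70]})

# Trailing mean shifted by one row: the feature at row i uses
# only rows i-3 .. i-1, never row i itself
data['feature'] = data['value'].rolling(window=3).mean().shift(1)
print(data)
```

The shift guarantees no look-ahead: row 3's feature is the mean of rows 0-2.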

Multiple Rolling Windows and Aggregations

Calculate multiple rolling statistics simultaneously for comprehensive analysis:

stock_data = pd.DataFrame({
    'price': [100, 102, 98, 105, 103, 107, 110, 108, 112, 115]
})

# Multiple window sizes
stock_data['sma_5'] = stock_data['price'].rolling(5).mean()
stock_data['sma_10'] = stock_data['price'].rolling(10).mean()

# Multiple aggregations on same window
rolling_window = stock_data['price'].rolling(5)
stock_data['mean_5'] = rolling_window.mean()
stock_data['std_5'] = rolling_window.std()
stock_data['min_5'] = rolling_window.min()
stock_data['max_5'] = rolling_window.max()

print(stock_data.tail())

Use agg() for cleaner multiple aggregation syntax:

result = stock_data['price'].rolling(5).agg(['mean', 'std', 'min', 'max'])

Weighted Rolling Averages

Apply custom weights to give different importance to observations within the window:

prices = pd.Series([100, 105, 103, 108, 107])

# Exponentially weighted moving average (more weight on recent values)
ewma = prices.ewm(span=3).mean()

# Custom weighted average using apply
def weighted_mean(x):
    # Weights favor recent observations and sum to 1.0
    weights = np.array([0.1, 0.2, 0.3, 0.4])
    return np.sum(x * weights) / weights.sum()

custom_wma = prices.rolling(4).apply(weighted_mean, raw=True)

print(f"Prices: {prices.values}")
print(f"EWMA: {ewma.values}")
print(f"Custom WMA: {custom_wma.values}")

The raw=True parameter passes NumPy arrays instead of Series objects, improving performance significantly.
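For reference, span=3 corresponds to a smoothing factor alpha = 2/(span+1) = 0.5, and with adjust=False the EWMA follows the recursion y_t = alpha*x_t + (1-alpha)*y_{t-1}. A quick verification of that equivalence (my own check, not part of the original):

```python
import pandas as pd

prices = pd.Series([100, 105, 103, 108, 107])
alpha = 2 / (3 + 1)  # span=3  ->  alpha = 0.5

ewma = prices.ewm(span=3, adjust=False).mean()

# Reproduce the recursion y_t = alpha*x_t + (1-alpha)*y_{t-1} by hand
manual = [float(prices.iloc[0])]
for x in prices.iloc[1:]:
    manual.append(alpha * x + (1 - alpha) * manual[-1])

print(ewma.tolist())
print(manual)
```

(Note that the default adjust=True uses a slightly different bias-corrected weighting for early observations, so its first few values differ from this recursion.)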

Handling Missing Data

Rolling calculations handle NaN values based on min_periods configuration:

data_with_gaps = pd.Series([10, 20, np.nan, 40, 50, np.nan, 70])

# Skip NaN values in calculation
rolling_skipna = data_with_gaps.rolling(3, min_periods=2).mean()

# Strict: require all values present
rolling_strict = data_with_gaps.rolling(3, min_periods=3).mean()

print(pd.DataFrame({
    'original': data_with_gaps,
    'skip_nan': rolling_skipna,
    'strict': rolling_strict
}))

Output shows how min_periods affects NaN handling:

   original  skip_nan  strict
0      10.0       NaN     NaN
1      20.0      15.0     NaN
2       NaN      15.0     NaN
3      40.0      30.0     NaN
4      50.0      45.0     NaN
5       NaN      45.0     NaN
6      70.0      60.0     NaN
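An alternative to relaxing min_periods is to fill the gaps first, for example by interpolation, so every window is complete. A sketch (whether interpolation is appropriate depends on your data):

```python
import numpy as np
import pandas as pd

data_with_gaps = pd.Series([10, 20, np.nan, 40, 50, np.nan, 70])

# Linearly interpolate the gaps, then roll over the complete series
filled = data_with_gaps.interpolate()
rolling_filled = filled.rolling(3).mean()
print(rolling_filled)
```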

GroupBy with Rolling Calculations

Apply rolling means to grouped data, essential for multi-entity datasets:

# Multiple stock prices
multi_stock = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=6).tolist() * 2,
    'symbol': ['AAPL'] * 6 + ['GOOGL'] * 6,
    'price': [150, 152, 151, 155, 154, 158, 2800, 2820, 2810, 2850, 2840, 2880]
})

# Rolling mean per stock
multi_stock['rolling_3'] = (multi_stock.groupby('symbol')['price']
                            .rolling(3, min_periods=1)
                            .mean()
                            .reset_index(level=0, drop=True))

print(multi_stock)

The reset_index(level=0, drop=True) ensures the index aligns correctly after groupby operations.
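An equivalent pattern (shown here as an alternative sketch) uses transform, which returns a result already aligned to the original index, so no reset_index is needed:

```python
import pandas as pd

multi_stock = pd.DataFrame({
    'symbol': ['AAPL'] * 6 + ['GOOGL'] * 6,
    'price': [150, 152, 151, 155, 154, 158,
              2800, 2820, 2810, 2850, 2840, 2880]
})

# transform applies the rolling mean within each group and hands back
# a Series indexed like the original frame
multi_stock['rolling_3'] = multi_stock.groupby('symbol')['price'].transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)
print(multi_stock)
```

The lambda adds some overhead versus the groupby-rolling form, but the index bookkeeping disappears.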

Performance Optimization

For large datasets, optimize rolling calculations through proper configuration:

import time

# Large dataset
large_df = pd.DataFrame({
    'value': np.random.randn(1000000)
})

# Efficient: vectorized operation
start = time.time()
result1 = large_df['value'].rolling(100).mean()
time1 = time.time() - start

# Inefficient: custom function without raw=True
start = time.time()
result2 = large_df['value'].rolling(100).apply(lambda x: x.mean(), raw=False)
time2 = time.time() - start

# Efficient: custom function with raw=True
start = time.time()
result3 = large_df['value'].rolling(100).apply(lambda x: x.mean(), raw=True)
time3 = time.time() - start

print(f"Built-in mean: {time1:.4f}s")
print(f"Apply without raw: {time2:.4f}s")
print(f"Apply with raw: {time3:.4f}s")

Built-in aggregations like mean(), sum(), and std() are optimized Cython implementations, typically 10-100x faster than custom functions.
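As a sanity check (my own verification sketch, not from the original), the custom apply path and the built-in produce numerically matching results; only the speed differs:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(10_000))

# Both paths compute the same statistic; the built-in is just far faster
builtin = s.rolling(100).mean()
custom = s.rolling(100).apply(lambda x: x.mean(), raw=True)

print(np.allclose(builtin, custom, equal_nan=True))  # True
```

So custom functions should be reserved for statistics that have no built-in equivalent.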

Practical Application: Signal Smoothing

Combine multiple rolling windows to identify trends and generate trading signals:

# Simulated stock data with noise
np.random.seed(42)
trend = np.linspace(100, 150, 100)
noise = np.random.normal(0, 5, 100)
prices = pd.Series(trend + noise)

# Calculate multiple SMAs
df_signals = pd.DataFrame({'price': prices})
df_signals['sma_10'] = prices.rolling(10).mean()
df_signals['sma_30'] = prices.rolling(30).mean()

# Generate signal: 1 when fast SMA > slow SMA (bullish)
df_signals['signal'] = (df_signals['sma_10'] > df_signals['sma_30']).astype(int)

# Identify crossover points
df_signals['crossover'] = df_signals['signal'].diff()

print(df_signals[df_signals['crossover'] != 0].head())

This pattern identifies golden cross (bullish) and death cross (bearish) signals commonly used in technical analysis. The rolling mean smooths out short-term volatility to reveal underlying trends.
