# Pandas - Rolling Mean/Average
## Key Insights

- Rolling averages smooth time-series data by calculating the mean over a sliding window, essential for trend analysis and noise reduction in financial, sensor, and business metrics data
- Pandas provides `rolling()` with flexible window specifications (fixed periods, time-based windows, custom weights) and handles edge cases like missing data and minimum observation requirements
- Performance optimization through vectorized operations and proper window configuration can process millions of rows efficiently, while understanding centered vs. trailing windows prevents look-ahead bias in predictive models
## Basic Rolling Mean Calculation

The `rolling()` method creates a window object that slides across your data, calculating the mean at each position. The most common use case involves a fixed-size window.

```python
import pandas as pd
import numpy as np

# Create sample data
dates = pd.date_range('2024-01-01', periods=10, freq='D')
values = [100, 105, 103, 108, 107, 112, 115, 113, 118, 120]
df = pd.DataFrame({'date': dates, 'price': values})

# Calculate 3-day rolling mean
df['rolling_mean_3'] = df['price'].rolling(window=3).mean()
print(df)
```
Output shows NaN for the first two rows because a 3-day window needs three data points:

```text
        date  price  rolling_mean_3
0 2024-01-01    100             NaN
1 2024-01-02    105             NaN
2 2024-01-03    103      102.666667
3 2024-01-04    108      105.333333
```

The `min_periods` parameter controls how many non-null values are required for a calculation:

```python
# Calculate rolling mean with minimum 1 observation
df['rolling_mean_min1'] = df['price'].rolling(window=3, min_periods=1).mean()
# First row now shows 100.0 instead of NaN
```
## Time-Based Rolling Windows

For irregular time series, or when you need calendar-aware windows, use time-based specifications instead of fixed row counts.

```python
# Create irregular time series
irregular_dates = pd.to_datetime([
    '2024-01-01', '2024-01-02', '2024-01-05',
    '2024-01-08', '2024-01-09', '2024-01-15'
])
irregular_df = pd.DataFrame({
    'date': irregular_dates,
    'value': [10, 15, 20, 25, 30, 35]
})
irregular_df.set_index('date', inplace=True)

# 3-day rolling window based on the datetime index
irregular_df['rolling_3d'] = irregular_df['value'].rolling('3D').mean()
print(irregular_df)
```
This calculates the mean of all values within the preceding 3 calendar days, regardless of how many observations exist:

```text
            value  rolling_3d
date
2024-01-01     10        10.0
2024-01-02     15        12.5
2024-01-05     20        20.0
2024-01-08     25        25.0
2024-01-09     30        27.5
2024-01-15     35        35.0
```
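A related knob worth knowing (a brief sketch, not shown in the example above) is the `closed` parameter. For offset-based windows the default is `closed='right'`: the window is the half-open interval ending at the current row, so the current observation is included and the point exactly 3 days back is not.

```python
import pandas as pd

idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
s = pd.Series([10, 20, 30], index=idx)

# Default closed='right': window is (t - 3 days, t], current row included
right = s.rolling('3D').mean()

# closed='left': window is [t - 3 days, t), current row excluded
left = s.rolling('3D', closed='left').mean()

print(right.tolist())  # [10.0, 15.0, 20.0]
print(left.tolist())   # [nan, 10.0, 15.0]
```

`closed='left'` gives a strictly backward-looking window that never touches the current row, which can matter when the current value would not yet be known at decision time.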
## Centered vs. Trailing Windows

By default, rolling windows are trailing (backward-looking). Centered windows place the current observation in the middle of the window, which is useful for smoothing historical data but invalid for real-time predictions.

```python
data = pd.DataFrame({
    'value': [10, 20, 30, 40, 50, 60, 70]
})

# Trailing window (default)
data['trailing'] = data['value'].rolling(window=3).mean()

# Centered window
data['centered'] = data['value'].rolling(window=3, center=True).mean()
print(data)
```
Output demonstrates the alignment difference:

```text
   value  trailing  centered
0     10       NaN       NaN
1     20       NaN      20.0
2     30      20.0      30.0
3     40      30.0      40.0
4     50      40.0      50.0
5     60      50.0      60.0
6     70      60.0       NaN
```

Centered windows have NaN values at both ends, while trailing windows have them only at the beginning.
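To make the look-ahead bias concrete (a small illustrative check, not from the original text): with `center=True`, the value at position 1 already averages in position 2, a future observation in a time-ordered series.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

centered = s.rolling(3, center=True).mean()
# centered[1] == mean(s[0], s[1], s[2]) -- it uses s[2], one step in the future
print(centered.tolist())  # [nan, 20.0, 30.0, 40.0, nan]
```

This is why centered windows are fine for retrospective smoothing but must not feed features into a model that will run in real time.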
## Multiple Rolling Windows and Aggregations

Calculate multiple rolling statistics simultaneously for comprehensive analysis:

```python
stock_data = pd.DataFrame({
    'price': [100, 102, 98, 105, 103, 107, 110, 108, 112, 115]
})

# Multiple window sizes
stock_data['sma_5'] = stock_data['price'].rolling(5).mean()
stock_data['sma_10'] = stock_data['price'].rolling(10).mean()

# Multiple aggregations on the same window
rolling_window = stock_data['price'].rolling(5)
stock_data['mean_5'] = rolling_window.mean()
stock_data['std_5'] = rolling_window.std()
stock_data['min_5'] = rolling_window.min()
stock_data['max_5'] = rolling_window.max()
print(stock_data.tail())
```
Use `agg()` for a cleaner multiple-aggregation syntax:

```python
result = stock_data['price'].rolling(5).agg(['mean', 'std', 'min', 'max'])
```
## Weighted Rolling Averages

Apply custom weights to give different importance to observations within the window:

```python
prices = pd.Series([100, 105, 103, 108, 107])

# Exponentially weighted moving average (more weight on recent values)
ewma = prices.ewm(span=3).mean()

# Custom weighted average using apply
def weighted_mean(x):
    weights = np.array([0.1, 0.2, 0.3, 0.4])
    if len(x) < 4:
        return np.nan
    return np.sum(x * weights) / weights.sum()

custom_wma = prices.rolling(4).apply(weighted_mean, raw=True)
print(f"Prices: {prices.values}")
print(f"EWMA: {ewma.values}")
print(f"Custom WMA: {custom_wma.values}")
```

The `raw=True` parameter passes NumPy arrays instead of Series objects to the function, which significantly improves performance.
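As an aside (a sketch using the same illustrative weights as above), `np.average` can replace the manual sum-and-normalize step, since it divides by the weight total internally:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100, 105, 103, 108, 107])
weights = np.array([0.1, 0.2, 0.3, 0.4])

# np.average normalizes by weights.sum() internally
wma = prices.rolling(4).apply(lambda x: np.average(x, weights=weights), raw=True)
print(wma.tolist())  # first three entries are NaN; then approximately 105.1, 106.3
```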
## Handling Missing Data

Rolling calculations handle NaN values based on the `min_periods` configuration:

```python
data_with_gaps = pd.Series([10, 20, np.nan, 40, 50, np.nan, 70])

# Lenient: require only 2 non-null values per window
rolling_skipna = data_with_gaps.rolling(3, min_periods=2).mean()

# Strict: require all 3 values present
rolling_strict = data_with_gaps.rolling(3, min_periods=3).mean()

print(pd.DataFrame({
    'original': data_with_gaps,
    'skip_nan': rolling_skipna,
    'strict': rolling_strict
}))
```
Output shows how `min_periods` affects NaN handling:

```text
   original  skip_nan  strict
0      10.0       NaN     NaN
1      20.0      15.0     NaN
2       NaN      15.0     NaN
3      40.0      30.0     NaN
4      50.0      45.0     NaN
5       NaN      45.0     NaN
6      70.0      60.0     NaN
```

The strict column is entirely NaN because every 3-row window in this series either contains a NaN or falls at the start.
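When gaps should be filled rather than tolerated, one common alternative (a sketch, assuming linear interpolation suits the data) is to interpolate before rolling:

```python
import numpy as np
import pandas as pd

data_with_gaps = pd.Series([10, 20, np.nan, 40, 50, np.nan, 70])

# Fill gaps by linear interpolation, then roll over a complete series
filled = data_with_gaps.interpolate()
rolled = filled.rolling(3).mean()
print(filled.tolist())  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
print(rolled.tolist())  # [nan, nan, 20.0, 30.0, 40.0, 50.0, 60.0]
```

Whether this is appropriate depends on the data: interpolation invents values, which may be fine for slowly varying sensor readings but misleading for sparse event data.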
## GroupBy with Rolling Calculations

Apply rolling means to grouped data, essential for multi-entity datasets:

```python
# Multiple stock prices
multi_stock = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=6).tolist() * 2,
    'symbol': ['AAPL'] * 6 + ['GOOGL'] * 6,
    'price': [150, 152, 151, 155, 154, 158,
              2800, 2820, 2810, 2850, 2840, 2880]
})

# Rolling mean per stock
multi_stock['rolling_3'] = (multi_stock.groupby('symbol')['price']
                            .rolling(3, min_periods=1)
                            .mean()
                            .reset_index(level=0, drop=True))
print(multi_stock)
```
The `reset_index(level=0, drop=True)` call drops the `symbol` level that `groupby` prepends to the index, so the result aligns with the original DataFrame's rows.
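An alternative that sidesteps the index manipulation (a sketch with made-up mini data) is `GroupBy.transform`, which returns a result already aligned to the original index:

```python
import pandas as pd

multi_stock = pd.DataFrame({
    'symbol': ['AAPL'] * 3 + ['GOOGL'] * 3,
    'price': [150, 152, 151, 2800, 2820, 2810],
})

# transform applies the function per group and preserves the original index
multi_stock['rolling_3'] = (
    multi_stock.groupby('symbol')['price']
    .transform(lambda s: s.rolling(3, min_periods=1).mean())
)
print(multi_stock['rolling_3'].tolist())
# [150.0, 151.0, 151.0, 2800.0, 2810.0, 2810.0]
```

The lambda adds some per-group Python overhead, so for very many groups the `groupby(...).rolling(...)` form may be faster; `transform` wins on readability.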
## Performance Optimization

For large datasets, optimize rolling calculations through proper configuration:

```python
import time

# Large dataset
large_df = pd.DataFrame({
    'value': np.random.randn(1_000_000)
})

# Efficient: built-in vectorized operation
start = time.time()
result1 = large_df['value'].rolling(100).mean()
time1 = time.time() - start

# Inefficient: custom function without raw=True
start = time.time()
result2 = large_df['value'].rolling(100).apply(lambda x: x.mean(), raw=False)
time2 = time.time() - start

# Faster: custom function with raw=True
start = time.time()
result3 = large_df['value'].rolling(100).apply(lambda x: x.mean(), raw=True)
time3 = time.time() - start

print(f"Built-in mean: {time1:.4f}s")
print(f"Apply without raw: {time2:.4f}s")
print(f"Apply with raw: {time3:.4f}s")
```
Built-in aggregations like `mean()`, `sum()`, and `std()` use optimized Cython implementations, typically 10-100x faster than custom functions passed to `apply()`.
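A practical corollary: many custom `apply` functions can be rewritten as compositions of built-ins, keeping everything in the fast path. For example, a rolling z-score (an illustrative derived metric, not from the text above):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, dtype=float))

# Both components use optimized built-ins; no Python-level loop runs per window
roll = s.rolling(5)
zscore = (s - roll.mean()) / roll.std()
print(zscore.dropna().tolist())  # every value equals 2 / sqrt(2.5), about 1.2649
```

For this linear ramp, each full window's last value sits exactly 2 above the window mean and the sample standard deviation of 5 consecutive integers is sqrt(2.5), hence the constant z-score.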
## Practical Application: Signal Smoothing

Combine multiple rolling windows to identify trends and generate trading signals:

```python
# Simulated stock data with noise
np.random.seed(42)
trend = np.linspace(100, 150, 100)
noise = np.random.normal(0, 5, 100)
prices = pd.Series(trend + noise)

# Calculate multiple SMAs
df_signals = pd.DataFrame({'price': prices})
df_signals['sma_10'] = prices.rolling(10).mean()
df_signals['sma_30'] = prices.rolling(30).mean()

# Generate signal: 1 when the fast SMA is above the slow SMA (bullish)
df_signals['signal'] = (df_signals['sma_10'] > df_signals['sma_30']).astype(int)

# Identify crossover points
df_signals['crossover'] = df_signals['signal'].diff()
print(df_signals[df_signals['crossover'] != 0].head())
```
This pattern identifies golden cross (bullish) and death cross (bearish) signals commonly used in technical analysis. The rolling mean smooths out short-term volatility to reveal underlying trends.