How to Calculate Rolling Statistics in Polars


Key Insights

  • Polars rolling functions execute significantly faster than pandas equivalents, especially on large datasets, thanks to its Rust engine and automatic parallelization
  • Time-based rolling windows using duration strings (“7d”, “30d”) handle irregular time series data correctly, unlike fixed row-count windows
  • Combining multiple rolling statistics in a single expression with select() avoids redundant passes over the data and maximizes performance

Introduction to Rolling Statistics

Rolling statistics—also called moving or sliding window statistics—compute aggregate values over a fixed-size window that moves through your data. They’re essential for time series analysis, signal processing, and any scenario where you need to smooth noisy data or detect trends.

Common applications include calculating moving averages for stock prices, detecting anomalies in sensor readings, and smoothing out seasonal fluctuations in sales data. The core idea is simple: instead of looking at individual data points, you examine a neighborhood of values around each point.

Polars handles rolling operations exceptionally well, typically outperforming pandas on rolling calculations by a wide margin thanks to its Rust-based engine and automatic parallelization. Let’s start with a dataset we’ll use throughout this article:

import polars as pl
import numpy as np
from datetime import datetime, timedelta

# Create sample stock price data
np.random.seed(42)
n_days = 365

dates = [datetime(2024, 1, 1) + timedelta(days=i) for i in range(n_days)]
prices = 100 + np.cumsum(np.random.randn(n_days) * 2)  # Random walk

df = pl.DataFrame({
    "date": dates,
    "price": prices,
    "volume": np.random.randint(1000, 10000, n_days)
})

print(df.head())

Basic Rolling Functions

Polars provides straightforward methods for common rolling statistics. Each operates on a column and slides a window of specified size across the data:

# Basic rolling statistics
result = df.select(
    pl.col("date"),
    pl.col("price"),
    pl.col("price").rolling_mean(window_size=7).alias("rolling_mean_7d"),
    pl.col("price").rolling_sum(window_size=7).alias("rolling_sum_7d"),
    pl.col("price").rolling_std(window_size=7).alias("rolling_std_7d"),
    pl.col("price").rolling_min(window_size=7).alias("rolling_min_7d"),
    pl.col("price").rolling_max(window_size=7).alias("rolling_max_7d"),
)

print(result.head(10))

The window_size parameter defines how many rows the window includes. A 7-day rolling mean averages the current row plus the 6 preceding rows. Note that the first 6 rows will contain null values because there aren’t enough preceding values to fill the window.

Other useful rolling functions include rolling_median(), rolling_var() (variance), and rolling_quantile(). These cover most standard statistical needs.

Configuring Window Behavior

The default rolling behavior might not match your requirements. Polars offers several parameters to customize window calculations:

# Comparing different window configurations
result = df.select(
    pl.col("date"),
    pl.col("price"),
    
    # Default: trailing window, requires full window
    pl.col("price").rolling_mean(window_size=7).alias("default"),
    
    # Allow partial windows at the start
    pl.col("price").rolling_mean(window_size=7, min_periods=1).alias("min_periods_1"),
    
    # Centered window (current value in middle)
    pl.col("price").rolling_mean(window_size=7, center=True).alias("centered"),
)

print(result.head(10))

The min_periods parameter controls the minimum number of observations required to produce a result. Setting min_periods=1 means even the first row gets a value (just itself). This eliminates leading nulls, but be aware that early values are computed from fewer observations.

The center parameter shifts the window so the current row sits in the middle rather than at the end. Centered windows are useful for smoothing when you don’t need real-time calculations and can look “into the future.”

Time-Based Rolling Windows

Fixed row-count windows assume evenly spaced data. Real-world time series often have gaps—weekends, holidays, or irregular event timestamps. Polars handles this with the time-aware DataFrame.rolling() and group_by_dynamic():

# Create irregular time series (missing some days)
irregular_dates = [datetime(2024, 1, 1) + timedelta(days=i) 
                   for i in range(100) if i % 7 != 0]  # Skip every 7th day
irregular_prices = 100 + np.cumsum(np.random.randn(len(irregular_dates)) * 2)

df_irregular = pl.DataFrame({
    "date": irregular_dates,
    "price": irregular_prices,
}).sort("date")

# Time-based rolling window using DataFrame.rolling
result = df_irregular.rolling(
    index_column="date",
    period="7d",  # 7 calendar days, not 7 rows
).agg(
    pl.col("price").mean().alias("rolling_mean_7d"),
    pl.col("price").std().alias("rolling_std_7d"),
    pl.col("price").count().alias("observations_in_window"),
)

print(result.head(15))

The period parameter accepts duration strings: "7d" for 7 days, "2h" for 2 hours, "30m" for 30 minutes. This correctly handles calendar time rather than row counts, so a 7-day window always spans exactly 7 days regardless of how many data points exist.

For more control, use group_by_dynamic() with explicit every, period, and closed settings:

# More control with group_by_dynamic
result = df_irregular.group_by_dynamic(
    index_column="date",
    every="1d",      # Evaluate every day
    period="7d",     # Look back 7 days
    closed="left",   # Include left boundary, exclude right
).agg(
    pl.col("price").mean().alias("rolling_mean"),
    pl.col("price").min().alias("rolling_min"),
    pl.col("price").max().alias("rolling_max"),
)

print(result.head(10))

Custom Rolling Aggregations

When the common built-ins aren’t enough, check the less common ones first (rolling_quantile() covers percentile bands), and fall back to rolling_map() for fully custom logic:

# Rolling quantile (75th percentile)
result = df.select(
    pl.col("date"),
    pl.col("price"),
    pl.col("price").rolling_quantile(quantile=0.75, window_size=14).alias("rolling_p75"),
    pl.col("price").rolling_quantile(quantile=0.25, window_size=14).alias("rolling_p25"),
)

# Rolling correlation between price and volume
# Use struct to pass multiple columns, then map
def rolling_correlation(window_size: int):
    return (
        pl.struct(["price", "volume"])
        .rolling_map(
            lambda s: np.corrcoef(
                s.struct.field("price").to_numpy(),
                s.struct.field("volume").to_numpy()
            )[0, 1],
            window_size=window_size,
        )
    )

result_corr = df.select(
    pl.col("date"),
    rolling_correlation(30).alias("rolling_corr_30d"),
)

print(result_corr.tail(10))

For simpler custom functions, rolling_map() works directly:

# Custom: rolling range (max - min)
result = df.select(
    pl.col("date"),
    pl.col("price"),
    pl.col("price").rolling_map(
        lambda s: s.max() - s.min(),
        window_size=7
    ).alias("rolling_range_7d"),
)

Be aware that rolling_map() with a Python function is much slower than native Polars expressions, because every window round-trips through the Python interpreter. Use built-in functions whenever possible.

Performance Optimization

For large datasets, lazy evaluation dramatically improves performance. Polars optimizes the entire query plan before execution:

import time

# Create larger dataset for benchmarking
large_df = pl.DataFrame({
    "date": [datetime(2020, 1, 1) + timedelta(hours=i) for i in range(1_000_000)],
    "value": np.random.randn(1_000_000),
})

# Eager execution
start = time.perf_counter()
result_eager = large_df.select(
    pl.col("value").rolling_mean(window_size=24).alias("mean_24h"),
    pl.col("value").rolling_std(window_size=24).alias("std_24h"),
    pl.col("value").rolling_min(window_size=168).alias("min_7d"),
    pl.col("value").rolling_max(window_size=168).alias("max_7d"),
)
eager_time = time.perf_counter() - start

# Lazy execution
start = time.perf_counter()
result_lazy = (
    large_df.lazy()
    .select(
        pl.col("value").rolling_mean(window_size=24).alias("mean_24h"),
        pl.col("value").rolling_std(window_size=24).alias("std_24h"),
        pl.col("value").rolling_min(window_size=168).alias("min_7d"),
        pl.col("value").rolling_max(window_size=168).alias("max_7d"),
    )
    .collect()
)
lazy_time = time.perf_counter() - start

print(f"Eager: {eager_time:.3f}s")
print(f"Lazy:  {lazy_time:.3f}s")

Lazy evaluation enables predicate pushdown and projection optimization. If you later filter results or select only certain columns, Polars avoids computing unnecessary values.

For memory-constrained environments, process larger-than-memory data with scan_csv() or scan_parquet() and the streaming engine (note that operations the streaming engine does not support fall back to the in-memory engine):

# Streaming for memory efficiency
result = (
    pl.scan_parquet("large_dataset.parquet")
    .select(
        pl.col("timestamp"),
        pl.col("value").rolling_mean(window_size=100),
    )
    .collect(engine="streaming")  # on older Polars versions: .collect(streaming=True)
)

Practical Example: Financial Analysis Dashboard

Let’s combine everything into a realistic financial analysis calculating Bollinger Bands and multiple moving averages:

def calculate_technical_indicators(df: pl.DataFrame) -> pl.DataFrame:
    """Calculate common technical indicators using rolling statistics."""
    
    return df.select(
        # Original data
        pl.col("date"),
        pl.col("price"),
        pl.col("volume"),
        
        # Moving averages
        pl.col("price").rolling_mean(window_size=20).alias("sma_20"),
        pl.col("price").rolling_mean(window_size=50).alias("sma_50"),
        pl.col("price").rolling_mean(window_size=200, min_periods=50).alias("sma_200"),
        
        # Bollinger Bands (20-day SMA ± 2 standard deviations)
        pl.col("price").rolling_mean(window_size=20).alias("bb_middle"),
        (pl.col("price").rolling_mean(window_size=20) + 
         2 * pl.col("price").rolling_std(window_size=20)).alias("bb_upper"),
        (pl.col("price").rolling_mean(window_size=20) - 
         2 * pl.col("price").rolling_std(window_size=20)).alias("bb_lower"),
        
        # Volume analysis
        pl.col("volume").rolling_mean(window_size=20).alias("avg_volume_20d"),
        (pl.col("volume") / pl.col("volume").rolling_mean(window_size=20))
            .alias("relative_volume"),
        
        # Volatility (rolling standard deviation)
        pl.col("price").rolling_std(window_size=20).alias("volatility_20d"),
        
        # Price momentum (current vs 20-day ago)
        (pl.col("price") - pl.col("price").shift(20)).alias("momentum_20d"),
        
        # Rolling high/low over the past 52 weeks (364 daily rows; min_periods
        # gives early rows an expanding window instead of null)
        pl.col("price").rolling_max(window_size=364, min_periods=1).alias("52_week_high"),
        pl.col("price").rolling_min(window_size=364, min_periods=1).alias("52_week_low"),
    ).with_columns(
        # Derived: distance from 52-week high (percentage)
        ((pl.col("52_week_high") - pl.col("price")) / pl.col("52_week_high") * 100)
            .alias("pct_from_high"),
    )

# Apply to our data
technical_df = calculate_technical_indicators(df)
print(technical_df.tail(10))

This single expression computes 13 derived columns efficiently. Polars executes these in parallel where possible, and the lazy engine would optimize away any unused columns if you only selected a subset.

Rolling statistics in Polars are both powerful and performant. Start with built-in functions, use time-based windows for irregular data, and leverage lazy evaluation for large datasets. The API is intuitive enough for quick analysis yet flexible enough for production pipelines.
