How to Resample Time Series Data in Pandas

Key Insights

  • Resampling transforms time series data between frequencies—downsampling aggregates data (minutes to hours), while upsampling interpolates to fill gaps (monthly to daily)
  • The .resample() method requires a DatetimeIndex and returns a Resampler object that works like groupby, letting you chain aggregation methods like .mean(), .sum(), or .ohlc()
  • Use the closed and label parameters to control which edge of each time bin includes data points and how the resulting index is labeled—getting these wrong causes subtle off-by-one errors

Introduction to Time Series Resampling

Resampling is the process of changing the frequency of your time series data. If you have stock prices recorded every minute and need daily summaries, that’s downsampling. If you have monthly revenue figures and need to plot them alongside daily metrics, that’s upsampling.

Pandas makes both operations straightforward with the .resample() method, but the details matter. Choosing the wrong aggregation function or misunderstanding bin edges can introduce bugs that are hard to catch.

This guide covers the practical mechanics of resampling: how to aggregate high-frequency data into meaningful summaries, how to fill gaps when increasing frequency, and how to handle the edge cases that trip up many developers.

Before diving in, your DataFrame needs a DatetimeIndex. If your dates are in a regular column, you’ll need to convert and set them as the index first.

Setting Up Time Series Data

Let’s create realistic sample data to work with. We’ll generate minute-level stock price data that simulates a trading day:

import pandas as pd
import numpy as np

# Create minute-level data for a trading week
np.random.seed(42)
dates = pd.date_range(
    start='2024-01-08 09:30',
    end='2024-01-12 16:00',
    freq='min'
)

# Filter to market hours only (9:30 AM - 4:00 PM)
dates = dates[((dates.hour > 9) & (dates.hour < 16)) |
              ((dates.hour == 9) & (dates.minute >= 30)) |
              ((dates.hour == 16) & (dates.minute == 0))]

# Simulate price movements with random walk
price_changes = np.random.randn(len(dates)) * 0.5
prices = 150 + np.cumsum(price_changes)

df = pd.DataFrame({
    'price': prices,
    'volume': np.random.randint(100, 10000, len(dates))
}, index=dates)

print(df.head(10))

If your data comes from a CSV or database with dates as strings, convert them first:

# Converting string dates to DatetimeIndex
raw_df = pd.read_csv('trades.csv')
raw_df['timestamp'] = pd.to_datetime(raw_df['timestamp'])
raw_df = raw_df.set_index('timestamp')

The index must be a DatetimeIndex for .resample() to work. Calling .resample() on a DataFrame with an integer or string index raises a TypeError.
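You don't strictly have to set the index, though: .resample() also accepts an on parameter naming a datetime column. A minimal sketch (the column name and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical trades table with a datetime column instead of a DatetimeIndex
trades = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-08 09:30', '2024-01-08 09:31',
        '2024-01-09 09:30', '2024-01-09 09:31',
    ]),
    'price': [150.0, 150.5, 151.0, 151.5],
})

# on= tells resample which column to bin by, so no set_index is needed
daily = trades.resample('D', on='timestamp')['price'].mean()
print(daily)
```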

Downsampling: Aggregating to Lower Frequency

Downsampling reduces data frequency by grouping multiple observations into single summary values. The .resample() method takes a frequency string and returns a Resampler object—similar to groupby—that you chain with an aggregation method.

Common frequency strings:

  • 'min' or 'T' — minute
  • 'H' — hour
  • 'D' — calendar day
  • 'W' — week
  • 'M' — month end
  • 'Q' — quarter end

(As of pandas 2.2, 'T', 'H', 'M', and 'Q' are deprecated in favor of 'min', 'h', 'ME', and 'QE'; the old aliases still work but emit a FutureWarning.)
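Before the OHLC example, here is the simplest possible downsampling, on a tiny synthetic frame with values chosen so the hourly means are easy to verify by hand:

```python
import pandas as pd

# Four observations, 30 minutes apart, starting at 09:00
idx = pd.date_range('2024-01-08 09:00', periods=4, freq='30min')
small = pd.DataFrame({'price': [100.0, 102.0, 104.0, 106.0]}, index=idx)

# Downsample to hourly means: two observations fall into each hour bin
hourly = small.resample('h').mean()
print(hourly)
# 09:00 bin averages 100 and 102 -> 101; 10:00 bin averages 104 and 106 -> 105
```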

Converting Minute Data to Daily OHLC Bars

Financial data often requires OHLC (Open, High, Low, Close) aggregation. Pandas has a built-in method for this:

# Create daily OHLC bars from minute data
daily_ohlc = df['price'].resample('D').ohlc()
print(daily_ohlc)

Output:

                  open        high         low       close
2024-01-08  150.246291  155.097498  148.714171  153.841573
2024-01-09  154.012847  157.234521  151.892347  155.234891
2024-01-10  155.891234  158.234567  152.123456  156.789012
...

For volume, you typically want the sum:

# Combine OHLC prices with total volume
daily_bars = df['price'].resample('D').ohlc()
daily_bars['volume'] = df['volume'].resample('D').sum()
print(daily_bars)

Weekly Sales Totals from Daily Data

For business metrics like sales, sum is usually the right aggregation:

# Sample daily sales data
sales_dates = pd.date_range('2024-01-01', periods=90, freq='D')
daily_sales = pd.DataFrame({
    'revenue': np.random.randint(1000, 5000, 90),
    'orders': np.random.randint(10, 100, 90)
}, index=sales_dates)

# Weekly totals
weekly_sales = daily_sales.resample('W').sum()
print(weekly_sales.head())

# Monthly averages
monthly_avg = daily_sales.resample('M').mean()
print(monthly_avg.head())

The aggregation method you choose depends entirely on what the data represents. Revenue gets summed; temperatures get averaged; inventory levels might use .last() to get end-of-period snapshots.
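As a quick illustration of the snapshot case, here is .last() applied to made-up inventory levels:

```python
import pandas as pd

# Hypothetical daily inventory levels over ten days (Mon Jan 1 onward)
idx = pd.date_range('2024-01-01', periods=10, freq='D')
inventory = pd.Series([500, 480, 470, 520, 510, 495, 530, 525, 515, 540],
                      index=idx)

# Weekly snapshot: the level recorded on the last day of each week
weekly_snapshot = inventory.resample('W').last()
print(weekly_snapshot)
# Week ending Jan 7 -> 530, week ending Jan 14 -> 540 (last observed value)
```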

Upsampling: Increasing Data Frequency

Upsampling increases frequency, which creates gaps that need filling. If you have monthly data and resample to daily, Pandas doesn’t know what values should appear on the days between one month-end and the next.

# Monthly revenue data
monthly_dates = pd.date_range('2024-01-31', periods=6, freq='M')
monthly_revenue = pd.DataFrame({
    'revenue': [50000, 52000, 48000, 55000, 58000, 54000]
}, index=monthly_dates)

# Upsample to daily - creates NaN values
daily_revenue = monthly_revenue.resample('D').asfreq()
print(daily_revenue.head(35))

The result has NaN for every day except month-ends. You have three main options for filling:

# Forward fill: carry last known value forward
daily_ffill = monthly_revenue.resample('D').ffill()

# Backward fill: use next known value
daily_bfill = monthly_revenue.resample('D').bfill()

# Interpolate: linear interpolation between known points
daily_interp = monthly_revenue.resample('D').interpolate(method='linear')

print("Forward fill:")
print(daily_ffill.head(10))

print("\nInterpolated:")
print(daily_interp.head(10))

Forward fill is appropriate when values remain constant until updated (like subscription counts). Interpolation makes sense for continuous measurements where gradual change is expected.
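One refinement worth knowing: the resampler's .ffill() takes a limit argument, so a stale value isn't carried forward indefinitely. A sketch with illustrative subscriber counts:

```python
import pandas as pd

# Two monthly subscriber counts, upsampled to daily
idx = pd.to_datetime(['2024-01-31', '2024-02-29'])
subs = pd.Series([1000, 1100], index=idx)

# Carry each value forward at most 7 days; days beyond the limit stay NaN
daily = subs.resample('D').ffill(limit=7)
print(daily.head(10))
```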

Custom Aggregations and Multiple Columns

Real datasets often need different aggregations for different columns. Use .agg() with a dictionary:

# Different aggregations per column
daily_summary = df.resample('D').agg({
    'price': ['mean', 'std', 'first', 'last'],
    'volume': 'sum'
})

print(daily_summary)

This creates a MultiIndex column structure. To flatten it:

daily_summary.columns = ['_'.join(col).strip() for col in daily_summary.columns]
print(daily_summary.columns)
# Index(['price_mean', 'price_std', 'price_first', 'price_last', 'volume_sum'])

For custom aggregation logic, pass a function:

# Custom function: calculate intraday range as percentage
def intraday_range_pct(series):
    return (series.max() - series.min()) / series.mean() * 100

daily_volatility = df['price'].resample('D').apply(intraday_range_pct)
print(daily_volatility)

Handling Edge Cases

The closed and label Parameters

These parameters control how time bins are constructed and labeled. They’re critical for getting correct results, especially with financial data.

# Sample data: hourly observations
hourly_data = pd.DataFrame({
    'value': range(24)
}, index=pd.date_range('2024-01-15', periods=24, freq='H'))

# Default: closed='left', label='left'
# Bin [00:00, 06:00) labeled as 00:00
default_resample = hourly_data.resample('6H').sum()

# closed='right': bin (00:00, 06:00] 
right_closed = hourly_data.resample('6H', closed='right').sum()

# label='right': label bin with right edge
right_label = hourly_data.resample('6H', label='right').sum()

print("Default (closed='left', label='left'):")
print(default_resample)
print("\nclosed='right':")
print(right_closed)
print("\nlabel='right':")
print(right_label)

For daily resampling of financial data, you often want closed='right' so that midnight belongs to the previous day’s bin.
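You can check that claim concretely with three hourly points straddling midnight:

```python
import pandas as pd

# Three observations spanning midnight
idx = pd.to_datetime(['2024-01-07 23:00', '2024-01-08 00:00', '2024-01-08 01:00'])
s = pd.Series([1, 2, 4], index=idx)

# Default closed='left': midnight opens the new day's bin
print(s.resample('D').sum())                   # Jan 7 -> 1, Jan 8 -> 6

# closed='right': midnight closes the previous day's bin
print(s.resample('D', closed='right').sum())   # Jan 7 -> 3, Jan 8 -> 4
```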

Dealing with Missing Periods

If your data has gaps, resampling still creates entries for missing periods:

# Data with a gap (missing Jan 10)
sparse_dates = pd.to_datetime(['2024-01-08', '2024-01-09', '2024-01-11', '2024-01-12'])
sparse_df = pd.DataFrame({'value': [100, 110, 130, 125]}, index=sparse_dates)

# Resampling fills the gap with NaN
complete = sparse_df.resample('D').mean()
print(complete)

# Fill missing periods explicitly
complete_filled = sparse_df.resample('D').mean().ffill()
print(complete_filled)

Timezone-Aware Resampling

Resampling respects timezone information, which matters when aggregating across DST boundaries:

# Timezone-aware data
tz_dates = pd.date_range('2024-03-09', periods=72, freq='H', tz='US/Eastern')
tz_df = pd.DataFrame({'value': range(72)}, index=tz_dates)

# Daily resampling respects timezone
daily_tz = tz_df.resample('D').sum()
print(daily_tz)

Conclusion

Here’s a quick reference for common frequency aliases:

Alias     Description
min or T  Minute
H         Hour
D         Calendar day
B         Business day
W         Week (Sunday end)
W-MON     Week (Monday end)
M         Month end
MS        Month start
Q         Quarter end
Y         Year end

When to use .resample() vs .groupby() with Grouper: Use .resample() when your index is already a DatetimeIndex and you’re doing straightforward time-based aggregation. Use pd.Grouper(key='date_column', freq='D') inside .groupby() when you need to group by time and other columns simultaneously, or when your dates aren’t the index.

# Grouper example: aggregate by date AND category
df_with_category = df.copy()
df_with_category['category'] = np.random.choice(['A', 'B'], len(df))
df_with_category = df_with_category.reset_index()

grouped = df_with_category.groupby([
    pd.Grouper(key='index', freq='D'),
    'category'
]).agg({'price': 'mean', 'volume': 'sum'})

Resampling is one of those operations that seems simple until you hit edge cases. Test your aggregations on small samples where you can verify the results manually before running on production data.
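Such a sanity check can be as small as four rows, where the expected totals are obvious by eye:

```python
import pandas as pd

# Four hourly values across two days: day totals should be 1+2=3 and 3+4=7
idx = pd.to_datetime(['2024-01-01 10:00', '2024-01-01 11:00',
                      '2024-01-02 10:00', '2024-01-02 11:00'])
s = pd.Series([1, 2, 3, 4], index=idx)

daily = s.resample('D').sum()
assert list(daily) == [3, 7]  # verify against the hand-computed totals
print(daily)
```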
