Pandas - Date Range Generation (date_range)

• `pd.date_range()` generates sequences of datetime objects with flexible frequency options, essential for time series analysis and data resampling operations

Key Insights

pd.date_range() generates sequences of datetime objects with flexible frequency options, essential for time series analysis and data resampling operations • Understanding frequency aliases (‘D’, ‘B’, ‘H’, ‘MS’, ‘Q’) and normalization parameters enables precise control over datetime generation patterns • Combining date ranges with timezone handling, business day logic, and custom frequencies solves real-world data engineering challenges

Basic Date Range Generation

The pd.date_range() function creates a DatetimeIndex with evenly spaced datetime values. You specify either a start/end with periods, or start/end with frequency.

import pandas as pd
from datetime import datetime

# Generate 10 days starting from a specific date
daily_range = pd.date_range(start='2024-01-01', periods=10, freq='D')
print(daily_range)
# DatetimeIndex(['2024-01-01', '2024-01-02', ..., '2024-01-10'])

# Generate dates between start and end
date_span = pd.date_range(start='2024-01-01', end='2024-01-31', freq='D')
print(f"Days in January: {len(date_span)}")

# Using datetime objects
start_dt = datetime(2024, 1, 1)
end_dt = datetime(2024, 12, 31)
year_range = pd.date_range(start=start_dt, end=end_dt, freq='MS')
print(year_range)  # Month start dates for 2024

Frequency Aliases and Custom Intervals

Pandas supports dozens of frequency aliases for different time intervals. Common patterns include calendar days, business days, hours, and month boundaries.

# Business days only (excludes weekends)
business_days = pd.date_range(start='2024-01-01', periods=10, freq='B')
print(business_days)

# Hourly intervals
hourly = pd.date_range(start='2024-01-01', periods=24, freq='H')

# Every 4 hours
four_hourly = pd.date_range(start='2024-01-01', periods=6, freq='4H')

# Week starting on Monday
weekly = pd.date_range(start='2024-01-01', periods=12, freq='W-MON')

# Month end dates
month_end = pd.date_range(start='2024-01-01', end='2024-12-31', freq='M')

# Quarter start dates
quarters = pd.date_range(start='2024-01-01', periods=8, freq='QS')

# Annual frequency
yearly = pd.date_range(start='2020-01-01', periods=5, freq='A')
print(yearly)

Key frequency codes:

  • D: Calendar day
  • B: Business day
  • H: Hour
  • T or min: Minute
  • S: Second
  • MS: Month start
  • M: Month end
  • QS: Quarter start
  • Q: Quarter end
  • AS: Year start
  • A: Year end

Working with Business Days and Custom Calendars

Business day logic automatically excludes weekends. For custom holidays, use CustomBusinessDay.

from pandas.tseries.offsets import CustomBusinessDay
from pandas.tseries.holiday import USFederalHolidayCalendar

# Standard business days
bdays = pd.date_range(start='2024-01-01', end='2024-01-15', freq='B')
print(f"Business days: {len(bdays)}")

# Custom business days excluding US federal holidays
us_cal = USFederalHolidayCalendar()
cbd = CustomBusinessDay(calendar=us_cal)
us_bdays = pd.date_range(start='2024-01-01', end='2024-01-15', freq=cbd)
print(f"US business days: {len(us_bdays)}")

# Define custom holidays
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar

class CompanyHolidayCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('Company Founding Day', month=3, day=15),
        Holiday('Summer Break Start', month=7, day=1),
    ]

company_cal = CompanyHolidayCalendar()
company_bday = CustomBusinessDay(calendar=company_cal)
company_dates = pd.date_range(start='2024-03-01', end='2024-03-31', freq=company_bday)

Timezone-Aware Date Ranges

Generate date ranges in specific timezones for distributed systems or multi-region applications.

# UTC timezone
utc_range = pd.date_range(start='2024-01-01', periods=5, freq='D', tz='UTC')
print(utc_range)

# Specific timezone
ny_range = pd.date_range(start='2024-01-01', periods=24, freq='H', tz='America/New_York')

# Convert between timezones
tokyo_range = ny_range.tz_convert('Asia/Tokyo')
print(f"NY: {ny_range[0]}")
print(f"Tokyo: {tokyo_range[0]}")

# Localize naive datetime to timezone
naive_range = pd.date_range(start='2024-01-01', periods=5, freq='D')
localized = naive_range.tz_localize('Europe/London')

Normalization and Time Components

Control whether times are normalized to midnight and handle partial date specifications.

# Without normalization (preserves time components)
non_normalized = pd.date_range(start='2024-01-01 14:30:00', periods=5, freq='D')
print(non_normalized)
# Keeps 14:30:00 time

# With normalization (sets to midnight)
normalized = pd.date_range(start='2024-01-01 14:30:00', periods=5, freq='D', normalize=True)
print(normalized)
# All times set to 00:00:00

# Combining date and time frequencies
datetime_range = pd.date_range(
    start='2024-01-01 09:00:00',
    end='2024-01-01 17:00:00',
    freq='30T'  # 30-minute intervals
)
print(datetime_range)  # Business hours in 30-min slots

Practical Applications

Time Series Data Scaffolding

Create complete date indexes for time series data, ensuring no missing dates.

# Generate complete date range for merging
complete_dates = pd.date_range(start='2024-01-01', end='2024-01-31', freq='D')
df_scaffold = pd.DataFrame({'date': complete_dates})

# Simulated incomplete sales data
sales_data = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01', '2024-01-05', '2024-01-15']),
    'revenue': [1000, 1500, 2000]
})

# Merge to fill gaps
complete_sales = df_scaffold.merge(sales_data, on='date', how='left')
complete_sales['revenue'] = complete_sales['revenue'].fillna(0)
print(complete_sales.head(10))

Resampling and Aggregation Setup

# Create hourly data
hourly_dates = pd.date_range(start='2024-01-01', periods=168, freq='H')  # 1 week
import numpy as np

df = pd.DataFrame({
    'timestamp': hourly_dates,
    'value': np.random.randn(168).cumsum()
})
df.set_index('timestamp', inplace=True)

# Resample to daily using the generated range
daily_avg = df.resample('D').mean()
print(daily_avg)

# Business day aggregation
bday_sum = df.resample('B').sum()

Multi-Frequency Analysis

# Generate multiple date ranges for different aggregation levels
daily = pd.date_range(start='2024-01-01', end='2024-12-31', freq='D')
weekly = pd.date_range(start='2024-01-01', end='2024-12-31', freq='W-MON')
monthly = pd.date_range(start='2024-01-01', end='2024-12-31', freq='MS')

# Create hierarchical time structure
time_hierarchy = pd.DataFrame({
    'level': ['daily'] * len(daily) + ['weekly'] * len(weekly) + ['monthly'] * len(monthly),
    'date': list(daily) + list(weekly) + list(monthly)
})

print(f"Daily periods: {len(daily)}")
print(f"Weekly periods: {len(weekly)}")
print(f"Monthly periods: {len(monthly)}")

Custom Offset Patterns

from pandas.tseries.offsets import DateOffset, Week, MonthEnd

# Every 10 days
every_10_days = pd.date_range(start='2024-01-01', periods=10, freq='10D')

# Bi-weekly on Fridays
biweekly_fri = pd.date_range(start='2024-01-01', periods=12, freq='2W-FRI')

# Last business day of each month
month_end_bday = pd.date_range(start='2024-01-01', periods=12, freq='BM')

# Custom: 3rd Wednesday of each month
from pandas.tseries.offsets import Week, WeekOfMonth
third_wed = pd.date_range(start='2024-01-01', periods=12, freq=WeekOfMonth(week=2, weekday=2))
print(third_wed)

Performance Considerations

For large date ranges, date_range() is optimized but memory usage scales with period count.

# Efficient: Generate only what you need
large_range = pd.date_range(start='2000-01-01', end='2024-12-31', freq='D')
print(f"Memory size: {large_range.nbytes / 1024 / 1024:.2f} MB")

# For very large ranges, consider generators or chunking
def date_chunks(start, end, chunk_size='30D'):
    chunk_starts = pd.date_range(start=start, end=end, freq=chunk_size)
    for i in range(len(chunk_starts) - 1):
        yield pd.date_range(start=chunk_starts[i], end=chunk_starts[i+1], freq='D')

# Process in chunks
for chunk in date_chunks('2000-01-01', '2024-12-31'):
    # Process chunk
    pass

The date_range() function eliminates manual datetime arithmetic and ensures consistency across time series operations. Master frequency aliases and offset patterns to handle complex temporal data requirements efficiently.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.