Pandas - Date Range Generation (date_range)
• `pd.date_range()` generates sequences of datetime objects with flexible frequency options, essential for time series analysis and data resampling operations
Key Insights
• pd.date_range() generates sequences of datetime objects with flexible frequency options, essential for time series analysis and data resampling operations
• Understanding frequency aliases (‘D’, ‘B’, ‘H’, ‘MS’, ‘Q’) and normalization parameters enables precise control over datetime generation patterns
• Combining date ranges with timezone handling, business day logic, and custom frequencies solves real-world data engineering challenges
Basic Date Range Generation
The pd.date_range() function creates a DatetimeIndex with evenly spaced datetime values. You specify either a start/end with periods, or start/end with frequency.
import pandas as pd
from datetime import datetime
# Generate 10 days starting from a specific date
daily_range = pd.date_range(start='2024-01-01', periods=10, freq='D')
print(daily_range)
# DatetimeIndex(['2024-01-01', '2024-01-02', ..., '2024-01-10'])
# Generate dates between start and end
date_span = pd.date_range(start='2024-01-01', end='2024-01-31', freq='D')
print(f"Days in January: {len(date_span)}")
# Using datetime objects
start_dt = datetime(2024, 1, 1)
end_dt = datetime(2024, 12, 31)
year_range = pd.date_range(start=start_dt, end=end_dt, freq='MS')
print(year_range) # Month start dates for 2024
Frequency Aliases and Custom Intervals
Pandas supports dozens of frequency aliases for different time intervals. Common patterns include calendar days, business days, hours, and month boundaries.
# Business days only (excludes weekends)
business_days = pd.date_range(start='2024-01-01', periods=10, freq='B')
print(business_days)
# Hourly intervals
hourly = pd.date_range(start='2024-01-01', periods=24, freq='H')
# Every 4 hours
four_hourly = pd.date_range(start='2024-01-01', periods=6, freq='4H')
# Week starting on Monday
weekly = pd.date_range(start='2024-01-01', periods=12, freq='W-MON')
# Month end dates
month_end = pd.date_range(start='2024-01-01', end='2024-12-31', freq='M')
# Quarter start dates
quarters = pd.date_range(start='2024-01-01', periods=8, freq='QS')
# Annual frequency
yearly = pd.date_range(start='2020-01-01', periods=5, freq='A')
print(yearly)
Key frequency codes:
D: Calendar dayB: Business dayH: HourTormin: MinuteS: SecondMS: Month startM: Month endQS: Quarter startQ: Quarter endAS: Year startA: Year end
Working with Business Days and Custom Calendars
Business day logic automatically excludes weekends. For custom holidays, use CustomBusinessDay.
from pandas.tseries.offsets import CustomBusinessDay
from pandas.tseries.holiday import USFederalHolidayCalendar
# Standard business days
bdays = pd.date_range(start='2024-01-01', end='2024-01-15', freq='B')
print(f"Business days: {len(bdays)}")
# Custom business days excluding US federal holidays
us_cal = USFederalHolidayCalendar()
cbd = CustomBusinessDay(calendar=us_cal)
us_bdays = pd.date_range(start='2024-01-01', end='2024-01-15', freq=cbd)
print(f"US business days: {len(us_bdays)}")
# Define custom holidays
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar
class CompanyHolidayCalendar(AbstractHolidayCalendar):
rules = [
Holiday('Company Founding Day', month=3, day=15),
Holiday('Summer Break Start', month=7, day=1),
]
company_cal = CompanyHolidayCalendar()
company_bday = CustomBusinessDay(calendar=company_cal)
company_dates = pd.date_range(start='2024-03-01', end='2024-03-31', freq=company_bday)
Timezone-Aware Date Ranges
Generate date ranges in specific timezones for distributed systems or multi-region applications.
# UTC timezone
utc_range = pd.date_range(start='2024-01-01', periods=5, freq='D', tz='UTC')
print(utc_range)
# Specific timezone
ny_range = pd.date_range(start='2024-01-01', periods=24, freq='H', tz='America/New_York')
# Convert between timezones
tokyo_range = ny_range.tz_convert('Asia/Tokyo')
print(f"NY: {ny_range[0]}")
print(f"Tokyo: {tokyo_range[0]}")
# Localize naive datetime to timezone
naive_range = pd.date_range(start='2024-01-01', periods=5, freq='D')
localized = naive_range.tz_localize('Europe/London')
Normalization and Time Components
Control whether times are normalized to midnight and handle partial date specifications.
# Without normalization (preserves time components)
non_normalized = pd.date_range(start='2024-01-01 14:30:00', periods=5, freq='D')
print(non_normalized)
# Keeps 14:30:00 time
# With normalization (sets to midnight)
normalized = pd.date_range(start='2024-01-01 14:30:00', periods=5, freq='D', normalize=True)
print(normalized)
# All times set to 00:00:00
# Combining date and time frequencies
datetime_range = pd.date_range(
start='2024-01-01 09:00:00',
end='2024-01-01 17:00:00',
freq='30T' # 30-minute intervals
)
print(datetime_range) # Business hours in 30-min slots
Practical Applications
Time Series Data Scaffolding
Create complete date indexes for time series data, ensuring no missing dates.
# Generate complete date range for merging
complete_dates = pd.date_range(start='2024-01-01', end='2024-01-31', freq='D')
df_scaffold = pd.DataFrame({'date': complete_dates})
# Simulated incomplete sales data
sales_data = pd.DataFrame({
'date': pd.to_datetime(['2024-01-01', '2024-01-05', '2024-01-15']),
'revenue': [1000, 1500, 2000]
})
# Merge to fill gaps
complete_sales = df_scaffold.merge(sales_data, on='date', how='left')
complete_sales['revenue'] = complete_sales['revenue'].fillna(0)
print(complete_sales.head(10))
Resampling and Aggregation Setup
# Create hourly data
hourly_dates = pd.date_range(start='2024-01-01', periods=168, freq='H') # 1 week
import numpy as np
df = pd.DataFrame({
'timestamp': hourly_dates,
'value': np.random.randn(168).cumsum()
})
df.set_index('timestamp', inplace=True)
# Resample to daily using the generated range
daily_avg = df.resample('D').mean()
print(daily_avg)
# Business day aggregation
bday_sum = df.resample('B').sum()
Multi-Frequency Analysis
# Generate multiple date ranges for different aggregation levels
daily = pd.date_range(start='2024-01-01', end='2024-12-31', freq='D')
weekly = pd.date_range(start='2024-01-01', end='2024-12-31', freq='W-MON')
monthly = pd.date_range(start='2024-01-01', end='2024-12-31', freq='MS')
# Create hierarchical time structure
time_hierarchy = pd.DataFrame({
'level': ['daily'] * len(daily) + ['weekly'] * len(weekly) + ['monthly'] * len(monthly),
'date': list(daily) + list(weekly) + list(monthly)
})
print(f"Daily periods: {len(daily)}")
print(f"Weekly periods: {len(weekly)}")
print(f"Monthly periods: {len(monthly)}")
Custom Offset Patterns
from pandas.tseries.offsets import DateOffset, Week, MonthEnd
# Every 10 days
every_10_days = pd.date_range(start='2024-01-01', periods=10, freq='10D')
# Bi-weekly on Fridays
biweekly_fri = pd.date_range(start='2024-01-01', periods=12, freq='2W-FRI')
# Last business day of each month
month_end_bday = pd.date_range(start='2024-01-01', periods=12, freq='BM')
# Custom: 3rd Wednesday of each month
from pandas.tseries.offsets import Week, WeekOfMonth
third_wed = pd.date_range(start='2024-01-01', periods=12, freq=WeekOfMonth(week=2, weekday=2))
print(third_wed)
Performance Considerations
For large date ranges, date_range() is optimized but memory usage scales with period count.
# Efficient: Generate only what you need
large_range = pd.date_range(start='2000-01-01', end='2024-12-31', freq='D')
print(f"Memory size: {large_range.nbytes / 1024 / 1024:.2f} MB")
# For very large ranges, consider generators or chunking
def date_chunks(start, end, chunk_size='30D'):
chunk_starts = pd.date_range(start=start, end=end, freq=chunk_size)
for i in range(len(chunk_starts) - 1):
yield pd.date_range(start=chunk_starts[i], end=chunk_starts[i+1], freq='D')
# Process in chunks
for chunk in date_chunks('2000-01-01', '2024-12-31'):
# Process chunk
pass
The date_range() function eliminates manual datetime arithmetic and ensures consistency across time series operations. Master frequency aliases and offset patterns to handle complex temporal data requirements efficiently.