How to Work with DateTime in Polars

Key Insights

  • Polars’ datetime operations are built on Apache Arrow’s temporal types, delivering significantly faster performance than pandas while maintaining an intuitive API through the .dt namespace.
  • The group_by_dynamic() method is Polars’ killer feature for time-series analysis, enabling window-based aggregations that would require awkward workarounds in other libraries.
  • Always work with timezone-aware datetimes from the start—retrofitting timezone handling into existing pipelines creates subtle bugs that are painful to debug.

Introduction to DateTime in Polars

Polars handles datetime operations differently than pandas, and that difference matters for performance. While pandas datetime operations often fall back to Python objects or require vectorized workarounds, Polars leverages Apache Arrow’s native temporal types throughout. This means datetime parsing, arithmetic, and grouping run in multithreaded native code, free of the Python-object overhead that slows pandas down.

The practical impact? Operations that take minutes on multi-million row datasets in pandas complete in seconds with Polars. But speed isn’t the only advantage—Polars’ datetime API is also more consistent and less prone to the timezone-related surprises that pandas users know too well.

Creating DateTime Columns

Most datetime work starts with parsing strings. Polars provides str.to_datetime() for this, with automatic format inference that actually works:

import polars as pl

df = pl.DataFrame({
    "timestamp_str": [
        "2024-01-15 14:30:00",
        "2024-02-20 09:15:30",
        "2024-03-25 18:45:00"
    ]
})

# Automatic format inference
df = df.with_columns(
    pl.col("timestamp_str").str.to_datetime().alias("timestamp")
)

When automatic inference fails or you need explicit control, specify the format string:

df = pl.DataFrame({
    "date_str": ["15/01/2024", "20/02/2024", "25/03/2024"]
})

df = df.with_columns(
    pl.col("date_str").str.to_datetime("%d/%m/%Y").alias("date")
)

For creating datetimes from component columns, use pl.datetime():

df = pl.DataFrame({
    "year": [2024, 2024, 2024],
    "month": [1, 2, 3],
    "day": [15, 20, 25],
    "hour": [14, 9, 18]
})

df = df.with_columns(
    pl.datetime(
        pl.col("year"),
        pl.col("month"),
        pl.col("day"),
        pl.col("hour")
    ).alias("timestamp")
)

You can also create datetime literals directly:

from datetime import datetime

df = pl.DataFrame({
    "event": ["start", "middle", "end"]
}).with_columns(
    pl.lit(datetime(2024, 1, 15, 12, 0, 0)).alias("reference_time")
)

Extracting DateTime Components

The .dt namespace provides access to all temporal components. This is where Polars shines with its consistent, chainable API:

df = pl.DataFrame({
    "timestamp": pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 12, 31),
        interval="1mo",
        eager=True
    )
})

df = df.with_columns(
    pl.col("timestamp").dt.year().alias("year"),
    pl.col("timestamp").dt.month().alias("month"),
    pl.col("timestamp").dt.day().alias("day"),
    pl.col("timestamp").dt.weekday().alias("weekday"),  # 1 = Monday, 7 = Sunday
    pl.col("timestamp").dt.week().alias("week_of_year"),
    pl.col("timestamp").dt.ordinal_day().alias("day_of_year")
)

For timestamps with time components:

df = pl.DataFrame({
    "timestamp": [
        datetime(2024, 1, 15, 14, 30, 45),
        datetime(2024, 1, 15, 9, 15, 30),
        datetime(2024, 1, 15, 18, 45, 0)
    ]
})

df = df.with_columns(
    pl.col("timestamp").dt.hour().alias("hour"),
    pl.col("timestamp").dt.minute().alias("minute"),
    pl.col("timestamp").dt.second().alias("second")
)

A useful pattern for feature engineering is combining several components into a single flag:

df = df.with_columns(
    (
        (pl.col("timestamp").dt.hour() >= 9)
        & (pl.col("timestamp").dt.hour() < 17)
        & (pl.col("timestamp").dt.weekday() <= 5)  # 1 = Monday ... 5 = Friday
    ).alias("is_business_hours")
)

DateTime Arithmetic and Durations

Polars uses the Duration type for time intervals. You can create durations explicitly or get them from datetime subtraction:

df = pl.DataFrame({
    "start": [datetime(2024, 1, 1), datetime(2024, 2, 1)],
    "end": [datetime(2024, 1, 15), datetime(2024, 3, 1)]
})

# Calculate duration between dates
df = df.with_columns(
    (pl.col("end") - pl.col("start")).alias("duration")
)

# Extract duration components
df = df.with_columns(
    pl.col("duration").dt.total_days().alias("days_between"),
    pl.col("duration").dt.total_hours().alias("hours_between")
)

Adding time intervals to datetimes uses pl.duration():

df = pl.DataFrame({
    "timestamp": [datetime(2024, 1, 15, 12, 0, 0)]
})

df = df.with_columns(
    (pl.col("timestamp") + pl.duration(days=7)).alias("plus_one_week"),
    (pl.col("timestamp") + pl.duration(hours=3)).alias("plus_three_hours"),
    (pl.col("timestamp") - pl.duration(weeks=2)).alias("minus_two_weeks")
)

For column-based duration arithmetic:

df = pl.DataFrame({
    "timestamp": [datetime(2024, 1, 15)],
    "days_to_add": [30],
    "hours_to_add": [5]
})

df = df.with_columns(
    (
        pl.col("timestamp")
        + pl.duration(days=pl.col("days_to_add"), hours=pl.col("hours_to_add"))
    ).alias("future_date")
)

Filtering and Grouping by Time

Filtering by date ranges is straightforward:

from datetime import datetime

df = pl.DataFrame({
    "timestamp": pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 12, 31),
        interval="1d",
        eager=True
    ),
    "value": range(366)
})

# Filter to Q1 2024
q1_data = df.filter(
    (pl.col("timestamp") >= datetime(2024, 1, 1)) &
    (pl.col("timestamp") < datetime(2024, 4, 1))
)

# Filter using .is_between()
q1_data = df.filter(
    pl.col("timestamp").is_between(
        datetime(2024, 1, 1),
        datetime(2024, 3, 31)
    )
)

The group_by_dynamic() method is Polars’ most powerful time-series feature. It enables rolling and tumbling window aggregations:

df = pl.DataFrame({
    "timestamp": pl.datetime_range(
        datetime(2024, 1, 1),
        datetime(2024, 1, 31, 23, 59, 59),
        interval="1h",
        eager=True
    ),
    "sales": [i % 100 + 50 for i in range(744)]
})

# Daily aggregation
daily = df.group_by_dynamic("timestamp", every="1d").agg(
    pl.col("sales").sum().alias("daily_sales"),
    pl.col("sales").mean().alias("avg_hourly_sales")
)

# Weekly aggregation starting on Monday
weekly = df.group_by_dynamic("timestamp", every="1w", start_by="monday").agg(
    pl.col("sales").sum().alias("weekly_sales")
)

For truncating timestamps to specific periods:

df = df.with_columns(
    pl.col("timestamp").dt.truncate("1d").alias("date"),
    pl.col("timestamp").dt.truncate("1h").alias("hour"),
    pl.col("timestamp").dt.truncate("1mo").alias("month_start")
)

Time Zones and Conversions

Polars distinguishes between timezone-naive and timezone-aware datetimes. Start with replace_time_zone() to make naive datetimes timezone-aware:

df = pl.DataFrame({
    "timestamp": [datetime(2024, 1, 15, 12, 0, 0)]
})

# Make timezone-aware (declares that the naive timestamp is already UTC)
df = df.with_columns(
    pl.col("timestamp").dt.replace_time_zone("UTC").alias("utc_time")
)

# Convert to another timezone
df = df.with_columns(
    pl.col("utc_time").dt.convert_time_zone("America/New_York").alias("ny_time"),
    pl.col("utc_time").dt.convert_time_zone("Europe/London").alias("london_time")
)

The distinction between replace_time_zone() and convert_time_zone() is critical:

# replace_time_zone: Labels the timestamp with a timezone (no conversion)
# Use when you KNOW the naive timestamp is in a specific timezone

# convert_time_zone: Converts the instant to another timezone
# Use when you want to see the same moment in a different timezone

df = pl.DataFrame({
    "local_time": [datetime(2024, 7, 15, 12, 0, 0)]  # Noon in New York
})

df = df.with_columns(
    # First, declare this is New York time
    pl.col("local_time").dt.replace_time_zone("America/New_York").alias("ny_aware"),
).with_columns(
    # Then convert to UTC
    pl.col("ny_aware").dt.convert_time_zone("UTC").alias("utc_time")
)

Practical Tips and Performance

Tip 1: Parse dates once, early in your pipeline.

Parsing strings to datetimes is expensive. Do it once at data load time:

# Good: Parse during read
df = pl.read_csv("data.csv").with_columns(
    pl.col("date_column").str.to_datetime()
)

# Bad: Parsing repeatedly in downstream operations
# (each filter/transform re-parses the string)

Tip 2: Use datetime_range for generating test data.

# Efficient way to create time series test data
timestamps = pl.datetime_range(
    datetime(2024, 1, 1),
    datetime(2024, 12, 31),
    interval="5m",
    eager=True
)

Tip 3: Prefer .dt.truncate() over component extraction for grouping.

# Efficient: Single operation
df.group_by(pl.col("timestamp").dt.truncate("1d")).agg(...)

# Less efficient: Multiple extractions
df.group_by(
    pl.col("timestamp").dt.year(),
    pl.col("timestamp").dt.month(),
    pl.col("timestamp").dt.day()
).agg(...)

Tip 4: Filter before grouping on large datasets.

# Good: Reduce data first
result = (
    df
    .filter(pl.col("timestamp") >= datetime(2024, 1, 1))
    .group_by_dynamic("timestamp", every="1d")
    .agg(pl.col("value").sum())
)

Tip 5: Use lazy evaluation for complex datetime pipelines.

# Let Polars optimize the execution plan
result = (
    pl.scan_parquet("large_file.parquet")
    .filter(pl.col("timestamp").dt.year() == 2024)
    .with_columns(pl.col("timestamp").dt.truncate("1h").alias("hour"))
    .group_by("hour")
    .agg(pl.col("value").mean())
    .collect()
)

Polars’ datetime handling rewards you for thinking in terms of expressions rather than row-by-row operations. Once you internalize the .dt namespace and group_by_dynamic(), time-series analysis becomes significantly more pleasant than the pandas equivalent.
