Pandas - Convert Timestamp to Date

Key Insights

• Pandas provides multiple methods to convert timestamps to dates: dt.date, dt.normalize(), and dt.floor('D'), each serving different use cases, from extracting date objects to maintaining the datetime64 dtype
• The dt.date accessor returns Python date objects, which breaks vectorization, while dt.normalize() and dt.floor('D') preserve the datetime64 dtype for better performance on large datasets
• Timezone-aware timestamps require explicit handling before conversion to avoid ambiguity, using dt.tz_localize(None) or dt.tz_convert() as needed

Converting Timestamps Using dt.date

The most straightforward approach to extract dates from timestamps is the dt.date accessor. This method returns Python datetime.date objects, stripping away the time component entirely.

import pandas as pd
import numpy as np

# Create sample timestamp data
df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-15 08:30:00', periods=5, freq='6h')
})

print("Original timestamps:")
print(df['timestamp'])

# Convert to date using dt.date
df['date'] = df['timestamp'].dt.date

print("\nConverted to dates:")
print(df['date'])
print(f"Data type: {type(df['date'].iloc[0])}")

Output:

Original timestamps:
0   2024-01-15 08:30:00
1   2024-01-15 14:30:00
2   2024-01-15 20:30:00
3   2024-01-16 02:30:00
4   2024-01-16 08:30:00

Converted to dates:
0    2024-01-15
1    2024-01-15
2    2024-01-15
3    2024-01-16
4    2024-01-16
Data type: <class 'datetime.date'>

The limitation here is performance. Python date objects are not vectorized, making operations on large datasets significantly slower than native pandas datetime64 operations.

Preserving datetime64 with dt.normalize()

When you need to maintain the datetime64 dtype while setting time to midnight, use dt.normalize(). This approach keeps your data in pandas’ optimized format.

# Create DataFrame with various timestamps
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-15 08:30:45',
        '2024-01-15 14:22:10',
        '2024-01-16 23:59:59',
        '2024-01-17 00:00:01'
    ])
})

# Normalize to midnight
df['normalized_date'] = df['timestamp'].dt.normalize()

print("Original vs Normalized:")
print(df)
print(f"\nNormalized dtype: {df['normalized_date'].dtype}")

Output:

              timestamp normalized_date
0 2024-01-15 08:30:45      2024-01-15
1 2024-01-15 14:22:10      2024-01-15
2 2024-01-16 23:59:59      2024-01-16
3 2024-01-17 00:00:01      2024-01-17

Normalized dtype: datetime64[ns]

This method is ideal for grouping operations, time-based joins, and any scenario where you need date-level granularity while maintaining compatibility with pandas datetime operations.
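As a sketch of the time-based join use case (the daily_rates lookup table here is hypothetical): because dt.normalize() keeps the datetime64 dtype, the normalized column can be merged directly against another datetime64 date column without any dtype coercion.

```python
import pandas as pd

# Hypothetical example: join event timestamps against a per-day lookup table
events = pd.DataFrame({
    'timestamp': pd.to_datetime(['2024-01-15 08:30', '2024-01-16 14:00']),
})
daily_rates = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-15', '2024-01-16']),
    'rate': [1.10, 1.12],
})

# Normalizing keeps datetime64, so the merge key dtypes match exactly
events['date'] = events['timestamp'].dt.normalize()
merged = events.merge(daily_rates, on='date', how='left')
print(merged)
```

Had we used dt.date instead, the merge key would hold Python objects and would not match the datetime64 `date` column in daily_rates.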

Using dt.floor() for Date Conversion

The dt.floor() method provides flexible rounding to any time frequency. For date conversion, use floor('D') to round down to the start of each day.

# Create timestamps with various times
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-03-15 08:30:00',
        '2024-03-15 16:45:30',
        '2024-03-16 00:15:00',
        '2024-03-16 23:59:59'
    ])
})

# Floor to day level
df['floored_date'] = df['timestamp'].dt.floor('D')

# Compare with normalize
df['normalized_date'] = df['timestamp'].dt.normalize()

print(df)
print(f"\nAre floor and normalize equal? {df['floored_date'].equals(df['normalized_date'])}")

Output:

            timestamp floored_date normalized_date
0 2024-03-15 08:30:00   2024-03-15      2024-03-15
1 2024-03-15 16:45:30   2024-03-15      2024-03-15
2 2024-03-16 00:15:00   2024-03-16      2024-03-16
3 2024-03-16 23:59:59   2024-03-16      2024-03-16

Are floor and normalize equal? True

While floor('D') and normalize() produce identical results for date conversion, floor() offers additional flexibility for rounding to other frequencies such as hours ('h'), minutes ('min', formerly 'T'), or arbitrary multiples like '15min'.
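A quick sketch of that flexibility, flooring the same timestamps to hourly and 15-minute bins:

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(['2024-03-15 08:37:12', '2024-03-15 16:45:30']))

# Floor to the start of the hour, and to 15-minute bins
hourly = ts.dt.floor('h')
quarter_hour = ts.dt.floor('15min')

print(hourly)        # 08:00:00, 16:00:00
print(quarter_hour)  # 08:30:00, 16:45:00
```

normalize() has no equivalent for these sub-day frequencies; it only snaps to midnight.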

Handling Timezone-Aware Timestamps

Timezone-aware timestamps require special attention. You must decide whether to convert to a specific timezone or remove timezone information entirely.

# Create timezone-aware timestamps
df = pd.DataFrame({
    'timestamp_utc': pd.date_range('2024-01-15 08:00:00', periods=4, freq='6h', tz='UTC')
})

print("Original UTC timestamps:")
print(df['timestamp_utc'])

# Convert to Eastern Time before extracting date
df['timestamp_eastern'] = df['timestamp_utc'].dt.tz_convert('America/New_York')
df['date_eastern'] = df['timestamp_eastern'].dt.normalize()

# Remove timezone info then extract date
df['date_utc'] = df['timestamp_utc'].dt.tz_localize(None).dt.normalize()

print("\nConverted dates:")
print(df[['timestamp_utc', 'date_eastern', 'date_utc']])

Output:

Original UTC timestamps:
0   2024-01-15 08:00:00+00:00
1   2024-01-15 14:00:00+00:00
2   2024-01-15 20:00:00+00:00
3   2024-01-16 02:00:00+00:00

Converted dates:
              timestamp_utc              date_eastern   date_utc
0 2024-01-15 08:00:00+00:00 2024-01-15 00:00:00-05:00 2024-01-15
1 2024-01-15 14:00:00+00:00 2024-01-15 00:00:00-05:00 2024-01-15
2 2024-01-15 20:00:00+00:00 2024-01-15 00:00:00-05:00 2024-01-15
3 2024-01-16 02:00:00+00:00 2024-01-15 00:00:00-05:00 2024-01-16

Notice that the last timestamp (2024-01-16 02:00 UTC) becomes 2024-01-15 in Eastern Time because it’s still January 15th at 9 PM EST.

Converting String Timestamps to Dates

When working with timestamp strings, combine pd.to_datetime() with date conversion in a single operation or chain them for clarity.

# Sample data with string timestamps
data = {
    'event': ['Login', 'Purchase', 'Logout', 'Login'],
    'timestamp_str': [
        '2024-01-15 08:30:00',
        '2024-01-15 09:45:30',
        '2024-01-15 10:22:15',
        '2024-01-16 08:15:00'
    ]
}
df = pd.DataFrame(data)

# Method 1: Convert then extract date
df['date_method1'] = pd.to_datetime(df['timestamp_str']).dt.normalize()

# Method 2: Convert with specific format then extract
df['date_method2'] = pd.to_datetime(
    df['timestamp_str'], 
    format='%Y-%m-%d %H:%M:%S'
).dt.date

print(df[['event', 'timestamp_str', 'date_method1', 'date_method2']])
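When the strings are not guaranteed to be well formed, pd.to_datetime() raises by default. A hedged sketch using errors='coerce' (the 'not-a-date' value is a made-up example of a malformed entry) turns unparseable strings into NaT instead:

```python
import pandas as pd

# Hypothetical messy input: one string does not match any timestamp format
raw = pd.Series(['2024-01-15 08:30:00', 'not-a-date', '2024-01-16 09:00:00'])

# errors='coerce' converts unparseable strings to NaT instead of raising
dates = pd.to_datetime(raw, errors='coerce').dt.normalize()
print(dates)
```

The resulting NaT values can then be inspected with isna() before any downstream aggregation.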

Performance Considerations for Large Datasets

When working with millions of rows, the choice of conversion method significantly impacts performance.

import time

# Create large dataset
n_rows = 1_000_000
large_df = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=n_rows, freq='1min')
})

# Benchmark dt.date (returns Python objects)
start = time.time()
dates_python = large_df['timestamp'].dt.date
time_python = time.time() - start

# Benchmark dt.normalize (returns datetime64)
start = time.time()
dates_normalized = large_df['timestamp'].dt.normalize()
time_normalized = time.time() - start

print(f"dt.date execution time: {time_python:.4f} seconds")
print(f"dt.normalize execution time: {time_normalized:.4f} seconds")
print(f"Speedup: {time_python/time_normalized:.2f}x")

On typical hardware, dt.normalize() runs 10-50x faster than dt.date on large datasets because it maintains vectorized operations.

Grouping and Aggregating by Date

Converting timestamps to dates enables powerful grouping operations for daily aggregations.

# Sales data with timestamps
sales_data = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-15 08:00', periods=100, freq='2h'),
    'amount': np.random.randint(100, 1000, 100)
})

# Convert to date for grouping
sales_data['date'] = sales_data['timestamp'].dt.normalize()

# Daily aggregations
daily_summary = sales_data.groupby('date').agg({
    'amount': ['sum', 'mean', 'count']
}).round(2)

print("Daily Sales Summary:")
print(daily_summary.head())

This pattern is essential for time-series analysis, reporting, and dashboard creation where daily granularity is required.
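When the only goal is a daily aggregate, the helper date column can be skipped entirely by resampling on the timestamp directly. A small sketch (with deterministic amounts so the sums are easy to verify):

```python
import pandas as pd
import numpy as np

sales = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-15 08:00', periods=20, freq='2h'),
    'amount': np.arange(20),
})

# Equivalent daily aggregation without an intermediate date column:
# resample('D') bins the datetime index by calendar day
daily = sales.set_index('timestamp').resample('D')['amount'].sum()
print(daily)
```

pd.Grouper(key='timestamp', freq='D') inside groupby() achieves the same thing when you prefer not to set an index.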

Handling Missing Values

Missing timestamp values require explicit handling during conversion to avoid errors.

# Data with missing timestamps
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-01-15 08:30:00',
        None,
        '2024-01-16 14:22:00',
        np.nan,
        '2024-01-17 09:15:00'
    ])
})

# dt.normalize preserves NaT (Not a Time)
df['date_normalized'] = df['timestamp'].dt.normalize()

# dt.date also preserves missing values (as NaT)
df['date_python'] = df['timestamp'].dt.date

print(df)
print(f"\nNull count in normalized: {df['date_normalized'].isna().sum()}")
print(f"Null count in python dates: {df['date_python'].isna().sum()}")

Both methods propagate missing values as NaT in the resulting series, allowing for consistent null handling in downstream operations.
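Two common downstream choices, sketched briefly: drop the NaT rows, or fill them with a sentinel date (the 1970-01-01 sentinel here is just an illustrative choice).

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2024-01-15', None, '2024-01-17'])).dt.normalize()

# Option 1: drop rows with missing dates
dropped = dates.dropna()

# Option 2: fill NaT with a sentinel date (illustrative value)
filled = dates.fillna(pd.Timestamp('1970-01-01'))

print(dropped)
print(filled)
```

Which option fits depends on whether missing dates should be excluded from, or explicitly flagged in, the downstream aggregation.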
