Pandas - Convert Column to Datetime

The `pd.to_datetime()` function converts string or numeric columns to datetime objects. For standard ISO 8601 formats, Pandas automatically detects the pattern:

Key Insights

  • Pandas provides pd.to_datetime() for flexible datetime conversion with automatic format detection, custom format strings, and error handling options
  • Understanding datetime parsing parameters like format, errors, and dayfirst prevents data loss and improves conversion performance by up to 10x
  • Mixed datetime formats, timezone handling, and epoch timestamps require specific conversion strategies to maintain data integrity

Basic DateTime Conversion

The pd.to_datetime() function converts string or numeric columns to datetime objects. For standard ISO 8601 formats, Pandas automatically detects the pattern:

import pandas as pd

df = pd.DataFrame({
    'date_string': ['2024-01-15', '2024-02-20', '2024-03-25']
})

df['date'] = pd.to_datetime(df['date_string'])
print(df.dtypes)
# date_string    object
# date           datetime64[ns]

For datetime strings with time components:

df = pd.DataFrame({
    'timestamp': ['2024-01-15 14:30:00', '2024-02-20 09:15:30']
})

df['datetime'] = pd.to_datetime(df['timestamp'])
print(df['datetime'])
# 0   2024-01-15 14:30:00
# 1   2024-02-20 09:15:30

Specifying Custom Date Formats

Explicitly defining the format string dramatically improves performance and ensures correct parsing. Use Python’s strftime directives:

df = pd.DataFrame({
    'date_str': ['15/01/2024', '20/02/2024', '25/03/2024']
})

# Specify day/month/year format
df['date'] = pd.to_datetime(df['date_str'], format='%d/%m/%Y')
print(df['date'])
# 0   2024-01-15
# 1   2024-02-20
# 2   2024-03-25

Common format codes:

formats = {
    'MM/DD/YYYY': '%m/%d/%Y',
    'DD-MM-YYYY': '%d-%m-%Y',
    'YYYY.MM.DD': '%Y.%m.%d',
    'Month DD, YYYY': '%B %d, %Y',
    'DD Mon YYYY': '%d %b %Y'
}

df = pd.DataFrame({
    'date1': ['01/15/2024'],
    'date2': ['15-01-2024'],
    'date3': ['2024.01.15'],
    'date4': ['January 15, 2024'],
    'date5': ['15 Jan 2024']
})

df['parsed1'] = pd.to_datetime(df['date1'], format='%m/%d/%Y')
df['parsed2'] = pd.to_datetime(df['date2'], format='%d-%m-%Y')
df['parsed3'] = pd.to_datetime(df['date3'], format='%Y.%m.%d')
df['parsed4'] = pd.to_datetime(df['date4'], format='%B %d, %Y')
df['parsed5'] = pd.to_datetime(df['date5'], format='%d %b %Y')

Handling Conversion Errors

The errors parameter controls behavior when parsing fails. Three options exist: raise (default), coerce, and ignore:

df = pd.DataFrame({
    'dates': ['2024-01-15', 'invalid', '2024-03-25', None]
})

# Default: raises ValueError
try:
    df['date_raise'] = pd.to_datetime(df['dates'])
except ValueError as e:
    print(f"Error: {e}")

# Coerce invalid values to NaT (Not a Time)
df['date_coerce'] = pd.to_datetime(df['dates'], errors='coerce')
print(df['date_coerce'])
# 0   2024-01-15
# 1          NaT
# 2   2024-03-25
# 3          NaT

# Ignore: returns original data if conversion fails
df['date_ignore'] = pd.to_datetime(df['dates'], errors='ignore')
print(df['date_ignore'])
# 0    2024-01-15
# 1       invalid
# 2    2024-03-25
# 3          None

Identify and handle problematic rows:

df['date_parsed'] = pd.to_datetime(df['dates'], errors='coerce')
invalid_dates = df[df['date_parsed'].isna() & df['dates'].notna()]
print(f"Found {len(invalid_dates)} invalid date entries")

Ambiguous Date Formats

When dates could be interpreted multiple ways (01/02/2024 could be Jan 2 or Feb 1), use dayfirst parameter:

df = pd.DataFrame({
    'ambiguous': ['01/02/2024', '05/06/2024', '12/11/2024']
})

# US format (month first)
df['us_format'] = pd.to_datetime(df['ambiguous'], dayfirst=False)
print(df['us_format'])
# 0   2024-01-02  (January 2)
# 1   2024-05-06  (May 6)
# 2   2024-12-11  (December 11)

# European format (day first)
df['eu_format'] = pd.to_datetime(df['ambiguous'], dayfirst=True)
print(df['eu_format'])
# 0   2024-02-01  (February 1)
# 1   2024-06-05  (June 5)
# 2   2024-11-12  (November 12)

Converting Unix Timestamps

Handle epoch timestamps (seconds or milliseconds since 1970-01-01):

df = pd.DataFrame({
    'timestamp_sec': [1705334400, 1707926400, 1710518400],
    'timestamp_ms': [1705334400000, 1707926400000, 1710518400000]
})

# Seconds since epoch
df['date_sec'] = pd.to_datetime(df['timestamp_sec'], unit='s')

# Milliseconds since epoch
df['date_ms'] = pd.to_datetime(df['timestamp_ms'], unit='ms')

print(df[['date_sec', 'date_ms']])
#            date_sec            date_ms
# 0 2024-01-15 12:00:00 2024-01-15 12:00:00
# 1 2024-02-14 12:00:00 2024-02-14 12:00:00
# 2 2024-03-15 12:00:00 2024-03-15 12:00:00

Available units: D (days), s (seconds), ms (milliseconds), us (microseconds), ns (nanoseconds).

Timezone Aware Conversions

Convert timezone-naive datetimes to timezone-aware:

df = pd.DataFrame({
    'datetime': ['2024-01-15 14:30:00', '2024-02-20 09:15:30']
})

# Parse and localize to UTC
df['utc'] = pd.to_datetime(df['datetime'], utc=True)
print(df['utc'])
# 0   2024-01-15 14:30:00+00:00
# 1   2024-02-20 09:15:30+00:00

# Convert to different timezone
df['eastern'] = df['utc'].dt.tz_convert('America/New_York')
print(df['eastern'])
# 0   2024-01-15 09:30:00-05:00
# 1   2024-02-20 04:15:30-05:00

Parse timezone strings directly:

df = pd.DataFrame({
    'datetime_tz': ['2024-01-15 14:30:00+00:00', '2024-02-20 09:15:30-05:00']
})

df['parsed'] = pd.to_datetime(df['datetime_tz'])
print(df['parsed'])
# 0   2024-01-15 14:30:00+00:00
# 1   2024-02-20 09:15:30-05:00

Handling Mixed Formats

When a column contains multiple date formats, use infer_datetime_format or apply custom logic:

df = pd.DataFrame({
    'mixed_dates': [
        '2024-01-15',
        '15/01/2024',
        'January 15, 2024',
        '2024.01.15'
    ]
})

# Attempt automatic inference (slower but flexible)
df['parsed'] = pd.to_datetime(df['mixed_dates'], infer_datetime_format=True)

# Or apply conditional parsing
def parse_mixed_dates(date_str):
    formats = ['%Y-%m-%d', '%d/%m/%Y', '%B %d, %Y', '%Y.%m.%d']
    for fmt in formats:
        try:
            return pd.to_datetime(date_str, format=fmt)
        except ValueError:
            continue
    return pd.NaT

df['custom_parsed'] = df['mixed_dates'].apply(parse_mixed_dates)

Extracting Date Components

After conversion, extract specific components:

df = pd.DataFrame({
    'date_string': ['2024-01-15', '2024-02-20', '2024-03-25']
})

df['date'] = pd.to_datetime(df['date_string'])

# Extract components
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['day_of_week'] = df['date'].dt.dayofweek  # Monday=0
df['day_name'] = df['date'].dt.day_name()
df['quarter'] = df['date'].dt.quarter

print(df[['date', 'year', 'month', 'day', 'day_name', 'quarter']])
#         date  year  month  day   day_name  quarter
# 0 2024-01-15  2024      1   15     Monday        1
# 1 2024-02-20  2024      2   20    Tuesday        1
# 2 2024-03-25  2024      3   25     Monday        1

Performance Optimization

For large datasets, specify format explicitly and cache results:

import numpy as np

# Generate test data
n = 1_000_000
dates = pd.date_range('2020-01-01', periods=n, freq='H').strftime('%Y-%m-%d %H:%M:%S')
df = pd.DataFrame({'date_str': dates})

# Without format specification (slower)
%timeit pd.to_datetime(df['date_str'])
# ~2.5 seconds

# With format specification (faster)
%timeit pd.to_datetime(df['date_str'], format='%Y-%m-%d %H:%M:%S')
# ~250 milliseconds (10x faster)

# For repeated operations, convert once
df['date'] = pd.to_datetime(df['date_str'], format='%Y-%m-%d %H:%M:%S')

Set datetime as index for time-series operations:

df = df.set_index('date')
df_resampled = df.resample('D').count()  # Daily aggregation

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.