Pandas - Format Datetime Column (strftime)

• The `strftime()` method converts datetime objects to formatted strings using format codes like `%Y-%m-%d`, while `dt.strftime()` applies this to entire DataFrame columns efficiently

Key Insights

• The strftime() method converts datetime objects to formatted strings using format codes like %Y-%m-%d, while dt.strftime() applies this to entire DataFrame columns efficiently • Pandas provides multiple approaches for datetime formatting: dt.strftime() for Series operations, apply() with lambda functions, and to_datetime() with format parameters for parsing • Performance matters at scale—vectorized dt.strftime() operations are significantly faster than row-by-row apply() methods, especially with datasets containing millions of rows

Understanding Datetime Formatting in Pandas

Pandas datetime formatting relies on Python’s strftime() (string format time) method, which converts datetime objects into human-readable strings. When working with DataFrame columns, you’ll use the .dt accessor to apply datetime operations vectorially across the entire column.

import pandas as pd
import numpy as np

# Create sample DataFrame with datetime column
df = pd.DataFrame({
    'transaction_id': range(1, 6),
    'timestamp': pd.date_range('2024-01-15 09:30:00', periods=5, freq='H')
})

print(df)

Output:

   transaction_id           timestamp
0               1 2024-01-15 09:30:00
1               2 2024-01-15 10:30:00
2               3 2024-01-15 11:30:00
3               4 2024-01-15 12:30:00
4               5 2024-01-15 13:30:00

Basic strftime() Formatting

The dt.strftime() method accepts format codes that define how the datetime should be displayed. Common format codes include %Y (4-digit year), %m (2-digit month), %d (day), %H (24-hour), %M (minute), and %S (second).

# Format as YYYY-MM-DD
df['date_formatted'] = df['timestamp'].dt.strftime('%Y-%m-%d')

# Format as MM/DD/YYYY HH:MM AM/PM
df['us_format'] = df['timestamp'].dt.strftime('%m/%d/%Y %I:%M %p')

# Format as readable string
df['readable'] = df['timestamp'].dt.strftime('%B %d, %Y at %I:%M %p')

print(df[['timestamp', 'date_formatted', 'us_format', 'readable']])

Output:

            timestamp date_formatted       us_format                    readable
0 2024-01-15 09:30:00     2024-01-15  01/15/2024 09:30 AM  January 15, 2024 at 09:30 AM
1 2024-01-15 10:30:00     2024-01-15  01/15/2024 10:30 AM  January 15, 2024 at 10:30 AM
2 2024-01-15 11:30:00     2024-01-15  01/15/2024 11:30 AM  January 15, 2024 at 11:30 AM

Essential Format Codes Reference

# Create comprehensive formatting examples
sample_dt = pd.Timestamp('2024-03-07 14:25:33')
df_formats = pd.DataFrame({
    'format_code': ['%Y-%m-%d', '%d/%m/%Y', '%Y%m%d', '%B %d, %Y', 
                    '%H:%M:%S', '%I:%M %p', '%A, %B %d', '%Y-W%W-%w'],
    'description': ['ISO date', 'European date', 'Compact date', 'Long date',
                   '24-hour time', '12-hour time', 'Weekday and date', 'Week number']
})

df_formats['example'] = df_formats['format_code'].apply(
    lambda x: sample_dt.strftime(x)
)

print(df_formats.to_string(index=False))

Output:

 format_code        description              example
  %Y-%m-%d           ISO date           2024-03-07
  %d/%m/%Y      European date           07/03/2024
    %Y%m%d       Compact date             20240307
%B %d, %Y          Long date       March 07, 2024
  %H:%M:%S       24-hour time             14:25:33
  %I:%M %p       12-hour time             02:25 PM
%A, %B %d    Weekday and date    Thursday, March 07
 %Y-W%W-%w        Week number          2024-W09-4

Handling NaT (Not a Time) Values

When your datetime column contains missing values (NaT), strftime() converts them to NaN strings. Handle this explicitly to avoid confusion.

# Create DataFrame with NaT values
df_missing = pd.DataFrame({
    'event_date': [
        pd.Timestamp('2024-01-15'),
        pd.NaT,
        pd.Timestamp('2024-01-17'),
        pd.NaT,
        pd.Timestamp('2024-01-19')
    ]
})

# Default behavior - NaT becomes NaN
df_missing['formatted'] = df_missing['event_date'].dt.strftime('%Y-%m-%d')

# Replace NaN with custom string
df_missing['formatted_clean'] = df_missing['event_date'].dt.strftime('%Y-%m-%d').fillna('No Date')

print(df_missing)

Output:

  event_date   formatted formatted_clean
0 2024-01-15  2024-01-15      2024-01-15
1        NaT         NaN         No Date
2 2024-01-17  2024-01-17      2024-01-17
3        NaT         NaN         No Date
4 2024-01-19  2024-01-19      2024-01-19

Extracting Specific Components

Beyond full formatting, extract individual datetime components for analysis or conditional formatting.

df_components = pd.DataFrame({
    'timestamp': pd.date_range('2024-01-01', periods=5, freq='M')
})

# Extract components
df_components['year'] = df_components['timestamp'].dt.strftime('%Y')
df_components['month_name'] = df_components['timestamp'].dt.strftime('%B')
df_components['month_abbr'] = df_components['timestamp'].dt.strftime('%b')
df_components['day_of_week'] = df_components['timestamp'].dt.strftime('%A')
df_components['quarter'] = df_components['timestamp'].dt.strftime('Q%q')

print(df_components)

Combining Multiple Format Operations

Real-world scenarios often require creating multiple formatted columns for different use cases—database storage, display, file naming, or reporting.

# Sales data example
sales_df = pd.DataFrame({
    'sale_datetime': pd.date_range('2024-01-15 08:00:00', periods=4, freq='6H'),
    'amount': [1250.50, 890.25, 2100.00, 456.75]
})

# Multiple format outputs
sales_df['db_timestamp'] = sales_df['sale_datetime'].dt.strftime('%Y-%m-%d %H:%M:%S')
sales_df['display_date'] = sales_df['sale_datetime'].dt.strftime('%b %d, %Y')
sales_df['display_time'] = sales_df['sale_datetime'].dt.strftime('%I:%M %p')
sales_df['file_prefix'] = sales_df['sale_datetime'].dt.strftime('%Y%m%d_%H%M')
sales_df['report_label'] = sales_df['sale_datetime'].dt.strftime('%A %p')

print(sales_df)

Performance Considerations

When working with large datasets, the choice between dt.strftime() and apply() significantly impacts performance.

import time

# Create large dataset
large_df = pd.DataFrame({
    'datetime_col': pd.date_range('2020-01-01', periods=1000000, freq='T')
})

# Method 1: Vectorized dt.strftime()
start = time.time()
result1 = large_df['datetime_col'].dt.strftime('%Y-%m-%d %H:%M')
time1 = time.time() - start

# Method 2: apply() with lambda
start = time.time()
result2 = large_df['datetime_col'].apply(lambda x: x.strftime('%Y-%m-%d %H:%M'))
time2 = time.time() - start

print(f"dt.strftime() time: {time1:.4f} seconds")
print(f"apply() time: {time2:.4f} seconds")
print(f"Speedup: {time2/time1:.2f}x faster")

The vectorized dt.strftime() approach typically runs 10-50x faster than apply() methods, depending on dataset size and complexity.

Timezone-Aware Formatting

When working with timezone-aware datetimes, format the timezone information explicitly or convert to local time before formatting.

# Create timezone-aware datetimes
df_tz = pd.DataFrame({
    'utc_time': pd.date_range('2024-01-15 12:00:00', periods=3, freq='H', tz='UTC')
})

# Convert to different timezone
df_tz['eastern_time'] = df_tz['utc_time'].dt.tz_convert('America/New_York')

# Format with timezone information
df_tz['utc_formatted'] = df_tz['utc_time'].dt.strftime('%Y-%m-%d %H:%M %Z')
df_tz['eastern_formatted'] = df_tz['eastern_time'].dt.strftime('%Y-%m-%d %I:%M %p %Z')

print(df_tz)

Conditional Formatting

Apply different formats based on conditions using np.where() or pd.Series.mask().

df_conditional = pd.DataFrame({
    'event_time': pd.date_range('2024-01-15', periods=6, freq='D')
})

# Different format for weekends vs weekdays
df_conditional['formatted'] = np.where(
    df_conditional['event_time'].dt.dayofweek < 5,  # Monday=0, Sunday=6
    df_conditional['event_time'].dt.strftime('%Y-%m-%d (Weekday)'),
    df_conditional['event_time'].dt.strftime('%Y-%m-%d (WEEKEND)')
)

print(df_conditional)

This approach enables dynamic formatting based on business logic, making your data presentation more contextual and informative.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.