Pandas - Get Year/Month/Day from Datetime Column

• Pandas provides multiple methods to extract date components from datetime columns, including `.dt` accessor attributes, `strftime()` formatting, and direct attribute access—each with different...

Key Insights

• Pandas provides multiple methods to extract date components from datetime columns, including .dt accessor attributes, strftime() formatting, and direct attribute access—each with different performance characteristics • The .dt accessor is the most Pandas-idiomatic approach and works seamlessly with NaT (Not-a-Time) values, while also supporting vectorized operations across entire columns • Extracting date components creates new integer columns by default, but you can control data types and handle edge cases like timezone-aware datetimes and period objects

Basic Date Component Extraction with .dt Accessor

The .dt accessor is the primary method for extracting date components from datetime columns in Pandas. It provides direct access to datetime properties like year, month, and day.

import pandas as pd
import numpy as np

# Create sample dataframe with datetime column
df = pd.DataFrame({
    'date': pd.date_range('2023-01-15', periods=5, freq='M'),
    'value': [100, 200, 300, 400, 500]
})

# Extract year, month, and day
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

print(df)

Output:

        date  value  year  month  day
0 2023-01-31    100  2023      1   31
1 2023-02-28    200  2023      2   28
2 2023-03-31    300  2023      3   31
3 2023-04-30    400  2023      4   30
4 2023-05-31    500  2023      5   31

The .dt accessor returns integer values for year, month, and day. This approach handles NaT values gracefully, converting them to NaN in the resulting integer column (with automatic dtype conversion to float when NaT is present).

Handling Missing Values and Data Type Preservation

When your datetime column contains missing values, Pandas automatically handles the conversion while maintaining data integrity.

# Create dataframe with NaT values
df_with_nat = pd.DataFrame({
    'date': [
        pd.Timestamp('2023-01-15'),
        pd.NaT,
        pd.Timestamp('2023-03-20'),
        pd.NaT,
        pd.Timestamp('2023-05-10')
    ]
})

df_with_nat['year'] = df_with_nat['date'].dt.year
df_with_nat['month'] = df_with_nat['date'].dt.month
df_with_nat['day'] = df_with_nat['date'].dt.day

print(df_with_nat)
print(f"\nData types:\n{df_with_nat.dtypes}")

Output:

        date    year  month   day
0 2023-01-15  2023.0    1.0  15.0
1        NaT     NaN    NaN   NaN
2 2023-03-20  2023.0    3.0  20.0
3        NaT     NaN    NaN   NaN
4 2023-05-10  2023.0    5.0  10.0

Data types:
date             datetime64[ns]
year                    float64
month                   float64
day                     float64

Notice that the extracted columns are float64 instead of int64 due to NaN values. To maintain integer types, use nullable integer dtype:

df_with_nat['year_int'] = df_with_nat['date'].dt.year.astype('Int64')
df_with_nat['month_int'] = df_with_nat['date'].dt.month.astype('Int64')
df_with_nat['day_int'] = df_with_nat['date'].dt.day.astype('Int64')

print(df_with_nat[['year_int', 'month_int', 'day_int']])

Extracting Additional Date Components

Beyond basic year, month, and day, the .dt accessor provides numerous other temporal components useful for analysis.

df = pd.DataFrame({
    'datetime': pd.date_range('2023-01-01', periods=7, freq='D')
})

# Extract various components
df['year'] = df['datetime'].dt.year
df['month'] = df['datetime'].dt.month
df['day'] = df['datetime'].dt.day
df['day_of_week'] = df['datetime'].dt.dayofweek  # Monday=0, Sunday=6
df['day_name'] = df['datetime'].dt.day_name()
df['week_of_year'] = df['datetime'].dt.isocalendar().week
df['quarter'] = df['datetime'].dt.quarter
df['is_month_end'] = df['datetime'].dt.is_month_end

print(df)

Output:

    datetime  year  month  day  day_of_week day_name  week_of_year  quarter  is_month_end
0 2023-01-01  2023      1    1            6   Sunday            52        1         False
1 2023-01-02  2023      1    2            0   Monday             1        1         False
2 2023-01-03  2023      1    3            1  Tuesday             1        1         False
3 2023-01-04  2023      1    4            2 Wednesday            1        1         False
4 2023-01-05  2023      1    5            3 Thursday             1        1         False
5 2023-01-06  2023      1    6            4   Friday             1        1         False
6 2023-01-07  2023      1    7            5 Saturday             1        1         False

Using strftime for Custom Formatting

When you need formatted string output rather than integer values, use strftime() with format codes.

df = pd.DataFrame({
    'date': pd.date_range('2023-01-15', periods=3, freq='M')
})

# String formatting
df['year_str'] = df['date'].dt.strftime('%Y')
df['month_str'] = df['date'].dt.strftime('%m')
df['day_str'] = df['date'].dt.strftime('%d')
df['full_date'] = df['date'].dt.strftime('%Y-%m-%d')
df['month_name'] = df['date'].dt.strftime('%B')
df['custom'] = df['date'].dt.strftime('%Y/%m/%d (%A)')

print(df)

Output:

        date year_str month_str day_str   full_date month_name              custom
0 2023-01-31     2023        01      31  2023-01-31    January  2023/01/31 (Tuesday)
1 2023-02-28     2023        02      28  2023-02-28   February  2023/02/28 (Tuesday)
2 2023-03-31     2023        03      31  2023-03-31      March  2023/03/31 (Friday)

Working with Timezone-Aware Datetimes

Timezone-aware datetimes require special consideration when extracting date components, as the same UTC timestamp represents different dates in different timezones.

# Create timezone-aware datetime
df = pd.DataFrame({
    'utc_time': pd.date_range('2023-12-31 22:00:00', periods=3, freq='2H', tz='UTC')
})

# Convert to different timezones
df['tokyo_time'] = df['utc_time'].dt.tz_convert('Asia/Tokyo')
df['ny_time'] = df['utc_time'].dt.tz_convert('America/New_York')

# Extract dates from different timezones
df['utc_date'] = df['utc_time'].dt.date
df['tokyo_date'] = df['tokyo_time'].dt.date
df['ny_date'] = df['ny_time'].dt.date

print(df)

Output:

                  utc_time                tokyo_time                   ny_time    utc_date  tokyo_date      ny_date
0 2023-12-31 22:00:00+00:00 2024-01-01 07:00:00+09:00 2023-12-31 17:00:00-05:00  2023-12-31  2024-01-01  2023-12-31
1 2024-01-01 00:00:00+00:00 2024-01-01 09:00:00+09:00 2023-12-31 19:00:00-05:00  2024-01-01  2024-01-01  2023-12-31
2 2024-01-01 02:00:00+00:00 2024-01-01 11:00:00+09:00 2023-12-31 21:00:00-05:00  2024-01-01  2024-01-01  2023-12-31

Performance Considerations and Bulk Operations

When extracting multiple date components, consider performance implications. Extracting components individually is readable but makes multiple passes over the data.

import pandas as pd
import time

# Create large dataframe
df_large = pd.DataFrame({
    'date': pd.date_range('2020-01-01', periods=1000000, freq='H')
})

# Method 1: Individual extraction
start = time.time()
df_large['year'] = df_large['date'].dt.year
df_large['month'] = df_large['date'].dt.month
df_large['day'] = df_large['date'].dt.day
time1 = time.time() - start

# Method 2: Using assign for chaining
start = time.time()
df_large = df_large.assign(
    year=lambda x: x['date'].dt.year,
    month=lambda x: x['date'].dt.month,
    day=lambda x: x['date'].dt.day
)
time2 = time.time() - start

print(f"Individual extraction: {time1:.4f}s")
print(f"Assign method: {time2:.4f}s")

For truly performance-critical code, consider vectorized operations or extracting to numpy arrays:

# Direct numpy array manipulation
dates = df_large['date'].values.astype('datetime64[D]')
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = (dates.astype('datetime64[M]').astype(int) % 12) + 1

Creating Date Columns from Components

The reverse operation—creating datetime columns from year, month, and day components—is equally common.

df = pd.DataFrame({
    'year': [2023, 2023, 2023],
    'month': [1, 6, 12],
    'day': [15, 20, 25],
    'value': [100, 200, 300]
})

# Combine into datetime column
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

print(df)

Output:

   year  month  day  value       date
0  2023      1   15    100 2023-01-15
1  2023      6   20    200 2023-06-20
2  2023     12   25    300 2023-12-25

This approach handles invalid dates by raising errors, which you can control with the errors parameter in pd.to_datetime().

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.