Pandas - Get Year/Month/Day from Datetime Column
• Pandas provides multiple methods to extract date components from datetime columns, including `.dt` accessor attributes, `strftime()` formatting, and direct attribute access—each with different...
Key Insights
• Pandas provides multiple methods to extract date components from datetime columns, including .dt accessor attributes, strftime() formatting, and direct attribute access—each with different performance characteristics
• The .dt accessor is the most Pandas-idiomatic approach and works seamlessly with NaT (Not-a-Time) values, while also supporting vectorized operations across entire columns
• Extracting date components creates new integer columns by default, but you can control data types and handle edge cases like timezone-aware datetimes and period objects
Basic Date Component Extraction with .dt Accessor
The .dt accessor is the primary method for extracting date components from datetime columns in Pandas. It provides direct access to datetime properties like year, month, and day.
import pandas as pd
import numpy as np
# Create sample dataframe with datetime column
df = pd.DataFrame({
'date': pd.date_range('2023-01-15', periods=5, freq='M'),
'value': [100, 200, 300, 400, 500]
})
# Extract year, month, and day
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
print(df)
Output:
date value year month day
0 2023-01-31 100 2023 1 31
1 2023-02-28 200 2023 2 28
2 2023-03-31 300 2023 3 31
3 2023-04-30 400 2023 4 30
4 2023-05-31 500 2023 5 31
The .dt accessor returns integer values for year, month, and day. This approach handles NaT values gracefully, converting them to NaN in the resulting integer column (with automatic dtype conversion to float when NaT is present).
Handling Missing Values and Data Type Preservation
When your datetime column contains missing values, Pandas automatically handles the conversion while maintaining data integrity.
# Create dataframe with NaT values
df_with_nat = pd.DataFrame({
'date': [
pd.Timestamp('2023-01-15'),
pd.NaT,
pd.Timestamp('2023-03-20'),
pd.NaT,
pd.Timestamp('2023-05-10')
]
})
df_with_nat['year'] = df_with_nat['date'].dt.year
df_with_nat['month'] = df_with_nat['date'].dt.month
df_with_nat['day'] = df_with_nat['date'].dt.day
print(df_with_nat)
print(f"\nData types:\n{df_with_nat.dtypes}")
Output:
date year month day
0 2023-01-15 2023.0 1.0 15.0
1 NaT NaN NaN NaN
2 2023-03-20 2023.0 3.0 20.0
3 NaT NaN NaN NaN
4 2023-05-10 2023.0 5.0 10.0
Data types:
date datetime64[ns]
year float64
month float64
day float64
Notice that the extracted columns are float64 instead of int64 due to NaN values. To maintain integer types, use nullable integer dtype:
df_with_nat['year_int'] = df_with_nat['date'].dt.year.astype('Int64')
df_with_nat['month_int'] = df_with_nat['date'].dt.month.astype('Int64')
df_with_nat['day_int'] = df_with_nat['date'].dt.day.astype('Int64')
print(df_with_nat[['year_int', 'month_int', 'day_int']])
Extracting Additional Date Components
Beyond basic year, month, and day, the .dt accessor provides numerous other temporal components useful for analysis.
df = pd.DataFrame({
'datetime': pd.date_range('2023-01-01', periods=7, freq='D')
})
# Extract various components
df['year'] = df['datetime'].dt.year
df['month'] = df['datetime'].dt.month
df['day'] = df['datetime'].dt.day
df['day_of_week'] = df['datetime'].dt.dayofweek # Monday=0, Sunday=6
df['day_name'] = df['datetime'].dt.day_name()
df['week_of_year'] = df['datetime'].dt.isocalendar().week
df['quarter'] = df['datetime'].dt.quarter
df['is_month_end'] = df['datetime'].dt.is_month_end
print(df)
Output:
datetime year month day day_of_week day_name week_of_year quarter is_month_end
0 2023-01-01 2023 1 1 6 Sunday 52 1 False
1 2023-01-02 2023 1 2 0 Monday 1 1 False
2 2023-01-03 2023 1 3 1 Tuesday 1 1 False
3 2023-01-04 2023 1 4 2 Wednesday 1 1 False
4 2023-01-05 2023 1 5 3 Thursday 1 1 False
5 2023-01-06 2023 1 6 4 Friday 1 1 False
6 2023-01-07 2023 1 7 5 Saturday 1 1 False
Using strftime for Custom Formatting
When you need formatted string output rather than integer values, use strftime() with format codes.
df = pd.DataFrame({
'date': pd.date_range('2023-01-15', periods=3, freq='M')
})
# String formatting
df['year_str'] = df['date'].dt.strftime('%Y')
df['month_str'] = df['date'].dt.strftime('%m')
df['day_str'] = df['date'].dt.strftime('%d')
df['full_date'] = df['date'].dt.strftime('%Y-%m-%d')
df['month_name'] = df['date'].dt.strftime('%B')
df['custom'] = df['date'].dt.strftime('%Y/%m/%d (%A)')
print(df)
Output:
date year_str month_str day_str full_date month_name custom
0 2023-01-31 2023 01 31 2023-01-31 January 2023/01/31 (Tuesday)
1 2023-02-28 2023 02 28 2023-02-28 February 2023/02/28 (Tuesday)
2 2023-03-31 2023 03 31 2023-03-31 March 2023/03/31 (Friday)
Working with Timezone-Aware Datetimes
Timezone-aware datetimes require special consideration when extracting date components, as the same UTC timestamp represents different dates in different timezones.
# Create timezone-aware datetime
df = pd.DataFrame({
'utc_time': pd.date_range('2023-12-31 22:00:00', periods=3, freq='2H', tz='UTC')
})
# Convert to different timezones
df['tokyo_time'] = df['utc_time'].dt.tz_convert('Asia/Tokyo')
df['ny_time'] = df['utc_time'].dt.tz_convert('America/New_York')
# Extract dates from different timezones
df['utc_date'] = df['utc_time'].dt.date
df['tokyo_date'] = df['tokyo_time'].dt.date
df['ny_date'] = df['ny_time'].dt.date
print(df)
Output:
utc_time tokyo_time ny_time utc_date tokyo_date ny_date
0 2023-12-31 22:00:00+00:00 2024-01-01 07:00:00+09:00 2023-12-31 17:00:00-05:00 2023-12-31 2024-01-01 2023-12-31
1 2024-01-01 00:00:00+00:00 2024-01-01 09:00:00+09:00 2023-12-31 19:00:00-05:00 2024-01-01 2024-01-01 2023-12-31
2 2024-01-01 02:00:00+00:00 2024-01-01 11:00:00+09:00 2023-12-31 21:00:00-05:00 2024-01-01 2024-01-01 2023-12-31
Performance Considerations and Bulk Operations
When extracting multiple date components, consider performance implications. Extracting components individually is readable but makes multiple passes over the data.
import pandas as pd
import time
# Create large dataframe
df_large = pd.DataFrame({
'date': pd.date_range('2020-01-01', periods=1000000, freq='H')
})
# Method 1: Individual extraction
start = time.time()
df_large['year'] = df_large['date'].dt.year
df_large['month'] = df_large['date'].dt.month
df_large['day'] = df_large['date'].dt.day
time1 = time.time() - start
# Method 2: Using assign for chaining
start = time.time()
df_large = df_large.assign(
year=lambda x: x['date'].dt.year,
month=lambda x: x['date'].dt.month,
day=lambda x: x['date'].dt.day
)
time2 = time.time() - start
print(f"Individual extraction: {time1:.4f}s")
print(f"Assign method: {time2:.4f}s")
For truly performance-critical code, consider vectorized operations or extracting to numpy arrays:
# Direct numpy array manipulation
dates = df_large['date'].values.astype('datetime64[D]')
years = dates.astype('datetime64[Y]').astype(int) + 1970
months = (dates.astype('datetime64[M]').astype(int) % 12) + 1
Creating Date Columns from Components
The reverse operation—creating datetime columns from year, month, and day components—is equally common.
df = pd.DataFrame({
'year': [2023, 2023, 2023],
'month': [1, 6, 12],
'day': [15, 20, 25],
'value': [100, 200, 300]
})
# Combine into datetime column
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
print(df)
Output:
year month day value date
0 2023 1 15 100 2023-01-15
1 2023 6 20 200 2023-06-20
2 2023 12 25 300 2023-12-25
This approach handles invalid dates by raising errors, which you can control with the errors parameter in pd.to_datetime().