R lubridate - Extract Year/Month/Day/Hour
Key Insights
- lubridate provides intuitive accessor functions (year(), month(), day(), hour()) that extract date components with a single function call, eliminating the need for complex format() or strftime() patterns
- The month() and wday() functions accept label and abbr parameters that return human-readable factor outputs, making them ideal for visualization and reporting workflows
- Combining lubridate extraction functions with dplyr’s group_by() creates a powerful pattern for time-series aggregation that handles most real-world analytical scenarios
Introduction
Date manipulation in R has historically been painful. Base R’s strftime() and format() functions work, but their syntax is cryptic and error-prone. The lubridate package solves this problem with a consistent, readable API for parsing, manipulating, and extracting components from date and datetime objects.
Extracting date components—year, month, day, hour—is fundamental to time-series analysis. You need these operations when aggregating sales by quarter, analyzing traffic patterns by hour, or filtering records to specific time windows. lubridate makes these extractions trivial.
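For comparison, here is a minimal sketch of the same extraction done both ways (the variable names are illustrative only). Base R's format() returns character strings that still have to be converted, while the lubridate accessors return numbers directly:

```r
library(lubridate)

x <- as.POSIXct("2024-03-15 14:30:45", tz = "UTC")

# Base R: format() codes return character strings that must be converted
base_year <- as.integer(format(x, "%Y"))   # "2024" -> 2024
base_hour <- as.integer(format(x, "%H"))   # "14"   -> 14

# lubridate: accessors return numeric values directly
lub_year <- year(x)   # 2024
lub_hour <- hour(x)   # 14

base_year == lub_year  # TRUE
base_hour == lub_hour  # TRUE
```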
```r
# Install if needed
# install.packages("lubridate")
library(lubridate)
library(dplyr)

# Create sample datetime objects
single_date <- ymd("2024-03-15")
single_datetime <- ymd_hms("2024-03-15 14:30:45")

# Vector of dates
date_vector <- ymd(c("2023-01-15", "2023-06-20", "2024-02-28", "2024-12-01"))

# Typical dataframe scenario
transactions <- tibble(
  transaction_id = 1:5,
  timestamp = ymd_hms(c(
    "2024-01-15 09:23:11",
    "2024-01-15 14:45:33",
    "2024-02-20 11:12:08",
    "2024-03-10 16:55:22",
    "2024-03-10 08:30:00"
  )),
  amount = c(150.00, 89.50, 234.00, 67.25, 445.00)
)
```
Extracting Year with year()
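The year() function returns the four-digit year as a whole number stored as a double (hence the <dbl> column type in the output below). It works on Date objects, POSIXct/POSIXlt datetime objects, and character strings in a standard year-month-day format such as "2024-03-15".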
```r
# Single date
year(single_date)
# [1] 2024

# Datetime object
year(single_datetime)
# [1] 2024

# Vector of dates
year(date_vector)
# [1] 2023 2023 2024 2024

# Dataframe column
transactions %>%
  mutate(transaction_year = year(timestamp))
# # A tibble: 5 × 4
# transaction_id timestamp amount transaction_year
# <int> <dttm> <dbl> <dbl>
# 1 1 2024-01-15 09:23:11 150 2024
# 2 2 2024-01-15 14:45:33 89.5 2024
# 3 3 2024-02-20 11:12:08 234 2024
# 4 4 2024-03-10 16:55:22 67.2 2024
# 5 5 2024-03-10 08:30:00 445 2024
```
One useful pattern: year() also works as a setter. You can modify the year component directly:
```r
date <- ymd("2024-03-15")
year(date) <- 2025
date
# [1] "2025-03-15"
```
This setter pattern applies to all lubridate accessor functions.
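A short sketch of the setters on a datetime (the timestamp and variable names are made up for illustration). It also uses update(), lubridate's function for changing several components in one call; the plural argument names (hours, minutes, seconds) follow its documented interface:

```r
library(lubridate)

ts <- ymd_hms("2024-03-15 14:30:45", tz = "UTC")

# Each setter replaces one component and leaves the others alone
hour(ts) <- 9
minute(ts) <- 0
second(ts) <- 0

# update() changes several components in one call
same_ts <- update(ymd_hms("2024-03-15 14:30:45", tz = "UTC"),
                  hours = 9, minutes = 0, seconds = 0)
same_ts == ts
# [1] TRUE

# Setters are vectorized, like the accessors
dates <- ymd(c("2023-01-15", "2023-06-20"))
year(dates) <- 2030
dates
# [1] "2030-01-15" "2030-06-20"
```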
Extracting Month with month()
The month() function extracts the month component, but its real power comes from the label and abbr parameters. By default, it returns an integer (1-12). Set label = TRUE to get an ordered factor with month names.
```r
# Numeric month (default)
month(single_date)
# [1] 3

# Full month name as factor
month(single_date, label = TRUE, abbr = FALSE)
# [1] March
# 12 Levels: January < February < March < April < May < June < ... < December

# Abbreviated month name (default when label = TRUE)
month(single_date, label = TRUE)
# [1] Mar
# 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec

# Practical use: grouping by month
transactions %>%
  mutate(month_name = month(timestamp, label = TRUE)) %>%
  group_by(month_name) %>%
  summarise(
    total_amount = sum(amount),
    transaction_count = n()
  )
# # A tibble: 3 × 3
# month_name total_amount transaction_count
# <ord> <dbl> <int>
# 1 Jan 239. 2
# 2 Feb 234 1
# 3 Mar 512. 2
```
The ordered factor output is crucial for visualization. When you plot monthly data, ggplot2 respects the factor ordering, so January appears before February without manual intervention.
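To see why the factor matters, compare it with plain character labels. This sketch uses a small made-up vector: format(d, "%b") returns character strings that sort alphabetically, while month(label = TRUE) sorts in calendar order. The label text itself depends on your session's LC_TIME locale, so the sketch inspects order() rather than the names:

```r
library(lubridate)

d <- ymd(c("2024-03-01", "2024-01-01", "2024-02-01"))

# Plain character labels sort alphabetically
chr_labels <- format(d, "%b")
sort(chr_labels)   # "Feb" "Jan" "Mar" in an English locale

# month(label = TRUE) returns an ordered factor that sorts by calendar order
fct_labels <- month(d, label = TRUE)
sort(fct_labels)   # Jan < Feb < Mar
order(fct_labels)  # 2 3 1
```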
Extracting Day with day(), mday(), wday(), yday()
lubridate provides four functions for day extraction, each serving a different purpose:
- day() / mday(): Day of the month (1-31). These are identical.
- wday(): Day of the week (1-7, where 1 = Sunday by default)
- yday(): Day of the year (1-366)
```r
test_date <- ymd("2024-03-15") # A Friday

# Day of month
day(test_date)
# [1] 15
mday(test_date) # Same result
# [1] 15

# Day of week (numeric)
wday(test_date)
# [1] 6 (Friday, since Sunday = 1)

# Day of week with label
wday(test_date, label = TRUE)
# [1] Fri
# Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

# Start week on Monday instead of Sunday
wday(test_date, week_start = 1)
# [1] 5 (Friday is the 5th day when Monday = 1)

# Day of year
yday(test_date)
# [1] 75 (March 15 is the 75th day of 2024)
```
The week_start parameter in wday() is essential for international applications. ISO 8601 defines Monday as day 1, which is standard in Europe. Set week_start = 1 for ISO compliance.
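A minimal sketch of both ways to get Monday-first numbering, using a Sunday so the difference is visible. The session-wide default comes from the lubridate.week.start option, which is what wday()'s week_start argument falls back to:

```r
library(lubridate)

d <- ymd("2024-03-17")  # a Sunday

sunday_default <- wday(d)                  # 1: weeks start on Sunday by default
sunday_iso     <- wday(d, week_start = 1)  # 7: Monday = 1, as in ISO 8601

# Change the default for the whole session, then restore the old setting
old <- options(lubridate.week.start = 1)
sunday_session <- wday(d)                  # 7
options(old)
```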
```r
# Comparing all day functions
dates <- ymd(c("2024-01-01", "2024-03-15", "2024-12-31"))
tibble(
  date = dates,
  day_of_month = mday(dates),
  day_of_week = wday(dates, label = TRUE),
  day_of_year = yday(dates)
)
# # A tibble: 3 × 4
# date day_of_month day_of_week day_of_year
# <date> <int> <ord> <int>
# 1 2024-01-01 1 Mon 1
# 2 2024-03-15 15 Fri 75
# 3 2024-12-31 31 Tue 366
```
Extracting Hour/Minute/Second with hour(), minute(), second()
Time components are only meaningful for datetime objects (POSIXct/POSIXlt). A Date object stores no time of day, so hour(), minute(), and second() treat it as midnight and return 0.
```r
timestamp <- ymd_hms("2024-03-15 14:30:45")
hour(timestamp)
# [1] 14
minute(timestamp)
# [1] 30
second(timestamp)
# [1] 45

# Extracting from a Date object returns 0
date_only <- ymd("2024-03-15")
hour(date_only)
# [1] 0
```
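One difference between the three accessors: hour() and minute() return whole numbers, while second() keeps any fractional part. A sketch with a made-up timestamp that includes sub-second precision:

```r
library(lubridate)

ts <- ymd_hms("2024-03-15 14:30:45.75", tz = "UTC")

hour(ts)           # 14
minute(ts)         # 30
second(ts)         # 45.75: fractional seconds are kept
floor(second(ts))  # 45
```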
Timezone handling matters when extracting time components. lubridate respects the timezone attached to your datetime object:
```r
# Same instant, different timezones
utc_time <- ymd_hms("2024-03-15 14:30:00", tz = "UTC")
eastern_time <- with_tz(utc_time, "America/New_York")
hour(utc_time)
# [1] 14
hour(eastern_time)
# [1] 10 (Eastern is UTC-4 during daylight saving time)
```
Always be explicit about timezones when your analysis depends on local time. Use force_tz() to change the timezone label without adjusting the clock time, or with_tz() to convert to a different timezone while preserving the instant.
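The difference between the two is easy to see side by side. In this sketch, with_tz() keeps the instant and changes the clock reading, while force_tz() keeps the clock reading (14:30) and therefore points at a different instant, four hours later in UTC on this date:

```r
library(lubridate)

utc_time <- ymd_hms("2024-03-15 14:30:00", tz = "UTC")

# with_tz(): same instant, clock time re-expressed in New York
converted <- with_tz(utc_time, "America/New_York")

# force_tz(): same clock time (14:30), relabelled as New York time,
# which is a different instant
relabelled <- force_tz(utc_time, "America/New_York")

hour(converted)        # 10
hour(relabelled)       # 14
converted == utc_time  # TRUE
difftime(relabelled, utc_time, units = "hours")
# Time difference of 4 hours
```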
Practical Application: Aggregating Data by Date Components
Real analysis combines extraction functions with dplyr operations. Here’s a realistic example with sales data:
```r
# Generate sample sales data
set.seed(42)
sales <- tibble(
  sale_id = 1:1000,
  # Random second within 2023 (up to, but excluding, 2024-01-01 00:00:00)
  timestamp = ymd_hms("2023-01-01 00:00:00") +
    seconds(sample(0:(365 * 24 * 60 * 60 - 1), 1000, replace = TRUE)),
  amount = round(runif(1000, 10, 500), 2),
  category = sample(c("Electronics", "Clothing", "Food"), 1000, replace = TRUE)
)

# Monthly revenue summary
monthly_revenue <- sales %>%
  mutate(
    year = year(timestamp),
    month = month(timestamp, label = TRUE)
  ) %>%
  group_by(year, month) %>%
  summarise(
    total_revenue = sum(amount),
    avg_transaction = mean(amount),
    transaction_count = n(),
    .groups = "drop"
  )
print(monthly_revenue, n = 6)
# # A tibble: 12 × 5
# year month total_revenue avg_transaction transaction_count
# <dbl> <ord> <dbl> <dbl> <int>
# 1 2023 Jan 21842. 253. 86
# 2 2023 Feb 19876. 248. 80
# 3 2023 Mar 22134. 261. 85
# ...
```
```r
# Hourly traffic pattern
hourly_pattern <- sales %>%
  mutate(hour = hour(timestamp)) %>%
  group_by(hour) %>%
  summarise(
    transaction_count = n(),
    avg_amount = mean(amount)
  ) %>%
  arrange(hour)
```
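One caveat with the hourly summary: hours that have no sales simply do not appear in the result. Below is a sketch of one way to keep all 24 hours, rebuilding the five timestamps from the transactions example at the top inline (tx and hourly_all are illustrative names). It converts the hour to a factor with every level from 0 to 23 and asks group_by() to keep empty groups via .drop = FALSE; tidyr::complete() is another common approach:

```r
library(lubridate)
library(dplyr)

tx <- tibble(
  timestamp = ymd_hms(c("2024-01-15 09:23:11", "2024-01-15 14:45:33",
                        "2024-02-20 11:12:08", "2024-03-10 16:55:22",
                        "2024-03-10 08:30:00"))
)

# Turn the hour into a factor with all 24 levels, then keep empty groups
hourly_all <- tx %>%
  mutate(hour = factor(hour(timestamp), levels = 0:23)) %>%
  group_by(hour, .drop = FALSE) %>%
  summarise(transaction_count = n())

nrow(hourly_all)  # 24, including hours with zero transactions
```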
```r
# Day of week analysis
weekday_analysis <- sales %>%
  mutate(
    weekday = wday(timestamp, label = TRUE, week_start = 1)
  ) %>%
  group_by(weekday) %>%
  summarise(
    total_sales = sum(amount),
    transactions = n()
  )
print(weekday_analysis)
# # A tibble: 7 × 3
# weekday total_sales transactions
# <ord> <dbl> <int>
# 1 Mon 36421. 143
# 2 Tue 37892. 149
# ...
```
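A related pattern flags weekends. With week_start = 1, Saturday and Sunday are days 6 and 7, so a numeric comparison works regardless of the locale used for day labels. A minimal sketch on four made-up dates:

```r
library(lubridate)

# Fri, Sat, Sun, Mon
d <- ymd(c("2024-03-15", "2024-03-16", "2024-03-17", "2024-03-18"))

# With week_start = 1, Saturday is 6 and Sunday is 7
is_weekend <- wday(d, week_start = 1) >= 6
is_weekend
# [1] FALSE  TRUE  TRUE FALSE
```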
A common pattern combines year and month into a single grouping variable:
```r
# Year-month aggregation using floor_date
sales %>%
  mutate(year_month = floor_date(timestamp, "month")) %>%
  group_by(year_month) %>%
  summarise(revenue = sum(amount))

# Or create a character key
sales %>%
  mutate(
    year_month = paste(year(timestamp),
                       sprintf("%02d", month(timestamp)),
                       sep = "-")
  ) %>%
  group_by(year_month) %>%
  summarise(revenue = sum(amount))
```
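The introduction mentions aggregating sales by quarter. lubridate also has a quarter() accessor (not covered above) that fits the same grouping pattern. A sketch on a small made-up dataset (tx and quarterly are illustrative names):

```r
library(lubridate)
library(dplyr)

tx <- tibble(
  timestamp = ymd_hms(c("2024-01-15 09:23:11", "2024-02-20 11:12:08",
                        "2024-05-03 10:00:00", "2024-11-30 18:45:00")),
  amount = c(150, 234, 80, 120)
)

# quarter() returns 1-4; group by year too so quarters from different years stay apart
quarterly <- tx %>%
  mutate(year = year(timestamp), qtr = quarter(timestamp)) %>%
  group_by(year, qtr) %>%
  summarise(revenue = sum(amount), .groups = "drop")
quarterly
# Q1 2024: 384, Q2 2024: 80, Q4 2024: 120
```

Grouping by year as well as quarter keeps, say, Q1 2023 and Q1 2024 separate once the data spans more than one year.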
Summary
Here’s a quick reference for all lubridate extraction functions:
| Function | Returns | Example Output |
|---|---|---|
| year(x) | Integer (4-digit year) | 2024 |
| month(x) | Integer (1-12) | 3 |
| month(x, label = TRUE) | Ordered factor | Mar |
| day(x) / mday(x) | Integer (1-31) | 15 |
| wday(x) | Integer (1-7) | 6 |
| wday(x, label = TRUE) | Ordered factor | Fri |
| yday(x) | Integer (1-366) | 75 |
| hour(x) | Integer (0-23) | 14 |
| minute(x) | Integer (0-59) | 30 |
| second(x) | Numeric (0-59.999…) | 45 |
All these functions also work as setters: assign a value to modify that component in place. They are vectorized and work seamlessly inside mutate() calls. For most time-series analysis in R, lubridate’s extraction functions combined with dplyr grouping operations will cover the bulk of what you need.