R lubridate - Extract Year/Month/Day/Hour

Key Insights

lubridate provides intuitive accessor functions (year(), month(), day(), hour()) that extract date components with a single function call, eliminating the need for complex format() or strftime() patterns
The month() and wday() functions accept label and abbr parameters that return human-readable factor outputs, making them ideal for visualization and reporting workflows
Combining lubridate extraction functions with dplyr’s group_by() creates a powerful pattern for time-series aggregation that handles most real-world analytical scenarios

Introduction

Date manipulation in R has historically been painful. Base R’s strftime() and format() functions work, but their syntax is cryptic and error-prone. The lubridate package solves this problem with a consistent, readable API for parsing, manipulating, and extracting components from date and datetime objects.

Extracting date components—year, month, day, hour—is fundamental to time-series analysis. You need these operations when aggregating sales by quarter, analyzing traffic patterns by hour, or filtering records to specific time windows. lubridate makes these extractions trivial.

# Install if needed
# install.packages("lubridate")

library(lubridate)
library(dplyr)

# Create sample datetime objects
single_date <- ymd("2024-03-15")
single_datetime <- ymd_hms("2024-03-15 14:30:45")

# Vector of dates
date_vector <- ymd(c("2023-01-15", "2023-06-20", "2024-02-28", "2024-12-01"))

# Typical dataframe scenario
transactions <- tibble(
  transaction_id = 1:5,
  timestamp = ymd_hms(c(
    "2024-01-15 09:23:11",
    "2024-01-15 14:45:33",
    "2024-02-20 11:12:08",
    "2024-03-10 16:55:22",
    "2024-03-10 08:30:00"
  )),
  amount = c(150.00, 89.50, 234.00, 67.25, 445.00)
)

Extracting Year with `year()`

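The year() function returns the four-digit year as a whole number. The value is stored as a double rather than an R integer, which is why the tibble output below shows <dbl>. It works on Date objects, POSIXct/POSIXlt datetime objects, and character strings in an unambiguous format such as "2024-03-15".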

# Single date
year(single_date)
# [1] 2024

# Datetime object
year(single_datetime)
# [1] 2024

# Vector of dates
year(date_vector)
# [1] 2023 2023 2024 2024

# Dataframe column
transactions %>%
  mutate(transaction_year = year(timestamp))
# # A tibble: 5 × 4
#   transaction_id timestamp           amount transaction_year
#            <int> <dttm>               <dbl>            <dbl>
# 1              1 2024-01-15 09:23:11  150                2024
# 2              2 2024-01-15 14:45:33   89.5              2024
# 3              3 2024-02-20 11:12:08  234                2024
# 4              4 2024-03-10 16:55:22   67.2              2024
# 5              5 2024-03-10 08:30:00  445                2024

One useful pattern: year() also works as a setter. You can modify the year component directly:

date <- ymd("2024-03-15")
year(date) <- 2025
date
# [1] "2025-03-15"

This setter pattern applies to all lubridate accessor functions.
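
As a minimal sketch of that setter pattern on other components, the example below changes the month, day, and hour of one datetime. The variable name dt is only for illustration, and the values are chosen to stay within valid ranges; behaviour for out-of-range values is worth checking in the lubridate documentation before relying on it.

```r
library(lubridate)

dt <- ymd_hms("2024-03-15 14:30:45")

month(dt) <- 7   # March becomes July
day(dt) <- 1     # move to the first of the month
hour(dt) <- 9    # 14:xx becomes 09:xx

dt
# [1] "2024-07-01 09:30:45 UTC"
```

Each assignment touches only the named component, so the minutes and seconds carry through unchanged.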

Extracting Month with `month()`

The month() function extracts the month component, but its real power comes from the label and abbr parameters. By default, it returns an integer (1-12). Set label = TRUE to get an ordered factor with month names.

# Numeric month (default)
month(single_date)
# [1] 3

# Full month name as factor
month(single_date, label = TRUE, abbr = FALSE)
# [1] March
# 12 Levels: January < February < March < April < May < June < ... < December

# Abbreviated month name (default when label = TRUE)
month(single_date, label = TRUE)
# [1] Mar
# 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec

# Practical use: grouping by month
transactions %>%
  mutate(month_name = month(timestamp, label = TRUE)) %>%
  group_by(month_name) %>%
  summarise(
    total_amount = sum(amount),
    transaction_count = n()
  )
# # A tibble: 3 × 3
#   month_name total_amount transaction_count
#   <ord>             <dbl>             <int>
# 1 Jan               239.                  2
# 2 Feb               234                   1
# 3 Mar               512.                  2

The ordered factor output is crucial for visualization. When you plot monthly data, ggplot2 respects the factor ordering, so January appears before February without manual intervention.
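
To see why the ordering matters, here is a small sketch that sorts the same month labels twice: once as the ordered factor and once as plain text. The names dates and month_fct are illustrative, and the label text shown in the comments assumes an English locale.

```r
library(lubridate)

dates <- ymd(c("2024-04-10", "2024-01-05", "2024-08-21"))

# Ordered factor: sorting follows the calendar
month_fct <- month(dates, label = TRUE)
sort(month_fct)
# Jan Apr Aug in an English locale

# Plain character labels: sorting is alphabetical
sort(as.character(month_fct))
# "Apr" "Aug" "Jan" in an English locale
```

Plot axes and grouped summaries follow the same ordering as sort(), which is why the factor version keeps months in calendar order.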

Extracting Day with `day()`, `mday()`, `wday()`, `yday()`

lubridate provides four functions for day extraction, each serving a different purpose:

day() / mday(): Day of the month (1-31). These are identical.
wday(): Day of the week (1-7, where 1 = Sunday by default)
yday(): Day of the year (1-366)

test_date <- ymd("2024-03-15")  # A Friday

# Day of month
day(test_date)
# [1] 15

mday(test_date)  # Same result
# [1] 15

# Day of week (numeric)
wday(test_date)
# [1] 6 (Friday, since Sunday = 1)

# Day of week with label
wday(test_date, label = TRUE)
# [1] Fri
# Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

# Start week on Monday instead of Sunday
wday(test_date, week_start = 1)
# [1] 5 (Friday is the 5th day when Monday = 1)

# Day of year
yday(test_date)
# [1] 75 (March 15 is the 75th day of 2024)

The week_start parameter in wday() is essential for international applications. ISO 8601 defines Monday as day 1, which is standard in Europe. Set week_start = 1 for ISO compliance.
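
One place the Monday-first numbering pays off is weekend detection, since Saturday and Sunday become the two highest values. The sketch below assumes a Saturday/Sunday weekend; iso_day and is_weekend are illustrative names.

```r
library(lubridate)

# Friday through Monday
dates <- ymd(c("2024-03-15", "2024-03-16", "2024-03-17", "2024-03-18"))

# With Monday = 1, Saturday is 6 and Sunday is 7
iso_day <- wday(dates, week_start = 1)
is_weekend <- iso_day >= 6

iso_day
# [1] 5 6 7 1
is_weekend
# [1] FALSE  TRUE  TRUE FALSE
```

With the default Sunday-first numbering the same test would need to check for 1 or 7 instead of a single comparison.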

# Comparing all day functions
dates <- ymd(c("2024-01-01", "2024-03-15", "2024-12-31"))

tibble(
  date = dates,
  day_of_month = mday(dates),
  day_of_week = wday(dates, label = TRUE),
  day_of_year = yday(dates)
)
# # A tibble: 3 × 4
#   date       day_of_month day_of_week day_of_year
#   <date>            <int> <ord>             <dbl>
# 1 2024-01-01            1 Mon                   1
# 2 2024-03-15           15 Fri                  75
# 3 2024-12-31           31 Tue                 366

Extracting Hour/Minute/Second with `hour()`, `minute()`, `second()`

Time components only exist on datetime objects (POSIXct/POSIXlt). A Date carries no time of day, so hour(), minute(), and second() return 0 for it rather than raising an error.

timestamp <- ymd_hms("2024-03-15 14:30:45")

hour(timestamp)
# [1] 14

minute(timestamp)
# [1] 30

second(timestamp)
# [1] 45

# Extracting from a Date object returns 0
date_only <- ymd("2024-03-15")
hour(date_only)
# [1] 0

Timezone handling matters when extracting time components. lubridate respects the timezone attached to your datetime object:

# Same instant, different timezones
utc_time <- ymd_hms("2024-03-15 14:30:00", tz = "UTC")
eastern_time <- with_tz(utc_time, "America/New_York")

hour(utc_time)
# [1] 14

hour(eastern_time)
# [1] 10 (Eastern is UTC-4 during daylight saving time)

Always be explicit about timezones when your analysis depends on local time. Use force_tz() to change the timezone label without adjusting the clock time, or with_tz() to convert to a different timezone while preserving the instant.
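
The difference between the two functions is easiest to see side by side. This sketch reuses the UTC timestamp from above; converted and relabelled are illustrative names.

```r
library(lubridate)

utc_time <- ymd_hms("2024-03-15 14:30:00", tz = "UTC")

# with_tz(): same instant, shown on a New York clock
converted <- with_tz(utc_time, "America/New_York")
hour(converted)
# [1] 10
converted == utc_time
# [1] TRUE

# force_tz(): same clock reading, but a different instant
relabelled <- force_tz(utc_time, "America/New_York")
hour(relabelled)
# [1] 14
relabelled == utc_time
# [1] FALSE
```

Reach for force_tz() when timestamps were recorded in local time but parsed as UTC, and with_tz() when the stored instant is right and you only need a different local view of it.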

Practical Application: Aggregating Data by Date Components

Real analysis combines extraction functions with dplyr operations. Here’s a realistic example with sales data:

# Generate sample sales data
set.seed(42)
sales <- tibble(
  sale_id = 1:1000,
  timestamp = ymd_hms("2023-01-01 00:00:00") + 
    seconds(sample(0:(365*24*60*60), 1000, replace = TRUE)),
  amount = round(runif(1000, 10, 500), 2),
  category = sample(c("Electronics", "Clothing", "Food"), 1000, replace = TRUE)
)

# Monthly revenue summary
monthly_revenue <- sales %>%
  mutate(
    year = year(timestamp),
    month = month(timestamp, label = TRUE)
  ) %>%
  group_by(year, month) %>%
  summarise(
    total_revenue = sum(amount),
    avg_transaction = mean(amount),
    transaction_count = n(),
    .groups = "drop"
  )

print(monthly_revenue, n = 6)
# # A tibble: 12 × 5
#    year month total_revenue avg_transaction transaction_count
#   <dbl> <ord>         <dbl>           <dbl>             <int>
# 1  2023 Jan          21842.            253.                86
# 2  2023 Feb          19876.            248.                80
# 3  2023 Mar          22134.            261.                85
# ...

# Hourly traffic pattern
hourly_pattern <- sales %>%
  mutate(hour = hour(timestamp)) %>%
  group_by(hour) %>%
  summarise(
    transaction_count = n(),
    avg_amount = mean(amount)
  ) %>%
  arrange(hour)

# Day of week analysis
weekday_analysis <- sales %>%
  mutate(
    weekday = wday(timestamp, label = TRUE, week_start = 1)
  ) %>%
  group_by(weekday) %>%
  summarise(
    total_sales = sum(amount),
    transactions = n()
  )

print(weekday_analysis)
# # A tibble: 7 × 3
#   weekday total_sales transactions
#   <ord>         <dbl>        <int>
# 1 Mon          36421.          143
# 2 Tue          37892.          149
# ...

A common pattern combines year and month into a single grouping variable:

# Year-month aggregation using floor_date
sales %>%
  mutate(year_month = floor_date(timestamp, "month")) %>%
  group_by(year_month) %>%
  summarise(revenue = sum(amount))

# Or create a character key
sales %>%
  mutate(
    year_month = paste(year(timestamp), 
                       sprintf("%02d", month(timestamp)), 
                       sep = "-")
  ) %>%
  group_by(year_month) %>%
  summarise(revenue = sum(amount))
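
The two keys behave differently downstream, which a small sketch on three timestamps makes visible. The names stamps, month_start, and month_key are illustrative.

```r
library(lubridate)

stamps <- ymd_hms(c(
  "2023-11-30 23:59:59",
  "2023-12-01 00:00:00",
  "2024-01-15 12:00:00"
))

# floor_date() keeps a datetime class, so date arithmetic
# and continuous plot axes still work on the key
month_start <- floor_date(stamps, "month")
class(month_start)
# [1] "POSIXct" "POSIXt"

# The pasted key is plain text, handy for labels and joins
month_key <- paste(year(stamps), sprintf("%02d", month(stamps)), sep = "-")
month_key
# [1] "2023-11" "2023-12" "2024-01"
```

The zero-padding from sprintf() is what keeps the character key sorting chronologically; without it, "2023-10" would sort before "2023-2".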

Summary

Here’s a quick reference for all lubridate extraction functions:

Function	Returns	Example Output
`year(x)`	Integer (4-digit year)	`2024`
`month(x)`	Integer (1-12)	`3`
`month(x, label = TRUE)`	Ordered factor	`Mar`
`day(x)` / `mday(x)`	Integer (1-31)	`15`
`wday(x)`	Integer (1-7)	`6`
`wday(x, label = TRUE)`	Ordered factor	`Fri`
`yday(x)`	Integer (1-366)	`75`
`hour(x)`	Integer (0-23)	`14`
`minute(x)`	Integer (0-59)	`30`
`second(x)`	Numeric (0-59.999…)	`45`

All these functions also work as setters: assign a value to modify that component in place. They handle vectors and work seamlessly inside mutate() calls. For most time-series analysis in R, lubridate’s extraction functions combined with dplyr grouping operations will cover the bulk of your needs.
