R lubridate - Extract Year/Month/Day/Hour

Key Insights

  • lubridate provides intuitive accessor functions (year(), month(), day(), hour()) that extract date components with a single function call, eliminating the need for complex format() or strftime() patterns
  • The month() and wday() functions accept label and abbr parameters that return human-readable factor outputs, making them ideal for visualization and reporting workflows
  • Combining lubridate extraction functions with dplyr’s group_by() creates a powerful pattern for time-series aggregation that handles most real-world analytical scenarios

Introduction

Date manipulation in R has historically been painful. Base R’s strftime() and format() functions work, but their syntax is cryptic and error-prone. The lubridate package solves this problem with a consistent, readable API for parsing, manipulating, and extracting components from date and datetime objects.

Extracting date components—year, month, day, hour—is fundamental to time-series analysis. You need these operations when aggregating sales by quarter, analyzing traffic patterns by hour, or filtering records to specific time windows. lubridate makes these extractions trivial.

# Install if needed
# install.packages("lubridate")

library(lubridate)
library(dplyr)

# Create sample datetime objects
single_date <- ymd("2024-03-15")
single_datetime <- ymd_hms("2024-03-15 14:30:45")

# Vector of dates
date_vector <- ymd(c("2023-01-15", "2023-06-20", "2024-02-28", "2024-12-01"))

# Typical dataframe scenario
transactions <- tibble(
  transaction_id = 1:5,
  timestamp = ymd_hms(c(
    "2024-01-15 09:23:11",
    "2024-01-15 14:45:33",
    "2024-02-20 11:12:08",
    "2024-03-10 16:55:22",
    "2024-03-10 08:30:00"
  )),
  amount = c(150.00, 89.50, 234.00, 67.25, 445.00)
)
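
To make the contrast with base R concrete, here is a minimal sketch that extracts the same components both ways. The variable name d is only for illustration, and the base R lines show one common idiom (format codes plus a manual conversion), not the only one.

```r
library(lubridate)

d <- as.POSIXct("2024-03-15 14:30:45", tz = "UTC")

# Base R: format() returns text, so each piece needs a format code
# and an explicit conversion back to a number
base_year  <- as.integer(format(d, "%Y"))
base_month <- as.integer(format(d, "%m"))
base_hour  <- as.integer(format(d, "%H"))

# lubridate: one accessor per component, numeric result
lub_year  <- year(d)
lub_month <- month(d)
lub_hour  <- hour(d)

c(base_year, base_month, base_hour)
# [1] 2024    3   14
c(lub_year, lub_month, lub_hour)
# [1] 2024    3   14
```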

Extracting Year with year()

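The year() function returns the four-digit year as a whole number. The value is stored as a double rather than an R integer, which is why the tibble output below lists the column as <dbl>. It works on Date objects, POSIXct/POSIXlt datetime objects, and character strings in an unambiguous format such as "2024-03-15"; parse other formats with ymd(), mdy(), or a related function first.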

# Single date
year(single_date)
# [1] 2024

# Datetime object
year(single_datetime)
# [1] 2024

# Vector of dates
year(date_vector)
# [1] 2023 2023 2024 2024

# Dataframe column
transactions %>%
  mutate(transaction_year = year(timestamp))
# # A tibble: 5 × 4
#   transaction_id timestamp           amount transaction_year
#            <int> <dttm>               <dbl>            <dbl>
# 1              1 2024-01-15 09:23:11  150                2024
# 2              2 2024-01-15 14:45:33   89.5              2024
# 3              3 2024-02-20 11:12:08  234                2024
# 4              4 2024-03-10 16:55:22   67.2              2024
# 5              5 2024-03-10 08:30:00  445                2024
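
The same accessor can drive a filter as well as a new column. The sketch below keeps only the rows from one year; the orders tibble and its values are made up for illustration.

```r
library(lubridate)
library(dplyr)

# Hypothetical data spanning three calendar years
orders <- tibble(
  order_date = ymd(c("2022-11-30", "2023-04-12", "2023-09-01", "2024-01-08")),
  total = c(20, 35, 50, 65)
)

# Keep only orders placed in 2023
orders_2023 <- orders %>%
  filter(year(order_date) == 2023)

nrow(orders_2023)
# [1] 2
sum(orders_2023$total)
# [1] 85
```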

One useful pattern: year() also works as a setter. You can modify the year component directly:

date <- ymd("2024-03-15")
year(date) <- 2025
date
# [1] "2025-03-15"

This setter pattern applies to all lubridate accessor functions.
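
As a short illustration of the setter form on other components, the sketch below changes the month, day, and hour of a datetime one at a time, then uses update() to change several components in one call. The object names ts and ts2 are arbitrary.

```r
library(lubridate)

ts <- ymd_hms("2024-03-15 14:30:45")

# Each assignment replaces one component and leaves the rest untouched
month(ts) <- 12
day(ts) <- 1
hour(ts) <- 0
ts
# [1] "2024-12-01 00:30:45 UTC"

# update() returns a modified copy instead of changing the object
ts2 <- update(ymd_hms("2024-03-15 14:30:45"), year = 2025, minute = 0)
ts2
# [1] "2025-03-15 14:00:45 UTC"
```

The setter form reads well for a single change; update() is usually tidier when several components change together.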

Extracting Month with month()

The month() function extracts the month component, but its real power comes from the label and abbr parameters. By default, it returns an integer (1-12). Set label = TRUE to get an ordered factor with month names.

# Numeric month (default)
month(single_date)
# [1] 3

# Full month name as factor
month(single_date, label = TRUE, abbr = FALSE)
# [1] March
# 12 Levels: January < February < March < April < May < June < ... < December

# Abbreviated month name (default when label = TRUE)
month(single_date, label = TRUE)
# [1] Mar
# 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < Oct < Nov < Dec

# Practical use: grouping by month
transactions %>%
  mutate(month_name = month(timestamp, label = TRUE)) %>%
  group_by(month_name) %>%
  summarise(
    total_amount = sum(amount),
    transaction_count = n()
  )
# # A tibble: 3 × 3
#   month_name total_amount transaction_count
#   <ord>             <dbl>             <int>
# 1 Jan               240.                  2
# 2 Feb               234                   1
# 3 Mar               512.                  2

The ordered factor output is crucial for visualization. Plain character month names sort alphabetically (Apr, Aug, Dec, ...), but ggplot2 respects the factor ordering, so January appears before February without manual intervention.
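
The difference is easy to see without a plot. This sketch sorts the same three months once as plain text and once as the ordered factor returned by month(); the label text assumes an English locale, and the variable names are illustrative.

```r
library(lubridate)

dates <- ymd(c("2024-04-10", "2024-01-05", "2024-08-21"))

# As an ordered factor, months sort in calendar order
month_fct <- month(dates, label = TRUE)
sorted_fct <- sort(month_fct)
as.integer(sorted_fct)
# [1] 1 4 8

# As plain text, the same labels sort alphabetically
# (Apr, Aug, Jan in an English locale)
sorted_chr <- sort(as.character(month_fct))
```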

Extracting Day with day(), mday(), wday(), yday()

lubridate provides four functions for day extraction, each serving a different purpose:

  • day() / mday(): Day of the month (1-31). These are identical.
  • wday(): Day of the week (1-7, where 1 = Sunday by default)
  • yday(): Day of the year (1-366)

test_date <- ymd("2024-03-15")  # A Friday

# Day of month
day(test_date)
# [1] 15

mday(test_date)  # Same result
# [1] 15

# Day of week (numeric)
wday(test_date)
# [1] 6 (Friday, since Sunday = 1)

# Day of week with label
wday(test_date, label = TRUE)
# [1] Fri
# Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

# Start week on Monday instead of Sunday
wday(test_date, week_start = 1)
# [1] 5 (Friday is the 5th day when Monday = 1)

# Day of year
yday(test_date)
# [1] 75 (March 15 is the 75th day of 2024)

The week_start parameter in wday() is essential for international applications. ISO 8601 defines Monday as day 1, which is standard in Europe. Set week_start = 1 for ISO compliance.
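
One place where the Monday-first numbering pays off is a weekend flag, since Saturday and Sunday become days 6 and 7. The sketch below shows that, plus the lubridate.week.start option, which supplies the default for week_start. The variable names are illustrative.

```r
library(lubridate)

# Friday through Monday
dates <- ymd(c("2024-03-15", "2024-03-16", "2024-03-17", "2024-03-18"))

# With Monday = 1, the weekend is simply days 6 and 7
iso_day <- wday(dates, week_start = 1)
iso_day
# [1] 5 6 7 1
is_weekend <- iso_day >= 6
is_weekend
# [1] FALSE  TRUE  TRUE FALSE

# Set the default once per session instead of passing week_start each time
old <- options(lubridate.week.start = 1)
friday_default <- wday(ymd("2024-03-15"))
friday_default
# [1] 5
options(old)  # restore the previous setting
```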

# Comparing all day functions
dates <- ymd(c("2024-01-01", "2024-03-15", "2024-12-31"))

tibble(
  date = dates,
  day_of_month = mday(dates),
  day_of_week = wday(dates, label = TRUE),
  day_of_year = yday(dates)
)
# # A tibble: 3 × 4
#   date       day_of_month day_of_week day_of_year
#   <date>            <int> <ord>             <int>
# 1 2024-01-01            1 Mon                   1
# 2 2024-03-15           15 Fri                  75
# 3 2024-12-31           31 Tue                 366

Extracting Hour/Minute/Second with hour(), minute(), second()

Time components are only meaningful for datetime objects (POSIXct/POSIXlt). A Date object carries no time of day, so hour(), minute(), and second() return 0 for it instead of raising an error.

timestamp <- ymd_hms("2024-03-15 14:30:45")

hour(timestamp)
# [1] 14

minute(timestamp)
# [1] 30

second(timestamp)
# [1] 45

# Extracting from a Date object returns 0
date_only <- ymd("2024-03-15")
hour(date_only)
# [1] 0
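
Because that 0 is indistinguishable from a real midnight, it can help to guard against Date input. The helper below is hypothetical (hour_or_na is not part of lubridate); it is a minimal sketch that uses lubridate's is.Date() check to return NA instead.

```r
library(lubridate)

# Hypothetical helper: NA for Date input, the real hour otherwise
hour_or_na <- function(x) {
  if (is.Date(x)) {
    return(rep(NA_integer_, length(x)))
  }
  hour(x)
}

hour_or_na(ymd("2024-03-15"))
# [1] NA
hour_or_na(ymd_hms("2024-03-15 14:30:45"))
# [1] 14
```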

Timezone handling matters when extracting time components. lubridate respects the timezone attached to your datetime object:

# Same instant, different timezones
utc_time <- ymd_hms("2024-03-15 14:30:00", tz = "UTC")
eastern_time <- with_tz(utc_time, "America/New_York")

hour(utc_time)
# [1] 14

hour(eastern_time)
# [1] 10 (Eastern is UTC-4 during daylight saving time)

Always be explicit about timezones when your analysis depends on local time. Use force_tz() to change the timezone label without adjusting the clock time, or with_tz() to convert to a different timezone while preserving the instant.
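
The difference between the two functions shows up as soon as you compare the results against the original instant. A minimal sketch, with illustrative variable names:

```r
library(lubridate)

utc_time <- ymd_hms("2024-03-15 14:30:00", tz = "UTC")

# with_tz(): same instant, displayed in another timezone
converted <- with_tz(utc_time, "America/New_York")
hour(converted)
# [1] 10
converted == utc_time
# [1] TRUE

# force_tz(): same clock time, relabeled, so the instant changes
relabeled <- force_tz(utc_time, "America/New_York")
hour(relabeled)
# [1] 14
relabeled == utc_time
# [1] FALSE

# 14:30 in New York on this date is 18:30 UTC, four hours later
as.numeric(difftime(relabeled, utc_time, units = "hours"))
# [1] 4
```

Use force_tz() when a timestamp was recorded in local time but parsed as UTC; use with_tz() when the instant is already right and only the display should change.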

Practical Application: Aggregating Data by Date Components

Real analysis combines extraction functions with dplyr operations. Here’s an example using simulated sales data:

# Generate sample sales data
set.seed(42)
sales <- tibble(
  sale_id = 1:1000,
  timestamp = ymd_hms("2023-01-01 00:00:00") + 
    seconds(sample(0:(365*24*60*60), 1000, replace = TRUE)),
  amount = round(runif(1000, 10, 500), 2),
  category = sample(c("Electronics", "Clothing", "Food"), 1000, replace = TRUE)
)

# Monthly revenue summary
monthly_revenue <- sales %>%
  mutate(
    year = year(timestamp),
    month = month(timestamp, label = TRUE)
  ) %>%
  group_by(year, month) %>%
  summarise(
    total_revenue = sum(amount),
    avg_transaction = mean(amount),
    transaction_count = n(),
    .groups = "drop"
  )

print(monthly_revenue, n = 6)
# # A tibble: 12 × 5
#    year month total_revenue avg_transaction transaction_count
#   <dbl> <ord>         <dbl>           <dbl>             <int>
# 1  2023 Jan          21842.            253.                86
# 2  2023 Feb          19876.            248.                80
# 3  2023 Mar          22134.            261.                85
# ...

# Hourly traffic pattern
hourly_pattern <- sales %>%
  mutate(hour = hour(timestamp)) %>%
  group_by(hour) %>%
  summarise(
    transaction_count = n(),
    avg_amount = mean(amount)
  ) %>%
  arrange(hour)

# Day of week analysis
weekday_analysis <- sales %>%
  mutate(
    weekday = wday(timestamp, label = TRUE, week_start = 1)
  ) %>%
  group_by(weekday) %>%
  summarise(
    total_sales = sum(amount),
    transactions = n()
  )

print(weekday_analysis)
# # A tibble: 7 × 3
#   weekday total_sales transactions
#   <ord>         <dbl>        <int>
# 1 Mon          36421.          143
# 2 Tue          37892.          149
# ...
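
The introduction mentioned aggregating sales by quarter, and the same grouping pattern covers it with lubridate's quarter() accessor. The sketch below uses a tiny made-up dataset (sales_q) so the totals are easy to check by hand.

```r
library(lubridate)
library(dplyr)

# Hypothetical data: two sales in Q1, one in Q2, one in Q4
sales_q <- tibble(
  timestamp = ymd_hms(c(
    "2023-01-15 10:00:00",
    "2023-02-20 12:00:00",
    "2023-05-05 09:00:00",
    "2023-11-30 18:00:00"
  )),
  amount = c(100, 50, 75, 200)
)

quarterly <- sales_q %>%
  mutate(
    year = year(timestamp),
    qtr = quarter(timestamp)
  ) %>%
  group_by(year, qtr) %>%
  summarise(revenue = sum(amount), .groups = "drop")

quarterly$qtr
# [1] 1 2 4
quarterly$revenue
# [1] 150  75 200
```

Quarters with no sales (Q3 here) do not appear in the result, which matters if a downstream report expects one row per quarter.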

A common pattern combines year and month into a single grouping variable:

# Year-month aggregation using floor_date
sales %>%
  mutate(year_month = floor_date(timestamp, "month")) %>%
  group_by(year_month) %>%
  summarise(revenue = sum(amount))

# Or create a character key
sales %>%
  mutate(
    year_month = paste(year(timestamp), 
                       sprintf("%02d", month(timestamp)), 
                       sep = "-")
  ) %>%
  group_by(year_month) %>%
  summarise(revenue = sum(amount))
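
The two keys group rows identically but differ in type, which affects what you can do with them afterwards. A minimal sketch on three timestamps around a year boundary (ts is an illustrative name); format() is used here as a compact equivalent of the paste()/sprintf() key above.

```r
library(lubridate)

ts <- ymd_hms(c(
  "2023-12-31 23:59:59",
  "2024-01-01 00:00:00",
  "2024-01-31 12:00:00"
))

# floor_date() keeps a datetime: it sorts chronologically, supports
# date arithmetic, and plots on a continuous time axis
month_start <- floor_date(ts, "month")
month_start
# [1] "2023-12-01 UTC" "2024-01-01 UTC" "2024-01-01 UTC"

# A character key is convenient for labels and joins, but it is only text
month_key <- format(ts, "%Y-%m")
month_key
# [1] "2023-12" "2024-01" "2024-01"
```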

Summary

Here’s a quick reference for all lubridate extraction functions:

Function                  Returns                                     Example Output
year(x)                   Whole number (4-digit year)                 2024
month(x)                  Whole number (1-12)                         3
month(x, label = TRUE)    Ordered factor                              Mar
day(x) / mday(x)          Whole number (1-31)                         15
wday(x)                   Whole number (1-7, Sunday = 1 by default)   6
wday(x, label = TRUE)     Ordered factor                              Fri
yday(x)                   Whole number (1-366)                        75
hour(x)                   Whole number (0-23)                         14
minute(x)                 Whole number (0-59)                         30
second(x)                 Numeric (0-59.999…)                         45

All these functions also work as setters: assign a value to modify that component of an existing object. They are vectorized and work seamlessly inside mutate() calls. For most time-series analysis in R, lubridate’s extraction functions combined with dplyr grouping operations will cover the large majority of your needs.
