R purrr - map_df()/map_dbl()/map_chr()
Key Insights
- The map_*() family in purrr provides type-stable alternatives to base R’s lapply(), guaranteeing output types and failing fast when expectations aren’t met
- map_df() (an alias of map_dfr(), both superseded since purrr 1.0.0 by map() plus list_rbind()) simplifies the common pattern of iterating and combining results into data frames without manual do.call(rbind, ...) operations
- Type-specific mappers like map_dbl() and map_chr() enforce output constraints at runtime, catching type mismatches immediately rather than propagating incorrect data through your pipeline
Why Type-Specific Mapping Matters
Base R’s lapply() always returns a list. You then coerce it to your desired type, often discovering type mismatches late in execution. The purrr approach enforces types immediately:
library(purrr)
# Base R approach - no type safety
numbers <- list(1, 2, 3, "4")
result <- lapply(numbers, function(x) x)
unlist(result) # Silently coerces everything to character: "1" "2" "3" "4"
# purrr approach - fails fast
map_dbl(numbers, ~ .x)
# Error: Can't coerce element 4 from a character to a double
# (exact wording depends on the purrr version)
This fail-fast behavior prevents bugs from propagating through your data pipeline.
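The fail-fast behavior can also be checked programmatically. The sketch below is illustrative and not part of the original example: the names `mixed` and `outcome` are made up, and tryCatch() is used only to capture the condition so the script keeps running.

```r
library(purrr)

# Hypothetical input: three numbers and one stray string
mixed <- list(1, 2, 3, "4")

# tryCatch() returns the condition object if map_dbl() signals an error
outcome <- tryCatch(
  map_dbl(mixed, identity),
  error = function(e) e
)

# TRUE when the string was rejected instead of being coerced
inherits(outcome, "error")
```

Capturing the error this way is mainly useful in tests; in a pipeline you would normally let the error stop execution.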
map_dbl() for Numeric Vectors
map_dbl() iterates over a list or vector and returns a numeric vector. It’s ideal for extracting numeric properties or performing calculations:
library(purrr)
# Extract numeric summaries from nested data
sales_data <- list(
  q1 = c(100, 150, 200),
  q2 = c(120, 180, 210),
  q3 = c(140, 190, 220),
  q4 = c(160, 200, 240)
)
# Calculate mean sales per quarter
quarterly_means <- map_dbl(sales_data, mean)
quarterly_means
#       q1       q2       q3       q4
# 150.0000 170.0000 183.3333 200.0000
# Works with data frames too
mtcars_list <- split(mtcars, mtcars$cyl)
map_dbl(mtcars_list, ~ mean(.x$mpg))
#        4        6        8
# 26.66364 19.74286 15.10000
The function signature accepts both formula notation (~ .x) and standard functions. Use .x as the placeholder in formulas:
# Three equivalent approaches
map_dbl(1:5, ~ .x^2)
map_dbl(1:5, function(x) x^2)
map_dbl(1:5, `^`, 2)
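On R 4.1 or later, the base anonymous-function shorthand `\(x)` is a fourth way to write the same call. This is a small sketch under that version assumption; the variable names are illustrative.

```r
library(purrr)

# Base R lambda shorthand (assumes R >= 4.1)
squares_lambda <- map_dbl(1:5, \(x) x^2)

# Formula version from the text, for comparison
squares_formula <- map_dbl(1:5, ~ .x^2)

# Both calls produce the same double vector
identical(squares_lambda, squares_formula)
```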
map_chr() for Character Vectors
map_chr() guarantees character output, useful for extracting names, labels, or formatted strings:
# Extract file extensions
files <- list(
  "data/sales_2024.csv",
  "reports/summary.xlsx",
  "scripts/analysis.R"
)
map_chr(files, ~ tools::file_ext(.x))
# [1] "csv" "xlsx" "R"
# Format output strings
products <- list(
  list(name = "Widget", price = 29.99, stock = 150),
  list(name = "Gadget", price = 49.99, stock = 75),
  list(name = "Doohickey", price = 19.99, stock = 200)
)
map_chr(products, ~ sprintf("%s: $%.2f (%d in stock)",
                            .x$name, .x$price, .x$stock))
# [1] "Widget: $29.99 (150 in stock)"
# [2] "Gadget: $49.99 (75 in stock)"
# [3] "Doohickey: $19.99 (200 in stock)"
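Because map_chr() expects each result to already be a string, it is safest to do the conversion inside the mapped function, as the sprintf() call above does. As far as I understand the purrr 1.0.0 release notes, relying on map_chr() to turn numbers into strings automatically is deprecated. The sketch below uses a made-up `prices` list to show the explicit form.

```r
library(purrr)

# Hypothetical numeric inputs
prices <- list(29.99, 49.99, 19.99)

# Convert inside the mapped function so each result is already character
labels <- map_chr(prices, ~ sprintf("$%.2f", .x))
labels
```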
map_dfr() and map_dfc() for Data Frames
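map_df() is an alias of map_dfr() (row-bind), and map_dfc() is its column-binding counterpart. All three still work, but purrr 1.0.0 superseded them in favor of map() followed by list_rbind() or list_cbind(). These functions iterate and combine results into a single data frame: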
library(dplyr)
# Simulate reading multiple CSV files
file_paths <- c("data1.csv", "data2.csv", "data3.csv")
# Simulate file reading with sample data
read_simulated <- function(path) {
  data.frame(
    file = basename(path),
    value = rnorm(3),
    timestamp = Sys.time()
  )
}
# Row-bind results
combined <- map_dfr(file_paths, read_simulated)
combined
# file value timestamp
# 1 data1.csv 0.5855288 2024-01-15 10:30:00
# 2 data1.csv -0.1093033 2024-01-15 10:30:00
# 3 data1.csv 1.5878453 2024-01-15 10:30:00
# 4 data2.csv -0.4456620 2024-01-15 10:30:00
# ...
For column-binding, use map_dfc():
# Create multiple summary columns
metrics <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  median = function(x) median(x, na.rm = TRUE),
  sd = function(x) sd(x, na.rm = TRUE)
)
# Each element is a function, so it can be called on the column directly
map_dfc(metrics, ~ .x(mtcars$mpg))
# mean median sd
# 1 20.1 19.2 6.03
Modern Alternative: list_rbind() and list_cbind()
The tidyverse now recommends map() combined with list_rbind() or list_cbind() from purrr 1.0.0+:
# Modern approach
file_paths %>%
  map(read_simulated) %>%
  list_rbind()
# Add an ID column; the input is unnamed, so source_id holds the list index
file_paths %>%
  map(read_simulated) %>%
  list_rbind(names_to = "source_id")
This separation provides more flexibility and clearer intent.
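One example of that flexibility: naming the input before mapping makes names_to record the file name instead of a position. The sketch below is an illustration, not the article's code; `read_stub` is a hypothetical stand-in for a real reader and returns fixed values so the result is reproducible.

```r
library(purrr)

# Hypothetical reader: returns a small fixed data frame per path
read_stub <- function(path) {
  data.frame(file = basename(path), value = c(1, 2, 3))
}

file_paths <- c("data1.csv", "data2.csv", "data3.csv")

combined <- file_paths %>%
  set_names() %>%          # name each element after its own value
  map(read_stub) %>%       # names carry through to the result list
  list_rbind(names_to = "source_id")

unique(combined$source_id)
```

set_names() with no second argument uses the values themselves as names, which is why the paths end up in the source_id column.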
Practical Pattern: API Response Processing
Type-specific mapping excels when processing API responses:
# Simulate API responses
api_responses <- list(
  list(id = 1, status = "success", value = 42, message = "OK"),
  list(id = 2, status = "success", value = 38, message = "OK"),
  list(id = 3, status = "error", value = NULL, message = "Timeout")
)
# Extract specific fields with type guarantees
ids <- map_dbl(api_responses, "id")
statuses <- map_chr(api_responses, "status")
# Handle NULL values with defaults
values <- map_dbl(api_responses, "value", .default = NA_real_)
# Combine into analysis-ready data frame
data.frame(
  id = ids,
  status = statuses,
  value = values
)
# id status value
# 1 1 success 42
# 2 2 success 38
# 3 3 error NA
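To see why .default matters here, the sketch below runs the same extraction with and without it on a smaller, made-up `responses` list. The tryCatch() wrapper is only there to capture the failure as a string for comparison.

```r
library(purrr)

# Hypothetical responses; the second one has no value
responses <- list(
  list(id = 1, value = 42),
  list(id = 2, value = NULL)
)

# Without .default, the NULL entry has length 0 and map_dbl() signals an error
no_default <- tryCatch(
  map_dbl(responses, "value"),
  error = function(e) "error"
)

# With .default, the NULL entry is replaced by NA
with_default <- map_dbl(responses, "value", .default = NA_real_)
```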
Error Handling with possibly() and safely()
Wrap functions with possibly() to handle failures gracefully:
# Function that might fail
risky_calc <- function(x) {
  if (x < 0) stop("Negative values not allowed")
  sqrt(x)
}
numbers <- c(4, 9, -1, 16)
# Without protection - fails completely
# map_dbl(numbers, risky_calc) # Error stops execution
# With possibly() - returns default on error
safe_calc <- possibly(risky_calc, otherwise = NA_real_)
map_dbl(numbers, safe_calc)
# [1] 2 3 NA 4
For detailed error information, use safely():
results <- map(numbers, safely(risky_calc))
successes <- map_dbl(results, "result", .default = NA_real_)
errors <- map_chr(results, ~ if(is.null(.x$error)) NA_character_ else .x$error$message)
data.frame(input = numbers, result = successes, error = errors)
# input result error
# 1 4 2 <NA>
# 2 9 3 <NA>
# 3 -1 NA Negative values not allowed
# 4 16 4 <NA>
Performance Considerations
Type-specific mappers add minimal overhead compared to base R:
library(bench)
large_list <- as.list(1:10000)
mark(
  base = vapply(large_list, sqrt, numeric(1)),
  purrr = map_dbl(large_list, sqrt),
  iterations = 100
)
# expression min median itr/sec
# 1 base 2.1ms 2.3ms 423
# 2 purrr 2.4ms 2.6ms 378
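The overhead is small, and it buys cleaner syntax plus purrr’s formula and extractor shorthands. vapply() also checks each result against FUN.VALUE, so for performance-critical code it remains the fastest type-checked option. Exact timings depend on the machine and package versions.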
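For comparison, vapply() performs its own type check through the FUN.VALUE template. The base R sketch below (with a made-up `mixed` list) shows it rejecting a character element in the same way map_dbl() would.

```r
# Base R only: vapply() validates each result against FUN.VALUE
mixed <- list(1, 2, "3")

vapply_outcome <- tryCatch(
  vapply(mixed, identity, numeric(1)),
  error = function(e) "error"
)

# "error": the character element does not match numeric(1)
vapply_outcome
```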
Nested Mapping Operations
Combine type-specific mappers for complex transformations:
# Nested list structure
nested_data <- list(
  group_a = list(x = 1:3, y = 4:6),
  group_b = list(x = 7:9, y = 10:12)
)
# Calculate means for each sublist
map(nested_data, ~ map_dbl(.x, mean))
# $group_a
# x y
# 2 5
#
# $group_b
# x y
# 8 11
# Flatten to a named vector; unlist() joins outer and inner names
# (flatten_dbl() is deprecated as of purrr 1.0.0)
nested_data %>%
  map(~ map_dbl(.x, mean)) %>%
  unlist()
# group_a.x group_a.y group_b.x group_b.y
# 2 5 8 11
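When the nesting should be preserved rather than flattened, purrr's map_depth() is one alternative to writing the inner map by hand. The sketch below repeats the nested_data definition so it runs on its own; `depth_means` is an illustrative name.

```r
library(purrr)

nested_data <- list(
  group_a = list(x = 1:3, y = 4:6),
  group_b = list(x = 7:9, y = 10:12)
)

# Apply mean() to every element two levels down; the list structure is kept
depth_means <- map_depth(nested_data, 2, mean)

depth_means$group_b$y
```

Unlike the map_dbl() version, this returns a list of lists, so it does not enforce a numeric result at each leaf.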
The purrr type-specific mappers transform functional programming in R from error-prone list manipulation into type-safe, expressive data transformations. Use map_dbl() and map_chr() for guaranteed atomic vectors, and combine map() with list_rbind() for modern data frame operations.