R purrr - map_df()/map_dbl()/map_chr()

Key Insights

  • The map_*() family in purrr provides type-stable alternatives to base R’s lapply(), guaranteeing output types and failing fast when expectations aren’t met
  • map_df() is an alias for map_dfr(); both are superseded as of purrr 1.0.0 in favor of map() followed by list_rbind(). They simplify the common pattern of iterating and combining results into data frames without manual do.call(rbind, ...) operations
  • Type-specific mappers like map_dbl() and map_chr() enforce output constraints at runtime, catching type mismatches immediately rather than propagating incorrect data through your pipeline

Why Type-Specific Mapping Matters

Base R’s lapply() always returns a list. You then coerce it to your desired type, often discovering type mismatches late in execution. The purrr approach enforces types immediately:

library(purrr)

# Base R approach - no type safety
numbers <- list(1, 2, 3, "4")
result <- lapply(numbers, identity)
unlist(result)  # Silently converts to character: "1" "2" "3" "4"

# purrr approach - fails fast
map_dbl(numbers, identity)
# Error: Can't coerce element 4 from a character to a double
# (exact wording varies by purrr version)

This fail-fast behavior prevents bugs from propagating through your data pipeline.
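
As a minimal sketch of what failing fast looks like in a script, the snippet below captures the error with try() instead of letting it stop execution. The readings list is a made-up input, and vapply() is shown only as the closest base R analogue, not as something the example above relies on.

```r
library(purrr)

# Hypothetical input: a string slipped in among the numbers upstream
readings <- list(1.5, 2.5, "3.5")

# map_dbl() refuses the string; try() captures the error for inspection
attempt <- try(map_dbl(readings, identity), silent = TRUE)
purrr_failed <- inherits(attempt, "try-error")

# vapply() with a numeric(1) template rejects the same input
base_attempt <- try(vapply(readings, identity, numeric(1)), silent = TRUE)
base_failed <- inherits(base_attempt, "try-error")

# unlist() is the permissive path: it coerces everything to character
coerced <- unlist(readings)
```

Both strict calls fail at the offending element, while unlist() hands back a character vector that only causes trouble later.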

map_dbl() for Numeric Vectors

map_dbl() iterates over a list or vector and returns a double vector with one element per input. Each call must return a single number; anything else is an error. It’s ideal for extracting numeric properties or performing calculations:

library(purrr)

# Extract numeric summaries from nested data
sales_data <- list(
  q1 = c(100, 150, 200),
  q2 = c(120, 180, 210),
  q3 = c(140, 190, 220),
  q4 = c(160, 200, 240)
)

# Calculate mean sales per quarter
quarterly_means <- map_dbl(sales_data, mean)
quarterly_means
#       q1       q2       q3       q4 
# 150.0000 170.0000 183.3333 200.0000

# Works with data frames too
mtcars_list <- split(mtcars, mtcars$cyl)
map_dbl(mtcars_list, ~ mean(.x$mpg))
#        4        6        8 
# 26.66364 19.74286 15.10000

map_dbl() accepts a formula shorthand (~ .x^2), an ordinary function, or a function followed by extra arguments that are passed along after each element. In a formula, .x stands for the current element:

# Three equivalent approaches
map_dbl(1:5, ~ .x^2)
map_dbl(1:5, function(x) x^2)
map_dbl(1:5, `^`, 2)
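
A fourth spelling is worth a quick sketch: the base R lambda shorthand \(x). This assumes R 4.1 or later, where that syntax was introduced; the variable names are illustrative only.

```r
library(purrr)

# Formula shorthand, as used in the examples above
squares_formula <- map_dbl(1:5, ~ .x^2)

# Base R lambda shorthand (assumes R >= 4.1)
squares_lambda <- map_dbl(1:5, \(x) x^2)

# Both produce the same double vector
same_result <- identical(squares_formula, squares_lambda)
```

The lambda form names its argument explicitly, which can read more clearly once the body grows beyond a single expression.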

map_chr() for Character Vectors

map_chr() guarantees character output, useful for extracting names, labels, or formatted strings:

# Extract file extensions
files <- list(
  "data/sales_2024.csv",
  "reports/summary.xlsx",
  "scripts/analysis.R"
)

map_chr(files, ~ tools::file_ext(.x))
# [1] "csv"  "xlsx" "R"

# Format output strings
products <- list(
  list(name = "Widget", price = 29.99, stock = 150),
  list(name = "Gadget", price = 49.99, stock = 75),
  list(name = "Doohickey", price = 19.99, stock = 200)
)

map_chr(products, ~ sprintf("%s: $%.2f (%d in stock)", 
                            .x$name, .x$price, .x$stock))
# [1] "Widget: $29.99 (150 in stock)"    
# [2] "Gadget: $49.99 (75 in stock)"     
# [3] "Doohickey: $19.99 (200 in stock)"
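
One caveat, sketched under the assumption of purrr 1.0.0 or later: as far as I know, map_chr() in those versions warns when asked to turn numbers into strings implicitly, so converting explicitly inside the function is the safer habit. The prices list below is made up for illustration.

```r
library(purrr)

# Hypothetical numeric inputs that should end up as text labels
prices <- list(29.99, 49.99, 19.99)

# Convert inside the function so map_chr() receives real strings
labels <- map_chr(prices, function(p) format(p, nsmall = 2))
```

Because format() already returns a string, map_chr() has nothing to coerce and the result is a plain character vector.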

map_dfr() and map_dfc() for Data Frames

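map_df() is simply an alias for map_dfr() (row-bind); map_dfc() is the column-bind counterpart. All three still work but are superseded as of purrr 1.0.0, and they need dplyr installed for the binding step. These functions iterate and combine results into a single data frame: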

library(dplyr)

# Simulate reading multiple CSV files
file_paths <- c("data1.csv", "data2.csv", "data3.csv")

# Simulate file reading with sample data
read_simulated <- function(path) {
  data.frame(
    file = basename(path),
    value = rnorm(3),
    timestamp = Sys.time()
  )
}

# Row-bind results
combined <- map_dfr(file_paths, read_simulated)
combined
# Example output (values and timestamps differ on every run)
#         file      value           timestamp
# 1 data1.csv  0.5855288 2024-01-15 10:30:00
# 2 data1.csv -0.1093033 2024-01-15 10:30:00
# 3 data1.csv  1.5878453 2024-01-15 10:30:00
# 4 data2.csv -0.4456620 2024-01-15 10:30:00
# ...

For column-binding, use map_dfc():

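# Create multiple summary columns
metrics <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  median = function(x) median(x, na.rm = TRUE),
  sd = function(x) sd(x, na.rm = TRUE)
)

# Each element of metrics is a function, so .x can be called directly
map_dfc(metrics, ~ .x(mtcars$mpg))
# # A tibble: 1 × 3
#    mean median    sd
#   <dbl>  <dbl> <dbl>
# 1  20.1   19.2  6.03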

Modern Alternative: list_rbind() and list_cbind()

The tidyverse now recommends map() combined with list_rbind() or list_cbind() from purrr 1.0.0+:

# Modern approach
file_paths %>%
  map(read_simulated) %>%
  list_rbind()

# Add ID column to track source
# (set_names() names each path after itself, so names_to records the file path)
file_paths %>%
  set_names() %>%
  map(read_simulated) %>%
  list_rbind(names_to = "source_id")

This separation provides more flexibility and clearer intent.
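
To make the two-step idea concrete, here is a small self-contained sketch, assuming purrr 1.0.0 or later. The batches list and its column names are hypothetical stand-ins for real files.

```r
library(purrr)

# Hypothetical in-memory "files"; a named list stands in for real paths
batches <- list(
  jan = data.frame(value = c(1, 2)),
  feb = data.frame(value = c(3, 4))
)

# Step 1: iterate. Each element is still a separate data frame.
doubled <- map(batches, function(df) transform(df, doubled = value * 2))

# Step 2: combine. names_to stores the list names in a new column.
combined <- list_rbind(doubled, names_to = "source_id")
```

Keeping the intermediate list around means you can inspect or filter individual pieces before binding, which the one-shot map_dfr() does not allow.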

Practical Pattern: API Response Processing

Type-specific mapping excels when processing API responses:

# Simulate API responses
api_responses <- list(
  list(id = 1, status = "success", value = 42, message = "OK"),
  list(id = 2, status = "success", value = 38, message = "OK"),
  list(id = 3, status = "error", value = NULL, message = "Timeout")
)

# Extract specific fields with type guarantees
ids <- map_dbl(api_responses, "id")
statuses <- map_chr(api_responses, "status")

# Handle NULL values with defaults
values <- map_dbl(api_responses, "value", .default = NA_real_)

# Combine into analysis-ready data frame
data.frame(
  id = ids,
  status = statuses,
  value = values
)
#   id  status value
# 1  1 success    42
# 2  2 success    38
# 3  3   error    NA
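
The role of .default is easier to see side by side. This sketch uses a trimmed-down, hypothetical version of the responses above and captures the strict failure with try() so the script keeps running.

```r
library(purrr)

# Hypothetical responses: the second one has no usable value
responses <- list(
  list(id = 1, value = 42),
  list(id = 2, value = NULL)
)

# Without .default, the NULL cannot become a single double, so map_dbl() errors
strict <- try(map_dbl(responses, "value"), silent = TRUE)
strict_failed <- inherits(strict, "try-error")

# With .default, the missing value becomes NA and the vector keeps its length
lenient <- map_dbl(responses, "value", .default = NA_real_)
```

Which behavior you want depends on whether a missing field is a bug to surface or an expected gap to carry forward as NA.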

Error Handling with possibly() and safely()

Wrap functions with possibly() to handle failures gracefully:

# Function that might fail
risky_calc <- function(x) {
  if (x < 0) stop("Negative values not allowed")
  sqrt(x)
}

numbers <- c(4, 9, -1, 16)

# Without protection - fails completely
# map_dbl(numbers, risky_calc)  # Error stops execution

# With possibly() - returns default on error
safe_calc <- possibly(risky_calc, otherwise = NA_real_)
map_dbl(numbers, safe_calc)
# [1]  2  3 NA  4

For detailed error information, use safely():

results <- map(numbers, safely(risky_calc))
successes <- map_dbl(results, "result", .default = NA_real_)
errors <- map_chr(results, ~ if(is.null(.x$error)) NA_character_ else .x$error$message)

data.frame(input = numbers, result = successes, error = errors)
#   input result                         error
# 1     4      2                          <NA>
# 2     9      3                          <NA>
# 3    -1     NA Negative values not allowed
# 4    16      4                          <NA>
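
When you only need to know which inputs failed, a logical mapper over the safely() results is enough. The following is one possible sketch using map_lgl(), another member of the same family; it redefines risky_calc() so that it runs on its own.

```r
library(purrr)

# Same function as above, repeated so this sketch is self-contained
risky_calc <- function(x) {
  if (x < 0) stop("Negative values not allowed")
  sqrt(x)
}

inputs <- c(4, 9, -1, 16)
results <- map(inputs, safely(risky_calc))

# TRUE wherever safely() recorded an error instead of a result
failed <- map_lgl(results, function(r) !is.null(r$error))

# Positions of the inputs that need attention
failed_positions <- which(failed)
```

From here, inputs[failed] gives the offending values for logging or a retry.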

Performance Considerations

Type-specific mappers add modest overhead compared to base R’s vapply(). The timings below are illustrative and depend on the machine and package versions:

library(bench)

large_list <- as.list(1:10000)

mark(
  base = vapply(large_list, sqrt, numeric(1)),
  purrr = map_dbl(large_list, sqrt),
  iterations = 100
)
#   expression   min median  itr/sec
# 1 base       2.1ms  2.3ms     423
# 2 purrr      2.4ms  2.6ms     378

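The slight overhead buys cleaner syntax and more informative errors; vapply() offers the same type guarantee with a little less overhead. When the function is already vectorized, as sqrt() is, calling it directly on an atomic vector avoids the per-element cost of both.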
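
As a rough illustration of that last point, the sketch below compares a direct vectorized call with the mapped version. It checks only that the results agree; it makes no timing claims.

```r
library(purrr)

x <- 1:10000

# Vectorized call: one pass in compiled code, no per-element function call
vectorised <- sqrt(x)

# Mapped call: the same values, computed one element at a time
mapped <- map_dbl(x, sqrt)

results_match <- isTRUE(all.equal(vectorised, mapped))
```

Reach for map_dbl() when the per-element function is not vectorized, for example when each element is itself a vector or a data frame.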

Nested Mapping Operations

Combine type-specific mappers for complex transformations:

# Nested list structure
nested_data <- list(
  group_a = list(x = 1:3, y = 4:6),
  group_b = list(x = 7:9, y = 10:12)
)

# Calculate means for each sublist
map(nested_data, ~ map_dbl(.x, mean))
# $group_a
# x y 
# 2 5 
# 
# $group_b
# x  y 
# 8 11

# Flatten to named vector
# (unlist() builds the compound names; flatten_dbl() is superseded as of purrr 1.0.0)
nested_data %>%
  map(~ map_dbl(.x, mean)) %>%
  unlist()
# group_a.x group_a.y group_b.x group_b.y 
#         2         5         8        11

The purrr type-specific mappers transform functional programming in R from error-prone list manipulation into type-safe, expressive data transformations. Use map_dbl() and map_chr() for guaranteed atomic vectors, and combine map() with list_rbind() for modern data frame operations.
