R purrr - map2() and pmap() - Multiple Inputs
Key Insights
- map2() and pmap() extend purrr's functional programming capabilities to handle multiple input vectors simultaneously, eliminating nested loops and improving code readability
- map2() iterates over two inputs in parallel while pmap() handles any number of inputs through lists, making complex multi-parameter operations straightforward
- Type-specific variants like map2_dbl() and pmap_chr() provide type safety by checking at run time that every result has the declared type, and they return an atomic vector instead of a list
Understanding Multiple Input Iteration
While map() handles single-input iteration elegantly, real-world data operations frequently require coordinating multiple inputs. Consider calculating weighted averages, combining data from multiple sources, or applying functions with multiple parameters. The purrr package provides map2() for two inputs and pmap() for three or more.
library(purrr)
# Single input with map()
values <- list(1:3, 4:6, 7:9)
map(values, mean)
# Two inputs with map2()
x <- list(1, 10, 100)
y <- list(1, 2, 3)
map2(x, y, `+`)
# [[1]] [1] 2
# [[2]] [1] 12
# [[3]] [1] 103
The function passed to map2() receives two arguments corresponding to elements from each input vector at the same position. Both inputs must have the same length, with one exception: a length-1 input is recycled to match the other. Any other length mismatch raises an error.
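To make the position-by-position pairing concrete, the following sketch compares map2() with a hand-written index loop. The variable names via_loop and via_map2 are illustrative only; the loop is a mental model of the behavior, not a description of how purrr is implemented.

```r
library(purrr)

x <- c(1, 10, 100)
y <- c(1, 2, 3)

# Explicit loop: pair x[[i]] with y[[i]] at each position
via_loop <- vector("list", length(x))
for (i in seq_along(x)) {
  via_loop[[i]] <- x[[i]] + y[[i]]
}

# The same pairing expressed with map2()
via_map2 <- map2(x, y, `+`)

identical(via_loop, via_map2)
```

The map2() version removes the index bookkeeping and the pre-allocation step.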
map2() for Paired Operations
map2() excels when you need to apply operations across two synchronized vectors. The syntax is map2(.x, .y, .f) where .f receives elements from .x and .y as its first and second arguments.
# Calculate weighted values
values <- c(100, 200, 150, 300)
weights <- c(0.2, 0.3, 0.25, 0.25)
weighted_values <- map2_dbl(values, weights, `*`)
sum(weighted_values)
# [1] 192.5
# Generate customized messages
names <- c("Alice", "Bob", "Charlie")
scores <- c(95, 87, 92)
map2_chr(names, scores, function(name, score) {
  sprintf("%s scored %d points", name, score)
})
# [1] "Alice scored 95 points"
# [2] "Bob scored 87 points"
# [3] "Charlie scored 92 points"
The type-specific variants (map2_dbl(), map2_chr(), map2_lgl(), map2_int()) ensure type safety. If the function returns an incompatible type, purrr throws an error immediately rather than producing unexpected results.
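A small sketch of what that type check looks like in practice. The names as_list, as_dbl, and bad are illustrative, and tryCatch() is used only so that the failing call can be inspected without stopping the script.

```r
library(purrr)

x <- c(1, 2, 3)
y <- c(4, 5, 6)

# map2() always returns a list, whatever the function produces
as_list <- map2(x, y, function(a, b) a + b)

# map2_dbl() returns a plain double vector
as_dbl <- map2_dbl(x, y, function(a, b) a + b)

# Returning a character value breaks the _dbl contract, so purrr raises
# an error; tryCatch() captures the condition instead of aborting
bad <- tryCatch(
  map2_dbl(x, y, function(a, b) paste(a, b)),
  error = function(e) e
)

is.list(as_list)
is.double(as_dbl)
inherits(bad, "error")
```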
Anonymous Functions and Formula Syntax
The purrr package supports concise formula syntax using ~ with .x and .y as placeholders for the two inputs.
# Traditional anonymous function
map2_dbl(values, weights, function(v, w) v * w)
# Formula syntax - more concise
map2_dbl(values, weights, ~ .x * .y)
# Complex calculation with formula syntax
prices <- c(29.99, 49.99, 19.99)
quantities <- c(3, 2, 5)
map2_dbl(prices, quantities, ~ {
  subtotal <- .x * .y
  tax <- subtotal * 0.08
  subtotal + tax
})
# [1]  97.1676 107.9784 107.9460
Formula syntax reduces boilerplate while maintaining readability for simple to moderately complex operations.
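As a quick check that the different spellings are interchangeable, the sketch below runs the same weighted multiplication three ways. The third form uses base R's backslash lambda, which assumes R 4.1.0 or later; the result names are illustrative.

```r
library(purrr)

values <- c(100, 200, 150, 300)
weights <- c(0.2, 0.3, 0.25, 0.25)

# Full anonymous function
with_function <- map2_dbl(values, weights, function(v, w) v * w)

# purrr formula shorthand with .x and .y
with_formula <- map2_dbl(values, weights, ~ .x * .y)

# Base R lambda shorthand (assumes R >= 4.1.0)
with_lambda <- map2_dbl(values, weights, \(v, w) v * w)

identical(with_function, with_formula)
identical(with_function, with_lambda)
```

The lambda form keeps descriptive argument names, which can help once the body grows beyond a single expression.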
pmap() for Multiple Inputs
When you need more than two inputs, pmap() accepts a list of vectors and passes corresponding elements to your function. The function receives arguments in the order they appear in the input list.
# Three inputs: calculate compound interest
principals <- c(1000, 5000, 10000)
rates <- c(0.05, 0.04, 0.045)
years <- c(5, 10, 7)
inputs <- list(principals, rates, years)
pmap_dbl(inputs, function(p, r, t) {
  p * (1 + r)^t
})
# [1]  1276.282  7401.221 13608.618
# Using formula syntax with ..1, ..2, ..3
pmap_dbl(inputs, ~ ..1 * (1 + ..2)^..3)
The ..1, ..2, ..3 notation in formula syntax refers to the first, second, and third elements from the input list. This works for any number of inputs.
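The following sketch cross-checks the positional and ..N forms against each other and against plain vectorised arithmetic. The vectorised line is included only as an independent check; the result names are illustrative.

```r
library(purrr)

principals <- c(1000, 5000, 10000)
rates <- c(0.05, 0.04, 0.045)
years <- c(5, 10, 7)
inputs <- list(principals, rates, years)

# Arguments matched by position in the input list
by_position <- pmap_dbl(inputs, function(p, r, t) p * (1 + r)^t)

# The same calculation with ..1, ..2, ..3 placeholders
by_dots <- pmap_dbl(inputs, ~ ..1 * (1 + ..2)^..3)

# Vectorised base R arithmetic as an independent cross-check
vectorised <- principals * (1 + rates)^years

all.equal(by_position, by_dots)
all.equal(by_position, vectorised)
```

When the operation is already vectorised, as here, the base R expression is the simpler choice; pmap() earns its place when the function only works on one set of arguments at a time.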
Named Arguments with pmap()
A powerful feature of pmap() is using named lists or data frames as input, allowing you to pass arguments by name rather than position.
# Create a data frame of parameters
params <- data.frame(
  mean = c(0, 10, 100),
  sd = c(1, 5, 15),
  n = c(10, 20, 30)
)
# Generate random normal samples
set.seed(123)
samples <- pmap(params, rnorm)
# Check first sample
samples[[1]]
# [1] -0.56047565 -0.23017749 1.55870831 ...
# Named list approach: one named element per argument, each holding a vector
calculations <- list(
  x = c(10, 20, 15),
  y = c(5, 4, 3),
  operation = c("add", "multiply", "divide")
)
pmap_dbl(calculations, function(x, y, operation) {
  switch(operation,
    add = x + y,
    multiply = x * y,
    divide = x / y
  )
})
# [1] 15 80 5
Named arguments make code self-documenting and reduce errors from incorrect argument ordering.
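One consequence of name-based matching is worth a sketch: every column of a data frame is passed as a named argument, so a function that uses only some columns needs ... to absorb the rest. The data frame and the label column below are hypothetical, made up for illustration.

```r
library(purrr)

# Hypothetical parameter table with an extra column the function ignores
params <- data.frame(
  label = c("a", "b"),
  mean = c(0, 10),
  sd = c(1, 5)
)

# mean and sd are matched by name; ... swallows the unused label column
described <- pmap_chr(params, function(mean, sd, ...) {
  sprintf("N(%g, %g)", mean, sd)
})

described
# [1] "N(0, 1)"  "N(10, 5)"
```

Without the ... in the signature, the call would fail with an unused-argument error because label has nowhere to go.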
Practical Application: Data Transformation Pipeline
Here’s a realistic scenario combining multiple inputs to transform and validate data.
library(dplyr)
# Raw data from different sources
ids <- c("A001", "A002", "A003", "A004")
raw_values <- c("100.5", "200.3", "invalid", "150.7")
categories <- c("premium", "standard", "premium", "standard")
multipliers <- c(1.2, 1.0, 1.2, 1.0)
# Process with validation and transformation
results <- pmap_dfr(
  list(ids, raw_values, categories, multipliers),
  function(id, value, category, mult) {
    # Parse and validate
    numeric_value <- suppressWarnings(as.numeric(value))
    if (is.na(numeric_value)) {
      return(data.frame(
        id = id,
        status = "error",
        final_value = NA_real_,
        category = category
      ))
    }
    # Apply business logic
    adjusted <- numeric_value * mult
    discount <- if (category == "premium") 0.1 else 0.05
    final <- adjusted * (1 - discount)
    data.frame(
      id = id,
      status = "success",
      final_value = round(final, 2),
      category = category
    )
  }
)
print(results)
# id status final_value category
# 1 A001 success 108.54 premium
# 2 A002 success 190.29 standard
# 3 A003 error NA premium
# 4 A004 success 143.17 standard
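The row-binding step can also be written with pmap() followed by list_rbind(). This is a minimal sketch that assumes purrr 1.0.0 or later, where pmap_dfr() is marked as superseded in favor of this pattern; the two-row input and the doubled column are illustrative, not part of the pipeline above.

```r
library(purrr)

ids <- c("A001", "A002")
values <- c(100.5, 200.3)

# Build one single-row data frame per set of inputs
rows <- pmap(
  list(id = ids, value = values),
  function(id, value) {
    data.frame(id = id, doubled = value * 2)
  }
)

# Stack the rows into a single data frame (assumes purrr >= 1.0.0)
combined <- list_rbind(rows)

nrow(combined)
```

Separating the mapping step from the binding step also makes it easy to inspect the intermediate list when a row looks wrong.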
Performance Considerations and Error Handling
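Type-specific variants fail fast with a clear error when a result has the wrong type. For failures inside the mapped function itself, possibly() wraps the function so that an error yields a fallback value instead of aborting the whole iteration. For large workloads, the furrr package offers parallel counterparts such as future_map2_dbl(). For cheap per-element work like the addition below, the cost of dispatching to workers typically outweighs the gain, so measure before switching.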
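# Error handling with possibly(): x / 0 yields Inf rather than an error in R,
# so the wrapped function signals an error explicitly for zero denominators
safe_divide <- possibly(function(x, y) {
  if (y == 0) stop("division by zero")
  x / y
}, otherwise = NA_real_)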
numerators <- c(10, 20, 30, 40)
denominators <- c(2, 0, 5, 0)
map2_dbl(numerators, denominators, safe_divide)
# [1] 5 NA 6 NA
# Parallel processing with furrr
library(furrr)
plan(multisession, workers = 4)
large_x <- 1:10000
large_y <- 10001:20000
# Standard map2
system.time(map2_dbl(large_x, large_y, ~ .x + .y))
# Parallel version
system.time(future_map2_dbl(large_x, large_y, ~ .x + .y))
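When the error itself matters, not just a fallback value, purrr's safely() is a related wrapper that keeps both. The sketch below is an illustration under that assumption; safe_log_ratio and the input values are hypothetical.

```r
library(purrr)

# safely() returns a function whose result is list(result = , error = )
safe_log_ratio <- safely(function(x, y) {
  if (y == 0) stop("zero denominator")
  log(x / y)
})

results <- map2(c(10, 20), c(2, 0), safe_log_ratio)

# Pull out the error slot of each element; NULL means the call succeeded
errors <- map(results, "error")
succeeded <- map_lgl(errors, is.null)

succeeded
# [1]  TRUE FALSE
```

The succeeded vector can then be used to separate clean results from the cases that need attention.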
Working with Different Length Inputs
map2() and pmap() require inputs of equal length; only length-1 inputs are recycled. Any other mismatch is an error rather than a silent truncation, and understanding this behavior prevents subtle bugs.
# Mismatched lengths
x <- 1:5
y <- 1:3
map2(x, y, c)
# Error: purrr cannot recycle a length-5 input (.x) to match a length-3 input (.y)
# Intentional recycling with rep()
short_vec <- c(10, 20)
long_vec <- 1:6
map2_dbl(long_vec, rep(short_vec, length.out = length(long_vec)), `*`)
# [1] 10 40 30 80 50 120
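The length-1 case is the one form of recycling purrr performs on its own. A brief sketch, with illustrative variable names, showing it for both map2() and pmap(), plus a captured error for a pmap() call whose inputs cannot be reconciled:

```r
library(purrr)

x <- 1:5

# A length-1 input is paired with every element of the longer input
scaled <- map2_dbl(x, 10, function(a, b) a * b)

# pmap() applies the same rule across all of its inputs
shifted <- pmap_dbl(list(x, 100, 0.5), function(a, b, c) a + b + c)

# Lengths 5 and 2 are incompatible; capture the error to inspect it
mismatch <- tryCatch(
  pmap(list(x, c(1, 2)), function(a, b) a + b),
  error = function(e) e
)

scaled
# [1] 10 20 30 40 50
inherits(mismatch, "error")
```

Anything beyond the length-1 case has to be spelled out explicitly, as the rep() call above does.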
The map2() and pmap() functions bring functional programming elegance to multi-input operations. They eliminate index management, reduce cognitive load, and produce more maintainable code than traditional loops. Type-specific variants add safety, while formula syntax keeps code concise. For complex data processing pipelines requiring multiple coordinated inputs, these functions are indispensable tools in the R programmer’s toolkit.