R purrr - map2() and pmap() - Multiple Inputs

Key Insights

  • map2() and pmap() extend purrr’s functional programming capabilities to handle multiple input vectors simultaneously, replacing index-based loops and improving code readability
  • map2() iterates over two inputs in parallel while pmap() handles any number of inputs through lists, making complex multi-parameter operations straightforward
  • Type-specific variants like map2_dbl() and pmap_chr() guarantee the output type. R has no compile step, so the check happens at run time: a result that cannot be safely converted to the declared type raises an error

Understanding Multiple Input Iteration

While map() handles single-input iteration elegantly, real-world data operations frequently require coordinating multiple inputs. Consider calculating weighted averages, combining data from multiple sources, or applying functions with multiple parameters. The purrr package provides map2() for exactly two inputs and pmap() for any number of inputs, typically three or more.

library(purrr)

# Single input with map()
values <- list(1:3, 4:6, 7:9)
map(values, mean)

# Two inputs with map2()
x <- list(1, 10, 100)
y <- list(1, 2, 3)
map2(x, y, `+`)
# [[1]] [1] 2
# [[2]] [1] 12
# [[3]] [1] 103

The function passed to map2() receives two arguments corresponding to elements from each input vector at the same position. The inputs must have the same length; the one exception is a length-1 input, which is recycled to match the other. Any other length mismatch raises an error.
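
To make the parallel iteration concrete, here is a minimal sketch of a hand-written loop that does the same work as the map2() call above. The name `manual` is only illustrative; this is not how purrr implements it internally.

```r
library(purrr)

x <- list(1, 10, 100)
y <- list(1, 2, 3)

# One index is shared by both inputs, so position i of x meets position i of y
manual <- vector("list", length(x))
for (i in seq_along(x)) {
  manual[[i]] <- x[[i]] + y[[i]]
}

# The loop and map2() produce the same list
identical(manual, map2(x, y, `+`))
# [1] TRUE
```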

map2() for Paired Operations

map2() excels when you need to apply operations across two synchronized vectors. The syntax is map2(.x, .y, .f) where .f receives elements from .x and .y as its first and second arguments.

# Calculate weighted values
values <- c(100, 200, 150, 300)
weights <- c(0.2, 0.3, 0.25, 0.25)

weighted_values <- map2_dbl(values, weights, `*`)
sum(weighted_values)
# [1] 192.5

# Generate customized messages
names <- c("Alice", "Bob", "Charlie")
scores <- c(95, 87, 92)

map2_chr(names, scores, function(name, score) {
  sprintf("%s scored %d points", name, score)
})
# [1] "Alice scored 95 points"   
# [2] "Bob scored 87 points"     
# [3] "Charlie scored 92 points"

The type-specific variants (map2_dbl(), map2_chr(), map2_lgl(), map2_int()) ensure type safety. If the function returns an incompatible type, purrr throws an error immediately rather than producing unexpected results.
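
A small sketch of that behavior, using illustrative vectors `labels` and `counts` that are not part of the examples above. tryCatch() is used only so the script keeps running after the error.

```r
library(purrr)

labels <- c("a", "b", "c")
counts <- c(1, 2, 3)

# paste() returns character, so map2_dbl() refuses the result;
# tryCatch() turns the error into a plain string for inspection
outcome <- tryCatch(
  map2_dbl(labels, counts, paste),
  error = function(e) "map2_dbl() raised an error"
)
outcome
# [1] "map2_dbl() raised an error"

# The character variant accepts the same call
map2_chr(labels, counts, paste)
# [1] "a 1" "b 2" "c 3"
```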

Anonymous Functions and Formula Syntax

The purrr package supports concise formula syntax using ~ with .x and .y as placeholders for the two inputs.

# Traditional anonymous function
map2_dbl(values, weights, function(v, w) v * w)

# Formula syntax - more concise
map2_dbl(values, weights, ~ .x * .y)

# Complex calculation with formula syntax
prices <- c(29.99, 49.99, 19.99)
quantities <- c(3, 2, 5)

map2_dbl(prices, quantities, ~ {
  subtotal <- .x * .y
  tax <- subtotal * 0.08
  subtotal + tax
})
# [1]  97.1676 107.9784 107.9460

Formula syntax reduces boilerplate while maintaining readability for simple to moderately complex operations.
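
If you are on R 4.1 or later, the base lambda shorthand `\(x, y)` is another concise option and lets you name the arguments. The sketch below assumes that R version; the argument names `price` and `qty` are illustrative.

```r
library(purrr)

prices <- c(29.99, 49.99, 19.99)
quantities <- c(3, 2, 5)

# \(price, qty) is shorthand for function(price, qty); requires R >= 4.1
totals <- map2_dbl(prices, quantities, \(price, qty) price * qty * 1.08)

# Same result as the formula version
all.equal(totals, map2_dbl(prices, quantities, ~ .x * .y * 1.08))
# [1] TRUE
```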

pmap() for Multiple Inputs

When you need more than two inputs, pmap() accepts a list of vectors and passes corresponding elements to your function. The function receives arguments in the order they appear in the input list.

# Three inputs: calculate compound interest
principals <- c(1000, 5000, 10000)
rates <- c(0.05, 0.04, 0.045)
years <- c(5, 10, 7)

inputs <- list(principals, rates, years)

pmap_dbl(inputs, function(p, r, t) {
  p * (1 + r)^t
})
# [1]  1276.282  7401.221 13608.618

# Using formula syntax with ..1, ..2, ..3
pmap_dbl(inputs, ~ ..1 * (1 + ..2)^..3)

The ..1, ..2, ..3 notation in formula syntax refers to the first, second, and third elements from the input list. This works for any number of inputs.
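
As a sketch of how this scales, the example below adds a fourth input. The `periods` vector (compounding periods per year) is a hypothetical addition for illustration.

```r
library(purrr)

principals <- c(1000, 5000, 10000)
rates <- c(0.05, 0.04, 0.045)
years <- c(5, 10, 7)
periods <- c(12, 4, 1)  # hypothetical: compounding periods per year

# Four inputs matched by position to p, r, t, n
balances <- pmap_dbl(
  list(principals, rates, years, periods),
  function(p, r, t, n) p * (1 + r / n)^(n * t)
)

# In the formula form, ..4 picks up the fourth input
same <- pmap_dbl(
  list(principals, rates, years, periods),
  ~ ..1 * (1 + ..2 / ..4)^(..4 * ..3)
)
all.equal(balances, same)
# [1] TRUE
```

Beyond three or four inputs the positional `..N` placeholders get hard to read, which is one reason to prefer a function with named arguments.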

Named Arguments with pmap()

A powerful feature of pmap() is using named lists or data frames as input, allowing you to pass arguments by name rather than position.

# Create a data frame of parameters
params <- data.frame(
  mean = c(0, 10, 100),
  sd = c(1, 5, 15),
  n = c(10, 20, 30)
)

# Generate random normal samples
set.seed(123)
samples <- pmap(params, rnorm)

# Check first sample
samples[[1]]
# [1] -0.56047565 -0.23017749  1.55870831 ...

# Named list approach: one named vector per argument
# (pmap() iterates across these vectors in parallel, not over a list of records)
calculations <- list(
  x = c(10, 20, 15),
  y = c(5, 4, 3),
  operation = c("add", "multiply", "divide")
)

pmap_dbl(calculations, function(x, y, operation) {
  switch(operation,
    add = x + y,
    multiply = x * y,
    divide = x / y
  )
})
# [1] 15 80  5

Named arguments make code self-documenting and reduce errors from incorrect argument ordering.
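
One practical consequence of name matching: pmap() passes every element of the list or every column of the data frame, so a function that does not accept all of them fails with an "unused argument" error. A common workaround, sketched below with an illustrative `label` column, is to add `...` to absorb the extras.

```r
library(purrr)

params <- data.frame(
  label = c("low", "mid", "high"),  # extra column the function does not use
  mean = c(0, 10, 100),
  sd = c(1, 5, 15)
)

# `...` swallows `label`; mean and sd are matched by name
intervals <- pmap(params, function(mean, sd, ...) {
  c(lower = mean - 2 * sd, upper = mean + 2 * sd)
})

intervals[[2]]
# lower upper 
#     0    20 
```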

Practical Application: Data Transformation Pipeline

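Here’s a realistic scenario combining multiple inputs to transform and validate data. pmap_dfr() row-binds the data frame returned by each call. It needs dplyr to be installed, and since purrr 1.0.0 it is superseded in favor of pmap() followed by list_rbind().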

library(dplyr)

# Raw data from different sources
ids <- c("A001", "A002", "A003", "A004")
raw_values <- c("100.5", "200.3", "invalid", "150.7")
categories <- c("premium", "standard", "premium", "standard")
multipliers <- c(1.2, 1.0, 1.2, 1.0)

# Process with validation and transformation
results <- pmap_dfr(
  list(ids, raw_values, categories, multipliers),
  function(id, value, category, mult) {
    # Parse and validate
    numeric_value <- suppressWarnings(as.numeric(value))
    
    if (is.na(numeric_value)) {
      return(data.frame(
        id = id,
        status = "error",
        final_value = NA_real_,
        category = category
      ))
    }
    
    # Apply business logic
    adjusted <- numeric_value * mult
    discount <- if (category == "premium") 0.1 else 0.05
    final <- adjusted * (1 - discount)
    
    data.frame(
      id = id,
      status = "success",
      final_value = round(final, 2),
      category = category
    )
  }
)

print(results)
#     id  status final_value category
# 1 A001 success      108.54  premium
# 2 A002 success      190.29 standard
# 3 A003   error          NA  premium
# 4 A004 success      143.17 standard
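
A reduced sketch of the same validate-then-bind pattern using pmap() with list_rbind(), assuming purrr 1.0.0 or later. The inputs are a trimmed-down version of the data above, and the business logic is left out to keep it short.

```r
library(purrr)

ids <- c("A001", "A002", "A003")
raw_values <- c("100.5", "invalid", "150.7")

# pmap() returns a list of one-row data frames; list_rbind() stacks them
parsed <- list_rbind(
  pmap(
    list(ids, raw_values),
    function(id, value) {
      number <- suppressWarnings(as.numeric(value))
      data.frame(
        id = id,
        status = if (is.na(number)) "error" else "success",
        value = number
      )
    }
  )
)

parsed$status
# [1] "success" "error"   "success"
```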

Performance Considerations and Error Handling

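When individual calls can fail, wrap the function with possibly() so that one bad element does not abort the whole iteration. For slow per-element work on large inputs, the furrr package offers parallel counterparts such as future_map2_dbl().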

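# Error handling with possibly()
# Division by zero returns Inf in R rather than an error, so raise one explicitly
safe_divide <- possibly(function(x, y) {
  if (y == 0) stop("division by zero")
  x / y
}, otherwise = NA_real_)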

numerators <- c(10, 20, 30, 40)
denominators <- c(2, 0, 5, 0)

map2_dbl(numerators, denominators, safe_divide)
# [1]  5 NA  6 NA

# Parallel processing with furrr
library(furrr)
plan(multisession, workers = 4)

large_x <- 1:10000
large_y <- 10001:20000

# Standard map2
system.time(map2_dbl(large_x, large_y, ~ .x + .y))

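# Parallel version: for work this cheap, the cost of sending data to the
# workers will likely outweigh any gain, so compare timings before adopting it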
system.time(future_map2_dbl(large_x, large_y, ~ .x + .y))
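
When you need to know why a call failed rather than just substitute a default, purrr's safely() is an alternative to possibly(). The sketch below uses illustrative inputs; `safe_log`, `inputs`, and `bases` are not part of the examples above.

```r
library(purrr)

# safely() wraps a function so each call returns list(result = , error = )
safe_log <- safely(function(x, base) log(x, base = base))

inputs <- list(100, "oops", 8)
bases <- list(10, 10, 2)

outcomes <- map2(inputs, bases, safe_log)

# Flag the calls that produced an error
failed <- map_lgl(outcomes, function(o) !is.null(o$error))
failed
# [1] FALSE  TRUE FALSE

# Collect results from the calls that succeeded
map_dbl(outcomes[!failed], "result")
# [1] 2 3
```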

Working with Different Length Inputs

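map2() and pmap() neither truncate to the shortest input nor recycle the way base R arithmetic does. Inputs must either share the same length or have length 1, in which case the single value is reused for every iteration. Any other combination raises an error, which surfaces length bugs early.

# Mismatched lengths raise an error
x <- 1:5
y <- 1:3

map2(x, y, c)
# Error: the inputs cannot be recycled to a common length
# (exact message varies by purrr version)

# A length-1 input is recycled
map2_dbl(1:3, 10, `*`)
# [1] 10 20 30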

# Intentional recycling with rep()
short_vec <- c(10, 20)
long_vec <- 1:6

map2_dbl(long_vec, rep(short_vec, length.out = length(long_vec)), `*`)
# [1]  10  40  30  80  50 120
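
The same length rule applies to pmap(). A short sketch, reusing the compound interest inputs with a single shared rate, and using tryCatch() only to show that a 3-versus-2 mismatch is rejected:

```r
library(purrr)

principals <- c(1000, 5000, 10000)
years <- c(5, 10, 7)

# The single rate 0.05 is recycled across all three iterations
recycled <- pmap_dbl(
  list(principals, 0.05, years),
  function(p, r, t) p * (1 + r)^t
)
all.equal(recycled, principals * 1.05^years)
# [1] TRUE

# Lengths 3 and 2 cannot be combined, so purrr raises an error
mismatch <- tryCatch(
  map2(1:3, 1:2, `+`),
  error = function(e) "error"
)
mismatch
# [1] "error"
```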

The map2() and pmap() functions bring functional programming elegance to multi-input operations. They eliminate index management, reduce cognitive load, and produce more maintainable code than traditional loops. Type-specific variants add safety, while formula syntax keeps code concise. For complex data processing pipelines requiring multiple coordinated inputs, these functions are indispensable tools in the R programmer’s toolkit.
