R purrr - reduce() and accumulate()

library(purrr)

Key Insights

  • reduce() collapses a list into a single value by iteratively applying a binary function, while accumulate() returns all intermediate results, making it ideal for tracking transformations step-by-step
  • Both functions support .init parameters for starting values, directional control with .dir, and custom error handling, enabling complex data pipeline patterns beyond simple aggregation
  • Real-world applications include cumulative calculations, nested list flattening, sequential data validation, and building complex objects from component parts with full audit trails

Understanding reduce() Fundamentals

reduce() takes a list and a binary function, applying that function recursively to collapse the list into a single value. Think of it as a left fold operation from functional programming.

library(purrr)

# Basic sum reduction
numbers <- list(1, 2, 3, 4, 5)
reduce(numbers, `+`)
# [1] 15

# Equivalent to: ((((1 + 2) + 3) + 4) + 5)

The function signature is reduce(.x, .f, ..., .init, .dir). The .f parameter accepts any binary function of two arguments: the accumulated value so far and the next element. Here’s how each step proceeds:

# Custom binary function
multiply_and_log <- function(x, y) {
  result <- x * y
  message(sprintf("%d * %d = %d", x, y, result))
  result
}

reduce(1:4, multiply_and_log)
# 1 * 2 = 2
# 2 * 3 = 6
# 6 * 4 = 24
# [1] 24
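The binary function can also be written in purrr's formula shorthand, where .x is the accumulated value and .y is the next element:

```r
library(purrr)

# Formula shorthand: .x is the running value, .y is the next element
reduce(1:4, ~ .x * .y)
# [1] 24

# Equivalent to reduce(1:4, `*`)
```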

Using .init for Starting Values

The .init parameter provides an initial accumulator value, crucial for operations requiring a seed value or when working with empty lists:

# Without .init - uses first element
reduce(1:5, `+`)
# [1] 15

# With .init - starts from 100
reduce(1:5, `+`, .init = 100)
# [1] 115

# Critical for empty lists
reduce(list(), `+`, .init = 0)
# [1] 0

reduce(list(), `+`)
# Error: `.x` is empty, and no `.init` supplied
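Because .init seeds the accumulator, it can also be a different type than the elements. A small sketch building a named list from a numeric vector:

```r
library(purrr)

# Accumulator starts as an empty list; each step adds one named entry
squares <- reduce(1:4, function(acc, x) {
  acc[[paste0("sq_", x)]] <- x^2
  acc
}, .init = list())

names(squares)
# [1] "sq_1" "sq_2" "sq_3" "sq_4"
```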

Practical example with data frames:

# Combine multiple data frames
df_list <- list(
  data.frame(id = 1:3, val = c("a", "b", "c")),
  data.frame(id = 4:6, val = c("d", "e", "f")),
  data.frame(id = 7:9, val = c("g", "h", "i"))
)

combined <- reduce(df_list, rbind)
print(combined)
#   id val
# 1  1   a
# 2  2   b
# 3  3   c
# 4  4   d
# 5  5   e
# 6  6   f
# 7  7   g
# 8  8   h
# 9  9   i
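The same pattern handles joins. A minimal sketch with base merge() and made-up tables (dplyr::left_join() drops in the same way):

```r
library(purrr)

sales   <- data.frame(id = 1:3, amount = c(100, 250, 75))
regions <- data.frame(id = 1:3, region = c("east", "west", "east"))
reps    <- data.frame(id = 1:3, rep = c("ann", "bo", "cy"))

# Successively merge all tables on the shared id column
merged <- reduce(list(sales, regions, reps), merge, by = "id")
merged
#   id amount region rep
# 1  1    100   east ann
# 2  2    250   west  bo
# 3  3     75   east  cy
```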

Direction Control with .dir

The .dir parameter controls whether reduction proceeds left-to-right ("forward") or right-to-left ("backward"):

# Forward (default): ((1 - 2) - 3) - 4 = -8
reduce(1:4, `-`, .dir = "forward")
# [1] -8

# Backward: 1 - (2 - (3 - 4)) = -2
reduce(1:4, `-`, .dir = "backward")
# [1] -2

# String concatenation gives the same result in both directions
words <- list("alpha", "beta", "gamma")

reduce(words, paste, .dir = "forward")
# [1] "alpha beta gamma"

reduce(words, paste, .dir = "backward")
# [1] "alpha beta gamma"  # Same result: paste with one separator is associative

# Path building is associative too, so only the grouping changes
build_path <- function(x, y) paste0(x, "/", y)

reduce(c("home", "user", "documents"), build_path, .dir = "forward")
# [1] "home/user/documents"

reduce(c("home", "user", "documents"), build_path, .dir = "backward")
# [1] "home/user/documents"  # f("home", f("user", "documents")) yields the same string
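To see .dir actually change the result, use a non-associative function — here one that records the grouping explicitly:

```r
library(purrr)

# Parenthesize each application to expose the fold structure
group <- function(x, y) paste0("(", x, " ", y, ")")

reduce(c("a", "b", "c", "d"), group, .dir = "forward")
# [1] "(((a b) c) d)"

reduce(c("a", "b", "c", "d"), group, .dir = "backward")
# [1] "(a (b (c d)))"
```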

accumulate() for Intermediate Results

accumulate() uses the same signature as reduce() but returns all intermediate values:

# Compare reduce vs accumulate
numbers <- 1:5

reduce(numbers, `+`)
# [1] 15

accumulate(numbers, `+`)
# [1]  1  3  6 10 15

This is invaluable for cumulative calculations and debugging:

# Cumulative product
accumulate(1:5, `*`)
# [1]   1   2   6  24 120

# Running maximum
values <- c(3, 1, 4, 1, 5, 9, 2, 6)
accumulate(values, max)
# [1] 3 3 4 4 5 9 9 9

# Compound interest calculation
principal <- 1000
rates <- c(0.05, 0.04, 0.06, 0.03, 0.05)

balances <- accumulate(rates, function(balance, rate) {
  balance * (1 + rate)
}, .init = principal)

balances
# [1] 1000.00 1050.00 1092.00 1157.52 1192.25 1251.86
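When two inputs vary each step — say a rate and a deposit — accumulate2() threads both alongside the accumulator. A sketch extending the interest example with hypothetical deposit amounts:

```r
library(purrr)

principal <- 1000
rates    <- c(0.05, 0.04, 0.06)
deposits <- c(100, 100, 200)

# .f receives (accumulator, element of rates, element of deposits)
balances <- accumulate2(rates, deposits, function(balance, rate, deposit) {
  balance * (1 + rate) + deposit
}, .init = principal)

unlist(balances)
# [1] 1000.00 1150.00 1296.00 1573.76
```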

Flattening Nested Structures

reduce() excels at flattening nested lists:

# Flatten one level
nested <- list(
  list(1, 2),
  list(3, 4),
  list(5, 6)
)

reduce(nested, c)
# A flat list of six elements: list(1, 2, 3, 4, 5, 6)
# (c() on lists concatenates lists, so the result is a list, not an atomic vector)

# Merge named lists
config_defaults <- list(timeout = 30, retries = 3)
config_user <- list(timeout = 60, verbose = TRUE)
config_env <- list(retries = 5, debug = TRUE)

final_config <- reduce(
  list(config_defaults, config_user, config_env),
  function(x, y) modifyList(x, y)
)

final_config
# $timeout
# [1] 60
# $retries
# [1] 5
# $verbose
# [1] TRUE
# $debug
# [1] TRUE

Building Complex Objects Incrementally

Use accumulate() to build objects step-by-step with full visibility:

# Build a data processing pipeline
data <- data.frame(
  x = c(10, 20, NA, 40, 50),
  y = c(1, 2, 3, 4, 5)
)

pipeline_steps <- list(
  function(df) {
    df[!is.na(df$x), ]
  },
  function(df) {
    df$z <- df$x * df$y
    df
  },
  function(df) {
    df[df$z > 50, ]
  }
)

# See each transformation
results <- accumulate(pipeline_steps, function(d, step) step(d), .init = data)

# results[[1]] is original data
# results[[2]] is after NA removal
# results[[3]] is after adding z column
# results[[4]] is after filtering

results[[4]]
#    x y   z
# 4 40 4 160
# 5 50 5 250
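When the intermediate frames aren't needed, the same step-list collapses with reduce(). A self-contained sketch with two hypothetical steps:

```r
library(purrr)

df <- data.frame(x = c(10, 20, NA, 40, 50), y = 1:5)

steps <- list(
  function(d) d[!is.na(d$x), ],        # drop rows with missing x
  function(d) transform(d, z = x * y)  # add a derived column
)

# Apply each step in turn, keeping only the final data frame
final <- reduce(steps, function(d, step) step(d), .init = df)
```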

Error Handling Patterns

Wrap functions to handle errors gracefully:

# Safe division with reduce
safe_divide <- function(x, y) {
  if (y == 0) {
    warning("Division by zero encountered. Returning NA.")
    return(NA_real_)
  }
  x / y
}

reduce(c(100, 2, 0, 5), safe_divide)
# Warning: Division by zero encountered. Returning NA.
# [1] NA

# Accumulate with validation
validate_and_sum <- function(acc, val) {
  if (!is.numeric(val)) {
    stop(sprintf("Non-numeric value encountered: %s", val))
  }
  acc + val
}

# This works
accumulate(c(1, 2, 3), validate_and_sum, .init = 0)
# [1] 0 1 3 6

# This fails with a clear error; use list() so elements keep their types
# (c(1, "x", 3) would coerce every element to character)
tryCatch(
  accumulate(list(1, "x", 3), validate_and_sum, .init = 0),
  error = function(e) message("Error: ", e$message)
)
# Error: Non-numeric value encountered: x
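purrr's own possibly() adapter gives a terser version of this pattern, substituting a default value whenever a step errors:

```r
library(purrr)

# Adding a non-numeric value would error; possibly() converts that to NA
safe_sum <- possibly(function(acc, val) acc + val, otherwise = NA_real_)

reduce(list(1, 2, "x", 4), safe_sum)
# [1] NA
```

Once the accumulator becomes NA it stays NA, so the final result flags that something went wrong without halting the reduction.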

Real-World Application: Time Series Analysis

Combine accumulate() with domain logic for sophisticated analyses:

# Calculate moving statistics
prices <- c(100, 102, 98, 105, 110, 108, 115, 120)

# Cumulative returns: running sum of simple period returns
period_returns <- diff(prices) / head(prices, -1)
cumulative_return <- accumulate(period_returns, `+`, .init = 0)

# Running volatility (simplified): sd of all price changes seen so far
price_changes <- diff(prices)
running_sd <- accumulate(2:length(price_changes), function(vol, i) {
  sd(price_changes[1:i])
}, .init = NA)

data.frame(
  price = prices,
  cumulative_return = cumulative_return,
  volatility = c(NA, running_sd)  # undefined until at least two changes exist
)

The power of reduce() and accumulate() lies in their composability. They transform imperative loops into declarative pipelines, making code more maintainable and intentions clearer. Use reduce() when you need only the final result, and accumulate() when intermediate states matter for debugging, visualization, or business logic.
