R purrr - reduce() and accumulate()
Key Insights
- reduce() collapses a list into a single value by iteratively applying a binary function, while accumulate() returns every intermediate result, making it ideal for tracking transformations step by step.
- Both functions accept an .init starting value and a .dir direction argument, and both can be combined with error-handling wrappers, enabling data pipeline patterns beyond simple aggregation.
- Real-world applications include cumulative calculations, nested list flattening, sequential data validation, and building complex objects from component parts with full audit trails.
Understanding reduce() Fundamentals
reduce() takes a list (or atomic vector) and a binary function, and applies that function repeatedly, carrying the result forward, until the input collapses into a single value. By default it behaves like the left fold operation from functional programming.
library(purrr)
# Basic sum reduction
numbers <- list(1, 2, 3, 4, 5)
reduce(numbers, `+`)
# [1] 15
# Equivalent to: ((((1 + 2) + 3) + 4) + 5)
The function signature is reduce(.x, .f, ..., .init, .dir). The .f argument accepts any function of two arguments: the accumulated value so far and the next element. Tracing a custom function shows the order of the calls:
# Custom binary function
multiply_and_log <- function(x, y) {
  result <- x * y
  message(sprintf("%d * %d = %d", x, y, result))
  result
}
reduce(1:4, multiply_and_log)
# 1 * 2 = 2
# 2 * 3 = 6
# 6 * 4 = 24
# [1] 24
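To make the fold concrete, here is a minimal sketch of the loop that reduce() replaces. The helper name manual_reduce is hypothetical and exists only for comparison; it ignores .init and .dir and assumes a non-empty input.

```r
library(purrr)

# Hypothetical helper: a hand-rolled left fold, for comparison only
manual_reduce <- function(x, f) {
  acc <- x[[1]]            # the first element seeds the accumulator
  for (item in x[-1]) {
    acc <- f(acc, item)    # fold the next element into the running value
  }
  acc
}

manual_reduce(1:4, `*`)
reduce(1:4, `*`)
# Both return 24
```

The loop and the reduce() call do the same work; reduce() simply hides the accumulator bookkeeping.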
Using .init for Starting Values
The .init parameter provides an initial accumulator value, crucial for operations requiring a seed value or when working with empty lists:
# Without .init - uses first element
reduce(1:5, `+`)
# [1] 15
# With .init - starts from 100
reduce(1:5, `+`, .init = 100)
# [1] 115
# Critical for empty lists
reduce(list(), `+`, .init = 0)
# [1] 0
reduce(list(), `+`)
# Error: reduce() stops because `.x` is empty and no `.init` was supplied
# (the exact wording of the message depends on the purrr version)
Practical example with data frames:
# Combine multiple data frames
df_list <- list(
  data.frame(id = 1:3, val = c("a", "b", "c")),
  data.frame(id = 4:6, val = c("d", "e", "f")),
  data.frame(id = 7:9, val = c("g", "h", "i"))
)
combined <- reduce(df_list, rbind)
print(combined)
#   id val
# 1  1   a
# 2  2   b
# 3  3   c
# 4  4   d
# 5  5   e
# 6  6   f
# 7  7   g
# 8  8   h
# 9  9   i
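The same pattern extends from stacking rows to joining on a key. The sketch below uses made-up tables that share an id column and base R's merge(); the table contents and column names are assumptions chosen for illustration.

```r
library(purrr)

# Illustrative tables sharing an `id` key (made-up data)
tables <- list(
  data.frame(id = 1:3, name = c("a", "b", "c")),
  data.frame(id = 1:3, score = c(90, 85, 70)),
  data.frame(id = 1:3, passed = c(TRUE, TRUE, FALSE))
)

# Each step merges the running result with the next table on `id`
joined <- reduce(tables, function(acc, df) merge(acc, df, by = "id"))
names(joined)
# Columns: id, name, score, passed
```

Because each step only needs the running result and the next table, the call stays the same whether the list holds three tables or thirty.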
Direction Control with .dir
The .dir parameter controls whether reduction proceeds left-to-right ("forward") or right-to-left ("backward"):
# Forward (default): ((1 - 2) - 3) - 4 = -8
reduce(1:4, `-`, .dir = "forward")
# [1] -8
# Backward: 1 - (2 - (3 - 4)) = -2
reduce(1:4, `-`, .dir = "backward")
# [1] -2
# paste() is associative, so both directions give the same string
words <- list("alpha", "beta", "gamma")
reduce(words, paste, .dir = "forward")
# [1] "alpha beta gamma"
reduce(words, paste, .dir = "backward")
# [1] "alpha beta gamma"
# The same holds for this path builder: the grouping changes, the result does not
build_path <- function(x, y) paste0(x, "/", y)
reduce(c("home", "user", "documents"), build_path, .dir = "forward")
# [1] "home/user/documents"   # build_path(build_path("home", "user"), "documents")
reduce(c("home", "user", "documents"), build_path, .dir = "backward")
# [1] "home/user/documents"   # build_path("home", build_path("user", "documents"))
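Direction only changes the result when .f is not associative. One way to see the grouping itself is a function that wraps each call in parentheses; the helper name group below is hypothetical.

```r
library(purrr)

# Hypothetical helper that records how its two arguments were grouped
group <- function(x, y) paste0("(", x, " ", y, ")")

reduce(c("a", "b", "c", "d"), group)
# [1] "(((a b) c) d)"
reduce(c("a", "b", "c", "d"), group, .dir = "backward")
# [1] "(a (b (c d)))"
```

The forward call nests to the left and the backward call nests to the right, which is exactly the difference between a left fold and a right fold.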
accumulate() for Intermediate Results
accumulate() takes the same core arguments as reduce() (.x, .f, .init, .dir) but returns every intermediate value instead of only the last one:
# Compare reduce vs accumulate
numbers <- 1:5
reduce(numbers, `+`)
# [1] 15
accumulate(numbers, `+`)
# [1] 1 3 6 10 15
This is invaluable for cumulative calculations and debugging:
# Cumulative product
accumulate(1:5, `*`)
# [1] 1 2 6 24 120
# Running maximum
values <- c(3, 1, 4, 1, 5, 9, 2, 6)
accumulate(values, max)
# [1] 3 3 4 4 5 9 9 9
# Compound interest calculation
principal <- 1000
rates <- c(0.05, 0.04, 0.06, 0.03, 0.05)
balances <- accumulate(rates, function(balance, rate) {
  balance * (1 + rate)
}, .init = principal)
# Round for display; the unrounded values carry more decimal places
round(balances, 2)
# [1] 1000.00 1050.00 1092.00 1157.52 1192.25 1251.86
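accumulate() honors .dir as well, and .init changes the length of the output. The short sketch below illustrates both points with a plain integer vector.

```r
library(purrr)

# Suffix sums: each position holds the total from that element to the end
accumulate(1:5, `+`, .dir = "backward")
# [1] 15 14 12  9  5

# With `.init`, the output has one more element than the input
length(accumulate(1:5, `+`, .init = 0))
# [1] 6
```

The extra element is the seed itself, which is why the compound interest example returns six balances for five rates.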
Flattening Nested Structures
reduce() excels at flattening nested lists:
# Flatten one level
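nested <- list(
  list(1, 2),
  list(3, 4),
  list(5, 6)
)
reduce(nested, c)
# Returns a flat list of six elements, equivalent to list(1, 2, 3, 4, 5, 6)
# Wrap the call in unlist() to get the atomic vector 1 2 3 4 5 6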
# Merge named lists
config_defaults <- list(timeout = 30, retries = 3)
config_user <- list(timeout = 60, verbose = TRUE)
config_env <- list(retries = 5, debug = TRUE)
final_config <- reduce(
  list(config_defaults, config_user, config_env),
  function(x, y) modifyList(x, y)
)
final_config
# $timeout
# [1] 60
# $retries
# [1] 5
# $verbose
# [1] TRUE
# $debug
# [1] TRUE
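Combining a list of collections is not limited to concatenation and merging. As a further sketch, base R's set functions fold the same way; the ID vectors below are made-up data.

```r
library(purrr)

# Made-up ID sets from three hypothetical sources
ids <- list(
  c(1, 2, 3, 4, 5),
  c(2, 3, 5, 8),
  c(3, 5, 13)
)

reduce(ids, intersect)  # values present in every vector
# [1] 3 5
reduce(ids, union)      # values present in at least one vector
# [1]  1  2  3  4  5  8 13
```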
Building Complex Objects Incrementally
Use accumulate() to build objects step-by-step with full visibility:
# Build a data processing pipeline
data <- data.frame(
  x = c(10, 20, NA, 40, 50),
  y = c(1, 2, 3, 4, 5)
)
pipeline_steps <- list(
  function(df) {
    df[!is.na(df$x), ]
  },
  function(df) {
    df$z <- df$x * df$y
    df
  },
  function(df) {
    df[df$z > 50, ]
  }
)
# See each transformation
results <- accumulate(pipeline_steps, function(data, step) step(data), .init = data)
# results[[1]] is original data
# results[[2]] is after NA removal
# results[[3]] is after adding z column
# results[[4]] is after filtering
results[[4]]
#    x y   z
# 4 40 4 160
# 5 50 5 250
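Naming the steps makes the audit trail easier to read. The sketch below applies the same idea to a small numeric vector; the step names and the input values are assumptions chosen for illustration.

```r
library(purrr)

# Hypothetical named steps applied to a small numeric vector
steps <- list(
  drop_na   = function(v) v[!is.na(v)],
  times_two = function(v) v * 2,
  keep_big  = function(v) v[v > 50]
)
input <- c(10, 20, NA, 40, 50)

stages <- accumulate(steps, function(acc, step) step(acc), .init = input)

# Label the stages; the first one is the untouched input
names(stages) <- c("input", names(steps))
lengths(stages)
# input = 5, drop_na = 4, times_two = 4, keep_big = 2

# reduce() with the same arguments returns only the last stage
reduce(steps, function(acc, step) step(acc), .init = input)
# [1]  80 100
```

Swapping accumulate() for reduce() is the only change needed once the intermediate stages are no longer of interest.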
Error Handling Patterns
Wrap functions to handle errors gracefully:
# Safe division with reduce
safe_divide <- function(x, y) {
  if (y == 0) {
    warning("Division by zero encountered. Returning NA.")
    return(NA_real_)
  }
  x / y
}
reduce(c(100, 2, 0, 5), safe_divide)
# Warning: Division by zero encountered. Returning NA.
# [1] NA
# Accumulate with validation
validate_and_sum <- function(acc, val) {
  if (!is.numeric(val)) {
    stop(sprintf("Non-numeric value encountered: %s", val))
  }
  acc + val
}
# This works
accumulate(c(1, 2, 3), validate_and_sum, .init = 0)
# [1] 0 1 3 6
# This fails with a clear error
# Note: use list() for mixed types; c(1, "x", 3) would coerce every element to character
tryCatch(
  accumulate(list(1, "x", 3), validate_and_sum, .init = 0),
  error = function(e) message("Error: ", conditionMessage(e))
)
# Error: Non-numeric value encountered: x
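An alternative to hand-written guards is purrr's possibly() adverb, which wraps a function so that an error produces a fallback value instead of stopping the fold. The sketch below assumes NA_real_ is an acceptable fallback; the function names add_strict and add_lenient are hypothetical.

```r
library(purrr)

# Strict adder: raises an error on non-numeric input
add_strict <- function(acc, val) {
  if (!is.numeric(val)) stop("Non-numeric value: ", val)
  acc + val
}

# possibly() returns a wrapper that yields `otherwise` instead of raising
add_lenient <- possibly(add_strict, otherwise = NA_real_)

reduce(list(1, "x", 3), add_lenient, .init = 0)
# [1] NA
reduce(list(1, 2, 3), add_lenient, .init = 0)
# [1] 6
```

Once one step falls back to NA, the NA propagates through every later addition, so this pattern flags a bad input rather than skipping it.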
Real-World Application: Time Series Analysis
Combine accumulate() with domain logic for sophisticated analyses:
# Calculate running statistics
prices <- c(100, 102, 98, 105, 110, 108, 115, 120)
# Cumulative sum of simple period returns
# Index by position so that repeated prices are handled correctly
returns <- accumulate(seq_along(prices)[-1], function(ret, i) {
  ret + (prices[i] - prices[i - 1]) / prices[i - 1]
}, .init = 0)
# Running volatility (simplified): sd of all price changes seen so far
price_changes <- diff(prices)
running_sd <- accumulate(2:length(price_changes), function(vol, i) {
  sd(price_changes[1:i])
}, .init = NA_real_)
# Pad volatility with one leading NA so every column has length(prices) rows
data.frame(
  price = prices,
  cumulative_return = returns,
  volatility = c(NA, running_sd)
)
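A statistic where each value depends on the previous one is a natural fit for accumulate(). As a sketch, an exponential moving average over the same prices might look like this; the smoothing factor alpha = 0.5 is an assumption chosen only for illustration.

```r
library(purrr)

prices <- c(100, 102, 98, 105, 110, 108, 115, 120)
alpha <- 0.5  # assumed smoothing factor, chosen only for illustration

# Each value blends the new price with the previous average
ema <- accumulate(
  prices[-1],
  function(prev, price) alpha * price + (1 - alpha) * prev,
  .init = prices[1]
)

round(ema[1:3], 2)
# [1] 100.0 101.0  99.5
```

Seeding with the first price keeps the output the same length as the input, so it can sit next to the other columns in a data frame.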
The power of reduce() and accumulate() lies in their composability. They transform imperative loops into declarative pipelines, making code more maintainable and intentions clearer. Use reduce() when you need only the final result, and accumulate() when intermediate states matter for debugging, visualization, or business logic.