R - Functions - Define and Call
Key Insights
- R functions use lexical scoping and support default arguments, named parameters, and variable-length argument lists through the ... (dots) argument
- Functions are first-class objects in R, meaning they can be assigned to variables, passed as arguments, and returned from other functions
- Understanding function environments and the difference between lazy and eager evaluation is critical for writing efficient R code
Basic Function Syntax
R functions follow a straightforward structure using the function keyword. The basic anatomy includes parameters, a function body, and an optional explicit return statement.
# Basic function definition
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}
# Call the function
result <- calculate_area(5, 10)
print(result) # Output: 50
# Implicit return (last expression is returned)
calculate_volume <- function(length, width, height) {
  length * width * height
}
volume <- calculate_volume(5, 10, 2)
print(volume) # Output: 100
Calling return() is optional. R automatically returns the value of the last evaluated expression, and relying on this is idiomatic in R programming.
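An explicit return() is still handy for leaving a function early, for example in a guard clause. The sketch below illustrates that pattern; safe_sqrt is a hypothetical name introduced here, not something from the examples above.

```r
# Hypothetical helper: early exit with return(), implicit return otherwise
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA_real_)  # guard clause: leave the function immediately
  }
  sqrt(x)             # last expression becomes the return value
}

safe_sqrt(16)  # 4
safe_sqrt(-1)  # NA
```

Keeping return() for early exits and using the implicit form for the final value is one common convention, not a rule of the language.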
Default Arguments and Named Parameters
Default arguments provide flexibility when calling functions. Named parameters improve code readability and allow arguments to be specified in any order.
# Function with default arguments
create_dataframe <- function(rows = 10, cols = 5, fill_value = NA) {
  # The native pipe |> requires R 4.1 or later
  matrix(fill_value, nrow = rows, ncol = cols) |>
    as.data.frame()
}
# Various ways to call
df1 <- create_dataframe() # Uses all defaults
df2 <- create_dataframe(rows = 20) # Override one parameter
df3 <- create_dataframe(fill_value = 0, rows = 15) # Named, any order
df4 <- create_dataframe(5, 3, 1) # Positional arguments
# Mixing positional and named
process_data <- function(data, method = "mean", na.rm = TRUE, trim = 0) {
  switch(method,
    mean = mean(data, na.rm = na.rm, trim = trim),
    median = median(data, na.rm = na.rm),
    sum = sum(data, na.rm = na.rm)
  )
}
values <- c(1, 2, 3, NA, 5)
process_data(values) # Uses default method
process_data(values, method = "median")
process_data(values, "sum", na.rm = FALSE)
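One caveat with process_data() as written: switch() silently returns NULL when method matches none of the listed names. A possible way to guard against that is to list the allowed values as the default and validate with match.arg(). The sketch below is an illustrative variant under that assumption; summarise_values is a hypothetical name.

```r
# Hypothetical variant of process_data() that validates `method`
summarise_values <- function(data,
                             method = c("mean", "median", "sum"),
                             na.rm = TRUE) {
  # match.arg() picks the first choice when `method` is not supplied
  # and raises an error for values outside the allowed set
  method <- match.arg(method)
  switch(method,
    mean   = mean(data, na.rm = na.rm),
    median = median(data, na.rm = na.rm),
    sum    = sum(data, na.rm = na.rm)
  )
}

values <- c(1, 2, 3, NA, 5)
summarise_values(values)            # 2.75
summarise_values(values, "sum")     # 11
# summarise_values(values, "mode")  # error: 'arg' should be one of the choices
```

match.arg() also accepts unambiguous abbreviations such as "med", which may or may not be desirable depending on how strict the interface should be.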
Variable-Length Arguments
The ... (dots, or ellipsis) argument captures an arbitrary number of arguments, enabling flexible function interfaces. It is a special formal argument rather than an operator.
# Basic use of ...
print_all <- function(...) {
  args <- list(...)
  for (i in seq_along(args)) {
    cat("Argument", i, ":", args[[i]], "\n")
  }
}
print_all("hello", 42, TRUE)
# Forwarding ... to another function
custom_paste <- function(sep = "-", ...) {
  paste(..., sep = sep)
}
result <- custom_paste(sep = " | ", "A", "B", "C")
print(result) # Output: A | B | C
# Combining named arguments with ...
statistical_summary <- function(data, ..., digits = 2) {
  summary_stats <- c(
    mean = mean(data, ...),
    median = median(data, ...),
    sd = sd(data, ...)
  )
  round(summary_stats, digits)
}
data_with_na <- c(1, 2, 3, NA, 5, 6)
statistical_summary(data_with_na, na.rm = TRUE)
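Arguments passed through ... can also be inspected before they are forwarded. The following sketch collects them with list(...) and reports how many arrived and which were named; describe_dots is a hypothetical helper used only for illustration.

```r
# Hypothetical helper: count the arguments in ... and list the named ones
describe_dots <- function(...) {
  args <- list(...)
  arg_names <- names(args)
  if (is.null(arg_names)) {
    arg_names <- rep("", length(args))  # no argument was named
  }
  list(count = length(args), named = arg_names[arg_names != ""])
}

info <- describe_dots(1, b = 2, c = "three")
info$count  # 3
info$named  # "b" "c"
```

Note that in statistical_summary() above, digits comes after ..., so it can only be set by its full name; a positional value would be absorbed into ... instead.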
Anonymous Functions and Functional Programming
R treats functions as first-class objects. Anonymous functions (lambdas) are particularly useful with apply family functions and modern pipe operators.
# Traditional anonymous function
squared <- lapply(1:5, function(x) x^2)
print(unlist(squared))
# R 4.1+ shorthand syntax
cubed <- lapply(1:5, \(x) x^3)
print(unlist(cubed))
# Functions as arguments
apply_operation <- function(x, y, op) {
  op(x, y)
}
add <- function(a, b) a + b
multiply <- function(a, b) a * b
apply_operation(5, 3, add) # Output: 8
apply_operation(5, 3, multiply) # Output: 15
# Returning functions (closures)
make_multiplier <- function(n) {
  function(x) x * n
}
times_two <- make_multiplier(2)
times_ten <- make_multiplier(10)
print(times_two(5)) # Output: 10
print(times_ten(5)) # Output: 50
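Because functions are ordinary values, they can also be combined into new functions. The sketch below shows simple function composition alongside two base R higher-order functions, Filter() and Reduce(); compose, add_one, and twice are hypothetical names chosen for the example.

```r
# Hypothetical helper: compose(f, g) returns a function that applies f, then g
compose <- function(f, g) {
  function(x) g(f(x))
}

add_one <- function(x) x + 1
twice   <- function(x) x * 2

add_then_twice <- compose(add_one, twice)
add_then_twice(3)  # 8

# Base R higher-order functions take functions as arguments too
Filter(function(x) x %% 2 == 0, 1:6)  # 2 4 6
Reduce(`+`, 1:5)                      # 15
```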
Function Environments and Scoping
R uses lexical scoping, where functions look for variables in the environment where they were defined, not where they’re called.
# Scoping example: each function's local x shadows the x in its enclosing environment
x <- 10
outer_function <- function() {
  x <- 20
  inner_function <- function() {
    x <- 30
    print(paste("Inner x:", x))
  }
  inner_function()
  print(paste("Outer x:", x))
}
outer_function()
print(paste("Global x:", x))
# Output:
# Inner x: 30
# Outer x: 20
# Global x: 10
# Accessing parent environments
counter_factory <- function() {
  count <- 0
  list(
    increment = function() {
      count <<- count + 1
      count
    },
    get_count = function() count,
    reset = function() count <<- 0
  )
}
counter <- counter_factory()
counter$increment() # 1
counter$increment() # 2
counter$get_count() # 2
counter$reset()
counter$get_count() # 0
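The <<- operator assigns to an existing variable in an enclosing environment (here, count inside counter_factory), which enables stateful functions and closures. If no enclosing environment has a binding with that name, <<- creates the variable in the global environment.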
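The rule that a function resolves free variables where it was defined, not where it is called, can be checked directly. In this sketch (threshold, above_threshold, and check_locally are hypothetical names), the caller's local variable has no effect on the lookup.

```r
threshold <- 10

# above_threshold() is defined at top level, so its free variable
# `threshold` is looked up in the environment where it was defined
above_threshold <- function(x) x > threshold

check_locally <- function(x) {
  threshold <- 100   # local to check_locally(); not seen by above_threshold()
  above_threshold(x)
}

check_locally(50)  # TRUE: 50 is compared against 10, not 100
```

Under dynamic scoping the same call would compare against 100 and return FALSE, which is the behavior R's lexical scoping rules out.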
Error Handling in Functions
Robust functions include error handling using stop(), warning(), and tryCatch().
# Input validation
safe_divide <- function(numerator, denominator) {
  if (!is.numeric(numerator) || !is.numeric(denominator)) {
    stop("Both arguments must be numeric")
  }
  if (denominator == 0) {
    warning("Division by zero, returning Inf")
    return(Inf)
  }
  numerator / denominator
}
# Using tryCatch for error handling
robust_operation <- function(x, y) {
  result <- tryCatch(
    {
      # Try this code
      log(x) + sqrt(y)
    },
    error = function(e) {
      message("Error occurred: ", e$message)
      return(NA)
    },
    warning = function(w) {
      message("Warning: ", w$message)
      return(NULL)
    },
    finally = {
      # Always executes
      message("Operation completed")
    }
  )
  result
}
robust_operation(10, 25) # Success: returns log(10) + sqrt(25)
robust_operation(-5, 25) # Warning handler runs (NaNs produced), returns NULL
robust_operation("a", 25) # Error handler runs, returns NA
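tryCatch() dispatches on the class of the condition, so a function can signal its own error class and callers can handle just that class. The sketch below builds a classed condition by hand; validation_error and parse_age are hypothetical names, and the condition layout (a list with message and call) follows the standard structure of R condition objects.

```r
# Hypothetical constructor for a classed error condition
validation_error <- function(msg) {
  structure(
    class = c("validation_error", "error", "condition"),
    list(message = msg, call = sys.call(-1))
  )
}

# Hypothetical function that signals the custom error
parse_age <- function(x) {
  if (!is.numeric(x) || x < 0) {
    stop(validation_error("age must be a non-negative number"))
  }
  x
}

# Handle only the custom class; other errors would propagate as usual
result <- tryCatch(
  parse_age(-3),
  validation_error = function(e) {
    message("Invalid input: ", conditionMessage(e))
    NA_real_
  }
)
result  # NA
```

Handling a specific class rather than every error keeps unexpected failures visible instead of silently converting them to NA.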
Performance Considerations
Understanding when to vectorize versus loop is crucial for R function performance.
# Non-vectorized (slow)
sum_squares_loop <- function(n) {
  result <- 0
  # seq_len(n) is empty when n is 0, whereas 1:n would iterate over c(1, 0)
  for (i in seq_len(n)) {
    result <- result + i^2
  }
  result
}
# Vectorized (fast)
sum_squares_vectorized <- function(n) {
  sum(seq_len(n)^2)
}
# Benchmark (requires the microbenchmark package to be installed)
library(microbenchmark)
microbenchmark(
  loop = sum_squares_loop(10000),
  vectorized = sum_squares_vectorized(10000),
  times = 100
)
# Memoization for expensive computations
fibonacci_memo <- local({
  cache <- new.env(hash = TRUE)
  function(n) {
    if (n <= 1) return(n)
    key <- as.character(n)
    # inherits = FALSE restricts the lookup to the cache environment itself
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache, inherits = FALSE))
    }
    result <- fibonacci_memo(n - 1) + fibonacci_memo(n - 2)
    assign(key, result, envir = cache)
    result
  }
})
fibonacci_memo(30) # Fast even for larger values
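When a loop cannot be avoided, how the result is built also matters. The sketch below contrasts growing a vector with c() against preallocating it; both function names are hypothetical, and no timings are shown because they depend on the machine and R version.

```r
# Growing a vector with c() copies the whole vector on every iteration
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Preallocating the result and filling it by index avoids those copies
squares_prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

# Both produce the same values as the vectorized form
identical(squares_grow(10), squares_prealloc(10))  # TRUE
identical(squares_prealloc(10), (1:10)^2)          # TRUE
```

The two versions can be compared with the same microbenchmark() call used above if measured numbers are needed.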
Documentation and Best Practices
Well-documented functions using roxygen2 comments improve maintainability and enable automatic documentation generation.
#' Calculate Body Mass Index
#'
#' @param weight Numeric value representing weight in kilograms
#' @param height Numeric value representing height in meters
#' @param digits Integer specifying decimal places for rounding (default: 2)
#'
#' @return Numeric BMI value rounded to specified digits
#' @export
#'
#' @examples
#' calculate_bmi(70, 1.75)
#' calculate_bmi(weight = 85, height = 1.80, digits = 1)
calculate_bmi <- function(weight, height, digits = 2) {
  stopifnot(
    "Weight must be positive" = weight > 0,
    "Height must be positive" = height > 0
  )
  bmi <- weight / (height^2)
  round(bmi, digits)
}
Functions form the backbone of reproducible R code. Master parameter handling, scoping rules, and vectorization to write efficient, maintainable code that scales from interactive analysis to production systems.