R - Functions - Define and Call

Key Insights

  • R functions use lexical scoping and support default arguments, named parameters, and variable-length argument lists through the special ... argument
  • Functions are first-class objects in R, meaning they can be assigned to variables, passed as arguments, and returned from other functions
  • Understanding function environments and lazy evaluation (arguments are evaluated only when they are first used, not when the function is called) is critical for writing correct, efficient R code
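Lazy evaluation is easy to observe directly. In the sketch below, the function names ignore_second() and center_values() are hypothetical: an argument the body never touches is never evaluated, and a default value can refer to another argument because it is evaluated only when first needed, inside the function.

```r
# Hypothetical function: y is never used, so its expression is never evaluated
ignore_second <- function(x, y) {
  x * 2
}

# No error: stop() is passed as y but never runs
doubled <- ignore_second(21, stop("this is never evaluated"))
print(doubled)  # Output: 42

# Hypothetical function: the default for center refers to x and is
# evaluated lazily, inside the function, the first time center is used
center_values <- function(x, center = mean(x)) {
  x - center
}

print(center_values(c(1, 2, 3)))              # Output: -1 0 1
print(center_values(c(1, 2, 3), center = 0))  # Output: 1 2 3
```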

Basic Function Syntax

R functions follow a straightforward structure using the function keyword. The basic anatomy includes parameters, a function body, and an optional explicit return statement.

# Basic function definition
calculate_area <- function(length, width) {
  area <- length * width
  return(area)
}

# Call the function
result <- calculate_area(5, 10)
print(result)  # Output: 50

# Implicit return (last expression is returned)
calculate_volume <- function(length, width, height) {
  length * width * height
}

volume <- calculate_volume(5, 10, 2)
print(volume)  # Output: 100

Calling return() is optional. R automatically returns the value of the last evaluated expression, and relying on this implicit return is idiomatic; an explicit return() is usually reserved for exiting a function early.
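A typical use of explicit return() is a guard clause that exits early, while the normal result still comes from the last expression. A small sketch with a hypothetical classify_number() function:

```r
# Hypothetical example: return() for early exits, implicit return at the end
classify_number <- function(x) {
  if (is.na(x)) return("missing")  # guard clause exits immediately
  if (x < 0) return("negative")
  if (x == 0) return("zero")
  "positive"                        # last expression is the return value
}

print(classify_number(-3))  # Output: "negative"
print(classify_number(0))   # Output: "zero"
print(classify_number(7))   # Output: "positive"
print(classify_number(NA))  # Output: "missing"
```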

Default Arguments and Named Parameters

Default arguments provide flexibility when calling functions. Named parameters improve code readability and allow arguments to be specified in any order.

# Function with default arguments
create_dataframe <- function(rows = 10, cols = 5, fill_value = NA) {
  matrix(fill_value, nrow = rows, ncol = cols) |>
    as.data.frame()
}

# Various ways to call
df1 <- create_dataframe()  # Uses all defaults
df2 <- create_dataframe(rows = 20)  # Override one parameter
df3 <- create_dataframe(fill_value = 0, rows = 15)  # Named, any order
df4 <- create_dataframe(5, 3, 1)  # Positional arguments

# Mixing positional and named
process_data <- function(data, method = "mean", na.rm = TRUE, trim = 0) {
  switch(method,
    mean = mean(data, na.rm = na.rm, trim = trim),
    median = median(data, na.rm = na.rm),
    sum = sum(data, na.rm = na.rm)
  )
}

values <- c(1, 2, 3, NA, 5)
process_data(values)  # Uses default method
process_data(values, method = "median")
process_data(values, "sum", na.rm = FALSE)
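One common refinement of process_data() is to list the allowed methods in the default value and validate them with base R's match.arg(), so that a misspelled method raises an error instead of switch() silently returning NULL. The variant below, process_data_checked(), is a hypothetical sketch of that pattern:

```r
# Hypothetical variant of process_data(): the default lists the allowed choices
process_data_checked <- function(data, method = c("mean", "median", "sum"),
                                 na.rm = TRUE) {
  # With no method supplied, match.arg() returns the first choice ("mean");
  # an unknown value raises an error; a unique prefix such as "med" is accepted
  method <- match.arg(method)
  switch(method,
    mean = mean(data, na.rm = na.rm),
    median = median(data, na.rm = na.rm),
    sum = sum(data, na.rm = na.rm)
  )
}

values <- c(1, 2, 3, NA, 5)
print(process_data_checked(values))         # Output: 2.75
print(process_data_checked(values, "med"))  # Output: 2.5 (partial match for "median")

# An unsupported method is rejected with an informative error
tryCatch(
  process_data_checked(values, "mode"),
  error = function(e) message("Error: ", conditionMessage(e))
)
```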

Variable-Length Arguments

The ... (ellipsis, or "dots") argument captures an arbitrary number of arguments, enabling flexible function interfaces.

# Basic use of ...
print_all <- function(...) {
  args <- list(...)
  for (i in seq_along(args)) {
    cat("Argument", i, ":", args[[i]], "\n")
  }
}

print_all("hello", 42, TRUE)

# Forwarding ... to another function (sep must be named here, or the first value would bind to sep)
custom_paste <- function(sep = "-", ...) {
  paste(..., sep = sep)
}

result <- custom_paste(sep = " | ", "A", "B", "C")
print(result)  # Output: A | B | C

# Combining named arguments with ...
statistical_summary <- function(data, ..., digits = 2) {
  summary_stats <- c(
    mean = mean(data, ...),
    median = median(data, ...),
    sd = sd(data, ...)
  )
  round(summary_stats, digits)
}

data_with_na <- c(1, 2, 3, NA, 5, 6)
statistical_summary(data_with_na, na.rm = TRUE)
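Besides forwarding ..., a function can inspect what it captured. The sketch below uses base R's ...length() (available since R 3.5.0) and names(list(...)); the function name inspect_dots() is hypothetical. As with digits in statistical_summary(), any parameter declared after ... can only be supplied by its full name.

```r
# Hypothetical helper that reports how many arguments ... captured and their names
inspect_dots <- function(...) {
  n <- ...length()               # number of arguments in ... (R >= 3.5.0)
  arg_names <- names(list(...))  # NULL when no argument is named
  if (is.null(arg_names)) arg_names <- rep("", n)
  list(count = n, names = arg_names)
}

info <- inspect_dots(1, b = 2, "three")
print(info$count)  # Output: 3
print(info$names)  # Output: "" "b" ""

print(inspect_dots()$count)  # Output: 0
```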

Anonymous Functions and Functional Programming

R treats functions as first-class objects. Anonymous functions (lambdas) are particularly useful with apply family functions and modern pipe operators.

# Traditional anonymous function
squared <- lapply(1:5, function(x) x^2)
print(unlist(squared))

# R 4.1+ shorthand syntax
cubed <- lapply(1:5, \(x) x^3)
print(unlist(cubed))
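Anonymous functions also pair naturally with base R's other higher-order functions. As a sketch: vapply() is a type-checked alternative to lapply() followed by unlist(), while Filter() takes a predicate and Reduce() takes a binary function. The \(x) shorthand requires R 4.1 or later.

```r
# vapply() checks that every result is a single numeric value
squares <- vapply(1:5, \(x) x^2, numeric(1))
print(squares)  # Output: 1 4 9 16 25

# Filter() keeps the elements for which the predicate returns TRUE
evens <- Filter(\(x) x %% 2 == 0, 1:10)
print(evens)  # Output: 2 4 6 8 10

# Reduce() folds a binary function over the vector from left to right
total <- Reduce(\(acc, x) acc + x, evens)
print(total)  # Output: 30
```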

# Functions as arguments
apply_operation <- function(x, y, op) {
  op(x, y)
}

add <- function(a, b) a + b
multiply <- function(a, b) a * b

apply_operation(5, 3, add)       # Output: 8
apply_operation(5, 3, multiply)  # Output: 15

# Returning functions (closures)
make_multiplier <- function(n) {
  function(x) x * n
}

times_two <- make_multiplier(2)
times_ten <- make_multiplier(10)

print(times_two(5))   # Output: 10
print(times_ten(5))   # Output: 50
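Because arguments are evaluated lazily, a closure like make_multiplier() does not look at n until the returned function first runs. If the caller's variable changes before then, the closure sees the new value. Calling force() captures the value immediately. The two variants below are hypothetical names used only for this comparison:

```r
# Hypothetical variants of make_multiplier() showing lazy evaluation in closures
make_multiplier_lazy <- function(n) {
  function(x) x * n  # n is a promise, evaluated only when the closure first runs
}

make_multiplier_forced <- function(n) {
  force(n)           # evaluate n immediately, capturing its current value
  function(x) x * n
}

lazy_fns <- list()
forced_fns <- list()
for (i in 1:3) {
  lazy_fns[[i]] <- make_multiplier_lazy(i)
  forced_fns[[i]] <- make_multiplier_forced(i)
}

print(lazy_fns[[1]](10))    # Output: 30 (i was already 3 when n was evaluated)
print(forced_fns[[1]](10))  # Output: 10
```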

Function Environments and Scoping

R uses lexical scoping, where functions look for variables in the environment where they were defined, not where they’re called.

# Name masking: each assignment creates a new local x, leaving the outer ones unchanged
x <- 10

outer_function <- function() {
  x <- 20
  
  inner_function <- function() {
    x <- 30
    print(paste("Inner x:", x))
  }
  
  inner_function()
  print(paste("Outer x:", x))
}

outer_function()
print(paste("Global x:", x))
# Output:
# Inner x: 30
# Outer x: 20
# Global x: 10
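In the example above each function assigns its own x, so no lookup across environments actually happens. The sketch below, with hypothetical function names, shows lexical scoping directly: a function that uses a variable it does not define finds it in the environment where the function was created, even when it is called from another function that has a variable of the same name.

```r
# Hypothetical example: a free variable is resolved in the defining environment
make_reporter <- function() {
  label <- "defined in make_reporter"
  function() label  # 'label' is not local, so R searches the enclosing environment
}

call_from_elsewhere <- function(f) {
  label <- "defined in the caller"
  f()               # the caller's 'label' is not visible to f
}

reporter <- make_reporter()
print(call_from_elsewhere(reporter))  # Output: "defined in make_reporter"
```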

# Modifying a variable in an enclosing environment
counter_factory <- function() {
  count <- 0
  
  list(
    increment = function() {
      count <<- count + 1
      count
    },
    get_count = function() count,
    reset = function() count <<- 0
  )
}

counter <- counter_factory()
counter$increment()  # 1
counter$increment()  # 2
counter$get_count()  # 2
counter$reset()
counter$get_count()  # 0

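The <<- operator assigns in an enclosing environment: starting from the parent of the current function's environment, it searches upward and modifies the first existing binding of that name, or creates the variable in the global environment if none is found. This is what enables stateful functions and closures like the counter above.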
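Two consequences of this search are easy to miss: <<- can update a binding several levels up, and when no enclosing environment has the name it silently creates a global variable. A sketch with hypothetical function names:

```r
# <<- skips environments without the name and updates the nearest one that has it
outer_fn <- function() {
  total <- 0
  middle_fn <- function() {
    # no 'total' in middle_fn, so outer_fn's 'total' is updated
    inner_fn <- function() total <<- total + 1
    inner_fn()
    inner_fn()
  }
  middle_fn()
  total
}
print(outer_fn())  # Output: 2

# With no enclosing binding at all, <<- creates a global variable
leaky_fn <- function() {
  created_by_superassign <<- "oops"
}
leaky_fn()
print(exists("created_by_superassign", envir = globalenv()))  # Output: TRUE
```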

Error Handling in Functions

Robust functions include error handling using stop(), warning(), and tryCatch().

# Input validation
safe_divide <- function(numerator, denominator) {
  if (!is.numeric(numerator) || !is.numeric(denominator)) {
    stop("Both arguments must be numeric")
  }
  
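  if (any(denominator == 0, na.rm = TRUE)) {
    # Warn, but let R produce the correct Inf, -Inf, or NaN for each element
    warning("Division by zero: result will contain Inf, -Inf, or NaN")
  }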
  
  numerator / denominator
}

# Using tryCatch for error handling
robust_operation <- function(x, y) {
  result <- tryCatch(
    {
      # Try this code
      log(x) + sqrt(y)
    },
    error = function(e) {
      message("Error occurred: ", e$message)
      return(NA)
    },
    warning = function(w) {
      message("Warning: ", w$message)
      return(NULL)
    },
    finally = {
      # Always executes
      message("Operation completed")
    }
  )
  
  result
}

robust_operation(10, 25)   # Success
robust_operation(-5, 25)   # Warning: NaNs produced
robust_operation("a", 25)  # Error handling
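tryCatch() can also choose a handler by condition class. As a sketch, base R's errorCondition() (R 3.6.0 or later) builds an error object with a custom class and extra fields, so a caller can catch that specific failure while any other error still propagates. The function validate_age() and the class name invalid_age_error are hypothetical.

```r
# Hypothetical validator that signals a classed error with an extra field
validate_age <- function(age) {
  if (!is.numeric(age) || length(age) != 1 || is.na(age) || age < 0) {
    stop(errorCondition(
      paste("Invalid age:", format(age)),
      class = "invalid_age_error",
      value = age  # stored on the condition object, readable as e$value
    ))
  }
  age
}

# Handle only this error class; any other error would propagate as usual
checked <- tryCatch(
  validate_age(-3),
  invalid_age_error = function(e) {
    message("Rejected value: ", e$value)
    NA_real_
  }
)
print(checked)  # Output: NA
```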

Performance Considerations

Understanding when to vectorize versus loop is crucial for R function performance.

# Non-vectorized (slow)
sum_squares_loop <- function(n) {
  result <- 0
  for (i in seq_len(n)) {  # seq_len(0) is empty, whereas 1:0 is c(1, 0)
    result <- result + i^2
  }
  result
}

# Vectorized (fast)
sum_squares_vectorized <- function(n) {
  sum(seq_len(n)^2)
}

# Benchmark
library(microbenchmark)  # install.packages("microbenchmark") if it is not installed

microbenchmark(
  loop = sum_squares_loop(10000),
  vectorized = sum_squares_vectorized(10000),
  times = 100
)
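When a loop is unavoidable, preallocating the result usually matters more than the loop itself: growing a vector with c() allocates a new, longer vector on every iteration, while creating it once with numeric(n) and filling it by index avoids that repeated allocation. A sketch with hypothetical function names; both return the same values, and their speed can be compared with microbenchmark() as above.

```r
# Growing: c() builds a new, longer vector on every iteration
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Preallocating: allocate once, then fill each slot by index
squares_prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

print(identical(squares_grow(1000), squares_prealloc(1000)))  # Output: TRUE
```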

# Memoization for expensive computations
fibonacci_memo <- local({
  cache <- new.env(parent = emptyenv())  # empty parent: exists() and get() only search the cache
  
  function(n) {
    if (n <= 1) return(n)
    
    key <- as.character(n)
    if (exists(key, envir = cache)) {
      return(get(key, envir = cache))
    }
    
    result <- fibonacci_memo(n - 1) + fibonacci_memo(n - 2)
    assign(key, result, envir = cache)
    result
  }
})

fibonacci_memo(30)  # 832040; each Fibonacci number is computed only once
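The same caching idea can be packaged as a higher-order function that wraps any single-argument function. The sketch below is hypothetical (memoize(), slow_square(), and call_count are illustrative names); for real projects, the CRAN package memoise provides a maintained memoise() function.

```r
# Hypothetical higher-order helper: returns a cached version of f
memoize <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())  # lookups never search other environments
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache)) {
      assign(key, f(x), envir = cache)   # compute once and store
    }
    get(key, envir = cache)
  }
}

# A stand-in for an expensive function that counts how often it really runs
call_count <- 0
slow_square <- function(x) {
  call_count <<- call_count + 1
  x^2
}

fast_square <- memoize(slow_square)
print(fast_square(4))  # Output: 16 (computed)
print(fast_square(4))  # Output: 16 (read from the cache)
print(call_count)      # Output: 1
```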

Documentation and Best Practices

Well-documented functions using roxygen2 comments improve maintainability and enable automatic documentation generation.

#' Calculate Body Mass Index
#'
#' @param weight Numeric value representing weight in kilograms
#' @param height Numeric value representing height in meters
#' @param digits Integer specifying decimal places for rounding (default: 2)
#'
#' @return Numeric BMI value rounded to specified digits
#' @export
#'
#' @examples
#' calculate_bmi(70, 1.75)
#' calculate_bmi(weight = 85, height = 1.80, digits = 1)
calculate_bmi <- function(weight, height, digits = 2) {
  stopifnot(
    "Weight must be positive" = weight > 0,
    "Height must be positive" = height > 0
  )
  
  bmi <- weight / (height^2)
  round(bmi, digits)
}
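The named arguments to stopifnot() in calculate_bmi() rely on a feature of R 4.0.0 and later: when a condition fails, its name becomes the error message. A minimal sketch with a hypothetical check_positive() shows the message a caller receives:

```r
# Hypothetical validator using the same named-stopifnot pattern
check_positive <- function(x) {
  stopifnot("x must be positive" = all(x > 0))
  invisible(x)
}

# conditionMessage() extracts the text of the caught error
msg <- tryCatch(check_positive(c(1, -2, 3)), error = conditionMessage)
print(msg)  # Output: "x must be positive"
```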

Functions form the backbone of reproducible R code. Master parameter handling, scoping rules, and vectorization to write efficient, maintainable code that scales from interactive analysis to production systems.
