R dplyr - if_else() vs ifelse()

Key Insights

  • if_else() is type-safe: the true and false branches must share a compatible type (identical types before dplyr 1.1.0), while ifelse() coerces silently and can produce unexpected results
  • if_else() handles missing values explicitly through its missing argument, whereas ifelse() simply returns NA wherever the condition is NA
  • In the benchmark below, if_else() ran roughly 2-3x faster than ifelse() on a million-row dataset; the size of the gap depends on the dplyr version, the R version, and the hardware

Type Safety: The Critical Difference

The fundamental distinction between if_else() and ifelse() lies in type checking. if_else() enforces strict type consistency between the true and false branches, preventing silent type coercion that often leads to bugs.

library(dplyr)

# ifelse() silently coerces types
x <- c(1, 2, 3, 4, 5)
result_ifelse <- ifelse(x > 3, x, "low")
print(result_ifelse)
# [1] "low" "low" "low" "4"   "5"
class(result_ifelse)
# [1] "character"

# if_else() throws an error for type mismatch
result_if_else <- if_else(x > 3, x, "low")
# Error (dplyr >= 1.1.0): Can't combine `true` <double> and `false` <character>.
# Older versions report: `false` must be a double vector, not a character vector

The error surfaces at the call that caused it, instead of letting a silently coerced column propagate through your pipeline. With ifelse(), the numeric values are coerced to character, which can break downstream calculations. if_else() forces you to handle type conversions explicitly:

# Correct approach with if_else()
result_if_else <- if_else(x > 3, as.character(x), "low")
print(result_if_else)
# [1] "low" "low" "low" "4"   "5"
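
When the fallback is only a placeholder, another option is to keep the result numeric by using a typed missing value instead of a string. A minimal sketch, reusing x from above; picking NA_real_ as the placeholder is an assumption made for illustration:

```r
library(dplyr)

x <- c(1, 2, 3, 4, 5)

# Both branches are double, so the result stays numeric
result_numeric <- if_else(x > 3, x, NA_real_)
print(result_numeric)
# [1] NA NA NA  4  5

# Downstream arithmetic still works
mean(result_numeric, na.rm = TRUE)
# [1] 4.5
```

Whether a string label or a numeric NA is the better choice depends on what the column is used for later.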

Date and POSIXct Handling

Type coercion issues become particularly problematic with dates and timestamps. ifelse() strips date attributes, converting them to numeric values:

dates <- as.Date(c("2024-01-01", "2024-02-01", "2024-03-01"))
cutoff <- as.Date("2024-02-01")

# ifelse() destroys date class
result_ifelse <- ifelse(dates >= cutoff, dates, cutoff)
print(result_ifelse)
# [1] 19754 19754 19783
class(result_ifelse)
# [1] "numeric"

# if_else() preserves date class
result_if_else <- if_else(dates >= cutoff, dates, cutoff)
print(result_if_else)
# [1] "2024-02-01" "2024-02-01" "2024-03-01"
class(result_if_else)
# [1] "Date"

The same issue occurs with POSIXct timestamps:

timestamps <- as.POSIXct(c("2024-01-01 10:00:00", 
                           "2024-01-01 14:00:00",
                           "2024-01-01 18:00:00"))
threshold <- as.POSIXct("2024-01-01 15:00:00")

# ifelse() converts to numeric
bad_result <- ifelse(timestamps > threshold, timestamps, threshold)
class(bad_result)
# [1] "numeric"

# if_else() maintains POSIXct
good_result <- if_else(timestamps > threshold, timestamps, threshold)
class(good_result)
# [1] "POSIXct" "POSIXt"
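
If ifelse() has to stay (for example, in code that cannot depend on dplyr), the class can be kept or restored by hand. A rough sketch in base R only; the variable names floored and restored are illustrative:

```r
dates <- as.Date(c("2024-01-01", "2024-02-01", "2024-03-01"))
cutoff <- as.Date("2024-02-01")

# Option 1: copy the vector and assign by logical index; the Date class is kept
floored <- dates
floored[dates < cutoff] <- cutoff
print(floored)
# [1] "2024-02-01" "2024-02-01" "2024-03-01"

# Option 2: let ifelse() drop the class, then convert the numbers back
restored <- as.Date(ifelse(dates >= cutoff, dates, cutoff), origin = "1970-01-01")
class(restored)
# [1] "Date"
```

Option 1 avoids the round trip through numeric values, which matters more for POSIXct, where the time zone attribute would also need to be restored.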

Explicit Missing Value Control

if_else() provides a missing parameter for explicit NA handling, while ifelse() only propagates NAs from the condition:

values <- c(1, 2, NA, 4, 5)

# ifelse() propagates NA
ifelse_result <- ifelse(values > 2, "high", "low")
print(ifelse_result)
# [1] "low"  "low"  NA     "high" "high"

# if_else() allows explicit NA replacement
if_else_result <- if_else(values > 2, "high", "low", missing = "unknown")
print(if_else_result)
# [1] "low"     "low"     "unknown" "high"    "high"

This becomes valuable in data cleaning pipelines where you need to distinguish between missing conditions and explicit categorization:

library(tibble)

sales_data <- tibble(
  product = c("A", "B", "C", "D", "E"),
  revenue = c(1000, 2500, NA, 3500, 1500),
  target = 2000
)

sales_data %>%
  mutate(
    performance_ifelse = ifelse(revenue > target, "Above", "Below"),
    performance_if_else = if_else(revenue > target, "Above", "Below", 
                                   missing = "No Data")
  )
# # A tibble: 5 × 5
#   product revenue target performance_ifelse performance_if_else
#   <chr>     <dbl>  <dbl> <chr>              <chr>              
# 1 A          1000   2000 Below              Below              
# 2 B          2500   2000 Above              Above              
# 3 C            NA   2000 NA                 No Data            
# 4 D          3500   2000 Above              Above              
# 5 E          1500   2000 Below              Below
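
To make the role of the missing argument concrete, the sketch below reproduces it by hand with ifelse() followed by an index assignment. This is an illustration of the behavior, not a description of how dplyr implements it; the names with_missing and by_hand are made up for the example:

```r
library(dplyr)

values <- c(1, 2, NA, 4, 5)
condition <- values > 2

# dplyr: one call, NA conditions mapped to "unknown"
with_missing <- if_else(condition, "high", "low", missing = "unknown")

# Base R: ifelse(), then overwrite the positions where the condition was NA
by_hand <- ifelse(condition, "high", "low")
by_hand[is.na(condition)] <- "unknown"

all(with_missing == by_hand)
# [1] TRUE
```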

Performance Comparison

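The benchmark below compares the two functions on a million-row tibble. Treat the numbers as indicative only: if_else() was rewritten on top of vctrs in dplyr 1.1.0, so results depend on the installed dplyr version as well as on the hardware: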

library(bench)

# Generate large dataset
n <- 1e6
test_data <- tibble(
  value = rnorm(n),
  threshold = 0
)

# Benchmark comparison
benchmark_results <- mark(
  ifelse = test_data %>% 
    mutate(category = ifelse(value > threshold, "positive", "negative")),
  if_else = test_data %>% 
    mutate(category = if_else(value > threshold, "positive", "negative")),
  check = FALSE,
  iterations = 50
)

print(benchmark_results[, c("expression", "median", "mem_alloc")])
# # A tibble: 2 × 3
#   expression   median mem_alloc
#   <bch:expr> <bch:tm> <bch:byt>
# 1 ifelse       45.2ms    30.5MB
# 2 if_else      18.7ms    22.9MB

The same comparison can be run with nested conditions, where each extra level adds another full pass over the data:

# Multiple condition benchmark
test_complex <- tibble(
  x = rnorm(1e6),
  y = rnorm(1e6)
)

mark(
  ifelse_nested = test_complex %>%
    mutate(result = ifelse(x > 0 & y > 0, "Q1",
                    ifelse(x < 0 & y > 0, "Q2",
                    ifelse(x < 0 & y < 0, "Q3", "Q4")))),
  if_else_nested = test_complex %>%
    mutate(result = if_else(x > 0 & y > 0, "Q1",
                    if_else(x < 0 & y > 0, "Q2",
                    if_else(x < 0 & y < 0, "Q3", "Q4")))),
  check = FALSE,
  iterations = 30
)
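
Because the benchmarks above pass check = FALSE, bench does not verify that the two expressions return the same thing. A small sketch that checks agreement first and then times the bare function calls with base R's system.time(); the seed, vector size, and repetition count are arbitrary choices, and no timings are shown because they vary by machine:

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the run is repeatable
v <- rnorm(1e5)

base_out  <- ifelse(v > 0, "positive", "negative")
dplyr_out <- if_else(v > 0, "positive", "negative")

# The two functions should agree element by element on this input
stopifnot(all(base_out == dplyr_out))

# Rough wall-clock timing of the function calls alone (no mutate() overhead)
system.time(for (i in 1:20) ifelse(v > 0, "positive", "negative"))
system.time(for (i in 1:20) if_else(v > 0, "positive", "negative"))
```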

Vectorization Behavior

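Both functions are vectorized, but they treat mismatched lengths differently. ifelse() silently recycles shorter vectors, while if_else() raises an error unless true and false are either length 1 or the same length as the condition: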

# ifelse() recycles silently
x <- 1:10
ifelse(x > 5, c(100, 200), 0)  # c(100, 200) gets recycled
# [1]   0   0   0   0   0 200 100 200 100 200

# if_else() requires matching lengths or scalar
if_else(x > 5, c(100, 200), 0)
# Error: `true` must have size 10 or 1, not size 2 (exact wording varies by dplyr version)

Proper vectorization with if_else():

# Scalar replacement values
if_else(x > 5, 100, 0)
# [1]   0   0   0   0   0 100 100 100 100 100

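# Full-length vectors (10L keeps both branches integer, which also satisfies
# the strict type check in dplyr versions before 1.1.0)
if_else(x > 5, x * 10L, x)
# [1]   1   2   3   4   5  60  70  80  90 100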
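
If recycling a short vector really is the intent, it can be spelled out before the call so that if_else() receives a full-length input. A minimal sketch using base R's rep_len(); the name pattern is illustrative:

```r
library(dplyr)

x <- 1:10

# Recycle on purpose: expand c(100, 200) to the length of x first
pattern <- rep_len(c(100, 200), length(x))

recycled <- if_else(x > 5, pattern, 0)
print(recycled)
# [1]   0   0   0   0   0 200 100 200 100 200
```

Positions 6 to 10 take the values at the same positions of the expanded pattern, which is why the first selected value is 200 rather than 100.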

Practical Migration Strategy

When migrating from ifelse() to if_else(), address type consistency first:

# Original code with ifelse()
legacy_transform <- function(df) {
  df %>%
    mutate(
      status = ifelse(is.na(value), "missing", 
               ifelse(value > 100, "high", "normal")),
      flag = ifelse(category == "A", 1, 0)
    )
}

# Migrated to if_else()
modern_transform <- function(df) {
  df %>%
    mutate(
      status = if_else(is.na(value), "missing",
               if_else(value > 100, "high", "normal")),
      flag = if_else(category == "A", 1L, 0L)  # Explicit integer type
    )
}
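
Before swapping one function for the other, it can help to run both on a small sample that covers the edge cases. The sketch below repeats the two functions so it runs on its own; sample_df and its values are hypothetical:

```r
library(dplyr)

legacy_transform <- function(df) {
  df %>%
    mutate(
      status = ifelse(is.na(value), "missing",
               ifelse(value > 100, "high", "normal")),
      flag = ifelse(category == "A", 1, 0)
    )
}

modern_transform <- function(df) {
  df %>%
    mutate(
      status = if_else(is.na(value), "missing",
               if_else(value > 100, "high", "normal")),
      flag = if_else(category == "A", 1L, 0L)
    )
}

# Hypothetical sample covering a missing value and a missing category
sample_df <- tibble(
  value = c(50, 150, NA, 100),
  category = c("A", "B", "A", NA)
)

old <- legacy_transform(sample_df)
new <- modern_transform(sample_df)

# status should match exactly, including the row where value is NA
stopifnot(identical(old$status, new$status))

# flag changes storage type (double to integer) but not its values
stopifnot(isTRUE(all.equal(old$flag, new$flag)))
```

The only intended difference is the storage type of flag, so any other mismatch points to a migration bug.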

Use case_when() for complex multi-condition logic instead of nested if_else():

df %>%
  mutate(
    tier = case_when(
      revenue > 10000 ~ "platinum",
      revenue > 5000 ~ "gold",
      revenue > 1000 ~ "silver",
      TRUE ~ "bronze"
    )
  )
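
One detail to watch: case_when() treats an NA condition as not matching, so a row with missing revenue falls through to the TRUE fallback and would be labeled "bronze". A small sketch that handles missing values first; the data and the "unknown" label are assumptions for illustration:

```r
library(dplyr)

# Hypothetical data; the last row has no revenue recorded
df <- tibble(revenue = c(12000, 6000, 1500, 500, NA))

tiers <- df %>%
  mutate(
    tier = case_when(
      is.na(revenue) ~ "unknown",   # handle missing values first
      revenue > 10000 ~ "platinum",
      revenue > 5000 ~ "gold",
      revenue > 1000 ~ "silver",
      TRUE ~ "bronze"
    )
  )

tiers$tier
# [1] "platinum" "gold"     "silver"   "bronze"   "unknown"
```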

When to Use Each Function

Use if_else() for:

  • Production data pipelines requiring type safety
  • Date/time operations
  • Large datasets where performance matters
  • Code requiring explicit NA handling

Use ifelse() when:

  • Working with legacy R code requiring base R compatibility
  • Quick interactive analysis where type coercion is acceptable
  • Package dependencies prohibit dplyr

The transition to if_else() prevents entire classes of bugs related to implicit type conversion, and in the benchmark above it also ran faster. The stricter semantics may require more explicit code, but this explicitness eliminates ambiguity and makes data transformations more maintainable.
