R - ifelse() Function with Examples
Key Insights
• The ifelse() function provides vectorized conditional logic, evaluating conditions element-wise across vectors and returning values based on TRUE/FALSE results
• Unlike standard if-else statements, ifelse() operates on entire vectors simultaneously, making it essential for data manipulation in data frames and matrices
• Understanding ifelse() recycling rules, nested operations, and performance characteristics prevents common pitfalls in production R code
Understanding ifelse() Syntax
The ifelse() function takes three arguments: a test condition, a value to return when TRUE, and a value to return when FALSE. The syntax is:
ifelse(test, yes, no)
Here’s a basic example:
x <- c(1, 5, 10, 15, 20)
result <- ifelse(x > 10, "high", "low")
print(result)
# [1] "low" "low" "low" "high" "high"
The function evaluates each element in x, returning “high” when the condition is TRUE and “low” when FALSE. This vectorized approach eliminates the need for explicit loops.
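To make the "no explicit loops" point concrete, the sketch below writes the same classification as a for loop with scalar if/else and checks that it matches the vectorized call. The variable names looped and vectorized are illustrative only.

```r
x <- c(1, 5, 10, 15, 20)

# Explicit loop: if/else handles one element at a time
looped <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] > 10) {
    looped[i] <- "high"
  } else {
    looped[i] <- "low"
  }
}

# Vectorized equivalent: one call covers the whole vector
vectorized <- ifelse(x > 10, "high", "low")

identical(looped, vectorized)
# [1] TRUE
```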
Vectorized Operations on Data Frames
A common use case for ifelse() is creating new columns in data frames based on existing column values:
# Sample sales data
sales <- data.frame(
  product = c("A", "B", "C", "D", "E"),
  revenue = c(15000, 8000, 25000, 12000, 30000),
  units = c(150, 80, 200, 120, 250)
)
# Categorize revenue performance
sales$performance <- ifelse(sales$revenue > 20000, "Excellent", "Standard")
# Calculate bonus eligibility (the comparison already returns a logical vector,
# so wrapping it in ifelse(..., TRUE, FALSE) is unnecessary)
sales$bonus_eligible <- sales$units >= 150
print(sales)
#   product revenue units performance bonus_eligible
# 1       A   15000   150    Standard           TRUE
# 2       B    8000    80    Standard          FALSE
# 3       C   25000   200   Excellent           TRUE
# 4       D   12000   120    Standard          FALSE
# 5       E   30000   250   Excellent           TRUE
Nested ifelse() for Multiple Conditions
For multi-tier categorization, nest ifelse() calls:
# Temperature classification
temps <- c(15, 25, 35, 5, 42, 18, 30)
classification <- ifelse(temps < 10, "Cold",
                  ifelse(temps < 20, "Cool",
                  ifelse(temps < 30, "Warm", "Hot")))
print(data.frame(temperature = temps, class = classification))
#   temperature class
# 1          15  Cool
# 2          25  Warm
# 3          35   Hot
# 4           5  Cold
# 5          42   Hot
# 6          18  Cool
# 7          30   Hot
While functional, deeply nested ifelse() becomes difficult to read. For complex logic, consider dplyr::case_when():
library(dplyr)
temps_df <- data.frame(temp = temps)
temps_df$class <- case_when(
  temps_df$temp < 10 ~ "Cold",
  temps_df$temp < 20 ~ "Cool",
  temps_df$temp < 30 ~ "Warm",
  TRUE ~ "Hot"
)
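When the tiers are contiguous numeric ranges, base R's cut() is another option that avoids nesting. This is a sketch and not part of the original example: it assumes the same cut-offs as above, and right = FALSE makes each interval closed on the left so that the bins match the `<` comparisons.

```r
temps <- c(15, 25, 35, 5, 42, 18, 30)

# Nested version, for comparison
nested <- ifelse(temps < 10, "Cold",
          ifelse(temps < 20, "Cool",
          ifelse(temps < 30, "Warm", "Hot")))

# cut() assigns each value to an interval: [-Inf,10), [10,20), [20,30), [30,Inf)
binned <- cut(
  temps,
  breaks = c(-Inf, 10, 20, 30, Inf),
  labels = c("Cold", "Cool", "Warm", "Hot"),
  right = FALSE
)

# cut() returns a factor; convert it to compare with the character result
identical(as.character(binned), nested)
# [1] TRUE
```

Note that cut() returns a factor, which may or may not be what downstream code expects.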
Handling NA Values
When the test evaluates to NA, ifelse() returns NA for that element, regardless of the yes and no values:
values <- c(5, 10, NA, 20, 15)
result <- ifelse(values > 12, "high", "low")
print(result)
# [1] "low" "low" NA "high" "high"
To handle NAs explicitly, add a condition:
result <- ifelse(is.na(values), "missing",
          ifelse(values > 12, "high", "low"))
print(result)
# [1] "low" "low" "missing" "high" "high"
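If missing values should instead fall into the "no" branch rather than get their own label, one possible approach is to fold the NA check into the test itself. This is a sketch of that variant; whether NA belongs in the "low" group is an assumption that depends on the data.

```r
values <- c(5, 10, NA, 20, 15)

# FALSE & NA evaluates to FALSE, so the missing element takes the "no" branch
result <- ifelse(!is.na(values) & values > 12, "high", "low")
print(result)
# [1] "low"  "low"  "low"  "high" "high"
```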
Working with Multiple Columns
Combine multiple conditions using logical operators:
employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "Dave", "Eve"),
  salary = c(75000, 55000, 95000, 62000, 88000),
  tenure = c(5, 2, 8, 3, 6)
)
# Senior employees: high salary AND long tenure
employees$level <- ifelse(employees$salary > 70000 & employees$tenure >= 5,
                          "Senior", "Junior")
print(employees)
#    name salary tenure  level
# 1 Alice  75000      5 Senior
# 2   Bob  55000      2 Junior
# 3 Carol  95000      8 Senior
# 4  Dave  62000      3 Junior
# 5   Eve  88000      6 Senior
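The same pattern works with `|` for "either condition" rules. The sketch below uses a hypothetical review rule (not from the original data set) to illustrate it. Use the vectorized operators `&` and `|` inside ifelse(); the scalar forms `&&` and `||` are meant for single values in if() statements, and recent R versions raise an error when they receive longer vectors.

```r
employees <- data.frame(
  name = c("Alice", "Bob", "Carol", "Dave", "Eve"),
  salary = c(75000, 55000, 95000, 62000, 88000),
  tenure = c(5, 2, 8, 3, 6)
)

# Hypothetical rule: flag for review if salary is below 60000 OR tenure is under 3 years
employees$review <- ifelse(employees$salary < 60000 | employees$tenure < 3,
                           "Review", "OK")
print(employees$review)
# [1] "OK"     "Review" "OK"     "OK"     "OK"
```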
Numeric Calculations with ifelse()
Use ifelse() for conditional calculations:
# Apply discount based on order size
orders <- data.frame(
  order_id = 1:5,
  amount = c(500, 1500, 800, 2500, 1200)
)
# 15% discount for orders over 1000, otherwise 5%
orders$discount <- ifelse(orders$amount > 1000,
                          orders$amount * 0.15,
                          orders$amount * 0.05)
orders$final_amount <- orders$amount - orders$discount
print(orders)
#   order_id amount discount final_amount
# 1        1    500       25          475
# 2        2   1500      225         1275
# 3        3    800       40          760
# 4        4   2500      375         2125
# 5        5   1200      180         1020
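An equivalent formulation selects the rate with ifelse() and multiplies once, which keeps the arithmetic in a single place. This is a stylistic sketch rather than a required change; the names rate and direct are illustrative.

```r
amount <- c(500, 1500, 800, 2500, 1200)

# Pick the rate per order, then apply it in one multiplication
rate <- ifelse(amount > 1000, 0.15, 0.05)
discount <- amount * rate

# Same result as computing both branches inside ifelse()
direct <- ifelse(amount > 1000, amount * 0.15, amount * 0.05)
all.equal(discount, direct)
# [1] TRUE
```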
Vector Recycling Behavior
ifelse() returns a result the same length as test, recycling the yes and no values to match:
# Scalar yes/no values recycled across all ten elements
x <- 1:10
result <- ifelse(x %% 2 == 0, "even", "odd")
print(result)
# [1] "odd" "even" "odd" "even" "odd" "even" "odd" "even" "odd" "even"
# Recycling in the comparison itself: `>` recycles thresholds before ifelse() runs
values <- 1:6
thresholds <- c(3, 5) # Recycled to 3, 5, 3, 5, 3, 5
result <- ifelse(values > thresholds, "above", "below")
print(result)
# [1] "below" "below" "below" "below" "above" "above"
Be cautious with recycling: R recycles silently when the longer length is a multiple of the shorter one, so misaligned vectors can produce plausible-looking but wrong results.
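A short sketch of how the length of test drives the result: yes and no are recycled (or cut short) to the length of test, never the other way around. The variable names are illustrative.

```r
# yes has three elements, but test has one, so only the first is used
first_only <- ifelse(TRUE, c("a", "b", "c"), "z")
print(first_only)
# [1] "a"

# yes is recycled to length 4 ("y1" "y2" "y1" "y2"), then used where test is TRUE
recycled <- ifelse(c(TRUE, FALSE, TRUE, FALSE), c("y1", "y2"), "n")
print(recycled)
# [1] "y1" "n"  "y1" "n"
```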
Performance Considerations
For large datasets, ifelse() can be slower than alternatives:
# Benchmark different approaches
library(microbenchmark)
n <- 1e6
x <- runif(n, 0, 100)
microbenchmark(
  ifelse_method = ifelse(x > 50, "high", "low"),
  bracket_method = {
    result <- character(n)
    result[x > 50] <- "high"
    result[x <= 50] <- "low"
    result
  },
  times = 100
)
The bracket subsetting method often outperforms ifelse() for simple binary conditions on large vectors. However, ifelse() remains more readable and sufficient for most data analysis tasks.
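One caveat when swapping between the two approaches: they are not equivalent when the input contains NA. The sketch below uses a small illustrative vector. ifelse() returns NA for the missing element, while the bracket method leaves the initial empty string in place.

```r
x <- c(12, 75, NA, 50)

via_ifelse <- ifelse(x > 50, "high", "low")

via_bracket <- character(length(x))  # initialized to ""
via_bracket[x > 50] <- "high"        # NA positions in the index are skipped
via_bracket[x <= 50] <- "low"

is.na(via_ifelse[3])   # TRUE: the NA is propagated
via_bracket[3] == ""   # TRUE: the element was never assigned
```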
Type Coercion Gotchas
ifelse() returns a vector with a single type, coercing values as needed:
# Numeric and character mix
result <- ifelse(c(TRUE, FALSE), 100, "low")
print(result)
# [1] "100" "low" # Both converted to character
# Better: keep types consistent
result <- ifelse(c(TRUE, FALSE), 100, 0)
print(result)
# [1] 100 0 # Both numeric
When mixing types, the result follows R’s coercion hierarchy: logical < integer < double < character.
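Two related behaviors are worth checking in your own code. First, the result type depends on which branches are actually selected, so the same call can return different types for different inputs. Second, ifelse() builds its result from the test vector, so attributes of yes and no, such as the Date class, are dropped. The sketch below illustrates both with made-up values.

```r
# The result type depends on which branches are actually used
all_yes <- ifelse(c(TRUE, TRUE), 100, "low")
mixed <- ifelse(c(TRUE, FALSE), 100, "low")
class(all_yes)  # "numeric": "low" is never selected
class(mixed)    # "character": both branches contribute

# Attributes such as the Date class are not carried over
d <- as.Date(c("2024-01-01", "2024-06-01"))
out <- ifelse(d > as.Date("2024-03-01"), d, as.Date(NA))
class(out)      # "numeric", not "Date"
```

If the class matters, one option is to restore it afterwards, for example with as.Date(out, origin = "1970-01-01").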
Practical Example: Data Cleaning Pipeline
Here’s a realistic data cleaning scenario combining multiple ifelse() operations:
# Raw customer data with issues
customers <- data.frame(
  id = 1:6,
  age = c(25, -5, 150, 45, NA, 32),
  income = c(50000, 75000, 0, 120000, 85000, NA),
  status = c("active", "ACTIVE", "inactive", "Active", "pending", "active")
)
# Clean age: replace missing or implausible values with NA
customers$age_clean <- ifelse(is.na(customers$age) | customers$age < 0 | customers$age > 120,
                              NA, customers$age)
# Clean income: replace 0 and NA with the median of the positive incomes
median_income <- median(customers$income[customers$income > 0], na.rm = TRUE)
customers$income_clean <- ifelse(is.na(customers$income) | customers$income == 0,
                                 median_income, customers$income)
# Standardize status: anything other than "active" (including "pending") becomes "Inactive"
customers$status_clean <- ifelse(tolower(customers$status) == "active",
                                 "Active", "Inactive")
# Create customer segment
customers$segment <- ifelse(customers$income_clean > 100000, "Premium",
                     ifelse(customers$income_clean > 60000, "Standard", "Basic"))
print(customers[, c("id", "age_clean", "income_clean", "status_clean", "segment")])
#   id age_clean income_clean status_clean  segment
# 1  1        25        50000       Active    Basic
# 2  2        NA        75000       Active Standard
# 3  3        NA        80000     Inactive Standard
# 4  4        45       120000       Active  Premium
# 5  5        NA        85000     Inactive Standard
# 6  6        32        80000       Active Standard
This example demonstrates how ifelse() handles real-world data quality issues efficiently, making it an essential tool for data preprocessing in R workflows.