R - If/Else/Else If Statements
Key Insights
- R’s if/else statements use braces and test a single logical value; for vectorized conditions, use ifelse() and dplyr::case_when() for efficient data manipulation
- The else if ladder evaluates conditions sequentially, stopping at the first TRUE match, so condition order is critical for correct logic
- Vectorized alternatives like ifelse() and case_when() dramatically outperform element-by-element loops for data frame operations, often by 50-100x
Basic If/Else Syntax
R’s conditional statements follow a straightforward structure. Although R is a vectorized language, the base if statement evaluates a single logical value rather than applying its condition element-wise.
temperature <- 75
if (temperature > 80) {
  print("It's hot outside")
} else if (temperature > 60) {
  print("Pleasant weather")
} else {
  print("It's cold")
}
# Output: "Pleasant weather"
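Because the ladder stops at the first TRUE branch, swapping the tests changes the result. The sketch below wraps the example above in two hypothetical functions, classify_temp() and classify_temp_wrong_order(), that differ only in the order of the > 80 and > 60 checks:

```r
classify_temp <- function(t) {
  if (t > 80) {
    "It's hot outside"
  } else if (t > 60) {
    "Pleasant weather"
  } else {
    "It's cold"
  }
}

# Same branches, tests swapped: t > 60 now catches every value above 80 first
classify_temp_wrong_order <- function(t) {
  if (t > 60) {
    "Pleasant weather"
  } else if (t > 80) {
    "It's hot outside"  # unreachable: any t > 80 already matched t > 60
  } else {
    "It's cold"
  }
}

classify_temp(95)              # "It's hot outside"
classify_temp_wrong_order(95)  # "Pleasant weather"
```

Ordering the tests from most to least restrictive keeps every branch reachable.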
The condition must evaluate to a single TRUE or FALSE. Since R 4.2.0, passing a longer vector is an error (earlier versions issued a warning and used only the first element):
temps <- c(65, 85, 55)
if (temps > 70) {  # temps > 70 has length 3
  print("Hot")
}
# Error: the condition has length > 1
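When the question really is about a whole vector, any() and all() collapse the element-wise comparison into the single TRUE or FALSE that if needs. A minimal sketch reusing the temps vector:

```r
temps <- c(65, 85, 55)

# any(): is at least one reading above 70?
if (any(temps > 70)) {
  print("At least one hot reading")
}

# all(): is every reading above 70?
if (all(temps > 70)) {
  print("All readings hot")
} else {
  print("Not all readings hot")
}
```

If the vector may contain NA, both functions accept na.rm = TRUE; otherwise they can return NA and the if fails again.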
Single-Line Conditionals
Because if is an expression that returns a value, simple assignments can omit the braces:
score <- 85
grade <- if (score >= 90) "A" else if (score >= 80) "B" else "C"
print(grade) # "B"
# Inline assignment
status <- if (score >= 60) "Pass" else "Fail"
This pattern works well for configuration logic or simple transformations where readability isn’t compromised.
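One caveat with this form: if the condition is FALSE and there is no else branch, the if expression evaluates to NULL, so the variable silently ends up NULL rather than raising an error. A short sketch (the variable name bonus is illustrative):

```r
score <- 45

# No else branch: the condition is FALSE, so the whole if expression returns NULL
bonus <- if (score >= 90) "Honors"

print(bonus)    # NULL
is.null(bonus)  # TRUE
```

Adding an explicit else (for example else NA_character_) makes the fallback value visible.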
Vectorized Conditionals with ifelse()
When working with vectors or data frame columns, ifelse() applies conditions element-wise:
temperatures <- c(55, 75, 95, 62, 88)
conditions <- ifelse(temperatures > 80, "Hot", "Moderate")
print(conditions)
# [1] "Moderate" "Moderate" "Hot" "Moderate" "Hot"
Nested ifelse() handles multiple conditions but becomes unwieldy:
scores <- c(92, 78, 85, 65, 58)
grades <- ifelse(scores >= 90, "A",
          ifelse(scores >= 80, "B",
          ifelse(scores >= 70, "C",
          ifelse(scores >= 60, "D", "F"))))
print(grades)
# [1] "A" "C" "B" "D" "F"
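Performance note: ifelse() evaluates the true and false expressions over the entire vector before selecting results element by element. This can be inefficient with expensive computations, and it can trigger warnings from values that are never used.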
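To see the both-branches behavior concretely, the sketch below uses log() on a vector containing a negative number (values chosen for illustration). The negative element ends up as NA, yet log() still runs on it, so R emits a "NaNs produced" warning. Computing only on the selected positions avoids that extra work and the warning:

```r
x <- c(-1, 1, 10)

# log(x) is computed for all three elements before ifelse() picks values,
# so log(-1) produces NaN and a "NaNs produced" warning
res <- ifelse(x > 0, log(x), NA)

# Alternative: start with NA and call log() only where the condition holds
res2 <- rep(NA_real_, length(x))
ok <- x > 0
res2[ok] <- log(x[ok])  # log() never sees the negative value
```

Both vectors end up holding NA, 0, and log(10); only the first approach warns.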
Multi-Condition Logic with case_when()
The dplyr::case_when() function provides cleaner syntax for complex conditional logic:
library(dplyr)
scores <- c(92, 78, 85, 65, 58, 95, 72)
grades <- case_when(
  scores >= 90 ~ "A",
  scores >= 80 ~ "B",
  scores >= 70 ~ "C",
  scores >= 60 ~ "D",
  TRUE ~ "F"  # Default case (dplyr >= 1.1.0 also accepts .default = "F")
)
print(grades)
# [1] "A" "C" "B" "D" "F" "A" "C"
Advantages over nested ifelse():
- Conditions are checked in order, and each element takes the value of the first condition it matches
- More readable for multiple conditions
- Type-safe: all outputs must be compatible types
- Works seamlessly in mutate() pipelines
library(dplyr)
df <- data.frame(
  product = c("Widget", "Gadget", "Tool", "Device"),
  price = c(25, 150, 75, 300),
  quantity = c(100, 20, 50, 5)
)
df <- df %>%
  mutate(
    price_category = case_when(
      price < 50 ~ "Budget",
      price < 100 ~ "Mid-range",
      price < 200 ~ "Premium",
      TRUE ~ "Luxury"
    ),
    stock_status = case_when(
      quantity == 0 ~ "Out of Stock",
      quantity < 10 ~ "Low Stock",
      quantity < 50 ~ "Available",
      TRUE ~ "In Stock"
    )
  )
print(df)
# product price quantity price_category stock_status
# 1 Widget 25 100 Budget In Stock
# 2 Gadget 150 20 Premium Available
# 3 Tool 75 50 Mid-range In Stock
# 4 Device 300 5 Luxury Low Stock
Logical Operators in Conditions
Combine conditions using logical operators:
age <- 25
income <- 55000
if (age >= 18 && income > 50000) {
  print("Eligible for premium account")
} else if (age >= 18 || income > 40000) {
  print("Eligible for standard account")
} else {
  print("Basic account only")
}
# Output: "Eligible for premium account"
Important distinction: use && and || for scalar conditions in if statements (they short-circuit, and since R 4.3.0 they raise an error when given a vector longer than one), and & and | for element-wise vector operations:
# Scalar (if statements)
x <- 5
if (x > 3 && x < 10) print("In range") # Correct
# Vectorized (data operations)
values <- c(2, 5, 8, 12)
in_range <- values > 3 & values < 10
print(in_range)
# [1] FALSE TRUE TRUE FALSE
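Short-circuiting matters when the right-hand side is only safe to evaluate if the left-hand side holds. In this sketch, first_is_positive() is a hypothetical helper: && skips x[[1]] for an empty vector, while & evaluates both sides and fails:

```r
first_is_positive <- function(x) {
  # && evaluates x[[1]] > 0 only when length(x) > 0 is TRUE
  length(x) > 0 && x[[1]] > 0
}

first_is_positive(numeric(0))  # FALSE, x[[1]] is never evaluated
first_is_positive(c(3, -1))    # TRUE

# & evaluates both sides, so x[[1]] is attempted on an empty vector
# and raises a "subscript out of bounds" error
x <- numeric(0)
result <- tryCatch(
  length(x) > 0 & x[[1]] > 0,
  error = function(e) conditionMessage(e)
)
print(result)
```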
Handling NULL and NA Values
Conditional statements with NULL or NA require explicit handling:
value <- NA
# This fails: value > 10 is NA, not TRUE or FALSE
if (value > 10) {
  print("Large")
} else {
  print("Small")
}
# Error: missing value where TRUE/FALSE needed
# Correct approach
if (is.na(value)) {
  print("Missing value")
} else if (value > 10) {
  print("Large")
} else {
  print("Small")
}
# Output: "Missing value"
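NULL fails differently: NULL > 10 is logical(0), and if() rejects a zero-length condition with "argument is of length zero". The sketch below uses a hypothetical describe() function that checks is.null() first, then uses isTRUE(), which returns TRUE only for a single non-missing TRUE and therefore treats an NA comparison as FALSE. That shortcut is only appropriate when "missing" and "not large" really should share a branch:

```r
describe <- function(value) {
  if (is.null(value)) {
    "No value supplied"
  } else if (isTRUE(value > 10)) {
    # isTRUE(NA) is FALSE, so NA input falls through to the else branch
    "Large"
  } else {
    "Small or missing"
  }
}

describe(NULL)  # "No value supplied"
describe(NA)    # "Small or missing"
describe(42)    # "Large"
```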
For vectorized operations, ifelse() propagates NA into the result, so missing values need explicit handling:
values <- c(5, NA, 15, 8, NA)
# ifelse preserves NAs
result <- ifelse(values > 10, "High", "Low")
print(result)
# [1] "Low" NA "High" "Low" NA
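# case_when with explicit NA handling (NA conditions count as FALSE, so without
# the is.na() line the missing values would fall through to "Low")
result <- case_when(
  is.na(values) ~ "Unknown",
  values > 10 ~ "High",
  TRUE ~ "Low"
)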
print(result)
# [1] "Low" "Unknown" "High" "Low" "Unknown"
Performance Comparison
Vectorized operations vastly outperform loops for large datasets:
library(microbenchmark)
library(dplyr)  # for case_when()
n <- 100000
values <- runif(n, 0, 100)
# Loop approach
loop_approach <- function(x) {
  result <- character(length(x))
  for (i in seq_along(x)) {
    if (x[i] < 33) {
      result[i] <- "Low"
    } else if (x[i] < 67) {
      result[i] <- "Medium"
    } else {
      result[i] <- "High"
    }
  }
  result
}
# Vectorized approaches
ifelse_approach <- function(x) {
  ifelse(x < 33, "Low", ifelse(x < 67, "Medium", "High"))
}
case_when_approach <- function(x) {
  case_when(
    x < 33 ~ "Low",
    x < 67 ~ "Medium",
    TRUE ~ "High"
  )
}
microbenchmark(
  loop = loop_approach(values),
  ifelse = ifelse_approach(values),
  case_when = case_when_approach(values),
  times = 10
)
# Results (median times; these vary with hardware and R version):
# loop: ~450ms
# ifelse: ~4ms
# case_when: ~8ms
In this run the vectorized approaches were roughly 55-110x faster. Use loops only when a condition depends on the result of a previous iteration or requires complex state management.
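For range-based bins like these, base R's cut() is another vectorized option that needs no extra packages. This sketch assumes the same thresholds as above; right = FALSE makes each interval closed on the left, so 33 lands in "Medium" and 67 in "High", matching the x < 33 and x < 67 tests. It was not part of the benchmark above, so no timing is claimed for it:

```r
# Hypothetical helper: three-way binning with a single vectorized cut() call.
# Intervals are [-Inf, 33), [33, 67), [67, Inf) because right = FALSE.
cut_approach <- function(x) {
  as.character(cut(x,
                   breaks = c(-Inf, 33, 67, Inf),
                   labels = c("Low", "Medium", "High"),
                   right = FALSE))
}

sample_values <- c(10, 33, 50, 66.9, 67, 99)

# Cross-check against the nested ifelse() version of the same rule
reference <- ifelse(sample_values < 33, "Low",
                    ifelse(sample_values < 67, "Medium", "High"))
stopifnot(identical(cut_approach(sample_values), reference))

print(cut_approach(sample_values))
```

cut() returns a factor, which as.character() converts back to plain labels to match the other approaches.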
Switch Statements for Discrete Values
For matching discrete values, switch() provides cleaner syntax than if/else chains:
get_day_type <- function(day) {
  switch(day,
    "Monday" = "Start of week",
    "Friday" = "End of week",
    "Saturday" = ,  # Fall through
    "Sunday" = "Weekend",
    "Weekday"  # Default
  )
}
}
print(get_day_type("Monday")) # "Start of week"
print(get_day_type("Saturday")) # "Weekend"
print(get_day_type("Tuesday")) # "Weekday"
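switch() handles one value at a time; passing a whole vector such as c("Monday", "Sunday") raises an "EXPR must be a length 1 vector" error. One way to apply it across a vector, sketched below with the same get_day_type() function, is vapply(), which also checks that every call returns exactly one string:

```r
get_day_type <- function(day) {
  switch(day,
    "Monday" = "Start of week",
    "Friday" = "End of week",
    "Saturday" = ,  # Fall through
    "Sunday" = "Weekend",
    "Weekday"  # Default
  )
}

days <- c("Monday", "Wednesday", "Saturday", "Sunday")

# Call get_day_type() once per element; character(1) enforces one string per call
day_types <- vapply(days, get_day_type, character(1), USE.NAMES = FALSE)
# values: "Start of week", "Weekday", "Weekend", "Weekend"
print(day_types)
```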
Numeric indices work but are error-prone:
switch(2, "First", "Second", "Third") # "Second"
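Part of what makes numeric indices error-prone is that an out-of-range index does not fail: switch() quietly returns NULL (invisibly). The same happens with a string that matches no name when no default is given. A quick check:

```r
first <- switch(1, "First", "Second", "Third")

# Index 4 is beyond the three alternatives: no error, just NULL
missing_choice <- switch(4, "First", "Second", "Third")

# No matching name and no unnamed default: also NULL
no_match <- switch("Tuesday", "Monday" = "Start of week")

is.null(missing_choice)  # TRUE
is.null(no_match)        # TRUE
```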
For data frame operations, use case_when() with exact matches, or recode() from dplyr (superseded by case_match() as of dplyr 1.1.0).
Practical Application: Data Cleaning Pipeline
Combining conditional logic in a real-world scenario:
library(dplyr)
sales_data <- data.frame(
  order_id = 1:6,
  amount = c(150, -50, 2500, 75, NA, 180),
  region = c("North", "South", "West", "East", "North", NA),
  customer_type = c("New", "Returning", "VIP", "New", "Returning", "New")
)
cleaned_data <- sales_data %>%
  mutate(
    # Flag invalid amounts
    valid_amount = !is.na(amount) & amount > 0,
    # Categorize order size
    order_size = case_when(
      !valid_amount ~ "Invalid",
      amount < 100 ~ "Small",
      amount < 500 ~ "Medium",
      amount < 1000 ~ "Large",
      TRUE ~ "Enterprise"
    ),
    # Handle missing regions
    region = if_else(is.na(region), "Unknown", region),
    # Calculate discount eligibility
    discount_eligible = case_when(
      !valid_amount ~ FALSE,
      customer_type == "VIP" ~ TRUE,
      customer_type == "Returning" & amount > 100 ~ TRUE,
      TRUE ~ FALSE
    )
  )
print(cleaned_data)
This pattern of combining validation, categorization, and business logic forms the backbone of many data preparation workflows in R.