R dplyr - filter() with Multiple Conditions
The `filter()` function from dplyr accepts multiple conditions separated by commas, which implicitly creates an AND relationship. Each condition must evaluate to a logical vector.
Key Insights
- The
filter()function supports multiple conditions using logical operators (&,|,!) and can be combined with helper functions like%in%,between(), andif_any()for complex filtering scenarios - Conditions can be chained with commas (implicit AND) or connected explicitly with
&for AND operations and|for OR operations, with performance implications favoring comma separation - Advanced filtering techniques include using
across()for column-wise operations,str_detect()for pattern matching, and combining multiple filter criteria with grouped data operations
Basic Multiple Condition Filtering
The filter() function from dplyr accepts multiple conditions separated by commas, which implicitly creates an AND relationship. Each condition must evaluate to a logical vector.
library(dplyr)
# Sample dataset
employees <- data.frame(
name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
department = c("Sales", "IT", "Sales", "HR", "IT"),
salary = c(65000, 75000, 58000, 72000, 80000),
years_experience = c(5, 8, 3, 6, 10)
)
# Filter with comma-separated conditions (AND)
high_earners_sales <- employees %>%
filter(
department == "Sales",
salary > 60000,
years_experience >= 4
)
print(high_earners_sales)
This returns only rows where all three conditions are TRUE. The comma-separated approach is cleaner and slightly more efficient than using explicit & operators.
Using Logical Operators
For more complex logic, combine conditions with & (AND), | (OR), and ! (NOT) operators.
# Explicit AND operator
it_or_high_salary <- employees %>%
filter(department == "IT" & salary > 70000)
# OR operator - employees in Sales OR with high salaries
sales_or_wealthy <- employees %>%
filter(department == "Sales" | salary > 70000)
# Complex combination
complex_filter <- employees %>%
filter(
(department == "IT" | department == "Sales") &
salary > 60000 &
years_experience >= 5
)
# Negation
not_sales <- employees %>%
filter(!department == "Sales")
# Equivalent to: filter(department != "Sales")
Parentheses control operator precedence. Without them, & binds tighter than |, which can produce unexpected results.
Filtering with %in% Operator
The %in% operator efficiently filters rows matching any value in a vector, replacing multiple OR conditions.
# Instead of: department == "IT" | department == "Sales" | department == "HR"
selected_departments <- employees %>%
filter(department %in% c("IT", "Sales", "HR"))
# Negation with %in%
not_in_departments <- employees %>%
filter(!department %in% c("HR"))
# Combining %in% with other conditions
experienced_it_sales <- employees %>%
filter(
department %in% c("IT", "Sales"),
years_experience > 5,
salary >= 65000
)
The %in% approach scales better than chained OR operators and improves readability when checking multiple values.
Range Filtering with between()
The between() helper function simplifies range checks with inclusive boundaries.
# Traditional approach
mid_salary_traditional <- employees %>%
filter(salary >= 60000 & salary <= 75000)
# Using between() - cleaner syntax
mid_salary_between <- employees %>%
filter(between(salary, 60000, 75000))
# Combining between() with other conditions
mid_career_mid_salary <- employees %>%
filter(
between(salary, 60000, 75000),
between(years_experience, 4, 8),
department %in% c("IT", "Sales")
)
The between() function is inclusive on both ends and handles NA values by returning FALSE.
Filtering Across Multiple Columns
The across() function applies conditions to multiple columns simultaneously, useful for datasets with many similar columns.
# Sample data with multiple numeric columns
performance <- data.frame(
employee = c("Alice", "Bob", "Charlie", "Diana"),
q1_score = c(85, 92, 78, 95),
q2_score = c(88, 89, 82, 93),
q3_score = c(90, 91, 75, 94),
q4_score = c(87, 93, 80, 96)
)
# Filter rows where all quarterly scores exceed 80
consistent_performers <- performance %>%
filter(across(starts_with("q"), ~ .x > 80))
# Filter where at least one score is above 90
high_achievers <- performance %>%
filter(if_any(starts_with("q"), ~ .x > 90))
# Filter where all scores are within range
stable_performers <- performance %>%
filter(across(q1_score:q4_score, ~ between(.x, 80, 95)))
The if_any() function returns TRUE if any condition matches, while if_all() requires all conditions to be TRUE (equivalent to across() in filter context).
String Pattern Matching
Combine filter() with stringr functions for text-based filtering.
library(stringr)
employees_extended <- data.frame(
name = c("Alice Smith", "Bob Jones", "Charlie Brown", "Diana Smith"),
email = c("alice@sales.com", "bob@it.com", "charlie@sales.com", "diana@hr.com"),
department = c("Sales", "IT", "Sales", "HR")
)
# Filter by string pattern
smith_family <- employees_extended %>%
filter(str_detect(name, "Smith"))
# Multiple string conditions
sales_emails <- employees_extended %>%
filter(
str_detect(email, "@sales\\.com$"),
str_detect(name, "^[A-C]") # Names starting with A, B, or C
)
# Case-insensitive matching
contains_sales <- employees_extended %>%
filter(str_detect(str_to_lower(email), "sales"))
Regular expressions in str_detect() enable sophisticated pattern matching for complex filtering requirements.
Filtering with NA Values
Handle missing data explicitly in filter conditions using is.na() and !is.na().
data_with_na <- data.frame(
id = 1:5,
value = c(10, NA, 30, 40, NA),
category = c("A", "B", NA, "A", "B")
)
# Remove rows with NA in specific column
complete_values <- data_with_na %>%
filter(!is.na(value))
# Keep only rows with NA
only_na <- data_with_na %>%
filter(is.na(value))
# Multiple NA conditions
complete_cases <- data_with_na %>%
filter(
!is.na(value),
!is.na(category)
)
# Filter with condition that handles NA
safe_filter <- data_with_na %>%
filter(value > 20 | is.na(value))
Logical operations with NA values return NA, which filter() treats as FALSE, removing those rows unless explicitly handled.
Grouped Filtering
Apply filters within groups using group_by() before filtering, enabling context-aware conditions.
sales_data <- data.frame(
department = c("IT", "IT", "Sales", "Sales", "HR", "HR"),
employee = c("Bob", "Eve", "Alice", "Charlie", "Diana", "Frank"),
salary = c(75000, 80000, 65000, 58000, 72000, 68000)
)
# Filter for above-average salaries within each department
above_dept_average <- sales_data %>%
group_by(department) %>%
filter(salary > mean(salary)) %>%
ungroup()
# Top 1 salary per department
top_earners <- sales_data %>%
group_by(department) %>%
filter(salary == max(salary)) %>%
ungroup()
# Keep groups with more than 1 member above threshold
large_high_salary_groups <- sales_data %>%
group_by(department) %>%
filter(
salary > 65000,
n() > 1 # n() counts rows in current group
) %>%
ungroup()
Always ungroup() after grouped operations to avoid unexpected behavior in subsequent operations.
Performance Considerations
For large datasets, filter order and technique impact performance significantly.
# More efficient: filter early, reduce data size first
efficient_approach <- large_dataset %>%
filter(common_condition) %>% # Eliminates most rows
filter(expensive_calculation > threshold)
# Less efficient: expensive operations on full dataset
inefficient_approach <- large_dataset %>%
filter(expensive_calculation > threshold) %>%
filter(common_condition)
# Use comma separation over & for multiple conditions
# Slightly faster due to short-circuit evaluation
optimized <- data %>%
filter(
condition1,
condition2,
condition3
)
Place the most selective conditions first to minimize rows processed by subsequent conditions. Comma-separated conditions allow R to short-circuit evaluation more effectively than explicit & operators.