R dplyr - filter() Rows by Condition

The **filter()** function from dplyr selects rows where conditions evaluate to TRUE. Unlike base R subsetting with brackets, filter() automatically removes NA values and integrates cleanly into piped...

Key Insights

  • filter() selects rows based on logical conditions, supporting multiple criteria with AND/OR logic and working seamlessly with pipe operators for readable data transformations
  • String matching requires grepl() or stringr functions since filter() evaluates logical expressions, not pattern matching directly
  • Performance optimization comes from filtering early in pipelines, using %in% for multiple values, and combining conditions efficiently rather than chaining multiple filter() calls

Basic Filtering Syntax

The filter() function from dplyr selects rows where conditions evaluate to TRUE. Unlike base R subsetting with brackets, filter() automatically removes NA values and integrates cleanly into piped workflows.

library(dplyr)

# Sample dataset
employees <- data.frame(
  id = 1:8,
  name = c("Alice", "Bob", "Carol", "David", "Eve", "Frank", "Grace", "Henry"),
  department = c("Sales", "IT", "Sales", "IT", "HR", "IT", "Sales", "HR"),
  salary = c(65000, 72000, 58000, 81000, 69000, 75000, 62000, 71000),
  years_experience = c(3, 5, 2, 8, 4, 6, 3, 5)
)

# Single condition
high_earners <- employees %>%
  filter(salary > 70000)

print(high_earners)
#   id   name department salary years_experience
# 1  2    Bob         IT  72000                5
# 2  4  David         IT  81000                8
# 3  6  Frank         IT  75000                6
# 4  8  Henry         HR  71000                5

Multiple Conditions with AND Logic

Combine conditions with commas or the & operator. Each condition must evaluate to TRUE for a row to be included.

# IT department with high salaries (comma-separated)
it_high_earners <- employees %>%
  filter(department == "IT", salary > 70000)

# Equivalent using & operator
it_high_earners_alt <- employees %>%
  filter(department == "IT" & salary > 70000)

print(it_high_earners)
#   id   name department salary years_experience
# 1  2    Bob         IT  72000                5
# 2  4  David         IT  81000                8
# 3  6  Frank         IT  75000                6

# Range filtering
mid_career <- employees %>%
  filter(years_experience >= 4, years_experience <= 6)

print(mid_career)
#   id   name department salary years_experience
# 1  2    Bob         IT  72000                5
# 2  5    Eve         HR  69000                4
# 3  6  Frank         IT  75000                6
# 4  8  Henry         HR  71000                5

OR Conditions and Complex Logic

Use the | operator for OR logic. Parentheses control evaluation order when mixing AND/OR conditions.

# Sales OR high salary
sales_or_high <- employees %>%
  filter(department == "Sales" | salary > 70000)

print(sales_or_high)
#   id   name department salary years_experience
# 1  1  Alice      Sales  65000                3
# 2  2    Bob         IT  72000                5
# 3  3  Carol      Sales  58000                2
# 4  4  David         IT  81000                8
# 5  6  Frank         IT  75000                6
# 6  7  Grace      Sales  62000                3
# 7  8  Henry         HR  71000                5

# Complex: (Sales with experience > 2) OR (IT with salary > 80000)
complex_filter <- employees %>%
  filter((department == "Sales" & years_experience > 2) | 
         (department == "IT" & salary > 80000))

print(complex_filter)
#   id   name department salary years_experience
# 1  1  Alice      Sales  65000                3
# 2  4  David         IT  81000                8
# 3  7  Grace      Sales  62000                3

Filtering with %in% for Multiple Values

The %in% operator efficiently checks membership in a vector, replacing multiple OR conditions.

# Multiple departments - inefficient
dept_filter_bad <- employees %>%
  filter(department == "IT" | department == "HR")

# Multiple departments - efficient
dept_filter_good <- employees %>%
  filter(department %in% c("IT", "HR"))

print(dept_filter_good)
#   id   name department salary years_experience
# 1  2    Bob         IT  72000                5
# 2  4  David         IT  81000                8
# 3  5    Eve         HR  69000                4
# 4  6  Frank         IT  75000                6
# 5  8  Henry         HR  71000                5

# Exclude values with negation
not_sales <- employees %>%
  filter(!department %in% c("Sales"))

# Filter by ID list
target_ids <- c(2, 4, 6, 8)
selected_employees <- employees %>%
  filter(id %in% target_ids)

String Pattern Matching

Filter text columns using grepl() for pattern matching or stringr functions for advanced operations.

library(stringr)

# Names containing 'a' (case-insensitive)
names_with_a <- employees %>%
  filter(grepl("a", name, ignore.case = TRUE))

print(names_with_a)
#   id   name department salary years_experience
# 1  1  Alice      Sales  65000                3
# 2  3  Carol      Sales  58000                2
# 3  4  David         IT  81000                8
# 4  6  Frank         IT  75000                6
# 5  7  Grace      Sales  62000                3

# Using stringr for more complex patterns
starts_with_c_or_g <- employees %>%
  filter(str_detect(name, "^[CG]"))

print(starts_with_c_or_g)
#   id   name department salary years_experience
# 1  3  Carol      Sales  58000                2
# 2  7  Grace      Sales  62000                3

# Multiple pattern matching
sales_or_hr_dept <- employees %>%
  filter(str_detect(department, "Sales|HR"))

Handling Missing Values

Filter() automatically excludes NA values. Use is.na() or !is.na() for explicit NA handling.

# Dataset with missing values
employees_na <- employees
employees_na$salary[c(2, 5)] <- NA

# Automatic NA removal
has_salary <- employees_na %>%
  filter(salary > 65000)

print(has_salary)
#   id   name department salary years_experience
# 1  4  David         IT  81000                8
# 2  6  Frank         IT  75000                6
# 3  8  Henry         HR  71000                5

# Explicit NA filtering
only_na_salary <- employees_na %>%
  filter(is.na(salary))

print(only_na_salary)
#   id name department salary years_experience
# 1  2  Bob         IT     NA                5
# 2  5  Eve         HR     NA                4

# Keep rows with complete data
complete_records <- employees_na %>%
  filter(!is.na(salary), !is.na(department))

Using filter() with Helper Functions

Combine filter() with dplyr helper functions like between(), near(), and if_any()/if_all() for advanced filtering.

# between() for range checks
mid_salary <- employees %>%
  filter(between(salary, 60000, 75000))

print(mid_salary)
#   id   name department salary years_experience
# 1  1  Alice      Sales  65000                3
# 2  2    Bob         IT  72000                5
# 3  5    Eve         HR  69000                4
# 4  6  Frank         IT  75000                6
# 5  7  Grace      Sales  62000                3
# 6  8  Henry         HR  71000                5

# near() for floating-point comparisons
target_salary <- 72000
near_target <- employees %>%
  filter(near(salary, target_salary, tol = 5000))

# if_any() - match any condition across columns (dplyr 1.0.4+)
library(tidyselect)

high_values <- employees %>%
  filter(if_any(c(salary, years_experience), ~ . > 7))

print(high_values)
#   id   name department salary years_experience
# 1  2    Bob         IT  72000                5
# 2  4  David         IT  81000                8
# 3  5    Eve         HR  69000                4
# 4  6  Frank         IT  75000                6
# 5  8  Henry         HR  71000                5

Performance Considerations

Position filter() early in pipelines to reduce data volume for subsequent operations. Combine conditions in a single filter() call rather than chaining.

# Inefficient - multiple filter calls
result_slow <- employees %>%
  filter(department == "IT") %>%
  filter(salary > 70000) %>%
  filter(years_experience >= 5)

# Efficient - single filter call
result_fast <- employees %>%
  filter(department == "IT", salary > 70000, years_experience >= 5)

# Pre-compute complex conditions for readability
employees %>%
  mutate(is_senior = years_experience >= 5 & salary > 70000) %>%
  filter(is_senior, department == "IT")

# Use %in% instead of multiple OR conditions
# Bad: filter(id == 1 | id == 2 | id == 3 | id == 4)
# Good: filter(id %in% 1:4)

Filtering Grouped Data

Filter() respects grouping from group_by(), enabling per-group filtering operations.

# Filter to above-median salary within each department
above_dept_median <- employees %>%
  group_by(department) %>%
  filter(salary > median(salary)) %>%
  ungroup()

print(above_dept_median)
#   id   name department salary years_experience
# 1  1  Alice      Sales  65000                3
# 2  2    Bob         IT  72000                5
# 3  4  David         IT  81000                8
# 4  6  Frank         IT  75000                6
# 5  7  Grace      Sales  62000                3
# 6  8  Henry         HR  71000                5

# Keep only departments with more than 2 employees
large_departments <- employees %>%
  group_by(department) %>%
  filter(n() > 2) %>%
  ungroup()

print(large_departments)
#   id   name department salary years_experience
# 1  1  Alice      Sales  65000                3
# 2  2    Bob         IT  72000                5
# 3  3  Carol      Sales  58000                2
# 4  4  David         IT  81000                8
# 5  6  Frank         IT  75000                6
# 6  7  Grace      Sales  62000                3

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.