R dplyr - slice() - Select Rows by Position
Key Insights
- slice() provides position-based row selection in dplyr, offering a cleaner alternative to base R bracket notation with better integration into pipe workflows
- The slice family includes specialized functions like slice_head(), slice_tail(), slice_min(), slice_max(), and slice_sample() for common selection patterns
- Position indexing in slice() respects grouped data, allowing per-group row selection that would require complex base R logic
Basic Position-Based Selection
The slice() function selects rows by their integer positions. Unlike filter() which uses logical conditions, slice() works with row numbers directly.
library(dplyr)
# Sample dataset
employees <- data.frame(
  id = 1:10,
  name = c("Alice", "Bob", "Carol", "David", "Eve",
           "Frank", "Grace", "Henry", "Iris", "Jack"),
  salary = c(75000, 82000, 68000, 91000, 77000,
             85000, 73000, 88000, 79000, 84000),
  department = c("Sales", "IT", "Sales", "IT", "HR",
                 "Sales", "HR", "IT", "Sales", "HR")
)
# Select first three rows
employees %>% slice(1:3)
# id name salary department
# 1 1 Alice 75000 Sales
# 2 2 Bob 82000 IT
# 3 3 Carol 68000 Sales
# Select specific positions
employees %>% slice(c(2, 5, 8))
# id name salary department
# 1 2 Bob 82000 IT
# 2 5 Eve 77000 HR
# 3 8 Henry 88000 IT
# Select all except specific positions (negative indexing)
employees %>% slice(-c(1:5))
# id name salary department
# 1 6 Frank 85000 Sales
# 2 7 Grace 73000 HR
# 3 8 Henry 88000 IT
# 4 9 Iris 79000 Sales
# 5 10 Jack 84000 HR
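One practical difference from base R bracket notation is how positions past the end of the data are handled. The sketch below uses a small made-up data frame (`df` is illustrative, not the article's dataset) to compare the two approaches:

```r
library(dplyr)

# Illustrative 10-row data frame (an assumption for this example)
df <- data.frame(id = 1:10, value = letters[1:10])

# Ask for positions 9 to 12, although only 10 rows exist
base_result  <- df[9:12, ]          # base R pads the result with NA rows
slice_result <- df %>% slice(9:12)  # slice() silently ignores positions past the end

nrow(base_result)   # 4
nrow(slice_result)  # 2
```

This behavior is convenient in pipelines, but it also means slice() will not warn you when a requested position does not exist.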
Working with Grouped Data
slice() becomes particularly powerful with grouped data, applying position selection within each group independently.
# Select first two employees from each department
employees %>%
  group_by(department) %>%
  slice(1:2) %>%
  ungroup()
# id name salary department
# 1 5 Eve 77000 HR
# 2 7 Grace 73000 HR
# 3 2 Bob 82000 IT
# 4 4 David 91000 IT
# 5 1 Alice 75000 Sales
# 6 3 Carol 68000 Sales
# Last employee in each department
employees %>%
  group_by(department) %>%
  slice(n()) %>%
  ungroup()
# id name salary department
# 1 10 Jack 84000 HR
# 2 8 Henry 88000 IT
# 3 9 Iris 79000 Sales
The n() function returns the number of rows in each group, making it useful for selecting the last row or calculating relative positions.
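As a minimal sketch of relative positions with n(), the example below uses a small made-up data frame with groups of different sizes (the names `df`, `grp`, and `pos` are illustrative). It also shows two edge cases worth knowing: repeated positions return repeated rows, and position 0 selects nothing.

```r
library(dplyr)

# Illustrative data: groups "a", "b", "c" with 3, 2, and 1 rows
df <- data.frame(
  grp = c("a", "a", "a", "b", "b", "c"),
  pos = c(1, 2, 3, 1, 2, 1)
)

# First and last row of each group.
# unique() avoids returning the same row twice in single-row groups,
# where c(1, n()) would be c(1, 1).
first_last <- df %>%
  group_by(grp) %>%
  slice(unique(c(1, n()))) %>%
  ungroup()

# Second-to-last row of each group.
# In a single-row group n() - 1 is 0, which selects no rows.
second_last <- df %>%
  group_by(grp) %>%
  slice(n() - 1) %>%
  ungroup()

first_last$pos   # 1 3 1 2 1
second_last$pos  # 2 1
```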
Specialized Slice Functions
The slice family includes convenience functions for common patterns.
slice_head() and slice_tail()
# First 3 rows
employees %>% slice_head(n = 3)
# id name salary department
# 1 1 Alice 75000 Sales
# 2 2 Bob 82000 IT
# 3 3 Carol 68000 Sales
# Last 2 rows from each department
employees %>%
  group_by(department) %>%
  slice_tail(n = 2)
# id name salary department
# 1 7 Grace 73000 HR
# 2 10 Jack 84000 HR
# 3 4 David 91000 IT
# 4 8 Henry 88000 IT
# 5 6 Frank 85000 Sales
# 6 9 Iris 79000 Sales
# Proportion-based selection (20% of rows)
employees %>% slice_head(prop = 0.2)
# id name salary department
# 1 1 Alice 75000 Sales
# 2 2 Bob 82000 IT
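When prop does not divide the row count evenly, dplyr rounds the requested number of rows toward zero. The sketch below illustrates this on a small made-up data frame (`df` and `grp` are illustrative names); with grouped data, small groups can end up contributing no rows at all.

```r
library(dplyr)

# Illustrative data: 10 rows in groups of 3, 3, and 4
df <- data.frame(
  id  = 1:10,
  grp = c(rep("a", 3), rep("b", 3), rep("c", 4))
)

# 25% of 10 rows is 2.5, which is rounded down to 2 rows
whole <- df %>% slice_head(prop = 0.25)

# 30% of a 3-row group is 0.9 (0 rows); 30% of a 4-row group is 1.2 (1 row)
per_group <- df %>%
  group_by(grp) %>%
  slice_head(prop = 0.3) %>%
  ungroup()

nrow(whole)      # 2
per_group$id     # 8
```

If every group must contribute at least one row, passing n instead of prop is the safer choice.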
slice_min() and slice_max()
These functions select rows with minimum or maximum values of a specified variable.
# Three employees with lowest salaries
employees %>% slice_min(salary, n = 3)
# id name salary department
# 1 3 Carol 68000 Sales
# 2 7 Grace 73000 HR
# 3 1 Alice 75000 Sales
# Highest paid employee per department
employees %>%
  group_by(department) %>%
  slice_max(salary, n = 1)
# id name salary department
# 1 10 Jack 84000 HR
# 2 4 David 91000 IT
# 3 6 Frank 85000 Sales
# Handle ties: with_ties = TRUE (the default) keeps every row tied for the maximum
employees_with_ties <- employees
employees_with_ties$salary[10] <- 91000 # Create a tie between David and Jack
employees_with_ties %>%
  slice_max(salary, n = 1, with_ties = TRUE)
# id name salary department
# 1 4 David 91000 IT
# 2 10 Jack 91000 HR
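If you need exactly n rows regardless of ties, set with_ties = FALSE. The following sketch uses a made-up `scores` data frame (an assumption for illustration) to contrast the two settings:

```r
library(dplyr)

# Illustrative data with a tie for the top score
scores <- data.frame(
  player = c("p1", "p2", "p3", "p4"),
  score  = c(90, 95, 95, 80)
)

# Default behavior: both tied rows are returned even though n = 1
with_ties_default <- scores %>% slice_max(score, n = 1)

# with_ties = FALSE: exactly one row is returned
without_ties <- scores %>% slice_max(score, n = 1, with_ties = FALSE)

nrow(with_ties_default)  # 2
nrow(without_ties)       # 1
```

The same argument is available in slice_min().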
slice_sample()
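slice_sample() draws rows at random, which is useful for exploratory analysis or for creating training/test splits.
# Random 4 rows (which rows are drawn depends on the seed and R version; the output below is illustrative)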
set.seed(123)
employees %>% slice_sample(n = 4)
# id name salary department
# 1 3 Carol 68000 Sales
# 2 8 Henry 88000 IT
# 3 4 David 91000 IT
# 4 9 Iris 79000 Sales
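# 30% random sample per department
# (prop * group size is rounded down, so a 3-row department contributes 0 rows)
employees %>%
  group_by(department) %>%
  slice_sample(prop = 0.3)
# Sample with replacement, which allows n to exceed the number of rows
employees %>%
  slice_sample(n = 15, replace = TRUE)
# Weighted sampling: rows with higher salaries are more likely to be drawn
employees %>%
  slice_sample(n = 5, weight_by = salary)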
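For the training/test split mentioned above, one possible approach is to sample the training rows with slice_sample() and then take everything else with anti_join(). This is a minimal sketch on made-up data; the data frame `df`, the seed, and the 75/25 ratio are all assumptions, and it relies on `id` uniquely identifying each row.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, used only to make the split reproducible

# Illustrative data with a unique id per row
df <- data.frame(id = 1:20, x = rnorm(20))

# 75% of the rows go to the training set
train <- df %>% slice_sample(prop = 0.75)

# The test set is every row whose id is not in the training set
test <- df %>% anti_join(train, by = "id")

nrow(train)  # 15
nrow(test)   # 5
```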
Practical Applications
Removing Duplicate Records After Sorting
transactions <- data.frame(
  customer_id = c(101, 101, 101, 102, 102, 103),
  date = as.Date(c("2024-01-15", "2024-01-20", "2024-01-10",
                   "2024-01-18", "2024-01-22", "2024-01-12")),
  amount = c(250, 180, 300, 420, 390, 150)
)
# Get most recent transaction per customer
transactions %>%
  arrange(customer_id, desc(date)) %>%
  group_by(customer_id) %>%
  slice(1) %>%
  ungroup()
# customer_id date amount
# 1 101 2024-01-20 180
# 2 102 2024-01-22 390
# 3 103 2024-01-12 150
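The same result can be obtained without sorting first by letting slice_max() pick the latest date within each group. The sketch below repeats the transactions data so it runs on its own; `latest` is just an illustrative name. Setting with_ties = FALSE guarantees one row per customer even if two transactions share a date.

```r
library(dplyr)

transactions <- data.frame(
  customer_id = c(101, 101, 101, 102, 102, 103),
  date = as.Date(c("2024-01-15", "2024-01-20", "2024-01-10",
                   "2024-01-18", "2024-01-22", "2024-01-12")),
  amount = c(250, 180, 300, 420, 390, 150)
)

# Most recent transaction per customer, without an explicit arrange()
latest <- transactions %>%
  group_by(customer_id) %>%
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup()

latest$amount  # 180 390 150
```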
Pagination for Large Datasets
# Page 2 with 3 items per page
page_size <- 3
page_number <- 2
employees %>%
  slice(((page_number - 1) * page_size + 1):(page_number * page_size))
# id name salary department
# 1 4 David 91000 IT
# 2 5 Eve 77000 HR
# 3 6 Frank 85000 Sales
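The page arithmetic can be wrapped in a small helper. `get_page()` below is a hypothetical function written for this example, not part of dplyr. Because slice() ignores positions past the end of the data, the last page is simply shorter and pages beyond the data come back empty.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): return one page of rows
get_page <- function(data, page_number, page_size) {
  first_row <- (page_number - 1) * page_size + 1
  last_row  <- page_number * page_size
  data %>% slice(first_row:last_row)
}

# Illustrative 10-row data frame
df <- data.frame(id = 1:10)

page_2 <- get_page(df, page_number = 2, page_size = 3)  # rows 4 to 6
page_4 <- get_page(df, page_number = 4, page_size = 3)  # positions 10:12, only row 10 exists
page_5 <- get_page(df, page_number = 5, page_size = 3)  # positions 13:15, no rows exist

page_2$id     # 4 5 6
nrow(page_4)  # 1
nrow(page_5)  # 0
```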
Top N Analysis per Category
set.seed(2024) # any fixed seed makes the random revenue values reproducible
sales_data <- data.frame(
  product = paste0("Product_", 1:20),
  category = rep(c("Electronics", "Clothing", "Food", "Books"), each = 5),
  revenue = runif(20, 1000, 10000)
)
# Top 2 products by revenue in each category
sales_data %>%
  group_by(category) %>%
  slice_max(revenue, n = 2, with_ties = FALSE) %>%
  arrange(category, desc(revenue))
Performance Considerations
slice() works on row positions rather than evaluating a logical condition for every row. On large grouped datasets, it is worth measuring it against the equivalent filter() call rather than assuming which is faster:
library(microbenchmark)
large_df <- data.frame(
  group = rep(1:1000, each = 100),
  value = rnorm(100000)
)
# Compare slice() with an equivalent filter() on grouped data
microbenchmark(
  slice_approach = large_df %>%
    group_by(group) %>%
    slice(1:5),
  filter_approach = large_df %>%
    group_by(group) %>%
    filter(row_number() <= 5),
  times = 50
)
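Before comparing timings, it can help to confirm that the candidate approaches actually return the same rows. The sketch below does this on a smaller made-up data frame (the sizes and names are assumptions) and also includes slice_head() as a third variant:

```r
library(dplyr)

set.seed(1)  # arbitrary seed for reproducible values

# Smaller illustrative version of the benchmark data
df <- data.frame(
  group = rep(1:100, each = 20),
  value = rnorm(2000)
)

by_slice <- df %>% group_by(group) %>% slice(1:5) %>% ungroup()
by_filter <- df %>% group_by(group) %>% filter(row_number() <= 5) %>% ungroup()
by_head <- df %>% group_by(group) %>% slice_head(n = 5) %>% ungroup()

# All three should contain the same 500 rows in the same order
all.equal(by_slice, by_filter)
all.equal(by_slice, by_head)
```

The results match here because the data is already ordered by group; filter() keeps the original row order, while slice() returns rows group by group.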
When you only need the first few rows, slice_head() states that intent directly. Whether it is measurably faster than slice(1:10) depends on the data and backend, so benchmark before relying on it:
# huge_dataset is a placeholder for your own large data frame
huge_dataset %>% slice_head(n = 10)
Combining slice() with Other dplyr Verbs
# Complex pipeline: filter, arrange, then select top performers
employees %>%
  filter(department %in% c("IT", "Sales")) %>%
  arrange(desc(salary)) %>%
  slice_head(n = 3)
# id name salary department
# 1 4 David 91000 IT
# 2 8 Henry 88000 IT
# 3 6 Frank 85000 Sales
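# Select roughly the middle 50% of salaries per department
# (arrange() sorts across all rows, which leaves each department's rows in salary order)
employees %>%
  group_by(department) %>%
  arrange(salary) %>%
  slice(ceiling(n() * 0.25):floor(n() * 0.75))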
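To see which positions that index expression selects for a given group size, the arithmetic can be checked on its own. `middle_positions()` is a hypothetical helper written for this illustration; it simply mirrors the expression passed to slice() above.

```r
# Hypothetical helper: the positions kept for a group of n rows
middle_positions <- function(n) {
  ceiling(n * 0.25):floor(n * 0.75)
}

middle_positions(3)  # 1 2
middle_positions(4)  # 1 2 3
middle_positions(8)  # 2 3 4 5 6
```

For small groups the selection covers more than half of the rows, so treat the result as an approximation of the middle 50% rather than an exact quantile cut.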
The slice() family provides intuitive, readable syntax for position-based row selection. Because these functions integrate with dplyr's grouped operations and pipe workflows, they replace much of the manual indexing logic that base R would otherwise require.