R dplyr - slice() - Select Rows by Position

The `slice()` function selects rows by their integer positions. Unlike `filter()` which uses logical conditions, `slice()` works with row numbers directly.

Key Insights

  • slice() provides position-based row selection in dplyr, offering a cleaner alternative to base R bracket notation with better integration into pipe workflows
  • The slice family includes specialized functions like slice_head(), slice_tail(), slice_min(), slice_max(), and slice_sample() for common selection patterns
  • Position indexing in slice() respects grouped data, allowing per-group row selection that would require complex base R logic
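
To make the comparison with base R concrete, here is a minimal sketch on a small made-up data frame (df, one_col and their columns are assumptions for illustration only). It shows two practical differences: bracket indexing keeps the original row names, and on a single-column data frame it simplifies to a vector unless drop = FALSE is given, while slice() returns a renumbered data frame in both cases.

```r
library(dplyr)

# Illustrative data, assumed for this sketch
df <- data.frame(x = c(10, 20, 30, 40), y = c("a", "b", "c", "d"))

# Same rows, two spellings
base_rows  <- df[c(2, 4), ]
dplyr_rows <- df %>% slice(c(2, 4))

rownames(base_rows)   # "2" "4": bracket indexing keeps original row names
rownames(dplyr_rows)  # "1" "2": slice() renumbers the result

# On a single-column data frame, brackets simplify to a vector by default
one_col   <- data.frame(x = c(10, 20, 30, 40))
base_one  <- one_col[c(2, 4), ]          # numeric vector
dplyr_one <- one_col %>% slice(c(2, 4))  # still a data frame
```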

Basic Position-Based Selection

Pass positive integers to slice() to keep the rows at those positions, or negative integers to drop them. The positions must be all positive or all negative.

library(dplyr)

# Sample dataset
employees <- data.frame(
  id = 1:10,
  name = c("Alice", "Bob", "Carol", "David", "Eve", 
           "Frank", "Grace", "Henry", "Iris", "Jack"),
  salary = c(75000, 82000, 68000, 91000, 77000, 
             85000, 73000, 88000, 79000, 84000),
  department = c("Sales", "IT", "Sales", "IT", "HR",
                 "Sales", "HR", "IT", "Sales", "HR")
)

# Select first three rows
employees %>% slice(1:3)
#   id  name salary department
# 1  1 Alice  75000      Sales
# 2  2   Bob  82000         IT
# 3  3 Carol  68000      Sales

# Select specific positions
employees %>% slice(c(2, 5, 8))
#   id  name salary department
# 1  2   Bob  82000         IT
# 2  5   Eve  77000         HR
# 3  8 Henry  88000         IT

# Select all except specific positions (negative indexing)
employees %>% slice(-c(1:5))
#   id  name salary department
# 1  6 Frank  85000      Sales
# 2  7 Grace  73000         HR
# 3  8 Henry  88000         IT
# 4  9  Iris  79000      Sales
# 5 10  Jack  84000         HR
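
Two edge cases are worth knowing when computing positions. The sketch below uses a small illustrative data frame (df is an assumption, not the employees data). It shows that positions past the last row are silently ignored, and that mixing positive and negative positions raises an error.

```r
library(dplyr)

df <- data.frame(id = 1:10)  # illustrative data for this sketch

# Positions past the last row are silently ignored
past_end <- df %>% slice(9:12)
nrow(past_end)  # 2 (rows 9 and 10)

# Mixing positive and negative positions is an error;
# tryCatch() turns it into a marker value so the script keeps running
mixed <- tryCatch(
  df %>% slice(c(1, -2)),
  error = function(e) "error"
)
```

Because out-of-range positions are dropped without a warning, a miscalculated range can quietly return fewer rows than expected, so it can be worth checking nrow() on the result.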

Working with Grouped Data

slice() becomes particularly powerful with grouped data, applying position selection within each group independently.

# Select first two employees from each department
employees %>%
  group_by(department) %>%
  slice(1:2) %>%
  ungroup()
#   id  name salary department
# 1  5   Eve  77000         HR
# 2  7 Grace  73000         HR
# 3  2   Bob  82000         IT
# 4  4 David  91000         IT
# 5  1 Alice  75000      Sales
# 6  3 Carol  68000      Sales

# Last employee in each department
employees %>%
  group_by(department) %>%
  slice(n()) %>%
  ungroup()
#   id  name salary department
# 1 10  Jack  84000         HR
# 2  8 Henry  88000         IT
# 3  9  Iris  79000      Sales

The n() function returns the number of rows in each group, making it useful for selecting the last row or calculating relative positions.
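
As a sketch of relative positions built from n(), the example below uses a small made-up table (scores, team and round are illustrative names). It picks the first and last row of each group, and then the second-to-last row.

```r
library(dplyr)

# Illustrative data: team A has 3 rounds, team B has 4
scores <- data.frame(
  team  = c("A", "A", "A", "B", "B", "B", "B"),
  round = c(1, 2, 3, 1, 2, 3, 4)
)

# First and last row of each group
first_last <- scores %>%
  group_by(team) %>%
  slice(c(1, n())) %>%
  ungroup()

# Second-to-last row of each group
second_last <- scores %>%
  group_by(team) %>%
  slice(n() - 1) %>%
  ungroup()

first_last$round   # 1 3 1 4
second_last$round  # 2 3
```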

Specialized Slice Functions

The slice family includes convenience functions for common patterns.

slice_head() and slice_tail()

# First 3 rows
employees %>% slice_head(n = 3)
#   id  name salary department
# 1  1 Alice  75000      Sales
# 2  2   Bob  82000         IT
# 3  3 Carol  68000      Sales

# Last 2 rows from each department
employees %>%
  group_by(department) %>%
  slice_tail(n = 2)
#   id  name salary department
# 1  7 Grace  73000         HR
# 2 10  Jack  84000         HR
# 3  4 David  91000         IT
# 4  8 Henry  88000         IT
# 5  6 Frank  85000      Sales
# 6  9  Iris  79000      Sales

# Proportion-based selection (20% of rows)
employees %>% slice_head(prop = 0.2)
#   id  name salary department
# 1  1 Alice  75000      Sales
# 2  2   Bob  82000         IT
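
With grouped data, prop is applied to each group's size and the result is rounded down, so small groups can contribute no rows at all. A minimal sketch with made-up data (df, g and v are illustrative names):

```r
library(dplyr)

# Illustrative data: one group of 3 rows, one group of 8 rows
df <- data.frame(
  g = c(rep("small", 3), rep("large", 8)),
  v = 1:11
)

per_group <- df %>%
  group_by(g) %>%
  slice_head(prop = 0.25) %>%
  ungroup()

# "small": 3 * 0.25 = 0.75, rounded down to 0 rows
# "large": 8 * 0.25 = 2 rows
per_group
```

If every group must contribute at least one row, n = 1 (or another fixed n) is the safer choice.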

slice_min() and slice_max()

These functions select rows with minimum or maximum values of a specified variable.

# Three employees with lowest salaries
employees %>% slice_min(salary, n = 3)
#   id  name salary department
# 1  3 Carol  68000      Sales
# 2  7 Grace  73000         HR
# 3  1 Alice  75000      Sales

# Highest paid employee per department
employees %>%
  group_by(department) %>%
  slice_max(salary, n = 1)
#   id  name salary department
# 1 10  Jack  84000         HR
# 2  4 David  91000         IT
# 3  6 Frank  85000      Sales

# with_ties = TRUE (the default) keeps every row tied for the top value
employees_with_ties <- employees
employees_with_ties$salary[10] <- 91000  # Create tie

employees_with_ties %>%
  slice_max(salary, n = 1, with_ties = TRUE)
#   id  name salary department
# 1  4 David  91000         IT
# 2 10  Jack  91000         HR
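
When exactly n rows are required, with_ties = FALSE turns the tie-keeping behavior off. The sketch below uses a small made-up table (pay is an illustrative name) to compare the two settings.

```r
library(dplyr)

# Illustrative data with a tie at the top salary
pay <- data.frame(
  name   = c("David", "Jack", "Henry"),
  salary = c(91000, 91000, 88000)
)

all_tied <- pay %>% slice_max(salary, n = 1)                     # default with_ties = TRUE
one_only <- pay %>% slice_max(salary, n = 1, with_ties = FALSE)  # exactly one row

nrow(all_tied)  # 2
nrow(one_only)  # 1
```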

slice_sample()

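slice_sample() draws rows at random, which is useful for exploratory analysis or for creating training/test splits.

# Random 4 rows
# (which rows are drawn depends on the seed and on the R version's
# sampling algorithm, so treat the output below as illustrative)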
set.seed(123)
employees %>% slice_sample(n = 4)
#   id  name salary department
# 1  3 Carol  68000      Sales
# 2  8 Henry  88000         IT
# 3  4 David  91000         IT
# 4  9  Iris  79000      Sales

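# 30% random sample per department
# Note: prop * group size is rounded down, so the 3-row HR and IT groups
# contribute 0 rows here and the 4-row Sales group contributes 1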
employees %>%
  group_by(department) %>%
  slice_sample(prop = 0.3)

# Sample with replacement
employees %>% 
  slice_sample(n = 15, replace = TRUE)

# Weighted sampling
employees %>%
  slice_sample(n = 5, weight_by = salary)
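
One way to build the training/test split mentioned above is to sample the training rows and take everything else as the test set. This is a minimal sketch under assumptions: data_all and row_id are illustrative names, and the data needs a unique key column for the anti-join to work.

```r
library(dplyr)

set.seed(42)  # any seed works; fixed here so the split is repeatable

# Illustrative data; row_id gives each row a unique key for the anti-join
data_all <- data.frame(row_id = 1:100, x = rnorm(100))

# 75% of rows for training, the remainder for testing
train <- data_all %>% slice_sample(prop = 0.75)
test  <- data_all %>% anti_join(train, by = "row_id")

nrow(train)  # 75
nrow(test)   # 25
```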

Practical Applications

Removing Duplicate Records After Sorting

transactions <- data.frame(
  customer_id = c(101, 101, 101, 102, 102, 103),
  date = as.Date(c("2024-01-15", "2024-01-20", "2024-01-10",
                   "2024-01-18", "2024-01-22", "2024-01-12")),
  amount = c(250, 180, 300, 420, 390, 150)
)

# Get most recent transaction per customer
transactions %>%
  arrange(customer_id, desc(date)) %>%
  group_by(customer_id) %>%
  slice(1) %>%
  ungroup()
#   customer_id       date amount
# 1         101 2024-01-20    180
# 2         102 2024-01-22    390
# 3         103 2024-01-12    150
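
The same result can be sketched without the explicit sort by using slice_max() on the date column. with_ties = FALSE is included so that a customer with two transactions on the same latest date would still yield a single row.

```r
library(dplyr)

transactions <- data.frame(
  customer_id = c(101, 101, 101, 102, 102, 103),
  date = as.Date(c("2024-01-15", "2024-01-20", "2024-01-10",
                   "2024-01-18", "2024-01-22", "2024-01-12")),
  amount = c(250, 180, 300, 420, 390, 150)
)

# Most recent transaction per customer, no arrange() needed
latest <- transactions %>%
  group_by(customer_id) %>%
  slice_max(date, n = 1, with_ties = FALSE) %>%
  ungroup()

latest$amount  # 180 390 150
```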

Pagination for Large Datasets

# Page 2 with 3 items per page
page_size <- 3
page_number <- 2

employees %>%
  slice(((page_number - 1) * page_size + 1):(page_number * page_size))
#   id  name salary department
# 1  4 David  91000         IT
# 2  5   Eve  77000         HR
# 3  6 Frank  85000      Sales
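
The page arithmetic can be wrapped in a small function. paginate() below is a hypothetical helper, not part of dplyr, and df is illustrative data. The sketch clamps the end position so that a partial last page and a page past the end are both handled explicitly.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr
paginate <- function(data, page_number, page_size) {
  start <- (page_number - 1) * page_size + 1
  if (start > nrow(data)) {
    return(slice(data, integer(0)))  # page is past the end: zero rows
  }
  end <- min(page_number * page_size, nrow(data))
  slice(data, start:end)
}

df <- data.frame(id = 1:10)  # illustrative data

paginate(df, 2, 3)$id     # 4 5 6
paginate(df, 4, 3)$id     # 10 (partial last page)
nrow(paginate(df, 5, 3))  # 0 (past the end)
```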

Top N Analysis per Category

set.seed(1)  # fix the seed so the random revenues are reproducible
sales_data <- data.frame(
  product = paste0("Product_", 1:20),
  category = rep(c("Electronics", "Clothing", "Food", "Books"), each = 5),
  revenue = runif(20, 1000, 10000)
)

# Top 2 products by revenue in each category
sales_data %>%
  group_by(category) %>%
  slice_max(revenue, n = 2, with_ties = FALSE) %>%
  arrange(category, desc(revenue))

Performance Considerations

slice() selects rows directly by position. On large grouped datasets, measure before assuming one approach is faster. The benchmark below compares slice() with an equivalent filter() call:

library(microbenchmark)

large_df <- data.frame(
  group = rep(1:1000, each = 100),
  value = rnorm(100000)
)

# Two ways of taking the first 5 rows per group; timings vary by
# machine and dplyr version
microbenchmark(
  slice_approach = large_df %>% 
    group_by(group) %>% 
    slice(1:5),
  
  filter_approach = large_df %>%
    group_by(group) %>%
    filter(row_number() <= 5),
  
  times = 50
)

When you only need the first few rows, slice_head() states that intent directly. Whether it is faster than slice(1:10) depends on the data and the dplyr version, so benchmark it on your own data:

# huge_dataset is a placeholder for your own large data frame
huge_dataset %>% slice_head(n = 10)

Combining slice() with Other dplyr Verbs

# Complex pipeline: filter, arrange, then select top performers
employees %>%
  filter(department %in% c("IT", "Sales")) %>%
  arrange(desc(salary)) %>%
  slice_head(n = 3)
#   id  name salary department
# 1  4 David  91000         IT
# 2  8 Henry  88000         IT
# 3  6 Frank  85000      Sales

# Select roughly the middle 50% of salaries per department
# (positions are rounded, so small groups keep more than half their rows)
employees %>%
  group_by(department) %>%
  arrange(salary) %>%
  slice(ceiling(n() * 0.25):floor(n() * 0.75))

The slice() family provides intuitive, readable syntax for position-based row selection. Because these functions integrate with dplyr’s grouped operations and pipe workflows, they replace manual indexing logic with a small set of verbs that work the same way on grouped and ungrouped data.
