Dplyr

Dec 10, 2025 R

R dplyr - select() Columns

The select() function from dplyr extracts columns from data frames using intuitive syntax. Unlike base R’s bracket notation, select() accepts unquoted column names and always returns a data frame of the same type as its input, never a bare vector.

Read more →

Dec 10, 2025 R

R dplyr - select() Helpers (starts_with, ends_with, contains)

The select() function in dplyr offers helper functions that match column names by patterns, eliminating tedious manual column specification and reducing errors in data manipulation workflows.

Read more →

Dec 10, 2025 R

R dplyr - slice() - Select Rows by Position

The slice() function selects rows by their integer positions. Unlike filter(), which uses logical conditions, slice() works with row numbers directly.

Read more →

Dec 10, 2025 R

R dplyr - summarise() with Examples

The summarise() function from dplyr condenses data frames into summary statistics. At its core, it takes a data frame and returns a smaller one containing computed aggregate values.

Read more →

Dec 10, 2025 R

R dplyr - top_n() and slice_max()

The dplyr package superseded top_n() in version 1.0.0, recommending slice_max() and slice_min() as replacements. This wasn’t arbitrary: top_n() had ambiguous behavior, particularly around tie…

Read more →

Dec 09, 2025 R

R dplyr - left_join, right_join, inner_join, full_join

Joins combine two data frames based on shared key columns. Each join type handles non-matching rows differently, which directly affects the size and content of the result.

Read more →

Dec 09, 2025 R

R dplyr - mutate() - Add/Modify Columns

The mutate() function from dplyr adds new variables or transforms existing ones in your data frame. Unlike base R’s approach of modifying columns with $ or [], mutate() keeps your data…

Read more →

Dec 09, 2025 R

R dplyr - n() and n_distinct()

n() counts rows within groups while n_distinct() counts unique values, forming the foundation of aggregation operations in dplyr.

Read more →

Dec 09, 2025 R

R dplyr - ntile() - Bin into N Groups

The ntile() function from dplyr divides a vector into N bins of approximately equal size. It assigns each observation a bin number from 1 to N based on its rank in ascending order. This differs…

Read more →

Dec 09, 2025 R

R dplyr - Pipe Operator (%>% and |>)

The pipe operator improves R code readability by eliminating nested function calls. Instead of writing function3(function2(function1(data))), you write `data %>% function1() %>% function2()…

Read more →

Dec 09, 2025 R

R dplyr - relocate() - Reorder Columns

The relocate() function from dplyr moves columns to new positions within a data frame. By default, it moves specified columns to the leftmost position.

Read more →

Dec 09, 2025 R

R dplyr - rename() Columns

The rename() function from dplyr uses a straightforward syntax where you specify the new name on the left and the old name on the right. This reversed assignment feels natural when reading code…

Read more →

Dec 09, 2025 R

R dplyr - row_number(), rank(), dense_rank()

The dplyr package provides three distinct ranking functions that assign positional values to rows. While they appear similar, their handling of tied values creates fundamentally different outputs.

Read more →

Dec 08, 2025 R

R dplyr - case_when() Examples

The case_when() function evaluates conditions from top to bottom, returning the right-hand side value when a condition evaluates to TRUE. Each condition follows the formula syntax: `condition ~…

Read more →

Dec 08, 2025 R

R dplyr - Complete Tutorial with Examples

dplyr transforms data manipulation in R by providing a grammar of data manipulation. Instead of learning dozens of functions with inconsistent interfaces, you master five verbs that combine to solve…

Read more →

Dec 08, 2025 R

R dplyr - count() and tally()

The dplyr package provides two complementary functions for counting observations: count() and tally(). While both produce frequency counts, they differ in their workflow position. count()…

Read more →

Dec 08, 2025 R

R dplyr - distinct() - Remove Duplicates

The distinct() function from dplyr identifies and removes duplicate rows from data frames. Unlike base R’s unique(), it works naturally with tibbles and integrates into pipe-based workflows.

Read more →

Dec 08, 2025 R

R dplyr - filter() Rows by Condition

The filter() function from dplyr selects rows where conditions evaluate to TRUE. Unlike base R subsetting with brackets, filter() drops rows where the condition evaluates to NA and integrates cleanly into piped…

Read more →

Dec 08, 2025 R

R dplyr - filter() with Multiple Conditions

The filter() function from dplyr accepts multiple conditions separated by commas, which implicitly creates an AND relationship. Each condition must evaluate to a logical vector.

Read more →

Dec 08, 2025 R

R dplyr - group_by() and summarise()

The group_by() function transforms a regular data frame into a grouped tibble, which subsequent operations treat as separate partitions. This grouping is metadata—the physical data structure…

Read more →

Dec 08, 2025 R

R dplyr - if_else() vs ifelse()

The fundamental distinction between if_else() and ifelse() lies in type checking. if_else() enforces strict type consistency between the true and false branches, preventing silent type coercion…

Read more →

Dec 08, 2025 R

R dplyr - lag() and lead() Functions

The lag() and lead() functions shift values within a vector by a specified number of positions, essential for time-series analysis, calculating differences between consecutive rows, and…

Read more →

Dec 07, 2025 R

R dplyr - across() - Apply Function Across Columns

The across() function operates within dplyr verbs like mutate() and summarise(); inside filter(), if_any() and if_all() are the recommended counterparts. Its basic structure takes a column selection and a function to apply:

Read more →

Dec 07, 2025 R

R dplyr - anti_join() and semi_join()

The dplyr package provides two filtering joins that differ fundamentally from mutating joins like inner_join() or left_join(). While mutating joins combine columns from both tables, filtering…

Read more →

Dec 07, 2025 R

R dplyr - arrange() - Sort Data Frame

The arrange() function from dplyr provides an intuitive interface for sorting data frames. Unlike base R’s order(), it returns the entire data frame in sorted order rather than just indices.

Read more →

Dec 07, 2025 R

R dplyr - between() - Filter Between Values

The between() function in dplyr tests whether values fall within a specified range, inclusive of both boundaries, which makes it a convenient shortcut inside filter(). The syntax is straightforward: