SQL - ROW_NUMBER() Function

ROW_NUMBER() is a window function that assigns a unique sequential integer to each row within a partition of a result set. The numbering starts at 1 and increments by 1 for each row, regardless of...

Key Insights

  • ROW_NUMBER() assigns sequential integers to rows within a result set partition, enabling precise row identification and pagination without relying on physical table ordering
  • When combined with PARTITION BY, ROW_NUMBER() restarts numbering for each logical group, making it essential for operations like finding top N records per category or deduplicating data
  • The function’s deterministic ordering requires an ORDER BY clause within the OVER() specification, distinguishing it from RANK() and DENSE_RANK() which handle ties differently

Understanding ROW_NUMBER() Basics

ROW_NUMBER() is a window function that assigns a unique sequential integer to each row within a partition of a result set. The numbering starts at 1 and increments by 1 for each row, regardless of duplicate values in the ordering columns.

The basic syntax:

ROW_NUMBER() OVER (
    [PARTITION BY partition_expression]
    ORDER BY sort_expression [ASC|DESC]
)

Here’s a simple example using an employees table:

SELECT 
    employee_id,
    first_name,
    department,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
FROM employees;

This assigns row numbers based on salary in descending order. Employee with the highest salary gets row_num = 1, second highest gets 2, and so on.

Partitioning Data with PARTITION BY

The real power of ROW_NUMBER() emerges when using PARTITION BY. This clause divides the result set into partitions, and ROW_NUMBER() restarts at 1 for each partition.

Finding the top 3 highest-paid employees per department:

WITH ranked_employees AS (
    SELECT 
        employee_id,
        first_name,
        department,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department 
            ORDER BY salary DESC
        ) as dept_rank
    FROM employees
)
SELECT 
    employee_id,
    first_name,
    department,
    salary
FROM ranked_employees
WHERE dept_rank <= 3;

The PARTITION BY department clause resets the counter for each department. Within each department, employees are ordered by salary, allowing you to filter the top N records per group.

Pagination Implementation

ROW_NUMBER() provides an efficient mechanism for implementing pagination in SQL queries, especially when OFFSET/FETCH isn’t available or when you need more control.

-- Page 3, 20 records per page
WITH paginated_data AS (
    SELECT 
        product_id,
        product_name,
        category,
        price,
        ROW_NUMBER() OVER (ORDER BY product_name) as row_num
    FROM products
)
SELECT 
    product_id,
    product_name,
    category,
    price
FROM paginated_data
WHERE row_num BETWEEN 41 AND 60;  -- (3-1) * 20 + 1 to 3 * 20

For dynamic pagination with parameters:

DECLARE @PageNumber INT = 3;
DECLARE @PageSize INT = 20;
DECLARE @StartRow INT = (@PageNumber - 1) * @PageSize + 1;
DECLARE @EndRow INT = @PageNumber * @PageSize;

WITH paginated_data AS (
    SELECT 
        product_id,
        product_name,
        ROW_NUMBER() OVER (ORDER BY product_name) as row_num
    FROM products
)
SELECT 
    product_id,
    product_name
FROM paginated_data
WHERE row_num BETWEEN @StartRow AND @EndRow;

Deduplication Strategies

ROW_NUMBER() excels at removing duplicate records while giving you control over which duplicate to keep.

Removing duplicates, keeping the most recent record:

WITH duplicates AS (
    SELECT 
        customer_id,
        email,
        created_date,
        ROW_NUMBER() OVER (
            PARTITION BY email 
            ORDER BY created_date DESC
        ) as rn
    FROM customer_records
)
DELETE FROM duplicates
WHERE rn > 1;

For databases that don’t support DELETE with CTEs:

DELETE FROM customer_records
WHERE customer_id IN (
    SELECT customer_id
    FROM (
        SELECT 
            customer_id,
            ROW_NUMBER() OVER (
                PARTITION BY email 
                ORDER BY created_date DESC
            ) as rn
        FROM customer_records
    ) ranked
    WHERE rn > 1
);

ROW_NUMBER() vs RANK() vs DENSE_RANK()

Understanding the differences between these functions is crucial for correct implementation:

SELECT 
    product_name,
    sales_amount,
    ROW_NUMBER() OVER (ORDER BY sales_amount DESC) as row_num,
    RANK() OVER (ORDER BY sales_amount DESC) as rank_num,
    DENSE_RANK() OVER (ORDER BY sales_amount DESC) as dense_rank_num
FROM product_sales;

Example results with tied values:

product_name  | sales_amount | row_num | rank_num | dense_rank_num
Product A     | 1000         | 1       | 1        | 1
Product B     | 1000         | 2       | 1        | 1
Product C     | 800          | 3       | 3        | 2
Product D     | 800          | 4       | 3        | 2
Product E     | 600          | 5       | 5        | 3
  • ROW_NUMBER(): Always unique, arbitrary ordering for ties
  • RANK(): Same rank for ties, skips subsequent ranks
  • DENSE_RANK(): Same rank for ties, no gaps in ranking sequence

Advanced Pattern: Running Totals with Row Identification

Combining ROW_NUMBER() with other window functions for complex analytics:

SELECT 
    order_date,
    order_amount,
    ROW_NUMBER() OVER (ORDER BY order_date) as order_sequence,
    SUM(order_amount) OVER (
        ORDER BY order_date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total,
    AVG(order_amount) OVER (
        ORDER BY order_date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as moving_avg_7days
FROM orders
WHERE order_date >= '2024-01-01';

Performance Considerations

ROW_NUMBER() operations can be resource-intensive. Several optimization strategies:

Index the ORDER BY columns:

CREATE INDEX idx_employees_dept_salary 
ON employees(department, salary DESC);

Limit the dataset before applying ROW_NUMBER():

WITH filtered_data AS (
    SELECT employee_id, department, salary
    FROM employees
    WHERE hire_date >= '2023-01-01'
    AND status = 'Active'
)
SELECT 
    employee_id,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM filtered_data;

Use appropriate WHERE clauses after ROW_NUMBER():

-- Efficient: Filter immediately after row numbering
WITH ranked AS (
    SELECT 
        employee_id,
        salary,
        ROW_NUMBER() OVER (ORDER BY salary DESC) as rn
    FROM employees
)
SELECT employee_id, salary
FROM ranked
WHERE rn <= 10;  -- Top 10 only

Practical Use Case: Time-Series Gap Detection

Identifying gaps in sequential data:

WITH numbered_transactions AS (
    SELECT 
        transaction_id,
        transaction_date,
        ROW_NUMBER() OVER (ORDER BY transaction_date) as rn,
        DATEADD(day, -ROW_NUMBER() OVER (ORDER BY transaction_date), transaction_date) as group_date
    FROM transactions
    WHERE account_id = 12345
),
gaps AS (
    SELECT 
        group_date,
        MIN(transaction_date) as period_start,
        MAX(transaction_date) as period_end,
        COUNT(*) as consecutive_days
    FROM numbered_transactions
    GROUP BY group_date
)
SELECT 
    period_end as gap_before,
    LEAD(period_start) OVER (ORDER BY period_start) as gap_after,
    DATEDIFF(day, period_end, LEAD(period_start) OVER (ORDER BY period_start)) - 1 as gap_days
FROM gaps
WHERE LEAD(period_start) OVER (ORDER BY period_start) IS NOT NULL
AND DATEDIFF(day, period_end, LEAD(period_start) OVER (ORDER BY period_start)) > 1;

ROW_NUMBER() provides deterministic row identification essential for complex data manipulation, pagination, and analytical queries. Master its interaction with PARTITION BY and ORDER BY clauses to unlock powerful data transformation patterns.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.