SQL - ROW_NUMBER() Function
ROW_NUMBER() is a window function that assigns a unique sequential integer to each row within a partition of a result set. The numbering starts at 1 and increments by 1 for each row, regardless of...
Key Insights
- ROW_NUMBER() assigns sequential integers to rows within a result set partition, enabling precise row identification and pagination without relying on physical table ordering
- When combined with PARTITION BY, ROW_NUMBER() restarts numbering for each logical group, making it essential for operations like finding top N records per category or deduplicating data
- The function’s deterministic ordering requires an ORDER BY clause within the OVER() specification, distinguishing it from RANK() and DENSE_RANK() which handle ties differently
Understanding ROW_NUMBER() Basics
ROW_NUMBER() is a window function that assigns a unique sequential integer to each row within a partition of a result set. The numbering starts at 1 and increments by 1 for each row, regardless of duplicate values in the ordering columns.
The basic syntax:
ROW_NUMBER() OVER (
[PARTITION BY partition_expression]
ORDER BY sort_expression [ASC|DESC]
)
Here’s a simple example using an employees table:
SELECT
employee_id,
first_name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
FROM employees;
This assigns row numbers based on salary in descending order. Employee with the highest salary gets row_num = 1, second highest gets 2, and so on.
Partitioning Data with PARTITION BY
The real power of ROW_NUMBER() emerges when using PARTITION BY. This clause divides the result set into partitions, and ROW_NUMBER() restarts at 1 for each partition.
Finding the top 3 highest-paid employees per department:
WITH ranked_employees AS (
SELECT
employee_id,
first_name,
department,
salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) as dept_rank
FROM employees
)
SELECT
employee_id,
first_name,
department,
salary
FROM ranked_employees
WHERE dept_rank <= 3;
The PARTITION BY department clause resets the counter for each department. Within each department, employees are ordered by salary, allowing you to filter the top N records per group.
Pagination Implementation
ROW_NUMBER() provides an efficient mechanism for implementing pagination in SQL queries, especially when OFFSET/FETCH isn’t available or when you need more control.
-- Page 3, 20 records per page
WITH paginated_data AS (
SELECT
product_id,
product_name,
category,
price,
ROW_NUMBER() OVER (ORDER BY product_name) as row_num
FROM products
)
SELECT
product_id,
product_name,
category,
price
FROM paginated_data
WHERE row_num BETWEEN 41 AND 60; -- (3-1) * 20 + 1 to 3 * 20
For dynamic pagination with parameters:
DECLARE @PageNumber INT = 3;
DECLARE @PageSize INT = 20;
DECLARE @StartRow INT = (@PageNumber - 1) * @PageSize + 1;
DECLARE @EndRow INT = @PageNumber * @PageSize;
WITH paginated_data AS (
SELECT
product_id,
product_name,
ROW_NUMBER() OVER (ORDER BY product_name) as row_num
FROM products
)
SELECT
product_id,
product_name
FROM paginated_data
WHERE row_num BETWEEN @StartRow AND @EndRow;
Deduplication Strategies
ROW_NUMBER() excels at removing duplicate records while giving you control over which duplicate to keep.
Removing duplicates, keeping the most recent record:
WITH duplicates AS (
SELECT
customer_id,
email,
created_date,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_date DESC
) as rn
FROM customer_records
)
DELETE FROM duplicates
WHERE rn > 1;
For databases that don’t support DELETE with CTEs:
DELETE FROM customer_records
WHERE customer_id IN (
SELECT customer_id
FROM (
SELECT
customer_id,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_date DESC
) as rn
FROM customer_records
) ranked
WHERE rn > 1
);
ROW_NUMBER() vs RANK() vs DENSE_RANK()
Understanding the differences between these functions is crucial for correct implementation:
SELECT
product_name,
sales_amount,
ROW_NUMBER() OVER (ORDER BY sales_amount DESC) as row_num,
RANK() OVER (ORDER BY sales_amount DESC) as rank_num,
DENSE_RANK() OVER (ORDER BY sales_amount DESC) as dense_rank_num
FROM product_sales;
Example results with tied values:
product_name | sales_amount | row_num | rank_num | dense_rank_num
Product A | 1000 | 1 | 1 | 1
Product B | 1000 | 2 | 1 | 1
Product C | 800 | 3 | 3 | 2
Product D | 800 | 4 | 3 | 2
Product E | 600 | 5 | 5 | 3
- ROW_NUMBER(): Always unique, arbitrary ordering for ties
- RANK(): Same rank for ties, skips subsequent ranks
- DENSE_RANK(): Same rank for ties, no gaps in ranking sequence
Advanced Pattern: Running Totals with Row Identification
Combining ROW_NUMBER() with other window functions for complex analytics:
SELECT
order_date,
order_amount,
ROW_NUMBER() OVER (ORDER BY order_date) as order_sequence,
SUM(order_amount) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total,
AVG(order_amount) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7days
FROM orders
WHERE order_date >= '2024-01-01';
Performance Considerations
ROW_NUMBER() operations can be resource-intensive. Several optimization strategies:
Index the ORDER BY columns:
CREATE INDEX idx_employees_dept_salary
ON employees(department, salary DESC);
Limit the dataset before applying ROW_NUMBER():
WITH filtered_data AS (
SELECT employee_id, department, salary
FROM employees
WHERE hire_date >= '2023-01-01'
AND status = 'Active'
)
SELECT
employee_id,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rn
FROM filtered_data;
Use appropriate WHERE clauses after ROW_NUMBER():
-- Efficient: Filter immediately after row numbering
WITH ranked AS (
SELECT
employee_id,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as rn
FROM employees
)
SELECT employee_id, salary
FROM ranked
WHERE rn <= 10; -- Top 10 only
Practical Use Case: Time-Series Gap Detection
Identifying gaps in sequential data:
WITH numbered_transactions AS (
SELECT
transaction_id,
transaction_date,
ROW_NUMBER() OVER (ORDER BY transaction_date) as rn,
DATEADD(day, -ROW_NUMBER() OVER (ORDER BY transaction_date), transaction_date) as group_date
FROM transactions
WHERE account_id = 12345
),
gaps AS (
SELECT
group_date,
MIN(transaction_date) as period_start,
MAX(transaction_date) as period_end,
COUNT(*) as consecutive_days
FROM numbered_transactions
GROUP BY group_date
)
SELECT
period_end as gap_before,
LEAD(period_start) OVER (ORDER BY period_start) as gap_after,
DATEDIFF(day, period_end, LEAD(period_start) OVER (ORDER BY period_start)) - 1 as gap_days
FROM gaps
WHERE LEAD(period_start) OVER (ORDER BY period_start) IS NOT NULL
AND DATEDIFF(day, period_end, LEAD(period_start) OVER (ORDER BY period_start)) > 1;
ROW_NUMBER() provides deterministic row identification essential for complex data manipulation, pagination, and analytical queries. Master its interaction with PARTITION BY and ORDER BY clauses to unlock powerful data transformation patterns.