How to Use ROW_NUMBER in MySQL
ROW_NUMBER() is a window function introduced in MySQL 8.0 that assigns a unique sequential integer to each row within a result set. Unlike traditional aggregate functions that collapse rows, window...
Key Insights
- ROW_NUMBER() assigns unique sequential integers to rows within a result set, making it essential for pagination, deduplication, and ranking tasks in MySQL 8.0+
- The PARTITION BY clause resets row numbering for each group, enabling powerful per-category operations like finding top N records within departments or categories
- While ROW_NUMBER() is versatile, understanding when to use RANK() or DENSE_RANK() instead—and how indexing affects performance—separates effective implementations from slow ones
Introduction to ROW_NUMBER()
ROW_NUMBER() is a window function introduced in MySQL 8.0 that assigns a unique sequential integer to each row within a result set. Unlike traditional aggregate functions that collapse rows, window functions perform calculations across rows while maintaining the individual row structure.
The primary purpose of ROW_NUMBER() is to generate sequential numbering based on a specific ordering criteria. This seemingly simple capability unlocks powerful solutions for common database challenges: implementing efficient pagination, identifying top performers within categories, removing duplicate records, and creating ranked lists.
Before MySQL 8.0, developers had to resort to user-defined variables or complex subqueries to achieve similar results. The introduction of window functions brought MySQL in line with other enterprise databases and dramatically simplified these operations.
Basic Syntax and Simple Examples
The fundamental syntax for ROW_NUMBER() consists of two parts: the function itself and the OVER clause that defines how rows are ordered:
ROW_NUMBER() OVER (ORDER BY column_name [ASC|DESC])
The OVER clause is mandatory and specifies the window over which the function operates. At minimum, you need an ORDER BY within the OVER clause to determine the sequence of row numbering.
Here’s a straightforward example using an employees table:
SELECT
employee_id,
first_name,
last_name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
This query assigns row numbers starting from 1, ordering employees by salary from highest to lowest. The employee with the highest salary gets row number 1, the second-highest gets 2, and so on. Even if two employees have identical salaries, each receives a unique row number—this deterministic behavior distinguishes ROW_NUMBER() from other ranking functions.
Using PARTITION BY for Grouped Numbering
The real power of ROW_NUMBER() emerges when you combine it with PARTITION BY. This clause divides the result set into groups, and row numbering resets to 1 for each partition.
The syntax expands to:
ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY column_name)
Consider this example that numbers employees within each department:
SELECT
department_id,
employee_id,
first_name,
last_name,
hire_date,
ROW_NUMBER() OVER (
PARTITION BY department_id
ORDER BY hire_date ASC
) AS seniority_rank
FROM employees
ORDER BY department_id, seniority_rank;
This query assigns row numbers to employees based on their hire date, but the numbering resets for each department. The first person hired in Sales gets rank 1, and the first person hired in Engineering also gets rank 1. This partitioning capability is crucial for per-group analysis.
Practical Use Cases
Pagination
ROW_NUMBER() provides a robust alternative to LIMIT/OFFSET for pagination, especially when dealing with dynamic datasets or when you need consistent page boundaries.
WITH numbered_products AS (
SELECT
product_id,
product_name,
price,
category,
ROW_NUMBER() OVER (ORDER BY product_id) AS row_num
FROM products
)
SELECT
product_id,
product_name,
price,
category
FROM numbered_products
WHERE row_num BETWEEN 21 AND 30;
This query retrieves page 3 of results (assuming 10 items per page). The CTE (Common Table Expression) assigns row numbers to all products, then the outer query filters for rows 21-30. This approach is more stable than OFFSET when underlying data changes between page requests.
Finding Top N Records Per Group
One of the most common use cases is identifying the top N records within each category—highest-paid employees per department, best-selling products per region, or most recent orders per customer.
WITH ranked_employees AS (
SELECT
department_id,
employee_id,
first_name,
last_name,
salary,
ROW_NUMBER() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS salary_rank
FROM employees
)
SELECT
department_id,
employee_id,
first_name,
last_name,
salary
FROM ranked_employees
WHERE salary_rank <= 3
ORDER BY department_id, salary_rank;
This query returns the three highest-paid employees in each department. The PARTITION BY ensures rankings are calculated independently for each department, and the WHERE clause filters for only the top 3 positions.
Removing Duplicates
ROW_NUMBER() excels at identifying and removing duplicate records while retaining control over which record to keep.
WITH duplicate_customers AS (
SELECT
customer_id,
email,
created_at,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC
) AS row_num
FROM customers
)
DELETE FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM duplicate_customers
WHERE row_num > 1
);
This query identifies duplicate customers by email address and deletes all but the most recent record (highest created_at timestamp). The PARTITION BY email groups duplicates together, and ORDER BY created_at DESC ensures the most recent record gets row_num = 1.
ROW_NUMBER() vs RANK() vs DENSE_RANK()
MySQL provides three ranking window functions, each handling tied values differently. Understanding these differences is critical for choosing the right function.
SELECT
player_name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_number,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM leaderboard
ORDER BY score DESC;
Given these scores:
- Alice: 100
- Bob: 95
- Carol: 95
- David: 90
The results would be:
| player_name | score | row_number | rank | dense_rank |
|---|---|---|---|---|
| Alice | 100 | 1 | 1 | 1 |
| Bob | 95 | 2 | 2 | 2 |
| Carol | 95 | 3 | 2 | 2 |
| David | 90 | 4 | 4 | 3 |
ROW_NUMBER() assigns unique numbers regardless of ties. Bob and Carol both get different row numbers even though they have the same score.
RANK() assigns the same rank to tied values but skips subsequent ranks. Both Bob and Carol get rank 2, so David gets rank 4 (skipping rank 3).
DENSE_RANK() also assigns the same rank to ties but doesn’t skip ranks. Bob and Carol get rank 2, and David gets rank 3.
Use ROW_NUMBER() when you need unique identifiers for every row, RANK() for traditional competition-style rankings, and DENSE_RANK() when you want consecutive rank values without gaps.
Performance Considerations and Best Practices
Window functions can be resource-intensive on large datasets. Here are key performance considerations:
Indexing matters. Create indexes on columns used in ORDER BY and PARTITION BY clauses. For the employees example, an index on (department_id, salary DESC) would significantly improve performance:
CREATE INDEX idx_dept_salary ON employees(department_id, salary DESC);
ROW_NUMBER() vs LIMIT/OFFSET. For simple pagination on static datasets, LIMIT/OFFSET is faster. Use ROW_NUMBER() when you need:
- Consistent results across multiple queries
- Complex partitioning logic
- To combine pagination with other window functions
Avoid unnecessary sorting. If your data is already ordered by the clustering key, align your ORDER BY clause with that ordering to leverage existing sort order.
Consider materialized results. For frequently accessed rankings that don’t change often, consider storing ROW_NUMBER() results in a materialized table and refreshing periodically rather than calculating on every query.
Watch memory usage. Window functions may require temporary tables for sorting and partitioning. Monitor your tmp_table_size and max_heap_table_size settings for large result sets.
Conclusion
ROW_NUMBER() is an indispensable tool in modern MySQL development. It simplifies complex ranking and numbering operations that previously required convoluted subqueries or application-level processing. Master the basic syntax with ORDER BY, leverage PARTITION BY for grouped operations, and understand the distinctions between ROW_NUMBER(), RANK(), and DENSE_RANK() to choose the right tool for each scenario.
Start with simple examples in your development environment, then progress to more complex use cases like pagination and deduplication. Pay attention to performance implications on production datasets, and always index appropriately. The investment in learning window functions pays dividends in cleaner, more maintainable SQL code.