SQL Window Functions: ROW_NUMBER, RANK, and PARTITION BY
Window functions calculate values across sets of rows while keeping each row intact. Unlike GROUP BY, which collapses rows into summary groups, window functions add computed columns to your existing...
Key Insights
- Window functions perform calculations across rows without collapsing them like GROUP BY does, letting you add aggregations and rankings while preserving individual row details
- ROW_NUMBER assigns unique sequential numbers regardless of duplicates, while RANK and DENSE_RANK handle ties differently—RANK leaves gaps in numbering after ties, DENSE_RANK doesn’t
- PARTITION BY creates independent calculation windows within your result set, enabling per-group rankings, running totals, and comparisons without multiple subqueries
Understanding Window Functions vs GROUP BY
Window functions calculate values across sets of rows while keeping each row intact. Unlike GROUP BY, which collapses rows into summary groups, window functions add computed columns to your existing rows.
-- GROUP BY: Collapses to one row per department
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;
-- Window function: Keeps all rows, adds count to each
SELECT
employee_id,
name,
department,
COUNT(*) OVER (PARTITION BY department) as dept_employee_count
FROM employees;
The GROUP BY query returns one row per department. The window function returns every employee row with an additional column showing how many employees are in their department. This distinction is fundamental—window functions let you have your aggregation and your detail rows too.
ROW_NUMBER(): Assigning Sequential Numbers
ROW_NUMBER() assigns a unique integer to each row within a partition, starting at 1. It always produces distinct values, even when the underlying data contains duplicates.
Basic syntax:
SELECT
product_name,
price,
ROW_NUMBER() OVER (ORDER BY price DESC) as row_num
FROM products;
This assigns sequential numbers based on price, with the most expensive product getting 1.
Pagination with ROW_NUMBER
ROW_NUMBER excels at pagination, especially in databases that don’t support OFFSET/LIMIT:
WITH numbered_results AS (
SELECT
*,
ROW_NUMBER() OVER (ORDER BY created_at DESC) as rn
FROM articles
)
SELECT article_id, title, created_at
FROM numbered_results
WHERE rn BETWEEN 21 AND 30; -- Page 3, 10 items per page
Finding Top N Per Category
One of the most practical uses is getting the top N records per group:
WITH ranked_products AS (
SELECT
category,
product_name,
revenue,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY revenue DESC
) as rank_in_category
FROM product_sales
)
SELECT category, product_name, revenue
FROM ranked_products
WHERE rank_in_category <= 3;
This returns the top 3 revenue-generating products in each category. PARTITION BY resets the numbering for each category, while ORDER BY determines which products get lower numbers.
RANK() and DENSE_RANK(): Handling Ties Properly
When dealing with tied values, ROW_NUMBER arbitrarily assigns different numbers. RANK() and DENSE_RANK() handle ties explicitly but differently.
SELECT
student_name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,
RANK() OVER (ORDER BY score DESC) as rank,
DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank
FROM exam_results
ORDER BY score DESC;
Sample output:
student_name | score | row_num | rank | dense_rank
-------------|-------|---------|------|------------
Alice | 95 | 1 | 1 | 1
Bob | 95 | 2 | 1 | 1
Carol | 90 | 3 | 3 | 2
David | 90 | 4 | 3 | 2
Eve | 85 | 5 | 5 | 3
Notice the difference: RANK() jumps from 1 to 3 (leaving a gap), while DENSE_RANK() goes from 1 to 2 (no gap). RANK() reflects that four students scored higher than Eve, while DENSE_RANK() shows there are three distinct score levels.
Real-World Leaderboard Example
For a sales leaderboard where you want to show actual competitive position:
SELECT
salesperson,
total_sales,
RANK() OVER (ORDER BY total_sales DESC) as position,
DENSE_RANK() OVER (ORDER BY total_sales DESC) as tier
FROM monthly_sales
WHERE month = '2024-01'
ORDER BY total_sales DESC;
Use RANK() for position (reflects true competitive standing) and DENSE_RANK() for tier (grouping similar performers).
Top Salaries Per Department
WITH salary_rankings AS (
SELECT
department,
employee_name,
salary,
DENSE_RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) as salary_rank
FROM employees
)
SELECT department, employee_name, salary
FROM salary_rankings
WHERE salary_rank <= 3;
Using DENSE_RANK ensures that if three people tie for the highest salary, you still see the second-highest salary tier.
PARTITION BY: Creating Independent Windows
PARTITION BY divides your result set into groups, and window functions calculate independently within each group. Think of it as creating mini-result sets.
-- Row numbers reset per department
SELECT
department,
employee_name,
hire_date,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY hire_date
) as seniority_rank
FROM employees;
Running Totals Per Category
SELECT
transaction_date,
category,
amount,
SUM(amount) OVER (
PARTITION BY category
ORDER BY transaction_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total
FROM transactions
ORDER BY category, transaction_date;
This calculates separate running totals for each category. Without PARTITION BY, you’d get one running total across all categories.
Comparing to Partition Averages
SELECT
product_name,
category,
price,
AVG(price) OVER (PARTITION BY category) as category_avg_price,
price - AVG(price) OVER (PARTITION BY category) as price_vs_avg
FROM products;
Each row shows how its price compares to the average price in its category—useful for identifying outliers or premium products.
ORDER BY Within Window Functions
The ORDER BY clause inside OVER() serves two purposes: it determines ranking order and defines the window frame for accumulating functions.
-- ORDER BY affects which rows get which ranks
SELECT
order_date,
order_total,
ROW_NUMBER() OVER (ORDER BY order_date) as chronological,
ROW_NUMBER() OVER (ORDER BY order_total DESC) as by_value
FROM orders;
For accumulating functions, ORDER BY defines the frame:
SELECT
sale_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as seven_day_total
FROM daily_sales;
Multiple Column Ordering
SELECT
region,
salesperson,
sales_amount,
RANK() OVER (
PARTITION BY region
ORDER BY sales_amount DESC, salesperson ASC
) as rank
FROM sales_data;
When sales_amount ties, salesperson name breaks the tie alphabetically.
Practical Use Cases
Deduplication: Keeping First or Last Occurrence
WITH numbered_records AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC
) as rn
FROM user_signups
)
DELETE FROM user_signups
WHERE id IN (
SELECT id FROM numbered_records WHERE rn > 1
);
This keeps the most recent signup for each email address.
Gap Detection in Sequential Data
WITH numbered_sequence AS (
SELECT
invoice_number,
ROW_NUMBER() OVER (ORDER BY invoice_number) as rn,
invoice_number - ROW_NUMBER() OVER (ORDER BY invoice_number) as gap_group
FROM invoices
)
SELECT
MIN(invoice_number) as gap_start,
MAX(invoice_number) as gap_end
FROM numbered_sequence
GROUP BY gap_group
HAVING COUNT(*) > 1;
Identifies missing invoice numbers in a sequence.
Moving Averages with Partitions
SELECT
product_id,
sale_date,
daily_units,
AVG(daily_units) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) as thirty_day_avg
FROM product_sales;
Calculates a 30-day moving average separately for each product.
Combining with LAG/LEAD
SELECT
employee_id,
salary,
hire_date,
LAG(salary) OVER (ORDER BY hire_date) as previous_hire_salary,
RANK() OVER (ORDER BY salary DESC) as salary_rank
FROM employees;
Shows each employee’s rank while also displaying what the previous hire was paid.
Performance Considerations
Window functions can be expensive. Here’s how to optimize them:
Index your ORDER BY and PARTITION BY columns. If you’re partitioning by department and ordering by salary, create an index on (department, salary).
CREATE INDEX idx_emp_dept_salary ON employees(department, salary DESC);
Avoid multiple passes when possible. Instead of separate queries for different window functions, combine them:
-- Good: Single pass
SELECT
employee_id,
ROW_NUMBER() OVER (ORDER BY salary DESC) as rn,
RANK() OVER (ORDER BY salary DESC) as rank
FROM employees;
-- Avoid: Multiple passes
SELECT employee_id, ROW_NUMBER() OVER (ORDER BY salary DESC) as rn FROM employees;
SELECT employee_id, RANK() OVER (ORDER BY salary DESC) as rank FROM employees;
Use CTEs or subqueries to filter early. Apply WHERE clauses before window functions when possible:
WITH recent_sales AS (
SELECT * FROM sales WHERE sale_date >= '2024-01-01'
)
SELECT
product_id,
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM recent_sales;
Consider materialized views for repeated calculations. If you’re constantly ranking the same dataset, materialize it.
Watch for memory usage with large partitions. Some databases load entire partitions into memory. Monitor your execution plans.
Window functions eliminate complex self-joins and correlated subqueries, making your SQL more readable and often more performant. Master ROW_NUMBER for pagination and deduplication, RANK/DENSE_RANK for competitive rankings, and PARTITION BY for per-group calculations. Combined, they solve most analytical SQL challenges you’ll encounter.