SQL Window Functions: ROW_NUMBER, RANK, and PARTITION BY

Window functions calculate values across sets of rows while keeping each row intact. Unlike GROUP BY, which collapses rows into summary groups, window functions add computed columns to your existing...

Key Insights

  • Window functions perform calculations across rows without collapsing them like GROUP BY does, letting you add aggregations and rankings while preserving individual row details
  • ROW_NUMBER assigns unique sequential numbers regardless of duplicates, while RANK and DENSE_RANK handle ties differently—RANK leaves gaps in numbering after ties, DENSE_RANK doesn’t
  • PARTITION BY creates independent calculation windows within your result set, enabling per-group rankings, running totals, and comparisons without multiple subqueries

Understanding Window Functions vs GROUP BY

Window functions calculate values across sets of rows while keeping each row intact. Unlike GROUP BY, which collapses rows into summary groups, window functions add computed columns to your existing rows.

-- GROUP BY: Collapses to one row per department
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;

-- Window function: Keeps all rows, adds count to each
SELECT 
    employee_id,
    name,
    department,
    COUNT(*) OVER (PARTITION BY department) as dept_employee_count
FROM employees;

The GROUP BY query returns one row per department. The window function returns every employee row with an additional column showing how many employees are in their department. This distinction is fundamental—window functions let you have your aggregation and your detail rows too.

ROW_NUMBER(): Assigning Sequential Numbers

ROW_NUMBER() assigns a unique integer to each row within a partition, starting at 1. It always produces distinct values, even when the underlying data contains duplicates.

Basic syntax:

SELECT 
    product_name,
    price,
    ROW_NUMBER() OVER (ORDER BY price DESC) as row_num
FROM products;

This assigns sequential numbers based on price, with the most expensive product getting 1.

Pagination with ROW_NUMBER

ROW_NUMBER excels at pagination, especially in databases that don’t support OFFSET/LIMIT:

WITH numbered_results AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (ORDER BY created_at DESC) as rn
    FROM articles
)
SELECT article_id, title, created_at
FROM numbered_results
WHERE rn BETWEEN 21 AND 30;  -- Page 3, 10 items per page

Finding Top N Per Category

One of the most practical uses is getting the top N records per group:

WITH ranked_products AS (
    SELECT 
        category,
        product_name,
        revenue,
        ROW_NUMBER() OVER (
            PARTITION BY category 
            ORDER BY revenue DESC
        ) as rank_in_category
    FROM product_sales
)
SELECT category, product_name, revenue
FROM ranked_products
WHERE rank_in_category <= 3;

This returns the top 3 revenue-generating products in each category. PARTITION BY resets the numbering for each category, while ORDER BY determines which products get lower numbers.

RANK() and DENSE_RANK(): Handling Ties Properly

When dealing with tied values, ROW_NUMBER arbitrarily assigns different numbers. RANK() and DENSE_RANK() handle ties explicitly but differently.

SELECT 
    student_name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,
    RANK() OVER (ORDER BY score DESC) as rank,
    DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank
FROM exam_results
ORDER BY score DESC;

Sample output:

student_name | score | row_num | rank | dense_rank
-------------|-------|---------|------|------------
Alice        | 95    | 1       | 1    | 1
Bob          | 95    | 2       | 1    | 1
Carol        | 90    | 3       | 3    | 2
David        | 90    | 4       | 3    | 2
Eve          | 85    | 5       | 5    | 3

Notice the difference: RANK() jumps from 1 to 3 (leaving a gap), while DENSE_RANK() goes from 1 to 2 (no gap). RANK() reflects that four students scored higher than Eve, while DENSE_RANK() shows there are three distinct score levels.

Real-World Leaderboard Example

For a sales leaderboard where you want to show actual competitive position:

SELECT 
    salesperson,
    total_sales,
    RANK() OVER (ORDER BY total_sales DESC) as position,
    DENSE_RANK() OVER (ORDER BY total_sales DESC) as tier
FROM monthly_sales
WHERE month = '2024-01'
ORDER BY total_sales DESC;

Use RANK() for position (reflects true competitive standing) and DENSE_RANK() for tier (grouping similar performers).

Top Salaries Per Department

WITH salary_rankings AS (
    SELECT 
        department,
        employee_name,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department 
            ORDER BY salary DESC
        ) as salary_rank
    FROM employees
)
SELECT department, employee_name, salary
FROM salary_rankings
WHERE salary_rank <= 3;

Using DENSE_RANK ensures that if three people tie for the highest salary, you still see the second-highest salary tier.

PARTITION BY: Creating Independent Windows

PARTITION BY divides your result set into groups, and window functions calculate independently within each group. Think of it as creating mini-result sets.

-- Row numbers reset per department
SELECT 
    department,
    employee_name,
    hire_date,
    ROW_NUMBER() OVER (
        PARTITION BY department 
        ORDER BY hire_date
    ) as seniority_rank
FROM employees;

Running Totals Per Category

SELECT 
    transaction_date,
    category,
    amount,
    SUM(amount) OVER (
        PARTITION BY category 
        ORDER BY transaction_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total
FROM transactions
ORDER BY category, transaction_date;

This calculates separate running totals for each category. Without PARTITION BY, you’d get one running total across all categories.

Comparing to Partition Averages

SELECT 
    product_name,
    category,
    price,
    AVG(price) OVER (PARTITION BY category) as category_avg_price,
    price - AVG(price) OVER (PARTITION BY category) as price_vs_avg
FROM products;

Each row shows how its price compares to the average price in its category—useful for identifying outliers or premium products.

ORDER BY Within Window Functions

The ORDER BY clause inside OVER() serves two purposes: it determines ranking order and defines the window frame for accumulating functions.

-- ORDER BY affects which rows get which ranks
SELECT 
    order_date,
    order_total,
    ROW_NUMBER() OVER (ORDER BY order_date) as chronological,
    ROW_NUMBER() OVER (ORDER BY order_total DESC) as by_value
FROM orders;

For accumulating functions, ORDER BY defines the frame:

SELECT 
    sale_date,
    daily_revenue,
    SUM(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as seven_day_total
FROM daily_sales;

Multiple Column Ordering

SELECT 
    region,
    salesperson,
    sales_amount,
    RANK() OVER (
        PARTITION BY region 
        ORDER BY sales_amount DESC, salesperson ASC
    ) as rank
FROM sales_data;

When sales_amount ties, salesperson name breaks the tie alphabetically.

Practical Use Cases

Deduplication: Keeping First or Last Occurrence

WITH numbered_records AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (
            PARTITION BY email 
            ORDER BY created_at DESC
        ) as rn
    FROM user_signups
)
DELETE FROM user_signups
WHERE id IN (
    SELECT id FROM numbered_records WHERE rn > 1
);

This keeps the most recent signup for each email address.

Gap Detection in Sequential Data

WITH numbered_sequence AS (
    SELECT 
        invoice_number,
        ROW_NUMBER() OVER (ORDER BY invoice_number) as rn,
        invoice_number - ROW_NUMBER() OVER (ORDER BY invoice_number) as gap_group
    FROM invoices
)
SELECT 
    MIN(invoice_number) as gap_start,
    MAX(invoice_number) as gap_end
FROM numbered_sequence
GROUP BY gap_group
HAVING COUNT(*) > 1;

Identifies missing invoice numbers in a sequence.

Moving Averages with Partitions

SELECT 
    product_id,
    sale_date,
    daily_units,
    AVG(daily_units) OVER (
        PARTITION BY product_id
        ORDER BY sale_date
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ) as thirty_day_avg
FROM product_sales;

Calculates a 30-day moving average separately for each product.

Combining with LAG/LEAD

SELECT 
    employee_id,
    salary,
    hire_date,
    LAG(salary) OVER (ORDER BY hire_date) as previous_hire_salary,
    RANK() OVER (ORDER BY salary DESC) as salary_rank
FROM employees;

Shows each employee’s rank while also displaying what the previous hire was paid.

Performance Considerations

Window functions can be expensive. Here’s how to optimize them:

Index your ORDER BY and PARTITION BY columns. If you’re partitioning by department and ordering by salary, create an index on (department, salary).

CREATE INDEX idx_emp_dept_salary ON employees(department, salary DESC);

Avoid multiple passes when possible. Instead of separate queries for different window functions, combine them:

-- Good: Single pass
SELECT 
    employee_id,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as rn,
    RANK() OVER (ORDER BY salary DESC) as rank
FROM employees;

-- Avoid: Multiple passes
SELECT employee_id, ROW_NUMBER() OVER (ORDER BY salary DESC) as rn FROM employees;
SELECT employee_id, RANK() OVER (ORDER BY salary DESC) as rank FROM employees;

Use CTEs or subqueries to filter early. Apply WHERE clauses before window functions when possible:

WITH recent_sales AS (
    SELECT * FROM sales WHERE sale_date >= '2024-01-01'
)
SELECT 
    product_id,
    RANK() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM recent_sales;

Consider materialized views for repeated calculations. If you’re constantly ranking the same dataset, materialize it.

Watch for memory usage with large partitions. Some databases load entire partitions into memory. Monitor your execution plans.

Window functions eliminate complex self-joins and correlated subqueries, making your SQL more readable and often more performant. Master ROW_NUMBER for pagination and deduplication, RANK/DENSE_RANK for competitive rankings, and PARTITION BY for per-group calculations. Combined, they solve most analytical SQL challenges you’ll encounter.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.