How to Use ROW_NUMBER in PostgreSQL

Window functions in PostgreSQL perform calculations across sets of rows related to the current row, without collapsing the result set like aggregate functions do. ROW_NUMBER() is one of the most...

Key Insights

  • ROW_NUMBER() assigns unique sequential integers to rows within a result set, making it essential for pagination, deduplication, and ranking tasks in PostgreSQL
  • The PARTITION BY clause resets row numbering for each group, enabling powerful per-category operations like finding top N records within each department or region
  • Unlike RANK() and DENSE_RANK(), ROW_NUMBER() always assigns unique numbers even when values are identical, which is critical for eliminating duplicates and ensuring deterministic ordering

Introduction to ROW_NUMBER()

Window functions in PostgreSQL perform calculations across sets of rows related to the current row, without collapsing the result set like aggregate functions do. ROW_NUMBER() is one of the most frequently used window functions, assigning a unique sequential integer to each row within a partition of a result set.

The basic syntax is straightforward:

ROW_NUMBER() OVER (
    [PARTITION BY partition_expression]
    ORDER BY sort_expression
)

The OVER() clause is mandatory and defines how rows are organized before numbering. You’ll use ROW_NUMBER() when you need to assign unique identifiers to rows, paginate results, remove duplicates, or find top N records per group.

Here’s a simple example using an employees table:

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(50),
    salary NUMERIC(10, 2),
    hire_date DATE
);

INSERT INTO employees (name, department, salary, hire_date) VALUES
    ('Alice Johnson', 'Engineering', 95000, '2020-01-15'),
    ('Bob Smith', 'Engineering', 87000, '2021-03-22'),
    ('Carol White', 'Sales', 78000, '2019-07-10'),
    ('David Brown', 'Sales', 82000, '2020-11-05'),
    ('Eve Davis', 'Marketing', 72000, '2021-06-18');

SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
FROM employees;

This query assigns sequential numbers based on salary, from highest to lowest.

Basic ROW_NUMBER() Usage

The ORDER BY clause within the OVER() specification determines the sequence of row numbering. This is separate from any ORDER BY clause applied to the final result set.

SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as salary_rank
FROM employees
ORDER BY salary_rank;

Output:

name           | department   | salary  | salary_rank
---------------|--------------|---------|------------
Alice Johnson  | Engineering  | 95000   | 1
Bob Smith      | Engineering  | 87000   | 2
David Brown    | Sales        | 82000   | 3
Carol White    | Sales        | 78000   | 4
Eve Davis      | Marketing    | 72000   | 5

The OVER() clause is where the magic happens. It creates a “window” of rows over which the function operates. Without PARTITION BY, the entire result set is treated as one partition, so ROW_NUMBER() counts from 1 to the total number of rows.

You can order by multiple columns:

SELECT 
    name,
    department,
    hire_date,
    ROW_NUMBER() OVER (ORDER BY hire_date, name) as hire_sequence
FROM employees;

This numbers employees by hire date, using name as a tiebreaker.

Partitioning with PARTITION BY

PARTITION BY divides your result set into groups, and ROW_NUMBER() restarts counting at 1 for each group. This is incredibly useful for per-category analysis.

SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department 
        ORDER BY salary DESC
    ) as dept_rank
FROM employees
ORDER BY department, dept_rank;

Output:

name           | department   | salary  | dept_rank
---------------|--------------|---------|----------
Alice Johnson  | Engineering  | 95000   | 1
Bob Smith      | Engineering  | 87000   | 2
Eve Davis      | Marketing    | 72000   | 1
David Brown    | Sales        | 82000   | 1
Carol White    | Sales        | 78000   | 2

Notice how dept_rank resets to 1 for each department. This pattern is essential for finding top performers per department, most recent orders per customer, or latest records per category.

Practical Applications

Deduplicating Data

ROW_NUMBER() excels at identifying and removing duplicate records. Suppose you have duplicate employee entries:

-- Create table with duplicates
CREATE TABLE employees_dupes AS
SELECT * FROM employees
UNION ALL
SELECT * FROM employees WHERE department = 'Sales';

-- Identify duplicates
SELECT 
    *,
    ROW_NUMBER() OVER (
        PARTITION BY name, department, salary 
        ORDER BY id
    ) as row_num
FROM employees_dupes;

-- Remove duplicates using CTE
WITH ranked_employees AS (
    SELECT 
        id,
        ROW_NUMBER() OVER (
            PARTITION BY name, department, salary 
            ORDER BY id
        ) as row_num
    FROM employees_dupes
)
DELETE FROM employees_dupes
WHERE id IN (
    SELECT id 
    FROM ranked_employees 
    WHERE row_num > 1
);

This approach keeps the first occurrence (lowest id) and deletes subsequent duplicates.

Implementing Pagination

ROW_NUMBER() provides an alternative to OFFSET/LIMIT that can perform better in certain scenarios:

-- Page 2, with 2 records per page
WITH numbered_employees AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (ORDER BY name) as row_num
    FROM employees
)
SELECT name, department, salary
FROM numbered_employees
WHERE row_num BETWEEN 3 AND 4;

This is particularly useful when you need to know the total count alongside paginated results:

WITH numbered_employees AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (ORDER BY name) as row_num,
        COUNT(*) OVER () as total_count
    FROM employees
)
SELECT name, department, salary, row_num, total_count
FROM numbered_employees
WHERE row_num BETWEEN 3 AND 4;

Finding Top N Records Per Group

Getting the top 3 highest-paid employees per department:

WITH ranked_employees AS (
    SELECT 
        name,
        department,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY department 
            ORDER BY salary DESC
        ) as rank
    FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE rank <= 3;

ROW_NUMBER() vs RANK() vs DENSE_RANK()

These three window functions behave differently when handling ties:

CREATE TABLE scores (
    player VARCHAR(50),
    score INTEGER
);

INSERT INTO scores VALUES
    ('Alice', 100),
    ('Bob', 95),
    ('Carol', 95),
    ('David', 90),
    ('Eve', 85);

SELECT 
    player,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,
    RANK() OVER (ORDER BY score DESC) as rank,
    DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank
FROM scores;

Output:

player | score | row_num | rank | dense_rank
-------|-------|---------|------|------------
Alice  | 100   | 1       | 1    | 1
Bob    | 95    | 2       | 2    | 2
Carol  | 95    | 3       | 2    | 2
David  | 90    | 4       | 4    | 3
Eve    | 85    | 5       | 5    | 4

ROW_NUMBER() always assigns unique numbers, even for ties. Bob and Carol both get different numbers despite having the same score.

RANK() assigns the same number to ties but skips subsequent numbers. Both Bob and Carol get rank 2, then David gets rank 4 (skipping 3).

DENSE_RANK() assigns the same number to ties without skipping. Bob and Carol get rank 2, and David gets rank 3.

Use ROW_NUMBER() when you need unique identifiers regardless of ties, RANK() for traditional competition-style rankings, and DENSE_RANK() when you want consecutive rank numbers without gaps.

Performance Considerations

ROW_NUMBER() requires sorting data, which can be expensive on large datasets. PostgreSQL must materialize and sort the entire partition before assigning numbers.

To optimize performance:

Use indexes on ORDER BY columns:

CREATE INDEX idx_employees_salary ON employees(salary DESC);
CREATE INDEX idx_employees_dept_salary ON employees(department, salary DESC);

Limit the window size when possible:

-- Instead of numbering all rows then filtering
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) as rn
    FROM employees
)
SELECT * FROM numbered WHERE rn <= 10;

-- Consider limiting the input first if you have WHERE conditions
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY salary DESC) as rn
    FROM employees
    WHERE department = 'Engineering'  -- Filter before windowing
)
SELECT * FROM numbered WHERE rn <= 10;

Use subqueries strategically:

-- Efficient: Filter first, then apply window function
SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as rank
FROM (
    SELECT name, salary
    FROM employees
    WHERE hire_date > '2020-01-01'
) recent_hires;

For very large datasets, monitor query plans with EXPLAIN ANALYZE to ensure PostgreSQL uses indexes effectively and doesn’t perform unnecessary sorts.

Conclusion

ROW_NUMBER() is an indispensable tool in PostgreSQL for assigning sequential numbers to rows. Master these key use cases:

  • Pagination: Alternative to OFFSET/LIMIT with better performance characteristics in some scenarios
  • Deduplication: Identify and remove duplicate records efficiently
  • Top N per group: Find highest/lowest values within categories using PARTITION BY
  • Unique identifiers: Assign sequential IDs to query results

Quick reference syntax:

ROW_NUMBER() OVER (
    [PARTITION BY column1, column2, ...]
    ORDER BY column3 [ASC|DESC], ...
)

Remember that ROW_NUMBER() always produces unique values, making it distinct from RANK() and DENSE_RANK(). Choose the right function based on whether you need to handle ties and whether gaps in numbering matter for your use case.

The combination of PARTITION BY and ORDER BY gives you fine-grained control over how rows are numbered, enabling sophisticated analytical queries that would be cumbersome or impossible with traditional GROUP BY aggregations.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.