How to Use DISTINCT ON in PostgreSQL

Think of it as 'group by these columns, but give me the whole row, not aggregates.'

Key Insights

  • DISTINCT ON is a PostgreSQL-specific extension that returns the first row from each group defined by specified columns, making it more efficient than window functions for simple “latest per group” queries
  • The ORDER BY clause must list DISTINCT ON columns first, or PostgreSQL will reject the query—this isn’t optional, it’s a syntax requirement
  • Proper indexing on DISTINCT ON and ORDER BY columns is critical for performance; without it, you’ll hit sequential scans on large tables

Understanding DISTINCT ON

DISTINCT ON is PostgreSQL’s answer to a common problem: getting one row per group based on some ordering criteria. While standard SQL’s DISTINCT removes duplicate rows across all columns, DISTINCT ON lets you specify which columns define uniqueness, then pick the “best” row from each group.

Think of it as “group by these columns, but give me the whole row, not aggregates.”

Here’s the difference in action:

-- Standard DISTINCT: removes duplicate combinations of ALL columns
SELECT DISTINCT customer_id, order_date, amount
FROM orders;

-- DISTINCT ON: one row per customer_id (the first after ordering)
SELECT DISTINCT ON (customer_id) customer_id, order_date, amount
FROM orders
ORDER BY customer_id, order_date DESC;

The first query only removes rows where all three columns match. The second returns one row per customer—specifically, their most recent order.

Basic Syntax and Ordering Requirements

The syntax is straightforward, but PostgreSQL enforces a strict rule:

SELECT DISTINCT ON (expression1, expression2, ...)
    column_list
FROM table_name
ORDER BY expression1 [ASC|DESC], expression2 [ASC|DESC], ...

Critical rule: The ORDER BY clause must begin with the same expressions in DISTINCT ON, in the same order. PostgreSQL needs this to determine which row is “first” in each group.

Let’s see practical examples:

-- Get the most recent order for each customer
SELECT DISTINCT ON (customer_id)
    customer_id,
    order_id,
    order_date,
    total_amount
FROM orders
ORDER BY customer_id, order_date DESC;

This returns one row per customer—their latest order. The ORDER BY customer_id, order_date DESC sorts by customer first (required), then by date descending (giving us the latest).

Here’s another common pattern:

-- Find the highest-paid employee in each department
SELECT DISTINCT ON (department_id)
    department_id,
    employee_id,
    first_name,
    last_name,
    salary
FROM employees
ORDER BY department_id, salary DESC;

Without DISTINCT ON, you’d need a window function or a correlated subquery. This is cleaner and often faster.

Common Use Cases and Patterns

Latest Status Per Entity

This is the killer use case. You have a history table and need current state:

-- Get the most recent status for each user account
SELECT DISTINCT ON (user_id)
    user_id,
    status,
    status_changed_at,
    changed_by
FROM user_status_history
ORDER BY user_id, status_changed_at DESC;

First Event Per Time Period

Deduplication with temporal grouping:

-- Find each user's first login per day
SELECT DISTINCT ON (user_id, DATE(login_timestamp))
    user_id,
    DATE(login_timestamp) as login_date,
    login_timestamp,
    ip_address
FROM user_logins
ORDER BY user_id, DATE(login_timestamp), login_timestamp ASC;

Notice we’re using DISTINCT ON with an expression (DATE(login_timestamp)), not just a column name. This works perfectly.

Current Price Lookup

E-commerce systems often store price history:

-- Get current price for each product
SELECT DISTINCT ON (product_id)
    product_id,
    price,
    effective_from,
    currency
FROM product_price_history
WHERE effective_from <= CURRENT_DATE
ORDER BY product_id, effective_from DESC;

The WHERE clause filters to prices that have taken effect, then we grab the most recent one per product.

Working with Multiple Columns

DISTINCT ON accepts multiple columns, creating groups based on unique combinations:

-- Latest transaction per customer per product category
SELECT DISTINCT ON (customer_id, category_id)
    customer_id,
    category_id,
    transaction_id,
    transaction_date,
    amount
FROM transactions t
JOIN products p ON t.product_id = p.product_id
ORDER BY customer_id, category_id, transaction_date DESC;

This returns one row for each customer-category pair. A customer who bought from three categories gets three rows.

The ordering matters beyond just the DISTINCT ON columns:

-- If transaction_date is the same, use amount as tiebreaker
SELECT DISTINCT ON (customer_id, category_id)
    customer_id,
    category_id,
    transaction_id,
    transaction_date,
    amount
FROM transactions t
JOIN products p ON t.product_id = p.product_id
ORDER BY customer_id, category_id, transaction_date DESC, amount DESC;

Always think about tiebreakers. Without them, you get non-deterministic results.

Performance Considerations

DISTINCT ON performs well with proper indexing. The database needs to:

  1. Sort rows by the ORDER BY clause
  2. Scan through keeping only the first row per group

Create indexes that match your ORDER BY:

-- For our customer orders query
CREATE INDEX idx_orders_customer_date 
ON orders (customer_id, order_date DESC);

-- For status history lookup
CREATE INDEX idx_status_history_user_time
ON user_status_history (user_id, status_changed_at DESC);

Let’s compare performance with a window function alternative:

-- DISTINCT ON approach
EXPLAIN ANALYZE
SELECT DISTINCT ON (customer_id)
    customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;

-- Window function approach
EXPLAIN ANALYZE
SELECT customer_id, order_id, order_date, total_amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as rn
    FROM orders
) sub
WHERE rn = 1;

On my test database with 1M orders and proper indexing:

  • DISTINCT ON: 145ms, Index Scan
  • Window function: 312ms, Sequential Scan + Sort

DISTINCT ON can use an index directly. Window functions typically require scanning all rows, computing row numbers, then filtering. For simple “one per group” queries, DISTINCT ON wins.

However, if you need the top 3 per group, window functions become necessary:

-- Top 3 orders per customer: window functions required
SELECT customer_id, order_id, order_date, total_amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as rn
    FROM orders
) sub
WHERE rn <= 3;

Alternatives and When to Use Each

DISTINCT ON vs ROW_NUMBER()

Side-by-side comparison for the same result:

-- DISTINCT ON: cleaner, faster for "first per group"
SELECT DISTINCT ON (department_id)
    department_id, employee_id, salary
FROM employees
ORDER BY department_id, salary DESC;

-- ROW_NUMBER(): more portable, required for top-N
SELECT department_id, employee_id, salary
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as rn
    FROM employees
) sub
WHERE rn = 1;

Use DISTINCT ON when:

  • You need exactly one row per group
  • Performance matters
  • You’re committed to PostgreSQL

Use window functions when:

  • You need top-N per group (N > 1)
  • You need multiple ranking methods (RANK, DENSE_RANK)
  • You need database portability

Lateral Joins

For complex filtering per group:

-- Get latest order with amount > 100 per customer
SELECT c.customer_id, c.name, o.order_id, o.order_date, o.total_amount
FROM customers c
CROSS JOIN LATERAL (
    SELECT order_id, order_date, total_amount
    FROM orders
    WHERE customer_id = c.customer_id
      AND total_amount > 100
    ORDER BY order_date DESC
    LIMIT 1
) o;

This is more flexible than DISTINCT ON for complex per-group logic.

Common Pitfalls and Best Practices

Incomplete ORDER BY Causes Non-Deterministic Results

This is wrong:

-- BAD: If multiple orders have the same date, which one is returned?
SELECT DISTINCT ON (customer_id)
    customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;

If two orders share the same customer and date, PostgreSQL picks arbitrarily. Fix it:

-- GOOD: Tiebreaker ensures deterministic results
SELECT DISTINCT ON (customer_id)
    customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC, order_id DESC;

Always include a unique column (like primary key) at the end of ORDER BY.

Forgetting the ORDER BY Requirement

This fails:

-- ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
SELECT DISTINCT ON (customer_id)
    customer_id, order_id, order_date
FROM orders
ORDER BY order_date DESC;

The ORDER BY must start with customer_id.

Portability Concerns

DISTINCT ON is PostgreSQL-only. MySQL, SQL Server, and Oracle don’t support it. If you might migrate databases, use window functions instead. But if you’re in the PostgreSQL ecosystem, use the tool designed for the job.

The bottom line: DISTINCT ON is a powerful, efficient tool for “first per group” queries. Master the ordering requirements, index appropriately, and you’ll write cleaner, faster queries than the alternatives.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.