How to Use DISTINCT ON in PostgreSQL
Think of it as 'group by these columns, but give me the whole row, not aggregates.'
Key Insights
DISTINCT ONis a PostgreSQL-specific extension that returns the first row from each group defined by specified columns, making it more efficient than window functions for simple “latest per group” queries- The
ORDER BYclause must listDISTINCT ONcolumns first, or PostgreSQL will reject the query—this isn’t optional, it’s a syntax requirement - Proper indexing on
DISTINCT ONandORDER BYcolumns is critical for performance; without it, you’ll hit sequential scans on large tables
Understanding DISTINCT ON
DISTINCT ON is PostgreSQL’s answer to a common problem: getting one row per group based on some ordering criteria. While standard SQL’s DISTINCT removes duplicate rows across all columns, DISTINCT ON lets you specify which columns define uniqueness, then pick the “best” row from each group.
Think of it as “group by these columns, but give me the whole row, not aggregates.”
Here’s the difference in action:
-- Standard DISTINCT: removes duplicate combinations of ALL columns
SELECT DISTINCT customer_id, order_date, amount
FROM orders;
-- DISTINCT ON: one row per customer_id (the first after ordering)
SELECT DISTINCT ON (customer_id) customer_id, order_date, amount
FROM orders
ORDER BY customer_id, order_date DESC;
The first query only removes rows where all three columns match. The second returns one row per customer—specifically, their most recent order.
Basic Syntax and Ordering Requirements
The syntax is straightforward, but PostgreSQL enforces a strict rule:
SELECT DISTINCT ON (expression1, expression2, ...)
column_list
FROM table_name
ORDER BY expression1 [ASC|DESC], expression2 [ASC|DESC], ...
Critical rule: The ORDER BY clause must begin with the same expressions in DISTINCT ON, in the same order. PostgreSQL needs this to determine which row is “first” in each group.
Let’s see practical examples:
-- Get the most recent order for each customer
SELECT DISTINCT ON (customer_id)
customer_id,
order_id,
order_date,
total_amount
FROM orders
ORDER BY customer_id, order_date DESC;
This returns one row per customer—their latest order. The ORDER BY customer_id, order_date DESC sorts by customer first (required), then by date descending (giving us the latest).
Here’s another common pattern:
-- Find the highest-paid employee in each department
SELECT DISTINCT ON (department_id)
department_id,
employee_id,
first_name,
last_name,
salary
FROM employees
ORDER BY department_id, salary DESC;
Without DISTINCT ON, you’d need a window function or a correlated subquery. This is cleaner and often faster.
Common Use Cases and Patterns
Latest Status Per Entity
This is the killer use case. You have a history table and need current state:
-- Get the most recent status for each user account
SELECT DISTINCT ON (user_id)
user_id,
status,
status_changed_at,
changed_by
FROM user_status_history
ORDER BY user_id, status_changed_at DESC;
First Event Per Time Period
Deduplication with temporal grouping:
-- Find each user's first login per day
SELECT DISTINCT ON (user_id, DATE(login_timestamp))
user_id,
DATE(login_timestamp) as login_date,
login_timestamp,
ip_address
FROM user_logins
ORDER BY user_id, DATE(login_timestamp), login_timestamp ASC;
Notice we’re using DISTINCT ON with an expression (DATE(login_timestamp)), not just a column name. This works perfectly.
Current Price Lookup
E-commerce systems often store price history:
-- Get current price for each product
SELECT DISTINCT ON (product_id)
product_id,
price,
effective_from,
currency
FROM product_price_history
WHERE effective_from <= CURRENT_DATE
ORDER BY product_id, effective_from DESC;
The WHERE clause filters to prices that have taken effect, then we grab the most recent one per product.
Working with Multiple Columns
DISTINCT ON accepts multiple columns, creating groups based on unique combinations:
-- Latest transaction per customer per product category
SELECT DISTINCT ON (customer_id, category_id)
customer_id,
category_id,
transaction_id,
transaction_date,
amount
FROM transactions t
JOIN products p ON t.product_id = p.product_id
ORDER BY customer_id, category_id, transaction_date DESC;
This returns one row for each customer-category pair. A customer who bought from three categories gets three rows.
The ordering matters beyond just the DISTINCT ON columns:
-- If transaction_date is the same, use amount as tiebreaker
SELECT DISTINCT ON (customer_id, category_id)
customer_id,
category_id,
transaction_id,
transaction_date,
amount
FROM transactions t
JOIN products p ON t.product_id = p.product_id
ORDER BY customer_id, category_id, transaction_date DESC, amount DESC;
Always think about tiebreakers. Without them, you get non-deterministic results.
Performance Considerations
DISTINCT ON performs well with proper indexing. The database needs to:
- Sort rows by the
ORDER BYclause - Scan through keeping only the first row per group
Create indexes that match your ORDER BY:
-- For our customer orders query
CREATE INDEX idx_orders_customer_date
ON orders (customer_id, order_date DESC);
-- For status history lookup
CREATE INDEX idx_status_history_user_time
ON user_status_history (user_id, status_changed_at DESC);
Let’s compare performance with a window function alternative:
-- DISTINCT ON approach
EXPLAIN ANALYZE
SELECT DISTINCT ON (customer_id)
customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;
-- Window function approach
EXPLAIN ANALYZE
SELECT customer_id, order_id, order_date, total_amount
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as rn
FROM orders
) sub
WHERE rn = 1;
On my test database with 1M orders and proper indexing:
DISTINCT ON: 145ms, Index Scan- Window function: 312ms, Sequential Scan + Sort
DISTINCT ON can use an index directly. Window functions typically require scanning all rows, computing row numbers, then filtering. For simple “one per group” queries, DISTINCT ON wins.
However, if you need the top 3 per group, window functions become necessary:
-- Top 3 orders per customer: window functions required
SELECT customer_id, order_id, order_date, total_amount
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) as rn
FROM orders
) sub
WHERE rn <= 3;
Alternatives and When to Use Each
DISTINCT ON vs ROW_NUMBER()
Side-by-side comparison for the same result:
-- DISTINCT ON: cleaner, faster for "first per group"
SELECT DISTINCT ON (department_id)
department_id, employee_id, salary
FROM employees
ORDER BY department_id, salary DESC;
-- ROW_NUMBER(): more portable, required for top-N
SELECT department_id, employee_id, salary
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as rn
FROM employees
) sub
WHERE rn = 1;
Use DISTINCT ON when:
- You need exactly one row per group
- Performance matters
- You’re committed to PostgreSQL
Use window functions when:
- You need top-N per group (N > 1)
- You need multiple ranking methods (RANK, DENSE_RANK)
- You need database portability
Lateral Joins
For complex filtering per group:
-- Get latest order with amount > 100 per customer
SELECT c.customer_id, c.name, o.order_id, o.order_date, o.total_amount
FROM customers c
CROSS JOIN LATERAL (
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = c.customer_id
AND total_amount > 100
ORDER BY order_date DESC
LIMIT 1
) o;
This is more flexible than DISTINCT ON for complex per-group logic.
Common Pitfalls and Best Practices
Incomplete ORDER BY Causes Non-Deterministic Results
This is wrong:
-- BAD: If multiple orders have the same date, which one is returned?
SELECT DISTINCT ON (customer_id)
customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC;
If two orders share the same customer and date, PostgreSQL picks arbitrarily. Fix it:
-- GOOD: Tiebreaker ensures deterministic results
SELECT DISTINCT ON (customer_id)
customer_id, order_id, order_date, total_amount
FROM orders
ORDER BY customer_id, order_date DESC, order_id DESC;
Always include a unique column (like primary key) at the end of ORDER BY.
Forgetting the ORDER BY Requirement
This fails:
-- ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
SELECT DISTINCT ON (customer_id)
customer_id, order_id, order_date
FROM orders
ORDER BY order_date DESC;
The ORDER BY must start with customer_id.
Portability Concerns
DISTINCT ON is PostgreSQL-only. MySQL, SQL Server, and Oracle don’t support it. If you might migrate databases, use window functions instead. But if you’re in the PostgreSQL ecosystem, use the tool designed for the job.
The bottom line: DISTINCT ON is a powerful, efficient tool for “first per group” queries. Master the ordering requirements, index appropriately, and you’ll write cleaner, faster queries than the alternatives.