How to Use PARTITION BY in PostgreSQL

Key Insights

PARTITION BY divides query results into groups for window functions while preserving all rows, unlike GROUP BY which collapses rows into aggregated results
Window functions with PARTITION BY enable complex analytics like running totals, rankings within categories, and comparative analysis without self-joins or subqueries
Proper use of ORDER BY within the OVER clause and frame specifications (ROWS/RANGE) is critical for accurate calculations like moving averages and cumulative sums

Understanding Window Functions and PARTITION BY

Window functions perform calculations across sets of rows related to the current row, but unlike aggregate functions with GROUP BY, they don’t collapse your result set. This distinction is crucial for understanding when and why to use PARTITION BY.

Consider a sales table where you want to see each sale alongside the total sales for that product category. With GROUP BY, you lose individual sale details:

-- GROUP BY collapses rows
SELECT category, SUM(amount) as total_sales
FROM sales
GROUP BY category;

-- Returns only 3 rows for 3 categories

With PARTITION BY, you maintain all rows while adding aggregate context:

-- PARTITION BY preserves all rows
SELECT 
    sale_id,
    category,
    amount,
    SUM(amount) OVER (PARTITION BY category) as category_total
FROM sales;

-- Returns all sales rows, each with its category total

This capability makes PARTITION BY invaluable for analytical queries where you need both detail and aggregate information simultaneously.

Basic PARTITION BY Syntax

The fundamental syntax follows this pattern: function() OVER (PARTITION BY column_name). The OVER clause defines the window, and PARTITION BY specifies how to divide that window into groups.

Here’s how to number rows within each department:

SELECT 
    employee_id,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

This assigns sequential numbers to employees within each department, ordered by salary. The first employee is 1, the second is 2, regardless of which department they’re in.

For running totals per department:

SELECT 
    transaction_date,
    department,
    amount,
    SUM(amount) OVER (
        PARTITION BY department 
        ORDER BY transaction_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total
FROM transactions
ORDER BY department, transaction_date;

The running total resets for each department and accumulates as you progress through dates.

Practical Use Cases

Let’s explore scenarios you’ll encounter in real applications.

Finding Top N Records Per Group

Identify your top 3 salespersons in each region:

WITH ranked_sales AS (
    SELECT 
        salesperson_id,
        name,
        region,
        total_sales,
        RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) as rank
    FROM salesperson_performance
)
SELECT *
FROM ranked_sales
WHERE rank <= 3
ORDER BY region, rank;

Calculating Percentage of Category Total

Determine what percentage each product contributes to its category:

SELECT 
    product_name,
    category,
    revenue,
    SUM(revenue) OVER (PARTITION BY category) as category_revenue,
    ROUND(100.0 * revenue / SUM(revenue) OVER (PARTITION BY category), 2) as pct_of_category
FROM product_sales
ORDER BY category, revenue DESC;

Analyzing First and Last Values

Track customer ordering patterns:

SELECT DISTINCT
    customer_id,
    FIRST_VALUE(order_date) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
    ) as first_order,
    LAST_VALUE(order_date) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as last_order,
    COUNT(*) OVER (PARTITION BY customer_id) as total_orders
FROM orders;

Note the frame specification for LAST_VALUE—without it, you’d get unexpected results because the default frame ends at the current row.

Combining PARTITION BY with ORDER BY

ORDER BY within the OVER clause fundamentally changes window function behavior. It defines the sequence for calculations and determines which rows are included in frame-based operations.

Calculate a 3-month moving average of sales:

SELECT 
    month,
    department,
    monthly_sales,
    AVG(monthly_sales) OVER (
        PARTITION BY department 
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as moving_avg_3_months
FROM monthly_department_sales
ORDER BY department, month;

The frame specification ROWS BETWEEN 2 PRECEDING AND CURRENT ROW includes the current row plus the two previous rows, giving you a true 3-period moving average.

Compare different frame specifications:

SELECT 
    order_date,
    customer_id,
    order_amount,
    -- Total of all previous orders for this customer
    SUM(order_amount) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) as previous_total,
    -- Total including current order
    SUM(order_amount) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cumulative_total
FROM orders;

Multiple Partitions and Complex Queries

You can partition by multiple columns for hierarchical grouping:

SELECT 
    employee_id,
    region,
    department,
    salary,
    -- Rank within department across all regions
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank,
    -- Rank within region and department
    RANK() OVER (PARTITION BY region, department ORDER BY salary DESC) as regional_dept_rank,
    -- Average salary for this region and department
    AVG(salary) OVER (PARTITION BY region, department) as avg_dept_salary,
    -- Difference from department average
    salary - AVG(salary) OVER (PARTITION BY region, department) as diff_from_avg
FROM employees
ORDER BY region, department, salary DESC;

This query provides multiple analytical perspectives in a single pass, something that would require multiple subqueries or CTEs without window functions.

Performance Considerations and Best Practices

Window functions can be expensive. PostgreSQL must sort and potentially materialize data for each window operation. Here’s how to optimize:

Index Strategy

Create indexes that support your PARTITION BY and ORDER BY columns:

CREATE INDEX idx_sales_category_date ON sales(category, sale_date);

This index helps PostgreSQL efficiently process:

SELECT 
    sale_date,
    category,
    amount,
    SUM(amount) OVER (PARTITION BY category ORDER BY sale_date) as running_total
FROM sales;

When to Use PARTITION BY vs. Alternatives

Window functions aren’t always the answer. For simple aggregates where you don’t need row-level detail, GROUP BY is faster:

-- Faster for just totals
SELECT category, SUM(amount)
FROM sales
GROUP BY category;

-- Use window functions when you need detail rows
SELECT *, SUM(amount) OVER (PARTITION BY category)
FROM sales;

Reuse Window Definitions

Use the WINDOW clause to avoid repetition and improve maintainability:

SELECT 
    product_id,
    category,
    sales,
    AVG(sales) OVER w as avg_sales,
    MAX(sales) OVER w as max_sales,
    MIN(sales) OVER w as min_sales
FROM product_sales
WINDOW w AS (PARTITION BY category);

Common Pitfalls and Troubleshooting

Confusing Window Partitioning with Table Partitioning

PARTITION BY in window functions has nothing to do with PostgreSQL’s table partitioning feature. They’re completely separate concepts with unfortunate naming overlap.

NULL Handling

NULLs form their own partition group:

-- NULLs in category will be grouped together
SELECT 
    product_id,
    category,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY product_id) as rn
FROM products;

Handle NULLs explicitly if needed:

SELECT 
    product_id,
    COALESCE(category, 'UNCATEGORIZED') as category,
    ROW_NUMBER() OVER (PARTITION BY COALESCE(category, 'UNCATEGORIZED') ORDER BY product_id) as rn
FROM products;

Frame Specification Errors

A common mistake with LAST_VALUE:

-- WRONG: Only looks at current row
SELECT 
    customer_id,
    order_date,
    LAST_VALUE(order_date) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
    ) as last_order
FROM orders;

-- CORRECT: Extends frame to end of partition
SELECT 
    customer_id,
    order_date,
    LAST_VALUE(order_date) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as last_order
FROM orders;

The default frame for window functions with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which rarely gives you what you want for LAST_VALUE.

Master PARTITION BY and you’ll write cleaner, more efficient analytical queries. It eliminates complex self-joins and correlated subqueries while making your intent explicit. Start with simple examples, understand how ORDER BY and frame specifications affect results, and gradually build to more complex analytics.