SQL - PARTITION BY Clause
The PARTITION BY clause defines logical boundaries within a result set for window functions. Unlike GROUP BY, which collapses rows into aggregate summaries, PARTITION BY maintains all original rows...
Key Insights
- PARTITION BY divides result sets into logical groups for window function calculations without collapsing rows like GROUP BY does, enabling row-level analysis alongside aggregate metrics
- Window functions with PARTITION BY execute calculations across partitions independently, making them essential for ranking, running totals, and comparative analytics within categories
- Performance optimization requires strategic indexing on partition columns and understanding execution plans, as poorly designed partitioned queries can cause significant performance degradation
Understanding PARTITION BY Fundamentals
The PARTITION BY clause defines logical boundaries within a result set for window functions. Unlike GROUP BY, which collapses rows into aggregate summaries, PARTITION BY maintains all original rows while computing calculations across defined segments.
-- Traditional GROUP BY - collapses rows
SELECT
department,
AVG(salary) as avg_salary
FROM employees
GROUP BY department;
-- PARTITION BY - preserves all rows
SELECT
employee_id,
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary
FROM employees;
The second query returns every employee with their individual salary alongside their department’s average, enabling direct comparison without joining back to an aggregated subquery.
Ranking Within Partitions
Ranking functions become powerful when combined with PARTITION BY, allowing independent ranking within each partition.
SELECT
product_name,
category,
revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank_in_category,
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) as rank_with_ties,
DENSE_RANK() OVER (PARTITION BY category ORDER BY revenue DESC) as dense_rank
FROM products;
Differences between ranking functions:
ROW_NUMBER(): Sequential numbering, no duplicates (1,2,3,4)RANK(): Ties receive same rank, gaps follow (1,2,2,4)DENSE_RANK(): Ties receive same rank, no gaps (1,2,2,3)
To retrieve only top performers per category:
WITH ranked_products AS (
SELECT
product_name,
category,
revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rn
FROM products
)
SELECT product_name, category, revenue
FROM ranked_products
WHERE rn <= 3;
Running Totals and Moving Calculations
PARTITION BY enables cumulative calculations that reset at partition boundaries.
SELECT
order_date,
customer_id,
order_amount,
SUM(order_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total,
AVG(order_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3_orders
FROM orders;
The frame clause (ROWS BETWEEN) defines which rows within the partition to include:
UNBOUNDED PRECEDING: From partition startCURRENT ROW: Current rown PRECEDING/FOLLOWING: n rows before/afterUNBOUNDED FOLLOWING: To partition end
Comparative Analytics
Access adjacent rows within partitions using LAG and LEAD functions.
SELECT
stock_symbol,
trade_date,
closing_price,
LAG(closing_price, 1) OVER (
PARTITION BY stock_symbol
ORDER BY trade_date
) as previous_close,
LEAD(closing_price, 1) OVER (
PARTITION BY stock_symbol
ORDER BY trade_date
) as next_close,
closing_price - LAG(closing_price, 1) OVER (
PARTITION BY stock_symbol
ORDER BY trade_date
) as daily_change
FROM stock_prices;
Calculate percentage changes with NULL handling:
SELECT
stock_symbol,
trade_date,
closing_price,
ROUND(
100.0 * (closing_price - LAG(closing_price) OVER (
PARTITION BY stock_symbol ORDER BY trade_date
)) / NULLIF(LAG(closing_price) OVER (
PARTITION BY stock_symbol ORDER BY trade_date
), 0),
2
) as pct_change
FROM stock_prices;
Multiple Partition Strategies
Combine multiple partition columns for granular analysis.
SELECT
sale_date,
region,
product_category,
sales_amount,
-- Regional totals
SUM(sales_amount) OVER (
PARTITION BY region
) as region_total,
-- Category totals within region
SUM(sales_amount) OVER (
PARTITION BY region, product_category
) as region_category_total,
-- Contribution percentage
ROUND(
100.0 * sales_amount / SUM(sales_amount) OVER (
PARTITION BY region, product_category
),
2
) as pct_of_region_category
FROM sales;
Filtering Window Function Results
Window functions execute after WHERE but before ORDER BY in query processing. Use CTEs or subqueries to filter on window function results.
-- Find employees earning above their department average
WITH employee_comparison AS (
SELECT
employee_id,
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees
)
SELECT *
FROM employee_comparison
WHERE salary > dept_avg;
Identify gaps in sequences within partitions:
WITH numbered_records AS (
SELECT
account_id,
transaction_date,
ROW_NUMBER() OVER (
PARTITION BY account_id
ORDER BY transaction_date
) as rn,
LAG(transaction_date) OVER (
PARTITION BY account_id
ORDER BY transaction_date
) as prev_date
FROM transactions
)
SELECT
account_id,
prev_date,
transaction_date,
transaction_date - prev_date as days_gap
FROM numbered_records
WHERE transaction_date - prev_date > 30;
Performance Considerations
Index partition columns and ORDER BY columns together for optimal performance.
-- Efficient index for partitioned window functions
CREATE INDEX idx_sales_region_date
ON sales(region, sale_date);
-- Supports this query efficiently
SELECT
region,
sale_date,
sales_amount,
SUM(sales_amount) OVER (
PARTITION BY region
ORDER BY sale_date
) as running_total
FROM sales;
Check execution plans to identify expensive operations:
EXPLAIN ANALYZE
SELECT
customer_id,
order_date,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
FROM orders;
Look for “WindowAgg” or “Window” nodes in the plan. Excessive sorting or memory spills indicate indexing opportunities.
Materialize complex window calculations when reused:
-- Instead of recalculating window functions multiple times
CREATE TEMPORARY TABLE customer_metrics AS
SELECT
customer_id,
order_date,
order_amount,
AVG(order_amount) OVER (PARTITION BY customer_id) as avg_order,
COUNT(*) OVER (PARTITION BY customer_id) as total_orders
FROM orders;
-- Query the materialized results
SELECT *
FROM customer_metrics
WHERE order_amount > avg_order * 1.5;
Practical Applications
Cohort analysis for user retention:
SELECT
user_id,
signup_date,
activity_date,
DATE_PART('day', activity_date - signup_date) as days_since_signup,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY activity_date
) as activity_sequence,
FIRST_VALUE(activity_date) OVER (
PARTITION BY user_id
ORDER BY activity_date
) as first_activity
FROM user_activities;
Deduplication with partition-based ranking:
WITH ranked_records AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC
) as rn
FROM user_registrations
)
DELETE FROM user_registrations
WHERE id IN (
SELECT id
FROM ranked_records
WHERE rn > 1
);
PARTITION BY transforms complex analytical queries into readable, performant SQL by eliminating self-joins and correlated subqueries. Master window functions with proper partitioning strategies to unlock advanced data analysis capabilities while maintaining query performance.