SQL - PARTITION BY Clause | Application Architect

Key Insights

PARTITION BY divides result sets into logical groups for window function calculations without collapsing rows like GROUP BY does, enabling row-level analysis alongside aggregate metrics
Window functions with PARTITION BY execute calculations across partitions independently, making them essential for ranking, running totals, and comparative analytics within categories
Performance optimization requires strategic indexing on partition columns and understanding execution plans, as poorly designed partitioned queries can cause significant performance degradation

Understanding PARTITION BY Fundamentals

The PARTITION BY clause defines logical boundaries within a result set for window functions. Unlike GROUP BY, which collapses rows into aggregate summaries, PARTITION BY maintains all original rows while computing calculations across defined segments.

-- Traditional GROUP BY - collapses rows
SELECT 
    department,
    AVG(salary) as avg_salary
FROM employees
GROUP BY department;

-- PARTITION BY - preserves all rows
SELECT 
    employee_id,
    name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg_salary
FROM employees;

The second query returns every employee with their individual salary alongside their department’s average, enabling direct comparison without joining back to an aggregated subquery.

Ranking Within Partitions

Ranking functions become powerful when combined with PARTITION BY, allowing independent ranking within each partition.

SELECT 
    product_name,
    category,
    revenue,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rank_in_category,
    RANK() OVER (PARTITION BY category ORDER BY revenue DESC) as rank_with_ties,
    DENSE_RANK() OVER (PARTITION BY category ORDER BY revenue DESC) as dense_rank
FROM products;

Differences between ranking functions:

ROW_NUMBER(): Sequential numbering, no duplicates (1,2,3,4)
RANK(): Ties receive same rank, gaps follow (1,2,2,4)
DENSE_RANK(): Ties receive same rank, no gaps (1,2,2,3)

To retrieve only top performers per category:

WITH ranked_products AS (
    SELECT 
        product_name,
        category,
        revenue,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) as rn
    FROM products
)
SELECT product_name, category, revenue
FROM ranked_products
WHERE rn <= 3;

Running Totals and Moving Calculations

PARTITION BY enables cumulative calculations that reset at partition boundaries.

SELECT 
    order_date,
    customer_id,
    order_amount,
    SUM(order_amount) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total,
    AVG(order_amount) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as moving_avg_3_orders
FROM orders;

The frame clause (ROWS BETWEEN) defines which rows within the partition to include:

UNBOUNDED PRECEDING: From partition start
CURRENT ROW: Current row
n PRECEDING/FOLLOWING: n rows before/after
UNBOUNDED FOLLOWING: To partition end

Comparative Analytics

Access adjacent rows within partitions using LAG and LEAD functions.

SELECT 
    stock_symbol,
    trade_date,
    closing_price,
    LAG(closing_price, 1) OVER (
        PARTITION BY stock_symbol 
        ORDER BY trade_date
    ) as previous_close,
    LEAD(closing_price, 1) OVER (
        PARTITION BY stock_symbol 
        ORDER BY trade_date
    ) as next_close,
    closing_price - LAG(closing_price, 1) OVER (
        PARTITION BY stock_symbol 
        ORDER BY trade_date
    ) as daily_change
FROM stock_prices;

Calculate percentage changes with NULL handling:

SELECT 
    stock_symbol,
    trade_date,
    closing_price,
    ROUND(
        100.0 * (closing_price - LAG(closing_price) OVER (
            PARTITION BY stock_symbol ORDER BY trade_date
        )) / NULLIF(LAG(closing_price) OVER (
            PARTITION BY stock_symbol ORDER BY trade_date
        ), 0),
        2
    ) as pct_change
FROM stock_prices;

Multiple Partition Strategies

Combine multiple partition columns for granular analysis.

SELECT 
    sale_date,
    region,
    product_category,
    sales_amount,
    -- Regional totals
    SUM(sales_amount) OVER (
        PARTITION BY region
    ) as region_total,
    -- Category totals within region
    SUM(sales_amount) OVER (
        PARTITION BY region, product_category
    ) as region_category_total,
    -- Contribution percentage
    ROUND(
        100.0 * sales_amount / SUM(sales_amount) OVER (
            PARTITION BY region, product_category
        ),
        2
    ) as pct_of_region_category
FROM sales;

Filtering Window Function Results

Window functions execute after WHERE but before ORDER BY in query processing. Use CTEs or subqueries to filter on window function results.

-- Find employees earning above their department average
WITH employee_comparison AS (
    SELECT 
        employee_id,
        name,
        department,
        salary,
        AVG(salary) OVER (PARTITION BY department) as dept_avg
    FROM employees
)
SELECT *
FROM employee_comparison
WHERE salary > dept_avg;

Identify gaps in sequences within partitions:

WITH numbered_records AS (
    SELECT 
        account_id,
        transaction_date,
        ROW_NUMBER() OVER (
            PARTITION BY account_id 
            ORDER BY transaction_date
        ) as rn,
        LAG(transaction_date) OVER (
            PARTITION BY account_id 
            ORDER BY transaction_date
        ) as prev_date
    FROM transactions
)
SELECT 
    account_id,
    prev_date,
    transaction_date,
    transaction_date - prev_date as days_gap
FROM numbered_records
WHERE transaction_date - prev_date > 30;

Performance Considerations

Index partition columns and ORDER BY columns together for optimal performance.

-- Efficient index for partitioned window functions
CREATE INDEX idx_sales_region_date 
ON sales(region, sale_date);

-- Supports this query efficiently
SELECT 
    region,
    sale_date,
    sales_amount,
    SUM(sales_amount) OVER (
        PARTITION BY region 
        ORDER BY sale_date
    ) as running_total
FROM sales;

Check execution plans to identify expensive operations:

EXPLAIN ANALYZE
SELECT 
    customer_id,
    order_date,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
FROM orders;

Look for “WindowAgg” or “Window” nodes in the plan. Excessive sorting or memory spills indicate indexing opportunities.

Materialize complex window calculations when reused:

-- Instead of recalculating window functions multiple times
CREATE TEMPORARY TABLE customer_metrics AS
SELECT 
    customer_id,
    order_date,
    order_amount,
    AVG(order_amount) OVER (PARTITION BY customer_id) as avg_order,
    COUNT(*) OVER (PARTITION BY customer_id) as total_orders
FROM orders;

-- Query the materialized results
SELECT *
FROM customer_metrics
WHERE order_amount > avg_order * 1.5;

Practical Applications

Cohort analysis for user retention:

SELECT 
    user_id,
    signup_date,
    activity_date,
    DATE_PART('day', activity_date - signup_date) as days_since_signup,
    ROW_NUMBER() OVER (
        PARTITION BY user_id 
        ORDER BY activity_date
    ) as activity_sequence,
    FIRST_VALUE(activity_date) OVER (
        PARTITION BY user_id 
        ORDER BY activity_date
    ) as first_activity
FROM user_activities;

Deduplication with partition-based ranking:

WITH ranked_records AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (
            PARTITION BY email 
            ORDER BY created_at DESC
        ) as rn
    FROM user_registrations
)
DELETE FROM user_registrations
WHERE id IN (
    SELECT id 
    FROM ranked_records 
    WHERE rn > 1
);

PARTITION BY transforms complex analytical queries into readable, performant SQL by eliminating self-joins and correlated subqueries. Master window functions with proper partitioning strategies to unlock advanced data analysis capabilities while maintaining query performance.