SQL - Window Functions Complete Guide

Key Insights

Window functions perform calculations across table rows related to the current row without collapsing results like GROUP BY, enabling running totals, rankings, and moving averages in a single query
The PARTITION BY clause divides result sets into logical groups while ORDER BY determines calculation sequence, with frame specifications (ROWS/RANGE) controlling which rows participate in each calculation
Understanding the difference between aggregate window functions, ranking functions, and value functions is critical for choosing the right tool—each serves distinct analytical needs from cumulative sums to percentile calculations

What Are Window Functions

Window functions operate on a set of rows and return a single value for each row, unlike aggregate functions that collapse multiple rows into one. They’re called “window” functions because they perform calculations across a sliding window of rows related to the current row.

The basic syntax follows this pattern:

function_name([arguments]) OVER (
    [PARTITION BY partition_expression]
    [ORDER BY sort_expression [ASC|DESC]]
    [frame_specification]
)

The OVER clause is what distinguishes a window function from a regular function. Without it, you’re just using a standard aggregate or scalar function.

Aggregate Window Functions

Standard aggregate functions (SUM, AVG, COUNT, MIN, MAX) can be used as window functions. Here’s how they differ from their grouped counterparts:

-- Sample sales data
CREATE TABLE sales (
    sale_id INT,
    product VARCHAR(50),
    category VARCHAR(50),
    amount DECIMAL(10,2),
    sale_date DATE
);

-- Regular GROUP BY - collapses rows
SELECT 
    category,
    SUM(amount) as total_sales
FROM sales
GROUP BY category;

-- Window function - preserves all rows
SELECT 
    sale_id,
    product,
    category,
    amount,
    SUM(amount) OVER (PARTITION BY category) as category_total,
    amount / SUM(amount) OVER (PARTITION BY category) * 100 as pct_of_category
FROM sales;

The window function version returns every row while adding calculated columns. This is invaluable when you need both detail and aggregate information.

Running totals demonstrate the power of ORDER BY within the OVER clause:

SELECT 
    sale_date,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) as running_total,
    AVG(amount) OVER (ORDER BY sale_date) as running_average
FROM sales
ORDER BY sale_date;

Ranking Functions

Ranking functions assign positions to rows within partitions. The four main ranking functions handle ties differently:

-- Setup test data
CREATE TABLE employees (
    emp_id INT,
    department VARCHAR(50),
    salary DECIMAL(10,2)
);

SELECT 
    emp_id,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank,
    PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as pct_rank
FROM employees;

ROW_NUMBER() assigns unique sequential integers, even for ties. RANK() gives the same rank to ties but skips subsequent ranks. DENSE_RANK() also assigns the same rank to ties but doesn’t skip ranks. PERCENT_RANK() returns the relative rank as a percentage (0 to 1).

Practical example - finding the top 3 earners per department:

WITH ranked_employees AS (
    SELECT 
        emp_id,
        department,
        salary,
        DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
    FROM employees
)
SELECT emp_id, department, salary
FROM ranked_employees
WHERE rank <= 3;

Value Functions

Value functions access data from other rows without self-joins. The most common are LAG, LEAD, FIRST_VALUE, LAST_VALUE, and NTH_VALUE.

-- Stock price analysis
CREATE TABLE stock_prices (
    ticker VARCHAR(10),
    price_date DATE,
    close_price DECIMAL(10,2)
);

SELECT 
    ticker,
    price_date,
    close_price,
    LAG(close_price, 1) OVER (PARTITION BY ticker ORDER BY price_date) as prev_close,
    LEAD(close_price, 1) OVER (PARTITION BY ticker ORDER BY price_date) as next_close,
    close_price - LAG(close_price, 1) OVER (PARTITION BY ticker ORDER BY price_date) as daily_change,
    FIRST_VALUE(close_price) OVER (PARTITION BY ticker ORDER BY price_date) as period_open,
    LAST_VALUE(close_price) OVER (
        PARTITION BY ticker 
        ORDER BY price_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as period_close
FROM stock_prices;

Note the frame specification for LAST_VALUE. Without it, the default frame is “RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”, which doesn’t include future rows.

Frame Specifications

Frame specifications define exactly which rows participate in the window calculation. The syntax:

{ROWS | RANGE} BETWEEN frame_start AND frame_end

Frame boundaries can be:

UNBOUNDED PRECEDING: first row of partition
n PRECEDING: n rows before current
CURRENT ROW: the current row
n FOLLOWING: n rows after current
UNBOUNDED FOLLOWING: last row of partition

-- 7-day moving average
SELECT 
    price_date,
    close_price,
    AVG(close_price) OVER (
        ORDER BY price_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as moving_avg_7day,
    AVG(close_price) OVER (
        ORDER BY price_date
        ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    ) as centered_avg_7day
FROM stock_prices
WHERE ticker = 'AAPL';

ROWS vs RANGE matters when you have duplicate values in the ORDER BY column. ROWS counts physical rows, while RANGE includes all rows with the same ORDER BY value.

Practical Patterns

Gap and Island Detection - Finding consecutive sequences:

-- Find consecutive login streaks
WITH login_groups AS (
    SELECT 
        user_id,
        login_date,
        login_date - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) as group_id
    FROM user_logins
)
SELECT 
    user_id,
    MIN(login_date) as streak_start,
    MAX(login_date) as streak_end,
    COUNT(*) as streak_length
FROM login_groups
GROUP BY user_id, group_id
HAVING COUNT(*) >= 5;

Percentile Calculations:

SELECT 
    product,
    amount,
    NTILE(4) OVER (ORDER BY amount) as quartile,
    PERCENT_RANK() OVER (ORDER BY amount) as percentile,
    CUME_DIST() OVER (ORDER BY amount) as cumulative_dist
FROM sales;

Year-over-Year Comparisons:

SELECT 
    EXTRACT(YEAR FROM sale_date) as year,
    EXTRACT(MONTH FROM sale_date) as month,
    SUM(amount) as monthly_sales,
    LAG(SUM(amount), 12) OVER (ORDER BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)) as same_month_last_year,
    (SUM(amount) - LAG(SUM(amount), 12) OVER (ORDER BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date))) 
        / LAG(SUM(amount), 12) OVER (ORDER BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)) * 100 as yoy_growth_pct
FROM sales
GROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
ORDER BY year, month;

Performance Considerations

Window functions can be expensive. Optimize by:

Reusing window definitions:

SELECT 
    product,
    amount,
    SUM(amount) OVER w as total,
    AVG(amount) OVER w as average,
    COUNT(*) OVER w as count
FROM sales
WINDOW w AS (PARTITION BY category ORDER BY sale_date);

Filtering before windowing:

-- Good: Filter first
SELECT * FROM (
    SELECT 
        product,
        amount,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) as rn
    FROM sales
    WHERE sale_date >= '2024-01-01'
) ranked
WHERE rn <= 10;

Using appropriate indexes on PARTITION BY and ORDER BY columns.
Limiting frame size when possible - smaller frames calculate faster.

Window functions eliminate complex self-joins and subqueries, making SQL more readable and often more performant. Master these patterns and you’ll handle most analytical queries with elegant, efficient code.