SQL: Window Functions Explained

Window functions operate on a set of rows related to the current row, performing calculations while preserving individual row identity. Unlike aggregate functions that collapse multiple rows into a...

Key Insights

  • Window functions perform calculations across related rows without collapsing them like aggregate functions do, making them essential for rankings, running totals, and row-to-row comparisons.
  • The OVER() clause defines the window through PARTITION BY (grouping), ORDER BY (sorting), and frame specifications (which rows to include in calculations).
  • Window functions typically outperform equivalent self-joins and correlated subqueries, but understanding frame defaults and NULL behavior prevents subtle bugs.

What Are Window Functions?

Window functions operate on a set of rows related to the current row, performing calculations while preserving individual row identity. Unlike aggregate functions that collapse multiple rows into a single result, window functions return a value for every input row.

Consider the difference:

-- Aggregate function: collapses to one row per department
SELECT department, SUM(salary) as total_salary
FROM employees
GROUP BY department;

-- Window function: keeps all rows, adds running calculation
SELECT 
    employee_name,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department) as dept_total
FROM employees;

The aggregate version returns one row per department. The window function version returns every employee with their individual salary plus the department total. This distinction makes window functions invaluable for analytics where you need both detail and aggregate context simultaneously.

The Anatomy of a Window Function

Every window function includes an OVER() clause that defines its “window”—the set of rows used for calculation. This clause has three key components:

PARTITION BY divides rows into groups. The window function resets for each partition:

SELECT 
    employee_name,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;

Without PARTITION BY, the window spans all rows. With it, calculations occur independently within each partition.

ORDER BY establishes row sequence within partitions, crucial for ranking and cumulative calculations:

SELECT 
    employee_name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;

Frame specification defines exactly which rows within the partition to include. The default frame depends on whether ORDER BY is present:

-- Default frame: all rows in partition (without ORDER BY)
SUM(salary) OVER (PARTITION BY department)

-- Default frame: start of partition to current row (with ORDER BY)
SUM(salary) OVER (PARTITION BY department ORDER BY hire_date)

-- Explicit frame: current row and two preceding
SUM(salary) OVER (
    PARTITION BY department 
    ORDER BY hire_date 
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)

Frame specifications use ROWS (physical row count) or RANGE (logical value ranges). Most use cases need ROWS.

Common Window Functions in Practice

Window functions fall into three categories: ranking, aggregate, and value functions.

Ranking functions assign positions within ordered partitions:

CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    employee_name TEXT,
    department TEXT,
    salary INTEGER,
    hire_date DATE
);

-- Find top 3 earners per department
SELECT 
    employee_name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank
FROM employees;

The differences matter: ROW_NUMBER() always produces unique values (1, 2, 3, 4). RANK() creates gaps after ties (1, 2, 2, 4). DENSE_RANK() has no gaps (1, 2, 2, 3).

Aggregate functions as window functions provide running calculations:

SELECT 
    employee_name,
    department,
    salary,
    hire_date,
    SUM(salary) OVER (
        PARTITION BY department 
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total,
    COUNT(*) OVER (PARTITION BY department) as dept_size
FROM employees
ORDER BY department, hire_date;

Value functions access data from other rows relative to the current row:

SELECT 
    employee_name,
    salary,
    hire_date,
    LAG(salary, 1) OVER (ORDER BY hire_date) as previous_hire_salary,
    LEAD(salary, 1) OVER (ORDER BY hire_date) as next_hire_salary,
    salary - LAG(salary, 1) OVER (ORDER BY hire_date) as salary_change
FROM employees;

LAG() looks backward, LEAD() looks forward. The second parameter specifies how many rows to offset (default: 1).

Practical Use Cases

Window functions excel at time-series analysis and comparative metrics. Consider sales analytics:

CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    sale_date DATE,
    product TEXT,
    amount DECIMAL(10,2)
);

-- Running total, moving average, and YoY comparison
SELECT 
    sale_date,
    product,
    amount,
    -- Running total for the year
    SUM(amount) OVER (
        PARTITION BY product, strftime('%Y', sale_date)
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as ytd_total,
    -- 3-month moving average
    AVG(amount) OVER (
        PARTITION BY product
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as three_month_avg,
    -- Previous year same period
    LAG(amount, 12) OVER (
        PARTITION BY product
        ORDER BY sale_date
    ) as same_period_last_year,
    -- YoY growth percentage
    ROUND(
        (amount - LAG(amount, 12) OVER (PARTITION BY product ORDER BY sale_date)) 
        / LAG(amount, 12) OVER (PARTITION BY product ORDER BY sale_date) * 100,
        2
    ) as yoy_growth_pct
FROM sales
ORDER BY product, sale_date;

This single query replaces what would require multiple self-joins or correlated subqueries, improving both readability and performance.

For percentile calculations:

SELECT 
    employee_name,
    salary,
    NTILE(4) OVER (ORDER BY salary) as salary_quartile,
    PERCENT_RANK() OVER (ORDER BY salary) as percentile
FROM employees;

NTILE(n) divides rows into n equal groups. PERCENT_RANK() returns the relative rank as a percentage (0 to 1).

Performance Considerations

Window functions generally outperform equivalent self-joins because they scan the table once:

-- Window function approach (efficient)
SELECT 
    e1.employee_name,
    e1.salary,
    AVG(e2.salary) as dept_avg
FROM employees e1
JOIN employees e2 ON e1.department = e2.department
GROUP BY e1.employee_id, e1.employee_name, e1.salary;

-- Better: Window function
SELECT 
    employee_name,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;

Check execution plans to verify:

EXPLAIN QUERY PLAN
SELECT 
    employee_name,
    salary,
    AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;

SQLite optimizes window functions well, but indexes on PARTITION BY and ORDER BY columns help, especially with large datasets. However, SQLite has limitations: it doesn’t support all frame specifications that PostgreSQL or SQL Server offer, and complex window functions may not parallelize.

For very large result sets, consider whether you need all rows. Sometimes filtering before applying window functions dramatically improves performance:

-- Filter first, then apply window function
SELECT * FROM (
    SELECT 
        employee_name,
        salary,
        RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
    FROM employees
    WHERE hire_date >= date('now', '-5 years')
)
WHERE rank <= 10;

Common Pitfalls and Best Practices

The default frame specification catches many developers off guard. With ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, not all rows in the partition:

-- This might not do what you expect
SELECT 
    sale_date,
    amount,
    AVG(amount) OVER (ORDER BY sale_date) as avg_so_far
FROM sales;

-- Be explicit about what you want
SELECT 
    sale_date,
    amount,
    -- Average of all sales (entire partition)
    AVG(amount) OVER () as overall_avg,
    -- Average up to current row
    AVG(amount) OVER (
        ORDER BY sale_date 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as cumulative_avg
FROM sales;

NULL values require attention. Window functions handle NULLs differently:

-- LAG returns NULL if no previous row exists
SELECT 
    sale_date,
    amount,
    LAG(amount, 1, 0) OVER (ORDER BY sale_date) as previous_amount,
    amount - COALESCE(LAG(amount, 1) OVER (ORDER BY sale_date), 0) as change
FROM sales;

The third parameter to LAG() and LEAD() provides a default value when the offset goes beyond available rows.

When using multiple window functions with identical OVER clauses, define them once with WINDOW:

SELECT 
    employee_name,
    salary,
    AVG(salary) OVER w as dept_avg,
    MAX(salary) OVER w as dept_max,
    RANK() OVER w as salary_rank
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY salary DESC);

This improves readability and potentially performance, as SQLite may optimize repeated window definitions.

Window functions transform complex analytical queries from multi-step operations with temporary tables into single, readable statements. Master the OVER() clause components, understand frame specifications, and you’ll handle most reporting and analytics requirements efficiently.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.