SQL - Window Functions Complete Guide
Window functions operate on a set of rows and return a single value for each row, unlike aggregate functions that collapse multiple rows into one. They're called 'window' functions because they...
Key Insights
- Window functions perform calculations across table rows related to the current row without collapsing results like GROUP BY, enabling running totals, rankings, and moving averages in a single query
- The PARTITION BY clause divides result sets into logical groups while ORDER BY determines calculation sequence, with frame specifications (ROWS/RANGE) controlling which rows participate in each calculation
- Understanding the difference between aggregate window functions, ranking functions, and value functions is critical for choosing the right tool—each serves distinct analytical needs from cumulative sums to percentile calculations
What Are Window Functions
Window functions operate on a set of rows and return a single value for each row, unlike aggregate functions that collapse multiple rows into one. They’re called “window” functions because they perform calculations across a sliding window of rows related to the current row.
The basic syntax follows this pattern:
function_name([arguments]) OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression [ASC|DESC]]
[frame_specification]
)
The OVER clause is what distinguishes a window function from a regular function. Without it, you’re just using a standard aggregate or scalar function.
Aggregate Window Functions
Standard aggregate functions (SUM, AVG, COUNT, MIN, MAX) can be used as window functions. Here’s how they differ from their grouped counterparts:
-- Sample sales data
CREATE TABLE sales (
sale_id INT,
product VARCHAR(50),
category VARCHAR(50),
amount DECIMAL(10,2),
sale_date DATE
);
-- Regular GROUP BY - collapses rows
SELECT
category,
SUM(amount) as total_sales
FROM sales
GROUP BY category;
-- Window function - preserves all rows
SELECT
sale_id,
product,
category,
amount,
SUM(amount) OVER (PARTITION BY category) as category_total,
amount / SUM(amount) OVER (PARTITION BY category) * 100 as pct_of_category
FROM sales;
The window function version returns every row while adding calculated columns. This is invaluable when you need both detail and aggregate information.
Running totals demonstrate the power of ORDER BY within the OVER clause:
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) as running_total,
AVG(amount) OVER (ORDER BY sale_date) as running_average
FROM sales
ORDER BY sale_date;
Ranking Functions
Ranking functions assign positions to rows within partitions. The four main ranking functions handle ties differently:
-- Setup test data
CREATE TABLE employees (
emp_id INT,
department VARCHAR(50),
salary DECIMAL(10,2)
);
SELECT
emp_id,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank,
PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as pct_rank
FROM employees;
ROW_NUMBER() assigns unique sequential integers, even for ties. RANK() gives the same rank to ties but skips subsequent ranks. DENSE_RANK() also assigns the same rank to ties but doesn’t skip ranks. PERCENT_RANK() returns the relative rank as a percentage (0 to 1).
Practical example - finding the top 3 earners per department:
WITH ranked_employees AS (
SELECT
emp_id,
department,
salary,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees
)
SELECT emp_id, department, salary
FROM ranked_employees
WHERE rank <= 3;
Value Functions
Value functions access data from other rows without self-joins. The most common are LAG, LEAD, FIRST_VALUE, LAST_VALUE, and NTH_VALUE.
-- Stock price analysis
CREATE TABLE stock_prices (
ticker VARCHAR(10),
price_date DATE,
close_price DECIMAL(10,2)
);
SELECT
ticker,
price_date,
close_price,
LAG(close_price, 1) OVER (PARTITION BY ticker ORDER BY price_date) as prev_close,
LEAD(close_price, 1) OVER (PARTITION BY ticker ORDER BY price_date) as next_close,
close_price - LAG(close_price, 1) OVER (PARTITION BY ticker ORDER BY price_date) as daily_change,
FIRST_VALUE(close_price) OVER (PARTITION BY ticker ORDER BY price_date) as period_open,
LAST_VALUE(close_price) OVER (
PARTITION BY ticker
ORDER BY price_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) as period_close
FROM stock_prices;
Note the frame specification for LAST_VALUE. Without it, the default frame is “RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”, which doesn’t include future rows.
Frame Specifications
Frame specifications define exactly which rows participate in the window calculation. The syntax:
{ROWS | RANGE} BETWEEN frame_start AND frame_end
Frame boundaries can be:
- UNBOUNDED PRECEDING: first row of partition
- n PRECEDING: n rows before current
- CURRENT ROW: the current row
- n FOLLOWING: n rows after current
- UNBOUNDED FOLLOWING: last row of partition
-- 7-day moving average
SELECT
price_date,
close_price,
AVG(close_price) OVER (
ORDER BY price_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7day,
AVG(close_price) OVER (
ORDER BY price_date
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
) as centered_avg_7day
FROM stock_prices
WHERE ticker = 'AAPL';
ROWS vs RANGE matters when you have duplicate values in the ORDER BY column. ROWS counts physical rows, while RANGE includes all rows with the same ORDER BY value.
Practical Patterns
Gap and Island Detection - Finding consecutive sequences:
-- Find consecutive login streaks
WITH login_groups AS (
SELECT
user_id,
login_date,
login_date - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) as group_id
FROM user_logins
)
SELECT
user_id,
MIN(login_date) as streak_start,
MAX(login_date) as streak_end,
COUNT(*) as streak_length
FROM login_groups
GROUP BY user_id, group_id
HAVING COUNT(*) >= 5;
Percentile Calculations:
SELECT
product,
amount,
NTILE(4) OVER (ORDER BY amount) as quartile,
PERCENT_RANK() OVER (ORDER BY amount) as percentile,
CUME_DIST() OVER (ORDER BY amount) as cumulative_dist
FROM sales;
Year-over-Year Comparisons:
SELECT
EXTRACT(YEAR FROM sale_date) as year,
EXTRACT(MONTH FROM sale_date) as month,
SUM(amount) as monthly_sales,
LAG(SUM(amount), 12) OVER (ORDER BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)) as same_month_last_year,
(SUM(amount) - LAG(SUM(amount), 12) OVER (ORDER BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)))
/ LAG(SUM(amount), 12) OVER (ORDER BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)) * 100 as yoy_growth_pct
FROM sales
GROUP BY EXTRACT(YEAR FROM sale_date), EXTRACT(MONTH FROM sale_date)
ORDER BY year, month;
Performance Considerations
Window functions can be expensive. Optimize by:
- Reusing window definitions:
SELECT
product,
amount,
SUM(amount) OVER w as total,
AVG(amount) OVER w as average,
COUNT(*) OVER w as count
FROM sales
WINDOW w AS (PARTITION BY category ORDER BY sale_date);
- Filtering before windowing:
-- Good: Filter first
SELECT * FROM (
SELECT
product,
amount,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY amount DESC) as rn
FROM sales
WHERE sale_date >= '2024-01-01'
) ranked
WHERE rn <= 10;
-
Using appropriate indexes on PARTITION BY and ORDER BY columns.
-
Limiting frame size when possible - smaller frames calculate faster.
Window functions eliminate complex self-joins and subqueries, making SQL more readable and often more performant. Master these patterns and you’ll handle most analytical queries with elegant, efficient code.