How to Use Window Functions in SQLite
Window functions transform how you write analytical queries in SQLite. Unlike aggregate functions that collapse multiple rows into a single result, window functions calculate values across a set of...
Key Insights
- Window functions perform calculations across rows without collapsing results like GROUP BY does, making them essential for rankings, running totals, and row-to-row comparisons in a single query
- SQLite has supported window functions since version 3.25.0 (2018), providing all standard SQL window functions including ROW_NUMBER(), LAG(), LEAD(), and aggregate functions with OVER() clauses
- Proper use of PARTITION BY and frame specifications (ROWS BETWEEN) allows you to control exactly which rows participate in each calculation, enabling complex analytics that would otherwise require self-joins or subqueries
Introduction to Window Functions
Window functions transform how you write analytical queries in SQLite. Unlike aggregate functions that collapse multiple rows into a single result, window functions calculate values across a set of rows while preserving each individual row in your result set.
Consider calculating a running total of sales. With traditional GROUP BY, you’d need multiple queries or a self-join. Window functions solve this elegantly:
-- Traditional approach: requires self-join or subquery
SELECT
date,
amount,
(SELECT SUM(amount)
FROM sales s2
WHERE s2.date <= s1.date) AS running_total
FROM sales s1;
-- Window function approach: clean and efficient
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM sales;
The window function version is not only more readable but also typically more efficient. SQLite processes the window function in a single pass rather than executing a correlated subquery for each row.
Basic Window Function Syntax
Every window function uses the OVER() clause to define its “window” - the set of rows used for calculation. The basic syntax includes three optional components:
function_name() OVER (
[PARTITION BY partition_expression]
[ORDER BY sort_expression]
[frame_specification]
)
Here’s how each component works:
-- Basic window: operates on entire result set
SELECT
product_name,
price,
AVG(price) OVER () AS avg_price
FROM products;
-- PARTITION BY: separate windows per category
SELECT
category,
product_name,
price,
AVG(price) OVER (PARTITION BY category) AS category_avg_price
FROM products;
-- ORDER BY: creates ordered frame for calculations
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS cumulative_total
FROM orders;
Frame specifications control which rows within the window participate in the calculation:
-- Default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
SELECT
date,
sales,
SUM(sales) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM daily_sales;
-- Moving average: last 7 days including current
SELECT
date,
sales,
AVG(sales) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7day
FROM daily_sales;
Ranking Functions
Ranking functions assign positions to rows based on ordering criteria. SQLite provides four ranking functions, each with different behavior for ties:
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT,
department TEXT,
salary INTEGER
);
-- ROW_NUMBER: unique sequential number (arbitrary order for ties)
-- RANK: same rank for ties, gaps in sequence
-- DENSE_RANK: same rank for ties, no gaps
-- NTILE: divides rows into N buckets
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile
FROM employees;
A practical use case is finding the top N items per category:
-- Top 3 highest-paid employees per department
WITH ranked_employees AS (
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE rn <= 3;
NTILE() is particularly useful for percentile analysis:
-- Categorize products into price quartiles
SELECT
product_name,
price,
NTILE(4) OVER (ORDER BY price) AS price_quartile,
CASE NTILE(4) OVER (ORDER BY price)
WHEN 1 THEN 'Budget'
WHEN 2 THEN 'Economy'
WHEN 3 THEN 'Premium'
WHEN 4 THEN 'Luxury'
END AS price_category
FROM products;
Aggregate Window Functions
Standard aggregate functions (SUM, AVG, COUNT, MIN, MAX) become window functions when you add an OVER() clause. This enables running calculations and moving windows:
-- Running total with cumulative count
SELECT
order_date,
order_id,
amount,
SUM(amount) OVER (ORDER BY order_date, order_id) AS running_total,
COUNT(*) OVER (ORDER BY order_date, order_id) AS order_count,
AVG(amount) OVER (ORDER BY order_date, order_id) AS running_avg
FROM orders;
Moving windows are essential for time-series analysis:
-- 7-day moving average and 30-day moving sum
SELECT
date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7day,
SUM(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS moving_sum_30day
FROM daily_revenue
ORDER BY date;
You can combine PARTITION BY with frame specifications for grouped moving calculations:
-- Moving average per product category
SELECT
date,
category,
sales,
AVG(sales) OVER (
PARTITION BY category
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS category_moving_avg
FROM product_sales;
Value Functions
Value functions access data from other rows relative to the current row. LAG() and LEAD() are the most commonly used:
-- Month-over-month comparison
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS previous_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month)) /
LAG(revenue) OVER (ORDER BY month), 2) AS mom_change_pct
FROM monthly_revenue;
LEAD() looks forward instead of backward:
-- Compare current price with next price change
SELECT
product_id,
effective_date,
price,
LEAD(price) OVER (PARTITION BY product_id ORDER BY effective_date) AS next_price,
LEAD(effective_date) OVER (PARTITION BY product_id ORDER BY effective_date) AS next_change_date
FROM price_history;
FIRST_VALUE() and LAST_VALUE() extract boundary values from windows:
-- Compare each sale to first and last sale of the day
SELECT
sale_time,
amount,
FIRST_VALUE(amount) OVER (
PARTITION BY DATE(sale_time)
ORDER BY sale_time
) AS first_sale_of_day,
LAST_VALUE(amount) OVER (
PARTITION BY DATE(sale_time)
ORDER BY sale_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_sale_of_day
FROM sales;
Note the frame specification for LAST_VALUE() - without it, the default frame only extends to the current row, not the actual last row.
Practical Use Cases
Window functions shine in real-world analytics. Here’s a comprehensive sales dashboard query:
WITH sales_analytics AS (
SELECT
product_id,
product_name,
category,
sale_date,
quantity,
revenue,
-- Rankings
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS category_rank,
-- Running totals
SUM(revenue) OVER (PARTITION BY product_id ORDER BY sale_date) AS cumulative_revenue,
-- Period comparisons
LAG(revenue) OVER (PARTITION BY product_id ORDER BY sale_date) AS prev_period_revenue,
-- Moving averages
AVG(revenue) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7periods,
-- Percentage of category total
100.0 * revenue / SUM(revenue) OVER (PARTITION BY category) AS pct_of_category
FROM sales
)
SELECT * FROM sales_analytics
WHERE category_rank <= 10
ORDER BY category, category_rank;
Deduplication is another practical application:
-- Keep only the most recent record per customer
DELETE FROM customer_data
WHERE rowid NOT IN (
SELECT rowid
FROM (
SELECT
rowid,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM customer_data
)
WHERE rn = 1
);
Performance Considerations
Window functions are efficient, but understanding their performance characteristics helps you write better queries:
- Indexing matters: Create indexes on columns used in PARTITION BY and ORDER BY clauses:
-- This query benefits from an index
CREATE INDEX idx_sales_category_date ON sales(category, sale_date);
SELECT
category,
sale_date,
revenue,
SUM(revenue) OVER (PARTITION BY category ORDER BY sale_date) AS running_total
FROM sales;
- Avoid redundant window definitions: Use the WINDOW clause to define windows once:
-- Inefficient: repeating window definition
SELECT
product_id,
SUM(revenue) OVER (PARTITION BY category ORDER BY sale_date),
AVG(revenue) OVER (PARTITION BY category ORDER BY sale_date)
FROM sales;
-- Better: define window once
SELECT
product_id,
SUM(revenue) OVER w,
AVG(revenue) OVER w
FROM sales
WINDOW w AS (PARTITION BY category ORDER BY sale_date);
- Consider materialization for complex windows: If you’re repeatedly querying the same window calculations, materialize them:
CREATE TABLE sales_with_analytics AS
SELECT
*,
SUM(revenue) OVER (PARTITION BY product_id ORDER BY sale_date) AS cumulative_revenue,
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS category_rank
FROM sales;
CREATE INDEX idx_analytics ON sales_with_analytics(category, category_rank);
Window functions are a powerful tool in SQLite’s analytical arsenal. They eliminate complex self-joins, enable elegant time-series analysis, and make ranking queries straightforward. Master the syntax, understand frame specifications, and you’ll write cleaner, more efficient SQL.