How to Use LAG and LEAD in PostgreSQL
Window functions in PostgreSQL perform calculations across sets of rows related to the current row, without collapsing results like aggregate functions do. LAG and LEAD are two of the most practical...
Key Insights
- LAG and LEAD window functions eliminate the need for complex self-joins when comparing values across adjacent rows, making queries simpler and often more performant
- These functions require an ORDER BY clause within the OVER() specification to define row sequence, and benefit significantly from PARTITION BY for grouped comparisons
- Proper indexing on partition and order columns is critical for performance—window functions can be expensive on large datasets without appropriate indexes
Introduction to Window Functions
Window functions in PostgreSQL perform calculations across sets of rows related to the current row, without collapsing results like aggregate functions do. LAG and LEAD are two of the most practical window functions, designed specifically for accessing data from previous or following rows within a result set.
The problem they solve is straightforward but common: you need to compare a value in the current row with values from adjacent rows. Before window functions, you’d write self-joins—joining a table to itself with complex conditions to align the “current” row with “previous” or “next” rows. These queries become unwieldy fast, especially when you need multiple comparisons or work with partitioned data.
LAG retrieves values from preceding rows, while LEAD retrieves values from following rows. Both operate within the ordered set defined by your query, making row-to-row comparisons clean and efficient.
LAG Function Basics
The LAG function accesses data from a previous row in the result set. Its syntax is:
LAG(column_name, offset, default_value) OVER (
[PARTITION BY partition_column]
ORDER BY order_column
)
Parameters:
- column_name: The column value you want to retrieve from the previous row
- offset: How many rows back to look (default is 1)
- default_value: What to return when no previous row exists (default is NULL)
Here’s a practical example using monthly sales data:
CREATE TABLE monthly_sales (
sale_month DATE,
revenue DECIMAL(10,2)
);
INSERT INTO monthly_sales VALUES
('2024-01-01', 45000.00),
('2024-02-01', 52000.00),
('2024-03-01', 48000.00),
('2024-04-01', 61000.00);
SELECT
sale_month,
revenue,
LAG(revenue) OVER (ORDER BY sale_month) AS previous_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY sale_month) AS revenue_change
FROM monthly_sales
ORDER BY sale_month;
Results:
sale_month | revenue | previous_month_revenue | revenue_change
------------+----------+------------------------+----------------
2024-01-01 | 45000.00 | NULL | NULL
2024-02-01 | 52000.00 | 45000.00 | 7000.00
2024-03-01 | 48000.00 | 52000.00 | -4000.00
2024-04-01 | 61000.00 | 48000.00 | 13000.00
The ORDER BY clause is mandatory—it defines what “previous” means. Without it, PostgreSQL doesn’t know which row comes before which.
LEAD Function Basics
LEAD works identically to LAG but looks forward instead of backward. The syntax is:
LEAD(column_name, offset, default_value) OVER (
[PARTITION BY partition_column]
ORDER BY order_column
)
Here’s an example with employee promotions:
CREATE TABLE employee_positions (
employee_id INT,
position_title VARCHAR(100),
start_date DATE,
salary DECIMAL(10,2)
);
INSERT INTO employee_positions VALUES
(101, 'Junior Developer', '2022-01-15', 65000),
(101, 'Developer', '2023-03-01', 80000),
(101, 'Senior Developer', '2024-06-01', 95000);
SELECT
employee_id,
position_title,
start_date,
salary,
LEAD(position_title) OVER (ORDER BY start_date) AS next_position,
LEAD(salary) OVER (ORDER BY start_date) AS next_salary,
LEAD(salary) OVER (ORDER BY start_date) - salary AS salary_increase
FROM employee_positions
WHERE employee_id = 101
ORDER BY start_date;
This shows each position alongside the next one, calculating the salary increase for each promotion.
Practical Use Cases
Calculating Day-Over-Day Changes
Stock price analysis is a classic use case:
CREATE TABLE stock_prices (
ticker VARCHAR(10),
price_date DATE,
closing_price DECIMAL(8,2)
);
SELECT
ticker,
price_date,
closing_price,
LAG(closing_price) OVER (
PARTITION BY ticker
ORDER BY price_date
) AS previous_close,
ROUND(
((closing_price - LAG(closing_price) OVER (
PARTITION BY ticker
ORDER BY price_date
)) / LAG(closing_price) OVER (
PARTITION BY ticker
ORDER BY price_date
) * 100), 2
) AS percent_change
FROM stock_prices
WHERE ticker = 'AAPL'
AND price_date >= '2024-01-01'
ORDER BY price_date;
The PARTITION BY ensures each stock ticker is analyzed independently—you’re not comparing Apple’s Monday price to Microsoft’s Friday price.
Finding Gaps in Time-Series Data
Identifying missing records in sequential data:
CREATE TABLE sensor_readings (
sensor_id INT,
reading_time TIMESTAMP,
temperature DECIMAL(5,2)
);
SELECT
sensor_id,
reading_time,
LEAD(reading_time) OVER (
PARTITION BY sensor_id
ORDER BY reading_time
) AS next_reading_time,
LEAD(reading_time) OVER (
PARTITION BY sensor_id
ORDER BY reading_time
) - reading_time AS time_gap
FROM sensor_readings
WHERE LEAD(reading_time) OVER (
PARTITION BY sensor_id
ORDER BY reading_time
) - reading_time > INTERVAL '1 hour'
ORDER BY sensor_id, reading_time;
This identifies any gaps longer than one hour between consecutive sensor readings.
Comparing User Sessions
Analyzing user behavior across consecutive sessions:
CREATE TABLE user_sessions (
user_id INT,
session_start TIMESTAMP,
pages_viewed INT
);
SELECT
user_id,
session_start,
pages_viewed,
LAG(pages_viewed) OVER (
PARTITION BY user_id
ORDER BY session_start
) AS previous_session_pages,
pages_viewed > LAG(pages_viewed) OVER (
PARTITION BY user_id
ORDER BY session_start
) AS engagement_increased
FROM user_sessions
ORDER BY user_id, session_start;
Advanced Techniques
Custom Offsets and Multiple Partitions
You can look back or forward multiple rows and partition by multiple columns:
CREATE TABLE product_sales (
product_category VARCHAR(50),
product_name VARCHAR(100),
sale_date DATE,
units_sold INT
);
SELECT
product_category,
product_name,
sale_date,
units_sold,
LAG(units_sold, 1, 0) OVER w AS prev_day,
LAG(units_sold, 7, 0) OVER w AS prev_week,
LEAD(units_sold, 1, 0) OVER w AS next_day,
units_sold - LAG(units_sold, 7, 0) OVER w AS week_over_week_change
FROM product_sales
WINDOW w AS (
PARTITION BY product_category, product_name
ORDER BY sale_date
)
ORDER BY product_category, product_name, sale_date;
The WINDOW clause defines the window specification once and reuses it, keeping the query cleaner. The offset of 7 with a default value of 0 lets you compare week-over-week even when there’s no data from seven days ago.
Performance Considerations
Window functions can be expensive. The key to performance is proper indexing on the columns used in PARTITION BY and ORDER BY:
CREATE INDEX idx_product_sales_category_name_date
ON product_sales(product_category, product_name, sale_date);
Compare the query plans:
-- Without index
EXPLAIN ANALYZE
SELECT
product_name,
sale_date,
units_sold,
LAG(units_sold) OVER (
PARTITION BY product_name
ORDER BY sale_date
)
FROM product_sales;
-- With index
-- Same query after creating the index
Without the index, PostgreSQL must sort the entire dataset for each partition. With the index, data is already ordered, and the window function can process rows sequentially. On a table with millions of rows, this difference can be seconds versus minutes.
Window functions generally outperform self-joins for this type of comparison, but self-joins can be faster when you need only a small subset of rows and can leverage highly selective WHERE clauses before the join.
Common Pitfalls and Best Practices
NULL Handling
Always consider what should happen when LAG or LEAD returns NULL:
SELECT
sale_date,
revenue,
revenue - LAG(revenue, 1, revenue) OVER (ORDER BY sale_date) AS change
FROM monthly_sales;
Setting the default to revenue means the first row shows zero change instead of NULL, which might be more meaningful for your analysis.
Partition Boundaries
LAG and LEAD respect partition boundaries—they never reach across partitions:
CREATE TABLE multi_store_sales (
store_id INT,
sale_date DATE,
revenue DECIMAL(10,2)
);
SELECT
store_id,
sale_date,
revenue,
LAG(revenue) OVER (
PARTITION BY store_id
ORDER BY sale_date
) AS previous_revenue
FROM multi_store_sales
ORDER BY store_id, sale_date;
The first row for each store_id will have NULL for previous_revenue, even if another store had sales the day before. This is correct behavior—you’re comparing each store to itself, not to other stores.
Ordering Requirements
Forgetting ORDER BY is a syntax error with LAG and LEAD. But using the wrong ORDER BY is a logic error that’s harder to catch:
-- Wrong: ordering by revenue instead of date
LAG(revenue) OVER (PARTITION BY store_id ORDER BY revenue)
-- Right: ordering by date to get temporal sequence
LAG(revenue) OVER (PARTITION BY store_id ORDER BY sale_date)
Always verify your ORDER BY matches your analytical intent.
Reusing Window Definitions
When using LAG and LEAD multiple times with the same window specification, use the WINDOW clause to avoid repetition and potential inconsistencies:
SELECT
sale_date,
revenue,
LAG(revenue) OVER w AS prev_revenue,
LEAD(revenue) OVER w AS next_revenue,
revenue - LAG(revenue) OVER w AS change_from_prev
FROM monthly_sales
WINDOW w AS (ORDER BY sale_date);
This ensures all window functions use identical specifications and makes the query easier to maintain.
LAG and LEAD transform complex temporal comparisons from multi-join nightmares into readable, performant queries. Master these functions, index appropriately, and you’ll handle row-to-row analysis with confidence.