SQL - LEAD() and LAG() Functions
LEAD() and LAG() belong to the window function family, operating on a 'window' of rows related to the current row. Unlike aggregate functions that collapse multiple rows into one, window functions...
Key Insights
- LEAD() and LAG() are window functions that access data from subsequent or previous rows without self-joins, reducing query complexity and improving performance by up to 50% compared to traditional approaches
- These functions enable row-by-row comparisons essential for trend analysis, calculating deltas, detecting gaps in sequences, and identifying state changes in time-series data
- Understanding the PARTITION BY and ORDER BY clauses is critical—they control how data is divided into groups and sequenced, directly affecting which rows LEAD() and LAG() retrieve
Understanding Window Function Basics
LEAD() and LAG() belong to the window function family, operating on a “window” of rows related to the current row. Unlike aggregate functions that collapse multiple rows into one, window functions maintain row-level detail while accessing data from other rows.
LAG() retrieves data from a previous row at a specified offset. LEAD() retrieves data from a subsequent row. Both follow this syntax:
LAG(column_name, offset, default_value) OVER (
PARTITION BY partition_column
ORDER BY order_column
)
LEAD(column_name, offset, default_value) OVER (
PARTITION BY partition_column
ORDER BY order_column
)
Parameters:
- column_name: The column value to retrieve from the offset row
- offset: Number of rows backward (LAG) or forward (LEAD). Defaults to 1
- default_value: Value returned when offset goes beyond partition boundaries. Defaults to NULL
- PARTITION BY: Optional. Divides result set into partitions
- ORDER BY: Required. Defines row sequence within each partition
Calculating Period-Over-Period Changes
A common use case involves comparing current values against previous periods to calculate growth rates or deltas.
-- Sales growth analysis
WITH daily_sales AS (
SELECT
sale_date,
SUM(amount) as total_sales
FROM orders
WHERE sale_date >= '2024-01-01'
GROUP BY sale_date
)
SELECT
sale_date,
total_sales,
LAG(total_sales, 1) OVER (ORDER BY sale_date) as previous_day_sales,
total_sales - LAG(total_sales, 1) OVER (ORDER BY sale_date) as sales_change,
ROUND(
((total_sales - LAG(total_sales, 1) OVER (ORDER BY sale_date)) /
LAG(total_sales, 1) OVER (ORDER BY sale_date) * 100),
2
) as percent_change
FROM daily_sales
ORDER BY sale_date;
This query calculates daily sales changes and percentage growth. The LAG() function eliminates the need for a self-join, which would look like this less efficient alternative:
-- Less efficient self-join approach
SELECT
t1.sale_date,
t1.total_sales,
t2.total_sales as previous_day_sales,
t1.total_sales - t2.total_sales as sales_change
FROM daily_sales t1
LEFT JOIN daily_sales t2
ON t1.sale_date = t2.sale_date + INTERVAL '1 day'
ORDER BY t1.sale_date;
Partitioning Data for Group-Level Analysis
PARTITION BY divides data into independent groups, resetting the window for each partition. This enables comparative analysis across categories.
-- Product sales comparison by category
SELECT
category,
product_name,
sale_date,
revenue,
LAG(revenue, 1) OVER (
PARTITION BY category
ORDER BY sale_date
) as prev_revenue,
LEAD(revenue, 1) OVER (
PARTITION BY category
ORDER BY sale_date
) as next_revenue,
revenue - LAG(revenue, 1) OVER (
PARTITION BY category
ORDER BY sale_date
) as revenue_change
FROM product_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31'
ORDER BY category, sale_date;
Each category operates independently. When LAG() reaches the first row in a partition or LEAD() reaches the last row, they return NULL (or the specified default value).
Detecting Sequence Gaps and Missing Values
LEAD() excels at identifying gaps in sequential data like missing order numbers, date ranges, or ID sequences.
-- Find gaps in sequential order IDs
WITH order_sequence AS (
SELECT
order_id,
order_date,
LEAD(order_id, 1) OVER (ORDER BY order_id) as next_order_id
FROM orders
)
SELECT
order_id,
next_order_id,
next_order_id - order_id - 1 as missing_count,
order_id + 1 as gap_start,
next_order_id - 1 as gap_end
FROM order_sequence
WHERE next_order_id - order_id > 1
ORDER BY order_id;
For temporal gaps:
-- Identify missing dates in time series
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' as check_date
UNION ALL
SELECT check_date + INTERVAL '1 day'
FROM date_series
WHERE check_date < DATE '2024-12-31'
),
actual_dates AS (
SELECT DISTINCT transaction_date
FROM transactions
WHERE transaction_date >= '2024-01-01'
)
SELECT
ds.check_date as missing_date,
LAG(ad.transaction_date, 1) OVER (ORDER BY ds.check_date) as last_transaction,
LEAD(ad.transaction_date, 1) OVER (ORDER BY ds.check_date) as next_transaction
FROM date_series ds
LEFT JOIN actual_dates ad ON ds.check_date = ad.transaction_date
WHERE ad.transaction_date IS NULL;
State Change Detection
Tracking when values change states—like status transitions, price changes, or threshold crossings—becomes straightforward with LAG().
-- Detect customer status changes
SELECT
customer_id,
status_date,
status,
LAG(status, 1) OVER (
PARTITION BY customer_id
ORDER BY status_date
) as previous_status,
CASE
WHEN status != LAG(status, 1) OVER (
PARTITION BY customer_id
ORDER BY status_date
) THEN 'Status Changed'
ELSE 'No Change'
END as change_indicator,
status_date - LAG(status_date, 1) OVER (
PARTITION BY customer_id
ORDER BY status_date
) as days_in_previous_status
FROM customer_status_history
WHERE status_date >= '2024-01-01'
ORDER BY customer_id, status_date;
Multi-Row Offsets and Moving Calculations
The offset parameter accepts any positive integer, enabling comparisons across multiple rows.
-- Compare current week with 4 weeks ago (monthly comparison)
SELECT
week_start_date,
total_orders,
LAG(total_orders, 1) OVER (ORDER BY week_start_date) as last_week,
LAG(total_orders, 4) OVER (ORDER BY week_start_date) as four_weeks_ago,
total_orders - LAG(total_orders, 4) OVER (ORDER BY week_start_date) as monthly_change,
ROUND(
(total_orders::NUMERIC / NULLIF(LAG(total_orders, 4) OVER (ORDER BY week_start_date), 0) - 1) * 100,
2
) as monthly_growth_pct
FROM weekly_order_summary
ORDER BY week_start_date;
Performance Considerations
Window functions generally outperform self-joins, but optimization matters at scale:
-- Create index to support ORDER BY in window function
CREATE INDEX idx_orders_date ON orders(order_date);
-- For partitioned queries
CREATE INDEX idx_sales_category_date ON sales(category, sale_date);
-- Limit window scope when possible
SELECT
product_id,
sale_date,
quantity,
LAG(quantity, 1) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
) as prev_quantity
FROM sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '90 days';
The ROWS BETWEEN clause restricts the window frame, potentially reducing memory usage for large datasets.
Handling NULL Values with Default Parameters
The third parameter provides fallback values when offsets exceed partition boundaries:
-- Use default values to avoid NULL complications
SELECT
transaction_id,
transaction_date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY transaction_date) as previous_amount,
amount - LAG(amount, 1, 0) OVER (ORDER BY transaction_date) as amount_diff,
LEAD(amount, 1, amount) OVER (ORDER BY transaction_date) as next_amount_or_current
FROM transactions
ORDER BY transaction_date;
This approach prevents NULL propagation in arithmetic operations and provides sensible defaults for edge cases.
Combining LEAD and LAG for Range Analysis
Using both functions together identifies values between previous and next data points:
-- Find anomalies where current value differs significantly from neighbors
SELECT
sensor_id,
reading_time,
temperature,
LAG(temperature, 1) OVER (
PARTITION BY sensor_id
ORDER BY reading_time
) as prev_temp,
LEAD(temperature, 1) OVER (
PARTITION BY sensor_id
ORDER BY reading_time
) as next_temp,
ABS(temperature - LAG(temperature, 1) OVER (
PARTITION BY sensor_id ORDER BY reading_time
)) as prev_diff,
ABS(temperature - LEAD(temperature, 1) OVER (
PARTITION BY sensor_id ORDER BY reading_time
)) as next_diff
FROM sensor_readings
WHERE reading_time >= NOW() - INTERVAL '24 hours'
AND (
ABS(temperature - LAG(temperature, 1) OVER (
PARTITION BY sensor_id ORDER BY reading_time
)) > 10
OR ABS(temperature - LEAD(temperature, 1) OVER (
PARTITION BY sensor_id ORDER BY reading_time
)) > 10
);
This pattern identifies sensor readings that spike dramatically compared to surrounding values, useful for anomaly detection and data quality validation.