How to Use LAG and LEAD in MySQL
Window functions arrived in MySQL 8.0 as a game-changer for analytical queries. Before them, comparing a row's value with previous or subsequent rows required self-joins—verbose, error-prone SQL that...
Key Insights
- LAG and LEAD window functions eliminate complex self-joins when comparing row values, reducing query complexity and often improving performance in MySQL 8.0+
- The PARTITION BY clause segments your data into logical groups where LAG/LEAD operate independently, essential for multi-category or multi-entity comparisons
- Offset parameters let you look multiple rows ahead or behind, while default values prevent NULL pollution when accessing non-existent rows at partition boundaries
Understanding Window Functions in MySQL
Window functions arrived in MySQL 8.0 as a game-changer for analytical queries. Before them, comparing a row’s value with previous or subsequent rows required self-joins—verbose, error-prone SQL that often confused the query optimizer. LAG and LEAD are two of the most practical window functions, letting you access data from other rows relative to the current row without JOIN gymnastics.
LAG looks backward in your result set; LEAD looks forward. Both operate within a “window” of rows defined by PARTITION BY and ORDER BY clauses. This makes them perfect for time-series analysis, trend detection, and calculating deltas between sequential records.
The LAG Function Explained
LAG retrieves a value from a previous row in the result set. The syntax is straightforward:
LAG(column_name, offset, default_value) OVER (
PARTITION BY partition_column
ORDER BY order_column
)
- column_name: The column value you want to retrieve from a previous row
- offset: How many rows back to look (default is 1)
- default_value: What to return when no previous row exists (default is NULL)
Here’s a practical example using monthly sales data:
CREATE TABLE monthly_sales (
sale_month DATE,
revenue DECIMAL(10,2)
);
INSERT INTO monthly_sales VALUES
('2024-01-01', 45000.00),
('2024-02-01', 52000.00),
('2024-03-01', 48000.00),
('2024-04-01', 61000.00);
SELECT
sale_month,
revenue,
LAG(revenue) OVER (ORDER BY sale_month) AS previous_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY sale_month) AS revenue_change
FROM monthly_sales;
Results:
+------------+----------+------------------------+----------------+
| sale_month | revenue | previous_month_revenue | revenue_change |
+------------+----------+------------------------+----------------+
| 2024-01-01 | 45000.00 | NULL | NULL |
| 2024-02-01 | 52000.00 | 45000.00 | 7000.00 |
| 2024-03-01 | 48000.00 | 52000.00 | -4000.00 |
| 2024-04-01 | 61000.00 | 48000.00 | 13000.00 |
+------------+----------+------------------------+----------------+
Notice the first row returns NULL for previous_month_revenue because there’s no preceding row. This is where default values become useful.
The LEAD Function Explained
LEAD works identically to LAG but looks forward instead of backward. The syntax mirrors LAG:
LEAD(column_name, offset, default_value) OVER (
PARTITION BY partition_column
ORDER BY order_column
)
Here’s an employee salary tracking example:
CREATE TABLE salary_history (
employee_id INT,
effective_date DATE,
salary DECIMAL(10,2)
);
INSERT INTO salary_history VALUES
(101, '2023-01-01', 65000.00),
(101, '2023-07-01', 68000.00),
(101, '2024-01-01', 72000.00),
(101, '2024-07-01', 75000.00);
SELECT
employee_id,
effective_date,
salary,
LEAD(effective_date) OVER (ORDER BY effective_date) AS next_increase_date,
LEAD(salary) OVER (ORDER BY effective_date) AS next_salary,
DATEDIFF(
LEAD(effective_date) OVER (ORDER BY effective_date),
effective_date
) AS days_until_next_increase
FROM salary_history
WHERE employee_id = 101;
This shows each salary record alongside when the next increase happens and how many days until that increase—information impossible to get cleanly without LEAD or a self-join.
Mastering Offset and Default Parameters
The offset parameter unlocks more sophisticated comparisons. Want to compare quarterly data? Use offset=3 on monthly records. Need year-over-year analysis? Use offset=12.
CREATE TABLE quarterly_metrics (
quarter_date DATE,
customer_count INT
);
INSERT INTO quarterly_metrics VALUES
('2023-Q1', 1200),
('2023-Q2', 1350),
('2023-Q3', 1425),
('2023-Q4', 1580),
('2024-Q1', 1650);
SELECT
quarter_date,
customer_count,
LAG(customer_count, 1, 0) OVER (ORDER BY quarter_date) AS prev_quarter,
LAG(customer_count, 4, 0) OVER (ORDER BY quarter_date) AS same_quarter_last_year,
customer_count - LAG(customer_count, 4, customer_count) OVER (ORDER BY quarter_date) AS yoy_growth
FROM quarterly_metrics;
The default parameter (third argument) prevents NULL values. Here we use 0 as a default, but you could use the current row’s value, an average, or any sensible fallback. Alternatively, wrap LAG in COALESCE:
SELECT
quarter_date,
customer_count,
COALESCE(LAG(customer_count, 4) OVER (ORDER BY quarter_date), customer_count) AS comparison_value
FROM quarterly_metrics;
Partitioning Data with PARTITION BY
PARTITION BY is where LAG and LEAD become truly powerful. It segments your data into independent groups, resetting the window function for each partition. This prevents comparisons across unrelated entities.
CREATE TABLE product_inventory (
product_id INT,
category VARCHAR(50),
check_date DATE,
stock_level INT
);
INSERT INTO product_inventory VALUES
(1, 'Electronics', '2024-01-01', 150),
(1, 'Electronics', '2024-01-02', 145),
(1, 'Electronics', '2024-01-03', 132),
(2, 'Clothing', '2024-01-01', 300),
(2, 'Clothing', '2024-01-02', 285),
(2, 'Clothing', '2024-01-03', 290);
SELECT
product_id,
category,
check_date,
stock_level,
LAG(stock_level) OVER (
PARTITION BY product_id
ORDER BY check_date
) AS previous_stock,
stock_level - LAG(stock_level) OVER (
PARTITION BY product_id
ORDER BY check_date
) AS daily_change
FROM product_inventory
ORDER BY product_id, check_date;
Without PARTITION BY, the LAG for product 2’s first record would incorrectly reference product 1’s last record. PARTITION BY ensures each product’s inventory is analyzed independently.
Real-World Use Cases
Calculating Percentage Changes:
SELECT
visit_date,
page_views,
LAG(page_views) OVER (ORDER BY visit_date) AS previous_day_views,
ROUND(
((page_views - LAG(page_views) OVER (ORDER BY visit_date)) /
LAG(page_views) OVER (ORDER BY visit_date) * 100),
2
) AS percent_change
FROM daily_traffic
WHERE visit_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY);
Detecting Sequence Gaps:
SELECT
order_id,
LEAD(order_id) OVER (ORDER BY order_id) AS next_order_id,
LEAD(order_id) OVER (ORDER BY order_id) - order_id AS gap
FROM orders
HAVING gap > 1;
This immediately identifies missing order IDs, useful for auditing or detecting data integrity issues.
Session Analysis:
SELECT
user_id,
event_timestamp,
LEAD(event_timestamp) OVER (
PARTITION BY user_id
ORDER BY event_timestamp
) AS next_event,
TIMESTAMPDIFF(
MINUTE,
event_timestamp,
LEAD(event_timestamp) OVER (PARTITION BY user_id ORDER BY event_timestamp)
) AS minutes_to_next_event
FROM user_events
WHERE minutes_to_next_event > 30 OR minutes_to_next_event IS NULL;
This identifies session boundaries by finding gaps exceeding 30 minutes between user actions.
Performance Considerations
Window functions generally outperform equivalent self-joins, but they’re not free. MySQL must sort data according to your ORDER BY clause and maintain partition boundaries. Proper indexing is critical.
Index your ORDER BY columns:
CREATE INDEX idx_sales_date ON monthly_sales(sale_month);
CREATE INDEX idx_inventory_product_date ON product_inventory(product_id, check_date);
Compare execution plans:
-- Using LAG
EXPLAIN SELECT
sale_month,
revenue,
LAG(revenue) OVER (ORDER BY sale_month) AS prev_revenue
FROM monthly_sales;
-- Equivalent self-join
EXPLAIN SELECT
s1.sale_month,
s1.revenue,
s2.revenue AS prev_revenue
FROM monthly_sales s1
LEFT JOIN monthly_sales s2
ON s2.sale_month = DATE_SUB(s1.sale_month, INTERVAL 1 MONTH);
The LAG version typically shows fewer table accesses and simpler execution. However, for very large datasets with complex partitioning, monitor query performance and consider materialized views if these calculations run frequently.
Avoid repeated window definitions. This query recalculates the window twice:
-- Inefficient
SELECT
revenue,
LAG(revenue) OVER (ORDER BY sale_month) AS prev,
revenue - LAG(revenue) OVER (ORDER BY sale_month) AS change
FROM monthly_sales;
Use a CTE or derived table instead:
-- Efficient
WITH lagged AS (
SELECT
sale_month,
revenue,
LAG(revenue) OVER (ORDER BY sale_month) AS prev_revenue
FROM monthly_sales
)
SELECT
sale_month,
revenue,
prev_revenue,
revenue - prev_revenue AS change
FROM lagged;
LAG and LEAD transform complex analytical queries into readable, maintainable SQL. Master these functions and you’ll write cleaner code while often gaining performance improvements over traditional approaches.