SQL - LEAD() and LAG() Functions | Application Architect

Key Insights

LEAD() and LAG() are window functions that access data from subsequent or previous rows without self-joins, reducing query complexity and improving performance by up to 50% compared to traditional approaches
These functions enable row-by-row comparisons essential for trend analysis, calculating deltas, detecting gaps in sequences, and identifying state changes in time-series data
Understanding the PARTITION BY and ORDER BY clauses is critical—they control how data is divided into groups and sequenced, directly affecting which rows LEAD() and LAG() retrieve

Understanding Window Function Basics

LEAD() and LAG() belong to the window function family, operating on a “window” of rows related to the current row. Unlike aggregate functions that collapse multiple rows into one, window functions maintain row-level detail while accessing data from other rows.

LAG() retrieves data from a previous row at a specified offset. LEAD() retrieves data from a subsequent row. Both follow this syntax:

LAG(column_name, offset, default_value) OVER (
    PARTITION BY partition_column
    ORDER BY order_column
)

LEAD(column_name, offset, default_value) OVER (
    PARTITION BY partition_column
    ORDER BY order_column
)

Parameters:

column_name: The column value to retrieve from the offset row
offset: Number of rows backward (LAG) or forward (LEAD). Defaults to 1
default_value: Value returned when offset goes beyond partition boundaries. Defaults to NULL
PARTITION BY: Optional. Divides result set into partitions
ORDER BY: Required. Defines row sequence within each partition

Calculating Period-Over-Period Changes

A common use case involves comparing current values against previous periods to calculate growth rates or deltas.

-- Sales growth analysis
WITH daily_sales AS (
    SELECT 
        sale_date,
        SUM(amount) as total_sales
    FROM orders
    WHERE sale_date >= '2024-01-01'
    GROUP BY sale_date
)
SELECT 
    sale_date,
    total_sales,
    LAG(total_sales, 1) OVER (ORDER BY sale_date) as previous_day_sales,
    total_sales - LAG(total_sales, 1) OVER (ORDER BY sale_date) as sales_change,
    ROUND(
        ((total_sales - LAG(total_sales, 1) OVER (ORDER BY sale_date)) / 
         LAG(total_sales, 1) OVER (ORDER BY sale_date) * 100), 
        2
    ) as percent_change
FROM daily_sales
ORDER BY sale_date;

This query calculates daily sales changes and percentage growth. The LAG() function eliminates the need for a self-join, which would look like this less efficient alternative:

-- Less efficient self-join approach
SELECT 
    t1.sale_date,
    t1.total_sales,
    t2.total_sales as previous_day_sales,
    t1.total_sales - t2.total_sales as sales_change
FROM daily_sales t1
LEFT JOIN daily_sales t2 
    ON t1.sale_date = t2.sale_date + INTERVAL '1 day'
ORDER BY t1.sale_date;

Partitioning Data for Group-Level Analysis

PARTITION BY divides data into independent groups, resetting the window for each partition. This enables comparative analysis across categories.

-- Product sales comparison by category
SELECT 
    category,
    product_name,
    sale_date,
    revenue,
    LAG(revenue, 1) OVER (
        PARTITION BY category 
        ORDER BY sale_date
    ) as prev_revenue,
    LEAD(revenue, 1) OVER (
        PARTITION BY category 
        ORDER BY sale_date
    ) as next_revenue,
    revenue - LAG(revenue, 1) OVER (
        PARTITION BY category 
        ORDER BY sale_date
    ) as revenue_change
FROM product_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31'
ORDER BY category, sale_date;

Each category operates independently. When LAG() reaches the first row in a partition or LEAD() reaches the last row, they return NULL (or the specified default value).

Detecting Sequence Gaps and Missing Values

LEAD() excels at identifying gaps in sequential data like missing order numbers, date ranges, or ID sequences.

-- Find gaps in sequential order IDs
WITH order_sequence AS (
    SELECT 
        order_id,
        order_date,
        LEAD(order_id, 1) OVER (ORDER BY order_id) as next_order_id
    FROM orders
)
SELECT 
    order_id,
    next_order_id,
    next_order_id - order_id - 1 as missing_count,
    order_id + 1 as gap_start,
    next_order_id - 1 as gap_end
FROM order_sequence
WHERE next_order_id - order_id > 1
ORDER BY order_id;

For temporal gaps:

-- Identify missing dates in time series
WITH RECURSIVE date_series AS (
    SELECT DATE '2024-01-01' as check_date
    UNION ALL
    SELECT check_date + INTERVAL '1 day'
    FROM date_series
    WHERE check_date < DATE '2024-12-31'
),
actual_dates AS (
    SELECT DISTINCT transaction_date
    FROM transactions
    WHERE transaction_date >= '2024-01-01'
)
SELECT 
    ds.check_date as missing_date,
    LAG(ad.transaction_date, 1) OVER (ORDER BY ds.check_date) as last_transaction,
    LEAD(ad.transaction_date, 1) OVER (ORDER BY ds.check_date) as next_transaction
FROM date_series ds
LEFT JOIN actual_dates ad ON ds.check_date = ad.transaction_date
WHERE ad.transaction_date IS NULL;

State Change Detection

Tracking when values change states—like status transitions, price changes, or threshold crossings—becomes straightforward with LAG().

-- Detect customer status changes
SELECT 
    customer_id,
    status_date,
    status,
    LAG(status, 1) OVER (
        PARTITION BY customer_id 
        ORDER BY status_date
    ) as previous_status,
    CASE 
        WHEN status != LAG(status, 1) OVER (
            PARTITION BY customer_id 
            ORDER BY status_date
        ) THEN 'Status Changed'
        ELSE 'No Change'
    END as change_indicator,
    status_date - LAG(status_date, 1) OVER (
        PARTITION BY customer_id 
        ORDER BY status_date
    ) as days_in_previous_status
FROM customer_status_history
WHERE status_date >= '2024-01-01'
ORDER BY customer_id, status_date;

Multi-Row Offsets and Moving Calculations

The offset parameter accepts any positive integer, enabling comparisons across multiple rows.

-- Compare current week with 4 weeks ago (monthly comparison)
SELECT 
    week_start_date,
    total_orders,
    LAG(total_orders, 1) OVER (ORDER BY week_start_date) as last_week,
    LAG(total_orders, 4) OVER (ORDER BY week_start_date) as four_weeks_ago,
    total_orders - LAG(total_orders, 4) OVER (ORDER BY week_start_date) as monthly_change,
    ROUND(
        (total_orders::NUMERIC / NULLIF(LAG(total_orders, 4) OVER (ORDER BY week_start_date), 0) - 1) * 100,
        2
    ) as monthly_growth_pct
FROM weekly_order_summary
ORDER BY week_start_date;

Performance Considerations

Window functions generally outperform self-joins, but optimization matters at scale:

-- Create index to support ORDER BY in window function
CREATE INDEX idx_orders_date ON orders(order_date);

-- For partitioned queries
CREATE INDEX idx_sales_category_date ON sales(category, sale_date);

-- Limit window scope when possible
SELECT 
    product_id,
    sale_date,
    quantity,
    LAG(quantity, 1) OVER (
        PARTITION BY product_id 
        ORDER BY sale_date 
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) as prev_quantity
FROM sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '90 days';

The ROWS BETWEEN clause restricts the window frame, potentially reducing memory usage for large datasets.

Handling NULL Values with Default Parameters

The third parameter provides fallback values when offsets exceed partition boundaries:

-- Use default values to avoid NULL complications
SELECT 
    transaction_id,
    transaction_date,
    amount,
    LAG(amount, 1, 0) OVER (ORDER BY transaction_date) as previous_amount,
    amount - LAG(amount, 1, 0) OVER (ORDER BY transaction_date) as amount_diff,
    LEAD(amount, 1, amount) OVER (ORDER BY transaction_date) as next_amount_or_current
FROM transactions
ORDER BY transaction_date;

This approach prevents NULL propagation in arithmetic operations and provides sensible defaults for edge cases.

Combining LEAD and LAG for Range Analysis

Using both functions together identifies values between previous and next data points:

-- Find anomalies where current value differs significantly from neighbors
SELECT 
    sensor_id,
    reading_time,
    temperature,
    LAG(temperature, 1) OVER (
        PARTITION BY sensor_id 
        ORDER BY reading_time
    ) as prev_temp,
    LEAD(temperature, 1) OVER (
        PARTITION BY sensor_id 
        ORDER BY reading_time
    ) as next_temp,
    ABS(temperature - LAG(temperature, 1) OVER (
        PARTITION BY sensor_id ORDER BY reading_time
    )) as prev_diff,
    ABS(temperature - LEAD(temperature, 1) OVER (
        PARTITION BY sensor_id ORDER BY reading_time
    )) as next_diff
FROM sensor_readings
WHERE reading_time >= NOW() - INTERVAL '24 hours'
    AND (
        ABS(temperature - LAG(temperature, 1) OVER (
            PARTITION BY sensor_id ORDER BY reading_time
        )) > 10
        OR ABS(temperature - LEAD(temperature, 1) OVER (
            PARTITION BY sensor_id ORDER BY reading_time
        )) > 10
    );

This pattern identifies sensor readings that spike dramatically compared to surrounding values, useful for anomaly detection and data quality validation.