SQL - AVG() as Window Function (Moving Average)

Key Insights

• Window functions with AVG() calculate moving averages without collapsing rows, unlike GROUP BY aggregates that reduce result sets
• The ROWS and RANGE clauses define frame boundaries for calculating averages over specific subsets of ordered data
• Moving averages smooth out data fluctuations and reveal trends in time-series data, essential for financial analysis, inventory management, and performance monitoring

Understanding AVG() as a Window Function

The AVG() window function computes averages across a sliding window of rows while preserving individual row details. Unlike traditional GROUP BY aggregates, window functions don’t collapse your result set—each row remains intact with its calculated average.

Basic syntax:

AVG(expression) OVER (
    [PARTITION BY partition_expression]
    [ORDER BY sort_expression]
    [ROWS or RANGE frame_specification]
)

The OVER clause transforms AVG() from a simple aggregate into a window function. Without PARTITION BY, it calculates across the entire result set. ORDER BY determines row sequence, while frame specifications control which rows participate in each calculation.
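As a quick sketch (using the daily_sales table created in the next section), the same AVG() call behaves quite differently depending on which OVER parts you include:

```sql
SELECT
    sale_date,
    revenue,
    AVG(revenue) OVER ()                   AS overall_avg,  -- one value for the whole result set
    AVG(revenue) OVER (ORDER BY sale_date) AS running_avg,  -- default frame: first row to current row
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    )                                      AS framed_avg    -- explicit 3-row sliding frame
FROM daily_sales;
```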

Simple Moving Average

Calculate a 3-day moving average of daily sales:

CREATE TABLE daily_sales (
    sale_date DATE,
    revenue DECIMAL(10,2)
);

INSERT INTO daily_sales VALUES
    ('2024-01-01', 1200.00),
    ('2024-01-02', 1500.00),
    ('2024-01-03', 1800.00),
    ('2024-01-04', 1300.00),
    ('2024-01-05', 1600.00),
    ('2024-01-06', 2000.00),
    ('2024-01-07', 1700.00);

SELECT 
    sale_date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3day
FROM daily_sales
ORDER BY sale_date;

Results:

sale_date   | revenue | moving_avg_3day
------------|---------|----------------
2024-01-01  | 1200.00 | 1200.00
2024-01-02  | 1500.00 | 1350.00
2024-01-03  | 1800.00 | 1500.00
2024-01-04  | 1300.00 | 1533.33
2024-01-05  | 1600.00 | 1566.67
2024-01-06  | 2000.00 | 1633.33
2024-01-07  | 1700.00 | 1766.67

The first row averages only itself (1200). The second averages two rows (1200, 1500). From the third row onward, each average includes exactly three rows.

ROWS vs RANGE Frame Specifications

ROWS operates on physical row positions, while RANGE handles logical value ranges. This distinction matters when dealing with duplicate values or gaps in your data.

CREATE TABLE stock_prices (
    price_date DATE,
    symbol VARCHAR(10),
    closing_price DECIMAL(8,2)
);

INSERT INTO stock_prices VALUES
    ('2024-01-01', 'AAPL', 150.00),
    ('2024-01-01', 'AAPL', 150.00), -- duplicate date
    ('2024-01-02', 'AAPL', 152.00),
    ('2024-01-03', 'AAPL', 151.00),
    ('2024-01-04', 'AAPL', 153.00);

-- ROWS: counts physical rows
SELECT 
    price_date,
    closing_price,
    AVG(closing_price) OVER (
        ORDER BY price_date
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS rows_avg
FROM stock_prices
WHERE symbol = 'AAPL';

-- RANGE: groups by logical value
SELECT 
    price_date,
    closing_price,
    AVG(closing_price) OVER (
        ORDER BY price_date
        RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
    ) AS range_avg
FROM stock_prices
WHERE symbol = 'AAPL';

ROWS treats each physical row separately, even duplicates. RANGE groups rows with identical ORDER BY values together, treating them as peers. Note that RANGE with an interval offset, as in the query above, is not universally supported: SQL Server, for example, restricts RANGE boundaries to UNBOUNDED and CURRENT ROW. Check your database's documentation before relying on it.
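Because both duplicate 2024-01-01 rows carry the same 150.00 price, the two averages above happen to coincide numerically. Counting the rows in each frame makes the peer behavior visible directly:

```sql
SELECT
    price_date,
    closing_price,
    COUNT(*) OVER (
        ORDER BY price_date
        ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_in_frame,   -- 1, 2, 3, 4, 5: stops at the current physical row
    COUNT(*) OVER (
        ORDER BY price_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS range_in_frame   -- 2, 2, 3, 4, 5: both 2024-01-01 peers counted together
FROM stock_prices
WHERE symbol = 'AAPL';
```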

Partitioned Moving Averages

Analyze multiple products independently using PARTITION BY:

CREATE TABLE product_metrics (
    metric_date DATE,
    product_id INT,
    units_sold INT
);

INSERT INTO product_metrics VALUES
    ('2024-01-01', 101, 50),
    ('2024-01-01', 102, 75),
    ('2024-01-02', 101, 60),
    ('2024-01-02', 102, 80),
    ('2024-01-03', 101, 55),
    ('2024-01-03', 102, 70),
    ('2024-01-04', 101, 65),
    ('2024-01-04', 102, 85);

SELECT 
    metric_date,
    product_id,
    units_sold,
    AVG(units_sold) OVER (
        PARTITION BY product_id
        ORDER BY metric_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3day,
    AVG(units_sold) OVER (
        PARTITION BY product_id
        ORDER BY metric_date
    ) AS cumulative_avg
FROM product_metrics
ORDER BY product_id, metric_date;

Each partition calculates independently. Product 101’s moving average never includes Product 102’s data. The cumulative average expands from the first row to the current row within each partition.
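The cumulative_avg column above relies on an implicit default: when ORDER BY is present but no frame is specified, the SQL standard frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Writing it out explicitly produces the same result and makes the intent obvious:

```sql
SELECT
    metric_date,
    product_id,
    units_sold,
    AVG(units_sold) OVER (
        PARTITION BY product_id
        ORDER BY metric_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- the implicit default frame
    ) AS cumulative_avg
FROM product_metrics
ORDER BY product_id, metric_date;
```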

Centered Moving Average

Center the window around the current row for balanced smoothing:

SELECT 
    sale_date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS centered_avg
FROM daily_sales
ORDER BY sale_date;

This creates a 3-row window: one before, current, and one after. The first and last rows have incomplete windows (2 rows instead of 3), which may require special handling depending on your requirements.
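One common way to handle those incomplete edge windows, sketched here, is to count the rows actually present in the frame and report the average only when the window is full:

```sql
SELECT
    sale_date,
    revenue,
    CASE
        WHEN COUNT(*) OVER (
            ORDER BY sale_date
            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        ) = 3
        THEN AVG(revenue) OVER (
            ORDER BY sale_date
            ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        )
    END AS centered_avg  -- NULL on the first and last rows, where the window is incomplete
FROM daily_sales
ORDER BY sale_date;
```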

Weighted Moving Average Alternative

While AVG() provides equal weighting, you can simulate weighted averages using SUM() and window functions:

SELECT 
    sale_date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS simple_avg,
    (
        SUM(revenue * weight) OVER (
            ORDER BY sale_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) / 
        SUM(weight) OVER (
            ORDER BY sale_date
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        )
    ) AS weighted_avg
FROM (
    SELECT 
        sale_date,
        revenue,
        ROW_NUMBER() OVER (ORDER BY sale_date) AS weight  -- later dates get larger weights
    FROM daily_sales
) weighted_data
ORDER BY sale_date;

This applies higher weights to more recent values, making the average more responsive to recent changes.

Handling NULL Values

AVG() ignores NULL values in its calculations:

CREATE TABLE incomplete_data (
    data_date DATE,
    measurement DECIMAL(8,2)
);

INSERT INTO incomplete_data VALUES
    ('2024-01-01', 100.00),
    ('2024-01-02', NULL),
    ('2024-01-03', 120.00),
    ('2024-01-04', 110.00);

SELECT 
    data_date,
    measurement,
    AVG(measurement) OVER (
        ORDER BY data_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg,
    COUNT(measurement) OVER (
        ORDER BY data_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS non_null_count
FROM incomplete_data
ORDER BY data_date;

The NULL on 2024-01-02 doesn’t contribute to averages. The moving average for 2024-01-03 uses only two values (100.00 and 120.00), not three.
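If you instead want gaps to count as zero rather than being skipped, wrap the column in COALESCE before averaging. This is a deliberate modeling choice, not a default, and it lowers the average wherever data is missing:

```sql
SELECT
    data_date,
    measurement,
    AVG(COALESCE(measurement, 0)) OVER (
        ORDER BY data_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_nulls_as_zero  -- divides by every row in the frame, NULL treated as 0
FROM incomplete_data
ORDER BY data_date;
```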

Performance Considerations

Window functions can be resource-intensive on large datasets. Optimize with these techniques:

-- Create index on ORDER BY columns
CREATE INDEX idx_sales_date ON daily_sales(sale_date);

-- Materialize results for frequently accessed moving averages
CREATE MATERIALIZED VIEW sales_moving_avg AS
SELECT 
    sale_date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS weekly_avg
FROM daily_sales;

-- Refresh periodically instead of calculating on every query
REFRESH MATERIALIZED VIEW sales_moving_avg;

For real-time dashboards querying massive tables, consider pre-computing moving averages during ETL processes rather than calculating them on-demand.
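A minimal sketch of that pre-computation (the daily_sales_summary table name is illustrative):

```sql
CREATE TABLE daily_sales_summary (
    sale_date  DATE PRIMARY KEY,
    revenue    DECIMAL(10,2),
    weekly_avg DECIMAL(10,2)
);

-- Run once per ETL cycle; dashboards then read the plain table with no window functions
INSERT INTO daily_sales_summary (sale_date, revenue, weekly_avg)
SELECT
    sale_date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    )
FROM daily_sales;
```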

Practical Application: Anomaly Detection

Identify sales spikes by comparing daily revenue to moving averages:

WITH sales_with_avg AS (
    SELECT 
        sale_date,
        revenue,
        AVG(revenue) OVER (
            ORDER BY sale_date
            ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING
        ) AS prev_week_avg
    FROM daily_sales
)
SELECT 
    sale_date,
    revenue,
    prev_week_avg,
    ROUND(((revenue - prev_week_avg) / NULLIF(prev_week_avg, 0) * 100), 2) AS pct_deviation
FROM sales_with_avg
WHERE ABS((revenue - prev_week_avg) / NULLIF(prev_week_avg, 0)) > 0.20
ORDER BY sale_date;

This identifies days where revenue deviates more than 20% from the previous week’s average, flagging potential anomalies for investigation.
