How to Use GROUPING SETS in PostgreSQL

When building reporting queries, you often need aggregations at multiple levels: product-level sales, regional totals, and a grand total. The traditional approach requires writing separate GROUP BY...

Key Insights

  • GROUPING SETS eliminates the need for multiple queries or UNION ALL statements when generating subtotals and grand totals, executing a single table scan instead of multiple passes over your data.
  • ROLLUP and CUBE are syntactic shortcuts for common GROUPING SETS patterns—ROLLUP creates hierarchical subtotals while CUBE generates all possible dimensional combinations.
  • The GROUPING() function distinguishes aggregate rows from detail rows, enabling you to replace NULL values in subtotal rows with meaningful labels like “All Regions” or “Total”.

Introduction to GROUPING SETS

When building reporting queries, you often need aggregations at multiple levels: product-level sales, regional totals, and a grand total. The traditional approach requires writing separate GROUP BY queries and combining them with UNION ALL, which is verbose and inefficient—PostgreSQL scans your table multiple times.

GROUPING SETS solves this by allowing multiple grouping dimensions in a single query. Instead of three table scans, you get one. Instead of managing multiple queries, you write one clean statement.

Here’s the difference:

-- Traditional approach: Multiple queries with UNION ALL
SELECT product, region, SUM(revenue) as total_revenue
FROM sales
GROUP BY product, region

UNION ALL

SELECT product, NULL, SUM(revenue)
FROM sales
GROUP BY product

UNION ALL

SELECT NULL, NULL, SUM(revenue)
FROM sales;

Compare that to the GROUPING SETS approach:

-- GROUPING SETS: Single query, one table scan
SELECT product, region, SUM(revenue) as total_revenue
FROM sales
GROUP BY GROUPING SETS (
    (product, region),
    (product),
    ()
);

The second query is cleaner, faster, and easier to maintain. PostgreSQL optimizes it internally, performing calculations for all grouping levels in a single pass.

Basic GROUPING SETS Syntax

The syntax is straightforward: list each grouping dimension as a tuple within GROUPING SETS. An empty tuple () represents the grand total with no grouping columns.

Let’s work with a realistic sales table:

CREATE TABLE sales (
    sale_id SERIAL PRIMARY KEY,
    product VARCHAR(50),
    region VARCHAR(50),
    sale_date DATE,
    revenue DECIMAL(10,2)
);

INSERT INTO sales (product, region, sale_date, revenue) VALUES
    ('Laptop', 'East', '2024-01-15', 1200.00),
    ('Laptop', 'West', '2024-01-20', 1150.00),
    ('Mouse', 'East', '2024-01-18', 25.00),
    ('Mouse', 'West', '2024-01-22', 30.00),
    ('Keyboard', 'East', '2024-02-10', 75.00),
    ('Keyboard', 'West', '2024-02-12', 80.00);

Now generate multiple aggregation levels:

SELECT 
    product,
    region,
    SUM(revenue) as total_revenue,
    COUNT(*) as sale_count
FROM sales
GROUP BY GROUPING SETS (
    (product, region),  -- Sales by product AND region
    (product),          -- Sales by product only
    (region),           -- Sales by region only
    ()                  -- Grand total
)
ORDER BY product NULLS LAST, region NULLS LAST;

This produces rows at four different aggregation levels in one result set. Notice how NULL appears in columns that aren’t part of a particular grouping—this is how PostgreSQL indicates an aggregated dimension.

ROLLUP and CUBE Shortcuts

Writing out every combination in GROUPING SETS becomes tedious for hierarchical data. PostgreSQL provides two shortcuts: ROLLUP and CUBE.

ROLLUP creates a hierarchy of groupings, moving from detailed to summary. It’s perfect for time-based hierarchies or organizational structures:

SELECT 
    EXTRACT(YEAR FROM sale_date) as year,
    EXTRACT(QUARTER FROM sale_date) as quarter,
    EXTRACT(MONTH FROM sale_date) as month,
    SUM(revenue) as total_revenue
FROM sales
GROUP BY ROLLUP (
    EXTRACT(YEAR FROM sale_date),
    EXTRACT(QUARTER FROM sale_date),
    EXTRACT(MONTH FROM sale_date)
)
ORDER BY year, quarter, month;

ROLLUP with three columns generates these groupings:

  • (year, quarter, month)
  • (year, quarter)
  • (year)
  • () — grand total

Each level rolls up into the next, creating subtotals at each hierarchy level.

CUBE generates all possible combinations of the specified columns:

SELECT 
    product,
    region,
    EXTRACT(YEAR FROM sale_date) as year,
    SUM(revenue) as total_revenue
FROM sales
GROUP BY CUBE (product, region, EXTRACT(YEAR FROM sale_date))
ORDER BY product NULLS LAST, region NULLS LAST, year NULLS LAST;

CUBE with three columns produces 8 groupings (2³): every possible combination including the grand total. This is powerful for multi-dimensional analysis where you need to slice data across any combination of dimensions.

The GROUPING() Function

NULL values in result columns create ambiguity: is this NULL because the data was NULL, or because this column is aggregated? The GROUPING() function resolves this.

GROUPING() returns 1 if a column is aggregated in the current row, 0 if it’s part of the grouping:

SELECT 
    CASE 
        WHEN GROUPING(product) = 1 THEN 'All Products'
        ELSE product 
    END as product,
    CASE 
        WHEN GROUPING(region) = 1 THEN 'All Regions'
        ELSE region 
    END as region,
    SUM(revenue) as total_revenue
FROM sales
GROUP BY GROUPING SETS (
    (product, region),
    (product),
    (region),
    ()
)
ORDER BY 
    GROUPING(product),
    GROUPING(region),
    product,
    region;

This transforms confusing NULL values into clear labels. The grand total row shows “All Products” and “All Regions”, making the report immediately readable.

For multiple columns, use GROUPING_ID(), which returns a bitmask:

SELECT 
    product,
    region,
    SUM(revenue) as total_revenue,
    GROUPING_ID(product, region) as grouping_level
FROM sales
GROUP BY GROUPING SETS ((product, region), (product), (region), ())
ORDER BY grouping_level, product, region;

The grouping_level helps you categorize rows: 0 for detail, 1-2 for subtotals, 3 for grand total.

Practical Use Cases

GROUPING SETS shines in reporting scenarios. Here’s a complete sales dashboard query:

SELECT 
    CASE 
        WHEN GROUPING(product) = 1 THEN '*** ALL PRODUCTS ***'
        ELSE product 
    END as product,
    CASE 
        WHEN GROUPING(region) = 1 THEN '*** ALL REGIONS ***'
        ELSE region 
    END as region,
    COUNT(*) as transaction_count,
    SUM(revenue) as total_revenue,
    ROUND(AVG(revenue), 2) as avg_revenue,
    MIN(revenue) as min_revenue,
    MAX(revenue) as max_revenue
FROM sales
GROUP BY GROUPING SETS (
    (product, region),  -- Detail level
    (product),          -- Product subtotals
    (region),           -- Regional subtotals
    ()                  -- Grand total
)
ORDER BY 
    GROUPING(product),
    GROUPING(region),
    product,
    region;

This single query replaces what would traditionally require four separate queries with UNION ALL. You get product-region details, product totals, regional totals, and a grand total—all optimally calculated in one pass.

Another common use case is replacing inefficient UNION ALL queries in existing codebases. If you find queries combining multiple GROUP BY statements, GROUPING SETS is almost always faster and cleaner.

Performance Considerations

GROUPING SETS typically outperforms UNION ALL for multiple aggregation levels because it scans the table once. Here’s how to verify:

EXPLAIN ANALYZE
SELECT product, region, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((product, region), (product), ());

-- Compare with:
EXPLAIN ANALYZE
SELECT product, region, SUM(revenue) FROM sales GROUP BY product, region
UNION ALL
SELECT product, NULL, SUM(revenue) FROM sales GROUP BY product
UNION ALL
SELECT NULL, NULL, SUM(revenue) FROM sales;

The GROUPING SETS query shows a single sequential scan or index scan, while UNION ALL shows multiple scans with an Append node combining results.

For optimal performance with large tables:

  1. Index the grouping columns: Create indexes on columns used in GROUPING SETS
  2. Use partial indexes for filtered reports: If you frequently filter by date ranges, create partial indexes
  3. Consider materialized views for expensive recurring reports
CREATE INDEX idx_sales_product_region ON sales(product, region);
CREATE INDEX idx_sales_date ON sales(sale_date);

Common Pitfalls and Best Practices

NULL handling is the biggest source of confusion. Always use GROUPING() to distinguish between NULL data and NULL from aggregation:

-- WRONG: Ambiguous NULLs
SELECT product, region, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((product), (region));

-- RIGHT: Clear labels
SELECT 
    COALESCE(product, 'All Products') as product,
    COALESCE(region, 'All Regions') as region,
    SUM(revenue)
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY GROUPING SETS ((product), (region))
HAVING SUM(revenue) > 100
ORDER BY GROUPING(product), GROUPING(region), SUM(revenue) DESC;

Filtering with WHERE and HAVING works as expected—WHERE filters before grouping, HAVING filters after. Just remember that HAVING applies to all grouping levels.

Ordering results requires thought. Use GROUPING() in ORDER BY to control whether subtotals appear before or after details:

SELECT 
    product,
    SUM(revenue) as total
FROM sales
GROUP BY GROUPING SETS ((product), ())
ORDER BY 
    GROUPING(product) DESC,  -- Grand total first
    total DESC;              -- Then products by revenue

GROUPING SETS transforms complex reporting queries from maintenance nightmares into elegant, performant SQL. Master this feature, and you’ll write cleaner analytics queries that execute faster and scale better than traditional approaches.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.