SQL - GROUPING SETS | Application Architect

Key Insights

GROUPING SETS let you compute multiple aggregation levels in a single query, eliminating the need for verbose UNION ALL statements and multiple table scans.
The GROUPING() function is essential for distinguishing between actual NULL values in your data and NULLs that represent “all values” in subtotal rows.
ROLLUP and CUBE are convenient shortcuts for common GROUPING SETS patterns—use ROLLUP for hierarchical drill-downs and CUBE when you need every possible combination.

Introduction to GROUPING SETS

GROUPING SETS solve a common analytical problem: you need aggregations at multiple levels in a single result set. Think sales totals by region, by product, by region and product combined, and a grand total—all in one query.

Before GROUPING SETS, you’d write multiple queries and glue them together with UNION ALL. This approach is verbose, error-prone, and forces the database to scan your tables multiple times. GROUPING SETS collapse all these aggregations into a single, declarative statement that the query optimizer can handle efficiently.

Most major databases support GROUPING SETS: PostgreSQL, SQL Server, Oracle, DB2, and recent versions of MySQL (8.0+). SQLite remains the notable exception. If you’re writing analytical queries or building reporting systems, GROUPING SETS should be in your toolkit.

The Problem with Multiple GROUP BY Queries

Let’s start with a typical scenario. You have a sales table and need a report showing totals at different granularities: by region, by product category, by both, and an overall grand total.

Here’s the traditional approach:

-- Sales by region and product
SELECT 
    region,
    product_category,
    SUM(amount) AS total_sales,
    COUNT(*) AS transaction_count
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY region, product_category

UNION ALL

-- Sales by region only
SELECT 
    region,
    NULL AS product_category,
    SUM(amount) AS total_sales,
    COUNT(*) AS transaction_count
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY region

UNION ALL

-- Sales by product only
SELECT 
    NULL AS region,
    product_category,
    SUM(amount) AS total_sales,
    COUNT(*) AS transaction_count
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY product_category

UNION ALL

-- Grand total
SELECT 
    NULL AS region,
    NULL AS product_category,
    SUM(amount) AS total_sales,
    COUNT(*) AS transaction_count
FROM sales
WHERE sale_date >= '2024-01-01';

This works, but it’s painful. You’re repeating the same table reference, the same WHERE clause, and the same aggregate expressions four times. If someone adds a filter condition and forgets to update all four branches, you get inconsistent results. The database also scans the sales table four separate times, which matters when you’re dealing with millions of rows.

GROUPING SETS Syntax and Basic Usage

GROUPING SETS let you specify multiple grouping combinations in a single GROUP BY clause. Each set in parentheses defines one aggregation level:

SELECT 
    region,
    product_category,
    SUM(amount) AS total_sales,
    COUNT(*) AS transaction_count
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY GROUPING SETS (
    (region, product_category),  -- Detail level
    (region),                     -- Subtotal by region
    (product_category),           -- Subtotal by product
    ()                            -- Grand total
);

This single query produces the same result as the four-query UNION ALL version. The empty parentheses () represent the grand total—no grouping columns, just aggregate the entire filtered dataset.

The output might look like this:

region	product_category	total_sales	transaction_count
North	Electronics	45000	150
North	Clothing	32000	280
South	Electronics	38000	120
South	Clothing	41000	310
North	NULL	77000	430
South	NULL	79000	430
NULL	Electronics	83000	270
NULL	Clothing	73000	590
NULL	NULL	156000	860

Notice the NULL values. When a column isn’t part of the current grouping set, it appears as NULL in the result. This brings us to an important problem.

The GROUPING() Function

Those NULLs in the result set are ambiguous. What if your data actually contains NULL values in the region or product_category columns? You can’t tell whether a NULL means “this is a subtotal row” or “this row represents sales with unknown region.”

The GROUPING() function solves this. It returns 1 when the column is aggregated (not part of the current grouping set) and 0 when it’s an actual grouping column:

SELECT 
    CASE 
        WHEN GROUPING(region) = 1 THEN 'All Regions'
        ELSE COALESCE(region, 'Unknown')
    END AS region_label,
    CASE 
        WHEN GROUPING(product_category) = 1 THEN 'All Products'
        ELSE COALESCE(product_category, 'Unknown')
    END AS product_label,
    region,
    product_category,
    GROUPING(region) AS region_grouped,
    GROUPING(product_category) AS product_grouped,
    SUM(amount) AS total_sales
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY GROUPING SETS (
    (region, product_category),
    (region),
    (product_category),
    ()
)
ORDER BY 
    GROUPING(region),
    GROUPING(product_category),
    region,
    product_category;

Now your result set has clean, human-readable labels:

region_label	product_label	total_sales
North	Clothing	32000
North	Electronics	45000
South	Clothing	41000
South	Electronics	38000
North	All Products	77000
South	All Products	79000
All Regions	Clothing	73000
All Regions	Electronics	83000
All Regions	All Products	156000

Some databases also support GROUPING_ID(), which returns a bitmap representing which columns are grouped. This is useful when you need to filter or sort by aggregation level programmatically.

ROLLUP and CUBE as GROUPING SETS Shortcuts

Writing out every grouping combination gets tedious for common patterns. ROLLUP and CUBE provide shortcuts.

ROLLUP produces hierarchical subtotals, removing columns from right to left:

-- These two queries are equivalent
GROUP BY ROLLUP(year, quarter, month)

GROUP BY GROUPING SETS (
    (year, quarter, month),
    (year, quarter),
    (year),
    ()
)

ROLLUP is perfect for time-based hierarchies or organizational structures where you want drill-down totals: year → quarter → month, or company → department → team.

CUBE produces every possible combination of the specified columns:

-- These two queries are equivalent
GROUP BY CUBE(region, product)

GROUP BY GROUPING SETS (
    (region, product),
    (region),
    (product),
    ()
)

With three columns, CUBE generates 8 grouping sets (2³). With four columns, you get 16. Use CUBE when you need a complete cross-tabulation, but be aware that the number of result rows grows exponentially.

You can also combine these with regular GROUP BY columns:

-- Partial CUBE: always group by year, cube region and product
GROUP BY year, CUBE(region, product)

-- Equivalent to:
GROUP BY GROUPING SETS (
    (year, region, product),
    (year, region),
    (year, product),
    (year)
)

Performance Considerations

GROUPING SETS typically outperform equivalent UNION ALL queries because the database can optimize the entire operation as a unit. Most query planners will:

Scan the source table once
Build hash tables or sort the data once
Compute all grouping levels in a single pass

However, performance depends on your specific database and query. PostgreSQL’s planner is particularly good at optimizing GROUPING SETS. SQL Server handles them well but may choose different strategies based on statistics. Always check your execution plan.

Index usage matters. If your WHERE clause filters on indexed columns, that index helps all grouping sets equally. But if you’re grouping by non-indexed columns on a large table, consider whether a covering index would help.

Watch out for cardinality explosion with CUBE on high-cardinality columns. CUBE(customer_id, product_id, date) on a table with thousands of customers, products, and dates will generate an enormous result set. In these cases, stick with GROUPING SETS and specify only the combinations you actually need.

Practical Use Cases

Here’s a complete example for an e-commerce reporting dashboard. This query generates a summary with multiple drill-down levels that a front-end application can render as an expandable table:

WITH sales_summary AS (
    SELECT 
        EXTRACT(YEAR FROM order_date) AS sale_year,
        EXTRACT(QUARTER FROM order_date) AS sale_quarter,
        category_name,
        region,
        SUM(line_total) AS revenue,
        SUM(quantity) AS units_sold,
        COUNT(DISTINCT order_id) AS order_count,
        COUNT(DISTINCT customer_id) AS unique_customers,
        GROUPING(EXTRACT(YEAR FROM order_date)) AS year_agg,
        GROUPING(EXTRACT(QUARTER FROM order_date)) AS quarter_agg,
        GROUPING(category_name) AS category_agg,
        GROUPING(region) AS region_agg
    FROM orders o
    JOIN order_lines ol ON o.order_id = ol.order_id
    JOIN products p ON ol.product_id = p.product_id
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE order_date >= '2023-01-01' 
      AND order_date < '2025-01-01'
      AND order_status = 'completed'
    GROUP BY GROUPING SETS (
        (EXTRACT(YEAR FROM order_date), EXTRACT(QUARTER FROM order_date), category_name, region),
        (EXTRACT(YEAR FROM order_date), EXTRACT(QUARTER FROM order_date), category_name),
        (EXTRACT(YEAR FROM order_date), EXTRACT(QUARTER FROM order_date)),
        (EXTRACT(YEAR FROM order_date), category_name),
        (EXTRACT(YEAR FROM order_date)),
        (category_name, region),
        (category_name),
        ()
    )
)
SELECT 
    COALESCE(sale_year::text, 'All Years') AS year,
    CASE WHEN quarter_agg = 1 THEN 'All' ELSE 'Q' || sale_quarter END AS quarter,
    COALESCE(category_name, 'All Categories') AS category,
    COALESCE(region, 'All Regions') AS region,
    revenue,
    units_sold,
    order_count,
    unique_customers,
    ROUND(revenue / NULLIF(order_count, 0), 2) AS avg_order_value,
    -- Aggregation level for front-end grouping logic
    year_agg + quarter_agg + category_agg + region_agg AS agg_level
FROM sales_summary
ORDER BY 
    year_agg, sale_year,
    quarter_agg, sale_quarter,
    category_agg, category_name,
    region_agg, region;

This single query replaces what would otherwise be eight separate queries. The agg_level column tells the front-end how many dimensions are rolled up, making it easy to render the data as a hierarchical report with expandable sections.

GROUPING SETS are one of those SQL features that feel obscure until you need them—then they’re indispensable. If you’re building reports, dashboards, or any analytical queries that need multiple aggregation levels, stop writing UNION ALL chains and let the database do the work efficiently.