How to Use GROUP BY ROLLUP in PostgreSQL
When building reports that require subtotals and grand totals, you typically face two options: write multiple GROUP BY queries and combine them with UNION ALL, or perform aggregation in application...
Key Insights
- ROLLUP generates hierarchical subtotals and grand totals in a single query, eliminating the need for UNION ALL statements to combine multiple aggregation levels
- The GROUPING() function distinguishes between NULL values from actual data versus NULL values generated by ROLLUP aggregation, enabling proper labeling of total rows
- ROLLUP processes columns right-to-left, creating a hierarchy where each level removes the rightmost column from the grouping, making column order critical to results
Introduction to ROLLUP
When building reports that require subtotals and grand totals, you typically face two options: write multiple GROUP BY queries and combine them with UNION ALL, or perform aggregation in application code. Both approaches are tedious and error-prone. PostgreSQL’s ROLLUP extension to GROUP BY solves this by generating multiple grouping levels in a single query.
ROLLUP creates a hierarchy of aggregations, starting with the most granular grouping and progressively removing columns from right to left until it produces a grand total. This is exactly what you need for financial reports, sales dashboards, and any scenario where you want to show data at multiple aggregation levels simultaneously.
Let’s start with a sample sales table that we’ll use throughout this article:
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
sale_date DATE,
region VARCHAR(50),
category VARCHAR(50),
product VARCHAR(100),
quantity INTEGER,
amount DECIMAL(10, 2)
);
INSERT INTO sales (sale_date, region, category, product, quantity, amount) VALUES
('2024-01-15', 'North', 'Electronics', 'Laptop', 5, 5000.00),
('2024-01-20', 'North', 'Electronics', 'Phone', 10, 8000.00),
('2024-01-25', 'North', 'Furniture', 'Desk', 3, 900.00),
('2024-02-10', 'South', 'Electronics', 'Laptop', 8, 8000.00),
('2024-02-15', 'South', 'Electronics', 'Tablet', 12, 6000.00),
('2024-02-20', 'South', 'Furniture', 'Chair', 20, 2000.00);
Basic ROLLUP Syntax
The syntax for ROLLUP is straightforward—you simply add ROLLUP() around the columns in your GROUP BY clause:
SELECT
region,
SUM(amount) as total_sales,
COUNT(*) as num_transactions
FROM sales
GROUP BY ROLLUP(region)
ORDER BY region NULLS LAST;
This query produces three types of rows:
- Sales grouped by each region
- A grand total row where region is NULL
The output looks like this:
region | total_sales | num_transactions
--------+-------------+------------------
North | 13900.00 | 3
South | 16000.00 | 3
NULL | 29900.00 | 6
The NULL row represents the grand total across all regions. This single query replaces what would otherwise require two separate queries combined with UNION ALL.
Multi-Column ROLLUP
ROLLUP’s real power emerges when you use multiple columns. It creates a hierarchy by progressively removing columns from right to left:
SELECT
region,
category,
product,
SUM(amount) as total_sales
FROM sales
GROUP BY ROLLUP(region, category, product)
ORDER BY region NULLS LAST, category NULLS LAST, product NULLS LAST;
This generates multiple aggregation levels:
- region + category + product (most granular)
- region + category (product becomes NULL)
- region (category and product become NULL)
- Grand total (all columns become NULL)
The output shows the hierarchy clearly:
region | category | product | total_sales
--------+-------------+---------+-------------
North | Electronics | Laptop | 5000.00
North | Electronics | Phone | 8000.00
North | Electronics | NULL | 13000.00
North | Furniture | Desk | 900.00
North | Furniture | NULL | 900.00
North | NULL | NULL | 13900.00
South | Electronics | Laptop | 8000.00
South | Electronics | Tablet | 6000.00
South | Electronics | NULL | 14000.00
South | Furniture | Chair | 2000.00
South | Furniture | NULL | 2000.00
South | NULL | NULL | 16000.00
NULL | NULL | NULL | 29900.00
Notice how ROLLUP creates subtotals for each category within each region, then subtotals for each region, and finally a grand total. The column order matters—changing it changes the hierarchy.
Identifying Grouping Levels with GROUPING()
The NULL values in ROLLUP output create ambiguity: is NULL an actual data value or a placeholder for “all values”? The GROUPING() function solves this by returning 1 when a column is aggregated away by ROLLUP, and 0 when it’s part of the grouping:
SELECT
CASE
WHEN GROUPING(region) = 1 THEN 'GRAND TOTAL'
ELSE region
END as region,
CASE
WHEN GROUPING(category) = 1 AND GROUPING(region) = 0 THEN 'Region Total'
WHEN GROUPING(category) = 0 THEN category
ELSE NULL
END as category,
product,
SUM(amount) as total_sales,
GROUPING(region) as region_grouped,
GROUPING(category) as category_grouped,
GROUPING(product) as product_grouped
FROM sales
GROUP BY ROLLUP(region, category, product)
ORDER BY region NULLS LAST, category NULLS LAST, product NULLS LAST;
This query adds descriptive labels and shows the GROUPING() values for each column. The GROUPING() function is essential for creating readable reports where you want to label subtotal and total rows differently from detail rows.
You can also use GROUPING() in WHERE clauses to filter for specific aggregation levels:
-- Only show region-level subtotals
SELECT
region,
SUM(amount) as total_sales
FROM sales
GROUP BY ROLLUP(region, category)
HAVING GROUPING(region) = 0 AND GROUPING(category) = 1;
Practical Use Cases
ROLLUP excels in reporting scenarios. Here’s a complete quarterly sales report that demonstrates real-world usage:
SELECT
CASE
WHEN GROUPING(quarter) = 1 THEN 'TOTAL'
ELSE 'Q' || quarter::TEXT
END as period,
CASE
WHEN GROUPING(region) = 1 AND GROUPING(quarter) = 0 THEN 'All Regions'
WHEN GROUPING(region) = 1 THEN ''
ELSE region
END as region,
CASE
WHEN GROUPING(category) = 1 AND GROUPING(region) = 0 THEN 'All Categories'
WHEN GROUPING(category) = 1 THEN ''
ELSE category
END as category,
TO_CHAR(SUM(amount), '$999,999.99') as total_sales,
SUM(quantity) as units_sold,
TO_CHAR(AVG(amount), '$999,999.99') as avg_transaction
FROM (
SELECT
EXTRACT(QUARTER FROM sale_date) as quarter,
region,
category,
amount,
quantity
FROM sales
) subquery
GROUP BY ROLLUP(quarter, region, category)
ORDER BY quarter NULLS LAST, region NULLS LAST, category NULLS LAST;
This query produces a professional report with properly labeled totals, formatted currency, and multiple aggregation levels. It’s exactly the kind of output you’d want for a dashboard or executive summary.
ROLLUP vs CUBE vs GROUPING SETS
PostgreSQL offers three GROUP BY extensions, each serving different purposes:
- ROLLUP: Creates hierarchical subtotals (right-to-left)
- CUBE: Creates all possible combinations of groupings
- GROUPING SETS: Lets you specify exactly which groupings you want
Here’s a comparison:
-- ROLLUP: Creates hierarchy
SELECT region, category, SUM(amount) as total
FROM sales
GROUP BY ROLLUP(region, category);
-- Produces: (region, category), (region), ()
-- CUBE: All combinations
SELECT region, category, SUM(amount) as total
FROM sales
GROUP BY CUBE(region, category);
-- Produces: (region, category), (region), (category), ()
-- GROUPING SETS: Explicit control
SELECT region, category, SUM(amount) as total
FROM sales
GROUP BY GROUPING SETS ((region, category), (region), ());
-- Produces: Only the specified groupings
Use ROLLUP when you have a natural hierarchy (time periods, geographic regions, organizational structure). Use CUBE when you need to analyze data from all possible angles. Use GROUPING SETS when you want specific combinations without the full CUBE overhead.
Performance Considerations and Best Practices
ROLLUP is generally more efficient than multiple queries with UNION ALL because PostgreSQL can compute all aggregation levels in a single pass. However, you should still consider indexing:
-- Create indexes on ROLLUP columns
CREATE INDEX idx_sales_region_category_product
ON sales(region, category, product);
-- Include aggregated columns for covering indexes
CREATE INDEX idx_sales_rollup_covering
ON sales(region, category, product)
INCLUDE (amount, quantity);
Compare the performance:
-- ROLLUP approach
EXPLAIN ANALYZE
SELECT region, category, SUM(amount)
FROM sales
GROUP BY ROLLUP(region, category);
-- Traditional approach with UNION ALL
EXPLAIN ANALYZE
SELECT region, category, SUM(amount)
FROM sales
GROUP BY region, category
UNION ALL
SELECT region, NULL, SUM(amount)
FROM sales
GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(amount)
FROM sales;
In most cases, ROLLUP shows better performance because it scans the table once rather than multiple times. The query planner can also optimize ROLLUP queries more effectively.
Key best practices:
- Order ROLLUP columns from least granular to most granular (left to right) to match your reporting hierarchy
- Always use GROUPING() to distinguish NULL values in production queries
- Consider materialized views for frequently-run ROLLUP queries on large datasets
- Use partial ROLLUP (e.g.,
GROUP BY region, ROLLUP(category, product)) when you don’t need all hierarchy levels - Add appropriate indexes on ROLLUP columns in the same order
ROLLUP transforms complex reporting queries into simple, maintainable SQL. Once you understand its hierarchical nature and the GROUPING() function, you’ll find countless opportunities to simplify your aggregation logic.