How to Use GROUP BY CUBE in PostgreSQL

PostgreSQL's CUBE extension to GROUP BY solves a common reporting problem: generating aggregates across multiple dimensions simultaneously. When you need sales totals by region, by product, by both...

Key Insights

  • CUBE generates all possible grouping combinations from your specified columns, creating a multidimensional analysis in a single query that would otherwise require multiple GROUP BY statements and UNION operations.
  • The GROUPING() and GROUPING_ID() functions are essential for distinguishing between actual NULL values in your data and NULL values that represent subtotal rows generated by CUBE.
  • While powerful for analytical queries, CUBE can produce exponentially growing result sets (2^n combinations) and should be used strategically with proper indexing and filtering on large datasets.

Understanding CUBE and Why It Matters

PostgreSQL’s CUBE extension to GROUP BY solves a common reporting problem: generating aggregates across multiple dimensions simultaneously. When you need sales totals by region, by product, by both region and product, and an overall total, you’d typically write four separate queries and combine them with UNION. CUBE does this in one pass.

Here’s the difference in action:

-- Traditional approach: multiple queries
SELECT region, product, SUM(amount) as total
FROM sales
GROUP BY region, product
UNION ALL
SELECT region, NULL, SUM(amount)
FROM sales
GROUP BY region
UNION ALL
SELECT NULL, product, SUM(amount)
FROM sales
GROUP BY product
UNION ALL
SELECT NULL, NULL, SUM(amount)
FROM sales;

-- CUBE approach: single query
SELECT region, product, SUM(amount) as total
FROM sales
GROUP BY CUBE(region, product);

Both produce identical results, but CUBE is cleaner, faster, and easier to maintain. The database engine optimizes the single CUBE query better than multiple UNION operations.

Basic CUBE Syntax and Mechanics

CUBE generates the power set of all grouping combinations. With two columns, you get 2² = 4 combinations. Three columns yield 2³ = 8 combinations. The formula is always 2^n where n is the number of columns in your CUBE clause.

Let’s build a practical example with sample data:

CREATE TABLE sales (
    region VARCHAR(50),
    product VARCHAR(50),
    quarter INT,
    amount DECIMAL(10,2)
);

INSERT INTO sales VALUES
    ('North', 'Widget', 1, 1000),
    ('North', 'Widget', 2, 1500),
    ('North', 'Gadget', 1, 2000),
    ('South', 'Widget', 1, 800),
    ('South', 'Gadget', 1, 1200);

-- Basic CUBE with two dimensions
SELECT 
    region,
    product,
    SUM(amount) as total_sales
FROM sales
GROUP BY CUBE(region, product)
ORDER BY region NULLS LAST, product NULLS LAST;

Results show all combinations:

  • Specific region + product combinations
  • Totals per region (product = NULL)
  • Totals per product (region = NULL)
  • Grand total (both NULL)

The challenge: how do you distinguish between NULL values from CUBE and actual NULLs in your data? Use the GROUPING() function:

SELECT 
    region,
    product,
    SUM(amount) as total_sales,
    GROUPING(region) as region_grouping,
    GROUPING(product) as product_grouping
FROM sales
GROUP BY CUBE(region, product)
ORDER BY region NULLS LAST, product NULLS LAST;

GROUPING() returns 1 when the column is aggregated (NULL represents a subtotal) and 0 when it’s an actual grouped value. This lets you format output appropriately:

SELECT 
    COALESCE(region, 'All Regions') as region,
    COALESCE(product, 'All Products') as product,
    SUM(amount) as total_sales
FROM sales
GROUP BY CUBE(region, product);

Practical Use Cases for CUBE

CUBE excels in business intelligence scenarios where stakeholders need to slice data across multiple dimensions. Here’s a realistic sales analysis:

CREATE TABLE quarterly_sales (
    year INT,
    quarter INT,
    category VARCHAR(50),
    product VARCHAR(50),
    revenue DECIMAL(12,2)
);

-- Multi-dimensional revenue analysis
SELECT 
    COALESCE(year::TEXT, 'All Years') as year,
    COALESCE(quarter::TEXT, 'All Quarters') as quarter,
    COALESCE(category, 'All Categories') as category,
    SUM(revenue) as total_revenue,
    COUNT(*) as transaction_count,
    ROUND(AVG(revenue), 2) as avg_revenue
FROM quarterly_sales
GROUP BY CUBE(year, quarter, category)
HAVING SUM(revenue) > 10000  -- Only show significant segments
ORDER BY 
    year NULLS LAST,
    quarter NULLS LAST,
    category NULLS LAST;

This single query provides:

  • Revenue by year/quarter/category combinations
  • Annual totals by category
  • Quarterly totals across all categories
  • Category totals across all time periods
  • Complete grand totals

For financial reporting, CUBE simplifies variance analysis:

SELECT 
    COALESCE(department, 'Total') as department,
    COALESCE(expense_type, 'All Types') as expense_type,
    SUM(CASE WHEN year = 2023 THEN amount ELSE 0 END) as y2023,
    SUM(CASE WHEN year = 2024 THEN amount ELSE 0 END) as y2024,
    SUM(CASE WHEN year = 2024 THEN amount ELSE 0 END) - 
    SUM(CASE WHEN year = 2023 THEN amount ELSE 0 END) as variance
FROM expenses
WHERE year IN (2023, 2024)
GROUP BY CUBE(department, expense_type);

Combining CUBE with Aggregate Functions

CUBE works with any aggregate function. The key is understanding how to filter at different stages:

SELECT 
    region,
    product,
    SUM(amount) as total_sales,
    COUNT(*) as num_transactions,
    ROUND(AVG(amount), 2) as avg_transaction,
    MAX(amount) as largest_sale,
    MIN(amount) as smallest_sale
FROM sales
WHERE amount > 100  -- Pre-aggregation filter
GROUP BY CUBE(region, product)
HAVING SUM(amount) > 1000  -- Post-aggregation filter
ORDER BY total_sales DESC;

The WHERE clause filters rows before grouping. The HAVING clause filters after aggregation. This distinction matters for performance and correctness.

For percentage calculations across dimensions:

WITH sales_cube AS (
    SELECT 
        region,
        product,
        SUM(amount) as total_sales,
        GROUPING(region) as r_grp,
        GROUPING(product) as p_grp
    FROM sales
    GROUP BY CUBE(region, product)
)
SELECT 
    region,
    product,
    total_sales,
    CASE 
        WHEN r_grp = 0 AND p_grp = 0 THEN 
            ROUND(100.0 * total_sales / 
                  (SELECT total_sales FROM sales_cube 
                   WHERE r_grp = 1 AND p_grp = 1), 2)
    END as pct_of_total
FROM sales_cube
WHERE r_grp = 0 AND p_grp = 0;  -- Only detail rows

Advanced CUBE Techniques

Partial CUBE lets you combine regular grouping with CUBE operations:

-- CUBE only some dimensions
SELECT 
    year,  -- Always grouped
    region,
    product,
    SUM(amount) as total_sales
FROM sales
GROUP BY year, CUBE(region, product)
ORDER BY year, region NULLS LAST, product NULLS LAST;

This generates all combinations of region and product for each year separately, reducing the result set size while maintaining dimensional analysis.

Use GROUPING_ID() to identify specific subtotal levels with a bitmask:

SELECT 
    region,
    product,
    quarter,
    SUM(amount) as total_sales,
    GROUPING_ID(region, product, quarter) as grouping_level
FROM sales
GROUP BY CUBE(region, product, quarter)
ORDER BY grouping_level, region, product, quarter;

GROUPING_ID() returns a binary representation: 0 (000) means no grouping, 7 (111) means all columns are grouped (grand total). This helps when you need to extract specific subtotal types programmatically.

For performance, index your grouping columns:

CREATE INDEX idx_sales_cube ON sales(region, product, quarter);
ANALYZE sales;

EXPLAIN ANALYZE
SELECT region, product, SUM(amount)
FROM sales
GROUP BY CUBE(region, product);

Common Pitfalls and Best Practices

Pitfall 1: Misinterpreting NULL values

Always use GROUPING() when NULLs might exist in your source data:

-- Wrong: Can't distinguish between NULL region and subtotal
SELECT region, SUM(amount)
FROM sales
GROUP BY CUBE(region);

-- Right: Clear distinction
SELECT 
    CASE 
        WHEN GROUPING(region) = 1 THEN 'Grand Total'
        WHEN region IS NULL THEN 'Unknown Region'
        ELSE region
    END as region,
    SUM(amount)
FROM sales
GROUP BY CUBE(region);

Pitfall 2: Exponential result growth

CUBE with five columns produces 32 rows per unique combination. On large datasets, this explodes quickly. Consider GROUPING SETS for specific combinations:

-- Instead of full CUBE
GROUP BY CUBE(a, b, c, d, e)  -- 32 combinations

-- Use specific sets
GROUP BY GROUPING SETS (
    (a, b, c),
    (a, b),
    (a),
    ()
);  -- Only 4 combinations

Pitfall 3: Performance without proper filtering

Always filter before aggregation when possible:

-- Good: Filter first
SELECT region, SUM(amount)
FROM sales
WHERE date >= '2024-01-01'  -- Reduces rows early
GROUP BY CUBE(region);

-- Less efficient
SELECT region, SUM(amount)
FROM sales
GROUP BY CUBE(region)
HAVING SUM(CASE WHEN date >= '2024-01-01' THEN amount ELSE 0 END) > 0;

CUBE is powerful for analytical queries but overkill for simple aggregations. Use standard GROUP BY when you don’t need multidimensional analysis. Reserve CUBE for dashboards, reports, and BI tools where users need to pivot data across multiple dimensions. With proper indexing and thoughtful query design, CUBE transforms complex reporting requirements into maintainable, performant SQL.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.