SQL - CUBE with Examples | Application Architect

Key Insights

CUBE generates all possible grouping combinations for specified columns, producing 2^n grouping sets for n columns—making it ideal for multidimensional analysis but potentially expensive for many columns.
Use the GROUPING() function to distinguish between NULL values in your actual data and NULL values that represent aggregated “all” rows in CUBE output.
CUBE is overkill when you only need hierarchical subtotals (use ROLLUP) or specific combinations (use GROUPING SETS)—choose the right tool for your aggregation needs.

Introduction to CUBE

CUBE is a GROUP BY extension that generates subtotals for all possible combinations of columns you specify. If you’ve ever built a pivot table in Excel or created a report that shows totals by region, by product, by both, and a grand total, you’ve done manually what CUBE does automatically.

The operator comes from OLAP (Online Analytical Processing) concepts where analysts need to slice and dice data across multiple dimensions. Instead of writing multiple queries with different GROUP BY clauses and UNION-ing them together, CUBE handles it in a single pass.

Most major databases support CUBE: PostgreSQL, SQL Server, Oracle, and newer versions of MySQL (8.0+). SQLite doesn’t support it natively, so you’ll need workarounds there.

CUBE Syntax and How It Works

The syntax is straightforward—you wrap your grouping columns in the CUBE function:

SELECT 
    column1,
    column2,
    aggregate_function(measure_column)
FROM table_name
GROUP BY CUBE(column1, column2);

The key insight is understanding the output. For n columns in CUBE, you get 2^n grouping sets. With two columns, that’s 4 grouping sets:

Group by both column1 and column2
Group by column1 only (column2 is NULL)
Group by column2 only (column1 is NULL)
Grand total (both columns are NULL)

With three columns, you get 8 grouping sets. With four, you get 16. This exponential growth matters for performance, which we’ll cover later.

Understanding CUBE Output with a Simple Example

Let’s work through a concrete example. Here’s a sales table:

CREATE TABLE sales_data (
    region VARCHAR(50),
    product VARCHAR(50),
    sales DECIMAL(10, 2)
);

INSERT INTO sales_data VALUES
('North', 'Widget', 100),
('North', 'Widget', 150),
('North', 'Gadget', 200),
('South', 'Widget', 300),
('South', 'Gadget', 250),
('South', 'Gadget', 175);

Now apply CUBE:

SELECT 
    region,
    product,
    SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, product)
ORDER BY region, product;

The result set looks like this:

region	product	total_sales
North	Gadget	200
North	Widget	250
North	NULL	450
South	Gadget	425
South	Widget	300
South	NULL	725
NULL	Gadget	625
NULL	Widget	550
NULL	NULL	1175

Each NULL represents “all values” for that dimension. The row with North, NULL shows total sales for the North region across all products. The row with NULL, Widget shows total Widget sales across all regions. The final NULL, NULL row is the grand total.

CUBE vs ROLLUP vs GROUPING SETS

These three operators serve different purposes. Understanding when to use each saves you from generating unnecessary data.

ROLLUP creates hierarchical subtotals. It assumes your columns have a natural hierarchy (year > quarter > month, or country > state > city). It produces n+1 grouping sets for n columns:

-- ROLLUP: Hierarchical subtotals (3 grouping sets)
SELECT 
    region,
    product,
    SUM(sales) AS total_sales
FROM sales_data
GROUP BY ROLLUP(region, product);

This produces:

Group by region and product
Group by region only (subtotal per region)
Grand total

It does NOT produce a “by product only” grouping because that breaks the hierarchy.

CUBE produces all combinations (4 grouping sets for 2 columns):

-- CUBE: All combinations (4 grouping sets)
SELECT 
    region,
    product,
    SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, product);

GROUPING SETS lets you specify exactly which combinations you want:

-- GROUPING SETS: Custom selection
SELECT 
    region,
    product,
    SUM(sales) AS total_sales
FROM sales_data
GROUP BY GROUPING SETS(
    (region, product),  -- detailed rows
    (region),           -- subtotal by region
    ()                  -- grand total
);

This is equivalent to ROLLUP in this case. You can express both CUBE and ROLLUP as GROUPING SETS:

-- CUBE(a, b) is equivalent to:
GROUP BY GROUPING SETS(
    (a, b),
    (a),
    (b),
    ()
)

-- ROLLUP(a, b) is equivalent to:
GROUP BY GROUPING SETS(
    (a, b),
    (a),
    ()
)

Use GROUPING SETS when you need specific combinations that don’t fit CUBE or ROLLUP patterns.

Using GROUPING() and GROUPING_ID() Functions

Here’s a problem: what if your data legitimately contains NULL values? How do you distinguish between “this region is NULL in the source data” and “this NULL means all regions”?

The GROUPING() function solves this. It returns 1 if the column is aggregated (the NULL represents “all”), and 0 if it’s a regular grouped value:

SELECT 
    region,
    product,
    SUM(sales) AS total_sales,
    GROUPING(region) AS region_is_aggregated,
    GROUPING(product) AS product_is_aggregated
FROM sales_data
GROUP BY CUBE(region, product);

More practically, use GROUPING() to create readable labels:

SELECT 
    CASE 
        WHEN GROUPING(region) = 1 THEN 'All Regions'
        ELSE region 
    END AS region,
    CASE 
        WHEN GROUPING(product) = 1 THEN 'All Products'
        ELSE product 
    END AS product,
    SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, product)
ORDER BY GROUPING(region), GROUPING(product), region, product;

Now your output is human-readable:

region	product	total_sales
North	Gadget	200
North	Widget	250
North	All Products	450
South	Gadget	425
South	Widget	300
South	All Products	725
All Regions	Gadget	625
All Regions	Widget	550
All Regions	All Products	1175

GROUPING_ID() combines multiple GROUPING() values into a single bitmap integer, useful for filtering specific grouping levels:

SELECT 
    region,
    product,
    SUM(sales) AS total_sales,
    GROUPING_ID(region, product) AS grouping_level
FROM sales_data
GROUP BY CUBE(region, product);

The grouping_level values: 0 = both grouped, 1 = product aggregated, 2 = region aggregated, 3 = both aggregated (grand total).

Real-World Use Case: Sales Analytics Dashboard

Let’s build a realistic report for a retail analytics dashboard. We need sales summaries across date, category, and store dimensions:

CREATE TABLE retail_sales (
    sale_date DATE,
    category VARCHAR(50),
    store_id VARCHAR(10),
    revenue DECIMAL(12, 2),
    units_sold INT
);

-- Sample data
INSERT INTO retail_sales VALUES
('2024-01-15', 'Electronics', 'S001', 1500.00, 3),
('2024-01-15', 'Electronics', 'S002', 2200.00, 5),
('2024-01-15', 'Clothing', 'S001', 800.00, 12),
('2024-01-16', 'Electronics', 'S001', 1800.00, 4),
('2024-01-16', 'Clothing', 'S002', 950.00, 15),
('2024-01-16', 'Clothing', 'S001', 600.00, 8);

Here’s a production-ready CUBE query with proper formatting:

SELECT 
    COALESCE(
        CASE WHEN GROUPING(sale_date) = 1 THEN 'All Dates' END,
        CAST(sale_date AS VARCHAR(10))
    ) AS sale_date,
    CASE 
        WHEN GROUPING(category) = 1 THEN 'All Categories'
        ELSE category 
    END AS category,
    CASE 
        WHEN GROUPING(store_id) = 1 THEN 'All Stores'
        ELSE store_id 
    END AS store,
    SUM(revenue) AS total_revenue,
    SUM(units_sold) AS total_units,
    ROUND(SUM(revenue) / NULLIF(SUM(units_sold), 0), 2) AS avg_price_per_unit,
    GROUPING_ID(sale_date, category, store_id) AS summary_level
FROM retail_sales
GROUP BY CUBE(sale_date, category, store_id)
ORDER BY 
    GROUPING(sale_date),
    GROUPING(category), 
    GROUPING(store_id),
    sale_date,
    category,
    store_id;

This single query generates every subtotal combination your dashboard might need: by date, by category, by store, by date+category, by date+store, by category+store, fully detailed, and grand total. The summary_level column helps your application layer filter to specific views.

Performance Considerations and Best Practices

CUBE’s exponential growth is its biggest liability. Three columns produce 8 grouping sets. Six columns produce 64. Ten columns produce 1,024 grouping sets, each requiring aggregation across your entire dataset.

Practical limits: Keep CUBE to 4-5 columns maximum in most cases. Beyond that, query times become unpredictable and results become unwieldy.

Use GROUPING SETS when you don’t need everything: If your report only needs 5 specific combinations out of 16 possible, write them explicitly:

-- Don't do this if you only need specific combinations
GROUP BY CUBE(a, b, c, d)  -- 16 grouping sets

-- Do this instead
GROUP BY GROUPING SETS(
    (a, b, c, d),
    (a, b),
    (c, d),
    (a),
    ()
)  -- 5 grouping sets

Indexing strategy: Create composite indexes that match your most common CUBE column combinations. The database can use index scans for grouping operations when indexes align with grouping columns.

Materialize for dashboards: If your dashboard hits the same CUBE query repeatedly, materialize the results into a summary table and refresh it periodically. Real-time CUBE queries against large tables will eventually cause problems.

Filter before CUBE, not after: Apply WHERE clauses to reduce the dataset before CUBE processes it. Filtering CUBE results after aggregation wastes computation.

-- Good: Filter first
SELECT region, product, SUM(sales)
FROM sales_data
WHERE sale_date >= '2024-01-01'
GROUP BY CUBE(region, product);

-- Bad: Aggregate everything, then filter
SELECT * FROM (
    SELECT region, product, SUM(sales)
    FROM sales_data
    GROUP BY CUBE(region, product)
) sub
WHERE region = 'North' OR region IS NULL;

CUBE is a powerful tool for analytical queries, but it’s not a default choice. Use it when you genuinely need all dimension combinations. For hierarchical reports, ROLLUP is lighter. For specific combinations, GROUPING SETS gives you control. Match the operator to your actual requirements.