SQL - CUBE with Examples
CUBE is a GROUP BY extension that generates subtotals for all possible combinations of columns you specify. If you've ever built a pivot table in Excel or created a report that shows totals by...
Key Insights
- CUBE generates all possible grouping combinations for specified columns, producing 2^n grouping sets for n columns—making it ideal for multidimensional analysis but potentially expensive for many columns.
- Use the GROUPING() function to distinguish between NULL values in your actual data and NULL values that represent aggregated “all” rows in CUBE output.
- CUBE is overkill when you only need hierarchical subtotals (use ROLLUP) or specific combinations (use GROUPING SETS)—choose the right tool for your aggregation needs.
Introduction to CUBE
CUBE is a GROUP BY extension that generates subtotals for all possible combinations of columns you specify. If you’ve ever built a pivot table in Excel or created a report that shows totals by region, by product, by both, and a grand total, you’ve done manually what CUBE does automatically.
The operator comes from OLAP (Online Analytical Processing) concepts where analysts need to slice and dice data across multiple dimensions. Instead of writing multiple queries with different GROUP BY clauses and UNION-ing them together, CUBE handles it in a single pass.
Most major databases support CUBE: PostgreSQL, SQL Server, Oracle, and newer versions of MySQL (8.0+). SQLite doesn’t support it natively, so you’ll need workarounds there.
CUBE Syntax and How It Works
The syntax is straightforward—you wrap your grouping columns in the CUBE function:
SELECT
column1,
column2,
aggregate_function(measure_column)
FROM table_name
GROUP BY CUBE(column1, column2);
The key insight is understanding the output. For n columns in CUBE, you get 2^n grouping sets. With two columns, that’s 4 grouping sets:
- Group by both column1 and column2
- Group by column1 only (column2 is NULL)
- Group by column2 only (column1 is NULL)
- Grand total (both columns are NULL)
With three columns, you get 8 grouping sets. With four, you get 16. This exponential growth matters for performance, which we’ll cover later.
Understanding CUBE Output with a Simple Example
Let’s work through a concrete example. Here’s a sales table:
CREATE TABLE sales_data (
region VARCHAR(50),
product VARCHAR(50),
sales DECIMAL(10, 2)
);
INSERT INTO sales_data VALUES
('North', 'Widget', 100),
('North', 'Widget', 150),
('North', 'Gadget', 200),
('South', 'Widget', 300),
('South', 'Gadget', 250),
('South', 'Gadget', 175);
Now apply CUBE:
SELECT
region,
product,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, product)
ORDER BY region, product;
The result set looks like this:
| region | product | total_sales |
|---|---|---|
| North | Gadget | 200 |
| North | Widget | 250 |
| North | NULL | 450 |
| South | Gadget | 425 |
| South | Widget | 300 |
| South | NULL | 725 |
| NULL | Gadget | 625 |
| NULL | Widget | 550 |
| NULL | NULL | 1175 |
Each NULL represents “all values” for that dimension. The row with North, NULL shows total sales for the North region across all products. The row with NULL, Widget shows total Widget sales across all regions. The final NULL, NULL row is the grand total.
CUBE vs ROLLUP vs GROUPING SETS
These three operators serve different purposes. Understanding when to use each saves you from generating unnecessary data.
ROLLUP creates hierarchical subtotals. It assumes your columns have a natural hierarchy (year > quarter > month, or country > state > city). It produces n+1 grouping sets for n columns:
-- ROLLUP: Hierarchical subtotals (3 grouping sets)
SELECT
region,
product,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY ROLLUP(region, product);
This produces:
- Group by region and product
- Group by region only (subtotal per region)
- Grand total
It does NOT produce a “by product only” grouping because that breaks the hierarchy.
CUBE produces all combinations (4 grouping sets for 2 columns):
-- CUBE: All combinations (4 grouping sets)
SELECT
region,
product,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, product);
GROUPING SETS lets you specify exactly which combinations you want:
-- GROUPING SETS: Custom selection
SELECT
region,
product,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY GROUPING SETS(
(region, product), -- detailed rows
(region), -- subtotal by region
() -- grand total
);
This is equivalent to ROLLUP in this case. You can express both CUBE and ROLLUP as GROUPING SETS:
-- CUBE(a, b) is equivalent to:
GROUP BY GROUPING SETS(
(a, b),
(a),
(b),
()
)
-- ROLLUP(a, b) is equivalent to:
GROUP BY GROUPING SETS(
(a, b),
(a),
()
)
Use GROUPING SETS when you need specific combinations that don’t fit CUBE or ROLLUP patterns.
Using GROUPING() and GROUPING_ID() Functions
Here’s a problem: what if your data legitimately contains NULL values? How do you distinguish between “this region is NULL in the source data” and “this NULL means all regions”?
The GROUPING() function solves this. It returns 1 if the column is aggregated (the NULL represents “all”), and 0 if it’s a regular grouped value:
SELECT
region,
product,
SUM(sales) AS total_sales,
GROUPING(region) AS region_is_aggregated,
GROUPING(product) AS product_is_aggregated
FROM sales_data
GROUP BY CUBE(region, product);
More practically, use GROUPING() to create readable labels:
SELECT
CASE
WHEN GROUPING(region) = 1 THEN 'All Regions'
ELSE region
END AS region,
CASE
WHEN GROUPING(product) = 1 THEN 'All Products'
ELSE product
END AS product,
SUM(sales) AS total_sales
FROM sales_data
GROUP BY CUBE(region, product)
ORDER BY GROUPING(region), GROUPING(product), region, product;
Now your output is human-readable:
| region | product | total_sales |
|---|---|---|
| North | Gadget | 200 |
| North | Widget | 250 |
| North | All Products | 450 |
| South | Gadget | 425 |
| South | Widget | 300 |
| South | All Products | 725 |
| All Regions | Gadget | 625 |
| All Regions | Widget | 550 |
| All Regions | All Products | 1175 |
GROUPING_ID() combines multiple GROUPING() values into a single bitmap integer, useful for filtering specific grouping levels:
SELECT
region,
product,
SUM(sales) AS total_sales,
GROUPING_ID(region, product) AS grouping_level
FROM sales_data
GROUP BY CUBE(region, product);
The grouping_level values: 0 = both grouped, 1 = product aggregated, 2 = region aggregated, 3 = both aggregated (grand total).
Real-World Use Case: Sales Analytics Dashboard
Let’s build a realistic report for a retail analytics dashboard. We need sales summaries across date, category, and store dimensions:
CREATE TABLE retail_sales (
sale_date DATE,
category VARCHAR(50),
store_id VARCHAR(10),
revenue DECIMAL(12, 2),
units_sold INT
);
-- Sample data
INSERT INTO retail_sales VALUES
('2024-01-15', 'Electronics', 'S001', 1500.00, 3),
('2024-01-15', 'Electronics', 'S002', 2200.00, 5),
('2024-01-15', 'Clothing', 'S001', 800.00, 12),
('2024-01-16', 'Electronics', 'S001', 1800.00, 4),
('2024-01-16', 'Clothing', 'S002', 950.00, 15),
('2024-01-16', 'Clothing', 'S001', 600.00, 8);
Here’s a production-ready CUBE query with proper formatting:
SELECT
COALESCE(
CASE WHEN GROUPING(sale_date) = 1 THEN 'All Dates' END,
CAST(sale_date AS VARCHAR(10))
) AS sale_date,
CASE
WHEN GROUPING(category) = 1 THEN 'All Categories'
ELSE category
END AS category,
CASE
WHEN GROUPING(store_id) = 1 THEN 'All Stores'
ELSE store_id
END AS store,
SUM(revenue) AS total_revenue,
SUM(units_sold) AS total_units,
ROUND(SUM(revenue) / NULLIF(SUM(units_sold), 0), 2) AS avg_price_per_unit,
GROUPING_ID(sale_date, category, store_id) AS summary_level
FROM retail_sales
GROUP BY CUBE(sale_date, category, store_id)
ORDER BY
GROUPING(sale_date),
GROUPING(category),
GROUPING(store_id),
sale_date,
category,
store_id;
This single query generates every subtotal combination your dashboard might need: by date, by category, by store, by date+category, by date+store, by category+store, fully detailed, and grand total. The summary_level column helps your application layer filter to specific views.
Performance Considerations and Best Practices
CUBE’s exponential growth is its biggest liability. Three columns produce 8 grouping sets. Six columns produce 64. Ten columns produce 1,024 grouping sets, each requiring aggregation across your entire dataset.
Practical limits: Keep CUBE to 4-5 columns maximum in most cases. Beyond that, query times become unpredictable and results become unwieldy.
Use GROUPING SETS when you don’t need everything: If your report only needs 5 specific combinations out of 16 possible, write them explicitly:
-- Don't do this if you only need specific combinations
GROUP BY CUBE(a, b, c, d) -- 16 grouping sets
-- Do this instead
GROUP BY GROUPING SETS(
(a, b, c, d),
(a, b),
(c, d),
(a),
()
) -- 5 grouping sets
Indexing strategy: Create composite indexes that match your most common CUBE column combinations. The database can use index scans for grouping operations when indexes align with grouping columns.
Materialize for dashboards: If your dashboard hits the same CUBE query repeatedly, materialize the results into a summary table and refresh it periodically. Real-time CUBE queries against large tables will eventually cause problems.
Filter before CUBE, not after: Apply WHERE clauses to reduce the dataset before CUBE processes it. Filtering CUBE results after aggregation wastes computation.
-- Good: Filter first
SELECT region, product, SUM(sales)
FROM sales_data
WHERE sale_date >= '2024-01-01'
GROUP BY CUBE(region, product);
-- Bad: Aggregate everything, then filter
SELECT * FROM (
SELECT region, product, SUM(sales)
FROM sales_data
GROUP BY CUBE(region, product)
) sub
WHERE region = 'North' OR region IS NULL;
CUBE is a powerful tool for analytical queries, but it’s not a default choice. Use it when you genuinely need all dimension combinations. For hierarchical reports, ROLLUP is lighter. For specific combinations, GROUPING SETS gives you control. Match the operator to your actual requirements.