SQL - GROUP BY Clause with Examples
The GROUP BY clause is the backbone of SQL reporting. It takes scattered rows of data and collapses them into meaningful summaries. Without it, you'd be stuck scrolling through thousands of...
Key Insights
- GROUP BY collapses multiple rows into single summary rows based on shared column values, working hand-in-hand with aggregate functions like COUNT, SUM, and AVG to transform raw data into actionable insights.
- The HAVING clause filters groups after aggregation, while WHERE filters individual rows before grouping—confusing these is one of the most common SQL mistakes developers make.
- Every non-aggregated column in your SELECT must appear in your GROUP BY clause, or your query will fail (or worse, return unpredictable results in MySQL’s permissive mode).
Introduction to GROUP BY
The GROUP BY clause is the backbone of SQL reporting. It takes scattered rows of data and collapses them into meaningful summaries. Without it, you’d be stuck scrolling through thousands of individual records instead of seeing “you had 1,247 orders last month.”
The basic syntax is straightforward:
SELECT column1, AGGREGATE_FUNCTION(column2)
FROM table_name
WHERE condition
GROUP BY column1;
Let’s start with a practical example. Say you have an orders table and want to see how many orders exist in each status:
SELECT
status,
COUNT(*) AS order_count
FROM orders
GROUP BY status;
This might return:
| status | order_count |
|---|---|
| pending | 142 |
| shipped | 1,893 |
| delivered | 8,421 |
| cancelled | 67 |
Instead of 10,523 individual rows, you get four. That’s the power of GROUP BY.
How GROUP BY Works with Aggregate Functions
GROUP BY is useless without aggregate functions, and aggregate functions are limited without GROUP BY. They’re designed to work together.
Here are the primary aggregate functions you’ll use:
- COUNT() - Counts rows in each group
- SUM() - Totals numeric values
- AVG() - Calculates the mean
- MIN() / MAX() - Finds boundary values
Let’s see them in action with a sales scenario:
-- Sales totals by region
SELECT
region,
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM sales
GROUP BY region;
For HR reporting, you might calculate department statistics:
-- Salary analysis by department
SELECT
department,
COUNT(*) AS employee_count,
ROUND(AVG(salary), 2) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees
WHERE employment_status = 'active'
GROUP BY department
ORDER BY total_payroll DESC;
The mental model is simple: GROUP BY creates “buckets” based on unique values in the specified column(s). Every row gets placed into exactly one bucket. Then aggregate functions calculate a single value from all rows in each bucket.
Grouping by Multiple Columns
Single-column grouping is often too coarse. Real analysis requires drilling down into multiple dimensions.
When you group by multiple columns, SQL creates a bucket for each unique combination of values:
-- Revenue by year AND quarter
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(QUARTER FROM order_date) AS quarter,
SUM(amount) AS quarterly_revenue,
COUNT(*) AS order_count
FROM orders
GROUP BY
EXTRACT(YEAR FROM order_date),
EXTRACT(QUARTER FROM order_date)
ORDER BY year, quarter;
Result:
| year | quarter | quarterly_revenue | order_count |
|---|---|---|---|
| 2023 | 1 | 245,000 | 1,203 |
| 2023 | 2 | 312,000 | 1,567 |
| 2023 | 3 | 287,000 | 1,421 |
| 2023 | 4 | 398,000 | 1,892 |
| 2024 | 1 | 267,000 | 1,334 |
Here’s another example tracking order status by product category:
-- Orders by category AND status
SELECT
p.category,
o.status,
COUNT(*) AS order_count,
SUM(o.quantity * o.unit_price) AS total_value
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY p.category, o.status
ORDER BY p.category, o.status;
The column order in GROUP BY matters for readability and when combined with ORDER BY, but doesn’t affect which rows end up in which groups—only the unique combinations matter.
Filtering Groups with HAVING
Here’s where many developers stumble. You need to filter your grouped results, so you reach for WHERE. But WHERE filters individual rows before grouping happens. To filter after aggregation, use HAVING.
-- WRONG: This won't work
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
WHERE COUNT(*) > 10; -- Error!
-- CORRECT: Use HAVING for aggregate conditions
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
The execution order clarifies this:
- FROM - Get the data
- WHERE - Filter individual rows
- GROUP BY - Create groups
- HAVING - Filter groups
- SELECT - Choose columns
- ORDER BY - Sort results
You can use both WHERE and HAVING in the same query:
-- Categories with significant sales (active products only)
SELECT
category,
SUM(quantity * price) AS total_sales,
COUNT(DISTINCT product_id) AS products_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.id
WHERE p.is_active = true -- Filter rows first
GROUP BY category
HAVING SUM(quantity * price) > 1000 -- Then filter groups
ORDER BY total_sales DESC;
Rule of thumb: if your condition involves an aggregate function, it belongs in HAVING. If it involves raw column values, use WHERE.
Common Mistakes and Pitfalls
Mistake 1: Selecting Non-Grouped Columns
This is the most common GROUP BY error:
-- WRONG: Which employee_name should be shown for each department?
SELECT
department,
employee_name, -- Not aggregated, not in GROUP BY
AVG(salary)
FROM employees
GROUP BY department;
PostgreSQL and SQL Server will reject this outright. MySQL in permissive mode might return arbitrary values—a silent bug waiting to cause confusion.
The fix: either add the column to GROUP BY or aggregate it:
-- Option 1: Group by both columns
SELECT department, employee_name, AVG(salary)
FROM employees
GROUP BY department, employee_name;
-- Option 2: Use an aggregate function
SELECT department, COUNT(employee_name), AVG(salary)
FROM employees
GROUP BY department;
Mistake 2: Ignoring NULL Behavior
NULLs form their own group:
SELECT
region,
COUNT(*) AS customer_count
FROM customers
GROUP BY region;
| region | customer_count |
|---|---|
| North | 234 |
| South | 189 |
| NULL | 45 |
Those 45 customers with no region assigned get grouped together. Handle this explicitly:
SELECT
COALESCE(region, 'Unassigned') AS region,
COUNT(*) AS customer_count
FROM customers
GROUP BY COALESCE(region, 'Unassigned');
Mistake 3: COUNT(*) vs COUNT(column)
These aren’t the same:
SELECT
department,
COUNT(*) AS total_rows,
COUNT(manager_id) AS rows_with_manager
FROM employees
GROUP BY department;
COUNT(*) counts all rows. COUNT(column) counts non-NULL values in that column.
Real-World Use Cases
Monthly Active Users Report
SELECT
DATE_TRUNC('month', last_login) AS month,
COUNT(DISTINCT user_id) AS active_users,
COUNT(*) AS total_logins,
ROUND(COUNT(*)::DECIMAL / COUNT(DISTINCT user_id), 2) AS logins_per_user
FROM user_sessions
WHERE last_login >= DATE_TRUNC('year', CURRENT_DATE)
GROUP BY DATE_TRUNC('month', last_login)
ORDER BY month;
Top Customers by Purchase Volume
SELECT
c.id,
c.company_name,
COUNT(o.id) AS order_count,
SUM(o.total_amount) AS lifetime_value,
MAX(o.order_date) AS last_order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.id, c.company_name
HAVING SUM(o.total_amount) > 10000
ORDER BY lifetime_value DESC
LIMIT 5;
Sales Dashboard Summary
SELECT
p.category,
DATE_TRUNC('week', o.created_at) AS week,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS gross_revenue,
SUM(oi.quantity * (oi.unit_price - p.cost)) AS gross_profit,
COUNT(DISTINCT o.customer_id) AS unique_customers
FROM order_items oi
JOIN orders o ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '12 weeks'
GROUP BY p.category, DATE_TRUNC('week', o.created_at)
ORDER BY p.category, week;
Performance Considerations
GROUP BY operations can be expensive. The database must sort or hash all rows to identify groups, then compute aggregates.
Index Your GROUP BY Columns
-- If you frequently group by these columns, index them
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_date_status ON orders(order_date, status);
Check Your Query Plans
EXPLAIN ANALYZE
SELECT status, COUNT(*)
FROM orders
GROUP BY status;
Look for “HashAggregate” (good for few groups) vs “GroupAggregate” (good for sorted data or many groups). A sequential scan on a large table before grouping is a red flag.
Consider Alternatives for Complex Analytics
When GROUP BY queries become too slow:
- Materialized views - Pre-compute common aggregations
- Window functions - When you need aggregates alongside detail rows
- Summary tables - Maintain running totals updated by triggers
-- Window function alternative when you need both detail and summary
SELECT
order_id,
customer_id,
amount,
SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
FROM orders;
GROUP BY is fundamental SQL. Master it, understand its relationship with HAVING, and you’ll write cleaner, faster analytics queries. The patterns here cover 90% of real-world reporting needs—the rest is just combining these building blocks in creative ways.