SQL - GROUP BY Clause with Examples | Application Architect

Key Insights

GROUP BY collapses multiple rows into single summary rows based on shared column values, working hand-in-hand with aggregate functions like COUNT, SUM, and AVG to transform raw data into actionable insights.
The HAVING clause filters groups after aggregation, while WHERE filters individual rows before grouping—confusing these is one of the most common SQL mistakes developers make.
Every non-aggregated column in your SELECT must appear in your GROUP BY clause, or your query will fail (or worse, return unpredictable results in MySQL’s permissive mode).

Introduction to GROUP BY

The GROUP BY clause is the backbone of SQL reporting. It takes scattered rows of data and collapses them into meaningful summaries. Without it, you’d be stuck scrolling through thousands of individual records instead of seeing “you had 1,247 orders last month.”

The basic syntax is straightforward:

SELECT column1, AGGREGATE_FUNCTION(column2)
FROM table_name
WHERE condition
GROUP BY column1;

Let’s start with a practical example. Say you have an orders table and want to see how many orders exist in each status:

SELECT 
    status,
    COUNT(*) AS order_count
FROM orders
GROUP BY status;

This might return:

status	order_count
pending	142
shipped	1,893
delivered	8,421
cancelled	67

Instead of 10,523 individual rows, you get four. That’s the power of GROUP BY.

How GROUP BY Works with Aggregate Functions

GROUP BY is useless without aggregate functions, and aggregate functions are limited without GROUP BY. They’re designed to work together.

Here are the primary aggregate functions you’ll use:

COUNT() - Counts rows in each group
SUM() - Totals numeric values
AVG() - Calculates the mean
MIN() / MAX() - Finds boundary values

Let’s see them in action with a sales scenario:

-- Sales totals by region
SELECT 
    region,
    COUNT(*) AS total_orders,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value,
    MIN(amount) AS smallest_order,
    MAX(amount) AS largest_order
FROM sales
GROUP BY region;

For HR reporting, you might calculate department statistics:

-- Salary analysis by department
SELECT 
    department,
    COUNT(*) AS employee_count,
    ROUND(AVG(salary), 2) AS avg_salary,
    SUM(salary) AS total_payroll
FROM employees
WHERE employment_status = 'active'
GROUP BY department
ORDER BY total_payroll DESC;

The mental model is simple: GROUP BY creates “buckets” based on unique values in the specified column(s). Every row gets placed into exactly one bucket. Then aggregate functions calculate a single value from all rows in each bucket.

Grouping by Multiple Columns

Single-column grouping is often too coarse. Real analysis requires drilling down into multiple dimensions.

When you group by multiple columns, SQL creates a bucket for each unique combination of values:

-- Revenue by year AND quarter
SELECT 
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(QUARTER FROM order_date) AS quarter,
    SUM(amount) AS quarterly_revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY 
    EXTRACT(YEAR FROM order_date),
    EXTRACT(QUARTER FROM order_date)
ORDER BY year, quarter;

Result:

year	quarter	quarterly_revenue	order_count
2023	1	245,000	1,203
2023	2	312,000	1,567
2023	3	287,000	1,421
2023	4	398,000	1,892
2024	1	267,000	1,334

Here’s another example tracking order status by product category:

-- Orders by category AND status
SELECT 
    p.category,
    o.status,
    COUNT(*) AS order_count,
    SUM(o.quantity * o.unit_price) AS total_value
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY p.category, o.status
ORDER BY p.category, o.status;

The column order in GROUP BY matters for readability and when combined with ORDER BY, but doesn’t affect which rows end up in which groups—only the unique combinations matter.

Filtering Groups with HAVING

Here’s where many developers stumble. You need to filter your grouped results, so you reach for WHERE. But WHERE filters individual rows before grouping happens. To filter after aggregation, use HAVING.

-- WRONG: This won't work
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
WHERE COUNT(*) > 10;  -- Error!

-- CORRECT: Use HAVING for aggregate conditions
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

The execution order clarifies this:

FROM - Get the data
WHERE - Filter individual rows
GROUP BY - Create groups
HAVING - Filter groups
SELECT - Choose columns
ORDER BY - Sort results

You can use both WHERE and HAVING in the same query:

-- Categories with significant sales (active products only)
SELECT 
    category,
    SUM(quantity * price) AS total_sales,
    COUNT(DISTINCT product_id) AS products_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.id
WHERE p.is_active = true           -- Filter rows first
GROUP BY category
HAVING SUM(quantity * price) > 1000 -- Then filter groups
ORDER BY total_sales DESC;

Rule of thumb: if your condition involves an aggregate function, it belongs in HAVING. If it involves raw column values, use WHERE.

Common Mistakes and Pitfalls

Mistake 1: Selecting Non-Grouped Columns

This is the most common GROUP BY error:

-- WRONG: Which employee_name should be shown for each department?
SELECT 
    department,
    employee_name,  -- Not aggregated, not in GROUP BY
    AVG(salary)
FROM employees
GROUP BY department;

PostgreSQL and SQL Server will reject this outright. MySQL in permissive mode might return arbitrary values—a silent bug waiting to cause confusion.

The fix: either add the column to GROUP BY or aggregate it:

-- Option 1: Group by both columns
SELECT department, employee_name, AVG(salary)
FROM employees
GROUP BY department, employee_name;

-- Option 2: Use an aggregate function
SELECT department, COUNT(employee_name), AVG(salary)
FROM employees
GROUP BY department;

Mistake 2: Ignoring NULL Behavior

NULLs form their own group:

SELECT 
    region,
    COUNT(*) AS customer_count
FROM customers
GROUP BY region;

region	customer_count
North	234
South	189
NULL	45

Those 45 customers with no region assigned get grouped together. Handle this explicitly:

SELECT 
    COALESCE(region, 'Unassigned') AS region,
    COUNT(*) AS customer_count
FROM customers
GROUP BY COALESCE(region, 'Unassigned');

Mistake 3: COUNT(*) vs COUNT(column)

These aren’t the same:

SELECT 
    department,
    COUNT(*) AS total_rows,
    COUNT(manager_id) AS rows_with_manager
FROM employees
GROUP BY department;

COUNT(*) counts all rows. COUNT(column) counts non-NULL values in that column.

Real-World Use Cases

Monthly Active Users Report

SELECT 
    DATE_TRUNC('month', last_login) AS month,
    COUNT(DISTINCT user_id) AS active_users,
    COUNT(*) AS total_logins,
    ROUND(COUNT(*)::DECIMAL / COUNT(DISTINCT user_id), 2) AS logins_per_user
FROM user_sessions
WHERE last_login >= DATE_TRUNC('year', CURRENT_DATE)
GROUP BY DATE_TRUNC('month', last_login)
ORDER BY month;

Top Customers by Purchase Volume

SELECT 
    c.id,
    c.company_name,
    COUNT(o.id) AS order_count,
    SUM(o.total_amount) AS lifetime_value,
    MAX(o.order_date) AS last_order_date
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
GROUP BY c.id, c.company_name
HAVING SUM(o.total_amount) > 10000
ORDER BY lifetime_value DESC
LIMIT 5;

Sales Dashboard Summary

SELECT 
    p.category,
    DATE_TRUNC('week', o.created_at) AS week,
    SUM(oi.quantity) AS units_sold,
    SUM(oi.quantity * oi.unit_price) AS gross_revenue,
    SUM(oi.quantity * (oi.unit_price - p.cost)) AS gross_profit,
    COUNT(DISTINCT o.customer_id) AS unique_customers
FROM order_items oi
JOIN orders o ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '12 weeks'
GROUP BY p.category, DATE_TRUNC('week', o.created_at)
ORDER BY p.category, week;

Performance Considerations

GROUP BY operations can be expensive. The database must sort or hash all rows to identify groups, then compute aggregates.

Index Your GROUP BY Columns

-- If you frequently group by these columns, index them
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_date_status ON orders(order_date, status);

Check Your Query Plans

EXPLAIN ANALYZE
SELECT status, COUNT(*)
FROM orders
GROUP BY status;

Look for “HashAggregate” (good for few groups) vs “GroupAggregate” (good for sorted data or many groups). A sequential scan on a large table before grouping is a red flag.

Consider Alternatives for Complex Analytics

When GROUP BY queries become too slow:

Materialized views - Pre-compute common aggregations
Window functions - When you need aggregates alongside detail rows
Summary tables - Maintain running totals updated by triggers

-- Window function alternative when you need both detail and summary
SELECT 
    order_id,
    customer_id,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id) AS customer_total
FROM orders;

GROUP BY is fundamental SQL. Master it, understand its relationship with HAVING, and you’ll write cleaner, faster analytics queries. The patterns here cover 90% of real-world reporting needs—the rest is just combining these building blocks in creative ways.