How to Use GROUP BY in MySQL

GROUP BY is MySQL's mechanism for transforming detailed row-level data into summary statistics. Instead of returning every individual row, GROUP BY collapses rows sharing common values into single...

Key Insights

  • GROUP BY collapses rows with identical values in specified columns into summary rows, enabling aggregation with functions like COUNT, SUM, and AVG—essential for any reporting or analytics query.
  • The HAVING clause filters grouped results after aggregation, while WHERE filters individual rows before grouping; using them correctly dramatically improves query performance and accuracy.
  • MySQL’s ONLY_FULL_GROUP_BY mode enforces strict SQL standards, requiring every non-aggregated column in SELECT to appear in GROUP BY—understand this to avoid cryptic errors in production.

Introduction to GROUP BY

GROUP BY is MySQL’s mechanism for transforming detailed row-level data into summary statistics. Instead of returning every individual row, GROUP BY collapses rows sharing common values into single summary rows, allowing you to apply aggregate functions like COUNT, SUM, AVG, MAX, and MIN.

You’ll use GROUP BY constantly for reporting dashboards, financial summaries, user analytics, and any scenario where you need to answer questions like “how many orders per customer?” or “what’s the total revenue by product category?” Without GROUP BY, you’d be stuck with row-level data and forced to aggregate in application code—inefficient and error-prone.

Here’s the fundamental difference:

-- Without GROUP BY: returns every order
SELECT customer_id, order_amount 
FROM orders;

-- With GROUP BY: returns one row per customer with total
SELECT customer_id, SUM(order_amount) as total_spent
FROM orders
GROUP BY customer_id;

The first query might return 10,000 rows. The second returns one row per unique customer, with their aggregated spending. That’s the power of GROUP BY.

Basic GROUP BY Syntax

The basic structure is straightforward: SELECT the columns you want to group by plus any aggregate calculations, then specify GROUP BY with the grouping columns.

SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
GROUP BY column_name;

Let’s look at practical examples with different aggregate functions:

-- Count orders per customer
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id;

-- Total sales revenue by product
SELECT product_id, SUM(sale_amount) as total_revenue
FROM sales
GROUP BY product_id;

-- Average order value by customer
SELECT customer_id, AVG(order_amount) as avg_order_value
FROM orders
GROUP BY customer_id;

-- Highest and lowest prices per category
SELECT category, MAX(price) as highest_price, MIN(price) as lowest_price
FROM products
GROUP BY category;

Each aggregate function serves a specific purpose. COUNT tells you volume, SUM gives you totals, AVG provides averages, while MAX and MIN identify extremes. You can combine multiple aggregates in a single query—there’s no limit.

Grouping by Multiple Columns

Real-world analysis often requires grouping by multiple dimensions simultaneously. MySQL processes multi-column GROUP BY hierarchically, creating groups for each unique combination of values.

-- Sales totals by region AND product category
SELECT 
    region,
    category,
    SUM(sale_amount) as total_sales,
    COUNT(*) as transaction_count
FROM sales
GROUP BY region, category
ORDER BY region, category;

This query creates separate groups for each region-category combination: (‘North’, ‘Electronics’), (‘North’, ‘Clothing’), (‘South’, ‘Electronics’), etc. The order of columns in GROUP BY doesn’t affect the grouping logic, but it can impact performance if you have indexes.

Here’s a more complex example showing quarterly sales by sales representative:

-- Quarterly performance by sales rep
SELECT 
    sales_rep_id,
    YEAR(sale_date) as sale_year,
    QUARTER(sale_date) as sale_quarter,
    SUM(amount) as quarterly_total,
    COUNT(DISTINCT customer_id) as unique_customers
FROM sales
GROUP BY sales_rep_id, YEAR(sale_date), QUARTER(sale_date)
ORDER BY sales_rep_id, sale_year, sale_quarter;

Notice the use of DISTINCT within COUNT—this counts unique customers per group, not total transactions. This kind of nuanced aggregation is where GROUP BY truly shines.

Using HAVING vs WHERE

This is where many developers stumble. WHERE filters rows before grouping happens. HAVING filters the grouped results after aggregation. They’re not interchangeable.

-- WHERE: Filter rows before grouping
-- Only include orders from 2024, then group
SELECT customer_id, COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id;

-- HAVING: Filter groups after aggregation
-- Group all orders, then only show customers with 5+ orders
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 5;

-- Combined: Filter rows first, then filter groups
-- Only 2024 orders, only customers with 5+ orders in 2024
SELECT customer_id, COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
HAVING COUNT(*) >= 5;

Performance tip: Always use WHERE when possible. Filtering 1 million rows down to 100,000 before grouping is far more efficient than grouping all 1 million then filtering the results. Use HAVING only for conditions that require aggregated values.

-- Efficient: WHERE filters first
SELECT category, AVG(price) as avg_price
FROM products
WHERE in_stock = 1
GROUP BY category
HAVING AVG(price) > 100;

-- Less efficient: Everything in HAVING
SELECT category, AVG(price) as avg_price
FROM products
GROUP BY category
HAVING AVG(price) > 100 AND MAX(in_stock) = 1;

Common Patterns and Best Practices

Ordering Grouped Results

GROUP BY doesn’t guarantee any particular order. Always use ORDER BY explicitly:

-- Top 10 customers by total spending
SELECT 
    customer_id,
    SUM(order_amount) as total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;

Combining GROUP BY with JOINs

You’ll frequently need to join tables before grouping:

-- Sales by product name (not just ID)
SELECT 
    p.product_name,
    p.category,
    COUNT(s.sale_id) as units_sold,
    SUM(s.sale_amount) as total_revenue
FROM products p
INNER JOIN sales s ON p.product_id = s.product_id
GROUP BY p.product_id, p.product_name, p.category
ORDER BY total_revenue DESC;

Notice we GROUP BY product_id even though we’re selecting product_name. This is required by ONLY_FULL_GROUP_BY mode (more on this shortly).

Index Optimization

Indexes on GROUP BY columns dramatically improve performance:

-- Create index on commonly grouped columns
CREATE INDEX idx_sales_date_product ON sales(sale_date, product_id);

-- This query will use the index efficiently
SELECT 
    sale_date,
    product_id,
    SUM(amount) as daily_product_total
FROM sales
GROUP BY sale_date, product_id;

The index should match your GROUP BY column order for optimal performance.

Common Pitfalls and Errors

The most common error is selecting columns that aren’t in GROUP BY and aren’t aggregated:

-- WRONG: customer_name isn't aggregated or in GROUP BY
SELECT customer_id, customer_name, COUNT(*) as order_count
FROM orders
GROUP BY customer_id;

-- CORRECT: Include customer_name in GROUP BY
SELECT customer_id, customer_name, COUNT(*) as order_count
FROM orders
GROUP BY customer_id, customer_name;

-- ALSO CORRECT: Use MAX/MIN if you know values are identical
SELECT customer_id, MAX(customer_name) as customer_name, COUNT(*) as order_count
FROM orders
GROUP BY customer_id;

MySQL’s ONLY_FULL_GROUP_BY SQL mode enforces this rule. If enabled (it’s default in MySQL 5.7.5+), the first query will fail with an error. This is good—it prevents ambiguous results. If customer_id 123 has multiple different customer_name values, which one should MySQL return? The mode forces you to be explicit.

Another pitfall is forgetting that NULL values form their own group:

-- NULL category forms a separate group
SELECT category, COUNT(*) as product_count
FROM products
GROUP BY category;

-- Explicitly handle NULLs if needed
SELECT 
    COALESCE(category, 'Uncategorized') as category,
    COUNT(*) as product_count
FROM products
GROUP BY category;

Practical Real-World Example

Let’s build a comprehensive e-commerce sales report that combines multiple GROUP BY techniques:

-- Monthly sales dashboard: revenue, orders, and customer metrics
SELECT 
    DATE_FORMAT(o.order_date, '%Y-%m') as month,
    p.category,
    COUNT(DISTINCT o.order_id) as total_orders,
    COUNT(DISTINCT o.customer_id) as unique_customers,
    SUM(oi.quantity) as units_sold,
    SUM(oi.quantity * oi.unit_price) as gross_revenue,
    AVG(oi.quantity * oi.unit_price) as avg_order_value,
    SUM(oi.quantity * oi.unit_price) / COUNT(DISTINCT o.customer_id) as revenue_per_customer
FROM orders o
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
    AND o.status = 'completed'
GROUP BY DATE_FORMAT(o.order_date, '%Y-%m'), p.category
HAVING gross_revenue > 1000
ORDER BY month DESC, gross_revenue DESC;

This query demonstrates:

  • Multi-table JOINs before grouping
  • Multiple aggregate functions (COUNT, SUM, AVG)
  • DISTINCT within aggregates for unique counts
  • Calculated fields (revenue_per_customer)
  • Date formatting for time-based grouping
  • WHERE for pre-filtering
  • HAVING for post-aggregation filtering
  • ORDER BY for meaningful result ordering

This is production-ready code that generates actionable business intelligence. The key is understanding how each piece—WHERE, JOIN, GROUP BY, HAVING, ORDER BY—works together in the correct sequence.

Master GROUP BY and you’ll transform raw data into insights that drive decisions. Start with simple single-column grouping, progress to multi-dimensional analysis, and always remember: WHERE before grouping, HAVING after.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.