SQL - Aggregate Functions (COUNT, SUM, AVG, MIN, MAX)
Aggregate functions are the workhorses of SQL reporting. They take multiple rows of data and collapse them into single summary values. Without them, you'd be pulling raw data into application code...
Key Insights
- Aggregate functions collapse multiple rows into single summary values, transforming raw data into actionable metrics—master these five functions and you’ll handle 90% of reporting queries.
- COUNT(*) counts all rows including NULLs, while COUNT(column) only counts non-NULL values—this distinction catches even experienced developers off guard.
- HAVING filters after aggregation while WHERE filters before—use WHERE to reduce the dataset early for better performance, then HAVING only when you need to filter on calculated aggregates.
Introduction to Aggregate Functions
Aggregate functions are the workhorses of SQL reporting. They take multiple rows of data and collapse them into single summary values. Without them, you’d be pulling raw data into application code and calculating totals, averages, and counts yourself—a waste of database power and network bandwidth.
Here’s the fundamental difference between regular queries and aggregate queries:
-- Returns every order row (potentially millions)
SELECT order_id, amount
FROM orders;
-- Returns a single row with the total
SELECT SUM(amount) AS total_revenue
FROM orders;
The first query might return 10 million rows. The second returns one number. That’s the power of aggregation—the database does the heavy lifting where the data lives.
The five core aggregate functions are COUNT, SUM, AVG, MIN, and MAX. Each serves a specific purpose, and understanding their nuances will make your reporting queries both correct and efficient.
COUNT: Counting Rows and Values
COUNT is deceptively simple until you realize it has three distinct behaviors depending on how you call it.
-- COUNT(*): Counts all rows, including those with NULL values
SELECT COUNT(*) AS total_orders
FROM orders;
-- COUNT(column): Counts non-NULL values in that column
SELECT COUNT(shipped_date) AS shipped_orders
FROM orders;
-- COUNT(DISTINCT column): Counts unique non-NULL values
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
The NULL handling is where developers get burned. Consider an orders table where shipped_date is NULL for pending orders:
-- Sample data scenario:
-- 1000 total orders
-- 850 have shipped (shipped_date is not NULL)
-- 150 are pending (shipped_date is NULL)
SELECT
COUNT(*) AS total_orders, -- Returns 1000
COUNT(shipped_date) AS shipped, -- Returns 850
COUNT(*) - COUNT(shipped_date) AS pending -- Returns 150
FROM orders;
This pattern is useful for calculating fill rates or completion percentages:
SELECT
COUNT(email) * 100.0 / COUNT(*) AS email_capture_rate
FROM customers;
SUM and AVG: Numeric Calculations
SUM adds up all values in a column. AVG calculates the arithmetic mean. Both ignore NULL values entirely—they don’t treat NULL as zero.
-- Total revenue from all orders
SELECT SUM(amount) AS total_revenue
FROM orders
WHERE status = 'completed';
-- Average order value
SELECT AVG(amount) AS avg_order_value
FROM orders
WHERE status = 'completed';
Watch out for data type precision. Integer division can bite you:
-- If quantity and price are integers, this might truncate
SELECT SUM(quantity * price) AS revenue
FROM order_items;
-- Cast to decimal for accurate calculations
SELECT SUM(quantity * CAST(price AS DECIMAL(10,2))) AS revenue
FROM order_items;
You can combine aggregates with arithmetic for derived metrics:
SELECT
SUM(amount) AS gross_revenue,
SUM(discount) AS total_discounts,
SUM(amount) - SUM(discount) AS net_revenue,
AVG(amount - discount) AS avg_net_order_value
FROM orders
WHERE created_at >= '2024-01-01';
A common gotcha: AVG ignores NULLs, which might not be what you want. If you have ratings where NULL means “not rated” and you want those to count as zero:
-- This ignores unrated items
SELECT AVG(rating) FROM products;
-- This treats NULL ratings as 0
SELECT AVG(COALESCE(rating, 0)) FROM products;
MIN and MAX: Finding Extremes
MIN and MAX work on any comparable data type—numbers, dates, and strings.
-- Price range for products
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive,
MAX(price) - MIN(price) AS price_spread
FROM products
WHERE active = true;
-- Date range for order history
SELECT
MIN(created_at) AS first_order,
MAX(created_at) AS latest_order
FROM orders
WHERE customer_id = 12345;
With strings, MIN and MAX use alphabetical ordering:
-- First and last customer alphabetically
SELECT
MIN(last_name) AS first_alphabetically,
MAX(last_name) AS last_alphabetically
FROM customers;
A practical pattern is finding the most recent record per group, though this often requires a subquery or window function for the full row:
-- Latest order date per customer (just the date)
SELECT
customer_id,
MAX(created_at) AS last_order_date
FROM orders
GROUP BY customer_id;
GROUP BY: Aggregates by Category
GROUP BY transforms aggregate functions from whole-table summaries into per-category breakdowns. This is where aggregates become truly powerful for reporting.
-- Sales by region
SELECT
region,
COUNT(*) AS order_count,
SUM(amount) AS total_sales,
AVG(amount) AS avg_order_value
FROM orders
GROUP BY region;
You can group by multiple columns for more granular breakdowns:
-- Monthly sales by region
SELECT
region,
DATE_TRUNC('month', created_at) AS month,
SUM(amount) AS monthly_sales
FROM orders
GROUP BY region, DATE_TRUNC('month', created_at)
ORDER BY region, month;
A critical rule: every non-aggregated column in your SELECT must appear in GROUP BY. This query fails:
-- ERROR: column "customer_name" must appear in GROUP BY clause
SELECT
region,
customer_name, -- Not aggregated, not grouped
SUM(amount)
FROM orders
GROUP BY region;
You either group by it or aggregate it:
-- Option 1: Add to GROUP BY
SELECT region, customer_name, SUM(amount)
FROM orders
GROUP BY region, customer_name;
-- Option 2: Aggregate it somehow
SELECT region, COUNT(DISTINCT customer_name) AS customer_count, SUM(amount)
FROM orders
GROUP BY region;
HAVING: Filtering Aggregated Results
WHERE filters rows before aggregation. HAVING filters groups after aggregation. This distinction matters for both correctness and performance.
-- WHERE: Filter rows before aggregation
-- Find total sales for completed orders only
SELECT
customer_id,
SUM(amount) AS total_spent
FROM orders
WHERE status = 'completed' -- Filters individual rows
GROUP BY customer_id;
-- HAVING: Filter groups after aggregation
-- Find customers who spent more than $10,000
SELECT
customer_id,
SUM(amount) AS total_spent
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
HAVING SUM(amount) > 10000; -- Filters aggregated results
You can combine multiple HAVING conditions:
-- High-value, frequent customers
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id
HAVING COUNT(*) >= 5
AND SUM(amount) > 5000;
Performance tip: always use WHERE for non-aggregate conditions. This query is inefficient:
-- Bad: Filters on non-aggregate in HAVING
SELECT region, SUM(amount)
FROM orders
GROUP BY region
HAVING region != 'test';
-- Good: Filter early with WHERE
SELECT region, SUM(amount)
FROM orders
WHERE region != 'test'
GROUP BY region;
Practical Patterns and Performance Tips
Real dashboards need multiple metrics in one query. Combine aggregates efficiently:
-- Executive dashboard summary
SELECT
COUNT(*) AS total_orders,
COUNT(DISTINCT customer_id) AS unique_customers,
SUM(amount) AS gross_revenue,
SUM(amount) - SUM(COALESCE(refund_amount, 0)) AS net_revenue,
AVG(amount) AS avg_order_value,
MIN(created_at) AS period_start,
MAX(created_at) AS period_end,
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_orders,
COUNT(CASE WHEN status = 'refunded' THEN 1 END) AS refunded_orders
FROM orders
WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE);
The conditional COUNT pattern using CASE is extremely useful for creating pivot-style summaries without multiple queries.
For performance, consider these guidelines:
Index the columns you filter on. A query with WHERE created_at >= '2024-01-01' benefits from an index on created_at. The aggregate itself doesn’t need an index—it operates on whatever rows survive the WHERE clause.
Filter early, aggregate late. Move as many conditions as possible into WHERE rather than HAVING. The database processes fewer rows.
Be careful with DISTINCT in aggregates. COUNT(DISTINCT column) is more expensive than COUNT(column) because the database must track unique values. Use it when you need it, but don’t add DISTINCT reflexively.
Consider covering indexes for frequent aggregate queries. If you constantly run SELECT region, SUM(amount) FROM orders GROUP BY region, an index on (region, amount) lets the database answer the query from the index alone.
Aggregate functions are foundational SQL. Master these five, understand how GROUP BY partitions your data, and know when to use WHERE versus HAVING. You’ll write cleaner queries, build faster reports, and let the database do what it does best—crunch numbers close to the data.