SQL Aggregate Functions: SUM, COUNT, AVG, MIN, MAX
Aggregate functions are SQL's built-in tools for summarizing data. Instead of returning every row in a table, they perform calculations across sets of rows and return a single result. This is...
Key Insights
- Aggregate functions collapse multiple rows into single summary values, making them essential for reporting and analytics—COUNT, SUM, AVG, MIN, and MAX cover 90% of business intelligence needs
- NULL values are handled differently across aggregate functions: COUNT(column) excludes them while SUM and AVG skip them in calculations, which can silently skew your results if you’re not careful
- Combining aggregates with GROUP BY and HAVING transforms basic summaries into powerful analytical queries, but mixing aggregated and non-aggregated columns without proper grouping is the most common SQL mistake
Introduction to Aggregate Functions
Aggregate functions are SQL’s built-in tools for summarizing data. Instead of returning every row in a table, they perform calculations across sets of rows and return a single result. This is fundamental to data analysis—you rarely want to see all 10 million transactions; you want to know the total revenue, average order value, or customer count.
Every aggregate function operates on a column (or expression) and reduces multiple values to one. Here’s the difference:
-- Regular SELECT: returns all rows
SELECT order_amount FROM orders;
-- Returns: 150.00, 75.50, 200.00, 125.75, ...
-- Aggregate SELECT: returns one summary value
SELECT COUNT(*) FROM orders;
-- Returns: 1247
The five core aggregate functions—COUNT, SUM, AVG, MIN, and MAX—handle the vast majority of analytical queries you’ll write. Master these, and you’ll handle most business reporting requirements.
COUNT - Counting Rows and Values
COUNT is the most frequently used aggregate function. It answers the question “how many?” but the syntax matters significantly.
COUNT(*) counts all rows, including those with NULL values:
SELECT COUNT(*) AS total_orders
FROM orders;
-- Returns: 1247 (every row in the table)
COUNT(column_name) counts only non-NULL values in that specific column:
SELECT COUNT(customer_email) AS customers_with_email
FROM orders;
-- Returns: 1089 (excludes 158 rows where email is NULL)
This distinction is critical. If you’re counting optional fields like email addresses or phone numbers, COUNT(*) and COUNT(column) will give different results. Always use COUNT(column) when you specifically want to count non-NULL values.
COUNT(DISTINCT column_name) counts unique values:
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
-- Returns: 423 (unique customers, even if they placed multiple orders)
This is invaluable for metrics like “monthly active users” or “products sold.” The difference between total orders (1247) and unique customers (423) tells you that customers average about 3 orders each.
SUM and AVG - Numerical Calculations
SUM adds up numeric values, while AVG calculates the mean. Both ignore NULL values, which can be either helpful or dangerous depending on your use case.
SELECT
SUM(order_amount) AS total_revenue,
AVG(order_amount) AS average_order_value
FROM orders;
-- Returns: total_revenue: 187,450.00, average_order_value: 150.28
Notice that AVG divides by the count of non-NULL values, not all rows. If your table has 1247 rows but 47 have NULL amounts, AVG divides the sum by 1200, not 1247. This is usually what you want, but be aware of it.
Data type considerations matter. Summing integers returns an integer, which can cause unexpected truncation:
-- Integer division truncates
SELECT AVG(quantity) FROM order_items;
-- If quantities are 1, 2, 2, 3, returns 2 (not 2.0)
-- Cast to decimal for precision
SELECT AVG(CAST(quantity AS DECIMAL(10,2))) FROM order_items;
-- Returns: 2.00
Here’s a practical example calculating multiple metrics:
SELECT
COUNT(*) AS total_orders,
SUM(order_amount) AS total_revenue,
AVG(order_amount) AS avg_order_value,
SUM(order_amount) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
WHERE order_date >= '2024-01-01';
This single query gives you order volume, revenue, average order size, and customer value—the foundation of most sales dashboards.
MIN and MAX - Finding Extremes
MIN and MAX find the smallest and largest values in a column. They work with numbers, dates, and even strings (alphabetically).
Numeric data:
SELECT
MIN(order_amount) AS smallest_order,
MAX(order_amount) AS largest_order
FROM orders;
-- Returns: smallest_order: 5.99, largest_order: 2,450.00
Date data:
SELECT
MIN(order_date) AS first_order,
MAX(order_date) AS most_recent_order
FROM orders;
-- Returns: first_order: 2023-01-15, most_recent_order: 2024-12-30
This is perfect for finding date ranges in your data or identifying the newest/oldest records.
String data:
SELECT
MIN(product_name) AS first_alphabetically,
MAX(product_name) AS last_alphabetically
FROM products;
-- Returns: first_alphabetically: "Adapter Cable", last_alphabetically: "Wireless Mouse"
String comparison follows alphabetical ordering, which is occasionally useful for finding edge cases in product catalogs or customer lists.
GROUP BY - Aggregating Subsets
The real power of aggregate functions emerges when you combine them with GROUP BY. Instead of one summary for the entire table, you get summaries for each category.
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products
GROUP BY category;
Results:
category | product_count | avg_price | cheapest | most_expensive
-------------|---------------|-----------|----------|---------------
Electronics | 45 | 129.99 | 9.99 | 899.99
Clothing | 120 | 34.50 | 12.99 | 89.99
Home & Garden| 67 | 45.75 | 5.99 | 249.99
Each category gets its own row with aggregated statistics. This is how you build comparative reports.
HAVING filters aggregated results (WHERE filters before aggregation):
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS avg_price
FROM products
WHERE price > 10 -- Filter individual rows first
GROUP BY category
HAVING COUNT(*) > 50 -- Then filter aggregated results
ORDER BY avg_price DESC;
This finds categories with more than 50 products (after excluding items under $10), sorted by average price. HAVING operates on the aggregated results, while WHERE operates on individual rows before aggregation.
Real-World Use Case
Here’s a complete sales dashboard query combining everything:
SELECT
p.category,
DATE_TRUNC('month', o.order_date) AS month,
COUNT(DISTINCT o.order_id) AS total_orders,
COUNT(DISTINCT o.customer_id) AS unique_customers,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS revenue,
AVG(oi.quantity * oi.unit_price) AS avg_order_value,
MIN(o.order_date) AS first_order_date,
MAX(o.order_date) AS last_order_date
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
AND o.status = 'completed'
GROUP BY p.category, DATE_TRUNC('month', o.order_date)
HAVING SUM(oi.quantity * oi.unit_price) > 1000
ORDER BY month DESC, revenue DESC;
This query delivers monthly category performance: order volume, customer count, units sold, revenue, and average order value—filtered to completed orders with meaningful revenue (over $1000 per category per month).
Common Pitfalls and Best Practices
NULL handling requires attention. Aggregates ignore NULLs, but this can produce unexpected results:
-- If some orders have NULL amounts, this might be misleading
SELECT AVG(order_amount) FROM orders;
-- Better: make NULL handling explicit
SELECT
AVG(COALESCE(order_amount, 0)) AS avg_including_nulls,
AVG(order_amount) AS avg_excluding_nulls,
COUNT(*) - COUNT(order_amount) AS null_count
FROM orders;
The most common error is selecting non-aggregated columns without GROUP BY:
-- WRONG: customer_name isn't aggregated or grouped
SELECT customer_name, COUNT(*)
FROM orders;
-- CORRECT: group by customer_name
SELECT customer_name, COUNT(*)
FROM orders
GROUP BY customer_name;
-- ALSO CORRECT: aggregate everything
SELECT COUNT(DISTINCT customer_name), COUNT(*)
FROM orders;
Most databases will reject the first query with an error. PostgreSQL is strict about this; MySQL historically allowed it but returned unpredictable results.
Performance matters with large datasets. Aggregate queries scan entire tables (or indexes). For millions of rows:
- Ensure your WHERE clause uses indexed columns
- Consider materialized views for frequently-run aggregate queries
- Use EXPLAIN to verify your query plan isn’t doing full table scans unnecessarily
-- Good: filters on indexed order_date before aggregating
SELECT category, SUM(amount)
FROM orders
WHERE order_date >= '2024-01-01' -- indexed column
GROUP BY category;
-- Less efficient: aggregates everything, then filters
SELECT category, SUM(amount)
FROM orders
GROUP BY category
HAVING MAX(order_date) >= '2024-01-01';
Aggregate functions are the foundation of data analysis in SQL. Master COUNT, SUM, AVG, MIN, and MAX with GROUP BY, and you’ll handle the majority of business intelligence requirements. The key is understanding NULL behavior, properly grouping columns, and writing queries that filter efficiently before aggregating.