SQL: HAVING vs WHERE
Every SQL developer eventually writes a query that throws an error like 'aggregate function not allowed in WHERE clause' or wonders why their HAVING clause runs slower than expected. The confusion...
Key Insights
- WHERE filters individual rows before grouping occurs, while HAVING filters groups after aggregation—using the wrong one will either fail or cause performance issues
- SQL processes WHERE before GROUP BY and HAVING after it, meaning WHERE reduces the dataset early while HAVING operates on aggregated results
- Always prefer WHERE for non-aggregated conditions since filtering rows early is significantly more efficient than filtering groups later
Introduction: The Filtering Dilemma
Every SQL developer eventually writes a query that throws an error like “aggregate function not allowed in WHERE clause” or wonders why their HAVING clause runs slower than expected. The confusion between WHERE and HAVING is one of the most common stumbling blocks in SQL, and it’s not just about syntax—it’s about understanding how SQL processes queries.
The fundamental difference is simple: WHERE filters rows, HAVING filters groups. But the implications of this distinction affect query correctness, performance, and readability. Let’s look at a query using both:
SELECT
department,
COUNT(*) as employee_count,
AVG(salary) as avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'
GROUP BY department
HAVING COUNT(*) > 5;
This query filters employees hired since 2020 (WHERE), groups them by department, then shows only departments with more than 5 such employees (HAVING). Each clause has a distinct job, and they’re not interchangeable.
WHERE Clause: Filtering Before Aggregation
The WHERE clause operates on individual rows before any grouping happens. It’s your first line of defense for reducing the dataset, and it can reference any column in your tables—but it cannot use aggregate functions.
Think of WHERE as a bouncer checking IDs at the door. Each row either passes the condition or gets excluded before the query moves forward. This happens early in query processing, which makes WHERE clauses excellent for performance when properly indexed.
-- Filter employees before any aggregation
SELECT
department,
first_name,
last_name,
salary
FROM employees
WHERE salary > 75000
AND department IN ('Engineering', 'Product')
AND status = 'active';
This query examines each row in the employees table individually. If a row’s salary is above 75000, belongs to Engineering or Product, and has active status, it passes through. Otherwise, it’s excluded from further processing.
WHERE is perfect for:
- Filtering by specific column values
- Date range restrictions
- JOIN conditions
- Excluding null or invalid data
- Any condition that applies to individual rows
-- Counting employees by department, but only active ones hired this year
SELECT
department,
COUNT(*) as new_hires
FROM employees
WHERE status = 'active'
AND hire_date >= '2024-01-01'
GROUP BY department;
Notice that WHERE runs before GROUP BY. We filter down to active employees hired in 2024 first, then group them. This is efficient because we’re not grouping unnecessary rows.
HAVING Clause: Filtering After Aggregation
HAVING filters groups after aggregation has occurred. It’s specifically designed to work with GROUP BY and can use aggregate functions like COUNT, SUM, AVG, MAX, and MIN. You cannot use HAVING without grouping (explicitly or implicitly).
Think of HAVING as a quality control check that happens after you’ve assembled your groups. It looks at the aggregated results and decides which groups make the cut.
-- Find departments with more than 10 employees
SELECT
department,
COUNT(*) as employee_count,
AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
This query groups all employees by department, counts them, calculates average salary, then filters out any department with 10 or fewer employees. You couldn’t do this with WHERE because the count doesn’t exist until after grouping.
HAVING is essential for:
- Filtering based on aggregate calculations
- Finding outliers in grouped data
- Setting thresholds for grouped results
- Comparing aggregated values
-- Find high-performing sales regions
SELECT
region,
COUNT(DISTINCT customer_id) as customer_count,
SUM(order_total) as total_revenue,
AVG(order_total) as avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY region
HAVING SUM(order_total) > 100000
AND AVG(order_total) > 500;
This query finds regions that generated over $100,000 in revenue with an average order value above $500. These conditions only make sense after aggregation.
Query Execution Order
Understanding SQL’s logical processing order is crucial for using WHERE and HAVING correctly. SQL doesn’t execute your query in the order you write it. The logical processing order is:
- FROM (and JOINs)
- WHERE
- GROUP BY
- HAVING
- SELECT
- ORDER BY
- LIMIT
This means WHERE filters the raw data before grouping, while HAVING filters the grouped results. Here’s a side-by-side comparison:
-- Example dataset
CREATE TABLE sales (
id INTEGER,
product_id INTEGER,
sale_date DATE,
amount DECIMAL(10,2),
region TEXT
);
-- Query showing execution flow
SELECT
region,
COUNT(*) as sale_count,
SUM(amount) as total_sales
FROM sales -- 1. Start with all rows
WHERE sale_date >= '2024-01-01' -- 2. Filter to 2024 sales (row-level)
GROUP BY region -- 3. Group remaining rows by region
HAVING SUM(amount) > 50000 -- 4. Filter groups with >$50k sales
ORDER BY total_sales DESC; -- 5. Sort the results
Step by step:
- FROM loads the sales table (let’s say 10,000 rows)
- WHERE reduces it to 3,000 rows (2024 sales only)
- GROUP BY creates 5 groups (one per region) from those 3,000 rows
- HAVING eliminates 2 groups that didn’t hit $50k
- We get 3 regions in our final result
If you tried to put SUM(amount) > 50000 in the WHERE clause, you’d get an error because SUM doesn’t exist yet at that processing stage.
Common Use Cases & Patterns
The most powerful queries often use WHERE and HAVING together, each doing what it does best.
-- Sales report: Filter by date range (WHERE),
-- then find high-revenue products (HAVING)
SELECT
product_name,
COUNT(*) as order_count,
SUM(quantity) as units_sold,
SUM(quantity * unit_price) as total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-03-31'
AND o.status = 'completed'
AND p.category = 'Electronics'
GROUP BY p.product_id, p.product_name
HAVING SUM(quantity * unit_price) > 10000
ORDER BY total_revenue DESC;
This query demonstrates the pattern:
- WHERE narrows to completed electronics orders in Q1 2024 (row-level filtering)
- GROUP BY aggregates by product
- HAVING keeps only products that generated over $10,000 (group-level filtering)
Another common pattern is finding anomalies:
-- Find customers with unusual order patterns
SELECT
customer_id,
COUNT(*) as order_count,
AVG(order_total) as avg_order,
MAX(order_total) as largest_order
FROM orders
WHERE order_date >= DATE('now', '-6 months')
GROUP BY customer_id
HAVING COUNT(*) > 20
OR MAX(order_total) > 5000;
This identifies customers who either ordered frequently (>20 times) or made a large purchase (>$5000) in the last six months.
Performance Considerations
WHERE clauses are generally more efficient than HAVING because they reduce the dataset early. Every row filtered by WHERE is one less row to group and aggregate.
-- Less efficient: Filtering after aggregation
SELECT
department,
COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING department IN ('Engineering', 'Sales');
-- More efficient: Filter before grouping
SELECT
department,
COUNT(*) as emp_count
FROM employees
WHERE department IN ('Engineering', 'Sales')
GROUP BY department;
Both queries produce the same result, but the second is faster because it excludes other departments before grouping. If you have 10 departments and 1,000 employees, the first query groups all 1,000 rows then discards 8 groups. The second groups only the relevant rows from the start.
You can verify this with SQLite’s query planner:
EXPLAIN QUERY PLAN
SELECT department, COUNT(*) as emp_count
FROM employees
WHERE department IN ('Engineering', 'Sales')
GROUP BY department;
Indexes work differently with each clause:
- WHERE can use indexes on filtered columns (highly effective)
- HAVING operates on aggregated results, so indexes on base columns don’t help the HAVING clause itself
- However, indexes still help the WHERE and JOIN clauses that feed into HAVING
Common Mistakes & Best Practices
Mistake 1: Using aggregate functions in WHERE
-- This will fail
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 75000
GROUP BY department;
-- Correct: Use HAVING for aggregates
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000;
Mistake 2: Using HAVING when WHERE would work
-- Inefficient
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING department = 'Engineering';
-- Better
SELECT department, COUNT(*)
FROM employees
WHERE department = 'Engineering'
GROUP BY department;
Mistake 3: Forgetting you can use both
-- Get departments with high average salaries,
-- but only count employees hired after 2020
SELECT
department,
COUNT(*) as recent_hires,
AVG(salary) as avg_salary
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department
HAVING AVG(salary) > 80000;
Quick Reference:
| Aspect | WHERE | HAVING |
|---|---|---|
| Filters | Individual rows | Grouped results |
| Timing | Before GROUP BY | After GROUP BY |
| Can use aggregates? | No | Yes |
| Works without GROUP BY? | Yes | No |
| Performance | Better (reduces early) | Operates on aggregated data |
| Use for | Column values, JOINs | Aggregate conditions |
The rule of thumb: if you can use WHERE, use WHERE. Only use HAVING when you need to filter based on aggregated calculations. This keeps your queries fast, clear, and correct.