SQL: HAVING vs WHERE

Key Insights

WHERE filters individual rows before grouping occurs, while HAVING filters groups after aggregation—using the wrong one will either fail or cause performance issues
SQL processes WHERE before GROUP BY and HAVING after it, meaning WHERE reduces the dataset early while HAVING operates on aggregated results
Always prefer WHERE for non-aggregated conditions since filtering rows early is significantly more efficient than filtering groups later

Introduction: The Filtering Dilemma

Every SQL developer eventually writes a query that throws an error like “aggregate function not allowed in WHERE clause” or wonders why their HAVING clause runs slower than expected. The confusion between WHERE and HAVING is one of the most common stumbling blocks in SQL, and it’s not just about syntax—it’s about understanding how SQL processes queries.

The fundamental difference is simple: WHERE filters rows, HAVING filters groups. But the implications of this distinction affect query correctness, performance, and readability. Let’s look at a query using both:

SELECT 
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'
GROUP BY department
HAVING COUNT(*) > 5;

This query filters employees hired since 2020 (WHERE), groups them by department, then shows only departments with more than 5 such employees (HAVING). Each clause has a distinct job, and they’re not interchangeable.

WHERE Clause: Filtering Before Aggregation

The WHERE clause operates on individual rows before any grouping happens. It’s your first line of defense for reducing the dataset, and it can reference any column in your tables—but it cannot use aggregate functions.

Think of WHERE as a bouncer checking IDs at the door. Each row either passes the condition or gets excluded before the query moves forward. This happens early in query processing, which makes WHERE clauses excellent for performance when properly indexed.

-- Filter employees before any aggregation
SELECT 
    department,
    first_name,
    last_name,
    salary
FROM employees
WHERE salary > 75000
    AND department IN ('Engineering', 'Product')
    AND status = 'active';

This query examines each row in the employees table individually. If a row’s salary is above 75000, belongs to Engineering or Product, and has active status, it passes through. Otherwise, it’s excluded from further processing.

WHERE is perfect for:

Filtering by specific column values
Date range restrictions
JOIN conditions
Excluding null or invalid data
Any condition that applies to individual rows

-- Counting employees by department, but only active ones hired this year
SELECT 
    department,
    COUNT(*) as new_hires
FROM employees
WHERE status = 'active'
    AND hire_date >= '2024-01-01'
GROUP BY department;

Notice that WHERE runs before GROUP BY. We filter down to active employees hired in 2024 first, then group them. This is efficient because we’re not grouping unnecessary rows.

HAVING Clause: Filtering After Aggregation

HAVING filters groups after aggregation has occurred. It’s specifically designed to work with GROUP BY and can use aggregate functions like COUNT, SUM, AVG, MAX, and MIN. You cannot use HAVING without grouping (explicitly or implicitly).

Think of HAVING as a quality control check that happens after you’ve assembled your groups. It looks at the aggregated results and decides which groups make the cut.

-- Find departments with more than 10 employees
SELECT 
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

This query groups all employees by department, counts them, calculates average salary, then filters out any department with 10 or fewer employees. You couldn’t do this with WHERE because the count doesn’t exist until after grouping.

HAVING is essential for:

Filtering based on aggregate calculations
Finding outliers in grouped data
Setting thresholds for grouped results
Comparing aggregated values

-- Find high-performing sales regions
SELECT 
    region,
    COUNT(DISTINCT customer_id) as customer_count,
    SUM(order_total) as total_revenue,
    AVG(order_total) as avg_order_value
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY region
HAVING SUM(order_total) > 100000
    AND AVG(order_total) > 500;

This query finds regions that generated over $100,000 in revenue with an average order value above $500. These conditions only make sense after aggregation.

Query Execution Order

Understanding SQL’s logical processing order is crucial for using WHERE and HAVING correctly. SQL doesn’t execute your query in the order you write it. The logical processing order is:

FROM (and JOINs)
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT

This means WHERE filters the raw data before grouping, while HAVING filters the grouped results. Here’s a side-by-side comparison:

-- Example dataset
CREATE TABLE sales (
    id INTEGER,
    product_id INTEGER,
    sale_date DATE,
    amount DECIMAL(10,2),
    region TEXT
);

-- Query showing execution flow
SELECT 
    region,
    COUNT(*) as sale_count,
    SUM(amount) as total_sales
FROM sales                           -- 1. Start with all rows
WHERE sale_date >= '2024-01-01'     -- 2. Filter to 2024 sales (row-level)
GROUP BY region                      -- 3. Group remaining rows by region
HAVING SUM(amount) > 50000          -- 4. Filter groups with >$50k sales
ORDER BY total_sales DESC;          -- 5. Sort the results

Step by step:

FROM loads the sales table (let’s say 10,000 rows)
WHERE reduces it to 3,000 rows (2024 sales only)
GROUP BY creates 5 groups (one per region) from those 3,000 rows
HAVING eliminates 2 groups that didn’t hit $50k
We get 3 regions in our final result

If you tried to put SUM(amount) > 50000 in the WHERE clause, you’d get an error because SUM doesn’t exist yet at that processing stage.

Common Use Cases & Patterns

The most powerful queries often use WHERE and HAVING together, each doing what it does best.

-- Sales report: Filter by date range (WHERE), 
-- then find high-revenue products (HAVING)
SELECT 
    product_name,
    COUNT(*) as order_count,
    SUM(quantity) as units_sold,
    SUM(quantity * unit_price) as total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-03-31'
    AND o.status = 'completed'
    AND p.category = 'Electronics'
GROUP BY p.product_id, p.product_name
HAVING SUM(quantity * unit_price) > 10000
ORDER BY total_revenue DESC;

This query demonstrates the pattern:

WHERE narrows to completed electronics orders in Q1 2024 (row-level filtering)
GROUP BY aggregates by product
HAVING keeps only products that generated over $10,000 (group-level filtering)

Another common pattern is finding anomalies:

-- Find customers with unusual order patterns
SELECT 
    customer_id,
    COUNT(*) as order_count,
    AVG(order_total) as avg_order,
    MAX(order_total) as largest_order
FROM orders
WHERE order_date >= DATE('now', '-6 months')
GROUP BY customer_id
HAVING COUNT(*) > 20
    OR MAX(order_total) > 5000;

This identifies customers who either ordered frequently (>20 times) or made a large purchase (>$5000) in the last six months.

Performance Considerations

WHERE clauses are generally more efficient than HAVING because they reduce the dataset early. Every row filtered by WHERE is one less row to group and aggregate.

-- Less efficient: Filtering after aggregation
SELECT 
    department,
    COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING department IN ('Engineering', 'Sales');

-- More efficient: Filter before grouping
SELECT 
    department,
    COUNT(*) as emp_count
FROM employees
WHERE department IN ('Engineering', 'Sales')
GROUP BY department;

Both queries produce the same result, but the second is faster because it excludes other departments before grouping. If you have 10 departments and 1,000 employees, the first query groups all 1,000 rows then discards 8 groups. The second groups only the relevant rows from the start.

You can verify this with SQLite’s query planner:

EXPLAIN QUERY PLAN
SELECT department, COUNT(*) as emp_count
FROM employees
WHERE department IN ('Engineering', 'Sales')
GROUP BY department;

Indexes work differently with each clause:

WHERE can use indexes on filtered columns (highly effective)
HAVING operates on aggregated results, so indexes on base columns don’t help the HAVING clause itself
However, indexes still help the WHERE and JOIN clauses that feed into HAVING

Common Mistakes & Best Practices

Mistake 1: Using aggregate functions in WHERE

-- This will fail
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 75000
GROUP BY department;

-- Correct: Use HAVING for aggregates
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000;

Mistake 2: Using HAVING when WHERE would work

-- Inefficient
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING department = 'Engineering';

-- Better
SELECT department, COUNT(*)
FROM employees
WHERE department = 'Engineering'
GROUP BY department;

Mistake 3: Forgetting you can use both

-- Get departments with high average salaries,
-- but only count employees hired after 2020
SELECT 
    department,
    COUNT(*) as recent_hires,
    AVG(salary) as avg_salary
FROM employees
WHERE hire_date > '2020-01-01'
GROUP BY department
HAVING AVG(salary) > 80000;

Quick Reference:

Aspect	WHERE	HAVING
Filters	Individual rows	Grouped results
Timing	Before GROUP BY	After GROUP BY
Can use aggregates?	No	Yes
Works without GROUP BY?	Yes	No
Performance	Better (reduces early)	Operates on aggregated data
Use for	Column values, JOINs	Aggregate conditions

The rule of thumb: if you can use WHERE, use WHERE. Only use HAVING when you need to filter based on aggregated calculations. This keeps your queries fast, clear, and correct.