SQL - HAVING Clause with Examples | Application Architect

Key Insights

The HAVING clause filters groups after aggregation, while WHERE filters individual rows before grouping—confusing these is one of the most common SQL mistakes
HAVING only makes sense with GROUP BY and aggregate functions; if you’re not aggregating data, you should be using WHERE instead
Combining WHERE and HAVING in the same query is not only valid but often necessary for optimal performance and correct results

Introduction to HAVING

The HAVING clause exists because WHERE has a fundamental limitation: it cannot filter based on aggregate function results. When you group data and want to keep only groups meeting certain criteria, HAVING is your tool.

Think of it this way: WHERE decides which rows participate in the grouping. HAVING decides which groups appear in the final result.

-- WHERE filters rows BEFORE grouping
SELECT department, COUNT(*) as employee_count
FROM employees
WHERE salary > 50000
GROUP BY department;

-- HAVING filters groups AFTER aggregation
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;

The first query counts employees per department, but only considers employees earning over $50,000. The second query counts all employees per department, then excludes departments with five or fewer employees.

This distinction matters. Misunderstanding it leads to wrong results and confused debugging sessions.

Syntax and Basic Usage

The HAVING clause appears after GROUP BY in your query structure:

SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1
HAVING aggregate_condition
ORDER BY column1;

SQL processes these clauses in a specific order:

FROM - identify the table(s)
WHERE - filter individual rows
GROUP BY - create groups
HAVING - filter groups
SELECT - compute final columns
ORDER BY - sort results

Here’s a practical example filtering groups by count:

-- Find product categories with at least 10 products
SELECT 
    category_id,
    COUNT(*) as product_count,
    AVG(price) as avg_price
FROM products
GROUP BY category_id
HAVING COUNT(*) >= 10
ORDER BY product_count DESC;

This query groups products by category, calculates the count and average price for each, then eliminates categories with fewer than 10 products.

HAVING with Aggregate Functions

HAVING shines when combined with aggregate functions. You can use any standard aggregate: SUM, AVG, MAX, MIN, COUNT, and their variations.

-- Find salespeople with total sales exceeding $100,000
SELECT 
    salesperson_id,
    SUM(amount) as total_sales,
    COUNT(*) as transaction_count
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY salesperson_id
HAVING SUM(amount) > 100000
ORDER BY total_sales DESC;

You can combine multiple conditions using AND and OR:

-- Find customers with high value AND high frequency
SELECT 
    customer_id,
    COUNT(*) as order_count,
    SUM(total) as lifetime_value,
    AVG(total) as avg_order_value
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 5 
   AND SUM(total) > 1000
   AND AVG(total) > 150;

This identifies customers who have placed at least 5 orders, spent over $1,000 total, and maintain an average order value above $150. All three conditions must be true for a customer to appear in results.

-- Find products that are either bestsellers OR high-margin
SELECT 
    product_id,
    SUM(quantity) as units_sold,
    AVG(profit_margin) as avg_margin
FROM order_items
GROUP BY product_id
HAVING SUM(quantity) > 1000 
    OR AVG(profit_margin) > 0.4;

HAVING vs WHERE: When to Use Each

The rule is straightforward: use WHERE for conditions on individual row values, use HAVING for conditions on aggregated values.

Here’s where developers trip up:

-- WRONG: This will error
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 60000  -- Cannot use aggregate in WHERE
GROUP BY department;

-- CORRECT: Use HAVING for aggregate conditions
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;

Performance matters here. WHERE filters rows before grouping, reducing the data the database must process. HAVING filters after grouping, meaning all that aggregation work already happened.

-- Inefficient: Filters after grouping all data
SELECT 
    region,
    SUM(revenue) as total_revenue
FROM sales
GROUP BY region
HAVING region IN ('North', 'South');

-- Efficient: Filters before grouping
SELECT 
    region,
    SUM(revenue) as total_revenue
FROM sales
WHERE region IN ('North', 'South')
GROUP BY region;

Both queries produce identical results, but the second processes far less data. Only use HAVING when you genuinely need to filter on aggregated values.

Often you need both clauses working together:

-- Find high-performing sales regions for Q1 2024
SELECT 
    region,
    COUNT(*) as transaction_count,
    SUM(amount) as total_sales,
    AVG(amount) as avg_transaction
FROM sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31'  -- Row filter
  AND status = 'completed'                              -- Row filter
GROUP BY region
HAVING SUM(amount) > 500000                            -- Group filter
   AND COUNT(*) > 100                                  -- Group filter
ORDER BY total_sales DESC;

WHERE narrows to completed Q1 sales. HAVING then keeps only regions exceeding $500,000 with over 100 transactions.

Advanced HAVING Patterns

HAVING with Subqueries

Subqueries in HAVING enable comparisons against calculated values:

-- Find departments with above-average headcount
SELECT 
    department,
    COUNT(*) as employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > (
    SELECT AVG(dept_count)
    FROM (
        SELECT COUNT(*) as dept_count
        FROM employees
        GROUP BY department
    ) as dept_counts
);

This compares each department’s headcount against the average headcount across all departments.

-- Find products selling better than category average
SELECT 
    p.category_id,
    p.product_id,
    SUM(oi.quantity) as units_sold
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.category_id, p.product_id
HAVING SUM(oi.quantity) > (
    SELECT AVG(category_sales)
    FROM (
        SELECT SUM(oi2.quantity) as category_sales
        FROM products p2
        JOIN order_items oi2 ON p2.product_id = oi2.product_id
        WHERE p2.category_id = p.category_id
        GROUP BY p2.product_id
    ) as cat_avg
);

HAVING with CASE Expressions

CASE expressions let you create conditional aggregations:

-- Find customers with more returns than purchases
SELECT 
    customer_id,
    SUM(CASE WHEN type = 'purchase' THEN 1 ELSE 0 END) as purchases,
    SUM(CASE WHEN type = 'return' THEN 1 ELSE 0 END) as returns
FROM transactions
GROUP BY customer_id
HAVING SUM(CASE WHEN type = 'return' THEN 1 ELSE 0 END) > 
       SUM(CASE WHEN type = 'purchase' THEN 1 ELSE 0 END);

-- Find products with declining sales trend
SELECT 
    product_id,
    SUM(CASE WHEN sale_date >= CURRENT_DATE - INTERVAL '30 days' 
        THEN quantity ELSE 0 END) as last_30_days,
    SUM(CASE WHEN sale_date >= CURRENT_DATE - INTERVAL '60 days' 
             AND sale_date < CURRENT_DATE - INTERVAL '30 days'
        THEN quantity ELSE 0 END) as previous_30_days
FROM sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '60 days'
GROUP BY product_id
HAVING SUM(CASE WHEN sale_date >= CURRENT_DATE - INTERVAL '30 days' 
           THEN quantity ELSE 0 END) < 
       SUM(CASE WHEN sale_date >= CURRENT_DATE - INTERVAL '60 days' 
                AND sale_date < CURRENT_DATE - INTERVAL '30 days'
           THEN quantity ELSE 0 END) * 0.8;

This finds products where recent sales dropped more than 20% compared to the previous period.

Practical Use Cases

Finding Duplicates

HAVING excels at duplicate detection:

-- Find duplicate email addresses
SELECT 
    email,
    COUNT(*) as occurrence_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrence_count DESC;

-- Find duplicate orders (same customer, same day, same total)
SELECT 
    customer_id,
    order_date,
    total,
    COUNT(*) as duplicate_count
FROM orders
GROUP BY customer_id, order_date, total
HAVING COUNT(*) > 1;

Identifying Outliers

-- Find customers with unusually high order frequency
SELECT 
    customer_id,
    COUNT(*) as order_count,
    COUNT(*) * 1.0 / DATE_PART('day', MAX(order_date) - MIN(order_date)) as orders_per_day
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 50
   AND COUNT(*) * 1.0 / NULLIF(DATE_PART('day', MAX(order_date) - MIN(order_date)), 0) > 2;

Business Reporting

-- Top customers by lifetime value with minimum engagement
SELECT 
    c.customer_id,
    c.name,
    COUNT(DISTINCT o.order_id) as total_orders,
    SUM(o.total) as lifetime_value,
    MAX(o.order_date) as last_order
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
HAVING COUNT(DISTINCT o.order_id) >= 3
   AND SUM(o.total) > 500
ORDER BY lifetime_value DESC
LIMIT 100;

-- Low stock products needing reorder
SELECT 
    p.product_id,
    p.name,
    p.stock_quantity,
    COALESCE(SUM(oi.quantity), 0) as units_sold_30d
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
LEFT JOIN orders o ON oi.order_id = o.order_id 
    AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY p.product_id, p.name, p.stock_quantity
HAVING p.stock_quantity < COALESCE(SUM(oi.quantity), 0) * 0.5
ORDER BY units_sold_30d DESC;

Summary and Best Practices

Quick Reference:

HAVING filters groups; WHERE filters rows
HAVING requires GROUP BY (with rare exceptions)
HAVING conditions use aggregate functions
WHERE executes before GROUP BY; HAVING executes after

Performance Tips:

Filter with WHERE whenever possible—it reduces data before aggregation
Only use HAVING for conditions that genuinely require aggregated values
Index columns used in WHERE clauses for better performance
Be cautious with subqueries in HAVING; they execute for each group

Common Pitfalls:

Using HAVING without GROUP BY (usually a mistake)
Putting row-level conditions in HAVING instead of WHERE
Referencing column aliases in HAVING (not supported in all databases)
Forgetting that NULL values affect aggregate calculations

HAVING is a focused tool. Use it when you need to filter aggregated results, and reach for WHERE for everything else. Master this distinction, and your SQL queries will be both correct and efficient.