How to Write Subqueries in MySQL

Key Insights

Subqueries let you nest queries inside other queries, but choosing between scalar, row, and correlated subqueries dramatically affects both readability and performance.
Correlated subqueries execute once per row in the outer query—understand this behavior before using them on large datasets.
EXISTS often outperforms IN for existence checks, especially when the subquery returns many rows, because it short-circuits after finding the first match.

Introduction to Subqueries

A subquery is a query nested inside another SQL statement. The inner query executes first (usually), and its result feeds into the outer query. You’ll also hear them called nested queries or inner queries.

Here’s the basic anatomy:

SELECT column_name
FROM table_name
WHERE column_name = (SELECT column_name FROM another_table WHERE condition);
--                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--                   This is the subquery (inner query)

When should you reach for a subquery instead of a JOIN? Use subqueries when you need to:

Compare values against aggregated results
Check for existence without returning the matched data
Create derived tables for complex filtering
Make your intent clearer than an equivalent JOIN would

JOINs are typically faster for combining data from multiple tables. Subqueries shine when you’re filtering based on computed values or when the logic reads more naturally as a nested question.

Scalar Subqueries

Scalar subqueries return exactly one value—one row, one column. MySQL expects this when you use a subquery where a single value belongs.

In the WHERE clause:

-- Find employees earning above the average salary
SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

The subquery calculates the average salary once, then MySQL compares each employee’s salary against that single value.

In the SELECT clause:

-- Show each department with its employee count
SELECT 
    department_name,
    (SELECT COUNT(*) 
     FROM employees e 
     WHERE e.department_id = d.department_id) AS employee_count
FROM departments d;

In the HAVING clause:

-- Find departments where average salary exceeds company average
SELECT department_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);

If your scalar subquery returns more than one row, MySQL throws an error. Always verify your subquery logic returns exactly one value.

Row and Table Subqueries

When you need to compare against multiple values, use row and table subqueries with the IN, ANY, or ALL operators.

Using IN for multiple values:

-- Find products in categories with low inventory (less than 10 total units)
SELECT product_id, product_name, category_id
FROM products
WHERE category_id IN (
    SELECT category_id
    FROM products
    GROUP BY category_id
    HAVING SUM(units_in_stock) < 10
);

The subquery returns a list of category IDs, and the outer query matches against that list.

Using ANY (equivalent to SOME):

-- Find employees who earn more than at least one manager
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > ANY (
    SELECT salary 
    FROM employees 
    WHERE job_title = 'Manager'
);

ANY returns true if the comparison succeeds for at least one value in the subquery result.

Using ALL for stricter comparisons:

-- Find products more expensive than ALL products in the 'Budget' category
SELECT product_id, product_name, price
FROM products
WHERE price > ALL (
    SELECT price 
    FROM products 
    WHERE category_name = 'Budget'
);

ALL requires the comparison to succeed against every value returned by the subquery. If the subquery returns an empty set, ALL conditions evaluate to true—a common gotcha.

Correlated Subqueries

Correlated subqueries reference columns from the outer query. This creates a dependency: the subquery executes once for each row the outer query processes.

-- Find customers whose orders exceed their personal average order value
SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_total > (
    SELECT AVG(o2.order_total)
    FROM orders o2
    WHERE o2.customer_id = c.customer_id  -- References outer query
);

Notice how c.customer_id in the subquery references the outer query’s customer. MySQL must re-execute this subquery for each customer-order combination.

Another practical example:

-- Find the most recent order for each customer
SELECT o1.customer_id, o1.order_id, o1.order_date
FROM orders o1
WHERE o1.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id
);

Performance warning: Correlated subqueries can devastate performance on large tables. If your outer query returns 10,000 rows, the subquery runs 10,000 times. Always check execution plans and consider rewriting as JOINs when performance matters.

Subqueries in FROM Clause (Derived Tables)

You can use a subquery as a temporary table in the FROM clause. MySQL calls these derived tables, and they require an alias.

-- Find top customers by region (customers above regional average)
SELECT 
    regional_stats.region,
    c.customer_id,
    c.customer_name,
    c.total_purchases
FROM customers c
JOIN (
    SELECT 
        region,
        AVG(total_purchases) AS avg_purchases
    FROM customers
    GROUP BY region
) AS regional_stats ON c.region = regional_stats.region
WHERE c.total_purchases > regional_stats.avg_purchases
ORDER BY regional_stats.region, c.total_purchases DESC;

The derived table calculates regional averages, then we join against it to find above-average customers.

Complex aggregation example:

-- Monthly sales summary with running comparison
SELECT 
    monthly.sale_month,
    monthly.total_sales,
    monthly.order_count,
    monthly.total_sales / monthly.order_count AS avg_order_value
FROM (
    SELECT 
        DATE_FORMAT(order_date, '%Y-%m') AS sale_month,
        SUM(order_total) AS total_sales,
        COUNT(*) AS order_count
    FROM orders
    WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
    GROUP BY DATE_FORMAT(order_date, '%Y-%m')
) AS monthly
ORDER BY monthly.sale_month;

Derived tables let you aggregate first, then filter or join against those aggregates—something you can’t do directly in a single query level.

EXISTS and NOT EXISTS

EXISTS tests whether a subquery returns any rows. It doesn’t care about the actual values—just whether rows exist.

-- Find customers who have never placed an order
SELECT c.customer_id, c.customer_name, c.email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id
);

The SELECT 1 is a convention—EXISTS only checks for row existence, so the selected value doesn’t matter.

EXISTS vs IN performance:

-- Using IN (can be slow with large subquery results)
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE order_total > 1000);

-- Using EXISTS (often faster for large datasets)
SELECT c.customer_id, c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_total > 1000
);

EXISTS short-circuits after finding the first match. IN must build the complete result set from the subquery before comparison. For large tables with proper indexes, EXISTS typically wins.

Practical use case—data integrity check:

-- Find products that have never been ordered
SELECT p.product_id, p.product_name, p.created_date
FROM products p
WHERE NOT EXISTS (
    SELECT 1 
    FROM order_items oi 
    WHERE oi.product_id = p.product_id
)
AND p.created_date < DATE_SUB(CURDATE(), INTERVAL 30 DAY);

Best Practices and Performance Tips

When to use subqueries vs JOINs:

Use subqueries when:

You need aggregates in WHERE conditions
EXISTS/NOT EXISTS checks are cleaner than LEFT JOIN + NULL checks
The logic reads more naturally as a nested question

Use JOINs when:

You need columns from multiple tables in the result
Performance is critical and you’ve benchmarked both approaches
You’re combining large datasets

Consider CTEs for complex queries:

-- CTE version (MySQL 8.0+) - often more readable than nested subqueries
WITH high_value_customers AS (
    SELECT customer_id, SUM(order_total) AS lifetime_value
    FROM orders
    GROUP BY customer_id
    HAVING SUM(order_total) > 10000
)
SELECT c.customer_name, hvc.lifetime_value
FROM customers c
JOIN high_value_customers hvc ON c.customer_id = hvc.customer_id;

Refactoring a slow subquery:

-- Slow: Correlated subquery runs for every product
SELECT p.product_id, p.product_name,
    (SELECT SUM(quantity) FROM order_items oi WHERE oi.product_id = p.product_id) AS total_sold
FROM products p;

-- Faster: Single JOIN with aggregation
SELECT p.product_id, p.product_name, COALESCE(SUM(oi.quantity), 0) AS total_sold
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name;

Indexing considerations:

Index columns used in subquery WHERE clauses
Index columns used in correlation conditions (the link between inner and outer queries)
Use EXPLAIN to verify MySQL uses your indexes

Common pitfalls:

NULL handling with NOT IN: If the subquery returns any NULL values, NOT IN returns no rows. Use NOT EXISTS instead.
Forgetting derived table aliases: MySQL requires them—always alias your FROM clause subqueries.
Over-nesting: If you have three or more levels of subqueries, consider CTEs or breaking the query into steps.

Subqueries are a powerful tool when used appropriately. Master the different types, understand their execution behavior, and always verify performance on realistic data volumes.