SQL Subqueries: Correlated and Non-Correlated

Key Insights

Non-correlated subqueries execute once independently and pass results to the outer query, making them predictable and often cacheable by the query optimizer
Correlated subqueries execute once per outer row and reference outer query columns, which can cause performance issues on large datasets but enable row-by-row comparisons impossible with simple joins
Modern SQL optimizers can sometimes rewrite correlated subqueries as joins, but understanding the execution difference helps you write queries that perform well across different database systems

Understanding Subqueries

A subquery is a SELECT statement nested inside another SQL statement. Think of it as a query within a query—the inner query produces results that the outer query consumes. Subqueries let you break complex problems into logical steps, making your SQL more readable and often more maintainable than sprawling joins with multiple conditions.

The critical distinction between correlated and non-correlated subqueries lies in their execution model. Non-correlated subqueries run independently, while correlated subqueries depend on values from the outer query. This difference fundamentally affects performance, use cases, and how you should approach query optimization.

Here’s a simple subquery finding employees who earn more than the company average:

SELECT employee_id, name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);

This non-correlated subquery calculates the average once, then the outer query filters against that single value.

Non-Correlated Subqueries

Non-correlated subqueries execute exactly once, independent of the outer query. The database runs the inner query first, gets the result set, and then uses those results to execute the outer query. This independence makes them efficient and predictable.

You’ll typically use non-correlated subqueries in three places: WHERE clauses for filtering, FROM clauses as inline views, and SELECT lists for scalar values.

Filtering with IN:

SELECT customer_id, customer_name, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= '2024-01-01'
);

This finds all customers who placed orders this year. The subquery runs once, returns a list of customer IDs, and the outer query filters against that list.

Inline views in FROM:

SELECT 
    dept.department_name,
    dept_stats.avg_salary,
    dept_stats.employee_count
FROM departments dept
JOIN (
    SELECT 
        department_id,
        AVG(salary) as avg_salary,
        COUNT(*) as employee_count
    FROM employees
    GROUP BY department_id
) dept_stats ON dept.department_id = dept_stats.department_id
WHERE dept_stats.employee_count > 5;

The subquery creates a derived table with aggregated statistics. This pattern is powerful when you need to aggregate data before joining or when you want to filter on aggregated values.

Scalar subqueries in SELECT:

SELECT 
    product_id,
    product_name,
    price,
    (SELECT AVG(price) FROM products) as avg_price,
    price - (SELECT AVG(price) FROM products) as price_difference
FROM products;

A scalar subquery returns exactly one value (one row, one column). Here it calculates how each product’s price compares to the average. Note that the subquery executes once per SELECT statement, not once per row, though the optimizer may handle this differently.

Correlated Subqueries

Correlated subqueries reference columns from the outer query, creating a dependency that changes their execution model entirely. The database must execute the subquery once for each row processed by the outer query. This row-by-row execution can be expensive but enables comparisons that would be difficult or impossible with simple joins.

Row-by-row comparison:

SELECT e1.employee_id, e1.name, e1.salary, e1.department_id
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

This finds employees earning above their department’s average. For each employee row, the subquery recalculates the average for that employee’s department. The e1.department_id reference creates the correlation.

EXISTS for existence checks:

SELECT d.department_id, d.department_name
FROM departments d
WHERE EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.department_id
    AND e.hire_date >= '2024-01-01'
);

EXISTS returns true if the subquery returns any rows. It’s optimized for boolean checks—the database stops scanning as soon as it finds one matching row. The SELECT 1 is a convention; the actual selected values don’t matter since EXISTS only checks for row existence.

NOT EXISTS for finding orphaned records:

SELECT p.product_id, p.product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

This finds products that have never been ordered. NOT EXISTS is often the clearest way to express “find records without related records” logic.

Performance Considerations

The execution difference between correlated and non-correlated subqueries has real performance implications. A non-correlated subquery runs once; a correlated subquery potentially runs thousands or millions of times.

Consider this comparison:

-- Correlated subquery
SELECT e1.employee_id, e1.name, e1.department_id
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

-- Equivalent JOIN
SELECT e.employee_id, e.name, e.department_id
FROM employees e
JOIN (
    SELECT department_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY department_id
) dept_avg ON e.department_id = dept_avg.department_id
WHERE e.salary > dept_avg.avg_salary;

The JOIN version calculates each department average once, then joins the results. The correlated version recalculates for each employee. On a table with 10,000 employees across 50 departments, the correlated version might execute the subquery 10,000 times versus 50 aggregations in the JOIN version.

However, modern optimizers can sometimes transform correlated subqueries into joins automatically. Run EXPLAIN to see what your database actually does:

EXPLAIN SELECT e1.employee_id, e1.name
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

If you see “DEPENDENT SUBQUERY” in MySQL or similar indicators in other databases, you’re getting row-by-row execution. Look for join operations to confirm optimization.

When EXISTS or NOT EXISTS checks indexed foreign keys, correlated subqueries can actually outperform joins because the database can use index seeks and stop at the first match. Profile both approaches with your actual data.

Common Patterns and Best Practices

ANY and ALL operators:

-- Find products more expensive than ANY product in category 'Electronics'
SELECT product_name, price
FROM products
WHERE price > ANY (
    SELECT price
    FROM products
    WHERE category = 'Electronics'
);

-- Find products more expensive than ALL products in category 'Books'
SELECT product_name, price
FROM products
WHERE price > ALL (
    SELECT price
    FROM products
    WHERE category = 'Books'
);

ANY returns true if the comparison is true for at least one value. ALL requires it to be true for every value. These are non-correlated but enable set-based comparisons.

Multiple column subqueries:

SELECT employee_id, name, department_id, salary
FROM employees
WHERE (department_id, salary) IN (
    SELECT department_id, MAX(salary)
    FROM employees
    GROUP BY department_id
);

This finds the highest-paid employee in each department using a tuple comparison. Not all databases support this syntax equally well, but it’s standard SQL.

Handling NULLs:

-- Dangerous: returns no rows if subquery contains NULL
SELECT product_name
FROM products
WHERE product_id NOT IN (
    SELECT product_id
    FROM discontinued_products
);

-- Safe: NOT EXISTS handles NULLs correctly
SELECT product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM discontinued_products d
    WHERE d.product_id = p.product_id
);

The NOT IN version fails if discontinued_products.product_id contains any NULL values because NULL NOT IN (...) evaluates to unknown, not true. NOT EXISTS doesn’t have this problem. Always prefer NOT EXISTS over NOT IN unless you’re certain the subquery can’t return NULLs.

Scalar subquery safety:

-- This will error if subquery returns multiple rows
SELECT 
    customer_id,
    (SELECT order_date FROM orders WHERE customer_id = c.customer_id) as last_order
FROM customers c;

-- Safe version with aggregation
SELECT 
    customer_id,
    (SELECT MAX(order_date) FROM orders WHERE customer_id = c.customer_id) as last_order
FROM customers c;

Scalar subqueries must return exactly one row and one column. Use MAX, MIN, or LIMIT 1 to guarantee single-row results.

Choosing the Right Approach

Use non-correlated subqueries when you need to filter or calculate based on aggregate values that don’t depend on individual outer rows. They’re efficient, clear, and portable across database systems.

Use correlated subqueries when you need row-by-row comparisons or existence checks. EXISTS and NOT EXISTS are particularly valuable for checking relationships. However, always consider whether a JOIN would be clearer or faster, especially on large datasets.

The best approach often depends on your specific database system, data volume, and indexing strategy. Write the clearest query first, then optimize based on actual performance measurements, not assumptions. Modern query optimizers are sophisticated—sometimes the “slow” correlated subquery gets rewritten into an efficient join automatically, and sometimes the “fast” join performs poorly due to statistics or indexing issues.

Master both patterns, understand their execution models, and let your query plans guide your optimization decisions.