SQL - Correlated Subquery with Examples

Key Insights

Correlated subqueries reference columns from the outer query and execute once per row, making them powerful for row-by-row comparisons but potentially expensive on large datasets.
The EXISTS/NOT EXISTS pattern with correlated subqueries is often the most readable and performant way to check for the presence or absence of related records.
While JOINs and window functions can replace many correlated subqueries with better performance, some scenarios—like complex row-level conditions—still call for the correlated approach.

Introduction to Correlated Subqueries

A correlated subquery is a subquery that references columns from the outer query. Unlike a regular (non-correlated) subquery that executes once and returns a fixed result, a correlated subquery executes once for each row processed by the outer query. This fundamental difference changes everything about how you write and optimize these queries.

Consider the distinction:

-- Non-correlated subquery: runs once, returns single value
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated subquery: runs once per employee row
SELECT * FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);

The second query references e.department_id from the outer query inside the subquery. This correlation means the database must evaluate the subquery for each employee, comparing their salary against their specific department’s average—not the company-wide average.

How Correlated Subqueries Work

Understanding the execution flow is critical for writing efficient correlated subqueries. Here’s the step-by-step process:

The database fetches a row from the outer query
It passes referenced column values to the inner query
The inner query executes using those values
The result determines whether the outer row qualifies
Repeat for every row in the outer query

-- Execution flow visualization
-- Outer query: employees table
-- +----+--------+--------+---------------+
-- | id | name   | salary | department_id |
-- +----+--------+--------+---------------+
-- | 1  | Alice  | 75000  | 10            |  <- Pass dept_id=10 to inner query
-- | 2  | Bob    | 65000  | 20            |  <- Pass dept_id=20 to inner query
-- | 3  | Carol  | 80000  | 10            |  <- Pass dept_id=10 to inner query
-- +----+--------+--------+---------------+

SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) 
    FROM employees 
    WHERE department_id = e.department_id  -- Correlation point
);

This row-by-row execution has clear performance implications. If your outer query returns 10,000 rows, the inner query runs 10,000 times. Modern query optimizers can sometimes flatten correlated subqueries into joins, but you shouldn’t count on it. Always test with realistic data volumes.

Basic Correlated Subquery Syntax

The most common pattern places the correlated subquery in the WHERE clause for filtering. The outer table must have an alias that the inner query can reference.

-- Find employees earning more than their department average
SELECT 
    e.employee_id,
    e.name,
    e.salary,
    e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

Notice the alias pattern: e for the outer query, e2 for the inner query on the same table. This self-referencing pattern is common when comparing rows within a single table.

You can also use correlated subqueries with comparison operators beyond simple equality:

-- Find orders where the total exceeds the customer's average order value
SELECT 
    o.order_id,
    o.customer_id,
    o.total_amount
FROM orders o
WHERE o.total_amount > (
    SELECT AVG(o2.total_amount) * 1.5
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
    AND o2.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 YEAR)
);

Correlated Subqueries with EXISTS and NOT EXISTS

The EXISTS operator is purpose-built for correlated subqueries. It returns TRUE if the subquery returns any rows, FALSE otherwise. This pattern excels at checking for the presence or absence of related records.

-- Find customers who have placed at least one order
SELECT 
    c.customer_id,
    c.name,
    c.email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

The SELECT 1 inside EXISTS is a convention—the actual selected value doesn’t matter because EXISTS only checks for row existence. Some developers use SELECT *, but SELECT 1 signals intent more clearly.

NOT EXISTS handles the inverse case elegantly:

-- Find products that have never been ordered
SELECT 
    p.product_id,
    p.name,
    p.price
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

This pattern often outperforms LEFT JOIN with NULL check alternatives, especially when the related table is large. The database can stop searching as soon as it finds one matching row (or confirms none exist), rather than counting all matches.

-- Find customers with no orders in the last 90 days (churning customers)
SELECT 
    c.customer_id,
    c.name,
    c.last_login
FROM customers c
WHERE c.status = 'active'
AND NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
);

Correlated Subqueries in SELECT and UPDATE Statements

Correlated subqueries aren’t limited to WHERE clauses. Placing them in the SELECT list lets you fetch related aggregates or lookups inline.

-- Get each employee with their department's employee count
SELECT 
    e.employee_id,
    e.name,
    e.department_id,
    (
        SELECT COUNT(*)
        FROM employees e2
        WHERE e2.department_id = e.department_id
    ) AS department_size,
    (
        SELECT d.name
        FROM departments d
        WHERE d.department_id = e.department_id
    ) AS department_name
FROM employees e;

This approach works but can be inefficient—each subquery runs per row. For simple lookups, JOINs are usually better. However, SELECT-list correlated subqueries shine when you need conditional aggregates that are awkward to express with joins.

UPDATE statements with correlated subqueries are powerful for bulk data modifications:

-- Update each product's cached order count
UPDATE products p
SET total_orders = (
    SELECT COUNT(*)
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

-- Update employee salaries to match department minimum (for underpaid employees)
UPDATE employees e
SET salary = (
    SELECT MIN(e2.salary) * 1.1
    FROM employees e2
    WHERE e2.department_id = e.department_id
    AND e2.job_level = e.job_level
)
WHERE e.salary < (
    SELECT MIN(e3.salary)
    FROM employees e3
    WHERE e3.department_id = e.department_id
    AND e3.job_level = e.job_level
);

Performance Considerations and Alternatives

Correlated subqueries have a reputation for poor performance, and it’s often deserved. The row-by-row execution model means complexity grows linearly with outer query size. Here’s how to evaluate alternatives:

JOINs replace many correlated subqueries with set-based operations:

-- Correlated subquery version
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

-- JOIN with derived table version
SELECT e.name, e.salary
FROM employees e
INNER JOIN (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) dept_avg ON e.department_id = dept_avg.department_id
WHERE e.salary > dept_avg.avg_salary;

The JOIN version calculates department averages once, then joins—typically faster for large tables.

Window functions handle many row-comparison scenarios elegantly:

-- Using window function instead of correlated subquery
SELECT name, salary
FROM (
    SELECT 
        name, 
        salary,
        AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM employees
) sub
WHERE salary > dept_avg;

When to stick with correlated subqueries:

EXISTS/NOT EXISTS checks (often optimal as-is)
Complex conditional logic that’s awkward with joins
Small outer result sets where simplicity trumps optimization
When the optimizer transforms them anyway (check your EXPLAIN plan)

Practical Use Cases

Beyond the examples above, correlated subqueries solve several real-world problems effectively.

Finding duplicates:

-- Find duplicate email addresses
SELECT 
    e.employee_id,
    e.email
FROM employees e
WHERE EXISTS (
    SELECT 1
    FROM employees e2
    WHERE e2.email = e.email
    AND e2.employee_id < e.employee_id
);

The employee_id < e.employee_id trick ensures you only flag duplicates once, not both copies.

Running comparisons against previous records:

-- Find orders larger than the customer's previous order
SELECT 
    o.order_id,
    o.customer_id,
    o.total_amount,
    o.order_date
FROM orders o
WHERE o.total_amount > (
    SELECT MAX(o2.total_amount)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
    AND o2.order_date < o.order_date
);

Conditional existence checks:

-- Find departments where all employees have completed training
SELECT d.department_id, d.name
FROM departments d
WHERE NOT EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.department_id
    AND NOT EXISTS (
        SELECT 1
        FROM training_completions tc
        WHERE tc.employee_id = e.employee_id
        AND tc.course_id = 'REQUIRED_101'
    )
);

This nested NOT EXISTS pattern reads as “find departments where no employee exists who hasn’t completed training”—a double negative that expresses universal quantification.

Correlated subqueries remain a essential tool in SQL. Use them deliberately, understand their execution model, and always verify performance with realistic data. When they’re the right tool, they express intent clearly. When they’re not, JOINs and window functions are waiting.