SQL: Subqueries vs CTEs

Key Insights

CTEs make complex queries readable by breaking logic into named steps, while subqueries embed logic inline—use CTEs when you need multiple references or multi-step transformations
SQLite treats CTEs as optimization fences and doesn’t materialize them automatically, meaning performance between CTEs and subqueries is often equivalent for simple cases
Recursive CTEs are the only SQL-standard way to traverse hierarchical data like org charts or category trees—subqueries cannot replicate this functionality

Introduction: Two Approaches to Complex Queries

When your SQL query needs intermediate calculations, filtered datasets, or multi-step logic, you have two primary tools: subqueries and Common Table Expressions (CTEs). Both allow you to compose complex queries from simpler parts, but they differ significantly in syntax, readability, and capabilities.

A subquery is a query nested inside another query—it can appear in SELECT, FROM, WHERE, or HAVING clauses. A CTE, introduced via the WITH clause, creates a named temporary result set that exists for the duration of a single statement.

This article compares both approaches with practical examples, explores their performance characteristics in SQLite, and provides a framework for choosing between them. You’ll also see how recursive CTEs unlock capabilities that subqueries simply cannot match.

Subqueries: Queries Within Queries

Subqueries embed one query inside another. They come in three flavors: scalar subqueries (returning a single value), row subqueries (returning one row), and table subqueries (returning multiple rows).

You can place subqueries in different clauses:

-- Scalar subquery in SELECT
SELECT 
    name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;

-- Table subquery in FROM
SELECT dept_name, avg_salary
FROM (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_averages
JOIN departments ON dept_averages.department_id = departments.id;

-- Subquery in WHERE
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

Correlated subqueries reference columns from the outer query, executing once per outer row:

-- Find employees earning above their department's average
SELECT e1.name, e1.department_id, e1.salary
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);

This correlated subquery runs for every row in the outer query, calculating each department’s average on the fly. For small datasets this works fine, but the repeated execution can become expensive.

Subqueries work well for simple, one-off filters or calculations. They keep related logic close together. However, they become problematic when nested multiple levels deep or when you need to reference the same intermediate result multiple times.

CTEs: Named Temporary Result Sets

CTEs use the WITH clause to define named result sets before your main query. Think of them as temporary views that exist only for your statement:

-- Single CTE
WITH department_averages AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.name, e.department_id, e.salary
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;

This achieves the same result as our correlated subquery but calculates each department average once, not once per employee. The query plan is clearer: first compute department averages, then filter employees.

You can chain multiple CTEs, with later ones referencing earlier ones:

WITH department_averages AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
),
high_performers AS (
    SELECT e.*, da.avg_salary
    FROM employees e
    JOIN department_averages da ON e.department_id = da.department_id
    WHERE e.salary > da.avg_salary
),
top_departments AS (
    SELECT department_id, COUNT(*) AS high_performer_count
    FROM high_performers
    GROUP BY department_id
    HAVING COUNT(*) >= 3
)
SELECT d.name, td.high_performer_count
FROM top_departments td
JOIN departments d ON td.department_id = d.id
ORDER BY td.high_performer_count DESC;

Each CTE builds on previous ones, creating a readable pipeline of transformations. You can test each CTE independently by selecting from it directly during development.

Side-by-Side Comparison

Let’s compare readability with a real-world reporting query. First, the subquery version:

SELECT 
    d.name AS department,
    dept_stats.employee_count,
    dept_stats.avg_salary,
    dept_stats.total_payroll
FROM departments d
JOIN (
    SELECT 
        department_id,
        COUNT(*) AS employee_count,
        AVG(salary) AS avg_salary,
        SUM(salary) AS total_payroll
    FROM (
        SELECT e.*, 
            (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id) AS dept_avg
        FROM employees e
        WHERE e.hire_date >= date('now', '-1 year')
          AND e.status = 'active'
    ) recent_employees
    WHERE salary > dept_avg * 1.1
    GROUP BY department_id
) dept_stats ON d.id = dept_stats.department_id
WHERE dept_stats.employee_count >= 5
ORDER BY dept_stats.total_payroll DESC;

Now with CTEs:

WITH recent_active_employees AS (
    SELECT *
    FROM employees
    WHERE hire_date >= date('now', '-1 year')
      AND status = 'active'
),
department_averages AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
),
high_earners AS (
    SELECT e.*
    FROM recent_active_employees e
    JOIN department_averages da ON e.department_id = da.department_id
    WHERE e.salary > da.avg_salary * 1.1
),
department_stats AS (
    SELECT 
        department_id,
        COUNT(*) AS employee_count,
        AVG(salary) AS avg_salary,
        SUM(salary) AS total_payroll
    FROM high_earners
    GROUP BY department_id
    HAVING COUNT(*) >= 5
)
SELECT 
    d.name AS department,
    ds.employee_count,
    ds.avg_salary,
    ds.total_payroll
FROM department_stats ds
JOIN departments d ON d.id = ds.department_id
ORDER BY ds.total_payroll DESC;

The CTE version is longer but dramatically more readable. Each step has a descriptive name that documents what it does. You can test each CTE independently. When debugging, you can select from any intermediate CTE to inspect your data at that stage.

In SQLite, performance is often comparable. SQLite doesn’t automatically materialize CTEs—it inlines them into the query plan similar to subqueries. The optimizer makes the final call. For simple queries, you won’t see meaningful performance differences. For complex queries, always test with realistic data volumes.

Recursive CTEs: When CTEs Shine

Recursive CTEs handle hierarchical data—something subqueries cannot do. A recursive CTE has two parts: an anchor (base case) and a recursive member that references the CTE itself.

Here’s an employee organizational hierarchy:

-- Schema: employees(id, name, manager_id, title)

WITH RECURSIVE org_chart AS (
    -- Anchor: start with the CEO (no manager)
    SELECT id, name, manager_id, title, 0 AS level, name AS path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: find direct reports
    SELECT 
        e.id, 
        e.name, 
        e.manager_id, 
        e.title,
        oc.level + 1,
        oc.path || ' > ' || e.name
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT 
    level,
    name,
    title,
    path
FROM org_chart
ORDER BY path;

This query traverses the entire organization tree, regardless of depth. The anchor finds the CEO, then the recursive part finds direct reports, then their reports, and so on until no more employees are found.

Another common use case is category hierarchies:

-- Schema: categories(id, name, parent_id)

WITH RECURSIVE category_tree AS (
    SELECT id, name, parent_id, name AS full_path, 0 AS depth
    FROM categories
    WHERE parent_id IS NULL
    
    UNION ALL
    
    SELECT 
        c.id, 
        c.name, 
        c.parent_id,
        ct.full_path || ' / ' || c.name,
        ct.depth + 1
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree
WHERE depth <= 3  -- Limit depth if needed
ORDER BY full_path;

Recursive CTEs are the only standard SQL way to handle these traversals. You could achieve similar results with application code and multiple queries, but that’s far less efficient and more complex.

Decision Framework: When to Use Each

Use subqueries when:

You need a simple, one-off filter or calculation
The logic is straightforward and won’t be reused
You’re writing a quick ad-hoc query
The subquery appears only once

Use CTEs when:

Your query has multiple logical steps
You reference the same intermediate result multiple times
Readability and maintainability matter (production code)
You need to traverse hierarchical data (recursive CTE required)
You’re debugging complex logic and want to inspect intermediate results

As a rule of thumb: if you’re nesting subqueries more than two levels deep, refactor to CTEs. If you find yourself copying the same subquery multiple times, use a CTE.

Best Practices and Common Pitfalls

Avoid scalar subqueries in SELECT that execute per row:

-- Anti-pattern: subquery runs for every row
SELECT 
    name,
    salary,
    (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS dept_avg
FROM employees e;

-- Better: calculate once with CTE
WITH department_averages AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT 
    e.name,
    e.salary,
    da.avg_salary AS dept_avg
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id;

Name your CTEs descriptively. cte1, temp, or x are useless. Use names like active_customers, monthly_revenue, or filtered_orders that explain what the CTE contains.

Remember that SQLite doesn’t support materialization hints like PostgreSQL’s MATERIALIZED. You can’t force SQLite to compute and store a CTE result. The optimizer decides. If you need guaranteed materialization, create a temporary table:

CREATE TEMP TABLE department_averages AS
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

-- Use the temp table in subsequent queries
SELECT e.name, e.salary
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;

DROP TABLE department_averages;

For recursive CTEs, always include a termination condition to prevent infinite loops. Limit recursion depth if you’re unsure about your data structure.

Choose the right tool for your specific situation. Simple queries don’t need CTEs. Complex queries become unmaintainable with deeply nested subqueries. Write code that the next developer (or future you) can understand and modify confidently.