SQL: Subqueries vs CTEs
When your SQL query needs intermediate calculations, filtered datasets, or multi-step logic, you have two primary tools: subqueries and Common Table Expressions (CTEs). Both allow you to compose...
Key Insights
- CTEs make complex queries readable by breaking logic into named steps, while subqueries embed logic inline—use CTEs when you need multiple references or multi-step transformations
- SQLite treats CTEs as optimization fences and doesn’t materialize them automatically, meaning performance between CTEs and subqueries is often equivalent for simple cases
- Recursive CTEs are the only SQL-standard way to traverse hierarchical data like org charts or category trees—subqueries cannot replicate this functionality
Introduction: Two Approaches to Complex Queries
When your SQL query needs intermediate calculations, filtered datasets, or multi-step logic, you have two primary tools: subqueries and Common Table Expressions (CTEs). Both allow you to compose complex queries from simpler parts, but they differ significantly in syntax, readability, and capabilities.
A subquery is a query nested inside another query—it can appear in SELECT, FROM, WHERE, or HAVING clauses. A CTE, introduced via the WITH clause, creates a named temporary result set that exists for the duration of a single statement.
This article compares both approaches with practical examples, explores their performance characteristics in SQLite, and provides a framework for choosing between them. You’ll also see how recursive CTEs unlock capabilities that subqueries simply cannot match.
Subqueries: Queries Within Queries
Subqueries embed one query inside another. They come in three flavors: scalar subqueries (returning a single value), row subqueries (returning one row), and table subqueries (returning multiple rows).
You can place subqueries in different clauses:
-- Scalar subquery in SELECT
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;
-- Table subquery in FROM
SELECT dept_name, avg_salary
FROM (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS dept_averages
JOIN departments ON dept_averages.department_id = departments.id;
-- Subquery in WHERE
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
Correlated subqueries reference columns from the outer query, executing once per outer row:
-- Find employees earning above their department's average
SELECT e1.name, e1.department_id, e1.salary
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e1.department_id
);
This correlated subquery runs for every row in the outer query, calculating each department’s average on the fly. For small datasets this works fine, but the repeated execution can become expensive.
Subqueries work well for simple, one-off filters or calculations. They keep related logic close together. However, they become problematic when nested multiple levels deep or when you need to reference the same intermediate result multiple times.
CTEs: Named Temporary Result Sets
CTEs use the WITH clause to define named result sets before your main query. Think of them as temporary views that exist only for your statement:
-- Single CTE
WITH department_averages AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.department_id, e.salary
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
This achieves the same result as our correlated subquery but calculates each department average once, not once per employee. The query plan is clearer: first compute department averages, then filter employees.
You can chain multiple CTEs, with later ones referencing earlier ones:
WITH department_averages AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
),
high_performers AS (
SELECT e.*, da.avg_salary
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary
),
top_departments AS (
SELECT department_id, COUNT(*) AS high_performer_count
FROM high_performers
GROUP BY department_id
HAVING COUNT(*) >= 3
)
SELECT d.name, td.high_performer_count
FROM top_departments td
JOIN departments d ON td.department_id = d.id
ORDER BY td.high_performer_count DESC;
Each CTE builds on previous ones, creating a readable pipeline of transformations. You can test each CTE independently by selecting from it directly during development.
Side-by-Side Comparison
Let’s compare readability with a real-world reporting query. First, the subquery version:
SELECT
d.name AS department,
dept_stats.employee_count,
dept_stats.avg_salary,
dept_stats.total_payroll
FROM departments d
JOIN (
SELECT
department_id,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
SUM(salary) AS total_payroll
FROM (
SELECT e.*,
(SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id) AS dept_avg
FROM employees e
WHERE e.hire_date >= date('now', '-1 year')
AND e.status = 'active'
) recent_employees
WHERE salary > dept_avg * 1.1
GROUP BY department_id
) dept_stats ON d.id = dept_stats.department_id
WHERE dept_stats.employee_count >= 5
ORDER BY dept_stats.total_payroll DESC;
Now with CTEs:
WITH recent_active_employees AS (
SELECT *
FROM employees
WHERE hire_date >= date('now', '-1 year')
AND status = 'active'
),
department_averages AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
),
high_earners AS (
SELECT e.*
FROM recent_active_employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary * 1.1
),
department_stats AS (
SELECT
department_id,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary,
SUM(salary) AS total_payroll
FROM high_earners
GROUP BY department_id
HAVING COUNT(*) >= 5
)
SELECT
d.name AS department,
ds.employee_count,
ds.avg_salary,
ds.total_payroll
FROM department_stats ds
JOIN departments d ON d.id = ds.department_id
ORDER BY ds.total_payroll DESC;
The CTE version is longer but dramatically more readable. Each step has a descriptive name that documents what it does. You can test each CTE independently. When debugging, you can select from any intermediate CTE to inspect your data at that stage.
In SQLite, performance is often comparable. SQLite doesn’t automatically materialize CTEs—it inlines them into the query plan similar to subqueries. The optimizer makes the final call. For simple queries, you won’t see meaningful performance differences. For complex queries, always test with realistic data volumes.
Recursive CTEs: When CTEs Shine
Recursive CTEs handle hierarchical data—something subqueries cannot do. A recursive CTE has two parts: an anchor (base case) and a recursive member that references the CTE itself.
Here’s an employee organizational hierarchy:
-- Schema: employees(id, name, manager_id, title)
WITH RECURSIVE org_chart AS (
-- Anchor: start with the CEO (no manager)
SELECT id, name, manager_id, title, 0 AS level, name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: find direct reports
SELECT
e.id,
e.name,
e.manager_id,
e.title,
oc.level + 1,
oc.path || ' > ' || e.name
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT
level,
name,
title,
path
FROM org_chart
ORDER BY path;
This query traverses the entire organization tree, regardless of depth. The anchor finds the CEO, then the recursive part finds direct reports, then their reports, and so on until no more employees are found.
Another common use case is category hierarchies:
-- Schema: categories(id, name, parent_id)
WITH RECURSIVE category_tree AS (
SELECT id, name, parent_id, name AS full_path, 0 AS depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.name,
c.parent_id,
ct.full_path || ' / ' || c.name,
ct.depth + 1
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree
WHERE depth <= 3 -- Limit depth if needed
ORDER BY full_path;
Recursive CTEs are the only standard SQL way to handle these traversals. You could achieve similar results with application code and multiple queries, but that’s far less efficient and more complex.
Decision Framework: When to Use Each
Use subqueries when:
- You need a simple, one-off filter or calculation
- The logic is straightforward and won’t be reused
- You’re writing a quick ad-hoc query
- The subquery appears only once
Use CTEs when:
- Your query has multiple logical steps
- You reference the same intermediate result multiple times
- Readability and maintainability matter (production code)
- You need to traverse hierarchical data (recursive CTE required)
- You’re debugging complex logic and want to inspect intermediate results
As a rule of thumb: if you’re nesting subqueries more than two levels deep, refactor to CTEs. If you find yourself copying the same subquery multiple times, use a CTE.
Best Practices and Common Pitfalls
Avoid scalar subqueries in SELECT that execute per row:
-- Anti-pattern: subquery runs for every row
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) AS dept_avg
FROM employees e;
-- Better: calculate once with CTE
WITH department_averages AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT
e.name,
e.salary,
da.avg_salary AS dept_avg
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id;
Name your CTEs descriptively. cte1, temp, or x are useless. Use names like active_customers, monthly_revenue, or filtered_orders that explain what the CTE contains.
Remember that SQLite doesn’t support materialization hints like PostgreSQL’s MATERIALIZED. You can’t force SQLite to compute and store a CTE result. The optimizer decides. If you need guaranteed materialization, create a temporary table:
CREATE TEMP TABLE department_averages AS
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
-- Use the temp table in subsequent queries
SELECT e.name, e.salary
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
DROP TABLE department_averages;
For recursive CTEs, always include a termination condition to prevent infinite loops. Limit recursion depth if you’re unsure about your data structure.
Choose the right tool for your specific situation. Simple queries don’t need CTEs. Complex queries become unmaintainable with deeply nested subqueries. Write code that the next developer (or future you) can understand and modify confidently.