SQL - CTE (Common Table Expression) Tutorial
A Common Table Expression (CTE) is a temporary named result set that exists only within the scope of a single SQL statement. Think of it as defining a variable that holds a query result, which you...
Key Insights
- CTEs transform unreadable nested subqueries into sequential, named building blocks that you can reference like tables—making complex queries maintainable and debuggable.
- Recursive CTEs solve hierarchical data problems (org charts, bill of materials, threaded comments) that would otherwise require application code or multiple round trips.
- CTEs are query-scoped and typically not materialized—they’re about readability first, and you should benchmark before assuming performance benefits over subqueries.
Introduction to CTEs
A Common Table Expression (CTE) is a temporary named result set that exists only within the scope of a single SQL statement. Think of it as defining a variable that holds a query result, which you can then reference by name in your main query.
The syntax is straightforward:
WITH cte_name AS (
SELECT columns
FROM table
WHERE conditions
)
SELECT * FROM cte_name;
Here’s why this matters. Compare these two approaches to finding employees who earn above their department’s average:
-- Subquery approach: nested and hard to follow
SELECT e.employee_name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department_id = e.department_id
);
-- CTE approach: linear and readable
WITH dept_averages AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.employee_name, e.salary, e.department_id
FROM employees e
JOIN dept_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
The CTE version reads top-to-bottom. You understand what dept_averages contains before you see how it’s used. This becomes critical when queries grow beyond trivial complexity.
Basic CTE Syntax and Structure
Every CTE has three components: the WITH keyword, a name, and a query wrapped in parentheses after AS. The name follows standard SQL identifier rules—stick to snake_case and make it descriptive.
WITH active_employees AS (
SELECT
employee_id,
employee_name,
hire_date,
department_id
FROM employees
WHERE status = 'active'
AND hire_date >= '2020-01-01'
)
SELECT
ae.employee_name,
d.department_name,
ae.hire_date
FROM active_employees ae
JOIN departments d ON ae.department_id = d.department_id
ORDER BY ae.hire_date DESC;
Naming conventions matter. Avoid generic names like cte1 or temp. Use names that describe the data: active_employees, monthly_totals, top_performers. Your future self will thank you when debugging at 2 AM.
The CTE must immediately precede the statement that uses it. You can’t define a CTE and then run multiple separate queries against it—it’s scoped to exactly one statement.
Multiple CTEs in a Single Query
Real-world queries often need multiple intermediate result sets. You define multiple CTEs by separating them with commas:
WITH monthly_sales AS (
SELECT
salesperson_id,
DATE_TRUNC('month', sale_date) AS sale_month,
SUM(amount) AS total_sales
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY salesperson_id, DATE_TRUNC('month', sale_date)
),
sales_averages AS (
SELECT
salesperson_id,
AVG(total_sales) AS avg_monthly_sales
FROM monthly_sales
GROUP BY salesperson_id
),
top_performers AS (
SELECT salesperson_id
FROM sales_averages
WHERE avg_monthly_sales > 50000
)
SELECT
e.employee_name,
ms.sale_month,
ms.total_sales,
sa.avg_monthly_sales
FROM monthly_sales ms
JOIN sales_averages sa ON ms.salesperson_id = sa.salesperson_id
JOIN top_performers tp ON ms.salesperson_id = tp.salesperson_id
JOIN employees e ON ms.salesperson_id = e.employee_id
ORDER BY e.employee_name, ms.sale_month;
Notice how sales_averages references monthly_sales, and top_performers references sales_averages. Each CTE can reference any CTE defined before it. This creates a pipeline of transformations that’s easy to follow and modify.
Recursive CTEs
Recursive CTEs handle hierarchical or graph-like data. They consist of two parts: an anchor member (the starting point) and a recursive member (how to get the next level).
Here’s an organizational hierarchy query:
WITH RECURSIVE org_hierarchy AS (
-- Anchor: start with the CEO (no manager)
SELECT
employee_id,
employee_name,
manager_id,
1 AS level,
employee_name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: find employees who report to people we've already found
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
oh.level + 1,
oh.path || ' > ' || e.employee_name
FROM employees e
JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy ORDER BY path;
The anchor finds the root (CEO with no manager). The recursive member joins back to the CTE itself, finding the next level down. This continues until no new rows are produced.
Recursive CTEs also generate sequences. Need a date series for reporting?
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' AS report_date
UNION ALL
SELECT report_date + INTERVAL '1 day'
FROM date_series
WHERE report_date < DATE '2024-12-31'
)
SELECT report_date FROM date_series;
Always ensure termination. Without the WHERE report_date < DATE '2024-12-31' clause, this runs forever. Most databases have a recursion limit (PostgreSQL defaults to 100), but don’t rely on it—be explicit about your stop condition.
CTEs vs Subqueries vs Temp Tables
These three approaches solve similar problems with different trade-offs:
-- Approach 1: Subquery
SELECT
e.employee_name,
e.salary,
dept_stats.avg_salary
FROM employees e
JOIN (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) dept_stats ON e.department_id = dept_stats.department_id
WHERE e.salary > dept_stats.avg_salary;
-- Approach 2: CTE
WITH dept_stats AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT
e.employee_name,
e.salary,
dept_stats.avg_salary
FROM employees e
JOIN dept_stats ON e.department_id = dept_stats.department_id
WHERE e.salary > dept_stats.avg_salary;
-- Approach 3: Temp Table
CREATE TEMP TABLE dept_stats AS
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
SELECT
e.employee_name,
e.salary,
dept_stats.avg_salary
FROM employees e
JOIN dept_stats ON e.department_id = dept_stats.department_id
WHERE e.salary > dept_stats.avg_salary;
DROP TABLE dept_stats;
Performance: CTEs and subqueries often produce identical execution plans. The optimizer typically inlines CTEs (treats them as subqueries). Temp tables materialize results to disk, which helps when you reference the same data multiple times or need indexes.
Readability: CTEs win for complex queries. Named result sets beat nested parentheses.
Scope: CTEs exist for one statement. Temp tables persist for the session (or transaction, depending on settings). Use temp tables when multiple statements need the same intermediate data.
My recommendation: Start with CTEs for readability. Profile your queries. Switch to temp tables only when you have evidence that materialization helps—typically when the same expensive calculation is referenced multiple times.
Practical Use Cases
Data Deduplication
The ROW_NUMBER pattern is the canonical CTE use case:
WITH ranked_records AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC
) AS rn
FROM users
)
SELECT
user_id,
email,
username,
created_at
FROM ranked_records
WHERE rn = 1;
This keeps only the most recent record per email address. Without a CTE, you’d need a correlated subquery or self-join—both harder to read and reason about.
Pagination
Efficient pagination with total count:
WITH filtered_data AS (
SELECT *
FROM products
WHERE category = 'electronics'
AND price > 100
),
counted AS (
SELECT COUNT(*) AS total_count FROM filtered_data
)
SELECT
fd.*,
c.total_count
FROM filtered_data fd
CROSS JOIN counted c
ORDER BY fd.created_at DESC
LIMIT 20 OFFSET 40;
This returns page 3 (offset 40, limit 20) along with the total count for pagination UI—without running the filter twice.
Common Pitfalls and Best Practices
Scope limitations: You cannot reference a CTE outside its statement. This fails:
WITH my_cte AS (SELECT 1 AS x);
SELECT * FROM my_cte; -- Error: my_cte doesn't exist here
Materialization assumptions: Don’t assume CTEs are materialized. In PostgreSQL, they were materialized by default until version 12, then the behavior changed to inline by default. Use MATERIALIZED or NOT MATERIALIZED hints if you need control:
WITH expensive_calc AS MATERIALIZED (
SELECT complex_function(column) AS result
FROM large_table
)
SELECT * FROM expensive_calc ec1
JOIN expensive_calc ec2 ON ec1.id = ec2.parent_id;
Debugging complex CTEs: When a multi-CTE query produces wrong results, test each CTE in isolation:
WITH cte1 AS (...),
cte2 AS (...)
-- Comment out the final query temporarily
-- SELECT * FROM cte1; -- Check this first
SELECT * FROM cte2; -- Then this
Performance debugging: Run EXPLAIN ANALYZE and look for whether CTEs are inlined or materialized. If you see “CTE Scan” in PostgreSQL, it’s materialized. If you see the CTE’s logic embedded in the main plan, it’s inlined.
CTEs are a readability tool first. They make complex queries maintainable, debuggable, and reviewable. Use them liberally for clarity, but verify performance assumptions with actual execution plans on your actual data.