SQL - CTE (Common Table Expression) Tutorial

A Common Table Expression (CTE) is a temporary named result set that exists only within the scope of a single SQL statement. Think of it as defining a variable that holds a query result, which you...

Key Insights

  • CTEs transform unreadable nested subqueries into sequential, named building blocks that you can reference like tables—making complex queries maintainable and debuggable.
  • Recursive CTEs solve hierarchical data problems (org charts, bill of materials, threaded comments) that would otherwise require application code or multiple round trips.
  • CTEs are query-scoped and typically not materialized—they’re about readability first, and you should benchmark before assuming performance benefits over subqueries.

Introduction to CTEs

A Common Table Expression (CTE) is a temporary named result set that exists only within the scope of a single SQL statement. Think of it as defining a variable that holds a query result, which you can then reference by name in your main query.

The syntax is straightforward:

WITH cte_name AS (
    SELECT columns
    FROM table
    WHERE conditions
)
SELECT * FROM cte_name;

Here’s why this matters. Compare these two approaches to finding employees who earn above their department’s average:

-- Subquery approach: nested and hard to follow
SELECT e.employee_name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

-- CTE approach: linear and readable
WITH dept_averages AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.employee_name, e.salary, e.department_id
FROM employees e
JOIN dept_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;

The CTE version reads top-to-bottom. You understand what dept_averages contains before you see how it’s used. This becomes critical when queries grow beyond trivial complexity.

Basic CTE Syntax and Structure

Every CTE has three components: the WITH keyword, a name, and a query wrapped in parentheses after AS. The name follows standard SQL identifier rules—stick to snake_case and make it descriptive.

WITH active_employees AS (
    SELECT 
        employee_id,
        employee_name,
        hire_date,
        department_id
    FROM employees
    WHERE status = 'active'
    AND hire_date >= '2020-01-01'
)
SELECT 
    ae.employee_name,
    d.department_name,
    ae.hire_date
FROM active_employees ae
JOIN departments d ON ae.department_id = d.department_id
ORDER BY ae.hire_date DESC;

Naming conventions matter. Avoid generic names like cte1 or temp. Use names that describe the data: active_employees, monthly_totals, top_performers. Your future self will thank you when debugging at 2 AM.

The CTE must immediately precede the statement that uses it. You can’t define a CTE and then run multiple separate queries against it—it’s scoped to exactly one statement.

Multiple CTEs in a Single Query

Real-world queries often need multiple intermediate result sets. You define multiple CTEs by separating them with commas:

WITH monthly_sales AS (
    SELECT 
        salesperson_id,
        DATE_TRUNC('month', sale_date) AS sale_month,
        SUM(amount) AS total_sales
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY salesperson_id, DATE_TRUNC('month', sale_date)
),
sales_averages AS (
    SELECT 
        salesperson_id,
        AVG(total_sales) AS avg_monthly_sales
    FROM monthly_sales
    GROUP BY salesperson_id
),
top_performers AS (
    SELECT salesperson_id
    FROM sales_averages
    WHERE avg_monthly_sales > 50000
)
SELECT 
    e.employee_name,
    ms.sale_month,
    ms.total_sales,
    sa.avg_monthly_sales
FROM monthly_sales ms
JOIN sales_averages sa ON ms.salesperson_id = sa.salesperson_id
JOIN top_performers tp ON ms.salesperson_id = tp.salesperson_id
JOIN employees e ON ms.salesperson_id = e.employee_id
ORDER BY e.employee_name, ms.sale_month;

Notice how sales_averages references monthly_sales, and top_performers references sales_averages. Each CTE can reference any CTE defined before it. This creates a pipeline of transformations that’s easy to follow and modify.

Recursive CTEs

Recursive CTEs handle hierarchical or graph-like data. They consist of two parts: an anchor member (the starting point) and a recursive member (how to get the next level).

Here’s an organizational hierarchy query:

WITH RECURSIVE org_hierarchy AS (
    -- Anchor: start with the CEO (no manager)
    SELECT 
        employee_id,
        employee_name,
        manager_id,
        1 AS level,
        employee_name AS path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: find employees who report to people we've already found
    SELECT 
        e.employee_id,
        e.employee_name,
        e.manager_id,
        oh.level + 1,
        oh.path || ' > ' || e.employee_name
    FROM employees e
    JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy ORDER BY path;

The anchor finds the root (CEO with no manager). The recursive member joins back to the CTE itself, finding the next level down. This continues until no new rows are produced.

Recursive CTEs also generate sequences. Need a date series for reporting?

WITH RECURSIVE date_series AS (
    SELECT DATE '2024-01-01' AS report_date
    
    UNION ALL
    
    SELECT report_date + INTERVAL '1 day'
    FROM date_series
    WHERE report_date < DATE '2024-12-31'
)
SELECT report_date FROM date_series;

Always ensure termination. Without the WHERE report_date < DATE '2024-12-31' clause, this runs forever. Most databases have a recursion limit (PostgreSQL defaults to 100), but don’t rely on it—be explicit about your stop condition.

CTEs vs Subqueries vs Temp Tables

These three approaches solve similar problems with different trade-offs:

-- Approach 1: Subquery
SELECT 
    e.employee_name,
    e.salary,
    dept_stats.avg_salary
FROM employees e
JOIN (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) dept_stats ON e.department_id = dept_stats.department_id
WHERE e.salary > dept_stats.avg_salary;

-- Approach 2: CTE
WITH dept_stats AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT 
    e.employee_name,
    e.salary,
    dept_stats.avg_salary
FROM employees e
JOIN dept_stats ON e.department_id = dept_stats.department_id
WHERE e.salary > dept_stats.avg_salary;

-- Approach 3: Temp Table
CREATE TEMP TABLE dept_stats AS
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

SELECT 
    e.employee_name,
    e.salary,
    dept_stats.avg_salary
FROM employees e
JOIN dept_stats ON e.department_id = dept_stats.department_id
WHERE e.salary > dept_stats.avg_salary;

DROP TABLE dept_stats;

Performance: CTEs and subqueries often produce identical execution plans. The optimizer typically inlines CTEs (treats them as subqueries). Temp tables materialize results to disk, which helps when you reference the same data multiple times or need indexes.

Readability: CTEs win for complex queries. Named result sets beat nested parentheses.

Scope: CTEs exist for one statement. Temp tables persist for the session (or transaction, depending on settings). Use temp tables when multiple statements need the same intermediate data.

My recommendation: Start with CTEs for readability. Profile your queries. Switch to temp tables only when you have evidence that materialization helps—typically when the same expensive calculation is referenced multiple times.

Practical Use Cases

Data Deduplication

The ROW_NUMBER pattern is the canonical CTE use case:

WITH ranked_records AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (
            PARTITION BY email 
            ORDER BY created_at DESC
        ) AS rn
    FROM users
)
SELECT 
    user_id,
    email,
    username,
    created_at
FROM ranked_records
WHERE rn = 1;

This keeps only the most recent record per email address. Without a CTE, you’d need a correlated subquery or self-join—both harder to read and reason about.

Pagination

Efficient pagination with total count:

WITH filtered_data AS (
    SELECT *
    FROM products
    WHERE category = 'electronics'
    AND price > 100
),
counted AS (
    SELECT COUNT(*) AS total_count FROM filtered_data
)
SELECT 
    fd.*,
    c.total_count
FROM filtered_data fd
CROSS JOIN counted c
ORDER BY fd.created_at DESC
LIMIT 20 OFFSET 40;

This returns page 3 (offset 40, limit 20) along with the total count for pagination UI—without running the filter twice.

Common Pitfalls and Best Practices

Scope limitations: You cannot reference a CTE outside its statement. This fails:

WITH my_cte AS (SELECT 1 AS x);
SELECT * FROM my_cte; -- Error: my_cte doesn't exist here

Materialization assumptions: Don’t assume CTEs are materialized. In PostgreSQL, they were materialized by default until version 12, then the behavior changed to inline by default. Use MATERIALIZED or NOT MATERIALIZED hints if you need control:

WITH expensive_calc AS MATERIALIZED (
    SELECT complex_function(column) AS result
    FROM large_table
)
SELECT * FROM expensive_calc ec1
JOIN expensive_calc ec2 ON ec1.id = ec2.parent_id;

Debugging complex CTEs: When a multi-CTE query produces wrong results, test each CTE in isolation:

WITH cte1 AS (...),
     cte2 AS (...)
-- Comment out the final query temporarily
-- SELECT * FROM cte1;  -- Check this first
SELECT * FROM cte2;     -- Then this

Performance debugging: Run EXPLAIN ANALYZE and look for whether CTEs are inlined or materialized. If you see “CTE Scan” in PostgreSQL, it’s materialized. If you see the CTE’s logic embedded in the main plan, it’s inlined.

CTEs are a readability tool first. They make complex queries maintainable, debuggable, and reviewable. Use them liberally for clarity, but verify performance assumptions with actual execution plans on your actual data.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.