SQL Common Table Expressions: Recursive and Non-Recursive CTEs

Common Table Expressions (CTEs) are temporary named result sets that exist only during query execution. Introduced in SQL:1999, they provide a cleaner alternative to subqueries and improve code...

Key Insights

  • CTEs improve query readability by naming intermediate result sets, making complex queries easier to understand and maintain than nested subqueries
  • Recursive CTEs solve hierarchical and graph problems through iterative execution of anchor and recursive members until a termination condition is met
  • While CTEs excel at code organization, they may be re-evaluated multiple times in some databases—use indexes strategically and consider materialized alternatives for performance-critical queries

Introduction to CTEs

Common Table Expressions (CTEs) are temporary named result sets that exist only during query execution. Introduced in SQL:1999, they provide a cleaner alternative to subqueries and improve code organization for complex data transformations.

The basic syntax uses the WITH clause to define one or more named result sets that subsequent queries can reference. Here’s a direct comparison:

-- Using a subquery (harder to read)
SELECT 
    d.department_name,
    dept_sales.total_amount
FROM departments d
JOIN (
    SELECT 
        department_id,
        SUM(amount) as total_amount
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY department_id
) dept_sales ON d.id = dept_sales.department_id;

-- Using a CTE (clearer intent)
WITH dept_sales AS (
    SELECT 
        department_id,
        SUM(amount) as total_amount
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY department_id
)
SELECT 
    d.department_name,
    dept_sales.total_amount
FROM departments d
JOIN dept_sales ON d.id = dept_sales.department_id;

CTEs are supported in all major database systems including PostgreSQL, SQL Server, MySQL 8.0+, Oracle, and SQLite 3.8+. They’re particularly valuable for data analysis, reporting queries, and working with hierarchical data structures.

Non-Recursive CTEs

Non-recursive CTEs function as named subqueries that can be referenced multiple times within the main query. They’re ideal for breaking down complex logic into digestible, reusable components.

Here’s a practical example analyzing sales performance:

WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', sale_date) as month,
        product_id,
        SUM(quantity) as units_sold,
        SUM(amount) as revenue
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY DATE_TRUNC('month', sale_date), product_id
),
product_rankings AS (
    SELECT 
        month,
        product_id,
        revenue,
        RANK() OVER (PARTITION BY month ORDER BY revenue DESC) as rank
    FROM monthly_sales
)
SELECT 
    pr.month,
    p.product_name,
    pr.revenue,
    pr.rank
FROM product_rankings pr
JOIN products p ON pr.product_id = p.id
WHERE pr.rank <= 10
ORDER BY pr.month, pr.rank;

This query chains two CTEs to first aggregate sales data, then rank products by revenue within each month. The alternative—nesting these operations as subqueries—would create deeply indented, difficult-to-maintain code.

Multiple CTEs can also reference each other sequentially:

WITH active_customers AS (
    SELECT customer_id
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
    GROUP BY customer_id
),
customer_metrics AS (
    SELECT 
        c.id,
        c.name,
        COUNT(o.id) as order_count,
        SUM(o.total) as lifetime_value
    FROM customers c
    JOIN active_customers ac ON c.id = ac.customer_id
    JOIN orders o ON c.id = o.customer_id
    GROUP BY c.id, c.name
)
SELECT 
    name,
    order_count,
    lifetime_value,
    lifetime_value / order_count as avg_order_value
FROM customer_metrics
WHERE order_count >= 5
ORDER BY lifetime_value DESC;

CTEs also improve performance when filtering before expensive join operations:

WITH eligible_products AS (
    SELECT id, name, category_id
    FROM products
    WHERE in_stock = true 
      AND price BETWEEN 10 AND 100
)
SELECT 
    ep.name,
    c.category_name,
    inv.quantity
FROM eligible_products ep
JOIN categories c ON ep.category_id = c.id
JOIN inventory inv ON ep.id = inv.product_id;

By filtering products first, we reduce the dataset before joining with other tables, potentially saving significant processing time.

Recursive CTEs

Recursive CTEs solve problems involving hierarchical or graph-structured data. They consist of two parts: an anchor member (the base case) and a recursive member (the iterative step), combined with UNION ALL.

The execution model is straightforward: run the anchor query first, then repeatedly execute the recursive query using results from the previous iteration until no new rows are produced.

Here’s a simple number generator:

WITH RECURSIVE numbers AS (
    -- Anchor member: starting point
    SELECT 1 as n
    
    UNION ALL
    
    -- Recursive member: iteration logic
    SELECT n + 1
    FROM numbers
    WHERE n < 100
)
SELECT n FROM numbers;

This generates integers from 1 to 100. The WHERE n < 100 clause provides the termination condition, preventing infinite recursion.

A more practical example traverses an organizational hierarchy:

WITH RECURSIVE employee_hierarchy AS (
    -- Anchor: start with CEO (no manager)
    SELECT 
        id,
        name,
        manager_id,
        title,
        1 as level,
        CAST(name AS VARCHAR(1000)) as path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: find direct reports
    SELECT 
        e.id,
        e.name,
        e.manager_id,
        e.title,
        eh.level + 1,
        CAST(eh.path || ' > ' || e.name AS VARCHAR(1000))
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT 
    level,
    name,
    title,
    path
FROM employee_hierarchy
ORDER BY level, name;

This query builds the complete reporting structure, tracking each employee’s depth in the hierarchy and their management chain.

Date series generation is another common use case:

WITH RECURSIVE date_series AS (
    SELECT DATE '2024-01-01' as date
    
    UNION ALL
    
    SELECT date + INTERVAL '1 day'
    FROM date_series
    WHERE date < DATE '2024-12-31'
)
SELECT 
    ds.date,
    COALESCE(SUM(s.amount), 0) as daily_revenue
FROM date_series ds
LEFT JOIN sales s ON ds.date = s.sale_date
GROUP BY ds.date
ORDER BY ds.date;

This ensures every date appears in results, even days with zero sales.

Advanced Recursive Patterns

Recursive CTEs excel at complex hierarchical queries like bill of materials (BOM) explosions:

WITH RECURSIVE bom_explosion AS (
    -- Anchor: top-level product
    SELECT 
        p.id as product_id,
        p.name as product_name,
        c.component_id,
        c.quantity,
        1 as level,
        c.quantity as total_quantity
    FROM products p
    JOIN components c ON p.id = c.product_id
    WHERE p.id = 1001  -- Specific product
    
    UNION ALL
    
    -- Recursive: components of components
    SELECT 
        be.product_id,
        be.product_name,
        c.component_id,
        c.quantity,
        be.level + 1,
        be.total_quantity * c.quantity
    FROM bom_explosion be
    JOIN components c ON be.component_id = c.product_id
    WHERE be.level < 10  -- Prevent runaway recursion
)
SELECT 
    level,
    p.name as component_name,
    total_quantity
FROM bom_explosion be
JOIN products p ON be.component_id = p.id
ORDER BY level, component_name;

Category trees with depth tracking:

WITH RECURSIVE category_tree AS (
    SELECT 
        id,
        name,
        parent_id,
        0 as depth,
        ARRAY[id] as path_ids
    FROM categories
    WHERE parent_id IS NULL
    
    UNION ALL
    
    SELECT 
        c.id,
        c.name,
        c.parent_id,
        ct.depth + 1,
        ct.path_ids || c.id
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.id
    WHERE ct.depth < 5  -- Limit depth
)
SELECT 
    REPEAT('  ', depth) || name as indented_name,
    depth,
    path_ids
FROM category_tree
ORDER BY path_ids;

Graph traversal with cycle detection:

WITH RECURSIVE graph_paths AS (
    SELECT 
        node_id,
        connected_to,
        ARRAY[node_id, connected_to] as path,
        false as is_cycle
    FROM connections
    WHERE node_id = 1  -- Starting node
    
    UNION ALL
    
    SELECT 
        gp.node_id,
        c.connected_to,
        gp.path || c.connected_to,
        c.connected_to = ANY(gp.path)
    FROM graph_paths gp
    JOIN connections c ON gp.connected_to = c.node_id
    WHERE NOT gp.is_cycle
      AND array_length(gp.path, 1) < 20
)
SELECT DISTINCT path
FROM graph_paths
WHERE NOT is_cycle;

The cycle detection prevents infinite loops by checking if a node already exists in the current path.

Performance Considerations

CTE performance varies by database implementation. In PostgreSQL and MySQL, CTEs are typically optimization fences—they’re executed once and materialized. SQL Server may inline CTEs or execute them multiple times depending on the optimizer’s decisions.

For performance-critical queries, examine execution plans:

EXPLAIN ANALYZE
WITH sales_summary AS (
    SELECT 
        customer_id,
        SUM(amount) as total
    FROM orders
    GROUP BY customer_id
)
SELECT c.name, ss.total
FROM customers c
JOIN sales_summary ss ON c.id = ss.customer_id;

Key performance strategies:

  • Index appropriately: Ensure columns used in joins and WHERE clauses within CTEs have indexes
  • Limit recursion depth: Always include explicit depth limits in recursive CTEs
  • Consider materialization: For CTEs referenced multiple times, temporary tables may perform better
  • Watch for multiple evaluations: In SQL Server, CTEs might execute multiple times; use OPTION (MAXRECURSION) to control limits

Database-specific recursion limits:

-- SQL Server: set maximum recursion depth
OPTION (MAXRECURSION 1000)

-- PostgreSQL: no built-in limit, but can timeout
SET statement_timeout = '30s';

Best Practices and Common Pitfalls

Always provide explicit termination conditions to prevent infinite recursion:

-- WRONG: No termination condition
WITH RECURSIVE infinite_loop AS (
    SELECT 1 as n
    UNION ALL
    SELECT n + 1 FROM infinite_loop  -- Will run forever!
)
SELECT * FROM infinite_loop;

-- CORRECT: Explicit limit
WITH RECURSIVE safe_sequence AS (
    SELECT 1 as n
    UNION ALL
    SELECT n + 1 FROM safe_sequence WHERE n < 1000
)
SELECT * FROM safe_sequence;

Use descriptive CTE names that explain their purpose:

-- Good naming
WITH active_premium_customers AS (...),
     recent_high_value_orders AS (...)

-- Poor naming  
WITH cte1 AS (...),
     temp AS (...)

Understand database-specific limitations. MySQL 8.0+ supports CTEs but has different optimization characteristics than PostgreSQL. SQL Server allows multiple references to CTEs but may re-execute them.

CTEs are ideal for improving readability and handling hierarchical data, but consider alternatives when:

  • The CTE is referenced once and contains a simple subquery
  • Performance testing shows temporary tables outperform CTEs
  • You need to index intermediate results

Common Table Expressions transform complex queries into maintainable, logical steps. Master both recursive and non-recursive patterns to handle everything from simple data transformations to complex graph traversals efficiently.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.