How to Use Recursive CTEs in PostgreSQL

Key Insights

Recursive CTEs consist of two parts: a base case that provides initial rows and a recursive member that references itself until a termination condition is met
Use recursive queries for hierarchical data (org charts, category trees), graph traversal (routes, dependencies), and generating sequences without helper tables
Always include explicit termination conditions and depth limits to prevent infinite loops—PostgreSQL has no built-in recursion depth limit

Introduction to CTEs and Recursive Queries

Common Table Expressions (CTEs) are temporary named result sets that exist only during query execution. They make complex queries more readable by breaking them into logical chunks. While standard CTEs are essentially named subqueries, recursive CTEs can reference themselves, enabling you to solve problems that would otherwise require procedural code or complex self-joins.

You need recursion when working with data that has parent-child relationships of unknown depth: organizational hierarchies, bill-of-materials structures, category trees, network graphs, or when generating sequences dynamically.

Here’s the fundamental difference:

-- Standard CTE: just a named subquery
WITH recent_orders AS (
    SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE total > 100;

-- Recursive CTE: references itself
WITH RECURSIVE numbers AS (
    SELECT 1 AS n              -- Base case
    UNION ALL
    SELECT n + 1 FROM numbers  -- Recursive case references 'numbers'
    WHERE n < 10
)
SELECT * FROM numbers;

The RECURSIVE keyword enables self-referencing. Without it, PostgreSQL would reject any CTE that references its own name.

Anatomy of a Recursive CTE

Every recursive CTE has two mandatory components connected by UNION or UNION ALL:

Base case (anchor member): Returns initial rows without recursion
Recursive member: References the CTE itself and generates new rows

Here’s the execution model:

WITH RECURSIVE cte_name AS (
    -- BASE CASE: executed once, provides initial working table
    SELECT initial_value AS column_name
    FROM source_table
    WHERE initial_conditions
    
    UNION ALL  -- Usually UNION ALL (faster, keeps duplicates)
    
    -- RECURSIVE MEMBER: executed repeatedly on previous iteration's results
    SELECT derived_value
    FROM cte_name  -- Self-reference
    JOIN other_tables ON join_conditions
    WHERE termination_condition  -- CRITICAL: must eventually become false
)
SELECT * FROM cte_name;

PostgreSQL executes this iteratively:

Run the base case, store results in a working table
Run the recursive member using the working table
Append new rows to the result set
Replace working table with new rows
Repeat steps 2-4 until no new rows are generated

Use UNION ALL instead of UNION unless you specifically need to eliminate duplicates. UNION adds overhead by checking for duplicates after each iteration.

The termination condition is crucial. Without it, you create an infinite loop. PostgreSQL will eventually error out, but only after consuming significant resources.

Generating Number Sequences and Date Ranges

One of the simplest applications is generating sequences without creating helper tables:

-- Generate numbers 1 to 100
WITH RECURSIVE numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT * FROM numbers;

-- Generate last 30 days
WITH RECURSIVE date_series AS (
    SELECT CURRENT_DATE - INTERVAL '29 days' AS date
    UNION ALL
    SELECT date + INTERVAL '1 day'
    FROM date_series
    WHERE date < CURRENT_DATE
)
SELECT 
    date,
    EXTRACT(DOW FROM date) AS day_of_week,
    TO_CHAR(date, 'Day') AS day_name
FROM date_series;

-- Practical use: fill gaps in daily metrics
WITH RECURSIVE all_dates AS (
    SELECT DATE '2024-01-01' AS date
    UNION ALL
    SELECT date + 1
    FROM all_dates
    WHERE date < DATE '2024-01-31'
)
SELECT 
    ad.date,
    COALESCE(m.revenue, 0) AS revenue,
    COALESCE(m.orders, 0) AS orders
FROM all_dates ad
LEFT JOIN daily_metrics m ON ad.date = m.date
ORDER BY ad.date;

This eliminates the need for generate_series() when you want more control or need to combine generation logic with complex queries.

Traversing Hierarchical Data

Hierarchical data is where recursive CTEs truly shine. Consider an employee table with manager relationships:

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    manager_id INTEGER REFERENCES employees(id),
    title VARCHAR(100)
);

-- Find all subordinates under a specific manager (including indirect reports)
WITH RECURSIVE subordinates AS (
    -- Base case: start with the target manager
    SELECT id, name, manager_id, title, 1 AS level
    FROM employees
    WHERE id = 5  -- CEO or target manager
    
    UNION ALL
    
    -- Recursive case: find direct reports of current level
    SELECT e.id, e.name, e.manager_id, e.title, s.level + 1
    FROM employees e
    INNER JOIN subordinates s ON e.manager_id = s.id
)
SELECT 
    REPEAT('  ', level - 1) || name AS org_chart,
    title,
    level
FROM subordinates
ORDER BY level, name;

-- Build full path from employee to CEO
WITH RECURSIVE org_path AS (
    SELECT id, name, manager_id, name AS path, 0 AS depth
    FROM employees
    WHERE id = 42  -- Target employee
    
    UNION ALL
    
    SELECT e.id, e.name, e.manager_id, e.name || ' > ' || op.path, op.depth + 1
    FROM employees e
    INNER JOIN org_path op ON op.manager_id = e.id
)
SELECT path, depth
FROM org_path
WHERE manager_id IS NULL  -- Reached the top
ORDER BY depth DESC
LIMIT 1;

This approach handles arbitrary hierarchy depths without knowing the structure in advance. You can calculate levels, build paths, and traverse in either direction (up or down the tree).

Graph Traversal and Path Finding

For network-like structures where nodes can have multiple connections, recursive CTEs can traverse graphs and find paths:

CREATE TABLE flights (
    origin VARCHAR(3),
    destination VARCHAR(3),
    airline VARCHAR(50),
    duration INTEGER  -- minutes
);

-- Find all possible routes from JFK to LAX
WITH RECURSIVE routes AS (
    -- Base case: direct flights from origin
    SELECT 
        origin,
        destination,
        ARRAY[origin, destination] AS path,
        duration AS total_duration,
        1 AS stops
    FROM flights
    WHERE origin = 'JFK'
    
    UNION ALL
    
    -- Recursive case: extend existing routes
    SELECT 
        r.origin,
        f.destination,
        r.path || f.destination,
        r.total_duration + f.duration,
        r.stops + 1
    FROM routes r
    INNER JOIN flights f ON r.destination = f.origin
    WHERE 
        f.destination != ALL(r.path)  -- Prevent cycles
        AND r.stops < 3  -- Limit to 3 stops maximum
        AND f.destination = 'LAX'  -- Only complete paths
)
SELECT 
    path,
    stops,
    total_duration,
    ROUND(total_duration / 60.0, 1) AS hours
FROM routes
ORDER BY stops, total_duration;

The key technique here is tracking visited nodes in an array to prevent infinite loops in cyclic graphs. The destination != ALL(path) condition ensures we never revisit a city.

Performance Considerations and Optimization

Recursive CTEs can be expensive. PostgreSQL materializes each iteration, and without proper constraints, they can explode exponentially.

Critical optimizations:

-- BAD: No depth limit, could run forever
WITH RECURSIVE unlimited AS (
    SELECT 1 AS n, ARRAY[1] AS path
    UNION ALL
    SELECT n + 1, path || (n + 1)
    FROM unlimited
    WHERE n < 1000000  -- Huge limit
)
SELECT * FROM unlimited;

-- GOOD: Reasonable depth limit with early termination
WITH RECURSIVE limited AS (
    SELECT 1 AS n, ARRAY[1] AS path, 0 AS depth
    UNION ALL
    SELECT n + 1, path || (n + 1), depth + 1
    FROM limited
    WHERE depth < 100  -- Hard limit on recursion depth
)
SELECT * FROM limited;

-- Add indexes on join columns
CREATE INDEX idx_employees_manager ON employees(manager_id);

-- Use LIMIT to stop early if you only need a few results
WITH RECURSIVE subordinates AS (
    SELECT id, name, manager_id FROM employees WHERE id = 5
    UNION ALL
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    INNER JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates LIMIT 10;

Check execution plans with EXPLAIN ANALYZE. Look for:

Sequential scans that should be index scans
Large row counts in intermediate iterations
Missing join conditions that create cartesian products

For very deep hierarchies or large graphs, consider materialized paths or closure tables as alternatives. Store the full path or all ancestor-descendant pairs in denormalized columns, trading write complexity for read performance.

Common Pitfalls and Best Practices

Infinite loops are the most common mistake:

-- WRONG: Will run forever (or until resource limits)
WITH RECURSIVE infinite AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM infinite  -- No WHERE clause!
)
SELECT * FROM infinite;

-- FIXED: Always include termination condition
WITH RECURSIVE finite AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM infinite WHERE n < 1000
)
SELECT * FROM infinite;

Best practices:

Always set a maximum depth: Add a depth counter and hard limit, even if you think your data won’t hit it
Track visited nodes in graphs: Use arrays or CTEs to prevent cycles
Use UNION ALL unless you need deduplication: It’s significantly faster
Index foreign key columns: Hierarchical joins depend heavily on indexed lookups
Test with production-scale data: Small test datasets hide performance problems
Consider alternatives for write-heavy workloads: If hierarchy changes frequently, materialized paths or ltree extension may be better

Debugging strategy: Add a depth counter and select it to see how many iterations ran:

WITH RECURSIVE debug AS (
    SELECT 1 AS n, 1 AS iteration
    UNION ALL
    SELECT n + 1, iteration + 1 
    FROM debug 
    WHERE n < 100
)
SELECT MAX(iteration) AS total_iterations FROM debug;

Recursive CTEs are powerful tools for hierarchical and graph data. Master the base case/recursive case pattern, always include termination conditions, and profile with realistic data sizes. When used appropriately, they eliminate complex procedural code and make tree/graph operations straightforward SQL queries.