How to Use Recursive CTEs in PostgreSQL
Common Table Expressions (CTEs) are temporary named result sets that exist only during query execution. They make complex queries more readable by breaking them into logical chunks. While standard...
Key Insights
- Recursive CTEs consist of two parts: a base case that provides initial rows and a recursive member that references itself until a termination condition is met
- Use recursive queries for hierarchical data (org charts, category trees), graph traversal (routes, dependencies), and generating sequences without helper tables
- Always include explicit termination conditions and depth limits to prevent infinite loops—PostgreSQL has no built-in recursion depth limit
Introduction to CTEs and Recursive Queries
Common Table Expressions (CTEs) are temporary named result sets that exist only during query execution. They make complex queries more readable by breaking them into logical chunks. While standard CTEs are essentially named subqueries, recursive CTEs can reference themselves, enabling you to solve problems that would otherwise require procedural code or complex self-joins.
You need recursion when working with data that has parent-child relationships of unknown depth: organizational hierarchies, bill-of-materials structures, category trees, network graphs, or when generating sequences dynamically.
Here’s the fundamental difference:
-- Standard CTE: just a named subquery
WITH recent_orders AS (
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE total > 100;
-- Recursive CTE: references itself
WITH RECURSIVE numbers AS (
SELECT 1 AS n -- Base case
UNION ALL
SELECT n + 1 FROM numbers -- Recursive case references 'numbers'
WHERE n < 10
)
SELECT * FROM numbers;
The RECURSIVE keyword enables self-referencing. Without it, PostgreSQL would reject any CTE that references its own name.
Anatomy of a Recursive CTE
Every recursive CTE has two mandatory components connected by UNION or UNION ALL:
- Base case (anchor member): Returns initial rows without recursion
- Recursive member: References the CTE itself and generates new rows
Here’s the execution model:
WITH RECURSIVE cte_name AS (
-- BASE CASE: executed once, provides initial working table
SELECT initial_value AS column_name
FROM source_table
WHERE initial_conditions
UNION ALL -- Usually UNION ALL (faster, keeps duplicates)
-- RECURSIVE MEMBER: executed repeatedly on previous iteration's results
SELECT derived_value
FROM cte_name -- Self-reference
JOIN other_tables ON join_conditions
WHERE termination_condition -- CRITICAL: must eventually become false
)
SELECT * FROM cte_name;
PostgreSQL executes this iteratively:
- Run the base case, store results in a working table
- Run the recursive member using the working table
- Append new rows to the result set
- Replace working table with new rows
- Repeat steps 2-4 until no new rows are generated
Use UNION ALL instead of UNION unless you specifically need to eliminate duplicates. UNION adds overhead by checking for duplicates after each iteration.
The termination condition is crucial. Without it, you create an infinite loop. PostgreSQL will eventually error out, but only after consuming significant resources.
Generating Number Sequences and Date Ranges
One of the simplest applications is generating sequences without creating helper tables:
-- Generate numbers 1 to 100
WITH RECURSIVE numbers AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM numbers WHERE n < 100
)
SELECT * FROM numbers;
-- Generate last 30 days
WITH RECURSIVE date_series AS (
SELECT CURRENT_DATE - INTERVAL '29 days' AS date
UNION ALL
SELECT date + INTERVAL '1 day'
FROM date_series
WHERE date < CURRENT_DATE
)
SELECT
date,
EXTRACT(DOW FROM date) AS day_of_week,
TO_CHAR(date, 'Day') AS day_name
FROM date_series;
-- Practical use: fill gaps in daily metrics
WITH RECURSIVE all_dates AS (
SELECT DATE '2024-01-01' AS date
UNION ALL
SELECT date + 1
FROM all_dates
WHERE date < DATE '2024-01-31'
)
SELECT
ad.date,
COALESCE(m.revenue, 0) AS revenue,
COALESCE(m.orders, 0) AS orders
FROM all_dates ad
LEFT JOIN daily_metrics m ON ad.date = m.date
ORDER BY ad.date;
This eliminates the need for generate_series() when you want more control or need to combine generation logic with complex queries.
Traversing Hierarchical Data
Hierarchical data is where recursive CTEs truly shine. Consider an employee table with manager relationships:
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR(100),
manager_id INTEGER REFERENCES employees(id),
title VARCHAR(100)
);
-- Find all subordinates under a specific manager (including indirect reports)
WITH RECURSIVE subordinates AS (
-- Base case: start with the target manager
SELECT id, name, manager_id, title, 1 AS level
FROM employees
WHERE id = 5 -- CEO or target manager
UNION ALL
-- Recursive case: find direct reports of current level
SELECT e.id, e.name, e.manager_id, e.title, s.level + 1
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.id
)
SELECT
REPEAT(' ', level - 1) || name AS org_chart,
title,
level
FROM subordinates
ORDER BY level, name;
-- Build full path from employee to CEO
WITH RECURSIVE org_path AS (
SELECT id, name, manager_id, name AS path, 0 AS depth
FROM employees
WHERE id = 42 -- Target employee
UNION ALL
SELECT e.id, e.name, e.manager_id, e.name || ' > ' || op.path, op.depth + 1
FROM employees e
INNER JOIN org_path op ON op.manager_id = e.id
)
SELECT path, depth
FROM org_path
WHERE manager_id IS NULL -- Reached the top
ORDER BY depth DESC
LIMIT 1;
This approach handles arbitrary hierarchy depths without knowing the structure in advance. You can calculate levels, build paths, and traverse in either direction (up or down the tree).
Graph Traversal and Path Finding
For network-like structures where nodes can have multiple connections, recursive CTEs can traverse graphs and find paths:
CREATE TABLE flights (
origin VARCHAR(3),
destination VARCHAR(3),
airline VARCHAR(50),
duration INTEGER -- minutes
);
-- Find all possible routes from JFK to LAX
WITH RECURSIVE routes AS (
-- Base case: direct flights from origin
SELECT
origin,
destination,
ARRAY[origin, destination] AS path,
duration AS total_duration,
1 AS stops
FROM flights
WHERE origin = 'JFK'
UNION ALL
-- Recursive case: extend existing routes
SELECT
r.origin,
f.destination,
r.path || f.destination,
r.total_duration + f.duration,
r.stops + 1
FROM routes r
INNER JOIN flights f ON r.destination = f.origin
WHERE
f.destination != ALL(r.path) -- Prevent cycles
AND r.stops < 3 -- Limit to 3 stops maximum
AND f.destination = 'LAX' -- Only complete paths
)
SELECT
path,
stops,
total_duration,
ROUND(total_duration / 60.0, 1) AS hours
FROM routes
ORDER BY stops, total_duration;
The key technique here is tracking visited nodes in an array to prevent infinite loops in cyclic graphs. The destination != ALL(path) condition ensures we never revisit a city.
Performance Considerations and Optimization
Recursive CTEs can be expensive. PostgreSQL materializes each iteration, and without proper constraints, they can explode exponentially.
Critical optimizations:
-- BAD: No depth limit, could run forever
WITH RECURSIVE unlimited AS (
SELECT 1 AS n, ARRAY[1] AS path
UNION ALL
SELECT n + 1, path || (n + 1)
FROM unlimited
WHERE n < 1000000 -- Huge limit
)
SELECT * FROM unlimited;
-- GOOD: Reasonable depth limit with early termination
WITH RECURSIVE limited AS (
SELECT 1 AS n, ARRAY[1] AS path, 0 AS depth
UNION ALL
SELECT n + 1, path || (n + 1), depth + 1
FROM limited
WHERE depth < 100 -- Hard limit on recursion depth
)
SELECT * FROM limited;
-- Add indexes on join columns
CREATE INDEX idx_employees_manager ON employees(manager_id);
-- Use LIMIT to stop early if you only need a few results
WITH RECURSIVE subordinates AS (
SELECT id, name, manager_id FROM employees WHERE id = 5
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e
INNER JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates LIMIT 10;
Check execution plans with EXPLAIN ANALYZE. Look for:
- Sequential scans that should be index scans
- Large row counts in intermediate iterations
- Missing join conditions that create cartesian products
For very deep hierarchies or large graphs, consider materialized paths or closure tables as alternatives. Store the full path or all ancestor-descendant pairs in denormalized columns, trading write complexity for read performance.
Common Pitfalls and Best Practices
Infinite loops are the most common mistake:
-- WRONG: Will run forever (or until resource limits)
WITH RECURSIVE infinite AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM infinite -- No WHERE clause!
)
SELECT * FROM infinite;
-- FIXED: Always include termination condition
WITH RECURSIVE finite AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM infinite WHERE n < 1000
)
SELECT * FROM infinite;
Best practices:
- Always set a maximum depth: Add a depth counter and hard limit, even if you think your data won’t hit it
- Track visited nodes in graphs: Use arrays or CTEs to prevent cycles
- Use UNION ALL unless you need deduplication: It’s significantly faster
- Index foreign key columns: Hierarchical joins depend heavily on indexed lookups
- Test with production-scale data: Small test datasets hide performance problems
- Consider alternatives for write-heavy workloads: If hierarchy changes frequently, materialized paths or ltree extension may be better
Debugging strategy: Add a depth counter and select it to see how many iterations ran:
WITH RECURSIVE debug AS (
SELECT 1 AS n, 1 AS iteration
UNION ALL
SELECT n + 1, iteration + 1
FROM debug
WHERE n < 100
)
SELECT MAX(iteration) AS total_iterations FROM debug;
Recursive CTEs are powerful tools for hierarchical and graph data. Master the base case/recursive case pattern, always include termination conditions, and profile with realistic data sizes. When used appropriately, they eliminate complex procedural code and make tree/graph operations straightforward SQL queries.