How to Use CTEs in PostgreSQL
Common Table Expressions (CTEs) are temporary named result sets that exist only within the execution scope of a single query. You define them using the `WITH` clause, and they're particularly...
Key Insights
- CTEs make complex queries readable by breaking logic into named, sequential steps—treat them as temporary result sets that exist only for the duration of your query
- Recursive CTEs excel at hierarchical data traversal (org charts, category trees, graph structures) where subqueries fall short
- PostgreSQL 12+ treats CTEs as optimization fences by default; use NOT MATERIALIZED for better performance when you need the query planner to inline them
Introduction to CTEs (Common Table Expressions)
Common Table Expressions (CTEs) are temporary named result sets that exist only within the execution scope of a single query. You define them using the WITH clause, and they’re particularly valuable when you need to reference the same subquery multiple times or when working with recursive data structures.
CTEs shine in three scenarios: improving query readability by breaking complex logic into digestible chunks, enabling recursive queries for hierarchical data, and staging intermediate results for multi-step transformations. Unlike views, CTEs don’t persist in your database schema. Unlike subqueries, they can be referenced multiple times without redundant execution.
Here’s a direct comparison showing why you’d choose a CTE:
-- Using a subquery (harder to read, especially when complex)
SELECT e.name, e.salary, dept_avg.avg_salary
FROM employees e
JOIN (
SELECT department_id, AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
) dept_avg ON e.department_id = dept_avg.department_id
WHERE e.salary > dept_avg.avg_salary;
-- Using a CTE (clearer intent, easier to modify)
WITH department_averages AS (
SELECT department_id, AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary, da.avg_salary
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
The CTE version separates the “what” (calculating department averages) from the “how” (using those averages to filter employees). This matters more as queries grow in complexity.
Basic CTE Syntax and Simple Queries
The fundamental CTE structure follows this pattern:
WITH cte_name AS (
-- Your query here
)
SELECT * FROM cte_name;
You can stack multiple CTEs by separating them with commas—no need to repeat the WITH keyword:
WITH high_value_orders AS (
SELECT customer_id, order_id, total_amount
FROM orders
WHERE total_amount > 1000
AND order_date >= CURRENT_DATE - INTERVAL '90 days'
),
customer_summary AS (
SELECT
c.customer_id,
c.name,
COUNT(hvo.order_id) as high_value_count,
SUM(hvo.total_amount) as total_spent
FROM customers c
JOIN high_value_orders hvo ON c.customer_id = hvo.customer_id
GROUP BY c.customer_id, c.name
)
SELECT *
FROM customer_summary
WHERE high_value_count >= 3
ORDER BY total_spent DESC;
This query breaks down the problem: first isolate high-value recent orders, then aggregate them by customer, finally filter for your VIP segment. Each CTE represents one logical step.
CTEs can reference earlier CTEs in the same query but not later ones. Think of them as executing top-to-bottom:
WITH filtered_products AS (
SELECT product_id, name, category_id, price
FROM products
WHERE active = true
),
category_stats AS (
SELECT
category_id,
COUNT(*) as product_count,
AVG(price) as avg_price
FROM filtered_products -- References the first CTE
GROUP BY category_id
)
SELECT
fp.name,
fp.price,
cs.avg_price,
fp.price - cs.avg_price as price_difference
FROM filtered_products fp
JOIN category_stats cs ON fp.category_id = cs.category_id
WHERE fp.price > cs.avg_price * 1.2;
Recursive CTEs
Recursive CTEs handle hierarchical or graph-based data where you need to traverse relationships of unknown depth. The structure requires two parts: a base case (starting point) and a recursive term (how to traverse), joined with UNION ALL.
The classic example is an organizational hierarchy:
WITH RECURSIVE employee_hierarchy AS (
-- Base case: start with top-level managers
SELECT
employee_id,
name,
manager_id,
1 as level,
name::text as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: find direct reports
SELECT
e.employee_id,
e.name,
e.manager_id,
eh.level + 1,
eh.path || ' -> ' || e.name
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy
ORDER BY level, name;
This query starts with employees who have no manager (CEOs, founders), then recursively finds their direct reports, building a complete org chart with depth levels and reporting paths.
Generate sequences without a numbers table:
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' as date
UNION ALL
SELECT date + INTERVAL '1 day'
FROM date_series
WHERE date < DATE '2024-12-31'
)
SELECT date, EXTRACT(DOW FROM date) as day_of_week
FROM date_series;
For graph traversal, include cycle detection to prevent infinite loops:
WITH RECURSIVE category_tree AS (
SELECT
category_id,
name,
parent_id,
ARRAY[category_id] as path,
false as is_cycle
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.category_id,
c.name,
c.parent_id,
ct.path || c.category_id,
c.category_id = ANY(ct.path)
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.category_id
WHERE NOT ct.is_cycle
)
SELECT * FROM category_tree WHERE NOT is_cycle;
The path array tracks visited nodes, and is_cycle flags when we encounter a node already in our path.
Advanced CTE Techniques
CTEs can modify data and return results using the RETURNING clause. This is powerful for capturing generated IDs or modified values:
WITH inserted_orders AS (
INSERT INTO orders (customer_id, order_date, status)
VALUES (123, CURRENT_DATE, 'pending')
RETURNING order_id, customer_id
),
inserted_items AS (
INSERT INTO order_items (order_id, product_id, quantity, price)
SELECT
io.order_id,
p.product_id,
1,
p.price
FROM inserted_orders io
CROSS JOIN (VALUES (456), (789), (101)) AS p(product_id)
RETURNING order_id, product_id, price
)
SELECT
io.order_id,
COUNT(ii.product_id) as item_count,
SUM(ii.price) as total_amount
FROM inserted_orders io
JOIN inserted_items ii ON io.order_id = ii.order_id
GROUP BY io.order_id;
This atomically creates an order and its line items, then immediately returns a summary—all in one query.
Chain CTEs for complex transformations where each step depends on the previous:
WITH raw_data AS (
SELECT * FROM sales_import WHERE import_date = CURRENT_DATE
),
cleaned_data AS (
SELECT
TRIM(customer_name) as customer_name,
UPPER(product_code) as product_code,
COALESCE(quantity, 0) as quantity,
COALESCE(unit_price, 0) as unit_price
FROM raw_data
WHERE customer_name IS NOT NULL
),
enriched_data AS (
SELECT
cd.*,
c.customer_id,
p.product_id,
cd.quantity * cd.unit_price as line_total
FROM cleaned_data cd
LEFT JOIN customers c ON cd.customer_name = c.name
LEFT JOIN products p ON cd.product_code = p.code
),
validation_results AS (
SELECT
*,
CASE
WHEN customer_id IS NULL THEN 'missing_customer'
WHEN product_id IS NULL THEN 'missing_product'
WHEN quantity <= 0 THEN 'invalid_quantity'
ELSE 'valid'
END as validation_status
FROM enriched_data
)
SELECT * FROM validation_results;
Each CTE represents one transformation stage: extract, clean, enrich, validate. This pattern makes debugging trivial—just query any intermediate CTE to inspect that stage’s output.
Performance Considerations and Best Practices
PostgreSQL 12 introduced a critical change: CTEs are now optimization fences by default. The query planner won’t push predicates into CTEs or inline them, treating each as materialized. This prevents optimization but guarantees evaluation order.
For CTEs that should be inlined for better performance, use NOT MATERIALIZED:
WITH recent_orders AS NOT MATERIALIZED (
SELECT * FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE customer_id = 42;
Without NOT MATERIALIZED, PostgreSQL scans all recent orders then filters. With it, the planner can push the customer filter down to the base table.
Compare execution plans:
-- Materialized (default): scans all high-value orders
EXPLAIN ANALYZE
WITH high_value AS (
SELECT * FROM orders WHERE total_amount > 1000
)
SELECT * FROM high_value WHERE customer_id = 123;
-- Not materialized: uses customer index efficiently
EXPLAIN ANALYZE
WITH high_value AS NOT MATERIALIZED (
SELECT * FROM orders WHERE total_amount > 1000
)
SELECT * FROM high_value WHERE customer_id = 123;
Use MATERIALIZED explicitly when you want to evaluate a CTE once and reuse it:
WITH MATERIALIZED expensive_calculation AS (
SELECT
product_id,
AVG(price) OVER (ORDER BY date ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) as moving_avg
FROM price_history
)
SELECT * FROM expensive_calculation ec1
JOIN expensive_calculation ec2 ON ec1.product_id = ec2.product_id
WHERE ec1.date < ec2.date;
Here materialization prevents calculating the moving average twice.
Real-World Use Cases
Calculate running totals while maintaining readability:
WITH daily_sales AS (
SELECT
DATE(order_date) as sale_date,
SUM(total_amount) as daily_total
FROM orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY DATE(order_date)
)
SELECT
sale_date,
daily_total,
SUM(daily_total) OVER (ORDER BY sale_date) as running_total,
AVG(daily_total) OVER (ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as seven_day_avg
FROM daily_sales
ORDER BY sale_date;
Deduplicate before processing to avoid expensive operations on duplicate rows:
WITH ranked_events AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id, event_type ORDER BY event_time DESC) as rn
FROM user_events
WHERE event_date = CURRENT_DATE
),
unique_events AS (
SELECT * FROM ranked_events WHERE rn = 1
)
UPDATE user_profiles up
SET last_activity = ue.event_time
FROM unique_events ue
WHERE up.user_id = ue.user_id;
Build staged reports where each calculation depends on previous aggregations:
WITH monthly_metrics AS (
SELECT
DATE_TRUNC('month', order_date) as month,
COUNT(DISTINCT customer_id) as unique_customers,
COUNT(*) as total_orders,
SUM(total_amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
),
growth_metrics AS (
SELECT
month,
unique_customers,
total_orders,
revenue,
LAG(revenue) OVER (ORDER BY month) as prev_month_revenue
FROM monthly_metrics
)
SELECT
month,
unique_customers,
total_orders,
revenue,
ROUND((revenue - prev_month_revenue) / prev_month_revenue * 100, 2) as revenue_growth_pct,
ROUND(revenue / unique_customers, 2) as revenue_per_customer
FROM growth_metrics
ORDER BY month;
CTEs transform complex analytical queries from impenetrable nested subqueries into clear, maintainable SQL. Use them liberally for readability, but understand their performance characteristics and optimize with materialization hints when necessary.