How to Use CTEs in PostgreSQL

Common Table Expressions (CTEs) are temporary named result sets that exist only within the execution scope of a single query. You define them using the `WITH` clause, and they're particularly...

Key Insights

  • CTEs make complex queries readable by breaking logic into named, sequential steps—treat them as temporary result sets that exist only for the duration of your query
  • Recursive CTEs excel at hierarchical data traversal (org charts, category trees, graph structures) where subqueries fall short
  • PostgreSQL 12+ treats CTEs as optimization fences by default; use NOT MATERIALIZED for better performance when you need the query planner to inline them

Introduction to CTEs (Common Table Expressions)

Common Table Expressions (CTEs) are temporary named result sets that exist only within the execution scope of a single query. You define them using the WITH clause, and they’re particularly valuable when you need to reference the same subquery multiple times or when working with recursive data structures.

CTEs shine in three scenarios: improving query readability by breaking complex logic into digestible chunks, enabling recursive queries for hierarchical data, and staging intermediate results for multi-step transformations. Unlike views, CTEs don’t persist in your database schema. Unlike subqueries, they can be referenced multiple times without redundant execution.

Here’s a direct comparison showing why you’d choose a CTE:

-- Using a subquery (harder to read, especially when complex)
SELECT e.name, e.salary, dept_avg.avg_salary
FROM employees e
JOIN (
    SELECT department_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY department_id
) dept_avg ON e.department_id = dept_avg.department_id
WHERE e.salary > dept_avg.avg_salary;

-- Using a CTE (clearer intent, easier to modify)
WITH department_averages AS (
    SELECT department_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.name, e.salary, da.avg_salary
FROM employees e
JOIN department_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;

The CTE version separates the “what” (calculating department averages) from the “how” (using those averages to filter employees). This matters more as queries grow in complexity.

Basic CTE Syntax and Simple Queries

The fundamental CTE structure follows this pattern:

WITH cte_name AS (
    -- Your query here
)
SELECT * FROM cte_name;

You can stack multiple CTEs by separating them with commas—no need to repeat the WITH keyword:

WITH high_value_orders AS (
    SELECT customer_id, order_id, total_amount
    FROM orders
    WHERE total_amount > 1000
    AND order_date >= CURRENT_DATE - INTERVAL '90 days'
),
customer_summary AS (
    SELECT 
        c.customer_id,
        c.name,
        COUNT(hvo.order_id) as high_value_count,
        SUM(hvo.total_amount) as total_spent
    FROM customers c
    JOIN high_value_orders hvo ON c.customer_id = hvo.customer_id
    GROUP BY c.customer_id, c.name
)
SELECT *
FROM customer_summary
WHERE high_value_count >= 3
ORDER BY total_spent DESC;

This query breaks down the problem: first isolate high-value recent orders, then aggregate them by customer, finally filter for your VIP segment. Each CTE represents one logical step.

CTEs can reference earlier CTEs in the same query but not later ones. Think of them as executing top-to-bottom:

WITH filtered_products AS (
    SELECT product_id, name, category_id, price
    FROM products
    WHERE active = true
),
category_stats AS (
    SELECT 
        category_id,
        COUNT(*) as product_count,
        AVG(price) as avg_price
    FROM filtered_products  -- References the first CTE
    GROUP BY category_id
)
SELECT 
    fp.name,
    fp.price,
    cs.avg_price,
    fp.price - cs.avg_price as price_difference
FROM filtered_products fp
JOIN category_stats cs ON fp.category_id = cs.category_id
WHERE fp.price > cs.avg_price * 1.2;

Recursive CTEs

Recursive CTEs handle hierarchical or graph-based data where you need to traverse relationships of unknown depth. The structure requires two parts: a base case (starting point) and a recursive term (how to traverse), joined with UNION ALL.

The classic example is an organizational hierarchy:

WITH RECURSIVE employee_hierarchy AS (
    -- Base case: start with top-level managers
    SELECT 
        employee_id,
        name,
        manager_id,
        1 as level,
        name::text as path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: find direct reports
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        eh.level + 1,
        eh.path || ' -> ' || e.name
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy
ORDER BY level, name;

This query starts with employees who have no manager (CEOs, founders), then recursively finds their direct reports, building a complete org chart with depth levels and reporting paths.

Generate sequences without a numbers table:

WITH RECURSIVE date_series AS (
    SELECT DATE '2024-01-01' as date
    UNION ALL
    SELECT date + INTERVAL '1 day'
    FROM date_series
    WHERE date < DATE '2024-12-31'
)
SELECT date, EXTRACT(DOW FROM date) as day_of_week
FROM date_series;

For graph traversal, include cycle detection to prevent infinite loops:

WITH RECURSIVE category_tree AS (
    SELECT 
        category_id,
        name,
        parent_id,
        ARRAY[category_id] as path,
        false as is_cycle
    FROM categories
    WHERE parent_id IS NULL
    
    UNION ALL
    
    SELECT 
        c.category_id,
        c.name,
        c.parent_id,
        ct.path || c.category_id,
        c.category_id = ANY(ct.path)
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.category_id
    WHERE NOT ct.is_cycle
)
SELECT * FROM category_tree WHERE NOT is_cycle;

The path array tracks visited nodes, and is_cycle flags when we encounter a node already in our path.

Advanced CTE Techniques

CTEs can modify data and return results using the RETURNING clause. This is powerful for capturing generated IDs or modified values:

WITH inserted_orders AS (
    INSERT INTO orders (customer_id, order_date, status)
    VALUES (123, CURRENT_DATE, 'pending')
    RETURNING order_id, customer_id
),
inserted_items AS (
    INSERT INTO order_items (order_id, product_id, quantity, price)
    SELECT 
        io.order_id,
        p.product_id,
        1,
        p.price
    FROM inserted_orders io
    CROSS JOIN (VALUES (456), (789), (101)) AS p(product_id)
    RETURNING order_id, product_id, price
)
SELECT 
    io.order_id,
    COUNT(ii.product_id) as item_count,
    SUM(ii.price) as total_amount
FROM inserted_orders io
JOIN inserted_items ii ON io.order_id = ii.order_id
GROUP BY io.order_id;

This atomically creates an order and its line items, then immediately returns a summary—all in one query.

Chain CTEs for complex transformations where each step depends on the previous:

WITH raw_data AS (
    SELECT * FROM sales_import WHERE import_date = CURRENT_DATE
),
cleaned_data AS (
    SELECT 
        TRIM(customer_name) as customer_name,
        UPPER(product_code) as product_code,
        COALESCE(quantity, 0) as quantity,
        COALESCE(unit_price, 0) as unit_price
    FROM raw_data
    WHERE customer_name IS NOT NULL
),
enriched_data AS (
    SELECT 
        cd.*,
        c.customer_id,
        p.product_id,
        cd.quantity * cd.unit_price as line_total
    FROM cleaned_data cd
    LEFT JOIN customers c ON cd.customer_name = c.name
    LEFT JOIN products p ON cd.product_code = p.code
),
validation_results AS (
    SELECT 
        *,
        CASE 
            WHEN customer_id IS NULL THEN 'missing_customer'
            WHEN product_id IS NULL THEN 'missing_product'
            WHEN quantity <= 0 THEN 'invalid_quantity'
            ELSE 'valid'
        END as validation_status
    FROM enriched_data
)
SELECT * FROM validation_results;

Each CTE represents one transformation stage: extract, clean, enrich, validate. This pattern makes debugging trivial—just query any intermediate CTE to inspect that stage’s output.

Performance Considerations and Best Practices

PostgreSQL 12 introduced a critical change: CTEs are now optimization fences by default. The query planner won’t push predicates into CTEs or inline them, treating each as materialized. This prevents optimization but guarantees evaluation order.

For CTEs that should be inlined for better performance, use NOT MATERIALIZED:

WITH recent_orders AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE customer_id = 42;

Without NOT MATERIALIZED, PostgreSQL scans all recent orders then filters. With it, the planner can push the customer filter down to the base table.

Compare execution plans:

-- Materialized (default): scans all high-value orders
EXPLAIN ANALYZE
WITH high_value AS (
    SELECT * FROM orders WHERE total_amount > 1000
)
SELECT * FROM high_value WHERE customer_id = 123;

-- Not materialized: uses customer index efficiently
EXPLAIN ANALYZE
WITH high_value AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE total_amount > 1000
)
SELECT * FROM high_value WHERE customer_id = 123;

Use MATERIALIZED explicitly when you want to evaluate a CTE once and reuse it:

WITH MATERIALIZED expensive_calculation AS (
    SELECT 
        product_id,
        AVG(price) OVER (ORDER BY date ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) as moving_avg
    FROM price_history
)
SELECT * FROM expensive_calculation ec1
JOIN expensive_calculation ec2 ON ec1.product_id = ec2.product_id
WHERE ec1.date < ec2.date;

Here materialization prevents calculating the moving average twice.

Real-World Use Cases

Calculate running totals while maintaining readability:

WITH daily_sales AS (
    SELECT 
        DATE(order_date) as sale_date,
        SUM(total_amount) as daily_total
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
    GROUP BY DATE(order_date)
)
SELECT 
    sale_date,
    daily_total,
    SUM(daily_total) OVER (ORDER BY sale_date) as running_total,
    AVG(daily_total) OVER (ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as seven_day_avg
FROM daily_sales
ORDER BY sale_date;

Deduplicate before processing to avoid expensive operations on duplicate rows:

WITH ranked_events AS (
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY user_id, event_type ORDER BY event_time DESC) as rn
    FROM user_events
    WHERE event_date = CURRENT_DATE
),
unique_events AS (
    SELECT * FROM ranked_events WHERE rn = 1
)
UPDATE user_profiles up
SET last_activity = ue.event_time
FROM unique_events ue
WHERE up.user_id = ue.user_id;

Build staged reports where each calculation depends on previous aggregations:

WITH monthly_metrics AS (
    SELECT 
        DATE_TRUNC('month', order_date) as month,
        COUNT(DISTINCT customer_id) as unique_customers,
        COUNT(*) as total_orders,
        SUM(total_amount) as revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
),
growth_metrics AS (
    SELECT 
        month,
        unique_customers,
        total_orders,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_month_revenue
    FROM monthly_metrics
)
SELECT 
    month,
    unique_customers,
    total_orders,
    revenue,
    ROUND((revenue - prev_month_revenue) / prev_month_revenue * 100, 2) as revenue_growth_pct,
    ROUND(revenue / unique_customers, 2) as revenue_per_customer
FROM growth_metrics
ORDER BY month;

CTEs transform complex analytical queries from impenetrable nested subqueries into clear, maintainable SQL. Use them liberally for readability, but understand their performance characteristics and optimize with materialization hints when necessary.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.