How to Use CTEs (Common Table Expressions) in MySQL
Common Table Expressions (CTEs) are temporary named result sets that exist only within the execution scope of a single SQL statement. Introduced in MySQL 8.0, CTEs provide a cleaner alternative to...
Key Insights
- CTEs improve query readability by breaking complex logic into named, reusable components—think of them as query-scoped views that exist only for the duration of a single statement.
- Recursive CTEs unlock hierarchical queries in MySQL 8.0+, eliminating the need for stored procedures or application-level recursion when traversing organizational charts, category trees, or bill-of-materials structures.
- While CTEs excel at code organization, they aren’t always faster than subqueries or temporary tables—MySQL may materialize them differently depending on the optimizer’s decisions, so profile your specific use case.
Introduction to CTEs
Common Table Expressions (CTEs) are temporary named result sets that exist only within the execution scope of a single SQL statement. Introduced in MySQL 8.0, CTEs provide a cleaner alternative to nested subqueries and temporary tables for organizing complex query logic.
Before CTEs, you’d write deeply nested subqueries that became increasingly difficult to read and maintain:
-- Without CTE: nested subquery approach
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
)
AND e.department_id IN (
SELECT department_id
FROM departments
WHERE region = 'West'
);
With CTEs, you break this logic into digestible, named components:
-- With CTE: clearer intent and structure
WITH dept_averages AS (
SELECT department_id, AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
),
west_departments AS (
SELECT department_id
FROM departments
WHERE region = 'West'
)
SELECT e.name, e.salary
FROM employees e
JOIN dept_averages da ON e.department_id = da.department_id
JOIN west_departments wd ON e.department_id = wd.department_id
WHERE e.salary > da.avg_salary;
The CTE version reads like a story: first we calculate department averages, then identify western departments, finally filter employees. Each step has a meaningful name that documents its purpose.
Basic CTE Syntax and Simple Examples
The fundamental syntax uses the WITH clause followed by your CTE definition:
WITH cte_name AS (
SELECT column1, column2
FROM table_name
WHERE condition
)
SELECT *
FROM cte_name
WHERE additional_condition;
Here’s a practical example filtering high-performing employees:
WITH high_performers AS (
SELECT
employee_id,
name,
salary,
performance_score
FROM employees
WHERE performance_score >= 4.0
AND tenure_years >= 2
)
SELECT
name,
salary,
performance_score
FROM high_performers
WHERE salary < 100000
ORDER BY performance_score DESC;
The CTE high_performers isolates the business logic for identifying qualified employees, while the main query focuses on the specific reporting requirement. This separation makes the query self-documenting and easier to modify.
Multiple CTEs and Chaining
You can define multiple CTEs in a single query by separating them with commas. Later CTEs can reference earlier ones, enabling step-by-step data transformation:
WITH monthly_sales AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') as month,
product_id,
SUM(quantity * unit_price) as revenue
FROM orders
GROUP BY month, product_id
),
product_rankings AS (
SELECT
month,
product_id,
revenue,
RANK() OVER (PARTITION BY month ORDER BY revenue DESC) as rank
FROM monthly_sales
),
top_products AS (
SELECT
pr.month,
p.product_name,
pr.revenue
FROM product_rankings pr
JOIN products p ON pr.product_id = p.product_id
WHERE pr.rank <= 5
)
SELECT
month,
product_name,
CONCAT('$', FORMAT(revenue, 2)) as formatted_revenue
FROM top_products
ORDER BY month DESC, revenue DESC;
This example chains three CTEs: monthly_sales aggregates raw data, product_rankings applies window functions, and top_products joins with dimension tables and filters. Each CTE builds on the previous one, creating a clear data pipeline.
Recursive CTEs
Recursive CTEs handle hierarchical data by repeatedly applying a query to its own output. They consist of two parts: an anchor member (base case) and a recursive member (iterative step), connected by UNION ALL:
WITH RECURSIVE employee_hierarchy AS (
-- Anchor member: start with CEO
SELECT
employee_id,
name,
manager_id,
1 as level,
CAST(name AS CHAR(500)) as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: find direct reports
SELECT
e.employee_id,
e.name,
e.manager_id,
eh.level + 1,
CONCAT(eh.path, ' > ', e.name)
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
WHERE eh.level < 10 -- Prevent infinite loops
)
SELECT
REPEAT(' ', level - 1) as indent,
name,
level,
path
FROM employee_hierarchy
ORDER BY path;
The anchor member selects the root node (CEO with no manager). The recursive member joins employees to the CTE itself, finding each level of reports. The level < 10 condition prevents infinite loops if your data contains circular references.
Here’s a bill-of-materials example showing component hierarchies:
WITH RECURSIVE parts_explosion AS (
SELECT
component_id,
parent_id,
component_name,
quantity,
1 as level
FROM bill_of_materials
WHERE parent_id = 'PRODUCT-001' -- Top-level product
UNION ALL
SELECT
bom.component_id,
bom.parent_id,
bom.component_name,
bom.quantity * pe.quantity, -- Cumulative quantity
pe.level + 1
FROM bill_of_materials bom
JOIN parts_explosion pe ON bom.parent_id = pe.component_id
WHERE pe.level < 20
)
SELECT
component_id,
component_name,
quantity as total_quantity_needed,
level
FROM parts_explosion
ORDER BY level, component_name;
Practical Use Cases
CTEs shine in real-world scenarios requiring multi-step transformations. Here’s a customer order analysis with running totals:
WITH customer_orders AS (
SELECT
customer_id,
order_id,
order_date,
total_amount
FROM orders
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
),
order_sequence AS (
SELECT
customer_id,
order_id,
order_date,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) as order_number
FROM customer_orders
),
running_totals AS (
SELECT
customer_id,
order_id,
order_date,
total_amount,
order_number,
SUM(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as cumulative_spent
FROM order_sequence
)
SELECT
c.customer_name,
rt.order_date,
rt.total_amount,
rt.order_number,
rt.cumulative_spent,
CASE
WHEN rt.cumulative_spent >= 10000 THEN 'Platinum'
WHEN rt.cumulative_spent >= 5000 THEN 'Gold'
WHEN rt.cumulative_spent >= 1000 THEN 'Silver'
ELSE 'Bronze'
END as loyalty_tier
FROM running_totals rt
JOIN customers c ON rt.customer_id = c.customer_id
WHERE rt.order_number <= 10 -- First 10 orders
ORDER BY c.customer_name, rt.order_date;
This query calculates loyalty tiers based on cumulative spending, showing how customers progress through tiers over time.
Performance Considerations and Best Practices
CTEs aren’t magic performance boosters. MySQL may materialize them as temporary tables or inline them like subqueries, depending on the optimizer’s cost analysis. Use EXPLAIN to understand execution plans:
EXPLAIN FORMAT=TREE
WITH sales_summary AS (
SELECT
product_id,
SUM(quantity) as total_quantity
FROM order_items
WHERE order_date >= '2024-01-01'
GROUP BY product_id
)
SELECT
p.product_name,
ss.total_quantity
FROM sales_summary ss
JOIN products p ON ss.product_id = p.product_id
WHERE ss.total_quantity > 100;
Best practices:
-
Index appropriately: CTEs benefit from the same indexes as regular queries. Ensure columns used in
WHERE,JOIN, andGROUP BYclauses within CTEs are indexed. -
Avoid unnecessary CTEs: Don’t use CTEs just for style. If a simple subquery is clearer, use it.
-
Limit recursive depth: Always include termination conditions in recursive CTEs to prevent runaway queries.
-
Consider materialization: For CTEs referenced multiple times, MySQL typically materializes them once. This is beneficial, but verify with
EXPLAIN. -
Use meaningful names: CTEs are documentation. Names like
active_customersormonthly_revenuemake queries self-explanatory.
When CTEs perform poorly, consider alternatives: derived tables for simple cases, temporary tables for complex multi-step transformations requiring indexes, or breaking logic into multiple queries in application code.
Conclusion
CTEs transform how you write complex SQL in MySQL 8.0+. They improve readability, enable recursive queries, and help you think about data transformations as pipelines rather than nested tangles. Use them when query logic benefits from named intermediate results, especially for hierarchical data, multi-step aggregations, and complex reporting.
Start simple: replace your next nested subquery with a CTE and notice how much easier it is to understand and modify. Then explore recursive CTEs for hierarchical data that previously required stored procedures. Just remember to profile performance—clarity is valuable, but not at the cost of unacceptable query times.