SQL - Subquery in FROM Clause (Derived Table)
When you write a SQL query, the FROM clause typically references physical tables or views. But SQL allows something more powerful: you can place an entire subquery in the FROM clause, creating what's...
Key Insights
- Derived tables are subqueries in the FROM clause that create temporary, inline result sets—they must always have an alias and exist only for the duration of the outer query.
- Use derived tables to pre-aggregate data before joins, break complex logic into digestible steps, or filter on calculated values without resorting to HAVING clauses.
- Derived tables and CTEs typically produce identical execution plans, so choose based on readability; use derived tables for simple, one-off transformations and CTEs when you need to reference the same result set multiple times.
Introduction to Derived Tables
When you write a SQL query, the FROM clause typically references physical tables or views. But SQL allows something more powerful: you can place an entire subquery in the FROM clause, creating what’s called a derived table (also known as an inline view or subquery table).
A derived table is a temporary result set that exists only during query execution. Unlike temporary tables or views, it has no persistent storage. The database engine executes the subquery, holds the results in memory, and makes them available to the outer query as if they were a regular table.
This technique transforms how you approach complex queries. Instead of cramming everything into a single SELECT statement with convoluted WHERE clauses and multiple GROUP BY expressions, you can build your query in logical layers. Each derived table handles one piece of the puzzle.
Basic Syntax and Structure
The syntax is straightforward but has strict requirements:
SELECT outer_columns
FROM (
SELECT inner_columns
FROM some_table
WHERE some_condition
) AS alias_name
WHERE outer_condition;
Two rules are non-negotiable:
- Parentheses are mandatory. The subquery must be enclosed in parentheses.
- An alias is required. Every derived table needs a name. Without it, you’ll get a syntax error.
The alias creates a namespace for the derived table’s columns. You reference them as alias_name.column_name in the outer query.
Here’s a practical example that aggregates order data:
SELECT
order_summary.customer_id,
order_summary.total_orders,
order_summary.total_spent
FROM (
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(order_total) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
) AS order_summary
WHERE order_summary.total_spent > 1000;
The inner query aggregates orders by customer. The outer query filters to high-value customers. Notice how the derived table’s alias (order_summary) makes the outer query readable and self-documenting.
Scope matters here. Columns from the derived table are visible to the outer query, but the outer query’s columns aren’t visible inside the derived table. This one-way visibility is intentional—it keeps the subquery self-contained and predictable.
Common Use Cases
Derived tables solve several recurring problems elegantly.
Pre-Aggregating Data Before Joins
When you need to join aggregated data back to detail records, derived tables keep your logic clean:
SELECT
e.employee_id,
e.employee_name,
e.salary,
dept_avg.avg_salary,
e.salary - dept_avg.avg_salary AS variance_from_avg
FROM employees e
INNER JOIN (
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS dept_avg ON e.department_id = dept_avg.department_id
WHERE e.salary > dept_avg.avg_salary;
This query finds employees earning above their department’s average. The derived table calculates department averages, then the outer query joins back to individual employees. Trying to do this without a derived table would require correlated subqueries or window functions—both less intuitive for this use case.
Filtering on Aggregated Values
Sometimes you need to filter based on an aggregate, but the condition doesn’t fit naturally in a HAVING clause:
SELECT
product_stats.product_id,
product_stats.product_name,
product_stats.units_sold
FROM (
SELECT
p.product_id,
p.product_name,
SUM(oi.quantity) AS units_sold
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
) AS product_stats
WHERE product_stats.units_sold BETWEEN 100 AND 500
OR product_stats.units_sold IS NULL;
Top-N Per Group
Finding the top records within each group is a classic derived table application:
SELECT
ranked_sales.region,
ranked_sales.salesperson,
ranked_sales.total_sales
FROM (
SELECT
region,
salesperson,
total_sales,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_sales DESC) AS rank
FROM sales_summary
) AS ranked_sales
WHERE ranked_sales.rank <= 3;
The inner query assigns rankings. The outer query filters to the top 3 per region. Clean and efficient.
Derived Tables vs. CTEs
Common Table Expressions (CTEs) offer an alternative syntax for the same concept. Here’s the department average query rewritten as a CTE:
-- CTE version
WITH dept_avg AS (
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT
e.employee_id,
e.employee_name,
e.salary,
dept_avg.avg_salary
FROM employees e
INNER JOIN dept_avg ON e.department_id = dept_avg.department_id
WHERE e.salary > dept_avg.avg_salary;
-- Derived table version (same logic)
SELECT
e.employee_id,
e.employee_name,
e.salary,
dept_avg.avg_salary
FROM employees e
INNER JOIN (
SELECT
department_id,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) AS dept_avg ON e.department_id = dept_avg.department_id
WHERE e.salary > dept_avg.avg_salary;
When to choose CTEs:
- You need to reference the same result set multiple times
- The query has many layers of nesting (CTEs flatten the structure)
- You want to name intermediate steps for documentation purposes
When to choose derived tables:
- Simple, one-off transformations
- You prefer seeing the subquery inline with where it’s used
- You’re working with older database systems that don’t support CTEs
Performance is typically identical. Modern query optimizers treat both constructs the same way—they analyze the logical intent and generate equivalent execution plans. Don’t choose based on perceived performance differences; choose based on readability.
Performance Considerations
Understanding how databases execute derived tables helps you write efficient queries.
Merging vs. Materialization
The optimizer has two strategies:
-
Merging: The optimizer “flattens” the derived table, incorporating its logic directly into the outer query. This allows better optimization across the entire query.
-
Materialization: The optimizer executes the derived table first, stores results in a temporary structure, then processes the outer query against that structure.
Most optimizers prefer merging when possible. Materialization happens when merging isn’t feasible—typically with DISTINCT, GROUP BY, LIMIT, or window functions in the derived table.
You can examine the optimizer’s choice using EXPLAIN:
EXPLAIN ANALYZE
SELECT
customer_summary.customer_id,
customer_summary.order_count
FROM (
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) AS customer_summary
WHERE customer_summary.order_count > 10;
Look for terms like “Materialize” or “Subquery Scan” in the output to understand how your derived table is being processed.
Index Usage
Derived tables don’t have indexes. If the outer query needs to filter or join on derived table columns, the database must scan the entire result set. Keep derived tables as small as possible by:
- Filtering aggressively in the inner query
- Selecting only needed columns
- Applying LIMIT when appropriate
Practical Example: Multi-Level Aggregation
Real-world reporting often requires aggregating aggregates. Consider this scenario: you need the average monthly sales per region, but your data is at the transaction level.
SELECT
regional_monthly.region,
ROUND(AVG(regional_monthly.monthly_total), 2) AS avg_monthly_sales,
MIN(regional_monthly.monthly_total) AS lowest_month,
MAX(regional_monthly.monthly_total) AS highest_month
FROM (
SELECT
s.region,
DATE_TRUNC('month', s.sale_date) AS sale_month,
SUM(s.amount) AS monthly_total
FROM sales s
WHERE s.sale_date >= '2024-01-01'
AND s.sale_date < '2025-01-01'
GROUP BY s.region, DATE_TRUNC('month', s.sale_date)
) AS regional_monthly
GROUP BY regional_monthly.region
ORDER BY avg_monthly_sales DESC;
The inner derived table aggregates daily sales into monthly totals by region. The outer query then calculates statistics across those monthly figures. Without this two-stage approach, you’d be stuck trying to express “average of sums” in a single GROUP BY—which isn’t possible.
For even more complex scenarios, you can nest derived tables:
SELECT
yearly_avg.region,
yearly_avg.year,
yearly_avg.avg_monthly_sales
FROM (
SELECT
regional_monthly.region,
EXTRACT(YEAR FROM regional_monthly.sale_month) AS year,
AVG(regional_monthly.monthly_total) AS avg_monthly_sales
FROM (
SELECT
region,
DATE_TRUNC('month', sale_date) AS sale_month,
SUM(amount) AS monthly_total
FROM sales
GROUP BY region, DATE_TRUNC('month', sale_date)
) AS regional_monthly
GROUP BY regional_monthly.region, EXTRACT(YEAR FROM regional_monthly.sale_month)
) AS yearly_avg
WHERE yearly_avg.avg_monthly_sales > 50000;
This calculates average monthly sales per region per year, then filters to high-performing combinations. At this nesting level, consider switching to CTEs for readability.
Summary and Best Practices
Derived tables are a fundamental SQL technique that every developer should master. They let you decompose complex queries into manageable pieces, pre-aggregate data for joins, and express multi-level calculations cleanly.
Remember these guidelines:
- Always provide an alias—it’s not optional
- Keep derived tables focused on a single transformation
- Filter early and select only needed columns
- Switch to CTEs when nesting exceeds two levels or when you need to reuse results
- Use EXPLAIN to verify the optimizer handles your derived table efficiently
Derived tables won’t solve every problem. For truly complex transformations, temporary tables give you indexes and statistics. For reusable logic, views provide persistence. But for inline, query-specific transformations, derived tables hit the sweet spot between power and simplicity.