How to Use Subqueries in SQLite

Key Insights

Subqueries let you nest queries within queries, enabling complex filtering and calculations that would be difficult with simple SQL statements—use them in WHERE, SELECT, and FROM clauses to solve multi-step data problems.
Correlated subqueries reference the outer query and execute once per row (slower), while non-correlated subqueries run independently once (faster)—choose wisely based on your data relationships and performance needs.
JOINs typically outperform subqueries for simple lookups, but subqueries excel at aggregations, existence checks, and complex filtering where you need intermediate result sets or row-by-row comparisons.

Introduction to Subqueries

A subquery is simply a SELECT statement nested inside another SQL statement. Think of it as a query that provides data to another query, allowing you to break complex problems into manageable pieces. SQLite fully supports subqueries in WHERE, SELECT, FROM, and HAVING clauses, making them a powerful tool for data analysis and transformation.

The basic syntax looks like this:

SELECT column1, column2
FROM table1
WHERE column1 IN (SELECT column1 FROM table2 WHERE condition);

Use subqueries when you need to filter based on aggregated data, check for existence of related records, or create intermediate result sets. While JOINs often perform better for simple lookups, subqueries provide clearer logic for complex conditions and are sometimes the only practical solution.

Here’s a straightforward example finding employees who earn more than the company average:

SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

This subquery calculates the average once, then the outer query uses that value to filter results. Clean and readable.

Subqueries in the WHERE Clause

The WHERE clause is where subqueries shine for filtering. You can use them with IN, NOT IN, EXISTS, and comparison operators to create sophisticated filters.

Using IN to find related records:

SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id 
    FROM orders 
    WHERE order_date >= '2024-01-01'
);

This finds all customers who placed orders in 2024. The subquery returns a list of customer IDs, and the outer query filters to only those customers.

Using EXISTS for existence checks:

EXISTS is more efficient when you only care whether related records exist, not what they contain:

SELECT p.product_name, p.price
FROM products p
WHERE EXISTS (
    SELECT 1 
    FROM order_items oi 
    WHERE oi.product_id = p.product_id
);

This finds products that have been ordered at least once. EXISTS stops searching as soon as it finds a match, making it faster than IN for large datasets. The SELECT 1 is a convention—EXISTS only checks for row existence, not the actual values returned.

Scalar subquery comparisons:

When your subquery returns a single value, you can use standard comparison operators:

SELECT product_name, price
FROM products
WHERE price > (SELECT AVG(price) FROM products)
  AND category_id = 5;

This finds above-average priced products in a specific category. The subquery must return exactly one row and one column, or SQLite will throw an error.

Subqueries in the SELECT Clause

Scalar subqueries in the SELECT list add calculated columns based on related data. This is useful when you need per-row calculations that reference other tables.

SELECT 
    p.product_name,
    p.price,
    p.price - (
        SELECT AVG(price) 
        FROM products p2 
        WHERE p2.category_id = p.category_id
    ) AS price_vs_category_avg
FROM products p;

This shows each product’s price difference from its category average. The subquery runs for each row in the outer query, calculating the category average based on that row’s category_id. This is a correlated subquery—more on that later.

You can also use subqueries to pull single values from related tables:

SELECT 
    o.order_id,
    o.order_date,
    (SELECT customer_name FROM customers WHERE customer_id = o.customer_id) AS customer_name,
    (SELECT COUNT(*) FROM order_items WHERE order_id = o.order_id) AS item_count
FROM orders o;

While this works, a JOIN would be more efficient for the customer name lookup. Reserve SELECT clause subqueries for calculations and aggregations, not simple lookups.

Subqueries in the FROM Clause (Derived Tables)

Using a subquery in the FROM clause creates a derived table—a temporary result set you can query like any other table. This is essential for multi-step transformations and complex aggregations.

SELECT 
    customer_totals.customer_id,
    customer_totals.customer_name,
    customer_totals.total_spent
FROM (
    SELECT 
        c.customer_id,
        c.customer_name,
        SUM(oi.quantity * oi.unit_price) AS total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY c.customer_id, c.customer_name
) AS customer_totals
WHERE customer_totals.total_spent > 1000
ORDER BY customer_totals.total_spent DESC;

This first aggregates order totals by customer, then filters for high-value customers. You can’t filter on aggregated values directly in the same query level, so the subquery handles aggregation while the outer query handles filtering.

Derived tables must have an alias (customer_totals in this example). You can reference columns from the subquery just like a regular table.

Correlated vs Non-Correlated Subqueries

Understanding the difference between correlated and non-correlated subqueries is crucial for writing efficient SQL.

Non-correlated subqueries are independent—they run once and return a result set:

SELECT product_name, price
FROM products
WHERE category_id IN (
    SELECT category_id 
    FROM categories 
    WHERE category_name LIKE '%Electronics%'
);

The subquery executes once, finds all electronics category IDs, then the outer query uses those IDs. Simple and efficient.

Correlated subqueries reference columns from the outer query and execute once per row:

SELECT 
    e.employee_name,
    e.department_id,
    e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

This finds employees earning above their department average. For each employee in the outer query, the subquery calculates the average salary for that employee’s department. If you have 1000 employees across 10 departments, this subquery runs 1000 times.

Correlated subqueries are powerful but can be slow. When possible, rewrite them as JOINs or use window functions (available in SQLite 3.25.0+):

SELECT 
    employee_name,
    department_id,
    salary
FROM (
    SELECT 
        employee_name,
        department_id,
        salary,
        AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM employees
)
WHERE salary > dept_avg;

This window function approach calculates department averages in a single pass, significantly faster for large datasets.

Performance Considerations and Best Practices

Subqueries aren’t always the best choice. Here’s when to use them and when to look for alternatives:

Use subqueries when:

You need to filter based on aggregated data
Checking for existence with EXISTS
Creating intermediate result sets for complex logic
The subquery result set is small

Use JOINs when:

Simple lookups from related tables
Returning columns from multiple tables
Better performance is critical

Side-by-side comparison:

-- Subquery approach
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders WHERE order_date >= '2024-01-01'
);

-- JOIN approach (typically faster)
SELECT DISTINCT c.customer_name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';

The JOIN version usually performs better because SQLite can optimize the join operation and use indexes more effectively. However, the subquery version is sometimes clearer and more maintainable.

Always index columns used in subquery conditions:

CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_order_items_order ON order_items(order_id);

Use EXPLAIN QUERY PLAN to understand query execution:

EXPLAIN QUERY PLAN
SELECT customer_name
FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);

This shows whether SQLite is using indexes and how it’s executing the query.

Common Use Cases and Patterns

Finding duplicates:

SELECT email, COUNT(*) as count
FROM users
WHERE email IN (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
)
GROUP BY email;

Top-N per group:

SELECT *
FROM products p
WHERE (
    SELECT COUNT(*)
    FROM products p2
    WHERE p2.category_id = p.category_id
      AND p2.price > p.price
) < 3
ORDER BY category_id, price DESC;

This finds the top 3 most expensive products per category using a correlated subquery that counts how many products in the same category cost more.

Complex filtering with aggregations:

SELECT 
    c.customer_id,
    c.customer_name,
    c.region
FROM customers c
WHERE (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.customer_id = c.customer_id
) > (
    SELECT AVG(order_count)
    FROM (
        SELECT customer_id, COUNT(*) as order_count
        FROM orders
        GROUP BY customer_id
    )
)
AND c.region = 'West';

This finds Western region customers with above-average order counts—a multi-level subquery problem that’s difficult to express with JOINs alone.

Subqueries are an essential tool in your SQLite toolkit. Master them, understand their performance characteristics, and know when to use alternatives. Start with simple subqueries in WHERE clauses, then progress to derived tables and correlated subqueries as your needs grow. Your queries will become more powerful and your data analysis more sophisticated.