Database Query Planning: Cost-Based Optimization
Key Insights
- Query optimizers use cost models based on I/O, CPU, and memory estimates to choose between execution strategies—understanding these costs lets you predict and influence optimizer decisions
- Statistics freshness directly determines plan quality; outdated statistics cause the optimizer to make wrong choices about join algorithms, index usage, and scan methods
- The optimizer doesn’t always choose optimally, especially with complex queries or unusual data distributions—knowing when and how to intervene separates competent DBAs from great ones
Introduction to Query Planning
When you execute a SQL query, the database doesn’t just naively fetch data row by row. Between your SQL statement and actual data retrieval sits the query optimizer—a sophisticated component that transforms declarative SQL into an optimal execution plan.
The optimizer’s job is to answer: “What’s the cheapest way to get this data?” It evaluates multiple execution strategies, estimates their costs, and selects the plan with the lowest expected expense. This process happens in milliseconds for simple queries but can involve evaluating thousands of potential plans for complex joins.
Let’s see this in action with PostgreSQL:
-- Create a test table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2)
);
-- Insert sample data
INSERT INTO orders (customer_id, order_date, amount)
SELECT
(random() * 1000)::INT,
CURRENT_DATE - (random() * 365)::INT,
(random() * 1000)::DECIMAL(10,2)
FROM generate_series(1, 100000);
-- Without index
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 500;
-- Output shows Seq Scan with cost=0.00..2084.00
-- Execution time: ~25ms
-- Add index
CREATE INDEX idx_customer ON orders(customer_id);
-- Same query now uses index
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 500;
-- Output shows Index Scan with cost=0.42..120.15
-- Execution time: ~2ms
The optimizer recognized that after creating an index, the cost of an index scan (120.15) is significantly lower than a sequential scan (2084.00), and execution time improved by 10x.
Understanding Query Costs
Database cost models assign numerical values to operations. These aren’t real-time measurements—they’re estimates based on formulas that consider:
- I/O costs: Reading pages from disk (or cache)
- CPU costs: Processing rows, evaluating conditions, computing joins
- Memory costs: Sorting, hashing, materializing intermediate results
In PostgreSQL, you’ll see costs like cost=0.42..120.15. The first number is the startup cost (work done before the first row can be returned); the second is the total cost. The units are arbitrary, calibrated so that one sequential page read (seq_page_cost) costs 1.0.
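As a rough sketch, PostgreSQL's sequential-scan cost follows the formula seq_page_cost × pages + cpu_tuple_cost × rows. The Python below checks that the 2084.00 figure from the earlier EXPLAIN output is consistent with this model; the ~1,084-page table size is inferred for illustration, not measured.

```python
# Simplified model of PostgreSQL's sequential-scan cost:
#   cost = seq_page_cost * pages + cpu_tuple_cost * rows
# Defaults: seq_page_cost = 1.0, cpu_tuple_cost = 0.01.
def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    return seq_page_cost * pages + cpu_tuple_cost * rows

# A 100,000-row table occupying about 1,084 pages (assumed here)
# reproduces the 2084.00 total cost seen in the EXPLAIN output.
print(seq_scan_cost(1084, 100_000))  # 2084.0
```

The real planner adds further terms (operator evaluation, parallel workers), but this two-term core is why a table's physical size, not just its row count, drives scan costs.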
-- Compare access methods
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 500;
-- Index Scan using idx_customer on orders
-- cost=0.42..120.15 rows=98 width=24
-- Startup cost: 0.42 (navigating B-tree)
-- Total cost: 120.15 (reading ~98 rows)
-- Force a sequential scan to compare
SET enable_indexscan = off;
SET enable_bitmapscan = off; -- otherwise the planner falls back to a bitmap scan
EXPLAIN (COSTS ON)
SELECT * FROM orders WHERE customer_id = 500;
-- Seq Scan on orders
-- cost=0.00..2084.00 rows=98 width=24
-- Startup cost: 0 (no index navigation)
-- Total cost: 2084.00 (scanning all 100k rows)
SET enable_indexscan = on;
SET enable_bitmapscan = on;
The optimizer correctly chooses the index scan because even with startup overhead, the total cost is dramatically lower. But if you were selecting 50% of the table, a sequential scan would actually be cheaper—the optimizer knows this.
Statistics and Cardinality Estimation
The optimizer’s cost estimates depend entirely on statistics about your data. Without accurate statistics, it’s flying blind.
Databases maintain statistics tables tracking:
- Row counts per table
- Distinct value counts per column
- Most common values and their frequencies
- Histograms showing data distribution
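The statistics above feed directly into row-count estimates. The sketch below mirrors the planner's equality heuristic under simplifying assumptions: use the stored frequency for a most-common value, otherwise spread the leftover frequency evenly over the remaining distinct values. The statistics values are hypothetical.

```python
def estimate_equality_rows(total_rows, n_distinct, mcv, value):
    """Estimate rows matching `col = value` from column statistics.
    If the value is in the most-common-values (MCV) list, use its stored
    frequency; otherwise assume the leftover frequency is spread evenly
    across the remaining distinct values."""
    if value in mcv:
        return total_rows * mcv[value]
    leftover = 1.0 - sum(mcv.values())
    remaining = n_distinct - len(mcv)
    return total_rows * leftover / remaining

# Hypothetical stats for orders.customer_id: 100k rows, ~1000 distinct
# values, two unusually frequent customers tracked as MCVs.
mcv = {42: 0.01, 7: 0.008}
print(estimate_equality_rows(100_000, 1000, mcv, 42))          # 1000.0
print(round(estimate_equality_rows(100_000, 1000, mcv, 500)))  # 98
```

Note how a non-MCV lookup estimates roughly 98 rows, the same order as the rows=98 estimates in the plans above: uniform spreading is exactly what produces such estimates, and exactly what goes wrong when the data is skewed.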
-- Check current statistics
SELECT
tablename,
attname,
n_distinct,
most_common_vals,
most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';
-- Simulate stale statistics by massive data change
INSERT INTO orders (customer_id, order_date, amount)
SELECT
9999, -- Single customer_id value
CURRENT_DATE,
500.00
FROM generate_series(1, 100000);
-- Query for this new customer before ANALYZE
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 9999;
-- Shows: rows=98 (estimate) but actual rows=100000
-- Optimizer chose index scan expecting 98 rows
-- Actual execution is terrible because of massive underestimate
-- Update statistics
ANALYZE orders;
-- Same query now
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 9999;
-- Shows: rows=100000 (estimate) actual rows=100000
-- Optimizer now chooses Seq Scan—correct for this cardinality
This example shows why ANALYZE is critical. Stale statistics cause the optimizer to choose plans optimized for data that no longer exists. Schedule regular ANALYZE operations, especially after bulk loads or significant updates.
Join Optimization Strategies
Join algorithms have vastly different performance characteristics. The optimizer chooses based on estimated row counts, available memory, and index availability.
Nested Loop Join: Efficient for small outer tables with indexed inner tables. For each outer row, perform an index lookup on the inner table.
Hash Join: Build a hash table from the smaller table, probe with the larger. Requires memory but excellent for medium-to-large equi-joins.
Merge Join: Sort both inputs, then merge. Efficient when inputs are already sorted or can use indexes.
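The hash join described above fits in a few lines of Python (a minimal sketch; the table and column names are illustrative, and real executors handle spilling to disk, NULLs, and batching):

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Hash join: build a hash table on the smaller input, then make a
    single pass over the larger input, probing for matches."""
    table = {}
    for row in build_rows:                        # build phase
        table.setdefault(row[build_key], []).append(row)
    joined = []
    for row in probe_rows:                        # probe phase
        for match in table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [{"customer_id": 1, "amount": 10.0},
          {"customer_id": 1, "amount": 25.0},
          {"customer_id": 2, "amount": 5.0}]
rows = hash_join(customers, orders, "id", "customer_id")
print(len(rows))  # 3
```

Contrast this with a nested loop, which re-searches the inner input once per outer row: without an index that is O(n×m) row comparisons, while the hash join does one pass over each input at the price of memory for the hash table.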
-- Create second table
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
region VARCHAR(50)
);
INSERT INTO customers (name, region)
SELECT
'Customer ' || i,
CASE (random() * 4)::INT
WHEN 0 THEN 'North'
WHEN 1 THEN 'South'
WHEN 2 THEN 'East'
ELSE 'West'
END
FROM generate_series(1, 1000) i;
-- Small result set: Nested Loop
EXPLAIN ANALYZE
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id < 10;
-- Nested Loop
-- Inner loop uses index on customers(id)
-- Efficient because outer produces few rows
-- Large result set: Hash Join
EXPLAIN ANALYZE
SELECT c.region, COUNT(*), SUM(o.amount)
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region;
-- Hash Join
-- Builds hash table from customers (smaller)
-- Probes with orders (larger)
-- Optimal for full table join
-- Force different algorithm to see cost difference
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT c.region, COUNT(*), SUM(o.amount)
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.region;
-- Merge Join or Nested Loop (much slower)
SET enable_hashjoin = on;
Join order matters enormously. The optimizer evaluates different orderings, but the search space explodes as tables are added, so beyond a point it abandons exhaustive search to keep planning time reasonable (PostgreSQL, for example, switches to a genetic search algorithm, GEQO, once a query joins at least geqo_threshold tables, 12 by default).
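To see why exhaustive search stops scaling, just count the orderings. Even restricted to left-deep plans there are n! of them, and bushy trees add more:

```python
from math import factorial

# Left-deep join trees alone admit n! orderings of n tables.
# Exhaustive enumeration stops being practical long before the
# table count looks "large".
for n in (3, 6, 12):
    print(n, factorial(n))
# 3 6
# 6 720
# 12 479001600
```

At twelve tables there are nearly half a billion left-deep orderings alone, which is why planners cap exhaustive search and fall back to heuristic or randomized strategies.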
Index Selection and Access Paths
Not every index gets used, and sometimes no index is the right choice. The optimizer considers selectivity—what percentage of rows match your predicate.
-- Create composite index
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
-- Index-only scan (covering index)
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, order_date
FROM orders
WHERE customer_id = 500 AND order_date > '2024-01-01';
-- Index Only Scan using idx_customer_date
-- Heap Fetches: 0 (requires an up-to-date visibility map; VACUUM the table first)
-- All data retrieved from the index, no table access
-- Index scan with table lookup
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 500 AND order_date > '2024-01-01';
-- Index Scan using idx_customer_date
-- Must access table to retrieve amount column
-- Low selectivity: sequential scan wins
EXPLAIN ANALYZE
SELECT * FROM orders WHERE amount > 100;
-- Seq Scan (if predicate matches >5-10% of rows)
-- Index would require too many random I/Os
The crossover point where sequential scans become cheaper than index scans typically occurs around 5-10% selectivity, but this depends on random_page_cost settings and whether data is cached.
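A deliberately pessimistic model, charging one random heap page fetch per matching row, shows how the crossover moves with random_page_cost. The page and row counts below are assumed, and the real planner credits caching and physical correlation, which pushes the break-even point up toward the 5-10% range quoted above.

```python
def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    return seq_page_cost * pages + cpu_tuple_cost * rows

def crossover_selectivity(pages, rows, random_page_cost=4.0,
                          cpu_tuple_cost=0.01):
    """Selectivity at which an index scan (modeled pessimistically as
    one random page fetch per matching row) costs as much as a full
    sequential scan."""
    per_match = random_page_cost + cpu_tuple_cost
    return seq_scan_cost(pages, rows) / (rows * per_match)

# Assumed: a 1,084-page, 100,000-row table.
print(crossover_selectivity(1084, 100_000, random_page_cost=4.0))  # ~0.005
print(crossover_selectivity(1084, 100_000, random_page_cost=1.1))  # ~0.019
```

Even this crude model shows the direction of the effect: lowering random_page_cost (SSDs, warm caches) raises the selectivity at which indexes stay competitive.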
Influencing the Optimizer
Sometimes you know better than the optimizer. Use these techniques judiciously:
-- Adjust cost parameters
SET random_page_cost = 1.1; -- Lower for SSD (default 4.0)
SET effective_cache_size = '8GB'; -- Hint about available cache
-- Query rewriting: break a complex query into CTEs
-- Since PostgreSQL 12, plain CTEs are inlined by default;
-- AS MATERIALIZED forces the materialization boundary
WITH recent_orders AS MATERIALIZED (
SELECT customer_id, SUM(amount) as total
FROM orders
WHERE order_date > CURRENT_DATE - 30
GROUP BY customer_id
)
SELECT c.name, ro.total
FROM customers c
JOIN recent_orders ro ON c.id = ro.customer_id
WHERE ro.total > 1000;
-- Index hints (PostgreSQL doesn't support, but others do)
-- SQL Server example:
-- SELECT * FROM orders WITH (INDEX(idx_customer))
-- WHERE customer_id = 500;
-- Disable specific operations for testing
SET enable_seqscan = off; -- Force index usage
SET enable_nestloop = off; -- Avoid nested loops
-- Remember to reset these!
In production, prefer query rewriting over hints. Hints are brittle and can cause problems when data volumes change.
Monitoring and Troubleshooting
Production systems require ongoing plan monitoring. Plans regress due to data drift, statistics staleness, or version upgrades.
-- Enable the pg_stat_statements extension
-- (the module must be listed in shared_preload_libraries, which requires a restart)
CREATE EXTENSION pg_stat_statements;
-- Find queries with high execution time variance
SELECT
query,
calls,
mean_exec_time,
stddev_exec_time,
max_exec_time
FROM pg_stat_statements
WHERE stddev_exec_time > mean_exec_time * 0.5
ORDER BY stddev_exec_time DESC
LIMIT 20;
-- Capture plans for later comparison
CREATE TABLE plan_history (
query_id TEXT,
captured_at TIMESTAMP,
plan_text TEXT
);
-- Store a plan (EXPLAIN cannot be used in a FROM clause,
-- so capture its output from PL/pgSQL)
DO $$
DECLARE
    line TEXT;
    full_plan TEXT := '';
BEGIN
    FOR line IN EXECUTE
        'EXPLAIN (FORMAT TEXT) SELECT * FROM orders WHERE customer_id = 500'
    LOOP
        full_plan := full_plan || line || E'\n';
    END LOOP;
    INSERT INTO plan_history
    VALUES (
        md5($query$SELECT * FROM orders WHERE customer_id = 500$query$),
        NOW(),
        full_plan
    );
END $$;
-- Compare plans over time
SELECT
captured_at,
plan_text
FROM plan_history
WHERE query_id = md5($query$SELECT * FROM orders WHERE customer_id = 500$query$)
ORDER BY captured_at DESC;
Set up alerting for queries whose execution time suddenly increases 2-3x. This often indicates plan regression requiring immediate investigation.
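Such an alerting rule can be prototyped directly over sampled mean_exec_time values. The history structure and the 2x threshold here are illustrative, not a prescription:

```python
from statistics import median

def flag_regressions(history, threshold=2.0):
    """history maps query_id -> mean execution times (ms), oldest first.
    Flag queries whose newest sample exceeds the median of the earlier
    samples by more than `threshold` times."""
    flagged = []
    for query_id, samples in history.items():
        if len(samples) < 2:
            continue
        baseline = median(samples[:-1])
        if samples[-1] > threshold * baseline:
            flagged.append(query_id)
    return flagged

history = {
    "orders_by_customer": [10.2, 11.0, 9.8, 34.5],  # plan likely regressed
    "orders_by_region":   [5.1, 6.0, 5.4, 5.9],     # stable
}
print(flag_regressions(history))  # ['orders_by_customer']
```

Using a median baseline rather than the latest-but-one sample keeps a single earlier outlier from masking a genuine regression.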
The query optimizer is sophisticated but not omniscient. Understanding cost-based optimization transforms you from someone who writes SQL to someone who writes SQL the database can execute efficiently. Monitor your plans, keep statistics fresh, and intervene when the optimizer makes poor choices. Your production database will thank you.