How to Optimize Queries in PostgreSQL
Key Insights
- EXPLAIN ANALYZE reveals the actual execution plan and bottlenecks—use it religiously before optimizing any query, as PostgreSQL’s planner decisions often surprise even experienced developers
- Index strategy matters more than index quantity: a well-ordered multi-column index on (status, created_at) performs radically differently than (created_at, status) depending on your WHERE clause
- The pg_stat_statements extension exposes your actual performance problems in production, not the queries you think are slow—enable it immediately and let data drive your optimization efforts
Understanding Query Performance Fundamentals
PostgreSQL’s query execution follows a predictable pattern: parse, plan, execute. The planner’s job is to evaluate possible execution strategies and choose the cheapest one based on estimated costs. When queries run slowly, the planner either chose poorly or chose correctly based on outdated statistics.
EXPLAIN shows you the planned execution without running the query. EXPLAIN ANALYZE actually executes the query and shows real timing data. Never optimize without this data—you’re flying blind otherwise.
-- Unoptimized query
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending';
-- Output shows Seq Scan on orders (cost=0.00..18334.00 rows=1 width=180)
-- (actual time=234.567..234.568 rows=1 loops=1)
-- Execution Time: 235.123 ms
-- After adding index
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending';
-- Output shows Index Scan using idx_orders_customer_status
-- (cost=0.42..8.44 rows=1 width=180)
-- (actual time=0.034..0.035 rows=1 loops=1)
-- Execution Time: 0.067 ms
The difference between 235 ms and 0.067 ms is roughly a 3,500x improvement, which is not unusual when an index replaces a sequential scan on a large table. Focus on the “actual time” values and row counts; these tell you where time is actually spent.
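A quick way to catch stale statistics in the same output is to compare the planner's estimated row count against the actual one. The figures below are illustrative, not from a real run:

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';
-- Seq Scan on orders (cost=0.00..18334.00 rows=100 width=180)
--   (actual time=...  rows=48210 loops=1)
-- rows=100 estimated vs. rows=48210 actual: the statistics are stale.
-- Run ANALYZE orders; then re-check the plan.
```

A large gap between estimated and actual rows is usually the first clue that the planner is working from outdated statistics.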
Index Strategies for Query Optimization
Indexes are your primary optimization tool, but poor index design wastes disk space and slows down writes. PostgreSQL defaults to B-tree indexes, which handle equality and range queries efficiently.
Multi-column index order matters enormously. PostgreSQL can use an index on (a, b, c) for queries filtering on “a”, “a and b”, or “a and b and c”, but not for “b” or “c” alone.
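The leftmost-prefix rule can be sketched with a hypothetical three-column index (the index name and columns here are illustrative):

```sql
CREATE INDEX idx_orders_cust_status_created
    ON orders (customer_id, status, created_at);

-- Can use the index (a leftmost prefix is present):
SELECT * FROM orders WHERE customer_id = 42;
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

-- Cannot use the index efficiently (leading column missing):
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE created_at > '2024-01-01';
```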
-- Common query pattern: filter by status, sort by date
SELECT id, customer_name, total
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 20;
-- Optimal index: status first (filters rows), then created_at (provides sort order)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- This index eliminates both the filter AND the sort operation
Partial indexes reduce size and improve performance for queries with consistent filter conditions:
-- If 95% of queries only care about active users
CREATE INDEX idx_active_users_email ON users(email)
WHERE status = 'active';
-- This index is much smaller and faster than indexing all users
SELECT * FROM users WHERE status = 'active' AND email = 'user@example.com';
For computed columns or function calls in WHERE clauses, use expression indexes:
-- Queries filtering on lowercase email
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now this query can use the index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
Query Rewriting Techniques
How you write queries dramatically affects performance. PostgreSQL’s optimizer is sophisticated but not magical.
EXISTS often outperforms IN when the subquery returns many rows, because it can stop after finding the first match. (Modern PostgreSQL versions frequently plan both forms as equivalent semi-joins, so measure with EXPLAIN ANALYZE before rewriting.)
-- Can be slower when the subquery returns many rows
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);
-- Faster: stops searching after first match
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.total > 1000
);
Never use SELECT * in production code. Specify columns explicitly to reduce I/O and enable index-only scans:
-- Bad: reads entire row from disk
SELECT * FROM orders WHERE status = 'pending';
-- Good: can be satisfied by index alone
CREATE INDEX idx_orders_status_covering ON orders(status)
INCLUDE (id, created_at, total);
SELECT id, created_at, total FROM orders WHERE status = 'pending';
CTE (WITH clause) behavior depends on your version. Before PostgreSQL 12, CTEs were always materialized, creating optimization fences: the planner could not push predicates into them. Since version 12, a non-recursive CTE referenced only once is inlined automatically unless you declare it AS MATERIALIZED:
-- CTE: materialized separately, then filtered
-- (PostgreSQL 11 and earlier, or when declared AS MATERIALIZED)
WITH pending_orders AS (
SELECT * FROM orders WHERE status = 'pending'
)
SELECT * FROM pending_orders WHERE customer_id = 123;
-- Subquery: planner can optimize the entire query together
SELECT * FROM (
SELECT * FROM orders WHERE status = 'pending'
) AS pending_orders
WHERE customer_id = 123;
The subquery version allows PostgreSQL to use an index on (status, customer_id) if available; in PostgreSQL 12 and later, an automatically inlined CTE can do the same.
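In PostgreSQL 12 and later you can also control materialization explicitly with the MATERIALIZED keyword, using the same tables as above:

```sql
-- Force the old fence behavior: materialize once, useful for an
-- expensive CTE referenced multiple times
WITH pending_orders AS MATERIALIZED (
    SELECT * FROM orders WHERE status = 'pending'
)
SELECT * FROM pending_orders WHERE customer_id = 123;

-- Force inlining so predicates can be pushed down
-- (already the default for single-reference CTEs in 12+)
WITH pending_orders AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE status = 'pending'
)
SELECT * FROM pending_orders WHERE customer_id = 123;
```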
Statistics and Query Planner Configuration
PostgreSQL’s planner relies on table statistics to estimate costs. Outdated statistics lead to poor execution plans. Run ANALYZE regularly, especially after bulk data changes:
-- Update statistics for a specific table
ANALYZE orders;
-- Update statistics for entire database
ANALYZE;
-- Check when statistics were last updated
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
Key planner configuration parameters affect query performance:
-- Increase memory for sorts and hash operations (per operation)
SET work_mem = '256MB'; -- Default is often 4MB
-- Tell planner about your actual cache (set to 50-75% of RAM)
SET effective_cache_size = '8GB';
-- Adjust for SSD storage (lower = faster random access)
SET random_page_cost = 1.1; -- Default is 4.0 (for spinning disks)
These settings dramatically affect the planner’s cost calculations. On SSDs, lowering random_page_cost makes index scans more attractive versus sequential scans.
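Note that SET changes only the current session. To persist a planner setting cluster-wide without hand-editing postgresql.conf, one option (requires superuser) is ALTER SYSTEM, which writes to postgresql.auto.conf, followed by a configuration reload:

```sql
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();  -- apply without restarting the server
SHOW random_page_cost;    -- verify the new value took effect
```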
Identifying and Fixing Common Performance Anti-patterns
The N+1 query problem plagues ORMs and lazy developers. One query fetches parent records, then N queries fetch related children:
-- Anti-pattern: N+1 queries
SELECT * FROM customers LIMIT 100; -- 1 query
-- Then in application code: 100 queries like this
SELECT * FROM orders WHERE customer_id = ?;
-- Solution: single query with JOIN
SELECT c.*, json_agg(o.*) as orders
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id
LIMIT 100;
-- Or use array aggregation for simpler data
SELECT c.*, array_agg(o.id) as order_ids
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id
LIMIT 100;
Functions in WHERE clauses prevent index usage unless you have expression indexes:
-- Can't use index on created_at
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-15';
-- Can use index on created_at
SELECT * FROM orders
WHERE created_at >= '2024-01-15'
AND created_at < '2024-01-16';
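If the range rewrite is impractical, for example when the query comes from a tool you don't control, an expression index on the cast can make the original form indexable. A sketch, assuming created_at is a timestamp without time zone (the indexed expression must be immutable, so this fails for timestamptz):

```sql
-- Index the computed date value itself
CREATE INDEX idx_orders_created_date ON orders ((created_at::date));

-- The function-style filter can now use the index
SELECT * FROM orders WHERE created_at::date = '2024-01-15';
```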
Monitoring and Maintenance Best Practices
Enable pg_stat_statements to track actual query performance in production:
-- Add to postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
-- Restart PostgreSQL, then create extension
CREATE EXTENSION pg_stat_statements;
-- Find slowest queries by total time
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Find queries with high variance
SELECT
query,
calls,
mean_exec_time,
stddev_exec_time
FROM pg_stat_statements
WHERE calls > 100
ORDER BY stddev_exec_time DESC
LIMIT 20;
Regular VACUUM operations prevent table bloat and maintain statistics:
-- Check for tables needing vacuum
SELECT
schemaname,
relname,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
-- Manual vacuum if autovacuum isn't keeping up
VACUUM ANALYZE orders;
Set up monitoring for long-running queries:
-- Find queries running longer than 5 minutes
SELECT
pid,
now() - pg_stat_activity.query_start AS duration,
query,
state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
AND state != 'idle';
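When a runaway query has to be stopped, PostgreSQL provides two built-in functions; pg_cancel_backend is the gentler option. The pid below is illustrative — take it from the pg_stat_activity query above:

```sql
-- Cancel just the current query on that backend
SELECT pg_cancel_backend(12345);

-- Terminate the whole backend connection if cancel doesn't work
SELECT pg_terminate_backend(12345);
```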
PostgreSQL query optimization is iterative. Measure with EXPLAIN ANALYZE, implement targeted indexes, rewrite problematic queries, and monitor production performance with pg_stat_statements. Focus on the queries that actually matter—the ones consuming the most total time in your production workload. Optimizing rarely-run queries wastes effort that should go toward the critical 20% of queries consuming 80% of your database resources.