SQL - Query Performance Optimization Best Practices

Key Insights

  • Execution plans reveal the true cost of your queries—learn to read them before blindly adding indexes or rewriting SQL
  • SARGability (Search ARGument ABLE) is the single most important concept for WHERE clause performance; wrapping indexed columns in functions destroys query optimization
  • Offset-based pagination gets linearly slower with page depth; keyset pagination maintains near-constant performance regardless of page number

Understanding Query Execution Plans

Every database optimization effort should start with execution plans. They tell you exactly what the database engine is doing—not what you think it’s doing.

In PostgreSQL, use EXPLAIN ANALYZE to get actual execution statistics:

EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01'
  AND o.status = 'completed';

The output reveals critical information:

Nested Loop  (cost=0.85..1247.33 rows=156 width=48) (actual time=0.089..45.234 rows=1523 loops=1)
  ->  Seq Scan on orders o  (cost=0.00..1125.00 rows=156 width=16) (actual time=0.045..38.123 rows=1523 loops=1)
        Filter: ((created_at > '2024-01-01') AND (status = 'completed'))
        Rows Removed by Filter: 48477
  ->  Index Scan using users_pkey on users u  (cost=0.42..0.78 rows=1 width=36) (actual time=0.004..0.004 rows=1 loops=1523)
Planning Time: 0.234 ms
Execution Time: 45.567 ms

Red flags to watch for:

  • Seq Scan on large tables indicates missing indexes
  • Rows Removed by Filter showing high numbers means you’re scanning too much data
  • Nested Loop with high loop counts suggests join optimization opportunities
  • Large gaps between estimated rows and actual rows indicate stale statistics

Run ANALYZE tablename to update statistics when estimates are wildly off.
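
If you want to experiment with plan shapes without a PostgreSQL instance handy, SQLite's EXPLAIN QUERY PLAN is a rough, far less detailed analogue. A minimal Python sketch with an invented orders table, showing the reported plan flip from a full scan to an index search once an index exists:

```python
import sqlite3

# In-memory SQLite database as a stand-in for quick plan experiments.
# The schema is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (status, created_at, total) VALUES (?, ?, ?)",
    [("completed" if i % 2 else "pending", f"2024-01-{i % 28 + 1:02d}", float(i))
     for i in range(1000)],
)

def plan(sql):
    # Each EXPLAIN QUERY PLAN row carries a human-readable detail string
    # in its fourth column.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE status = 'completed'"
before = plan(query)   # no index on status yet: full table scan
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")
after = plan(query)    # the same query now searches the index
print(before)
print(after)
```

The PostgreSQL output above carries far more information (costs, actual row counts, timings), but the scan-versus-search distinction reads the same way.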

Indexing Strategies That Actually Work

Indexes aren’t magic. They’re data structures that trade write performance and storage for read performance. Choose wisely.

Selectivity matters. An index on a boolean column with 50/50 distribution is nearly useless. An index on email with unique values is highly effective.

Column order in composite indexes is critical. The index (status, created_at) serves different queries than (created_at, status).

-- This query benefits from (status, created_at) order
SELECT * FROM orders 
WHERE status = 'pending' 
  AND created_at > '2024-01-01';

-- Create the composite index with equality columns first
CREATE INDEX idx_orders_status_created 
ON orders (status, created_at);

-- This index also supports queries filtering only on status
SELECT * FROM orders WHERE status = 'pending';

-- But NOT queries filtering only on created_at
-- For that, you'd need a separate index or reversed column order
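
The leftmost-prefix rule can be observed directly. In this Python/SQLite sketch (hypothetical schema mirroring the orders example), the composite index serves the status-only filter but not the created_at-only one:

```python
import sqlite3

# Invented schema; SQLite's EXPLAIN QUERY PLAN shows which predicates
# can use the composite index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_status_created ON orders (status, created_at)")

def plan(sql):
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column alone: the composite index applies.
p1 = plan("SELECT * FROM orders WHERE status = 'pending'")
# Trailing column alone: the index cannot be used for the search.
p2 = plan("SELECT * FROM orders WHERE created_at > '2024-01-01'")
print(p1)
print(p2)
```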

Over-indexing kills write performance. Every INSERT, UPDATE, and DELETE must maintain all indexes. I’ve seen tables with 15+ indexes where removing redundant ones improved write throughput by 40%.

Audit your indexes periodically:

-- PostgreSQL: Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

Writing Efficient WHERE Clauses

SARGability determines whether your WHERE clause can use indexes. Non-SARGable queries force full table scans.

The rule: Never apply functions to indexed columns.

-- BAD: Function on indexed column prevents index usage
SELECT * FROM orders 
WHERE EXTRACT(YEAR FROM created_at) = 2024;  -- YEAR(created_at) in MySQL

-- GOOD: Rewrite to keep column clean
SELECT * FROM orders 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2025-01-01';

-- BAD: Implicit conversion destroys SARGability
SELECT * FROM users 
WHERE phone = 5551234567;  -- phone is VARCHAR

-- GOOD: Match the column type
SELECT * FROM users 
WHERE phone = '5551234567';

-- BAD: LIKE with leading wildcard
SELECT * FROM products 
WHERE name LIKE '%widget%';

-- GOOD: Full-text search for this pattern
-- (pair it with a GIN index on to_tsvector('english', name) for speed)
SELECT * FROM products 
WHERE to_tsvector('english', name) @@ to_tsquery('widget');
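
The date-range rewrite can be checked mechanically. In this Python/SQLite sketch, strftime() stands in for YEAR() on an invented orders table: the function-wrapped predicate forces a scan, while the bare range predicate uses the index:

```python
import sqlite3

# Invented schema with an index on the date column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_created ON orders (created_at)")

def plan(sql):
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# strftime() on the column plays the role of YEAR(): not SARGable.
bad = plan("SELECT * FROM orders WHERE strftime('%Y', created_at) = '2024'")
# Range predicate keeps the column bare: SARGable.
good = plan(
    "SELECT * FROM orders "
    "WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'"
)
print(bad)   # table scan
print(good)  # index search
```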

IN vs EXISTS vs JOIN: Use EXISTS for correlated subqueries checking existence, IN for small static lists, and JOINs when you need data from both tables.

-- EXISTS is typically faster for "does related row exist?" checks
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.id 
      AND o.total > 1000
);

-- Equivalent JOIN (often same execution plan with modern optimizers)
SELECT DISTINCT c.* FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.total > 1000;
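
A quick sanity check that the two forms agree: this Python/SQLite sketch runs both against a few invented rows and compares the results.

```python
import sqlite3

# Invented data: two customers have an order over 1000, one does not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
INSERT INTO orders (customer_id, total) VALUES (1, 1500), (1, 2000), (2, 400), (3, 1200);
""")

exists_rows = conn.execute("""
    SELECT c.id, c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.customer_id = c.id AND o.total > 1000)
    ORDER BY c.id
""").fetchall()

join_rows = conn.execute("""
    SELECT DISTINCT c.id, c.name FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.total > 1000
    ORDER BY c.id
""").fetchall()

print(exists_rows)  # [(1, 'Ada'), (3, 'Edsger')]
print(join_rows)    # same rows: DISTINCT collapses Ada's two matches
```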

JOIN Optimization Techniques

Join order matters less in modern optimizers—they’ll reorder joins for efficiency. But you can still help.

Filter early. Reduce row counts before joining:

-- SLOW: Joining full tables, then filtering
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.created_at > '2024-01-01'
  AND c.region = 'US';

-- FASTER: Filter in subqueries/CTEs first (optimizer often does this anyway)
WITH recent_orders AS (
    SELECT customer_id, total
    FROM orders
    WHERE created_at > '2024-01-01'
),
us_customers AS (
    SELECT id, name
    FROM customers
    WHERE region = 'US'
)
SELECT c.name, o.total
FROM us_customers c
JOIN recent_orders o ON c.id = o.customer_id;

Watch for implicit type conversions:

-- BAD: customer_id is INT, but compared to string
SELECT * FROM orders o
JOIN customers c ON o.customer_id = CAST(c.id AS VARCHAR);

-- GOOD: Compare like types
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id;

Pagination and Large Result Sets

Offset pagination is intuitive but fundamentally broken for large datasets.

-- Page 1: Fast
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 0;

-- Page 5000: Slow - database must skip 100,000 rows
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 100000;

The database reads and discards 100,000 rows to return 20. This gets worse linearly with page depth.

Keyset pagination maintains constant performance:

-- First page
SELECT id, name, price 
FROM products 
ORDER BY id 
LIMIT 20;

-- Next page: use last seen ID instead of offset
SELECT id, name, price 
FROM products 
WHERE id > 12345  -- last ID from previous page
ORDER BY id 
LIMIT 20;

For multi-column sorting, include all sort columns in the WHERE clause:

-- Sorting by created_at DESC, id DESC
SELECT id, name, created_at 
FROM products 
WHERE (created_at, id) < ('2024-06-15 10:30:00', 98765)
ORDER BY created_at DESC, id DESC 
LIMIT 20;
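
In application code, keyset pagination is a loop that threads the last seen key into the next request. A Python/SQLite sketch with an invented products table, using the single-column form:

```python
import sqlite3

# Invented table with 100 rows to page through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (id, name) VALUES (?, ?)",
    [(i, f"product-{i}") for i in range(1, 101)],
)

def fetch_page(last_id, page_size=20):
    # The previous page's last id replaces OFFSET, so every page is an
    # index range scan regardless of how deep we are.
    return conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

all_ids, last_id = [], 0
while True:
    page = fetch_page(last_id)
    if not page:
        break
    all_ids.extend(row[0] for row in page)
    last_id = page[-1][0]

print(len(all_ids))  # 100: every row seen exactly once, in order
```

Note that keyset pagination requires stable, unique sort keys and cannot jump directly to an arbitrary page number; it trades that ability for constant per-page cost.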

Query Refactoring Patterns

CTEs vs Subqueries: CTEs improve readability. Since PostgreSQL 12, non-recursive CTEs referenced only once are inlined like subqueries (unless you force materialization with MATERIALIZED). Use them freely for clarity.

-- Clear intent with CTEs
WITH monthly_totals AS (
    SELECT 
        customer_id,
        DATE_TRUNC('month', created_at) AS month,
        SUM(total) AS monthly_total
    FROM orders
    GROUP BY customer_id, DATE_TRUNC('month', created_at)
),
customer_averages AS (
    SELECT 
        customer_id,
        AVG(monthly_total) AS avg_monthly
    FROM monthly_totals
    GROUP BY customer_id
)
SELECT c.name, ca.avg_monthly
FROM customers c
JOIN customer_averages ca ON c.id = ca.customer_id
WHERE ca.avg_monthly > 1000;
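
The pipeline above runs nearly unchanged on SQLite, except that SQLite has no DATE_TRUNC, so strftime('%Y-%m', ...) stands in for month truncation; the data here is invented:

```python
import sqlite3

# Invented data: Ada averages 1200/month, Grace 300/month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     created_at TEXT, total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders (customer_id, created_at, total) VALUES
    (1, '2024-01-05', 800), (1, '2024-01-20', 700),  -- Ada, Jan: 1500
    (1, '2024-02-10', 900),                           -- Ada, Feb: 900
    (2, '2024-01-15', 300);                           -- Grace, Jan: 300
""")

rows = conn.execute("""
WITH monthly_totals AS (
    SELECT customer_id,
           strftime('%Y-%m', created_at) AS month,
           SUM(total) AS monthly_total
    FROM orders
    GROUP BY customer_id, strftime('%Y-%m', created_at)
),
customer_averages AS (
    SELECT customer_id, AVG(monthly_total) AS avg_monthly
    FROM monthly_totals
    GROUP BY customer_id
)
SELECT c.name, ca.avg_monthly
FROM customers c
JOIN customer_averages ca ON c.id = ca.customer_id
WHERE ca.avg_monthly > 1000
""").fetchall()

print(rows)  # [('Ada', 1200.0)]
```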

Eliminate N+1 queries by fetching related data in bulk:

-- N+1 PATTERN (application code)
-- 1 query: SELECT * FROM orders WHERE status = 'pending'
-- N queries: SELECT * FROM customers WHERE id = ? (for each order)

-- FIXED: Single query with JOIN
SELECT o.*, c.name, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending';

-- Or batch the lookups
SELECT * FROM customers 
WHERE id IN (1, 2, 3, 4, 5);  -- IDs from first query
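
You can see the round-trip difference by counting executed statements. This Python/SQLite sketch uses sqlite3's set_trace_callback, which fires once per statement; schema and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com'), (2, 'Grace', 'grace@example.com');
INSERT INTO orders (customer_id, status) VALUES (1, 'pending'), (2, 'pending'), (1, 'pending');
""")

# Record every statement the connection executes from here on.
queries = []
conn.set_trace_callback(queries.append)

# N+1 pattern: one query for orders, then one per order for its customer.
orders = conn.execute(
    "SELECT id, customer_id FROM orders WHERE status = 'pending'"
).fetchall()
for _, customer_id in orders:
    conn.execute(
        "SELECT name, email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
n_plus_one = len(queries)

queries.clear()
# Fixed: a single JOIN returns the same data in one round trip.
conn.execute("""
    SELECT o.id, c.name, c.email
    FROM orders o JOIN customers c ON o.customer_id = c.id
    WHERE o.status = 'pending'
""").fetchall()
single = len(queries)

print(n_plus_one, single)  # 4 1
```

With three pending orders the difference is 4 statements versus 1; against a production table with thousands of rows, the gap is what kills you.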

Stop using SELECT * in production code. Fetch only needed columns to reduce I/O and memory usage.

Monitoring and Continuous Improvement

Enable slow query logging. In PostgreSQL:

# postgresql.conf (comments use '#' here, not '--')
log_min_duration_statement = 1000   # log queries over 1 second (value in ms)

In MySQL:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

Track these metrics for your critical queries:

  • Execution time (p50, p95, p99)
  • Rows examined vs rows returned (high ratio = inefficient)
  • Lock wait time
  • Query frequency
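
Once you have latency samples from the slow-query log, the percentiles are easy to compute. A minimal nearest-rank sketch in Python (the durations are invented):

```python
import math

# Hypothetical query durations in milliseconds, as pulled from a log.
durations_ms = [12, 15, 14, 13, 220, 16, 11, 18, 14, 950,
                13, 17, 15, 12, 16, 14, 13, 19, 15, 14]

def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample with at least p% of
    # all samples at or below it.
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[k]

p50 = percentile(durations_ms, 50)
p95 = percentile(durations_ms, 95)
p99 = percentile(durations_ms, 99)
print(f"p50={p50}ms p95={p95}ms p99={p99}ms")  # p50=14ms p95=220ms p99=950ms
```

Note how the p50 looks healthy while the tail percentiles expose the outliers; that is exactly why averages alone are misleading for query latency.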

Build a review process: flag queries exceeding thresholds, review execution plans weekly, and optimize the top offenders. A single poorly optimized query running thousands of times daily will hurt more than an occasional slow report.

Query optimization isn’t a one-time task. Data grows, access patterns change, and yesterday’s fast query becomes tomorrow’s bottleneck. Instrument, measure, and iterate.
