SQL - Query Optimization Tips
• Query performance depends on index usage, execution plan analysis, and understanding how the database engine processes your SQL statements
Key Insights
• Query performance depends on index usage, execution plan analysis, and understanding how the database engine processes your SQL statements • The biggest performance gains come from reducing data scanning through proper indexing, filtering early in the query pipeline, and avoiding implicit conversions • Modern databases provide powerful tools like execution plans, query stores, and statistics that reveal exactly where optimization efforts should focus
Understanding Execution Plans
Execution plans show how the database engine processes your query. Before optimizing anything, examine the actual execution plan to identify bottlenecks.
-- SQL Server
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT o.OrderID, c.CustomerName, o.OrderDate
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= '2024-01-01';
-- PostgreSQL
EXPLAIN ANALYZE
SELECT o.order_id, c.customer_name, o.order_date
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01';
Look for table scans, key lookups, and high-cost operations. A table scan on a million-row table signals missing indexes. Key lookups indicate the index doesn’t cover all required columns.
Index Strategy
Indexes accelerate reads but slow writes. Create indexes on columns used in WHERE clauses, JOIN conditions, and ORDER BY statements.
-- Poor: No index on frequently filtered column
SELECT * FROM Orders WHERE OrderDate >= '2024-01-01';
-- Create appropriate index
CREATE INDEX IX_Orders_OrderDate ON Orders(OrderDate);
-- Better: Covering index includes all selected columns
CREATE INDEX IX_Orders_OrderDate_Covering
ON Orders(OrderDate)
INCLUDE (OrderID, CustomerID, TotalAmount);
Composite indexes matter for multi-column filters. Column order affects performance:
-- Query filters on Status then Date
SELECT OrderID, CustomerID
FROM Orders
WHERE Status = 'Pending'
AND OrderDate >= '2024-01-01';
-- Correct index order: most selective column first
CREATE INDEX IX_Orders_Status_Date ON Orders(Status, OrderDate);
-- Wrong order reduces effectiveness
CREATE INDEX IX_Orders_Date_Status ON Orders(OrderDate, Status);
The Status column appears first because it’s more selective—fewer rows match ‘Pending’ than a date range.
Avoid SELECT *
Retrieving unnecessary columns wastes I/O and memory. Specify only required columns.
-- Bad: Returns all 50 columns
SELECT * FROM Products WHERE CategoryID = 5;
-- Good: Returns only needed data
SELECT ProductID, ProductName, Price
FROM Products
WHERE CategoryID = 5;
This becomes critical with covering indexes. The database can satisfy the query entirely from the index without touching the table:
CREATE INDEX IX_Products_Category_Covering
ON Products(CategoryID)
INCLUDE (ProductID, ProductName, Price);
-- This query uses only the index
SELECT ProductID, ProductName, Price
FROM Products
WHERE CategoryID = 5;
Filter Early and Effectively
Apply WHERE conditions before JOINs when possible. Use EXISTS instead of IN for subqueries against large datasets.
-- Inefficient: IN with large subquery
SELECT CustomerID, CustomerName
FROM Customers
WHERE CustomerID IN (
SELECT CustomerID FROM Orders WHERE OrderDate >= '2024-01-01'
);
-- Better: EXISTS stops at first match
SELECT CustomerID, CustomerName
FROM Customers c
WHERE EXISTS (
SELECT 1 FROM Orders o
WHERE o.CustomerID = c.CustomerID
AND o.OrderDate >= '2024-01-01'
);
-- Best: JOIN with appropriate indexes
SELECT DISTINCT c.CustomerID, c.CustomerName
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2024-01-01';
Avoid Implicit Conversions
Type mismatches force the database to convert data, preventing index usage.
-- Bad: CustomerID is INT, but comparing to string
SELECT * FROM Orders WHERE CustomerID = '12345';
-- Good: Correct data type
SELECT * FROM Orders WHERE CustomerID = 12345;
-- Bad: Function on indexed column prevents index seek
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;
-- Good: Rewrite to use index
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
The execution plan shows the difference—the first query scans the entire table while the second performs an index seek.
Optimize JOINs
JOIN order and type affect performance. Start with the smallest result set.
-- Inefficient: Large table first
SELECT o.OrderID, c.CustomerName, p.ProductName
FROM Orders o
INNER JOIN OrderDetails od ON o.OrderID = od.OrderID
INNER JOIN Products p ON od.ProductID = p.ProductID
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= '2024-01-01';
-- Better: Filter first to reduce dataset
SELECT o.OrderID, c.CustomerName, p.ProductName
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
INNER JOIN OrderDetails od ON o.OrderID = od.OrderID
INNER JOIN Products p ON od.ProductID = p.ProductID
WHERE o.OrderDate >= '2024-01-01';
Use appropriate JOIN types. LEFT JOIN is slower than INNER JOIN when you don’t need unmatched rows.
Pagination Done Right
OFFSET/FETCH becomes slower as you page deeper into results.
-- Slow for large offsets
SELECT OrderID, OrderDate, TotalAmount
FROM Orders
ORDER BY OrderDate DESC
OFFSET 100000 ROWS FETCH NEXT 20 ROWS ONLY;
-- Better: Keyset pagination
SELECT OrderID, OrderDate, TotalAmount
FROM Orders
WHERE OrderDate < @LastOrderDate
OR (OrderDate = @LastOrderDate AND OrderID < @LastOrderID)
ORDER BY OrderDate DESC, OrderID DESC
FETCH NEXT 20 ROWS ONLY;
Keyset pagination uses indexed columns to seek directly to the position, avoiding the need to scan and skip rows.
Batch Operations
Process data in batches instead of row-by-row operations.
-- Terrible: Row-by-row updates
DECLARE @OrderID INT;
DECLARE order_cursor CURSOR FOR
SELECT OrderID FROM Orders WHERE Status = 'Pending';
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderID;
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE Orders SET Status = 'Processing' WHERE OrderID = @OrderID;
FETCH NEXT FROM order_cursor INTO @OrderID;
END;
CLOSE order_cursor;
DEALLOCATE order_cursor;
-- Excellent: Set-based operation
UPDATE Orders
SET Status = 'Processing'
WHERE Status = 'Pending';
Set-based operations are orders of magnitude faster than cursors.
Use CTEs and Temp Tables Strategically
Common Table Expressions improve readability but don’t always optimize performance. For complex queries or reused result sets, temp tables can be faster.
-- CTE executed multiple times
WITH RecentOrders AS (
SELECT CustomerID, COUNT(*) as OrderCount
FROM Orders
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID
)
SELECT c.CustomerName, ro.OrderCount
FROM Customers c
INNER JOIN RecentOrders ro ON c.CustomerID = ro.CustomerID
WHERE ro.OrderCount > 5;
-- Temp table with index for better performance
SELECT CustomerID, COUNT(*) as OrderCount
INTO #RecentOrders
FROM Orders
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID;
CREATE INDEX IX_Temp ON #RecentOrders(CustomerID);
SELECT c.CustomerName, ro.OrderCount
FROM Customers c
INNER JOIN #RecentOrders ro ON c.CustomerID = ro.CustomerID
WHERE ro.OrderCount > 5;
DROP TABLE #RecentOrders;
Update Statistics
Outdated statistics cause poor execution plans. Keep statistics current on high-traffic tables.
-- SQL Server
UPDATE STATISTICS Orders WITH FULLSCAN;
-- PostgreSQL
ANALYZE orders;
-- Check when statistics were last updated
SELECT
OBJECT_NAME(object_id) AS TableName,
name AS IndexName,
STATS_DATE(object_id, index_id) AS LastUpdated
FROM sys.indexes
WHERE OBJECT_NAME(object_id) = 'Orders';
Configure automatic statistics updates, but manually update after bulk data changes.
Query Hints as Last Resort
Query hints override the optimizer. Use them only when you’ve proven the optimizer chooses poorly.
-- Force index usage when optimizer misses it
SELECT OrderID, CustomerID, OrderDate
FROM Orders WITH (INDEX(IX_Orders_OrderDate))
WHERE OrderDate >= '2024-01-01';
-- Force join order
SELECT o.OrderID, c.CustomerName
FROM Orders o
INNER LOOP JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate >= '2024-01-01';
Document why you added hints. The optimizer improves with database updates, so hints that help today might hurt tomorrow.