SQL - CREATE INDEX and DROP INDEX
Key Insights
- Indexes dramatically improve query performance by creating auxiliary data structures that allow the database to locate rows without scanning entire tables, but they come with storage overhead and slower write operations
- Strategic index placement on foreign keys, WHERE clause columns, and JOIN conditions can reduce query execution time from seconds to milliseconds, while over-indexing degrades INSERT/UPDATE/DELETE performance
- Understanding index types (B-tree, hash, bitmap, full-text) and their use cases prevents common mistakes like indexing low-cardinality columns or creating redundant composite indexes
Understanding Index Fundamentals
Indexes function as lookup tables that map column values to physical row locations. Without an index, the database performs a full table scan, examining every row sequentially. With a proper index, the database navigates directly to relevant rows using a tree or hash structure.
Consider a table with 10 million customer records. Finding a customer by email without an index requires scanning all 10 million rows. With an index on the email column, the database locates the record in logarithmic time—typically under 20 comparisons.
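The logarithmic bound is easy to check with a quick calculation (PostgreSQL syntax shown; a real B-tree fans out far wider than two, so actual tree depth is typically only 3-4 levels):

```sql
-- Upper bound on binary-search comparisons over 10 million keys
SELECT ceil(log(2, 10000000)) AS max_comparisons;  -- 24
```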
-- Create a basic single-column index
CREATE INDEX idx_customers_email ON customers(email);
-- Create a unique index (enforces uniqueness constraint)
CREATE UNIQUE INDEX idx_customers_username ON customers(username);
-- Create a composite index (multiple columns)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
The composite index above optimizes queries filtering by customer_id alone or by both customer_id and order_date, but not queries filtering only by order_date due to left-prefix matching rules.
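The left-prefix rule plays out like this for the idx_orders_customer_date index above (hypothetical queries for illustration):

```sql
-- Uses the index: filters on the leading column
SELECT * FROM orders WHERE customer_id = 42;

-- Uses the index: filters on both columns in index order
SELECT * FROM orders
WHERE customer_id = 42 AND order_date >= '2024-01-01';

-- Cannot use the index efficiently: skips the leading column
SELECT * FROM orders WHERE order_date >= '2024-01-01';
```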
Index Types and Selection Criteria
Different database systems support various index types, each optimized for specific access patterns.
B-tree indexes (default in most systems) handle equality and range queries efficiently. Use them for columns in WHERE clauses, JOIN conditions, and ORDER BY statements.
-- B-tree index for range queries
CREATE INDEX idx_products_price ON products(price);
-- Query that benefits from this index
SELECT * FROM products
WHERE price BETWEEN 100 AND 500
ORDER BY price;
Hash indexes provide O(1) lookup for equality comparisons but cannot optimize range queries or sorting operations.
-- Hash index (PostgreSQL syntax)
CREATE INDEX idx_sessions_token ON sessions USING HASH(session_token);
-- Optimized query
SELECT * FROM sessions WHERE session_token = 'abc123xyz';
-- NOT optimized (range query)
SELECT * FROM sessions WHERE session_token > 'abc123';
Partial indexes reduce storage by indexing only rows meeting specific criteria.
-- Index only active users
CREATE INDEX idx_active_users ON users(last_login)
WHERE status = 'active';
-- Index only pending orders
CREATE INDEX idx_pending_orders ON orders(created_at)
WHERE status = 'pending';
This approach significantly reduces index size when querying specific subsets repeatedly.
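For the planner to use a partial index, the query's WHERE clause must imply the index predicate. A sketch against the idx_active_users index above (the id column is assumed for illustration):

```sql
-- Can use idx_active_users: the filter matches the index condition
SELECT id, last_login FROM users
WHERE status = 'active' AND last_login < '2024-01-01';

-- Cannot use it: the query does not restrict status to 'active'
SELECT id, last_login FROM users
WHERE last_login < '2024-01-01';
```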
Composite Index Strategy
Column order in composite indexes critically affects performance. As a rule of thumb, place columns used in equality predicates before columns used in range predicates, and order columns to match how your queries actually filter.
-- Poor composite index
CREATE INDEX idx_logs_bad ON application_logs(severity, timestamp, user_id);
-- Better composite index
CREATE INDEX idx_logs_good ON application_logs(user_id, timestamp, severity);
If queries frequently filter by user_id first, then by timestamp ranges, the second index structure aligns with access patterns. The database can use this index for:
- WHERE user_id = 123
- WHERE user_id = 123 AND timestamp > '2024-01-01'
- WHERE user_id = 123 AND timestamp > '2024-01-01' AND severity = 'ERROR'
But not efficiently for WHERE severity = 'ERROR' alone.
-- Verify index usage with EXPLAIN
EXPLAIN SELECT * FROM application_logs
WHERE user_id = 123
AND timestamp BETWEEN '2024-01-01' AND '2024-01-31';
Creating Indexes Concurrently
Standard index creation locks the table, blocking writes during construction. For production databases, use concurrent index creation.
-- PostgreSQL concurrent index creation
CREATE INDEX CONCURRENTLY idx_transactions_date
ON transactions(transaction_date);
-- MySQL online DDL (5.6+)
CREATE INDEX idx_transactions_date
ON transactions(transaction_date)
ALGORITHM=INPLACE LOCK=NONE;
Concurrent creation takes longer but allows normal table operations to continue. Monitor progress in production environments:
-- PostgreSQL: Check index creation progress
SELECT phase, blocks_done, blocks_total,
round(100.0 * blocks_done / NULLIF(blocks_total, 0), 2) AS percent_complete
FROM pg_stat_progress_create_index;
Covering Indexes for Query Optimization
Covering indexes include all columns needed by a query, eliminating table lookups entirely.
-- Non-covering index
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Query requires table lookup for order_total
SELECT customer_id, order_date, order_total
FROM orders
WHERE customer_id = 100;
-- Covering index includes all required columns
CREATE INDEX idx_orders_covering
ON orders(customer_id, order_date, order_total);
The covering index satisfies the entire query from index data alone. Database execution plans show “Index Only Scan” instead of “Index Scan” followed by table access.
-- PostgreSQL covering index with INCLUDE clause
CREATE INDEX idx_orders_include
ON orders(customer_id)
INCLUDE (order_date, order_total);
The INCLUDE clause adds columns to the index leaf nodes without including them in the B-tree structure, optimizing storage while maintaining covering benefits.
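To confirm the covering behavior, EXPLAIN the query; with idx_orders_include in place the plan should report an Index Only Scan (exact output varies by PostgreSQL version and table statistics):

```sql
EXPLAIN SELECT customer_id, order_date, order_total
FROM orders
WHERE customer_id = 100;
-- Expected plan node (approximate):
--   Index Only Scan using idx_orders_include on orders
```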
Dropping Indexes
Remove unused or redundant indexes to improve write performance and reduce storage consumption.
-- Drop a single index
DROP INDEX idx_customers_old_email;
-- PostgreSQL concurrent drop (non-blocking)
DROP INDEX CONCURRENTLY idx_customers_old_email;
-- Drop with IF EXISTS clause
DROP INDEX IF EXISTS idx_might_not_exist;
Identify unused indexes before dropping:
-- PostgreSQL: Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_toast%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- MySQL: Check index usage (requires sys schema)
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes;
Index Maintenance and Monitoring
Indexes require periodic maintenance as data changes fragment the index structure.
-- PostgreSQL: Rebuild fragmented index
REINDEX INDEX idx_orders_customer_date;
-- Rebuild all indexes on a table
REINDEX TABLE orders;
-- MySQL: Optimize table (rebuilds indexes)
OPTIMIZE TABLE orders;
Monitor index bloat and effectiveness:
-- PostgreSQL: Check index size and bloat
SELECT schemaname, relname, indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
Common Pitfalls and Best Practices
Avoid indexing low-cardinality columns like boolean flags or status fields with few distinct values. A table with 1 million rows where 95% have status = 'active' gains minimal benefit from indexing status.
-- Poor index choice (low cardinality)
CREATE INDEX idx_users_status ON users(status); -- Only 3 possible values
-- Better approach: partial index for minority case
CREATE INDEX idx_inactive_users ON users(email)
WHERE status = 'inactive';
Don’t create redundant indexes. If you have an index on (customer_id, order_date), a separate index on just customer_id is redundant.
-- Redundant indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Keep only the composite index
DROP INDEX idx_orders_customer;
Test index effectiveness with realistic data volumes. An index performing well with 1,000 rows may behave differently with 10 million rows. Use EXPLAIN ANALYZE to measure actual execution times, not just theoretical plans.
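A minimal sketch of that workflow (PostgreSQL syntax; the commented plan notes describe what to look for, not real output):

```sql
-- EXPLAIN shows the plan; EXPLAIN ANALYZE also executes the query
-- and reports actual row counts and execution time
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 100
  AND order_date BETWEEN '2024-01-01' AND '2024-01-31';
-- Compare "actual time" before and after adding an index; large gaps
-- between estimated and actual rows often mean statistics are stale
-- and the table needs ANALYZE.
```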