SQL - Index Types (B-Tree, Hash, GIN, GiST)
B-Tree (Balanced Tree) indexes are PostgreSQL's default index type for good reason. They maintain sorted data in a tree structure where each node contains multiple keys, enabling efficient range...
Key Insights
- B-Tree indexes handle most use cases efficiently but perform poorly with pattern matching and array operations, while Hash indexes excel at exact equality comparisons with lower overhead
- GIN indexes are essential for full-text search and array containment queries, offering 3x faster lookups on JSONB data compared to B-Tree at the cost of slower writes and larger storage
- GiST indexes provide extensible framework for geometric data and nearest-neighbor searches, supporting custom operator classes that B-Tree cannot handle
B-Tree Indexes: The Default Workhorse
B-Tree (Balanced Tree) indexes are PostgreSQL’s default index type for good reason. They maintain sorted data in a tree structure where each node contains multiple keys, enabling efficient range queries, sorting, and equality comparisons.
-- Basic B-Tree index creation (default)
CREATE INDEX idx_users_email ON users(email);
-- Explicit B-Tree specification
CREATE INDEX idx_orders_created ON orders USING btree(created_at);
-- Composite B-Tree index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
B-Tree indexes shine with comparison operators: <, <=, =, >=, >, BETWEEN, IN, and IS NULL. They also support LIKE and ~ operators when the pattern is anchored at the start.
-- Queries that use B-Tree efficiently
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM orders WHERE created_at > '2024-01-01';
SELECT * FROM products WHERE price BETWEEN 10 AND 100;
-- This uses the index
SELECT * FROM users WHERE email LIKE 'admin%';
-- This does NOT use the index efficiently
SELECT * FROM users WHERE email LIKE '%admin%';
Performance characteristics matter. A B-Tree index on a 10-million-row table typically requires 3-4 disk reads for lookups due to its logarithmic height. For composite indexes, column order is critical—queries must use columns from left to right.
-- Given: CREATE INDEX idx_multi ON orders(user_id, status, created_at);
-- Uses index efficiently
SELECT * FROM orders WHERE user_id = 123;
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';
-- Cannot use index efficiently
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE created_at > '2024-01-01';
Hash Indexes: Optimized Equality
Hash indexes use a hash function to map keys to bucket locations, providing O(1) average-case lookup for equality comparisons. Before PostgreSQL 10, hash indexes weren’t WAL-logged and were considered unsafe. Modern versions have fixed these issues.
-- Hash index creation
CREATE INDEX idx_sessions_token ON sessions USING hash(session_token);
-- Hash index on UUID column
CREATE INDEX idx_devices_uuid ON devices USING hash(device_uuid);
Hash indexes only support the equality operator (=). They’re smaller than B-Tree indexes and faster for exact matches, but completely useless for range queries or sorting.
-- Efficient with hash index
SELECT * FROM sessions WHERE session_token = 'abc123def456';
-- Cannot use hash index
SELECT * FROM sessions WHERE session_token LIKE 'abc%';
SELECT * FROM sessions WHERE session_token > 'abc123';
Benchmark comparison on a table with 5 million UUIDs:
-- B-Tree index: 1.2 MB, 0.08ms lookup
CREATE INDEX idx_uuid_btree ON events USING btree(event_uuid);
-- Hash index: 0.9 MB, 0.05ms lookup
CREATE INDEX idx_uuid_hash ON events USING hash(event_uuid);
The 30-40% size reduction and faster lookups make hash indexes worthwhile for large tables with frequent equality lookups on high-cardinality columns.
GIN Indexes: Full-Text and Array Operations
Generalized Inverted Indexes (GIN) store a separate index entry for each component value. They excel at indexing composite values like arrays, JSONB documents, and full-text search vectors.
-- Array containment
CREATE INDEX idx_tags ON articles USING gin(tags);
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];
SELECT * FROM articles WHERE tags && ARRAY['database', 'optimization'];
-- JSONB indexing
CREATE INDEX idx_metadata ON products USING gin(metadata);
SELECT * FROM products WHERE metadata @> '{"brand": "Apple"}';
SELECT * FROM products WHERE metadata ? 'warranty';
-- Full-text search
CREATE INDEX idx_content_fts ON documents USING gin(to_tsvector('english', content));
SELECT * FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');
GIN indexes are inverted: instead of mapping row to values, they map values to rows. For an array column, each unique array element gets an index entry pointing to all rows containing that element.
-- Performance comparison on 1M rows with array columns
-- Without GIN index: 2400ms
-- With GIN index: 8ms
CREATE TABLE products (
id serial PRIMARY KEY,
tags text[]
);
-- Insert test data
INSERT INTO products (tags)
SELECT ARRAY(
SELECT 'tag_' || (random() * 1000)::int
FROM generate_series(1, 5)
)
FROM generate_series(1, 1000000);
-- Create GIN index
CREATE INDEX idx_products_tags ON products USING gin(tags);
-- Query performance
EXPLAIN ANALYZE
SELECT * FROM products WHERE tags @> ARRAY['tag_42', 'tag_123'];
GIN indexes have higher write costs because inserting a row with 10 array elements requires 10 index updates. They also consume more space—typically 50-100% larger than B-Tree for the same data. Use gin_pending_list_limit to batch updates and reduce write overhead.
-- Optimize GIN for write-heavy workloads
ALTER INDEX idx_products_tags SET (fastupdate = on);
ALTER INDEX idx_products_tags SET (gin_pending_list_limit = 4096);
GiST Indexes: Geometric and Custom Types
Generalized Search Tree (GiST) indexes provide an extensible framework for indexing complex data types. Unlike GIN’s inverted structure, GiST maintains a balanced tree with lossy compression, making it ideal for geometric data, ranges, and nearest-neighbor searches.
-- Geometric data indexing
CREATE TABLE locations (
id serial PRIMARY KEY,
name text,
coordinates point
);
CREATE INDEX idx_locations_coords ON locations USING gist(coordinates);
-- Find nearest locations
SELECT name, coordinates <-> point '(40.7128, -74.0060)' AS distance
FROM locations
ORDER BY coordinates <-> point '(40.7128, -74.0060)'
LIMIT 10;
-- Range types
CREATE TABLE reservations (
id serial PRIMARY KEY,
room_id int,
during tsrange
);
CREATE INDEX idx_reservations_during ON reservations USING gist(during);
-- Find overlapping reservations
SELECT * FROM reservations
WHERE during && tsrange('2024-01-15', '2024-01-20');
GiST supports exclusion constraints, enabling database-level enforcement of business rules that B-Tree cannot handle:
-- Prevent overlapping reservations
CREATE TABLE bookings (
id serial PRIMARY KEY,
room_id int,
during tsrange,
EXCLUDE USING gist (room_id WITH =, during WITH &&)
);
-- This succeeds
INSERT INTO bookings VALUES (1, 101, '[2024-01-01, 2024-01-05)');
-- This fails with exclusion constraint violation
INSERT INTO bookings VALUES (2, 101, '[2024-01-03, 2024-01-07)');
For full-text search, GiST offers a space-efficient alternative to GIN:
-- GIN: Faster queries, larger size, slower updates
CREATE INDEX idx_docs_gin ON documents USING gin(to_tsvector('english', content));
-- GiST: Slower queries (3-5x), smaller size (50% of GIN), faster updates
CREATE INDEX idx_docs_gist ON documents USING gist(to_tsvector('english', content));
GiST indexes are lossy—they may return false positives that require rechecking against the actual table data. This trade-off enables smaller indexes and faster updates at the cost of slightly slower queries.
Choosing the Right Index Type
Selection criteria based on query patterns:
-- B-Tree: Default choice for most scenarios
CREATE INDEX idx_default ON table(column); -- Sorting, ranges, equality
-- Hash: High-cardinality columns with only equality checks
CREATE INDEX idx_tokens ON sessions USING hash(token); -- UUIDs, tokens
-- GIN: Arrays, JSONB, full-text (query-heavy workloads)
CREATE INDEX idx_jsonb ON data USING gin(payload); -- Complex containment
-- GiST: Geometry, ranges, nearest-neighbor, full-text (write-heavy)
CREATE INDEX idx_geo ON places USING gist(location); -- Spatial queries
Monitor index usage to identify unused indexes consuming resources:
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
Index maintenance matters. B-Tree and Hash indexes require occasional REINDEX after heavy updates. GIN and GiST indexes benefit from VACUUM to consolidate pending updates and remove dead tuples.