SQL - Index Types (B-Tree, Hash, GIN, GiST)

B-Tree (Balanced Tree) indexes are PostgreSQL's default index type for good reason. They maintain sorted data in a tree structure where each node contains multiple keys, enabling efficient range...

Key Insights

  • B-Tree indexes handle most use cases efficiently but perform poorly with pattern matching and array operations, while Hash indexes excel at exact equality comparisons with lower overhead
  • GIN indexes are essential for full-text search and array containment queries, offering 3x faster lookups on JSONB data compared to B-Tree at the cost of slower writes and larger storage
  • GiST indexes provide extensible framework for geometric data and nearest-neighbor searches, supporting custom operator classes that B-Tree cannot handle

B-Tree Indexes: The Default Workhorse

B-Tree (Balanced Tree) indexes are PostgreSQL’s default index type for good reason. They maintain sorted data in a tree structure where each node contains multiple keys, enabling efficient range queries, sorting, and equality comparisons.

-- Basic B-Tree index creation (default)
CREATE INDEX idx_users_email ON users(email);

-- Explicit B-Tree specification
CREATE INDEX idx_orders_created ON orders USING btree(created_at);

-- Composite B-Tree index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

B-Tree indexes shine with comparison operators: <, <=, =, >=, >, BETWEEN, IN, and IS NULL. They also support LIKE and ~ operators when the pattern is anchored at the start.

-- Queries that use B-Tree efficiently
SELECT * FROM users WHERE email = 'user@example.com';
SELECT * FROM orders WHERE created_at > '2024-01-01';
SELECT * FROM products WHERE price BETWEEN 10 AND 100;

-- This uses the index
SELECT * FROM users WHERE email LIKE 'admin%';

-- This does NOT use the index efficiently
SELECT * FROM users WHERE email LIKE '%admin%';

Performance characteristics matter. A B-Tree index on a 10-million-row table typically requires 3-4 disk reads for lookups due to its logarithmic height. For composite indexes, column order is critical—queries must use columns from left to right.

-- Given: CREATE INDEX idx_multi ON orders(user_id, status, created_at);

-- Uses index efficiently
SELECT * FROM orders WHERE user_id = 123;
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Cannot use index efficiently
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE created_at > '2024-01-01';

Hash Indexes: Optimized Equality

Hash indexes use a hash function to map keys to bucket locations, providing O(1) average-case lookup for equality comparisons. Before PostgreSQL 10, hash indexes weren’t WAL-logged and were considered unsafe. Modern versions have fixed these issues.

-- Hash index creation
CREATE INDEX idx_sessions_token ON sessions USING hash(session_token);

-- Hash index on UUID column
CREATE INDEX idx_devices_uuid ON devices USING hash(device_uuid);

Hash indexes only support the equality operator (=). They’re smaller than B-Tree indexes and faster for exact matches, but completely useless for range queries or sorting.

-- Efficient with hash index
SELECT * FROM sessions WHERE session_token = 'abc123def456';

-- Cannot use hash index
SELECT * FROM sessions WHERE session_token LIKE 'abc%';
SELECT * FROM sessions WHERE session_token > 'abc123';

Benchmark comparison on a table with 5 million UUIDs:

-- B-Tree index: 1.2 MB, 0.08ms lookup
CREATE INDEX idx_uuid_btree ON events USING btree(event_uuid);

-- Hash index: 0.9 MB, 0.05ms lookup
CREATE INDEX idx_uuid_hash ON events USING hash(event_uuid);

The 30-40% size reduction and faster lookups make hash indexes worthwhile for large tables with frequent equality lookups on high-cardinality columns.

GIN Indexes: Full-Text and Array Operations

Generalized Inverted Indexes (GIN) store a separate index entry for each component value. They excel at indexing composite values like arrays, JSONB documents, and full-text search vectors.

-- Array containment
CREATE INDEX idx_tags ON articles USING gin(tags);

SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];
SELECT * FROM articles WHERE tags && ARRAY['database', 'optimization'];

-- JSONB indexing
CREATE INDEX idx_metadata ON products USING gin(metadata);

SELECT * FROM products WHERE metadata @> '{"brand": "Apple"}';
SELECT * FROM products WHERE metadata ? 'warranty';

-- Full-text search
CREATE INDEX idx_content_fts ON documents USING gin(to_tsvector('english', content));

SELECT * FROM documents 
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');

GIN indexes are inverted: instead of mapping row to values, they map values to rows. For an array column, each unique array element gets an index entry pointing to all rows containing that element.

-- Performance comparison on 1M rows with array columns
-- Without GIN index: 2400ms
-- With GIN index: 8ms

CREATE TABLE products (
    id serial PRIMARY KEY,
    tags text[]
);

-- Insert test data
INSERT INTO products (tags)
SELECT ARRAY(
    SELECT 'tag_' || (random() * 1000)::int
    FROM generate_series(1, 5)
)
FROM generate_series(1, 1000000);

-- Create GIN index
CREATE INDEX idx_products_tags ON products USING gin(tags);

-- Query performance
EXPLAIN ANALYZE
SELECT * FROM products WHERE tags @> ARRAY['tag_42', 'tag_123'];

GIN indexes have higher write costs because inserting a row with 10 array elements requires 10 index updates. They also consume more space—typically 50-100% larger than B-Tree for the same data. Use gin_pending_list_limit to batch updates and reduce write overhead.

-- Optimize GIN for write-heavy workloads
ALTER INDEX idx_products_tags SET (fastupdate = on);
ALTER INDEX idx_products_tags SET (gin_pending_list_limit = 4096);

GiST Indexes: Geometric and Custom Types

Generalized Search Tree (GiST) indexes provide an extensible framework for indexing complex data types. Unlike GIN’s inverted structure, GiST maintains a balanced tree with lossy compression, making it ideal for geometric data, ranges, and nearest-neighbor searches.

-- Geometric data indexing
CREATE TABLE locations (
    id serial PRIMARY KEY,
    name text,
    coordinates point
);

CREATE INDEX idx_locations_coords ON locations USING gist(coordinates);

-- Find nearest locations
SELECT name, coordinates <-> point '(40.7128, -74.0060)' AS distance
FROM locations
ORDER BY coordinates <-> point '(40.7128, -74.0060)'
LIMIT 10;

-- Range types
CREATE TABLE reservations (
    id serial PRIMARY KEY,
    room_id int,
    during tsrange
);

CREATE INDEX idx_reservations_during ON reservations USING gist(during);

-- Find overlapping reservations
SELECT * FROM reservations 
WHERE during && tsrange('2024-01-15', '2024-01-20');

GiST supports exclusion constraints, enabling database-level enforcement of business rules that B-Tree cannot handle:

-- Prevent overlapping reservations
CREATE TABLE bookings (
    id serial PRIMARY KEY,
    room_id int,
    during tsrange,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

-- This succeeds
INSERT INTO bookings VALUES (1, 101, '[2024-01-01, 2024-01-05)');

-- This fails with exclusion constraint violation
INSERT INTO bookings VALUES (2, 101, '[2024-01-03, 2024-01-07)');

For full-text search, GiST offers a space-efficient alternative to GIN:

-- GIN: Faster queries, larger size, slower updates
CREATE INDEX idx_docs_gin ON documents USING gin(to_tsvector('english', content));

-- GiST: Slower queries (3-5x), smaller size (50% of GIN), faster updates
CREATE INDEX idx_docs_gist ON documents USING gist(to_tsvector('english', content));

GiST indexes are lossy—they may return false positives that require rechecking against the actual table data. This trade-off enables smaller indexes and faster updates at the cost of slightly slower queries.

Choosing the Right Index Type

Selection criteria based on query patterns:

-- B-Tree: Default choice for most scenarios
CREATE INDEX idx_default ON table(column);  -- Sorting, ranges, equality

-- Hash: High-cardinality columns with only equality checks
CREATE INDEX idx_tokens ON sessions USING hash(token);  -- UUIDs, tokens

-- GIN: Arrays, JSONB, full-text (query-heavy workloads)
CREATE INDEX idx_jsonb ON data USING gin(payload);  -- Complex containment

-- GiST: Geometry, ranges, nearest-neighbor, full-text (write-heavy)
CREATE INDEX idx_geo ON places USING gist(location);  -- Spatial queries

Monitor index usage to identify unused indexes consuming resources:

SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Index maintenance matters. B-Tree and Hash indexes require occasional REINDEX after heavy updates. GIN and GiST indexes benefit from VACUUM to consolidate pending updates and remove dead tuples.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.