B+ Tree: Database Index Structure Implementation
Every time you run a SQL query with a WHERE clause, you're almost certainly traversing a B+ tree. This data structure has dominated database indexing for decades, and understanding its implementation...
Key Insights
- B+ trees store data pointers only in leaf nodes and link leaves together, making range queries dramatically faster than B trees or binary search trees
- Node splitting during insertion and merging during deletion maintain perfect balance automatically, guaranteeing O(log n) operations regardless of insertion order
- Choosing a branching factor that matches your disk page size (typically 4KB or 8KB) minimizes I/O operations and maximizes cache efficiency
Introduction to B+ Trees
Every time you run a SQL query with a WHERE clause, you’re almost certainly traversing a B+ tree. This data structure has dominated database indexing for decades, and understanding its implementation reveals why.
Binary search trees fail for databases because they create deep, narrow structures that require many disk reads. A tree with a million records might be 20 levels deep, meaning 20 separate disk I/O operations per lookup. B trees improve this by packing many keys into each node, but they scatter data pointers throughout the tree.
B+ trees solve both problems. They use high branching factors to create shallow trees (typically 3-4 levels for millions of records), and they concentrate all data in linked leaf nodes. This design optimizes for the reality of storage systems: sequential reads are cheap, random reads are expensive.
B+ Tree Structure and Properties
A B+ tree of order m has these properties:
- Each internal node holds at most
m-1keys andmchild pointers - Each internal node (except root) has at least
⌈m/2⌉children - All leaves appear at the same level
- Leaf nodes contain keys and data pointers, linked in sorted order
- Internal nodes contain only keys for navigation
The order m is typically chosen so that a node fills exactly one disk page. With 4KB pages and 8-byte keys plus 8-byte pointers, you might fit 250+ keys per node.
Here’s the node structure:
from typing import List, Optional, Any, Tuple
class BPlusTreeNode:
def __init__(self, order: int, is_leaf: bool = False):
self.order = order
self.is_leaf = is_leaf
self.keys: List[int] = []
self.parent: Optional[BPlusTreeNode] = None
@property
def is_full(self) -> bool:
return len(self.keys) >= self.order - 1
@property
def is_underflow(self) -> bool:
min_keys = (self.order - 1) // 2
return len(self.keys) < min_keys
class InternalNode(BPlusTreeNode):
def __init__(self, order: int):
super().__init__(order, is_leaf=False)
self.children: List[BPlusTreeNode] = []
def get_child_index(self, key: int) -> int:
"""Find which child pointer to follow for a given key."""
for i, k in enumerate(self.keys):
if key < k:
return i
return len(self.keys)
class LeafNode(BPlusTreeNode):
def __init__(self, order: int):
super().__init__(order, is_leaf=True)
self.values: List[Any] = [] # Data pointers or actual values
self.next_leaf: Optional[LeafNode] = None
self.prev_leaf: Optional[LeafNode] = None
The separation between internal and leaf nodes is fundamental. Internal nodes are purely navigational—they tell you which subtree to explore. Leaf nodes hold the actual payload and form a doubly-linked list for efficient sequential access.
Search Operations
Searching a B+ tree always ends at a leaf node. Starting from the root, you compare the search key against node keys to determine which child to visit. This continues until you reach a leaf, where you perform a final search for the key.
class BPlusTree:
def __init__(self, order: int = 4):
self.order = order
self.root: BPlusTreeNode = LeafNode(order)
def search(self, key: int) -> Optional[Any]:
"""Find value associated with key, or None if not found."""
leaf = self._find_leaf(key)
for i, k in enumerate(leaf.keys):
if k == key:
return leaf.values[i]
return None
def _find_leaf(self, key: int) -> LeafNode:
"""Navigate from root to the leaf that should contain key."""
node = self.root
while not node.is_leaf:
internal = node # Type: InternalNode
child_idx = internal.get_child_index(key)
node = internal.children[child_idx]
return node
def search_iterative(self, key: int) -> Optional[Any]:
"""Iterative version - often preferred for deep trees."""
node = self.root
# Descend to leaf level
while isinstance(node, InternalNode):
idx = 0
while idx < len(node.keys) and key >= node.keys[idx]:
idx += 1
node = node.children[idx]
# Binary search within leaf
leaf = node
left, right = 0, len(leaf.keys) - 1
while left <= right:
mid = (left + right) // 2
if leaf.keys[mid] == key:
return leaf.values[mid]
elif leaf.keys[mid] < key:
left = mid + 1
else:
right = mid - 1
return None
With a branching factor of 256 and a million records, the tree is only 3 levels deep. Three disk reads to find any record—that’s the power of B+ trees.
Insertion and Node Splitting
Insertion starts by finding the correct leaf, then inserting the key-value pair in sorted order. When a node overflows (exceeds m-1 keys), it splits into two nodes, and the middle key propagates up to the parent. This might trigger cascading splits up to the root.
def insert(self, key: int, value: Any) -> None:
"""Insert a key-value pair into the tree."""
leaf = self._find_leaf(key)
# Insert into leaf in sorted position
insert_idx = 0
while insert_idx < len(leaf.keys) and leaf.keys[insert_idx] < key:
insert_idx += 1
leaf.keys.insert(insert_idx, key)
leaf.values.insert(insert_idx, value)
# Handle overflow
if leaf.is_full:
self._split_leaf(leaf)
def _split_leaf(self, leaf: LeafNode) -> None:
"""Split an overflowing leaf node."""
mid = len(leaf.keys) // 2
# Create new leaf with upper half
new_leaf = LeafNode(self.order)
new_leaf.keys = leaf.keys[mid:]
new_leaf.values = leaf.values[mid:]
# Truncate original leaf
leaf.keys = leaf.keys[:mid]
leaf.values = leaf.values[:mid]
# Update leaf chain
new_leaf.next_leaf = leaf.next_leaf
new_leaf.prev_leaf = leaf
if leaf.next_leaf:
leaf.next_leaf.prev_leaf = new_leaf
leaf.next_leaf = new_leaf
# Propagate split key to parent
split_key = new_leaf.keys[0]
self._insert_into_parent(leaf, split_key, new_leaf)
def _insert_into_parent(self, left: BPlusTreeNode, key: int,
right: BPlusTreeNode) -> None:
"""Insert a key and child pointer into the parent node."""
if left.parent is None:
# Create new root
new_root = InternalNode(self.order)
new_root.keys = [key]
new_root.children = [left, right]
left.parent = new_root
right.parent = new_root
self.root = new_root
return
parent = left.parent
# Find position and insert
idx = parent.children.index(left)
parent.keys.insert(idx, key)
parent.children.insert(idx + 1, right)
right.parent = parent
# Handle parent overflow
if parent.is_full:
self._split_internal(parent)
def _split_internal(self, node: InternalNode) -> None:
"""Split an overflowing internal node."""
mid = len(node.keys) // 2
promote_key = node.keys[mid]
# Create new internal node with upper half
new_node = InternalNode(self.order)
new_node.keys = node.keys[mid + 1:]
new_node.children = node.children[mid + 1:]
# Update children's parent pointers
for child in new_node.children:
child.parent = new_node
# Truncate original node
node.keys = node.keys[:mid]
node.children = node.children[:mid + 1]
# Propagate upward
self._insert_into_parent(node, promote_key, new_node)
Notice that leaf splits copy the middle key up (it remains in the leaf), while internal splits push the middle key up (it’s removed from the original node). This distinction ensures leaves contain all keys while internal nodes remain purely navigational.
Deletion and Rebalancing
Deletion is more complex. After removing a key from a leaf, you might have underflow (fewer than minimum keys). The solutions, in order of preference: borrow from a sibling, or merge with a sibling.
def delete(self, key: int) -> bool:
"""Delete a key from the tree. Returns True if found and deleted."""
leaf = self._find_leaf(key)
# Find and remove key
try:
idx = leaf.keys.index(key)
except ValueError:
return False
leaf.keys.pop(idx)
leaf.values.pop(idx)
# Handle underflow (skip for root)
if leaf.parent and leaf.is_underflow:
self._handle_underflow(leaf)
return True
def _handle_underflow(self, node: BPlusTreeNode) -> None:
"""Rebalance tree after deletion causes underflow."""
parent = node.parent
node_idx = parent.children.index(node)
# Try borrowing from left sibling
if node_idx > 0:
left_sibling = parent.children[node_idx - 1]
if len(left_sibling.keys) > (self.order - 1) // 2:
self._borrow_from_left(node, left_sibling, parent, node_idx)
return
# Try borrowing from right sibling
if node_idx < len(parent.children) - 1:
right_sibling = parent.children[node_idx + 1]
if len(right_sibling.keys) > (self.order - 1) // 2:
self._borrow_from_right(node, right_sibling, parent, node_idx)
return
# Must merge
if node_idx > 0:
self._merge_nodes(parent.children[node_idx - 1], node,
parent, node_idx - 1)
else:
self._merge_nodes(node, parent.children[node_idx + 1],
parent, node_idx)
def _borrow_from_left(self, node: LeafNode, sibling: LeafNode,
parent: InternalNode, node_idx: int) -> None:
"""Borrow one key from left sibling."""
# Move last key from sibling to node
node.keys.insert(0, sibling.keys.pop())
node.values.insert(0, sibling.values.pop())
# Update parent separator
parent.keys[node_idx - 1] = node.keys[0]
Range Queries and Sequential Access
This is where B+ trees shine. The linked leaf structure enables efficient range scans without tree traversal:
def range_query(self, start_key: int, end_key: int) -> List[Tuple[int, Any]]:
"""Return all key-value pairs where start_key <= key <= end_key."""
results = []
# Find starting leaf
leaf = self._find_leaf(start_key)
# Scan through leaves using next pointers
while leaf:
for i, key in enumerate(leaf.keys):
if key > end_key:
return results
if key >= start_key:
results.append((key, leaf.values[i]))
leaf = leaf.next_leaf
return results
def scan_all(self) -> List[Tuple[int, Any]]:
"""Return all key-value pairs in sorted order."""
results = []
# Find leftmost leaf
node = self.root
while not node.is_leaf:
node = node.children[0]
# Traverse leaf chain
while node:
for i, key in enumerate(node.keys):
results.append((key, node.values[i]))
node = node.next_leaf
return results
A range query touching 1000 consecutive records requires finding the starting leaf (3 disk reads) plus reading the sequential leaf pages (perhaps 4 more reads). Compare this to a binary search tree, which would need 1000 separate tree traversals.
Practical Considerations
Real implementations must consider disk page sizes. Here’s how to calculate optimal node capacity:
def calculate_order(page_size: int = 4096,
key_size: int = 8,
pointer_size: int = 8) -> int:
"""
Calculate optimal B+ tree order for given page size.
Internal node: m pointers + (m-1) keys must fit in one page
Formula: m * pointer_size + (m-1) * key_size <= page_size
"""
# Solve for m: m * (pointer_size + key_size) <= page_size + key_size
m = (page_size + key_size) // (pointer_size + key_size)
return m
# Example: 4KB pages with 8-byte keys and pointers
order = calculate_order(4096, 8, 8) # Returns 256
For concurrent access, B+ trees typically use lock coupling (crabbing): acquire lock on child before releasing parent lock. Some implementations use optimistic locking, assuming conflicts are rare and validating before committing changes.
Bulk loading is another optimization. Instead of inserting records one by one, sort the data first, then build the tree bottom-up by filling leaf pages sequentially. This produces a perfectly packed tree and is orders of magnitude faster for initial loads.
B+ trees have earned their place as the default index structure. Understanding their implementation helps you make better decisions about indexing strategies and appreciate the engineering behind every database query.