B+ Tree: Database Index Structure Implementation

Every time you run a SQL query with a WHERE clause, you're almost certainly traversing a B+ tree. This data structure has dominated database indexing for decades, and understanding its implementation...

Key Insights

  • B+ trees store data pointers only in leaf nodes and link leaves together, making range queries dramatically faster than B trees or binary search trees
  • Node splitting during insertion and merging during deletion maintain perfect balance automatically, guaranteeing O(log n) operations regardless of insertion order
  • Choosing a branching factor that matches your disk page size (typically 4KB or 8KB) minimizes I/O operations and maximizes cache efficiency

Introduction to B+ Trees

Every time you run a SQL query with a WHERE clause, you’re almost certainly traversing a B+ tree. This data structure has dominated database indexing for decades, and understanding its implementation reveals why.

Binary search trees fail for databases because they create deep, narrow structures that require many disk reads. A tree with a million records might be 20 levels deep, meaning 20 separate disk I/O operations per lookup. B trees improve this by packing many keys into each node, but they scatter data pointers throughout the tree.

B+ trees solve both problems. They use high branching factors to create shallow trees (typically 3-4 levels for millions of records), and they concentrate all data in linked leaf nodes. This design optimizes for the reality of storage systems: sequential reads are cheap, random reads are expensive.

B+ Tree Structure and Properties

A B+ tree of order m has these properties:

  • Each internal node holds at most m-1 keys and m child pointers
  • Each internal node (except root) has at least ⌈m/2⌉ children
  • All leaves appear at the same level
  • Leaf nodes contain keys and data pointers, linked in sorted order
  • Internal nodes contain only keys for navigation

The order m is typically chosen so that a node fills exactly one disk page. With 4KB pages and 8-byte keys plus 8-byte pointers, you might fit 250+ keys per node.

Here’s the node structure:

from typing import List, Optional, Any, Tuple

class BPlusTreeNode:
    def __init__(self, order: int, is_leaf: bool = False):
        self.order = order
        self.is_leaf = is_leaf
        self.keys: List[int] = []
        self.parent: Optional[BPlusTreeNode] = None
    
    @property
    def is_full(self) -> bool:
        return len(self.keys) >= self.order - 1
    
    @property
    def is_underflow(self) -> bool:
        min_keys = (self.order - 1) // 2
        return len(self.keys) < min_keys


class InternalNode(BPlusTreeNode):
    def __init__(self, order: int):
        super().__init__(order, is_leaf=False)
        self.children: List[BPlusTreeNode] = []
    
    def get_child_index(self, key: int) -> int:
        """Find which child pointer to follow for a given key."""
        for i, k in enumerate(self.keys):
            if key < k:
                return i
        return len(self.keys)


class LeafNode(BPlusTreeNode):
    def __init__(self, order: int):
        super().__init__(order, is_leaf=True)
        self.values: List[Any] = []  # Data pointers or actual values
        self.next_leaf: Optional[LeafNode] = None
        self.prev_leaf: Optional[LeafNode] = None

The separation between internal and leaf nodes is fundamental. Internal nodes are purely navigational—they tell you which subtree to explore. Leaf nodes hold the actual payload and form a doubly-linked list for efficient sequential access.

Search Operations

Searching a B+ tree always ends at a leaf node. Starting from the root, you compare the search key against node keys to determine which child to visit. This continues until you reach a leaf, where you perform a final search for the key.

class BPlusTree:
    def __init__(self, order: int = 4):
        self.order = order
        self.root: BPlusTreeNode = LeafNode(order)
    
    def search(self, key: int) -> Optional[Any]:
        """Find value associated with key, or None if not found."""
        leaf = self._find_leaf(key)
        
        for i, k in enumerate(leaf.keys):
            if k == key:
                return leaf.values[i]
        return None
    
    def _find_leaf(self, key: int) -> LeafNode:
        """Navigate from root to the leaf that should contain key."""
        node = self.root
        
        while not node.is_leaf:
            internal = node  # Type: InternalNode
            child_idx = internal.get_child_index(key)
            node = internal.children[child_idx]
        
        return node
    
    def search_iterative(self, key: int) -> Optional[Any]:
        """Iterative version - often preferred for deep trees."""
        node = self.root
        
        # Descend to leaf level
        while isinstance(node, InternalNode):
            idx = 0
            while idx < len(node.keys) and key >= node.keys[idx]:
                idx += 1
            node = node.children[idx]
        
        # Binary search within leaf
        leaf = node
        left, right = 0, len(leaf.keys) - 1
        while left <= right:
            mid = (left + right) // 2
            if leaf.keys[mid] == key:
                return leaf.values[mid]
            elif leaf.keys[mid] < key:
                left = mid + 1
            else:
                right = mid - 1
        
        return None

With a branching factor of 256 and a million records, the tree is only 3 levels deep. Three disk reads to find any record—that’s the power of B+ trees.

Insertion and Node Splitting

Insertion starts by finding the correct leaf, then inserting the key-value pair in sorted order. When a node overflows (exceeds m-1 keys), it splits into two nodes, and the middle key propagates up to the parent. This might trigger cascading splits up to the root.

def insert(self, key: int, value: Any) -> None:
    """Insert a key-value pair into the tree."""
    leaf = self._find_leaf(key)
    
    # Insert into leaf in sorted position
    insert_idx = 0
    while insert_idx < len(leaf.keys) and leaf.keys[insert_idx] < key:
        insert_idx += 1
    
    leaf.keys.insert(insert_idx, key)
    leaf.values.insert(insert_idx, value)
    
    # Handle overflow
    if leaf.is_full:
        self._split_leaf(leaf)

def _split_leaf(self, leaf: LeafNode) -> None:
    """Split an overflowing leaf node."""
    mid = len(leaf.keys) // 2
    
    # Create new leaf with upper half
    new_leaf = LeafNode(self.order)
    new_leaf.keys = leaf.keys[mid:]
    new_leaf.values = leaf.values[mid:]
    
    # Truncate original leaf
    leaf.keys = leaf.keys[:mid]
    leaf.values = leaf.values[:mid]
    
    # Update leaf chain
    new_leaf.next_leaf = leaf.next_leaf
    new_leaf.prev_leaf = leaf
    if leaf.next_leaf:
        leaf.next_leaf.prev_leaf = new_leaf
    leaf.next_leaf = new_leaf
    
    # Propagate split key to parent
    split_key = new_leaf.keys[0]
    self._insert_into_parent(leaf, split_key, new_leaf)

def _insert_into_parent(self, left: BPlusTreeNode, key: int, 
                        right: BPlusTreeNode) -> None:
    """Insert a key and child pointer into the parent node."""
    if left.parent is None:
        # Create new root
        new_root = InternalNode(self.order)
        new_root.keys = [key]
        new_root.children = [left, right]
        left.parent = new_root
        right.parent = new_root
        self.root = new_root
        return
    
    parent = left.parent
    
    # Find position and insert
    idx = parent.children.index(left)
    parent.keys.insert(idx, key)
    parent.children.insert(idx + 1, right)
    right.parent = parent
    
    # Handle parent overflow
    if parent.is_full:
        self._split_internal(parent)

def _split_internal(self, node: InternalNode) -> None:
    """Split an overflowing internal node."""
    mid = len(node.keys) // 2
    promote_key = node.keys[mid]
    
    # Create new internal node with upper half
    new_node = InternalNode(self.order)
    new_node.keys = node.keys[mid + 1:]
    new_node.children = node.children[mid + 1:]
    
    # Update children's parent pointers
    for child in new_node.children:
        child.parent = new_node
    
    # Truncate original node
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    
    # Propagate upward
    self._insert_into_parent(node, promote_key, new_node)

Notice that leaf splits copy the middle key up (it remains in the leaf), while internal splits push the middle key up (it’s removed from the original node). This distinction ensures leaves contain all keys while internal nodes remain purely navigational.

Deletion and Rebalancing

Deletion is more complex. After removing a key from a leaf, you might have underflow (fewer than minimum keys). The solutions, in order of preference: borrow from a sibling, or merge with a sibling.

def delete(self, key: int) -> bool:
    """Delete a key from the tree. Returns True if found and deleted."""
    leaf = self._find_leaf(key)
    
    # Find and remove key
    try:
        idx = leaf.keys.index(key)
    except ValueError:
        return False
    
    leaf.keys.pop(idx)
    leaf.values.pop(idx)
    
    # Handle underflow (skip for root)
    if leaf.parent and leaf.is_underflow:
        self._handle_underflow(leaf)
    
    return True

def _handle_underflow(self, node: BPlusTreeNode) -> None:
    """Rebalance tree after deletion causes underflow."""
    parent = node.parent
    node_idx = parent.children.index(node)
    
    # Try borrowing from left sibling
    if node_idx > 0:
        left_sibling = parent.children[node_idx - 1]
        if len(left_sibling.keys) > (self.order - 1) // 2:
            self._borrow_from_left(node, left_sibling, parent, node_idx)
            return
    
    # Try borrowing from right sibling
    if node_idx < len(parent.children) - 1:
        right_sibling = parent.children[node_idx + 1]
        if len(right_sibling.keys) > (self.order - 1) // 2:
            self._borrow_from_right(node, right_sibling, parent, node_idx)
            return
    
    # Must merge
    if node_idx > 0:
        self._merge_nodes(parent.children[node_idx - 1], node, 
                         parent, node_idx - 1)
    else:
        self._merge_nodes(node, parent.children[node_idx + 1], 
                         parent, node_idx)

def _borrow_from_left(self, node: LeafNode, sibling: LeafNode,
                      parent: InternalNode, node_idx: int) -> None:
    """Borrow one key from left sibling."""
    # Move last key from sibling to node
    node.keys.insert(0, sibling.keys.pop())
    node.values.insert(0, sibling.values.pop())
    
    # Update parent separator
    parent.keys[node_idx - 1] = node.keys[0]

Range Queries and Sequential Access

This is where B+ trees shine. The linked leaf structure enables efficient range scans without tree traversal:

def range_query(self, start_key: int, end_key: int) -> List[Tuple[int, Any]]:
    """Return all key-value pairs where start_key <= key <= end_key."""
    results = []
    
    # Find starting leaf
    leaf = self._find_leaf(start_key)
    
    # Scan through leaves using next pointers
    while leaf:
        for i, key in enumerate(leaf.keys):
            if key > end_key:
                return results
            if key >= start_key:
                results.append((key, leaf.values[i]))
        
        leaf = leaf.next_leaf
    
    return results

def scan_all(self) -> List[Tuple[int, Any]]:
    """Return all key-value pairs in sorted order."""
    results = []
    
    # Find leftmost leaf
    node = self.root
    while not node.is_leaf:
        node = node.children[0]
    
    # Traverse leaf chain
    while node:
        for i, key in enumerate(node.keys):
            results.append((key, node.values[i]))
        node = node.next_leaf
    
    return results

A range query touching 1000 consecutive records requires finding the starting leaf (3 disk reads) plus reading the sequential leaf pages (perhaps 4 more reads). Compare this to a binary search tree, which would need 1000 separate tree traversals.

Practical Considerations

Real implementations must consider disk page sizes. Here’s how to calculate optimal node capacity:

def calculate_order(page_size: int = 4096, 
                   key_size: int = 8, 
                   pointer_size: int = 8) -> int:
    """
    Calculate optimal B+ tree order for given page size.
    
    Internal node: m pointers + (m-1) keys must fit in one page
    Formula: m * pointer_size + (m-1) * key_size <= page_size
    """
    # Solve for m: m * (pointer_size + key_size) <= page_size + key_size
    m = (page_size + key_size) // (pointer_size + key_size)
    return m

# Example: 4KB pages with 8-byte keys and pointers
order = calculate_order(4096, 8, 8)  # Returns 256

For concurrent access, B+ trees typically use lock coupling (crabbing): acquire lock on child before releasing parent lock. Some implementations use optimistic locking, assuming conflicts are rare and validating before committing changes.

Bulk loading is another optimization. Instead of inserting records one by one, sort the data first, then build the tree bottom-up by filling leaf pages sequentially. This produces a perfectly packed tree and is orders of magnitude faster for initial loads.

B+ trees have earned their place as the default index structure. Understanding their implementation helps you make better decisions about indexing strategies and appreciate the engineering behind every database query.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.