Weight-Balanced Tree: Size-Balanced BST


Key Insights

  • Weight-balanced trees use subtree sizes rather than heights for balance, enabling O(log n) order statistic queries as a built-in feature rather than an afterthought.
  • The α-balance invariant guarantees logarithmic height while providing flexibility in choosing rebalancing aggressiveness through the α parameter.
  • Size-based balancing shines in scenarios requiring frequent rank queries, range counting, or functional programming contexts where immutable structures benefit from predictable rebalancing.

Introduction to Weight-Balanced Trees

Binary search trees need balance to maintain O(log n) operations. Most developers reach for AVL trees (height-balanced) or Red-Black trees (color-based invariants) without considering a third option: weight-balanced trees, also called BB[α] trees or trees of bounded balance.

Weight-balanced trees maintain balance using subtree sizes instead of heights. Each node stores the count of nodes in its subtree, and the tree enforces that no subtree becomes too small relative to its sibling. This approach, introduced by Nievergelt and Reingold in 1973, offers a compelling trade-off: slightly more complex rebalancing logic in exchange for built-in support for order statistics.

The core insight is simple. If you’re already tracking subtree sizes for rank queries, you might as well use that information for balancing. You get two features for the price of one data structure.

The Balance Invariant and Weight Factor

The α-weight-balance condition defines when a tree is balanced. For a node with left subtree of size left_size and right subtree of size right_size, the tree is α-balanced if:

α ≤ (left_size + 1) / (left_size + right_size + 2) ≤ 1 - α

The +1 and +2 account for counting nodes consistently (we treat empty subtrees as having “weight” 1). This ensures both children contain at least an α fraction of the parent’s weight.

Choosing α matters. The invariant is meaningful for any α between 0 and 0.5, but rotation-based rebalancing only works in a narrower band: the classical analysis shows that single and double rotations suffice to restore balance when 2/11 ≤ α ≤ 1 - 1/√2 ≈ 0.293. Lower α values mean looser balance (fewer rotations, potentially taller trees); higher values enforce stricter balance (more rotations, guaranteed shorter trees). Practical implementations therefore pick values near the top of the valid band, typically 0.25 to 0.29.
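To make the invariant concrete, here is a tiny standalone check of the α-balance condition for a couple of example subtree sizes (the sizes are invented purely for illustration):

```python
def alpha_balanced(left_size, right_size, alpha=0.29):
    """Check the α-balance condition using the +1/+2 weight convention."""
    ratio = (left_size + 1) / (left_size + right_size + 2)
    return alpha <= ratio <= 1 - alpha

# Subtrees of sizes 3 and 5 give a ratio of 4/10 = 0.4: balanced.
print(alpha_balanced(3, 5))    # True
# Sizes 1 and 10 give 2/13 ≈ 0.154, below α = 0.29: unbalanced.
print(alpha_balanced(1, 10))   # False
```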

Here’s the node structure and weight calculation:

class WBNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.size = 1  # Size of subtree rooted at this node
    
    def update_size(self):
        left_size = self.left.size if self.left else 0
        right_size = self.right.size if self.right else 0
        self.size = left_size + right_size + 1


def weight(node):
    """Weight of a subtree (size + 1 for balance calculations)."""
    return (node.size + 1) if node else 1


def is_balanced(node, alpha=0.29):
    """Check if node satisfies α-balance condition."""
    if not node:
        return True
    
    left_weight = weight(node.left)
    right_weight = weight(node.right)
    total_weight = left_weight + right_weight
    
    left_ratio = left_weight / total_weight
    return alpha <= left_ratio <= (1 - alpha)

Rotations with Size Maintenance

Rotations in weight-balanced trees follow the same structural transformations as standard BST rotations. The critical difference: you must update size fields after restructuring, and you must do it in the correct order—children before parents.

def rotate_right(node):
    """
    Right rotation around node.
    
        node              left
        /  \             /    \
      left  C    =>     A    node
      /  \                   /  \
     A    B                 B    C
    """
    left = node.left
    node.left = left.right
    left.right = node
    
    # Update sizes: node first (now a child), then left (now parent)
    node.update_size()
    left.update_size()
    
    return left


def rotate_left(node):
    """
    Left rotation around node.
    
      node                right
      /  \               /     \
     A  right    =>    node     C
        /  \           /  \
       B    C         A    B
    """
    right = node.right
    node.right = right.left
    right.left = node
    
    node.update_size()
    right.update_size()
    
    return right


def rotate_right_left(node):
    """Double rotation: right rotation on right child, then left on node."""
    node.right = rotate_right(node.right)
    return rotate_left(node)


def rotate_left_right(node):
    """Double rotation: left rotation on left child, then right on node."""
    node.left = rotate_left(node.left)
    return rotate_right(node)

The order of size updates is non-negotiable. After a right rotation, the original node becomes a child of left. You must update node’s size first because left’s new size depends on it.
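To see the ordering requirement in action, here is a standalone sketch (re-declaring a minimal node class and the rotation so the snippet runs on its own) that performs the right rotation from the diagram above and checks the resulting sizes:

```python
class WBNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1

    def update_size(self):
        self.size = 1 + (self.left.size if self.left else 0) \
                      + (self.right.size if self.right else 0)

def rotate_right(node):
    left = node.left
    node.left, left.right = left.right, node
    node.update_size()   # child first: left's new size depends on it
    left.update_size()   # then the new parent
    return left

# A left-leaning chain: 30 <- 20 <- 10 (keys chosen for illustration)
root = WBNode(30)
root.left = WBNode(20)
root.left.left = WBNode(10)
root.left.update_size()
root.update_size()

root = rotate_right(root)
print(root.key, root.size)            # 20 3 -- 20 is the new root
print(root.left.key, root.right.key)  # 10 30
```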

Insertion with Rebalancing

Insertion follows the standard BST pattern with two additions: update sizes on the way back up, and rebalance when the invariant breaks.

class WeightBalancedTree:
    def __init__(self, alpha=0.29):
        self.root = None
        self.alpha = alpha
    
    def insert(self, key, value=None):
        self.root = self._insert(self.root, key, value)
    
    def _insert(self, node, key, value):
        if not node:
            return WBNode(key, value)
        
        if key < node.key:
            node.left = self._insert(node.left, key, value)
        elif key > node.key:
            node.right = self._insert(node.right, key, value)
        else:
            node.value = value  # Update existing key
            return node
        
        node.update_size()
        return self._rebalance(node)
    
    def _rebalance(self, node):
        """Restore balance if α-condition is violated."""
        if is_balanced(node, self.alpha):
            return node
        
        left_weight = weight(node.left)
        right_weight = weight(node.right)
        
        if left_weight > right_weight:
            # Left subtree too heavy
            left = node.left
            if weight(left.left) >= weight(left.right):
                # Single rotation suffices
                return rotate_right(node)
            else:
                # Need double rotation
                return rotate_left_right(node)
        else:
            # Right subtree too heavy
            right = node.right
            if weight(right.right) >= weight(right.left):
                return rotate_left(node)
            else:
                return rotate_right_left(node)

The rebalancing logic checks which subtree is too heavy, then examines the grandchild weights to decide between single and double rotations. For α within the valid range 2/11 ≤ α ≤ 1 - 1/√2 ≈ 0.293, a single or double rotation at the violating node always restores balance. The value α = 0.29 sits just under that upper bound, giving the strictest balance that rotation-based rebalancing can maintain, which is why it is a popular choice.
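As a sanity check, the pieces above can be condensed into one runnable snippet (an illustrative sketch, not a drop-in replacement for the class) that inserts keys in sorted order, the worst case for an unbalanced BST, and confirms the tree stays shallow:

```python
class WBNode:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

    def update_size(self):
        self.size = 1 + (self.left.size if self.left else 0) \
                      + (self.right.size if self.right else 0)

def weight(n):
    return (n.size + 1) if n else 1

def rot_r(n):
    l = n.left
    n.left, l.right = l.right, n
    n.update_size()
    l.update_size()
    return l

def rot_l(n):
    r = n.right
    n.right, r.left = r.left, n
    n.update_size()
    r.update_size()
    return r

def rebalance(n, alpha=0.29):
    lw, rw = weight(n.left), weight(n.right)
    if alpha <= lw / (lw + rw) <= 1 - alpha:
        return n
    if lw > rw:                                   # left subtree too heavy
        if weight(n.left.left) < weight(n.left.right):
            n.left = rot_l(n.left)                # double rotation case
        return rot_r(n)
    if weight(n.right.right) < weight(n.right.left):
        n.right = rot_r(n.right)                  # double rotation case
    return rot_l(n)

def insert(n, key):
    if n is None:
        return WBNode(key)
    if key < n.key:
        n.left = insert(n.left, key)
    elif key > n.key:
        n.right = insert(n.right, key)
    n.update_size()
    return rebalance(n)

def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

root = None
for k in range(1000):    # sorted input: a plain BST would reach height 1000
    root = insert(root, k)

print(root.size)         # 1000
print(height(root))      # stays close to log2(1000) ≈ 10
```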

Deletion and Size Updates

Deletion requires more care. After removing a node, you must update sizes along the path to the root and potentially rebalance multiple ancestors.

def delete(self, key):
    self.root = self._delete(self.root, key)

def _delete(self, node, key):
    if not node:
        return None
    
    if key < node.key:
        node.left = self._delete(node.left, key)
    elif key > node.key:
        node.right = self._delete(node.right, key)
    else:
        # Found the node to delete
        if not node.left:
            return node.right
        if not node.right:
            return node.left
        
        # Two children: replace with in-order successor
        successor = self._find_min(node.right)
        node.key = successor.key
        node.value = successor.value
        node.right = self._delete(node.right, successor.key)
    
    node.update_size()
    return self._rebalance(node)

def _find_min(self, node):
    while node.left:
        node = node.left
    return node

Some implementations use lazy rebalancing for deletion: allow the tree to become slightly unbalanced and rebuild subtrees only when they exceed a threshold. This amortizes rebalancing costs over multiple deletions. For most applications, eager rebalancing (as shown above) provides simpler code and predictable performance.
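The rebuild half of that lazy strategy is straightforward to sketch: flatten the offending subtree in order, then rebuild it perfectly balanced from the sorted keys. The snippet below is a minimal standalone illustration with its own stripped-down node class, not tuned production code:

```python
class WBNode:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def flatten(node, acc):
    """Collect a subtree's keys in sorted (in-order) order."""
    if node:
        flatten(node.left, acc)
        acc.append(node.key)
        flatten(node.right, acc)
    return acc

def rebuild(keys, lo=0, hi=None):
    """Rebuild a perfectly balanced subtree from sorted keys[lo:hi]."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = WBNode(keys[mid])
    node.left = rebuild(keys, lo, mid)
    node.right = rebuild(keys, mid + 1, hi)
    node.size = 1 + (node.left.size if node.left else 0) \
                  + (node.right.size if node.right else 0)
    return node

# A degenerate right chain 0 -> 1 -> ... -> 6, as if deletions skewed it.
root = WBNode(0)
cur = root
for k in range(1, 7):
    cur.right = WBNode(k)
    cur = cur.right

root = rebuild(flatten(root, []))
print(root.key, root.size)   # 3 7 -- the median becomes the root
```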

Order Statistics and Rank Queries

Here’s where weight-balanced trees earn their keep. The size field enables O(log n) order statistic queries without additional data structures.

def select(self, k):
    """Find the k-th smallest element (0-indexed)."""
    return self._select(self.root, k)

def _select(self, node, k):
    if not node:
        raise IndexError("k out of range")
    
    left_size = node.left.size if node.left else 0
    
    if k < left_size:
        return self._select(node.left, k)
    elif k == left_size:
        return node.key
    else:
        return self._select(node.right, k - left_size - 1)

def rank(self, key):
    """Find the rank (0-indexed position) of a key."""
    return self._rank(self.root, key)

def _rank(self, node, key):
    if not node:
        raise KeyError(f"Key {key} not found")
    
    left_size = node.left.size if node.left else 0
    
    if key < node.key:
        return self._rank(node.left, key)
    elif key > node.key:
        return left_size + 1 + self._rank(node.right, key)
    else:
        return left_size

def range_count(self, lo, hi):
    """Count elements in range [lo, hi]."""
    if lo > hi:
        return 0
    # Count keys <= hi, subtract keys < lo; works even when lo or hi is absent.
    return self._count_below(self.root, hi, True) - self._count_below(self.root, lo, False)

def _count_below(self, node, key, inclusive):
    if not node:
        return 0
    left_size = node.left.size if node.left else 0
    if key < node.key:
        return self._count_below(node.left, key, inclusive)
    if key > node.key:
        return left_size + 1 + self._count_below(node.right, key, inclusive)
    return left_size + (1 if inclusive else 0)

These operations are the killer feature. Need the median? Call select(n // 2). Need to know how many elements fall below a threshold? Call rank(threshold). AVL and Red-Black trees require augmentation to support these queries; weight-balanced trees have them built in.
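A small standalone demo (re-declaring the select and rank logic as free functions over a hand-built five-key tree, with keys chosen for illustration) shows the queries in action:

```python
class WBNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(node, k):
    """k-th smallest key (0-indexed)."""
    left_size = node.left.size if node.left else 0
    if k < left_size:
        return select(node.left, k)
    if k == left_size:
        return node.key
    return select(node.right, k - left_size - 1)

def rank(node, key):
    """0-indexed position of key among all keys."""
    left_size = node.left.size if node.left else 0
    if key < node.key:
        return rank(node.left, key)
    if key > node.key:
        return left_size + 1 + rank(node.right, key)
    return left_size

# A balanced tree over the keys 10, 20, 30, 40, 50:
root = WBNode(30,
              WBNode(20, WBNode(10)),
              WBNode(40, None, WBNode(50)))

print(select(root, 2))   # 30 -- the median of five keys
print(rank(root, 40))    # 3  -- three keys are smaller
```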

Performance Analysis and Use Cases

Time complexity matches other balanced BSTs: O(log n) for search, insert, and delete. Space overhead is one integer per node for the size field—comparable to the height field in AVL trees or the color bit in Red-Black trees.

The practical differences emerge in specific scenarios:

Choose weight-balanced trees when:

  • You need frequent order statistic queries (select, rank, range count)
  • You’re implementing in a functional language where immutable updates benefit from size-based splitting
  • You want a single data structure for both dictionary and sorted sequence operations

Stick with Red-Black trees when:

  • You need battle-tested library implementations
  • Memory is extremely constrained (Red-Black needs only 1 bit vs. ~32 bits for size)
  • You’re doing heavy concurrent access (Red-Black has simpler locking patterns)

Consider AVL trees when:

  • Read-heavy workloads dominate (AVL trees are slightly shorter)
  • You don’t need order statistics

Weight-balanced trees see heavy use in functional programming. Haskell’s Data.Set and Data.Map use weight-balanced trees internally. The size field enables efficient split and merge operations essential for persistent data structures.
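To illustrate why size fields suit persistent structures, here is a minimal sketch of a copy-on-write split, in the spirit of what weight-balanced map implementations do, though heavily simplified: it copies only the nodes on the search path and skips the rebalancing a real implementation would perform (the stored sizes are what make that rebalancing cheap):

```python
class WBNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def split(node, key):
    """Persistently split into (keys < key, keys >= key); input is untouched."""
    if node is None:
        return None, None
    if key <= node.key:
        lt, ge = split(node.left, key)
        return lt, WBNode(node.key, ge, node.right)
    lt, ge = split(node.right, key)
    return WBNode(node.key, node.left, lt), ge

def build(keys, lo=0, hi=None):
    """Balanced tree from a sorted list (for the demo)."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return WBNode(keys[mid], build(keys, lo, mid), build(keys, mid + 1, hi))

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []

root = build([1, 2, 3, 4, 5, 6, 7])
lt, ge = split(root, 4)
print(inorder(lt))     # [1, 2, 3]
print(inorder(ge))     # [4, 5, 6, 7]
print(inorder(root))   # [1, 2, 3, 4, 5, 6, 7] -- original tree unchanged
```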

For application developers, weight-balanced trees are worth considering whenever you find yourself maintaining both a balanced BST and a separate mechanism for rank queries. Consolidating into a single structure reduces complexity and often improves cache behavior. The implementation complexity is modest—roughly equivalent to AVL trees—and the payoff in functionality is substantial.
