Weight-Balanced Tree: Size-Balanced BST
Key Insights
- Weight-balanced trees use subtree sizes rather than heights for balance, enabling O(log n) order statistic queries as a built-in feature rather than an afterthought.
- The α-balance invariant guarantees logarithmic height while providing flexibility in choosing rebalancing aggressiveness through the α parameter.
- Size-based balancing shines in scenarios requiring frequent rank queries, range counting, or functional programming contexts where immutable structures benefit from predictable rebalancing.
Introduction to Weight-Balanced Trees
Binary search trees need balance to maintain O(log n) operations. Most developers reach for AVL trees (height-balanced) or Red-Black trees (color-based invariants) without considering a third option: weight-balanced trees, also called BB[α] trees or trees of bounded balance.
Weight-balanced trees maintain balance using subtree sizes instead of heights. Each node stores the count of nodes in its subtree, and the tree enforces that no subtree becomes too small relative to its sibling. This approach, introduced by Nievergelt and Reingold in 1973, offers a compelling trade-off: slightly more complex rebalancing logic in exchange for built-in support for order statistics.
The core insight is simple. If you’re already tracking subtree sizes for rank queries, you might as well use that information for balancing. You get two features for the price of one data structure.
The Balance Invariant and Weight Factor
The α-weight-balance condition defines when a tree is balanced. For a node with left subtree of size left_size and right subtree of size right_size, the tree is α-balanced if:
α ≤ (left_size + 1) / (left_size + right_size + 2) ≤ 1 - α
The +1 and +2 account for counting nodes consistently (we treat empty subtrees as having “weight” 1). This ensures both children contain at least an α fraction of the parent’s weight.
Choosing α matters. Valid values range from 0 to 0.5, but practical implementations use 0.25 to 0.29. Lower α values mean looser balance (fewer rotations, potentially taller trees); higher values enforce stricter balance (more rotations, guaranteed shorter trees). Keeping α at or below 1 - 1/√2 ≈ 0.293 guarantees that a single or double rotation always restores balance, which is why values around 0.29 are a common choice.
Here’s the node structure and weight calculation:
```python
class WBNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.size = 1  # Size of subtree rooted at this node

    def update_size(self):
        left_size = self.left.size if self.left else 0
        right_size = self.right.size if self.right else 0
        self.size = left_size + right_size + 1


def weight(node):
    """Weight of a subtree (size + 1 for balance calculations)."""
    return node.size + 1 if node else 1


def is_balanced(node, alpha=0.29):
    """Check if node satisfies the α-balance condition."""
    if not node:
        return True
    left_weight = weight(node.left)
    right_weight = weight(node.right)
    total_weight = left_weight + right_weight
    left_ratio = left_weight / total_weight
    return alpha <= left_ratio <= (1 - alpha)
```
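To make the invariant concrete, here is a small self-contained check. The minimal node class and helpers mirror the definitions above (repeated inline so the snippet runs on its own), applied to a hand-built four-node tree:

```python
# Minimal stand-ins for WBNode, weight, and is_balanced, repeated inline
# so this snippet is self-contained.
class WBNode:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def weight(node):
    return node.size + 1 if node else 1

def is_balanced(node, alpha=0.29):
    if not node:
        return True
    left_ratio = weight(node.left) / (weight(node.left) + weight(node.right))
    return alpha <= left_ratio <= 1 - alpha

# A root with a 3-node left subtree and an empty right subtree:
#       10
#      /
#     5
#    / \
#   2   7
root = WBNode(10)
root.left = WBNode(5)
root.left.left = WBNode(2)
root.left.right = WBNode(7)
root.left.size = 3
root.size = 4

# At the root: left_ratio = 4 / (4 + 1) = 0.8, which exceeds 1 - 0.29 = 0.71
print(is_balanced(root))       # False: the left subtree is too heavy
print(is_balanced(root.left))  # True: 2 / (2 + 2) = 0.5 is comfortably balanced
```

Note how the empty right subtree still contributes weight 1, so the ratio stays well-defined even for lopsided nodes.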
Rotations with Size Maintenance
Rotations in weight-balanced trees follow the same structural transformations as standard BST rotations. The critical difference: you must update size fields after restructuring, and you must do it in the correct order—children before parents.
```python
def rotate_right(node):
    """
    Right rotation around node.

          node            left
         /    \          /    \
       left    C   =>   A    node
      /    \                /    \
     A      B              B      C
    """
    left = node.left
    node.left = left.right
    left.right = node
    # Update sizes: node first (now a child), then left (now parent)
    node.update_size()
    left.update_size()
    return left


def rotate_left(node):
    """
    Left rotation around node.

       node                right
      /    \              /     \
     A    right    =>   node     C
         /     \       /    \
        B       C     A      B
    """
    right = node.right
    node.right = right.left
    right.left = node
    node.update_size()
    right.update_size()
    return right


def rotate_right_left(node):
    """Double rotation: right rotation on right child, then left on node."""
    node.right = rotate_right(node.right)
    return rotate_left(node)


def rotate_left_right(node):
    """Double rotation: left rotation on left child, then right on node."""
    node.left = rotate_left(node.left)
    return rotate_right(node)
```
The order of size updates is non-negotiable. After a right rotation, the original node becomes a child of left. You must update node’s size first because left’s new size depends on it.
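This ordering is easy to verify with a quick self-contained check (minimal node and rotation definitions repeated inline), rotating the exact tree from the rotate_right docstring:

```python
# Self-contained check that the child-before-parent size update order
# leaves every size field correct after a right rotation.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

    def update_size(self):
        self.size = (self.left.size if self.left else 0) + \
                    (self.right.size if self.right else 0) + 1

def rotate_right(node):
    left = node.left
    node.left = left.right
    left.right = node
    node.update_size()  # node is now the child: update it first
    left.update_size()  # left is now the parent: its size depends on node's
    return left

# Before:    4        After:   2
#           / \               / \
#          2   5             1   4
#         / \                   / \
#        1   3                 3   5
root = Node(4)
root.left, root.right = Node(2), Node(5)
root.left.left, root.left.right = Node(1), Node(3)
root.left.update_size()  # size 3
root.update_size()       # size 5

root = rotate_right(root)
print(root.key, root.size)              # 2 5: new root still covers all 5 nodes
print(root.right.key, root.right.size)  # 4 3: old root now holds {3, 4, 5}
```

Swapping the two update calls would leave the new parent with a stale size, silently corrupting every rank query that passes through it.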
Insertion with Rebalancing
Insertion follows the standard BST pattern with two additions: update sizes on the way back up, and rebalance when the invariant breaks.
```python
class WeightBalancedTree:
    def __init__(self, alpha=0.29):
        self.root = None
        self.alpha = alpha

    def insert(self, key, value=None):
        self.root = self._insert(self.root, key, value)

    def _insert(self, node, key, value):
        if not node:
            return WBNode(key, value)
        if key < node.key:
            node.left = self._insert(node.left, key, value)
        elif key > node.key:
            node.right = self._insert(node.right, key, value)
        else:
            node.value = value  # Update existing key
            return node
        node.update_size()
        return self._rebalance(node)

    def _rebalance(self, node):
        """Restore balance if the α-condition is violated."""
        if is_balanced(node, self.alpha):
            return node
        left_weight = weight(node.left)
        right_weight = weight(node.right)
        if left_weight > right_weight:
            # Left subtree too heavy
            left = node.left
            if weight(left.left) >= weight(left.right):
                # Single rotation suffices
                return rotate_right(node)
            else:
                # Need double rotation
                return rotate_left_right(node)
        else:
            # Right subtree too heavy
            right = node.right
            if weight(right.right) >= weight(right.left):
                return rotate_left(node)
            else:
                return rotate_right_left(node)
```
The rebalancing logic checks which subtree is too heavy, then examines the grandchild weights to decide between single and double rotations. With α ≤ 1 - 1/√2 ≈ 0.293, one single or double rotation always restores balance. This is why α = 0.29 is a popular choice.
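To see the rebalancing hold up under adversarial input, here is a compact self-contained sketch of the same insert/rebalance scheme (function names shortened for brevity), fed 1024 keys in sorted order, the classic worst case for an unbalanced BST:

```python
# Compact, self-contained version of the insert/rebalance logic above,
# used to sanity-check tree height under worst-case sorted insertion.
ALPHA = 0.29

class Node:
    __slots__ = ("key", "left", "right", "size")
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def size(n): return n.size if n else 0
def weight(n): return size(n) + 1

def fix(n):
    n.size = size(n.left) + size(n.right) + 1
    return n

def rot_r(n):
    l = n.left
    n.left, l.right = l.right, n
    fix(n)
    return fix(l)

def rot_l(n):
    r = n.right
    n.right, r.left = r.left, n
    fix(n)
    return fix(r)

def rebalance(n):
    ratio = weight(n.left) / (weight(n.left) + weight(n.right))
    if ratio > 1 - ALPHA:                       # left subtree too heavy
        if weight(n.left.left) >= weight(n.left.right):
            return rot_r(n)
        n.left = rot_l(n.left)
        return rot_r(n)
    if ratio < ALPHA:                           # right subtree too heavy
        if weight(n.right.right) >= weight(n.right.left):
            return rot_l(n)
        n.right = rot_r(n.right)
        return rot_l(n)
    return n

def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    return rebalance(fix(n))

def height(n):
    return 0 if n is None else 1 + max(height(n.left), height(n.right))

root = None
for k in range(1024):  # adversarial: strictly increasing keys
    root = insert(root, k)

# An α-balanced tree's height is at most log(n+1)/log(1/(1-α)), about
# 2.03·log2(n) for α = 0.29 -- far from the degenerate height of 1024.
print(root.size, height(root))
```

Without the rebalance call, the same loop would produce a linked list of height 1024; with it, the height stays within a small constant factor of log2(n).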
Deletion and Size Updates
Deletion requires more care. After removing a node, you must update sizes along the path to the root and potentially rebalance multiple ancestors.
```python
    # Deletion methods, continuing the WeightBalancedTree class.
    def delete(self, key):
        self.root = self._delete(self.root, key)

    def _delete(self, node, key):
        if not node:
            return None
        if key < node.key:
            node.left = self._delete(node.left, key)
        elif key > node.key:
            node.right = self._delete(node.right, key)
        else:
            # Found the node to delete
            if not node.left:
                return node.right
            if not node.right:
                return node.left
            # Two children: replace with in-order successor
            successor = self._find_min(node.right)
            node.key = successor.key
            node.value = successor.value
            node.right = self._delete(node.right, successor.key)
        node.update_size()
        return self._rebalance(node)

    def _find_min(self, node):
        while node.left:
            node = node.left
        return node
```
Some implementations use lazy rebalancing for deletion: allow the tree to become slightly unbalanced and rebuild subtrees only when they exceed a threshold. This amortizes rebalancing costs over multiple deletions. For most applications, eager rebalancing (as shown above) provides simpler code and predictable performance.
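The lazy strategy can be sketched as a flatten-and-rebuild pass over an out-of-balance subtree, scapegoat-tree style. This is a self-contained illustration with hypothetical helper names, not part of the class above:

```python
# Sketch of lazy rebalancing: instead of rotating on every delete, rebuild an
# out-of-balance subtree from its in-order node list. Names are illustrative.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def flatten(node, out):
    """Collect nodes in order: O(size of subtree)."""
    if node:
        flatten(node.left, out)
        out.append(node)
        flatten(node.right, out)
    return out

def build_balanced(nodes, lo, hi):
    """Rebuild a perfectly balanced subtree from nodes[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build_balanced(nodes, lo, mid)
    root.right = build_balanced(nodes, mid + 1, hi)
    root.size = hi - lo
    return root

def rebuild(node):
    """Called only when a subtree's imbalance passes some threshold."""
    nodes = flatten(node, [])
    return build_balanced(nodes, 0, len(nodes))

# A degenerate 7-node chain 0 -> 1 -> ... -> 6 collapses to height 3.
root = Node(0)
cur = root
for k in range(1, 7):
    cur.right = Node(k)
    cur = cur.right

root = rebuild(root)
print(root.key, root.size)  # 3 7: the median becomes the new subtree root
```

The rebuild costs O(m) for a subtree of m nodes, but because it only fires after many cheap deletions, the amortized cost per operation stays logarithmic.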
Order Statistics and Rank Queries
Here’s where weight-balanced trees earn their keep. The size field enables O(log n) order statistic queries without additional data structures.
```python
    # Order-statistic methods, continuing the WeightBalancedTree class.
    def select(self, k):
        """Find the k-th smallest element (0-indexed)."""
        return self._select(self.root, k)

    def _select(self, node, k):
        if not node:
            raise IndexError("k out of range")
        left_size = node.left.size if node.left else 0
        if k < left_size:
            return self._select(node.left, k)
        elif k == left_size:
            return node.key
        else:
            return self._select(node.right, k - left_size - 1)

    def rank(self, key):
        """Find the rank (0-indexed position) of a key."""
        return self._rank(self.root, key)

    def _rank(self, node, key):
        if not node:
            raise KeyError(f"Key {key} not found")
        left_size = node.left.size if node.left else 0
        if key < node.key:
            return self._rank(node.left, key)
        elif key > node.key:
            return left_size + 1 + self._rank(node.right, key)
        else:
            return left_size

    def range_count(self, lo, hi):
        """Count elements in range [lo, hi]."""
        if lo > hi:
            return 0
        hi_rank = self._rank_upper(self.root, hi)  # number of keys <= hi
        lo_rank = self._rank_lower(self.root, lo)  # number of keys < lo
        return hi_rank - lo_rank

    def _rank_lower(self, node, key):
        """Number of keys strictly less than key (key need not be present)."""
        if not node:
            return 0
        if key <= node.key:
            return self._rank_lower(node.left, key)
        left_size = node.left.size if node.left else 0
        return left_size + 1 + self._rank_lower(node.right, key)

    def _rank_upper(self, node, key):
        """Number of keys less than or equal to key."""
        if not node:
            return 0
        if key < node.key:
            return self._rank_upper(node.left, key)
        left_size = node.left.size if node.left else 0
        return left_size + 1 + self._rank_upper(node.right, key)
```
These operations are the killer feature. Need the median? Call select(n // 2). Need to know how many elements fall below a threshold? Call rank(threshold). AVL and Red-Black trees require augmentation to support these queries; weight-balanced trees have them built in.
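A usage sketch makes the point. Note that balance affects only the *speed* of these queries, not their correctness, so this self-contained demo uses a plain BST that tracks subtree sizes, checked against Python's sorted() as an oracle:

```python
# Usage sketch for select/rank. Balance is irrelevant to correctness, so a
# plain size-tracking BST suffices here; a real tree would also rebalance.
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size = (node.left.size if node.left else 0) + \
                (node.right.size if node.right else 0) + 1
    return node

def select(node, k):
    """k-th smallest key, 0-indexed."""
    left_size = node.left.size if node.left else 0
    if k < left_size:
        return select(node.left, k)
    if k == left_size:
        return node.key
    return select(node.right, k - left_size - 1)

def rank(node, key):
    """0-indexed position of key among all stored keys."""
    left_size = node.left.size if node.left else 0
    if key < node.key:
        return rank(node.left, key)
    if key > node.key:
        return left_size + 1 + rank(node.right, key)
    return left_size

keys = random.sample(range(1000), 101)  # 101 distinct keys, random order
root = None
for k in keys:
    root = insert(root, k)

median = select(root, root.size // 2)
print(median == sorted(keys)[50])  # True: select agrees with sorting
print(rank(root, median))          # 50: the median sits at rank n // 2
```

The same two functions answer "how many elements are below x" (rank) and "what is the p-th percentile" (select) in a single downward pass each.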
Performance Analysis and Use Cases
Time complexity matches other balanced BSTs: O(log n) for search, insert, and delete. Space overhead is one integer per node for the size field—comparable to the height field in AVL trees or the color bit in Red-Black trees.
The practical differences emerge in specific scenarios:
Choose weight-balanced trees when:
- You need frequent order statistic queries (select, rank, range count)
- You’re implementing in a functional language where immutable updates benefit from size-based splitting
- You want a single data structure for both dictionary and sorted sequence operations
Stick with Red-Black trees when:
- You need battle-tested library implementations
- Memory is extremely constrained (Red-Black needs only 1 bit vs. ~32 bits for size)
- You’re doing heavy concurrent access (Red-Black has simpler locking patterns)
Consider AVL trees when:
- Read-heavy workloads dominate (AVL trees are slightly shorter)
- You don’t need order statistics
Weight-balanced trees see heavy use in functional programming. Haskell’s Data.Set and Data.Map use weight-balanced trees internally. The size field enables efficient split and merge operations essential for persistent data structures.
For application developers, weight-balanced trees are worth considering whenever you find yourself maintaining both a balanced BST and a separate mechanism for rank queries. Consolidating into a single structure reduces complexity and often improves cache behavior. The implementation complexity is modest—roughly equivalent to AVL trees—and the payoff in functionality is substantial.