Splay Tree: Self-Adjusting BST
Splay trees are binary search trees that reorganize themselves with every operation. Unlike AVL or Red-Black trees that maintain strict balance invariants, splay trees take a different approach: they...
Key Insights
- Splay trees automatically move frequently accessed elements to the root, achieving O(log n) amortized time without storing balance information
- The splaying operation uses three rotation patterns (Zig, Zig-Zig, Zig-Zag) that not only bring accessed nodes to the root but also roughly halve the depth of nodes along the access path
- Splay trees excel in scenarios with temporal locality—when some elements are accessed far more frequently than others—making them ideal for caches and symbol tables
Introduction to Splay Trees
Splay trees are binary search trees that reorganize themselves with every operation. Unlike AVL or Red-Black trees that maintain strict balance invariants, splay trees take a different approach: they move recently accessed nodes to the root through a series of rotations called “splaying.”
This design choice has profound implications. There’s no need to store height or color information at each node. The tree adapts dynamically to access patterns. Elements you use frequently naturally float toward the root, while rarely accessed elements sink deeper.
Daniel Sleator and Robert Tarjan introduced splay trees in 1985, and they remain relevant today. You’ll find them in the Linux kernel, GCC’s internal data structures, and Windows NT’s virtual memory management. The key insight is that strict balance isn’t always necessary—good amortized performance often matters more than worst-case guarantees.
The Splaying Operation
Splaying is the heart of the splay tree. After accessing a node, we perform rotations to bring it to the root. But we don’t just rotate blindly—we use three specific patterns based on the node’s position relative to its parent and grandparent.
Zig: When the target node’s parent is the root, perform a single rotation. This is the base case.
Zig-Zig: When the target and its parent are both left children (or both right children), rotate the parent around the grandparent first, then rotate the target around its parent.
Zig-Zag: When the target is a left child of a right child (or vice versa), rotate the target around its parent, then around its grandparent.
The Zig-Zig case is particularly clever. By rotating the parent first, we not only bring our target up but also reduce the depth of other nodes along the path. This property is crucial for the amortized analysis.
class SplayNode:
def __init__(self, key, value=None):
self.key = key
self.value = value
self.left = None
self.right = None
self.parent = None
class SplayTree:
def __init__(self):
self.root = None
def _rotate_left(self, node):
right_child = node.right
node.right = right_child.left
if right_child.left:
right_child.left.parent = node
right_child.parent = node.parent
if not node.parent:
self.root = right_child
elif node == node.parent.left:
node.parent.left = right_child
else:
node.parent.right = right_child
right_child.left = node
node.parent = right_child
def _rotate_right(self, node):
left_child = node.left
node.left = left_child.right
if left_child.right:
left_child.right.parent = node
left_child.parent = node.parent
if not node.parent:
self.root = left_child
elif node == node.parent.right:
node.parent.right = left_child
else:
node.parent.left = left_child
left_child.right = node
node.parent = left_child
def _splay(self, node):
while node.parent:
parent = node.parent
grandparent = parent.parent
if not grandparent:
# Zig case: parent is root
if node == parent.left:
self._rotate_right(parent)
else:
self._rotate_left(parent)
elif node == parent.left and parent == grandparent.left:
# Zig-Zig case: both are left children
self._rotate_right(grandparent)
self._rotate_right(parent)
elif node == parent.right and parent == grandparent.right:
# Zig-Zig case: both are right children
self._rotate_left(grandparent)
self._rotate_left(parent)
elif node == parent.right and parent == grandparent.left:
# Zig-Zag case: node is right, parent is left
self._rotate_left(parent)
self._rotate_right(grandparent)
else:
# Zig-Zag case: node is left, parent is right
self._rotate_right(parent)
self._rotate_left(grandparent)
Core Operations
Every splay tree operation ends with splaying. This is non-negotiable—it’s what makes the amortized analysis work.
Search: Walk down the tree as in any BST. If found, splay the node to the root. If not found, splay the last node visited.
Insert: Insert as in a standard BST, then splay the new node to the root.
Delete: Splay the target node to the root, remove it, then join the left and right subtrees.
def find(self, key):
node = self.root
last_visited = None
while node:
last_visited = node
if key < node.key:
node = node.left
elif key > node.key:
node = node.right
else:
self._splay(node)
return node.value
# Splay the last visited node even on miss
if last_visited:
self._splay(last_visited)
return None
def insert(self, key, value=None):
if not self.root:
self.root = SplayNode(key, value)
return
node = self.root
while True:
if key < node.key:
if node.left:
node = node.left
else:
node.left = SplayNode(key, value)
node.left.parent = node
self._splay(node.left)
return
elif key > node.key:
if node.right:
node = node.right
else:
node.right = SplayNode(key, value)
node.right.parent = node
self._splay(node.right)
return
else:
# Key exists, update value and splay
node.value = value
self._splay(node)
return
def delete(self, key):
if not self.root:
return False
# Splay the target to root
self.find(key)
if self.root.key != key:
return False # Key not found
left_subtree = self.root.left
right_subtree = self.root.right
if not left_subtree:
self.root = right_subtree
if right_subtree:
right_subtree.parent = None
elif not right_subtree:
self.root = left_subtree
left_subtree.parent = None
else:
# Find maximum in left subtree
left_subtree.parent = None
self.root = left_subtree
max_node = left_subtree
while max_node.right:
max_node = max_node.right
self._splay(max_node)
self.root.right = right_subtree
right_subtree.parent = self.root
return True
Amortized Analysis
Splay trees don’t guarantee O(log n) for any single operation. A pathological sequence can force O(n) time. Yet over any sequence of m operations on an n-node tree, the total time is O(m log n). This is amortized O(log n) per operation.
The analysis uses the potential method. We assign a “potential” to each tree configuration—think of it as stored energy. Operations that create imbalance increase potential; splaying decreases it. The amortized cost equals actual cost plus change in potential.
For splay trees, we define the potential as the sum of log(size of subtree) for each node. When we splay a deep node, the actual cost is high, but we significantly reduce potential by flattening the access path. This “pays back” the expensive operation.
The Zig-Zig rotation is key. It doesn’t just move our target up—it compresses the entire path, roughly halving depths. This compression is what enables the amortized bound.
There’s also the “static optimality” property: splay trees perform within a constant factor of the optimal static BST for any access sequence. They even exhibit the “working set” property—accessing recently used elements is cheap.
Practical Advantages and Trade-offs
Splay trees shine when access patterns are non-uniform. If 20% of your keys account for 80% of accesses, those popular keys will naturally cluster near the root.
Where splay trees excel:
- Caches with temporal locality
- Symbol tables in compilers (recently declared variables are accessed most)
- Network routing tables where some destinations are hot
- Any workload with a “working set” pattern
Where splay trees struggle:
- Real-time systems requiring worst-case guarantees
- Highly concurrent environments (splaying modifies structure on reads)
- Uniform random access patterns (no locality to exploit)
Here’s a simple cache implementation that leverages splay tree properties:
class SplayCache:
"""
A cache that uses splay tree properties for LRU-like behavior.
Recently accessed items naturally stay near the root.
"""
def __init__(self, max_size):
self.tree = SplayTree()
self.max_size = max_size
self.size = 0
def get(self, key):
value = self.tree.find(key)
# Splaying already happened in find()
return value
def put(self, key, value):
# Check if key exists
if self.tree.find(key) is not None:
self.tree.root.value = value
return
# Evict if at capacity
if self.size >= self.max_size:
self._evict_coldest()
self.tree.insert(key, value)
self.size += 1
def _evict_coldest(self):
"""Remove the deepest leaf (least recently accessed)."""
if not self.tree.root:
return
node = self.tree.root
while node.left or node.right:
# Prefer going to the subtree that wasn't recently accessed
if node.right:
node = node.right
else:
node = node.left
self.tree.delete(node.key)
self.size -= 1
Comparison with Other BST Variants
| Property | Splay Tree | AVL Tree | Red-Black Tree | Treap |
|---|---|---|---|---|
| Worst-case search | O(n) | O(log n) | O(log n) | O(n)* |
| Amortized search | O(log n) | O(log n) | O(log n) | O(log n) |
| Space per node | 2 pointers | 2 pointers + height | 2 pointers + color | 2 pointers + priority |
| Adapts to access pattern | Yes | No | No | No |
| Read-only thread safe | No | Yes | Yes | Yes |
| Implementation complexity | Medium | High | High | Low |
*Treaps have expected O(log n) with high probability.
Choose splay trees when:
- Access patterns have strong temporal locality
- You can tolerate occasional slow operations
- Memory per node matters
- You want adaptive performance without tuning
Choose AVL/Red-Black when:
- You need worst-case guarantees
- Multiple threads read concurrently
- Access patterns are uniform
Conclusion
Splay trees embody a powerful idea: let the data structure adapt to usage patterns rather than enforcing rigid invariants. The splaying operation—with its Zig, Zig-Zig, and Zig-Zag cases—brings accessed nodes to the root while compressing access paths.
The amortized O(log n) bound holds without any balance bookkeeping. In workloads with locality, splay trees often outperform their balanced cousins despite lacking worst-case guarantees.
You’ll find splay trees in production systems: Windows NT uses them for virtual memory management, GCC uses them internally, and various database systems employ them for indexing. They’re not always the right choice, but when access patterns are skewed, splay trees deliver performance that adapts to your actual workload rather than theoretical worst cases.