Compressed Trie: Patricia Tree and Radix Tree
Key Insights
- Compressed tries eliminate single-child nodes by storing multiple characters per edge, reducing memory overhead by 50-90% compared to standard tries while maintaining O(k) lookup time where k is key length.
- Patricia trees are binary radix trees that compare bits rather than characters, making them ideal for fixed-width data like IP addresses, while general radix trees excel at string-based lookups.
- The Linux kernel, Redis, and most web framework routers use radix trees internally—understanding their mechanics helps you make informed decisions about data structure selection.
Introduction to Compressed Tries
Standard tries waste enormous amounts of memory. Consider storing the words “application”, “applicant”, and “apply” in a traditional trie. The shared prefix “appl” alone costs four nodes, each holding a single child pointer, and the suffixes below the branch points form still more single-child chains. This pattern of long chains of single-child nodes is the norm, not the exception.
Compressed tries solve this by collapsing these chains. Instead of one character per edge, compressed tries store entire strings. The three words above would share a single edge labeled “appl” before branching into “y” and “ic”. This simple optimization typically reduces node count by 50-90%, depending on key distribution.
The two main variants—radix trees and Patricia trees—take different approaches to compression. Understanding both helps you choose the right tool for your specific problem.
Radix Tree Fundamentals
A radix tree (also called a compact prefix tree) compresses paths where nodes have only one child. Each edge stores a string rather than a single character, and nodes only exist at branching points or key terminations.
The core insight is simple: if a node has exactly one child, merge it with that child. The edge label becomes the concatenation of both original edges.
Here’s a basic radix tree implementation:
```python
class RadixNode:
    def __init__(self):
        self.children = {}   # edge_label -> child_node
        self.is_end = False
        self.value = None    # optional: store associated data


class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key, value=None):
        node = self.root
        i = 0
        while i < len(key):
            # Find edge with matching prefix
            match_edge = None
            match_len = 0
            for edge in node.children:
                common = self._common_prefix_length(edge, key[i:])
                if common > 0:
                    match_edge = edge
                    match_len = common
                    break
            if match_edge is None:
                # No matching edge; create new one
                node.children[key[i:]] = RadixNode()
                node.children[key[i:]].is_end = True
                node.children[key[i:]].value = value
                return
            if match_len == len(match_edge):
                # Full edge match; continue to child
                node = node.children[match_edge]
                i += match_len
            else:
                # Partial match; split the edge
                self._split_edge(node, match_edge, match_len, key[i:], value)
                return
        node.is_end = True
        node.value = value

    def _split_edge(self, parent, edge, split_pos, remaining_key, value):
        old_child = parent.children[edge]
        # Create intermediate node
        new_node = RadixNode()
        parent.children[edge[:split_pos]] = new_node
        del parent.children[edge]
        # Reattach old child with remaining edge
        new_node.children[edge[split_pos:]] = old_child
        # Add new key's remaining portion
        if len(remaining_key) > split_pos:
            new_child = RadixNode()
            new_child.is_end = True
            new_child.value = value
            new_node.children[remaining_key[split_pos:]] = new_child
        else:
            new_node.is_end = True
            new_node.value = value

    def _common_prefix_length(self, s1, s2):
        i = 0
        while i < len(s1) and i < len(s2) and s1[i] == s2[i]:
            i += 1
        return i
```
The key complexity lies in edge splitting. When inserting “test” into a tree containing “testing”, you must split the “testing” edge into “test” and “ing”, creating an intermediate node.
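To see the split in isolation, here is a minimal sketch using plain dicts instead of the `RadixNode` class (`split_edge` is a hypothetical helper for illustration only):

```python
def split_edge(children, edge, split_pos):
    """Replace `edge` with its prefix, pointing at a new intermediate node."""
    old_child = children.pop(edge)
    intermediate = {"children": {edge[split_pos:]: old_child}, "is_end": False}
    children[edge[:split_pos]] = intermediate
    return intermediate

# A root holding only "testing"; inserting "test" forces a split at position 4.
root_children = {"testing": {"children": {}, "is_end": True}}
mid = split_edge(root_children, "testing", 4)
mid["is_end"] = True            # "test" itself is now a stored key
print(sorted(root_children))    # ['test']
print(sorted(mid["children"]))  # ['ing']
```

The old "testing" child survives unchanged; it is simply re-attached under the shorter "ing" edge.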
Patricia Trees Explained
PATRICIA (Practical Algorithm to Retrieve Information Coded in Alphanumeric) takes compression further by operating at the bit level. Instead of storing edge labels, Patricia trees store a “skip count”—the bit position to examine next.
This design emerged from Donald Morrison’s 1968 work on information retrieval. The key innovation: nodes don’t store the actual key data. Instead, they store only the bit index for comparison, and the full key is stored only at leaf nodes.
```python
class PatriciaNode:
    def __init__(self, bit_index=-1, key=None, value=None):
        self.bit_index = bit_index   # which bit to test
        self.key = key               # full key (only meaningful at leaves)
        self.value = value
        self.left = None             # bit is 0
        self.right = None            # bit is 1


class PatriciaTree:
    def __init__(self):
        self.root = None

    def _get_bit(self, key, index):
        """Extract bit at given index from key."""
        if index < 0:
            return 0
        byte_index = index // 8
        bit_offset = 7 - (index % 8)
        if byte_index >= len(key):
            return 0
        return (key[byte_index] >> bit_offset) & 1

    def _first_differing_bit(self, key1, key2):
        """Find first bit position where keys differ."""
        max_len = max(len(key1), len(key2)) * 8
        for i in range(max_len):
            if self._get_bit(key1, i) != self._get_bit(key2, i):
                return i
        return -1

    def search(self, key):
        if self.root is None:
            return None
        key_bytes = key.encode() if isinstance(key, str) else key
        node = self.root
        prev = None
        # Traverse until we go "up" (bit_index decreases)
        while node is not None and (prev is None or node.bit_index > prev.bit_index):
            prev = node
            if self._get_bit(key_bytes, node.bit_index) == 0:
                node = node.left
            else:
                node = node.right
        # Verify the key matches
        if node is not None and node.key == key_bytes:
            return node.value
        return None
```
The search operation is elegant: traverse based on bit values until you reach a node whose bit index is less than or equal to the parent’s. This indicates you’ve found a “back edge” pointing to the actual key location.
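A quick worked example of the bit arithmetic, as a standalone sketch mirroring `_get_bit` and `_first_differing_bit` above:

```python
def get_bit(key, index):
    # MSB-first addressing: bit 0 is the high bit of the first byte
    byte_index, bit_offset = index // 8, 7 - (index % 8)
    if byte_index >= len(key):
        return 0
    return (key[byte_index] >> bit_offset) & 1

def first_differing_bit(k1, k2):
    for i in range(max(len(k1), len(k2)) * 8):
        if get_bit(k1, i) != get_bit(k2, i):
            return i
    return -1

# "test" and "team" share bytes 't' and 'e' (16 bits); the third bytes
# 's' (0x73) and 'a' (0x61) first differ at bit 3 within the byte.
print(first_differing_bit(b"test", b"team"))  # 19
```

A Patricia insert of "team" into a tree already holding "test" would therefore create a node testing bit 19, with "test" (bit is 1) on one side and "team" (bit is 0) on the other.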
Key Operations: Insert, Search, Delete
Let’s complete the radix tree with all operations:
```python
class RadixTree:
    # ... (previous code) ...

    def search(self, key):
        node = self.root
        i = 0
        while i < len(key):
            found = False
            for edge, child in node.children.items():
                if key[i:].startswith(edge):
                    node = child
                    i += len(edge)
                    found = True
                    break
                elif edge.startswith(key[i:]):
                    # Key is prefix of edge; key not in tree
                    return None
            if not found:
                return None
        return node.value if node.is_end else None

    def delete(self, key):
        """Delete key and merge nodes if possible."""
        # Find the node and its parent chain
        path = []  # [(parent, edge_label, node), ...]
        node = self.root
        i = 0
        while i < len(key):
            for edge, child in node.children.items():
                if key[i:].startswith(edge):
                    path.append((node, edge, child))
                    node = child
                    i += len(edge)
                    break
            else:
                return False  # Key not found
        if not node.is_end:
            return False
        node.is_end = False
        node.value = None
        # Merge nodes bottom-up
        self._try_merge(path)
        return True

    def _try_merge(self, path):
        """Merge single-child nodes after deletion."""
        while path:
            parent, edge, node = path.pop()
            if node.is_end:
                break
            if len(node.children) == 0:
                # Remove empty leaf
                del parent.children[edge]
            elif len(node.children) == 1:
                # Merge with single child
                child_edge, child = next(iter(node.children.items()))
                del parent.children[edge]
                parent.children[edge + child_edge] = child
```
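The merge step can likewise be sketched in isolation with plain dicts (illustrative only, not the RadixTree internals):

```python
def merge_single_child(children, edge):
    """Collapse a non-terminal node with exactly one child into its parent edge."""
    node = children[edge]
    if not node["is_end"] and len(node["children"]) == 1:
        child_edge, child = next(iter(node["children"].items()))
        del children[edge]
        children[edge + child_edge] = child

# After deleting "test" from a tree holding "test" and "testing", the
# intermediate node keeps one child ("ing") and collapses back into a
# single "testing" edge.
root_children = {"test": {"is_end": False, "children": {
    "ing": {"is_end": True, "children": {}}}}}
merge_single_child(root_children, "test")
print(sorted(root_children))  # ['testing']
```

Deletion thus undoes insertion's split exactly: the tree for {"testing"} alone looks the same whether or not "test" was ever inserted.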
Complexity analysis:
- Search: O(k) where k is key length
- Insert: O(k) with potential edge split
- Delete: O(k) with potential node merge
The constant factors are higher than standard tries due to string comparisons, but memory savings usually outweigh this cost.
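The 50-90% figure is easy to sanity-check: build a plain character trie, then count how many nodes would survive compression (the root, branch points, and key ends). A small sketch, assuming dict-based trie nodes:

```python
def build_trie(words):
    root = {"children": {}, "end": False}
    for w in words:
        node = root
        for ch in w:
            node = node["children"].setdefault(ch, {"children": {}, "end": False})
        node["end"] = True
    return root

def count_all(node):
    # Every node in the uncompressed trie, including the root
    return 1 + sum(count_all(c) for c in node["children"].values())

def count_surviving(node, is_root=True):
    # Nodes a radix tree keeps: root, key ends, and branch points;
    # single-child chain nodes are merged away into edge labels.
    keep = is_root or node["end"] or len(node["children"]) != 1
    return int(keep) + sum(count_surviving(c, False)
                           for c in node["children"].values())

trie = build_trie(["application", "applicant", "apply"])
print(count_all(trie), count_surviving(trie))  # 15 6
```

For just these three keys, compression drops 15 nodes to 6, a 60% reduction; longer shared prefixes push the savings higher.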
Radix Tree vs. Patricia Tree: When to Use Each
Choose Patricia trees when:
- Working with binary data or fixed-width keys (IP addresses, hashes)
- Memory is extremely constrained
- Keys share long common prefixes at the bit level
- You need predictable memory usage regardless of key content
Choose radix trees when:
- Working with human-readable strings
- Keys are variable-length text (URLs, words, paths)
- You need to enumerate keys with a common prefix
- Implementation simplicity matters
Patricia trees use less memory (no edge labels are stored, only bit indices) but are harder to implement correctly and to debug. Radix trees are more intuitive and work naturally with string data.
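The prefix-enumeration strength of radix trees is easy to demonstrate on a hand-built dict structure for the three words from the introduction (a sketch; the node shape here is hypothetical, not the RadixTree class above):

```python
def collect(node, prefix, out):
    """Depth-first walk accumulating every stored key under this node."""
    if node["is_end"]:
        out.append(prefix)
    for edge, child in node["children"].items():
        collect(child, prefix + edge, out)

# "application", "applicant", "apply" share the edge "appl", then branch.
root = {"is_end": False, "children": {
    "appl": {"is_end": False, "children": {
        "y": {"is_end": True, "children": {}},
        "ic": {"is_end": False, "children": {
            "ation": {"is_end": True, "children": {}},
            "ant":   {"is_end": True, "children": {}},
        }},
    }},
}}

out = []
collect(root, "", out)
print(sorted(out))  # ['applicant', 'application', 'apply']
```

A real prefix query would first walk down to the node matching the query prefix, then run this same traversal from there; a hash map offers no comparable operation.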
Real-World Applications
Here’s a practical URL router using a radix tree:
```python
class URLRouter:
    def __init__(self):
        self.tree = RadixTree()

    def add_route(self, path, handler):
        # Normalize path
        path = path.rstrip('/')
        if not path:
            path = '/'
        self.tree.insert(path, handler)

    def match(self, path):
        path = path.rstrip('/')
        if not path:
            path = '/'
        return self.tree.search(path)


# Usage
router = URLRouter()
router.add_route('/api/users', 'users_handler')
router.add_route('/api/users/profile', 'profile_handler')
router.add_route('/api/posts', 'posts_handler')

handler = router.match('/api/users/profile')  # Returns 'profile_handler'
```
The Linux kernel uses radix trees for its page cache, mapping file offsets to memory pages. Redis uses them for cluster slot mapping. Go’s httprouter and Rust’s actix-web both use radix trees for URL routing, achieving sub-microsecond route matching.
Performance Considerations and Trade-offs
Compressed tries aren’t always the answer. Consider these trade-offs:
Memory: Radix trees beat standard tries and often beat hash maps for string keys with shared prefixes. However, if keys share no prefixes, you’re paying overhead for no benefit.
Speed: Hash maps offer O(1) average lookup (after an O(k) hash of the key) versus O(k) node traversals with string comparisons for tries. For short keys, or when you never need prefix operations, hash maps usually win.
Cache locality: Radix trees have poor cache behavior compared to arrays or hash tables. Each node access potentially causes a cache miss. For hot paths, consider flattening frequently-accessed subtrees.
When to avoid compressed tries:
- Random, unrelated keys with no shared prefixes
- Extremely short keys (hash map overhead is lower)
- When you only need exact match, never prefix operations
- Memory-mapped or disk-based storage (pointer chasing is expensive)
The right choice depends on your access patterns. Profile with realistic data before committing to any data structure.