Consistent Hashing: Distributed Systems Application
Key Insights
- Consistent hashing minimizes key redistribution when nodes change—only K/N keys move on average, compared to nearly 100% with modulo hashing
- Virtual nodes are essential for production deployments; without them, you’ll see severe load imbalances that defeat the purpose of distributed systems
- The algorithm is deceptively simple but implementation details matter: hash function choice, vnode count, and replication strategy determine real-world performance
The Problem with Traditional Hashing
When distributing data across multiple servers, the naive approach uses modulo arithmetic: server = hash(key) % server_count. This works beautifully until you add or remove a server.
Consider a cache cluster with 4 servers. You hash keys and distribute them evenly. Then traffic spikes, you add a fifth server, and suddenly 80% of your cache lookups miss. Your database gets hammered. Users complain. You question your career choices.
```python
import hashlib

def modulo_hash(key: str, server_count: int) -> int:
    # Stable hash: Python's built-in hash() is randomized per process
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % server_count

# Simulate key distribution before and after adding a server
keys = [f"user:{i}" for i in range(1000)]
before = {k: modulo_hash(k, 4) for k in keys}
after = {k: modulo_hash(k, 5) for k in keys}

# Count how many keys moved
moved = sum(1 for k in keys if before[k] != after[k])
print(f"Keys moved: {moved}/1000 ({moved / 10}%)")
# Typically ~800/1000 (~80%) of keys move
```
The math is brutal. When you change from N to N+1 servers, approximately N/(N+1) keys get reassigned. Going from 4 to 5 servers means 80% redistribution. From 99 to 100 servers? Still 99% of keys move. This is the rehashing problem, and it makes horizontal scaling painful.
How Consistent Hashing Works
Consistent hashing solves this by arranging the hash space as a ring. Both nodes and keys are hashed onto this ring, and each key belongs to the first node encountered when walking clockwise from the key’s position.
The critical insight: when a node joins or leaves, only the keys in its immediate neighborhood are affected. Everything else stays put.
```python
import hashlib
from bisect import bisect_right
from typing import Optional

class ConsistentHashRing:
    def __init__(self):
        self.ring: list[int] = []        # Sorted positions on the ring
        self.nodes: dict[int, str] = {}  # Position -> node mapping

    def _hash(self, key: str) -> int:
        """Hash a key to a position on the ring (0 to 2^32 - 1)."""
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def add_node(self, node: str) -> None:
        """Add a node to the ring."""
        position = self._hash(node)
        if position not in self.nodes:
            self.ring.append(position)
            self.ring.sort()
            self.nodes[position] = node

    def remove_node(self, node: str) -> None:
        """Remove a node from the ring."""
        position = self._hash(node)
        if position in self.nodes:
            self.ring.remove(position)
            del self.nodes[position]

    def get_node(self, key: str) -> Optional[str]:
        """Find the node responsible for a given key."""
        if not self.ring:
            return None
        position = self._hash(key)
        # Find the first node position strictly greater than the key's
        idx = bisect_right(self.ring, position)
        # Wrap around to the first node if we've gone past the end
        if idx == len(self.ring):
            idx = 0
        return self.nodes[self.ring[idx]]
```
When a node joins, it takes responsibility for keys between itself and its predecessor. When it leaves, its successor absorbs those keys. On average, only K/N keys move (where K is total keys and N is node count)—a massive improvement over modulo hashing.
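The K/N claim is easy to check empirically. Here is a minimal sketch (node names and key counts are arbitrary) that compares ownership before and after a fifth node joins a bare, vnode-free ring:

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    # Same MD5-based ring hash used throughout this article
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

def build(names: list[str]) -> tuple[list[int], dict[int, str]]:
    nodes = {h(n): n for n in names}
    return sorted(nodes), nodes

def owner(ring: list[int], nodes: dict[int, str], key: str) -> str:
    idx = bisect_right(ring, h(key)) % len(ring)
    return nodes[ring[idx]]

keys = [f"user:{i}" for i in range(10_000)]
ring4, nodes4 = build([f"node-{i}" for i in range(4)])
ring5, nodes5 = build([f"node-{i}" for i in range(5)])  # node-4 joins

before = {k: owner(ring4, nodes4, k) for k in keys}
after = {k: owner(ring5, nodes5, k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Every key that moved went to the new node; nothing else was touched
print(f"Keys moved: {len(moved)}/10000")
print(f"All moved keys landed on node-4: {all(after[k] == 'node-4' for k in moved)}")
```

Only the keys in the arc between the new node and its predecessor change owners, which is exactly the neighborhood property described above.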
Virtual Nodes: Solving Uneven Distribution
The basic ring has a problem: with few nodes, random hash placement creates severe imbalances. One node might own 40% of the ring while another owns 10%. This defeats the purpose of distribution.
Virtual nodes fix this by giving each physical node multiple positions on the ring. Instead of hashing “server-1” once, you hash “server-1-0”, “server-1-1”, through “server-1-99”. More positions mean better statistical distribution.
```python
class VNodeHashRing:
    def __init__(self, vnode_count: int = 150):
        self.vnode_count = vnode_count
        self.ring: list[int] = []
        self.nodes: dict[int, str] = {}  # Position -> physical node
        self.physical_nodes: set[str] = set()

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

    def add_node(self, node: str) -> None:
        """Add a physical node with multiple virtual nodes."""
        if node in self.physical_nodes:
            return
        self.physical_nodes.add(node)
        for i in range(self.vnode_count):
            vnode_key = f"{node}:vnode{i}"
            position = self._hash(vnode_key)
            self.ring.append(position)
            self.nodes[position] = node
        self.ring.sort()

    def remove_node(self, node: str) -> None:
        """Remove a physical node and all its virtual nodes."""
        if node not in self.physical_nodes:
            return
        self.physical_nodes.remove(node)
        positions_to_remove = [
            pos for pos, n in self.nodes.items() if n == node
        ]
        for pos in positions_to_remove:
            self.ring.remove(pos)
            del self.nodes[pos]

    def get_node(self, key: str) -> Optional[str]:
        if not self.ring:
            return None
        position = self._hash(key)
        idx = bisect_right(self.ring, position) % len(self.ring)
        return self.nodes[self.ring[idx]]
```
The trade-off is memory and lookup time. More vnodes means better balance but larger ring structures. In practice, 100-200 vnodes per physical node works well; Cassandra long defaulted num_tokens to 256, and Cassandra 4.0 lowered the default to 16 after improving its token allocation. Beyond a few hundred, you hit diminishing returns.
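You can see the effect of vnode count on balance with a quick experiment. This sketch measures the max/min keys-per-node ratio for a few vnode counts (node names, key counts, and the MD5 hash are arbitrary choices, so exact numbers will vary):

```python
import hashlib
from bisect import bisect_right
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

def load_spread(node_count: int, vnodes: int, key_count: int = 10_000) -> float:
    """Return the max/min keys-per-node ratio for a given vnode count."""
    nodes: dict[int, str] = {}
    for n in range(node_count):
        for v in range(vnodes):
            nodes[h(f"server-{n}:vnode{v}")] = f"server-{n}"
    ring = sorted(nodes)
    counts: Counter[str] = Counter()
    for i in range(key_count):
        idx = bisect_right(ring, h(f"key:{i}")) % len(ring)
        counts[nodes[ring[idx]]] += 1
    return max(counts.values()) / min(counts.values())

for vn in (1, 10, 150):
    print(f"{vn:>3} vnodes -> max/min load ratio {load_spread(5, vn):.2f}")
```

With a single position per node the ratio is typically severe; by 150 vnodes the loads are close to even, which is why production systems pay the extra ring size.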
Real-World Applications
Consistent hashing isn’t academic—it’s infrastructure.
Amazon DynamoDB uses consistent hashing for partition placement. Each partition maps to a position on the ring, and adding storage nodes redistributes only adjacent partitions. This enables seamless scaling without downtime.
Apache Cassandra builds its entire data distribution model on consistent hashing with vnodes. The partitioner determines ring position, and the num_tokens setting controls vnodes per node. Cassandra’s approach handles heterogeneous hardware by assigning more vnodes to beefier machines.
CDN routing at companies like Akamai uses consistent hashing to map content to edge servers. When a server fails, only its content needs re-routing. Users experience minimal cache misses.
Memcached and Redis clusters rely on client-side consistent hashing. The client hashes each key, determines the responsible server, and sends the request directly. Libraries like ketama (the original consistent hashing implementation for memcached) made this approach popular.
Implementation Considerations
Production implementations need more than the basic algorithm.
Hash function selection matters. MD5 works but is slow. MurmurHash3 and xxHash offer better performance with good distribution. For cryptographic requirements, use SHA-256, but you rarely need that for load balancing.
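If you want to stay in the standard library while avoiding MD5's cost, hashlib's blake2b is a reasonable stand-in (MurmurHash3 and xxHash require third-party packages such as mmh3 or xxhash). A sketch of a drop-in ring hash, with illustrative function names:

```python
import hashlib

def md5_ring_hash(key: str) -> int:
    """The MD5 ring hash used throughout this article."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

def blake2_ring_hash(key: str) -> int:
    # A 4-byte blake2b digest maps directly into [0, 2^32): same position
    # space, often faster than MD5, and still well distributed
    return int.from_bytes(
        hashlib.blake2b(key.encode(), digest_size=4).digest(), "big"
    )

# Both are deterministic and land in the same position space, so either
# can back a ring's _hash method
for fn in (md5_ring_hash, blake2_ring_hash):
    pos = fn("user:42")
    assert 0 <= pos < 2**32 and pos == fn("user:42")
```

Whichever function you pick, keep it fixed for the cluster's lifetime: changing the hash function reassigns every key, exactly the outcome consistent hashing exists to avoid.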
Weighted nodes handle heterogeneous clusters. Give a 64GB server twice the vnodes of a 32GB server:
```python
class WeightedHashRing:
    def __init__(self, base_vnodes: int = 100):
        self.base_vnodes = base_vnodes
        self.ring: list[int] = []
        self.nodes: dict[int, str] = {}
        self.node_weights: dict[str, int] = {}

    def _hash(self, key: str) -> int:
        # SHA-1 keeps this example stdlib-only; production code would
        # typically use a faster non-cryptographic hash like MurmurHash3
        return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2**32)

    def add_node(self, node: str, weight: int = 1) -> None:
        """Add a node with specified weight (multiplier for vnodes)."""
        self.node_weights[node] = weight
        vnode_count = self.base_vnodes * weight
        for i in range(vnode_count):
            position = self._hash(f"{node}:vn{i}")
            self.ring.append(position)
            self.nodes[position] = node
        self.ring.sort()

    def get_replicas(self, key: str, count: int = 3) -> list[str]:
        """Get multiple distinct physical nodes for replication."""
        if not self.ring:
            return []
        position = self._hash(key)
        idx = bisect_right(self.ring, position) % len(self.ring)
        replicas = []
        seen = set()
        # Walk clockwise until we have `count` distinct physical nodes
        while len(replicas) < count and len(seen) < len(self.node_weights):
            node = self.nodes[self.ring[idx]]
            if node not in seen:
                replicas.append(node)
                seen.add(node)
            idx = (idx + 1) % len(self.ring)
        return replicas
```
Replication uses successor nodes on the ring. For replication factor 3, store data on the primary node and the next two distinct physical nodes clockwise. This provides fault tolerance while maintaining locality.
Failure Handling and Rebalancing
Node failures are inevitable. Consistent hashing makes recovery straightforward: the failed node’s successor absorbs its keyspace. No coordination required.
```python
def simulate_failure(ring: WeightedHashRing, keys: list[str]) -> None:
    """Demonstrate minimal key movement during node failure."""
    # Record initial assignments
    initial = {k: ring.get_replicas(k, 1)[0] for k in keys}

    # Simulate node failure by rebuilding the ring without one node,
    # keeping the same vnode count so positions stay comparable
    failed_node = "server-2"
    ring_copy = WeightedHashRing(base_vnodes=ring.base_vnodes)
    for node, weight in ring.node_weights.items():
        if node != failed_node:
            ring_copy.add_node(node, weight)

    # Record new assignments
    after_failure = {k: ring_copy.get_replicas(k, 1)[0] for k in keys}

    # Count affected keys
    moved = sum(1 for k in keys if initial[k] != after_failure[k])
    affected_by_failure = sum(1 for k in keys if initial[k] == failed_node)
    print(f"Total keys: {len(keys)}")
    print(f"Keys on failed node: {affected_by_failure}")
    print(f"Keys that moved: {moved}")
    print(f"Only failed node's keys moved: {moved == affected_by_failure}")

# Usage
ring = WeightedHashRing(base_vnodes=100)
for i in range(5):
    ring.add_node(f"server-{i}", weight=1)

keys = [f"key:{i}" for i in range(10000)]
simulate_failure(ring, keys)
```
Graceful node addition follows the same principle. The new node announces itself, takes ownership of its keyspace, and the previous owner transfers relevant data. During transfer, both nodes can serve reads—eventual consistency handles the overlap.
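The keyspace the joining node takes over is just the arc between its ring position and its predecessor's. A sketch that computes that arc, ignoring vnodes for clarity (the positions are illustrative):

```python
from bisect import bisect_left

def transfer_range(ring: list[int], new_pos: int) -> tuple[int, int]:
    """Half-open arc (pred, new_pos] whose keys move to a joining node.

    `ring` is the sorted list of existing positions; the arc may wrap zero.
    """
    idx = bisect_left(ring, new_pos)
    pred = ring[idx - 1]  # idx - 1 == -1 wraps to the last position
    return (pred, new_pos)

def in_arc(pos: int, arc: tuple[int, int]) -> bool:
    lo, hi = arc
    if lo < hi:
        return lo < pos <= hi
    return pos > lo or pos <= hi  # arc wraps past zero

# Nodes at 100, 400, 900; a node joining at 650 takes over (400, 650]
ring = [100, 400, 900]
assert transfer_range(ring, 650) == (400, 650)
assert in_arc(500, (400, 650)) and not in_arc(700, (400, 650))
# Joining at 50 wraps: it takes (900, 50]
assert transfer_range(ring, 50) == (900, 50)
assert in_arc(950, (900, 50)) and in_arc(25, (900, 50))
```

With vnodes, the same computation runs once per virtual position, so a joining node receives many small arcs spread around the ring rather than one contiguous slice.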
When Not to Use Consistent Hashing
Consistent hashing isn’t always the answer.
Small, static clusters don’t need it. If you have 3 database replicas that never change, modulo hashing is simpler and just as effective.
Ordered data access conflicts with hash-based distribution. If you need range queries, consistent hashing scatters sequential keys across nodes. Use range-based partitioning instead.
Rendezvous hashing (highest random weight) offers an alternative with simpler implementation and no ring structure. Each key computes a score for every node and picks the highest. It’s O(N) per lookup but handles node changes gracefully.
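Rendezvous hashing really does fit in a few lines; a minimal sketch (the hash and node names are arbitrary):

```python
import hashlib

def rendezvous(key: str, nodes: list[str]) -> str:
    """Highest-random-weight: score every node for this key, take the max."""
    def score(node: str) -> int:
        return int(hashlib.md5(f"{node}:{key}".encode()).hexdigest(), 16)
    return max(nodes, key=score)

nodes = ["cache-a", "cache-b", "cache-c"]
winner = rendezvous("user:42", nodes)

# Removing any losing node never changes this key's owner: only keys
# whose winning node disappears get reassigned
for n in nodes:
    if n != winner:
        assert rendezvous("user:42", [m for m in nodes if m != n]) == winner
```

The loop at the end demonstrates the graceful-change property: a key moves only when its own winner leaves, which is the same minimal-disruption guarantee the ring gives, without any ring state.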
Jump consistent hash provides perfect balance with minimal memory—just two integers of state. The trade-off: it only supports sequential node numbering (0, 1, 2…), making it unsuitable for named nodes or non-uniform weights.
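The published algorithm (Lamping and Veach, 2014) is itself only a handful of lines; here is a Python port of their C++ sketch:

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Growing from 10 to 11 buckets only ever moves keys into the new bucket
for k in range(1000):
    old, new = jump_hash(k, 10), jump_hash(k, 11)
    assert new == old or new == 10
```

The check at the end is the algorithm's defining property: when the bucket count grows, a key either stays put or moves to the newest bucket, never between existing ones.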
Choose consistent hashing when you have dynamic clusters, need predictable key movement during scaling, and can tolerate the complexity. For everything else, simpler solutions exist.