Consistent Hashing: Distributed Load Balancing

Key Insights

Traditional modulo-based hashing causes catastrophic key redistribution when servers change—consistent hashing reduces this to K/N keys on average, making it essential for distributed systems that need to scale.
Virtual nodes solve the uneven distribution problem inherent in basic consistent hashing, but require careful tuning: too few creates hotspots, too many wastes memory and slows lookups.
Consistent hashing isn’t always the right choice—for static clusters or when you control all clients, simpler approaches like jump consistent hash offer better performance with less complexity.

The Problem with Traditional Hashing

When distributing data across multiple servers, the naive approach uses modulo arithmetic: server = hash(key) % num_servers. This works until you need to add or remove a server.

def naive_hash(key: str, num_servers: int) -> int:
    return hash(key) % num_servers

# Simulate key distribution across 3 servers
keys = ["user:1001", "user:1002", "user:1003", "session:abc", "cache:xyz"]
servers_before = 3
servers_after = 4

print("Before adding server:")
for key in keys:
    print(f"  {key} -> server {naive_hash(key, servers_before)}")

print("\nAfter adding server:")
for key in keys:
    print(f"  {key} -> server {naive_hash(key, servers_after)}")

# Count how many keys moved
moved = sum(1 for k in keys if naive_hash(k, servers_before) != naive_hash(k, servers_after))
print(f"\nKeys that moved: {moved}/{len(keys)}")

Run this with a larger dataset and you’ll find that roughly (N-1)/N keys get remapped when changing from N to N+1 servers. Add a fourth server to a three-server cluster? About 75% of your cache becomes invalid instantly. This triggers a “thundering herd” where your backend database gets hammered by cache misses.

The fundamental issue: modulo arithmetic creates a dependency between every key’s placement and the total server count. We need a scheme where adding or removing a server only affects the keys that must move.

How Consistent Hashing Works

Consistent hashing maps both keys and servers onto the same circular hash space (typically 0 to 2^32-1). Each key is assigned to the first server encountered when walking clockwise from the key’s position on the ring.

import hashlib
from bisect import bisect_right
from typing import Optional

class ConsistentHashRing:
    def __init__(self):
        self.ring: dict[int, str] = {}  # position -> node name
        self.sorted_positions: list[int] = []
    
    def _hash(self, key: str) -> int:
        """Generate a position on the ring (0 to 2^32-1)."""
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest[:8], 16)
    
    def add_node(self, node: str) -> None:
        """Add a server node to the ring."""
        position = self._hash(node)
        if position not in self.ring:
            self.ring[position] = node
            self.sorted_positions = sorted(self.ring.keys())
    
    def remove_node(self, node: str) -> None:
        """Remove a server node from the ring."""
        position = self._hash(node)
        if position in self.ring:
            del self.ring[position]
            self.sorted_positions = sorted(self.ring.keys())
    
    def get_node(self, key: str) -> Optional[str]:
        """Find the server responsible for a given key."""
        if not self.ring:
            return None
        
        position = self._hash(key)
        # Find first node position >= key position
        idx = bisect_right(self.sorted_positions, position)
        
        # Wrap around to first node if we're past the last one
        if idx == len(self.sorted_positions):
            idx = 0
        
        return self.ring[self.sorted_positions[idx]]

When a node joins, it takes responsibility for keys between itself and its predecessor—only those keys move. When a node leaves, its keys transfer to its successor. On average, only K/N keys relocate (where K is total keys and N is the number of nodes), compared to K * (N-1)/N with modulo hashing.

Virtual Nodes for Better Distribution

The basic implementation has a critical flaw: nodes aren’t evenly distributed on the ring. With only a few physical servers, you’ll get unbalanced load—one server might handle 40% of traffic while another handles 10%.

Virtual nodes solve this by placing each physical server at multiple positions on the ring:

class ConsistentHashRingWithVnodes:
    def __init__(self, num_vnodes: int = 150):
        self.num_vnodes = num_vnodes
        self.ring: dict[int, str] = {}
        self.sorted_positions: list[int] = []
    
    def _hash(self, key: str) -> int:
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest[:8], 16)
    
    def add_node(self, node: str) -> None:
        """Add a node with multiple virtual positions."""
        for i in range(self.num_vnodes):
            vnode_key = f"{node}:vnode{i}"
            position = self._hash(vnode_key)
            self.ring[position] = node  # Maps back to physical node
        self.sorted_positions = sorted(self.ring.keys())
    
    def remove_node(self, node: str) -> None:
        """Remove all virtual positions for a node."""
        for i in range(self.num_vnodes):
            vnode_key = f"{node}:vnode{i}"
            position = self._hash(vnode_key)
            self.ring.pop(position, None)
        self.sorted_positions = sorted(self.ring.keys())
    
    def get_node(self, key: str) -> Optional[str]:
        if not self.ring:
            return None
        
        position = self._hash(key)
        idx = bisect_right(self.sorted_positions, position)
        if idx == len(self.sorted_positions):
            idx = 0
        
        return self.ring[self.sorted_positions[idx]]

The trade-off is memory and lookup performance. With 100 physical nodes and 150 vnodes each, you’re storing 15,000 ring positions. Binary search remains O(log N), but the constant factors increase. In practice, 100-200 vnodes per physical node provides good distribution without significant overhead.

Real-World Implementations

Amazon DynamoDB pioneered consistent hashing at scale. Their implementation uses virtual nodes with a configurable replication factor (N). Each key is stored on N consecutive nodes on the ring, forming a “preference list.” DynamoDB adds coordinator nodes that handle request routing and conflict resolution.

Apache Cassandra adopted DynamoDB’s approach but made virtual nodes a first-class concept. Each Cassandra node defaults to 256 vnodes (configurable via num_tokens). The system uses “partitioners” to determine ring placement—Murmur3Partitioner is the modern default, offering better distribution than MD5.

Memcached clients implement consistent hashing differently. The libmemcached library uses the Ketama algorithm, which places each server at 100-200 points on the ring based on server weight. Notably, memcached servers themselves know nothing about the ring—all routing logic lives in clients.

The key difference: databases like Cassandra need consistent hashing for data placement and replication, while caching systems like memcached use it purely for request routing.

Handling Node Failures and Replication

For durability, data must exist on multiple nodes. The simplest approach walks clockwise from a key’s primary position and selects the next N-1 distinct physical nodes:

def get_replica_nodes(self, key: str, num_replicas: int = 3) -> list[str]:
    """Find N distinct physical nodes for replicating a key."""
    if not self.ring:
        return []
    
    position = self._hash(key)
    idx = bisect_right(self.sorted_positions, position)
    
    replicas = []
    seen_nodes = set()
    positions_checked = 0
    
    while len(replicas) < num_replicas and positions_checked < len(self.sorted_positions):
        if idx >= len(self.sorted_positions):
            idx = 0
        
        node = self.ring[self.sorted_positions[idx]]
        if node not in seen_nodes:
            replicas.append(node)
            seen_nodes.add(node)
        
        idx += 1
        positions_checked += 1
    
    return replicas

This “walk” approach ensures replicas are on different physical machines even when using virtual nodes. Some systems use “jump” placement instead—hashing the key with different seeds to independently select each replica. Jump placement provides better distribution but complicates failure handling since replicas aren’t naturally “next in line.”

Performance Considerations

Lookup time depends on your ring search strategy:

import time
import random

def benchmark_lookup(ring_size: int, num_lookups: int = 10000):
    """Compare linear vs binary search on the ring."""
    positions = sorted(random.sample(range(2**32), ring_size))
    test_keys = [random.randint(0, 2**32) for _ in range(num_lookups)]
    
    # Linear search
    start = time.perf_counter()
    for key in test_keys:
        for i, pos in enumerate(positions):
            if pos >= key:
                break
    linear_time = time.perf_counter() - start
    
    # Binary search
    start = time.perf_counter()
    for key in test_keys:
        bisect_right(positions, key)
    binary_time = time.perf_counter() - start
    
    print(f"Ring size: {ring_size}")
    print(f"  Linear search: {linear_time:.4f}s")
    print(f"  Binary search: {binary_time:.4f}s")
    print(f"  Speedup: {linear_time/binary_time:.1f}x")

benchmark_lookup(1000)
benchmark_lookup(10000)

With binary search, lookups are O(log N) where N is total ring positions (physical nodes × vnodes). Memory overhead is approximately N × (8 bytes for position + pointer to node name). For 1000 physical nodes with 150 vnodes, expect roughly 2-3 MB for the ring structure.

Consistent hashing is overkill when your server topology rarely changes, you control all clients and can coordinate updates, or you have fewer than 5-10 nodes. The complexity isn’t free.

Alternatives and When to Use Them

Rendezvous hashing (highest random weight) computes a score for each key-server pair and picks the highest. No ring structure needed, and it handles weighted servers elegantly. Downside: O(N) lookup time since you evaluate every server. Works well for small clusters.

Jump consistent hash uses a clever mathematical formula requiring only the key and bucket count—no ring storage at all. It’s faster and uses zero memory, but only works when servers are numbered 0 to N-1 and you can’t weight servers differently. Perfect for homogeneous clusters where you control the client.

Bounded-load consistent hashing (used by Google’s Maglev) adds load awareness to prevent hotspots from popular keys. When a node exceeds a threshold, requests spill to the next node. More complex but essential for systems with highly skewed access patterns.

Decision criteria:

Need to handle heterogeneous servers or complex topologies? Use consistent hashing with vnodes.
Small, static cluster with homogeneous servers? Jump consistent hash is simpler and faster.
Extremely skewed workloads? Consider bounded-load variants.
Need weighted distribution with small N? Rendezvous hashing works well.

Consistent hashing remains the go-to choice for large distributed systems because it handles the messy reality of servers joining, leaving, and failing—without requiring coordinated updates across all clients.