Design a Distributed Cache: Memcached/Redis Architecture
Key Insights
- Consistent hashing with virtual nodes is the foundation of distributed cache scalability—it minimizes key redistribution when nodes join or leave the cluster
- Redis offers rich data structures and replication at the cost of complexity; Memcached provides raw speed and simplicity when you only need key-value storage
- Cache invalidation remains the hardest problem—use cache-aside with stampede protection as your default pattern, and invest heavily in monitoring hit rates
Introduction: Why Distributed Caching Matters
Every high-scale system eventually hits the same wall: database latency becomes the bottleneck. Your PostgreSQL instance handles 10,000 queries per second beautifully, but at 50,000 QPS, response times spike from 5ms to 500ms. Users notice. Revenue drops.
Distributed caching solves this by keeping frequently accessed data in memory, closer to your application. A well-tuned cache achieves 95%+ hit rates, meaning only 5% of requests touch your database. That’s a 20x reduction in database load.
The cache hit ratio directly determines your system’s performance ceiling. At 90% hit rate, you’re still sending 10% of traffic to the database. At 99%, you’ve reduced database load by another 10x. The math is brutal but simple.
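To make the arithmetic concrete, here's a tiny model of expected per-request latency. The 1 ms cache and 50 ms database figures are illustrative placeholders, not measurements:

```python
def effective_latency(hit_rate: float, cache_ms: float = 1.0, db_ms: float = 50.0) -> float:
    """Expected per-request latency: hits served from cache, misses from the DB."""
    return hit_rate * cache_ms + (1 - hit_rate) * db_ms

# Each step toward 100% hit rate buys a disproportionate latency win:
for rate in (0.90, 0.95, 0.99):
    print(f"{rate:.0%} hit rate -> {effective_latency(rate):.2f} ms average")
```

Going from 90% to 99% cuts the average from roughly 5.9 ms to 1.5 ms with these numbers, which is why a few points of hit rate dominate most other tuning.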
Memcached vs Redis: Choose Memcached when you need pure key-value storage with maximum throughput and minimal operational overhead. Choose Redis when you need data structures (sorted sets, lists, streams), persistence, replication, or Lua scripting. Redis is more complex to operate but offers capabilities that eliminate entire categories of problems.
Core Architecture Components
A distributed cache consists of three layers: client libraries that route requests, cache nodes that store data, and the network protocol connecting them.
Client libraries handle the critical task of key distribution. They maintain a view of the cluster topology and route each key to the correct node. This is where consistent hashing comes in—it maps keys to nodes in a way that minimizes redistribution when the cluster changes.
Cache nodes manage memory allocation. Memcached uses slab allocation: it pre-allocates memory chunks of fixed sizes (64 bytes, 128 bytes, etc.) to avoid fragmentation. Redis uses jemalloc and stores data in type-specific structures optimized for each data type.
The network layer uses simple text or binary protocols. Memcached’s binary protocol reduces parsing overhead. Redis uses RESP (Redis Serialization Protocol), which is human-readable but efficient.
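For a flavor of the wire format, here's a sketch of RESP's request encoding — an array of bulk strings, per the RESP2 spec. Real clients also parse the several reply types; this shows only the outbound side:

```python
def resp_encode(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]  # array header: element count
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))  # $<len>\r\n<payload>\r\n
    return b"".join(parts)

resp_encode("SET", "user:1000", "alice")
# -> b'*3\r\n$3\r\nSET\r\n$9\r\nuser:1000\r\n$5\r\nalice\r\n'
```

Length-prefixed payloads are what make the protocol cheap to parse while staying readable in a packet capture.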
Here’s a consistent hashing implementation that forms the backbone of client-side routing:
```python
import hashlib
from bisect import bisect_left
from typing import List, Optional

class ConsistentHashRing:
    def __init__(self, nodes: List[str], virtual_nodes: int = 150):
        self.virtual_nodes = virtual_nodes
        self.ring: List[int] = []
        self.node_map: dict[int, str] = {}
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            self.ring.append(hash_val)
            self.node_map[hash_val] = node
        self.ring.sort()

    def remove_node(self, node: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:vn{i}"
            hash_val = self._hash(virtual_key)
            self.ring.remove(hash_val)
            del self.node_map[hash_val]

    def get_node(self, key: str) -> Optional[str]:
        if not self.ring:
            return None
        hash_val = self._hash(key)
        idx = bisect_left(self.ring, hash_val)
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.node_map[self.ring[idx]]
```
The virtual_nodes parameter is crucial. With only physical nodes on the ring, key distribution becomes uneven. 150 virtual nodes per physical node provides good balance while keeping memory overhead reasonable.
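To see the payoff empirically, this standalone sketch (same md5-based vnode hashing as above, compacted) measures how many of 10,000 keys change owner when one of four nodes is removed. With consistent hashing only the departed node's share moves — roughly a quarter — where naive modulo hashing would reshuffle nearly everything:

```python
import hashlib
from bisect import bisect_left

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=150):
    """Sorted (hash, owner) pairs for every virtual node, split into two lists."""
    pairs = sorted((h(f"{n}:vn{i}"), n) for n in nodes for i in range(vnodes))
    return [p for p, _ in pairs], [n for _, n in pairs]

def lookup(hashes, owners, key):
    idx = bisect_left(hashes, h(key))
    return owners[idx % len(owners)]  # modulo handles the wrap-around

nodes = ["cache-a", "cache-b", "cache-c", "cache-d"]
keys = [f"user:{i}" for i in range(10_000)]
h1, o1 = build_ring(nodes)
h2, o2 = build_ring(nodes[:-1])  # cache-d leaves the cluster
before = {k: lookup(h1, o1, k) for k in keys}
moved = [k for k in keys if lookup(h2, o2, k) != before[k]]
print(f"{len(moved) / len(keys):.1%} of keys moved")  # close to 25%
```

Every key that moves was previously owned by the departed node; the other three nodes' keys stay exactly where they were.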
Data Distribution Strategies
Hash-based partitioning divides the keyspace into slots. Redis Cluster uses 16,384 hash slots—each key hashes to a slot, and each slot maps to a node. This provides a clean abstraction for rebalancing: move slots between nodes rather than individual keys.
```python
from typing import List

def _make_crc16_table(poly: int = 0x1021) -> List[int]:
    """Lookup table for CRC16-CCITT (XModem), the variant Redis Cluster uses."""
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
        table.append(crc & 0xFFFF)
    return table

CRC16_TABLE = _make_crc16_table()

def get_hash_slot(key: str, total_slots: int = 16384) -> int:
    """Calculate Redis Cluster compatible hash slot."""
    # Handle hash tags: {user:1000}.profile -> hash only "user:1000"
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # Table-driven CRC16 over the key bytes
    crc = 0
    for char in key.encode():
        crc = ((crc << 8) & 0xFFFF) ^ CRC16_TABLE[(crc >> 8) ^ char]
    return crc % total_slots

def assign_slots_to_nodes(nodes: List[str], total_slots: int = 16384) -> dict:
    """Evenly distribute slots across nodes."""
    slots_per_node = total_slots // len(nodes)
    assignment = {}
    for i, node in enumerate(nodes):
        start = i * slots_per_node
        end = start + slots_per_node if i < len(nodes) - 1 else total_slots
        assignment[node] = list(range(start, end))
    return assignment
```
Routing approaches differ significantly. Client-side routing (used by most Redis clients) keeps routing logic in the application. Proxy-based routing (twemproxy, Envoy) centralizes routing, simplifying clients but adding a network hop. Redis Cluster uses a hybrid: clients cache slot mappings but nodes redirect on misses.
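The redirect half of that hybrid is worth sketching: when a client's cached slot map is stale, the contacted node answers with a `MOVED <slot> <host:port>` error, and the client retries against the indicated owner while updating its cache. Below, `send_fn` is a stand-in for real socket I/O and the class names are illustrative:

```python
def parse_moved(error: str):
    """Parse a Redis Cluster 'MOVED <slot> <host:port>' redirection error."""
    _, slot, addr = error.split()
    host, port = addr.rsplit(":", 1)
    return int(slot), (host, int(port))

class ClusterClient:
    """Minimal redirect-following client; send_fn(node, *cmd) returns a reply."""
    def __init__(self, send_fn, default_node):
        self.send_fn = send_fn
        self.slot_cache = {}          # slot -> node, refreshed on redirects
        self.default_node = default_node

    def execute(self, slot, *cmd, max_redirects=3):
        node = self.slot_cache.get(slot, self.default_node)
        for _ in range(max_redirects):
            reply = self.send_fn(node, *cmd)
            if isinstance(reply, str) and reply.startswith("MOVED"):
                _, node = parse_moved(reply)
                self.slot_cache[slot] = node  # remember the new owner
                continue
            return reply
        raise RuntimeError("too many redirects")
```

After one redirect the client goes straight to the right node, so in steady state the extra hop disappears.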
Replication and High Availability
Redis implements asynchronous master-replica replication. Writes go to the master, which streams changes to replicas. This means replicas can serve stale reads—usually acceptable for caching, but understand the tradeoff.
Redis Sentinel provides automatic failover:
```
# sentinel.conf
sentinel monitor mymaster 192.168.1.10 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
# Notification script for alerting
sentinel notification-script mymaster /opt/scripts/notify.sh
```

```python
from redis.sentinel import Sentinel

sentinel = Sentinel([
    ('sentinel1.example.com', 26379),
    ('sentinel2.example.com', 26379),
    ('sentinel3.example.com', 26379)
], socket_timeout=0.5)

# Get master connection (follows failovers automatically)
master = sentinel.master_for('mymaster', socket_timeout=0.5)
master.set('key', 'value')

# Get replica for reads
replica = sentinel.slave_for('mymaster', socket_timeout=0.5)
value = replica.get('key')
```
Memcached takes a different approach: nodes are stateless and independent. High availability comes from client-side redundancy—store each key on multiple nodes and read from any available copy. This is simpler operationally but wastes memory.
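A sketch of that client-side redundancy, under assumptions worth stating: `clients` are any objects exposing `get`/`set` (for example one pymemcache client per node), and the class itself is illustrative, not a library API:

```python
import hashlib

class ReplicatedMemcache:
    """Best-effort redundancy: write each key to `replicas` consecutive
    nodes, read from the first copy that answers."""
    def __init__(self, clients, replicas=2):
        self.clients = clients
        self.replicas = min(replicas, len(clients))

    def _nodes_for(self, key):
        # Deterministic start node, then walk neighbors for the extra copies
        start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.clients)
        return [self.clients[(start + i) % len(self.clients)]
                for i in range(self.replicas)]

    def set(self, key, value):
        for c in self._nodes_for(key):
            try:
                c.set(key, value)
            except Exception:
                pass  # a down node just means one fewer copy

    def get(self, key):
        for c in self._nodes_for(key):
            try:
                v = c.get(key)
                if v is not None:
                    return v
            except Exception:
                continue
        return None
```

The memory cost is explicit here: `replicas=2` halves your effective cache capacity in exchange for surviving a node failure without a miss storm.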
Eviction Policies and Memory Management
When memory fills up, the cache must evict entries. The eviction policy determines which entries die.
LRU (Least Recently Used) evicts entries that haven’t been accessed recently. It’s the default for good reason—temporal locality is real.
LFU (Least Frequently Used) evicts entries with the lowest access count. Better for workloads with stable hot keys but can trap stale popular entries.
TTL-based expiration removes entries after a fixed time. Essential for correctness when data changes, but doesn’t help with memory pressure.
```python
from collections import OrderedDict
from time import time
from typing import Any, Optional

class LRUCache:
    def __init__(self, max_size: int, default_ttl: int = 300):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.cache: OrderedDict[str, tuple[Any, float]] = OrderedDict()
        self.eviction_count = 0

    def get(self, key: str) -> Optional[Any]:
        if key not in self.cache:
            return None
        value, expires_at = self.cache[key]
        if time() > expires_at:
            del self.cache[key]
            return None
        # Move to end (most recently used)
        self.cache.move_to_end(key)
        return value

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        ttl = ttl or self.default_ttl
        expires_at = time() + ttl
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = (value, expires_at)
        # Evict if over capacity
        while len(self.cache) > self.max_size:
            self.cache.popitem(last=False)
            self.eviction_count += 1
```
Sizing your cache: Estimate your working set—the data accessed within a time window (often 24 hours). Your cache should hold at least 80% of the working set. Monitor eviction rates; if they’re high, you need more memory or better TTLs.
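A rough sizing sketch of that rule of thumb. The 60-byte per-entry overhead is a ballpark for metadata (pointers, expiry timestamps, allocator slack); measure your own deployment before trusting it:

```python
def cache_memory_estimate(working_set_items: int, avg_key_bytes: int,
                          avg_value_bytes: int, overhead_per_item: int = 60,
                          coverage: float = 0.8) -> int:
    """Bytes needed to hold `coverage` of the working set in memory."""
    per_item = avg_key_bytes + avg_value_bytes + overhead_per_item
    return int(working_set_items * coverage * per_item)

# e.g. 10M items, 40-byte keys, 500-byte values, 80% coverage:
gib = cache_memory_estimate(10_000_000, 40, 500) / 2**30
print(f"~{gib:.1f} GiB of cache memory")
```

Spread across a cluster, that total also tells you how much headroom each node needs before evictions begin.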
Consistency and Cache Invalidation Patterns
Cache-aside is the safest default pattern. The application checks the cache first, falls back to the database on miss, and populates the cache:
```python
import time
import redis
from typing import Any, Callable, Optional

class CacheAside:
    def __init__(self, redis_client: redis.Redis, lock_timeout: int = 5):
        self.redis = redis_client
        self.lock_timeout = lock_timeout

    def get_with_stampede_protection(
        self,
        key: str,
        fetch_fn: Callable[[], Any],
        ttl: int = 300
    ) -> Any:
        # Try cache first
        cached = self.redis.get(key)
        if cached is not None:
            return cached

        # Acquire lock to prevent stampede
        lock_key = f"lock:{key}"
        lock_acquired = self.redis.set(
            lock_key, "1",
            nx=True,
            ex=self.lock_timeout
        )
        if lock_acquired:
            try:
                # Double-check cache (another thread might have populated it)
                cached = self.redis.get(key)
                if cached is not None:
                    return cached
                # Fetch from source and cache
                value = fetch_fn()
                self.redis.setex(key, ttl, value)
                return value
            finally:
                self.redis.delete(lock_key)
        else:
            # Wait for the lock holder to populate the cache
            for _ in range(50):  # 5 seconds max wait
                time.sleep(0.1)
                cached = self.redis.get(key)
                if cached is not None:
                    return cached
            # Timeout: fetch directly (degraded mode)
            return fetch_fn()
```
The thundering herd problem occurs when a popular key expires and hundreds of requests simultaneously hit the database. The lock-based approach above prevents this by serializing cache population.
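An alternative to locking is probabilistic early recomputation, the "XFetch" approach from the cache-stampede literature: each reader may voluntarily refresh shortly before expiry, with probability rising as the deadline approaches, so regeneration is spread out instead of synchronized. A minimal sketch of the decision function:

```python
import math
import random

def should_refresh_early(ttl_remaining: float, recompute_cost: float,
                         beta: float = 1.0) -> bool:
    """Return True if this reader should regenerate the entry now.
    ttl_remaining and recompute_cost are in seconds; beta > 1 shifts
    refreshes earlier. Rule: refresh when
    ttl_remaining <= -recompute_cost * beta * ln(uniform(0, 1))."""
    r = random.random() or 1e-12  # guard against log(0)
    return ttl_remaining <= -recompute_cost * beta * math.log(r)
```

On a hit, call this with the entry's remaining TTL; if it returns True, recompute and reset the TTL while other readers keep being served the still-valid value, so no one ever observes a cold key.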
For distributed invalidation, Redis Pub/Sub works well:
```python
# Publisher (on data change)
redis_client.publish('cache:invalidate', 'user:1000')

# Subscriber (on each app server)
pubsub = redis_client.pubsub()
pubsub.subscribe('cache:invalidate')
for message in pubsub.listen():
    if message['type'] == 'message':
        key = message['data'].decode()  # payloads arrive as bytes by default
        local_cache.delete(key)
```
Production Considerations
Monitor these metrics religiously:
- Hit rate: Below 90% indicates sizing or TTL problems
- Latency p99: Should stay under 5ms; spikes indicate network or memory issues
- Eviction rate: High rates mean you need more memory
- Connection count: Watch for leaks
```python
from prometheus_client import Counter, Histogram, Gauge
import functools

cache_hits = Counter('cache_hits_total', 'Cache hit count', ['cache_name'])
cache_misses = Counter('cache_misses_total', 'Cache miss count', ['cache_name'])
cache_latency = Histogram('cache_operation_seconds', 'Cache operation latency',
                          ['cache_name', 'operation'])
cache_size = Gauge('cache_size_bytes', 'Cache memory usage', ['cache_name'])

def instrumented_cache_get(cache_name: str):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(key: str, *args, **kwargs):
            with cache_latency.labels(cache_name, 'get').time():
                result = func(key, *args, **kwargs)
            if result is not None:
                cache_hits.labels(cache_name).inc()
            else:
                cache_misses.labels(cache_name).inc()
            return result
        return wrapper
    return decorator
```
Connection pooling is non-negotiable. Creating connections per request adds 1-2ms latency and exhausts file descriptors under load. Use connection pools sized to your concurrency level.
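Client libraries like redis-py ship a ConnectionPool for exactly this; the underlying mechanic is simple enough to show in a few lines (a minimal blocking pool for illustration, not production code):

```python
import queue

class SimpleConnectionPool:
    """Minimal blocking pool: hand out pre-built connections and reuse
    them instead of paying the connect cost on every request."""
    def __init__(self, factory, size=10):
        self._pool = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # eagerly create `size` connections

    def acquire(self, timeout=1.0):
        # Blocks (up to timeout) when every connection is checked out,
        # applying backpressure instead of opening new sockets.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)
```

The LIFO discipline matters: the most recently used connection is handed out first, keeping the hot connection's TCP state warm while idle ones can be aged out.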
Security: Enable AUTH in Redis, use TLS for cross-datacenter traffic, and isolate cache networks from the public internet. A compromised cache is a compromised system.
Distributed caching is deceptively simple in concept but demands attention to detail in practice. Start with cache-aside, measure everything, and add complexity only when the metrics demand it.