System Design: Distributed Locking Mechanisms
Key Insights
- Distributed locks are fundamentally unreliable—design your system to tolerate lock failures through fencing tokens and idempotent operations, not just through choosing the “right” locking mechanism.
- Redis-based locks (including Redlock) trade consistency for performance; use them for efficiency optimizations, not safety-critical coordination where correctness is paramount.
- Before implementing distributed locking, ask whether you actually need it—often idempotent operations, optimistic concurrency control, or single-writer architectures eliminate the need entirely.
Why Distributed Locking Matters
The moment you scale beyond a single server, you inherit a fundamental problem: how do you ensure only one process modifies a shared resource at a time? In-process mutexes won’t help when your code runs across multiple machines.
Consider a payment processing system where two instances simultaneously process the same transaction. Without coordination, you might charge a customer twice or credit a merchant double. Race conditions in distributed systems don’t just cause bugs—they cause financial losses, data corruption, and angry customers.
Distributed locks provide mutual exclusion across process and machine boundaries. But here’s the uncomfortable truth: they’re harder to get right than most engineers assume. Network partitions, clock skew, and process pauses all conspire to break your assumptions about who holds a lock and for how long.
You need distributed locks when you’re protecting non-idempotent operations on shared state—things like incrementing counters, processing exactly-once jobs, or coordinating leader election. You don’t need them when you can redesign around idempotency or partition your data to avoid contention entirely.
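To make the race concrete, here's a minimal, deterministic sketch (plain Python, no real network) of the classic lost-update anomaly: two workers read the same balance before either writes, and the later write silently erases the earlier deduction.

```python
# Simulated shared store; two "processes" interleave read-modify-write.
store = {"balance": 100}

# Worker A and worker B both read before either writes.
read_a = store["balance"]   # A reads 100
read_b = store["balance"]   # B reads 100

store["balance"] = read_a - 30  # A deducts 30 -> writes 70
store["balance"] = read_b - 30  # B deducts 30 -> also writes 70

# Two deductions of 30 should leave 40, but one update was lost.
print(store["balance"])  # 70, not 40
```

With mutual exclusion (or an idempotency scheme), the second worker would either wait for A's write or detect the conflict instead of clobbering it.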
Core Concepts and Lock Properties
A distributed lock must satisfy three properties to be useful:
Mutual exclusion: At most one client holds the lock at any time. This is the whole point.
Deadlock freedom: Even if a lock holder crashes, other clients must eventually acquire the lock. This typically means locks expire automatically.
Fault tolerance: The lock service remains available despite partial failures.
Lock granularity matters enormously. Locking an entire database table creates contention nightmares. Locking individual rows improves concurrency but increases coordination overhead. Choose the coarsest granularity that still allows acceptable parallelism.
TTL-based expiration is essential but dangerous. Set it too short, and your lock expires while you’re still doing work. Set it too long, and crashed processes block others for extended periods. There’s no universal right answer—it depends on your operation’s expected duration and variance.
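One common heuristic, shown here as an illustrative sketch rather than a universal rule: size the TTL from the operation's expected duration plus a safety margin proportional to its observed variance, and renew the lock partway through long operations instead of picking a huge TTL up front.

```python
def suggest_ttl(expected_seconds: float, stddev_seconds: float,
                safety_factor: float = 3.0) -> float:
    """Heuristic TTL: expected duration plus a variance-based safety margin.
    (A rule of thumb for illustration, not a universal answer.)"""
    return expected_seconds + safety_factor * stddev_seconds

# A job that usually takes 2s with 0.5s of jitter gets a 3.5s TTL.
print(suggest_ttl(2.0, 0.5))  # 3.5
```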
```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

@dataclass
class LockResult:
    acquired: bool
    token: Optional[str]  # Fencing token for validation
    expires_at: Optional[float]

class DistributedLock(ABC):
    @abstractmethod
    def acquire(self, resource: str, ttl_seconds: int) -> LockResult:
        """Attempt to acquire lock. Returns immediately."""
        pass

    @abstractmethod
    def release(self, resource: str, token: str) -> bool:
        """Release lock only if we still hold it (token matches)."""
        pass

    @abstractmethod
    def extend(self, resource: str, token: str, ttl_seconds: int) -> bool:
        """Extend lock TTL if we still hold it."""
        pass
```
The fencing token in this interface isn’t optional—it’s critical for safety. We’ll explore why in the failure handling section.
Redis-Based Locking
Redis is the most common choice for distributed locking because it’s fast, widely deployed, and conceptually simple. The basic approach uses SET with NX (only set if not exists) and PX (expiration in milliseconds):
```lua
-- Atomic lock acquisition with Lua script
-- KEYS[1] = lock key, ARGV[1] = token, ARGV[2] = TTL in milliseconds
if redis.call('SET', KEYS[1], ARGV[1], 'NX', 'PX', ARGV[2]) then
  return 1
else
  return 0
end
```
```python
import redis
import uuid
import time

class RedisLock:
    def __init__(self, client: redis.Redis):
        self.client = client
        self._acquire_script = self.client.register_script("""
            if redis.call('SET', KEYS[1], ARGV[1], 'NX', 'PX', ARGV[2]) then
                return 1
            else
                return 0
            end
        """)
        self._release_script = self.client.register_script("""
            if redis.call('GET', KEYS[1]) == ARGV[1] then
                return redis.call('DEL', KEYS[1])
            else
                return 0
            end
        """)

    def acquire(self, resource: str, ttl_ms: int) -> LockResult:
        token = str(uuid.uuid4())
        acquired = self._acquire_script(keys=[resource], args=[token, ttl_ms])
        if acquired:
            return LockResult(True, token, time.time() + ttl_ms / 1000)
        return LockResult(False, None, None)

    def release(self, resource: str, token: str) -> bool:
        return bool(self._release_script(keys=[resource], args=[token]))
```
The Lua scripts ensure atomicity—without them, a check-then-delete release operation could delete another client’s lock.
For higher availability, the Redlock algorithm acquires locks across multiple independent Redis instances, requiring a majority (N/2 + 1) to succeed. However, Martin Kleppmann’s analysis revealed fundamental issues: clock drift, process pauses, and network delays can all cause Redlock to violate mutual exclusion.
My recommendation: Use single-instance Redis locks for performance optimizations where occasional duplicate processing is acceptable. Don’t use Redlock for safety-critical operations—its complexity doesn’t buy you the guarantees you need.
ZooKeeper and Consensus-Based Locks
When you need stronger guarantees, consensus-based systems like ZooKeeper, etcd, or Consul provide them through replicated state machines and proper distributed consensus protocols.
ZooKeeper’s locking recipe uses sequential ephemeral nodes:
```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.data.Stat;

public class ZooKeeperLock {
    private final CuratorFramework client;
    private final String lockPath;
    private String ourPath;

    public ZooKeeperLock(CuratorFramework client, String lockPath) {
        this.client = client;
        this.lockPath = lockPath;
    }

    public boolean acquire(long timeout, TimeUnit unit) throws Exception {
        // Create a sequential ephemeral node; it vanishes if our session dies
        ourPath = client.create()
            .creatingParentsIfNeeded()
            .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
            .forPath(lockPath + "/lock-");

        long deadline = System.currentTimeMillis() + unit.toMillis(timeout);
        while (System.currentTimeMillis() < deadline) {
            List<String> children = client.getChildren().forPath(lockPath);
            Collections.sort(children);

            String ourNode = ourPath.substring(ourPath.lastIndexOf('/') + 1);
            int ourIndex = children.indexOf(ourNode);
            if (ourIndex == 0) {
                return true; // We have the lock
            }

            // Watch only the node immediately before us (avoids a thundering herd)
            String watchPath = lockPath + "/" + children.get(ourIndex - 1);
            CountDownLatch latch = new CountDownLatch(1);
            Stat stat = client.checkExists()
                .usingWatcher((Watcher) event -> latch.countDown())
                .forPath(watchPath);
            if (stat != null) {
                latch.await(deadline - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
            }
        }

        // Timeout - clean up our node so we don't hold a queue slot
        client.delete().forPath(ourPath);
        return false;
    }

    public void release() throws Exception {
        if (ourPath != null) {
            client.delete().forPath(ourPath);
            ourPath = null;
        }
    }
}
```
The ephemeral node automatically disappears if our session dies, preventing deadlocks. The sequential ordering ensures fair queuing—clients acquire the lock in the order they requested it.
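The recipe's ordering logic is easy to check in isolation. Here is a ZooKeeper-free sketch of the decision each client makes after listing the children: sort the sequential node names (ZooKeeper zero-pads sequence numbers, so lexical order matches creation order), take the lock if you're first, otherwise watch only the node immediately ahead of you.

```python
def lock_decision(children: list[str], our_node: str):
    """Return ('acquired', None) if our_node is first in sequence,
    else ('wait', node_to_watch)."""
    ordered = sorted(children)  # zero-padded sequence suffixes sort lexically
    our_index = ordered.index(our_node)
    if our_index == 0:
        return ("acquired", None)
    return ("wait", ordered[our_index - 1])

# Three contenders: lock-0001 holds the lock; each waiter watches its predecessor.
children = ["lock-0003", "lock-0001", "lock-0002"]
print(lock_decision(children, "lock-0001"))  # ('acquired', None)
print(lock_decision(children, "lock-0002"))  # ('wait', 'lock-0001')
print(lock_decision(children, "lock-0003"))  # ('wait', 'lock-0002')
```

Because each waiter watches exactly one predecessor, a release wakes exactly one client rather than the whole queue.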
etcd provides similar guarantees with a simpler API through its lease mechanism. Consul offers distributed locking through its session and KV primitives. All three use Raft consensus, giving you linearizable operations that Redis simply cannot provide.
The tradeoff is latency: consensus requires multiple round trips between nodes. Expect 10-50ms for lock operations versus sub-millisecond for Redis.
Database-Based Locking Strategies
If you’re already running PostgreSQL or MySQL, you have built-in locking primitives that might eliminate the need for additional infrastructure.
PostgreSQL advisory locks are particularly elegant:
```python
import psycopg2
from contextlib import contextmanager
import hashlib

def resource_to_lock_id(resource: str) -> int:
    """Convert resource name to a positive signed 64-bit lock ID."""
    return int(hashlib.sha256(resource.encode()).hexdigest()[:16], 16) & 0x7FFFFFFFFFFFFFFF

@contextmanager
def pg_advisory_lock(conn, resource: str, timeout_seconds: int = 30):
    lock_id = resource_to_lock_id(resource)
    cursor = conn.cursor()
    acquired = False
    try:
        # Bound acquisition time (LOCAL scopes the setting to this transaction)
        cursor.execute(f"SET LOCAL lock_timeout = '{timeout_seconds}s'")
        # Blocks until acquired, or raises LockNotAvailable on timeout
        cursor.execute("SELECT pg_advisory_lock(%s)", (lock_id,))
        acquired = True
        yield True
    except psycopg2.errors.LockNotAvailable:
        conn.rollback()  # clear the aborted transaction state
        yield False
    finally:
        # Release only if we actually acquired the lock
        if acquired:
            try:
                cursor.execute("SELECT pg_advisory_unlock(%s)", (lock_id,))
            except psycopg2.Error:
                pass  # connection might be dead; session locks die with it

# Usage
with pg_advisory_lock(conn, "payment:12345", timeout_seconds=5) as acquired:
    if acquired:
        process_payment(12345)
    else:
        raise TimeoutError("Could not acquire lock")
```
For optimistic locking, add a version column and use conditional updates:
```sql
UPDATE orders
SET status = 'processed', version = version + 1
WHERE id = 12345 AND version = 7;
-- Check affected rows; if 0, someone else modified it
```
Database locks work well when your critical section already involves database operations. They fail if the database becomes unavailable, but so does your application logic in that case.
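The version-column check is a compare-and-swap, and the idea can be expressed without SQL. This sketch mimics it in memory: the update applies only if the caller's version still matches, otherwise it reports a conflict (the equivalent of zero affected rows, telling the caller to re-read and retry).

```python
def update_if_version(rows: dict, row_id: int, new_status: str,
                      expected_version: int) -> bool:
    """Apply the update only if the stored version matches (optimistic CAS)."""
    row = rows[row_id]
    if row["version"] != expected_version:
        return False  # concurrent modification; caller should re-read and retry
    row["status"] = new_status
    row["version"] += 1
    return True

rows = {12345: {"status": "pending", "version": 7}}
print(update_if_version(rows, 12345, "processed", expected_version=7))  # True
print(update_if_version(rows, 12345, "processed", expected_version=7))  # False: version is now 8
```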
Handling Failures and Edge Cases
Here’s the scenario that breaks naive distributed locking: your process acquires a lock, then experiences a long garbage collection pause or network partition. The lock expires. Another process acquires it and starts working. Your original process resumes, unaware its lock expired, and corrupts data.
Fencing tokens solve this. Each lock acquisition returns a monotonically increasing token. Any system accepting writes must reject operations with tokens lower than the highest it’s seen:
```python
from typing import Any

class FencedStorage:
    def __init__(self):
        self.data = {}
        self.max_token = {}

    def write(self, key: str, value: Any, fencing_token: int) -> bool:
        current_max = self.max_token.get(key, 0)
        if fencing_token < current_max:
            # Stale lock holder - reject
            return False
        self.max_token[key] = fencing_token
        self.data[key] = value
        return True

# Usage. Note this assumes the lock service issues monotonically increasing
# integer tokens; a random UUID token, like the Redis example above, won't
# work as a fencing token.
lock_result = lock.acquire("resource:123", ttl_seconds=30)
if lock_result.acquired:
    # The storage layer validates our token
    success = storage.write("resource:123", new_value, int(lock_result.token))
    if not success:
        # Our lock expired and someone else got a newer token
        handle_conflict()
```
For retries, use exponential backoff with jitter to prevent thundering herds:
```python
import random
import time

class LockAcquisitionTimeout(Exception):
    pass

def acquire_with_retry(lock, resource, ttl, max_attempts=5):
    for attempt in range(max_attempts):
        result = lock.acquire(resource, ttl)
        if result.acquired:
            return result
        # Exponential backoff with jitter to avoid synchronized retries
        base_delay = min(0.1 * (2 ** attempt), 5.0)
        jitter = random.uniform(0, base_delay * 0.1)
        time.sleep(base_delay + jitter)
    raise LockAcquisitionTimeout(f"Failed after {max_attempts} attempts")
```
Choosing the Right Approach
| Requirement | Redis | ZooKeeper/etcd | PostgreSQL |
|---|---|---|---|
| Strong consistency | ❌ | ✅ | ✅ |
| Sub-ms latency | ✅ | ❌ | ❌ |
| No extra infrastructure | ❌ | ❌ | ✅ |
| Survives DB failure | ✅ | ✅ | ❌ |
Use Redis when you need speed and can tolerate occasional duplicate processing—rate limiting, caching coordination, or distributed throttling.
Use ZooKeeper/etcd when correctness matters more than latency—leader election, distributed configuration, or coordination of expensive operations.
Use PostgreSQL when you’re already database-bound and want simplicity—most CRUD applications with modest scale.
Avoid distributed locks entirely when you can. Idempotent operations with unique request IDs often eliminate the need. Single-writer architectures (partition by key, route to specific instances) avoid coordination overhead. CRDTs and event sourcing let you merge concurrent operations instead of preventing them.
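As a sketch of the idempotency alternative: if every request carries a unique ID and the handler records completed IDs, retries and concurrent duplicates become harmless no-ops, with no lock required. The dict here stands in for a durable store with a unique constraint on the request ID, which is what actually closes the race in production.

```python
processed: dict[str, str] = {}  # request_id -> result (stand-in for a durable table)

def charge(request_id: str, customer: str, amount: int) -> str:
    # Idempotent: a repeated request_id returns the original result
    # instead of charging again.
    if request_id in processed:
        return processed[request_id]
    result = f"charged {customer} {amount}"  # imagine the real side effect here
    processed[request_id] = result
    return result

first = charge("req-42", "alice", 30)
second = charge("req-42", "alice", 30)  # retry of the same request
print(first == second)   # True: only one charge happened
print(len(processed))    # 1
```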
Monitor lock acquisition times, hold times, and contention rates. High contention is a design smell—you’re probably locking too coarsely or have a hot partition. Instrument everything, because distributed locking failures are notoriously difficult to debug after the fact.
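One way to capture those metrics is a thin wrapper around whatever lock client you use. This is a hedged sketch: the recorder is a plain list, and the `_FakeLock` is a hypothetical stand-in so the example is self-contained; in practice you would emit to your metrics system and wrap a real client.

```python
import time
from types import SimpleNamespace

class InstrumentedLock:
    """Wraps a lock exposing acquire(resource, ttl) / release(resource, token)
    and records acquisition latency and hold duration."""
    def __init__(self, inner, metrics: list):
        self.inner = inner
        self.metrics = metrics
        self._held_since = {}

    def acquire(self, resource, ttl):
        start = time.monotonic()
        result = self.inner.acquire(resource, ttl)
        self.metrics.append(
            ("acquire_seconds", resource, time.monotonic() - start, result.acquired))
        if result.acquired:
            self._held_since[resource] = time.monotonic()
        return result

    def release(self, resource, token):
        ok = self.inner.release(resource, token)
        started = self._held_since.pop(resource, None)
        if started is not None:
            self.metrics.append(
                ("hold_seconds", resource, time.monotonic() - started, ok))
        return ok

class _FakeLock:  # hypothetical stand-in for a real lock client
    def acquire(self, resource, ttl):
        return SimpleNamespace(acquired=True, token="t")
    def release(self, resource, token):
        return True

metrics = []
lock = InstrumentedLock(_FakeLock(), metrics)
r = lock.acquire("resource:123", 30)
lock.release("resource:123", r.token)
print([name for name, *_ in metrics])  # ['acquire_seconds', 'hold_seconds']
```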