# System Design: Consistency Models (Strong, Eventual, Causal)
Every distributed system faces the same fundamental problem: how do you keep data synchronized across multiple nodes when networks are unreliable, nodes fail, and operations happen concurrently?
## Key Insights
- Strong consistency guarantees linearizability but costs you latency and availability—use it only when correctness is non-negotiable, like financial transactions
- Eventual consistency trades immediate accuracy for availability and performance, making it ideal for high-scale systems where temporary inconsistency is acceptable
- Causal consistency offers a practical middle ground by preserving cause-and-effect relationships without the overhead of global ordering
## Introduction: Why Consistency Matters
The CAP theorem tells us we can’t have it all. When a network partition occurs, we must choose between consistency (every read returns the most recent write) and availability (every request receives a response). This isn’t a theoretical concern—it’s a daily reality for any system running across multiple servers.
Consistency models define the contract between your system and its clients. They answer the question: “After I write data, when and where can I expect to read it back?” The model you choose directly impacts your system’s performance, availability, and correctness guarantees.
Let’s examine the three most important consistency models and when to use each.
## Strong Consistency
Strong consistency, specifically linearizability, provides the simplest mental model: the system behaves as if there’s only one copy of the data, and all operations happen atomically in some order. Every read returns the value of the most recent write.
This guarantee comes at a cost. To ensure linearizability, nodes must coordinate on every operation. This coordination requires network round-trips, which adds latency. If a majority of nodes become unreachable, the system must stop accepting writes to prevent divergence.
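The majority requirement follows from simple quorum arithmetic: if a write must reach W of N replicas and a read must consult R, then choosing R + W > N guarantees every read quorum overlaps every write quorum. A minimal sketch (the replica counts are illustrative):

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Any read quorum intersects any write quorum iff R + W > N."""
    return r + w > n

# With 5 replicas, a write quorum of 3 and a read quorum of 3 always
# intersect, so every read sees at least one node with the latest write.
assert quorums_overlap(5, 3, 3)

# R=2, W=2 on 5 replicas can miss each other entirely: reads may be stale.
assert not quorums_overlap(5, 2, 2)

# If only 2 of 5 nodes are reachable, a write quorum of 3 is impossible,
# so the system must refuse writes to avoid divergence.
reachable_nodes = 2
assert reachable_nodes < 3
```

This is why losing a majority forces a strongly consistent system to stop accepting writes rather than risk divergence.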
Use strong consistency when correctness is paramount: financial transactions, inventory management, booking systems, or any scenario where “overselling” or double-spending would be catastrophic.
Here’s a simplified distributed lock implementation that demonstrates read-after-write guarantees:
```python
import time
from typing import Optional

import redis


class DistributedLock:
    def __init__(self, redis_client: redis.Redis, lock_name: str, ttl_seconds: int = 10):
        self.redis = redis_client
        self.lock_name = f"lock:{lock_name}"
        self.ttl = ttl_seconds
        self.token: Optional[str] = None

    def acquire(self, timeout_seconds: int = 5) -> bool:
        """Acquire the lock via SET NX, an atomic check-and-set on the primary."""
        self.token = f"{time.time()}:{id(self)}"
        deadline = time.time() + timeout_seconds
        while time.time() < deadline:
            acquired = self.redis.set(
                self.lock_name,
                self.token,
                nx=True,   # Only set if the key does not already exist
                ex=self.ttl,
            )
            if acquired:
                # Block until at least one replica acknowledges the write
                self.redis.wait(1, 1000)
                return True
            time.sleep(0.01)
        return False

    def release(self) -> bool:
        """Release the lock only if we still hold it (atomic check-and-delete)."""
        if self.token is None:
            return False
        # Lua script executes atomically on the Redis server
        lua_script = """
        if redis.call("GET", KEYS[1]) == ARGV[1] then
            return redis.call("DEL", KEYS[1])
        else
            return 0
        end
        """
        result = self.redis.eval(lua_script, 1, self.lock_name, self.token)
        return result == 1
```
The WAIT command blocks until the write has reached at least one replica before returning. Because Redis replication is asynchronous, this narrows, but does not fully close, the window in which a failover immediately after acquisition could lose the lock.
## Eventual Consistency
Eventual consistency relaxes the timing guarantee: if no new updates are made, all replicas will eventually converge to the same value. The key word is “eventually”—there’s no bound on how long this takes.
This model enables high availability and partition tolerance. Nodes can accept writes independently without coordinating, then reconcile differences later. The trade-off is that clients might read stale data or see updates in different orders.
Eventual consistency works well for social media feeds, DNS, shopping carts, and any system where temporary inconsistency is acceptable and high availability is critical.
The challenge is conflict resolution. When two nodes accept conflicting writes, how do you reconcile them? Here’s a last-write-wins implementation with vector clocks for detecting conflicts:
```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class VectorClock:
    clocks: Dict[str, int] = field(default_factory=dict)

    def increment(self, node_id: str) -> None:
        self.clocks[node_id] = self.clocks.get(node_id, 0) + 1

    def merge(self, other: 'VectorClock') -> 'VectorClock':
        merged = VectorClock()
        all_nodes = set(self.clocks.keys()) | set(other.clocks.keys())
        for node in all_nodes:
            merged.clocks[node] = max(
                self.clocks.get(node, 0),
                other.clocks.get(node, 0)
            )
        return merged

    def compare(self, other: 'VectorClock') -> str:
        """Returns: 'before', 'after', 'concurrent', or 'equal'."""
        dominated = dominated_by = True
        for node in set(self.clocks.keys()) | set(other.clocks.keys()):
            self_val = self.clocks.get(node, 0)
            other_val = other.clocks.get(node, 0)
            if self_val > other_val:
                dominated = False
            if self_val < other_val:
                dominated_by = False
        if dominated and dominated_by:
            return 'equal'
        if dominated:
            return 'before'
        if dominated_by:
            return 'after'
        return 'concurrent'


@dataclass
class VersionedValue:
    value: Any
    vector_clock: VectorClock
    timestamp: float  # Wall clock for LWW fallback


class EventuallyConsistentStore:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.data: Dict[str, VersionedValue] = {}

    def write(self, key: str, value: Any) -> VectorClock:
        current = self.data.get(key)
        new_clock = (VectorClock() if current is None
                     else VectorClock(current.vector_clock.clocks.copy()))
        new_clock.increment(self.node_id)
        self.data[key] = VersionedValue(
            value=value,
            vector_clock=new_clock,
            timestamp=time.time()
        )
        return new_clock

    def merge_replica(self, key: str, incoming: VersionedValue) -> Tuple[Any, str]:
        """Merge incoming value from another replica. Returns (final_value, resolution)."""
        current = self.data.get(key)
        if current is None:
            self.data[key] = incoming
            return incoming.value, 'accepted'
        comparison = current.vector_clock.compare(incoming.vector_clock)
        if comparison == 'before':
            # Incoming is strictly newer
            self.data[key] = incoming
            return incoming.value, 'accepted'
        if comparison in ('after', 'equal'):
            # Current is newer or equal, keep it
            return current.value, 'rejected'
        # Concurrent writes - use last-write-wins based on timestamp
        if incoming.timestamp > current.timestamp:
            merged_clock = current.vector_clock.merge(incoming.vector_clock)
            self.data[key] = VersionedValue(
                value=incoming.value,
                vector_clock=merged_clock,
                timestamp=incoming.timestamp
            )
            return incoming.value, 'lww_accepted'
        return current.value, 'lww_rejected'
```
Vector clocks detect concurrent writes that need resolution. Last-write-wins provides a simple (if imperfect) resolution strategy. More sophisticated systems might use CRDTs or application-specific merge functions.
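As a taste of the CRDT approach mentioned above, here is a minimal grow-only counter (G-Counter) sketch: each node increments only its own slot, and merge takes the element-wise maximum, so replicas converge to the same total regardless of the order in which merges arrive (the class and method names are illustrative):

```python
class GCounter:
    """Grow-only counter CRDT: merge is commutative, associative, idempotent."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict = {}

    def increment(self, amount: int = 1) -> None:
        # Each node only ever advances its own slot
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: applying merges in any order yields the same state,
        # so no conflict resolution is ever needed
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas accept increments independently, then reconcile.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Unlike last-write-wins, nothing is ever discarded: concurrent increments on different nodes simply add up.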
## Causal Consistency
Causal consistency sits between strong and eventual. It guarantees that causally related operations are seen in the correct order by all nodes, while allowing concurrent operations to be seen in different orders.
Two operations are causally related if one could have influenced the other. If I read a value and then write based on it, those operations are causally linked. If two users write independently without seeing each other’s updates, those operations are concurrent.
This model is ideal for collaborative editing, comment threads, and messaging systems where cause-and-effect must be preserved but global ordering isn’t necessary.
Here’s a Go sketch of causal broadcast: Lamport clocks keep each node’s event counter consistent with everything it has seen, while explicit dependency lists ensure a message is delivered only after the messages it causally depends on:
```go
package causality

import (
	"crypto/rand"
	"encoding/hex"
	"sync"
)

// generateID returns a random unique message identifier.
func generateID() string {
	b := make([]byte, 8)
	rand.Read(b)
	return hex.EncodeToString(b)
}

type LamportClock struct {
	mu      sync.Mutex
	counter uint64
	nodeID  string
}

func NewLamportClock(nodeID string) *LamportClock {
	return &LamportClock{nodeID: nodeID}
}

// Tick advances the clock for a local event.
func (lc *LamportClock) Tick() uint64 {
	lc.mu.Lock()
	defer lc.mu.Unlock()
	lc.counter++
	return lc.counter
}

// Update merges a received timestamp, keeping the clock ahead of every event seen.
func (lc *LamportClock) Update(receivedTimestamp uint64) uint64 {
	lc.mu.Lock()
	defer lc.mu.Unlock()
	if receivedTimestamp > lc.counter {
		lc.counter = receivedTimestamp
	}
	lc.counter++
	return lc.counter
}

type CausalMessage struct {
	ID        string
	Content   string
	Timestamp uint64
	NodeID    string
	DependsOn []string // IDs of messages this one causally depends on
}

type CausalBroadcast struct {
	clock     *LamportClock
	delivered map[string]bool
	pending   []CausalMessage
	mu        sync.Mutex
}

func NewCausalBroadcast(nodeID string) *CausalBroadcast {
	return &CausalBroadcast{
		clock:     NewLamportClock(nodeID),
		delivered: make(map[string]bool),
		pending:   []CausalMessage{},
	}
}

func (cb *CausalBroadcast) Send(content string, dependsOn []string) CausalMessage {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	return CausalMessage{
		ID:        generateID(),
		Content:   content,
		Timestamp: cb.clock.Tick(),
		NodeID:    cb.clock.nodeID,
		DependsOn: dependsOn,
	}
}

func (cb *CausalBroadcast) Receive(msg CausalMessage) []CausalMessage {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	cb.clock.Update(msg.Timestamp)
	cb.pending = append(cb.pending, msg)
	return cb.deliverReady()
}

// deliverReady repeatedly delivers pending messages whose dependencies have
// all been delivered, looping until no further progress is possible.
func (cb *CausalBroadcast) deliverReady() []CausalMessage {
	var delivered []CausalMessage
	changed := true
	for changed {
		changed = false
		remaining := []CausalMessage{}
		for _, msg := range cb.pending {
			if cb.canDeliver(msg) {
				cb.delivered[msg.ID] = true
				delivered = append(delivered, msg)
				changed = true
			} else {
				remaining = append(remaining, msg)
			}
		}
		cb.pending = remaining
	}
	return delivered
}

func (cb *CausalBroadcast) canDeliver(msg CausalMessage) bool {
	for _, depID := range msg.DependsOn {
		if !cb.delivered[depID] {
			return false
		}
	}
	return true
}
```
Messages are only delivered once all their causal dependencies have been delivered first, ensuring users never see a reply before the original message.
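The delivery rule at the heart of the Go code above can be sketched end to end in a few lines of Python (mirroring the structures above; the function and names are illustrative): hold back any message whose dependencies haven't been delivered, and retry it once they have.

```python
def deliver_in_causal_order(messages):
    """messages: list of (message_id, depends_on) pairs, in arrival order.
    Returns message IDs in a causally valid delivery order."""
    delivered, done = [], set()
    pending = list(messages)
    progress = True
    while pending and progress:
        progress = False
        still_pending = []
        for msg_id, deps in pending:
            if all(d in done for d in deps):
                # All causal dependencies delivered: safe to deliver this one
                done.add(msg_id)
                delivered.append(msg_id)
                progress = True
            else:
                # Buffer until its dependencies arrive
                still_pending.append((msg_id, deps))
        pending = still_pending
    return delivered


# A reply arrives before the post it answers; delivery reorders them.
out = deliver_in_causal_order([("reply", ["post"]), ("post", [])])
assert out == ["post", "reply"]
```

Messages with unmet dependencies simply stay buffered, which is exactly what the `pending` slice does in the Go version.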
## Comparing Models in Practice
| Model | Latency | Availability | Behavior Under Partition | Use Cases |
|---|---|---|---|---|
| Strong | High | Lower | Sacrifices availability | Financial, inventory |
| Eventual | Low | High | Stays available | Social feeds, caching |
| Causal | Medium | Medium-High | Mostly available | Collaboration, messaging |
Real-world databases offer different defaults:
- Google Spanner: Strong consistency with TrueTime
- Amazon DynamoDB: Eventual by default, strong reads available
- CockroachDB: Serializable (strong) by default
- Cassandra: Tunable per-operation
Here’s how consistency levels work in Cassandra:
```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(['node1', 'node2', 'node3'])
session = cluster.connect('my_keyspace')

# Strong consistency: quorum writes plus quorum reads overlap (R + W > N)
write_strong = SimpleStatement(
    "INSERT INTO accounts (id, balance) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM
)
session.execute(write_strong, [account_id, new_balance])

read_strong = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)
row = session.execute(read_strong, [account_id]).one()

# Eventual consistency: fast writes, possibly stale reads
write_fast = SimpleStatement(
    "INSERT INTO user_activity (user_id, action) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE
)
session.execute(write_fast, [user_id, action])
```
## Hybrid Approaches and Tunable Consistency
Modern systems rarely use a single consistency model throughout. Instead, they tune consistency per-operation based on requirements.
A shopping cart might use eventual consistency for adding items (high availability) but switch to strong consistency during checkout (correctness). User sessions might tolerate staleness, while payment processing cannot.
```python
class HybridConsistencyClient:
    """Illustrative sketch: strong_store and eventual_store stand in for
    real database clients; generate_order_id and InsufficientInventoryError
    are assumed to be defined elsewhere."""

    def __init__(self, strong_store, eventual_store):
        self.strong = strong_store
        self.eventual = eventual_store

    def add_to_cart(self, user_id: str, item_id: str):
        # Eventual consistency: prioritize availability
        return self.eventual.write(
            f"cart:{user_id}",
            item_id,
            consistency='ONE'
        )

    def checkout(self, user_id: str, payment_info: dict):
        # Strong consistency: correctness required
        cart = self.strong.read(f"cart:{user_id}", consistency='QUORUM')
        inventory_ok = self.strong.atomic_decrement_all(cart.items)
        if not inventory_ok:
            raise InsufficientInventoryError()
        return self.strong.write(
            f"order:{generate_order_id()}",
            {'items': cart.items, 'payment': payment_info},
            consistency='QUORUM'
        )
```
## Key Takeaways
Choose your consistency model based on your actual requirements, not theoretical ideals:
- **Start with the question: what happens if a user sees stale data?** If the answer is “financial loss” or “safety issue,” you need strong consistency for that operation.
- **Default to eventual where possible.** Most read-heavy workloads tolerate staleness. Don’t pay for consistency you don’t need.
- **Consider causal for user-facing interactions.** Users expect to see their own writes and for conversations to make sense. Causal consistency provides this without global coordination.
- **Tune per-operation, not per-system.** The same database can serve different consistency needs. Use strong consistency surgically.
- **Test your assumptions.** Simulate network partitions and node failures. Verify your system behaves correctly under the consistency model you’ve chosen.
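The “see your own writes” expectation is a session guarantee, and the mechanism is simple: the client remembers the version of its last write and refuses to read from any replica that hasn’t caught up. A minimal sketch (the class and field names are illustrative):

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, value, version: int) -> None:
        self.version, self.value = version, value


class SessionClient:
    """Read-your-writes: only accept reads from replicas at or past
    the version of this session's last write."""

    def __init__(self):
        self.last_written_version = 0

    def write(self, primary: Replica, value) -> None:
        self.last_written_version = primary.version + 1
        primary.apply(value, self.last_written_version)

    def read(self, replicas):
        for r in replicas:
            if r.version >= self.last_written_version:
                return r.value
        return None  # no sufficiently fresh replica; caller may retry elsewhere


# The client skips a lagging replica rather than read its own write's past.
primary, lagging = Replica(), Replica()
client = SessionClient()
client.write(primary, 42)
assert client.read([lagging, primary]) == 42  # lagging replica is skipped
assert client.read([lagging]) is None         # better to fail than go backwards
```

This is weaker than strong consistency (other clients may still see stale data) but costs no cross-node coordination, which is why it pairs naturally with causal models.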
The right consistency model isn’t about picking the “best” one—it’s about understanding your trade-offs and making them explicitly.