# System Design: Consistency Models (Strong, Eventual, Causal)
Every distributed system faces the same fundamental problem: how do you keep data synchronized across multiple nodes when networks are unreliable, nodes fail, and operations happen concurrently?
## Key Insights
- Strong consistency guarantees linearizability but costs you latency and availability—use it only when correctness is non-negotiable, like financial transactions
- Eventual consistency trades immediate accuracy for availability and performance, making it ideal for high-scale systems where temporary inconsistency is acceptable
- Causal consistency offers a practical middle ground by preserving cause-and-effect relationships without the overhead of global ordering
## Introduction: Why Consistency Matters
The CAP theorem tells us we can’t have it all. When a network partition occurs, we must choose between consistency (every read returns the most recent write) and availability (every request receives a response). This isn’t a theoretical concern—it’s a daily reality for any system running across multiple servers.
Consistency models define the contract between your system and its clients. They answer the question: “After I write data, when and where can I expect to read it back?” The model you choose directly impacts your system’s performance, availability, and correctness guarantees.
Let’s examine the three most important consistency models and when to use each.
## Strong Consistency
Strong consistency, specifically linearizability, provides the simplest mental model: the system behaves as if there’s only one copy of the data, and all operations happen atomically in some order. Every read returns the value of the most recent write.
This guarantee comes at a cost. To ensure linearizability, nodes must coordinate on every operation. This coordination requires network round-trips, which adds latency. If a majority of nodes become unreachable, the system must stop accepting writes to prevent divergence.
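The majority requirement follows from simple quorum arithmetic: if a write must reach W of N replicas and a read must consult R, then choosing R + W > N guarantees every read quorum overlaps every write quorum. A minimal sketch (the replica counts are illustrative):

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Any read quorum intersects any write quorum iff R + W > N."""
    return r + w > n

# With 5 replicas, a write quorum of 3 and a read quorum of 3 always
# intersect, so every read sees at least one node with the latest write.
assert quorums_overlap(5, 3, 3)

# R=2, W=2 on 5 replicas can miss each other entirely: reads may be stale.
assert not quorums_overlap(5, 2, 2)

# If only 2 of 5 nodes are reachable, a write quorum of 3 is impossible,
# so the system must refuse writes to avoid divergence.
reachable_nodes = 2
assert reachable_nodes < 3
```

This is why losing a majority forces a strongly consistent system to stop accepting writes rather than risk divergence.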
Use strong consistency when correctness is paramount: financial transactions, inventory management, booking systems, or any scenario where “overselling” or double-spending would be catastrophic.
Here’s a simplified distributed lock implementation that demonstrates read-after-write guarantees:
```python
import time
from typing import Optional

import redis


class DistributedLock:
    def __init__(self, redis_client: redis.Redis, lock_name: str, ttl_seconds: int = 10):
        self.redis = redis_client
        self.lock_name = f"lock:{lock_name}"
        self.ttl = ttl_seconds
        self.token: Optional[str] = None

    def acquire(self, timeout_seconds: int = 5) -> bool:
        """Acquire the lock via SET NX, an atomic check-and-set on the primary."""
        self.token = f"{time.time()}:{id(self)}"
        deadline = time.time() + timeout_seconds
        while time.time() < deadline:
            acquired = self.redis.set(
                self.lock_name,
                self.token,
                nx=True,   # Only set if the key does not already exist
                ex=self.ttl,
            )
            if acquired:
                # Block until at least one replica acknowledges the write
                self.redis.wait(1, 1000)
                return True
            time.sleep(0.01)
        return False

    def release(self) -> bool:
        """Release the lock only if we still hold it (atomic check-and-delete)."""
        if self.token is None:
            return False
        # Lua script executes atomically on the Redis server
        lua_script = """
        if redis.call("GET", KEYS[1]) == ARGV[1] then
            return redis.call("DEL", KEYS[1])
        else
            return 0
        end
        """
        result = self.redis.eval(lua_script, 1, self.lock_name, self.token)
        return result == 1
```
The WAIT command blocks until the write has reached at least one replica before returning. Because Redis replication is asynchronous, this narrows, but does not fully close, the window in which a failover immediately after acquisition could lose the lock.
## Eventual Consistency
Eventual consistency relaxes the timing guarantee: if no new updates are made, all replicas will eventually converge to the same value. The key word is “eventually”—there’s no bound on how long this takes.
This model enables high availability and partition tolerance. Nodes can accept writes independently without coordinating, then reconcile differences later. The trade-off is that clients might read stale data or see updates in different orders.
Eventual consistency works well for social media feeds, DNS, shopping carts, and any system where temporary inconsistency is acceptable and high availability is critical.
The challenge is conflict resolution. When two nodes accept conflicting writes, how do you reconcile them? Here’s a last-write-wins implementation with vector clocks for detecting conflicts:
```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class VectorClock:
    clocks: Dict[str, int] = field(default_factory=dict)

    def increment(self, node_id: str) -> None:
        self.clocks[node_id] = self.clocks.get(node_id, 0) + 1

    def merge(self, other: 'VectorClock') -> 'VectorClock':
        merged = VectorClock()
        all_nodes = set(self.clocks.keys()) | set(other.clocks.keys())
        for node in all_nodes:
            merged.clocks[node] = max(
                self.clocks.get(node, 0),
                other.clocks.get(node, 0)
            )
        return merged

    def compare(self, other: 'VectorClock') -> str:
        """Returns: 'before', 'after', 'concurrent', or 'equal'."""
        dominated = dominated_by = True
        for node in set(self.clocks.keys()) | set(other.clocks.keys()):
            self_val = self.clocks.get(node, 0)
            other_val = other.clocks.get(node, 0)
            if self_val > other_val:
                dominated = False
            if self_val < other_val:
                dominated_by = False
        if dominated and dominated_by:
            return 'equal'
        if dominated:
            return 'before'
        if dominated_by:
            return 'after'
        return 'concurrent'


@dataclass
class VersionedValue:
    value: Any
    vector_clock: VectorClock
    timestamp: float  # Wall clock for LWW fallback


class EventuallyConsistentStore:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.data: Dict[str, VersionedValue] = {}

    def write(self, key: str, value: Any) -> VectorClock:
        current = self.data.get(key)
        new_clock = (VectorClock() if current is None
                     else VectorClock(current.vector_clock.clocks.copy()))
        new_clock.increment(self.node_id)
        self.data[key] = VersionedValue(
            value=value,
            vector_clock=new_clock,
            timestamp=time.time()
        )
        return new_clock

    def merge_replica(self, key: str, incoming: VersionedValue) -> Tuple[Any, str]:
        """Merge incoming value from another replica. Returns (final_value, resolution)."""
        current = self.data.get(key)
        if current is None:
            self.data[key] = incoming
            return incoming.value, 'accepted'
        comparison = current.vector_clock.compare(incoming.vector_clock)
        if comparison == 'before':
            # Incoming is strictly newer
            self.data[key] = incoming
            return incoming.value, 'accepted'
        if comparison in ('after', 'equal'):
            # Current is newer or equal, keep it
            return current.value, 'rejected'
        # Concurrent writes - use last-write-wins based on timestamp
        if incoming.timestamp > current.timestamp:
            merged_clock = current.vector_clock.merge(incoming.vector_clock)
            self.data[key] = VersionedValue(
                value=incoming.value,
                vector_clock=merged_clock,
                timestamp=incoming.timestamp
            )
            return incoming.value, 'lww_accepted'
        return current.value, 'lww_rejected'
```
Vector clocks detect concurrent writes that need resolution. Last-write-wins provides a simple (if imperfect) resolution strategy. More sophisticated systems might use CRDTs or application-specific merge functions.
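As a taste of the CRDT approach mentioned above, here is a minimal grow-only counter (G-Counter) sketch: each node increments only its own slot, and merge takes the element-wise maximum, so replicas converge to the same total regardless of the order in which merges arrive (the class and method names are illustrative):

```python
class GCounter:
    """Grow-only counter CRDT: merge is commutative, associative, idempotent."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict = {}

    def increment(self, amount: int = 1) -> None:
        # Each node only ever advances its own slot
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max: applying merges in any order yields the same state,
        # so no conflict resolution is ever needed
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas accept increments independently, then reconcile.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Unlike last-write-wins, nothing is ever discarded: concurrent increments on different nodes simply add up.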
## Causal Consistency
Causal consistency sits between strong and eventual. It guarantees that causally related operations are seen in the correct order by all nodes, while allowing concurrent operations to be seen in different orders.
Two operations are causally related if one could have influenced the other. If I read a value and then write based on it, those operations are causally linked. If two users write independently without seeing each other’s updates, those operations are concurrent.
This model is ideal for collaborative editing, comment threads, and messaging systems where cause-and-effect must be preserved but global ordering isn’t necessary.
Here’s a Go sketch of causal broadcast: Lamport clocks keep each node’s event counter consistent with everything it has seen, while explicit dependency lists ensure a message is delivered only after the messages it causally depends on:
```go
package causality

import (
	"crypto/rand"
	"encoding/hex"
	"sync"
)

// generateID returns a random unique message identifier.
func generateID() string {
	b := make([]byte, 8)
	rand.Read(b)
	return hex.EncodeToString(b)
}

type LamportClock struct {
	mu      sync.Mutex
	counter uint64
	nodeID  string
}

func NewLamportClock(nodeID string) *LamportClock {
	return &LamportClock{nodeID: nodeID}
}

// Tick advances the clock for a local event.
func (lc *LamportClock) Tick() uint64 {
	lc.mu.Lock()
	defer lc.mu.Unlock()
	lc.counter++
	return lc.counter
}

// Update merges a received timestamp, keeping the clock ahead of every event seen.
func (lc *LamportClock) Update(receivedTimestamp uint64) uint64 {
	lc.mu.Lock()
	defer lc.mu.Unlock()
	if receivedTimestamp > lc.counter {
		lc.counter = receivedTimestamp
	}
	lc.counter++
	return lc.counter
}

type CausalMessage struct {
	ID        string
	Content   string
	Timestamp uint64
	NodeID    string
	DependsOn []string // IDs of messages this one causally depends on
}

type CausalBroadcast struct {
	clock     *LamportClock
	delivered map[string]bool
	pending   []CausalMessage
	mu        sync.Mutex
}

func NewCausalBroadcast(nodeID string) *CausalBroadcast {
	return &CausalBroadcast{
		clock:     NewLamportClock(nodeID),
		delivered: make(map[string]bool),
		pending:   []CausalMessage{},
	}
}

func (cb *CausalBroadcast) Send(content string, dependsOn []string) CausalMessage {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	return CausalMessage{
		ID:        generateID(),
		Content:   content,
		Timestamp: cb.clock.Tick(),
		NodeID:    cb.clock.nodeID,
		DependsOn: dependsOn,
	}
}

func (cb *CausalBroadcast) Receive(msg CausalMessage) []CausalMessage {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	cb.clock.Update(msg.Timestamp)
	cb.pending = append(cb.pending, msg)
	return cb.deliverReady()
}

// deliverReady repeatedly delivers pending messages whose dependencies have
// all been delivered, looping until no further progress is possible.
func (cb *CausalBroadcast) deliverReady() []CausalMessage {
	var delivered []CausalMessage
	changed := true
	for changed {
		changed = false
		remaining := []CausalMessage{}
		for _, msg := range cb.pending {
			if cb.canDeliver(msg) {
				cb.delivered[msg.ID] = true
				delivered = append(delivered, msg)
				changed = true
			} else {
				remaining = append(remaining, msg)
			}
		}
		cb.pending = remaining
	}
	return delivered
}

func (cb *CausalBroadcast) canDeliver(msg CausalMessage) bool {
	for _, depID := range msg.DependsOn {
		if !cb.delivered[depID] {
			return false
		}
	}
	return true
}
```
Messages are only delivered once all their causal dependencies have been delivered first, ensuring users never see a reply before the original message.
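The delivery rule at the heart of the Go code above can be sketched end to end in a few lines of Python (mirroring the structures above; the function and names are illustrative): hold back any message whose dependencies haven't been delivered, and retry it once they have.

```python
def deliver_in_causal_order(messages):
    """messages: list of (message_id, depends_on) pairs, in arrival order.
    Returns message IDs in a causally valid delivery order."""
    delivered, done = [], set()
    pending = list(messages)
    progress = True
    while pending and progress:
        progress = False
        still_pending = []
        for msg_id, deps in pending:
            if all(d in done for d in deps):
                # All causal dependencies delivered: safe to deliver this one
                done.add(msg_id)
                delivered.append(msg_id)
                progress = True
            else:
                # Buffer until its dependencies arrive
                still_pending.append((msg_id, deps))
        pending = still_pending
    return delivered


# A reply arrives before the post it answers; delivery reorders them.
out = deliver_in_causal_order([("reply", ["post"]), ("post", [])])
assert out == ["post", "reply"]
```

Messages with unmet dependencies simply stay buffered, which is exactly what the `pending` slice does in the Go version.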
## Comparing Models in Practice
| Model | Latency | Availability | Behavior Under Partition | Use Cases |
|---|---|---|---|---|
| Strong | High | Lower | Sacrifices availability | Financial, inventory |
| Eventual | Low | High | Stays available | Social feeds, caching |
| Causal | Medium | Medium-High | Mostly available | Collaboration, messaging |
Real-world databases offer different defaults:
- Google Spanner: Strong consistency with TrueTime
- Amazon DynamoDB: Eventual by default, strong reads available
- CockroachDB: Serializable (strong) by default
- Cassandra: Tunable per-operation
Here’s how consistency levels work in Cassandra:
```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from cassandra import ConsistencyLevel

cluster = Cluster(['node1', 'node2', 'node3'])
session = cluster.connect('my_keyspace')

# Strong consistency: quorum writes plus quorum reads overlap (R + W > N)
write_strong = SimpleStatement(
    "INSERT INTO accounts (id, balance) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM
)
session.execute(write_strong, [account_id, new_balance])

read_strong = SimpleStatement(
    "SELECT balance FROM accounts WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM
)
row = session.execute(read_strong, [account_id]).one()

# Eventual consistency: fast writes, possibly stale reads
write_fast = SimpleStatement(
    "INSERT INTO user_activity (user_id, action) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE
)
session.execute(write_fast, [user_id, action])
```
## Hybrid Approaches and Tunable Consistency
Modern systems rarely use a single consistency model throughout. Instead, they tune consistency per-operation based on requirements.
A shopping cart might use eventual consistency for adding items (high availability) but switch to strong consistency during checkout (correctness). User sessions might tolerate staleness, while payment processing cannot.
```python
class HybridConsistencyClient:
    """Illustrative sketch: strong_store and eventual_store stand in for
    real database clients; generate_order_id and InsufficientInventoryError
    are assumed to be defined elsewhere."""

    def __init__(self, strong_store, eventual_store):
        self.strong = strong_store
        self.eventual = eventual_store

    def add_to_cart(self, user_id: str, item_id: str):
        # Eventual consistency: prioritize availability
        return self.eventual.write(
            f"cart:{user_id}",
            item_id,
            consistency='ONE'
        )

    def checkout(self, user_id: str, payment_info: dict):
        # Strong consistency: correctness required
        cart = self.strong.read(f"cart:{user_id}", consistency='QUORUM')
        inventory_ok = self.strong.atomic_decrement_all(cart.items)
        if not inventory_ok:
            raise InsufficientInventoryError()
        return self.strong.write(
            f"order:{generate_order_id()}",
            {'items': cart.items, 'payment': payment_info},
            consistency='QUORUM'
        )
```
## Key Takeaways
Choose your consistency model based on your actual requirements, not theoretical ideals:
- **Start with the question: what happens if a user sees stale data?** If the answer is “financial loss” or “safety issue,” you need strong consistency for that operation.
- **Default to eventual where possible.** Most read-heavy workloads tolerate staleness. Don’t pay for consistency you don’t need.
- **Consider causal for user-facing interactions.** Users expect to see their own writes and for conversations to make sense. Causal consistency provides this without global coordination.
- **Tune per-operation, not per-system.** The same database can serve different consistency needs. Use strong consistency surgically.
- **Test your assumptions.** Simulate network partitions and node failures. Verify your system behaves correctly under the consistency model you’ve chosen.
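The “see your own writes” expectation is a session guarantee, and the mechanism is simple: the client remembers the version of its last write and refuses to read from any replica that hasn’t caught up. A minimal sketch (the class and field names are illustrative):

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, value, version: int) -> None:
        self.version, self.value = version, value


class SessionClient:
    """Read-your-writes: only accept reads from replicas at or past
    the version of this session's last write."""

    def __init__(self):
        self.last_written_version = 0

    def write(self, primary: Replica, value) -> None:
        self.last_written_version = primary.version + 1
        primary.apply(value, self.last_written_version)

    def read(self, replicas):
        for r in replicas:
            if r.version >= self.last_written_version:
                return r.value
        return None  # no sufficiently fresh replica; caller may retry elsewhere


# The client skips a lagging replica rather than read its own write's past.
primary, lagging = Replica(), Replica()
client = SessionClient()
client.write(primary, 42)
assert client.read([lagging, primary]) == 42  # lagging replica is skipped
assert client.read([lagging]) is None         # better to fail than go backwards
```

This is weaker than strong consistency (other clients may still see stale data) but costs no cross-node coordination, which is why it pairs naturally with causal models.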
The right consistency model isn’t about picking the “best” one—it’s about understanding your trade-offs and making them explicitly.