System Design: CAP Theorem and Trade-offs
Key Insights
- CAP theorem forces a choice between consistency and availability only during network partitions—understanding when partitions occur helps you make informed trade-offs rather than blanket decisions.
- Most modern distributed databases offer tunable consistency, letting you adjust the CAP trade-off per operation rather than being locked into a single configuration.
- The PACELC theorem extends CAP by acknowledging that even during normal operation, you’re trading latency for consistency—a trade-off that affects every request, not just failure scenarios.
Introduction to CAP Theorem
In 2000, Eric Brewer presented a conjecture at the ACM Symposium on Principles of Distributed Computing that would fundamentally shape how we think about distributed systems. Two years later, Seth Gilbert and Nancy Lynch formally proved it. The CAP theorem states that a distributed data store can provide at most two of three guarantees: Consistency, Availability, and Partition Tolerance.
Here’s the critical insight most developers miss: this isn’t about choosing two properties for your system design. It’s about what happens during a network partition. Since network partitions are inevitable in distributed systems, you’re really choosing between consistency and availability when things go wrong.
Understanding CAP isn’t academic—it directly impacts how you design systems, choose databases, and configure replication. Get it wrong, and you’ll either lose data or serve stale results when your system is under stress.
Understanding Each Property
Consistency means every read receives the most recent write or an error. All nodes in the system see the same data at the same time. Think of it like a bank account: if you deposit $100, every ATM should immediately show your updated balance.
Availability means every request receives a non-error response, without guarantee that it contains the most recent write. The system always responds, even if the data might be slightly stale. Think of a social media feed—you’d rather see posts from five minutes ago than get an error page.
Partition Tolerance means the system continues operating despite network failures between nodes. Messages between nodes can be dropped or delayed. In any real distributed system, partitions will happen—network switches fail, cables get cut, cloud regions lose connectivity.
Here’s a simple distributed key-value store that illustrates these concepts:
class NetworkTimeout(Exception):
    """Raised when a peer does not respond in time."""

class UnavailableError(Exception):
    """Raised when a quorum cannot be reached."""

class DistributedKVStore:
    def __init__(self, node_id, peers, consistency_mode="CP"):
        self.node_id = node_id
        self.peers = peers  # List of peer node connections
        self.data = {}
        self.consistency_mode = consistency_mode

    def write(self, key, value):
        if self.consistency_mode == "CP":
            # Wait for majority acknowledgment before confirming
            acks = 1  # Count self
            for peer in self.peers:
                try:
                    peer.replicate(key, value, timeout=5)
                    acks += 1
                except NetworkTimeout:
                    continue
            if acks < (len(self.peers) + 1) // 2 + 1:
                raise UnavailableError("Cannot reach quorum")
            self.data[key] = value
            return "OK"
        else:  # AP mode
            # Write locally, replicate asynchronously
            self.data[key] = value
            for peer in self.peers:
                self._async_replicate(peer, key, value)
            return "OK"  # Always responds

    def read(self, key):
        if self.consistency_mode == "CP":
            # Read from quorum to ensure consistency
            values = [self.data.get(key)]
            for peer in self.peers:
                try:
                    values.append(peer.get(key, timeout=5))
                except NetworkTimeout:
                    continue
            if len(values) < (len(self.peers) + 1) // 2 + 1:
                raise UnavailableError("Cannot reach quorum")
            return self._resolve_latest(values)
        else:  # AP mode
            # Return local value immediately
            return self.data.get(key)

    def _async_replicate(self, peer, key, value):
        # Fire-and-forget; a real system would use a background queue
        try:
            peer.replicate(key, value, timeout=0)
        except NetworkTimeout:
            pass

    def _resolve_latest(self, values):
        # Placeholder: a real system would compare versions or timestamps
        return next((v for v in values if v is not None), None)
The Three Trade-off Combinations
CP Systems: Consistency Over Availability
CP systems refuse to respond rather than return potentially stale data. When a partition occurs, nodes that can’t confirm they have the latest data will return errors.
Examples: ZooKeeper, etcd, Consul, HBase
Use when: Correctness matters more than uptime. Configuration management, leader election, distributed locks.
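The quorum arithmetic behind CP systems is worth making explicit. A minimal sketch (function names are mine):

```python
def majority_quorum(n_nodes: int) -> int:
    """Smallest group size that guarantees any two quorums overlap."""
    return n_nodes // 2 + 1

def tolerated_failures(n_nodes: int) -> int:
    """Nodes that can fail while a majority quorum is still reachable."""
    return n_nodes - majority_quorum(n_nodes)

for n in (3, 5, 7):
    print(n, majority_quorum(n), tolerated_failures(n))
# A 5-node cluster needs 3 acks and keeps working through 2 failures.
```

This is why CP clusters almost always have an odd number of nodes: going from 3 to 4 nodes raises the quorum size without tolerating any additional failures.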
AP Systems: Availability Over Consistency
AP systems always respond, accepting that different nodes might temporarily have different views of the data. They rely on eventual consistency—given enough time without new updates, all nodes will converge.
Examples: Cassandra, DynamoDB, CouchDB, Riak
Use when: Uptime matters more than immediate consistency. User sessions, product catalogs, social media feeds.
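Eventual consistency can be sketched as a toy anti-entropy loop: each replica accepts writes locally and periodically exchanges state with a peer, converging once updates stop. This is a simplified model of my own (timestamps stand in for a real conflict-resolution scheme):

```python
class APNode:
    """Toy AP replica: accepts writes locally, syncs with peers later."""
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)  # always succeeds locally

    def sync_with(self, other):
        # Last-writer-wins anti-entropy: the newer timestamp replaces the older
        for key in set(self.data) | set(other.data):
            a = self.data.get(key, (-1, None))
            b = other.data.get(key, (-1, None))
            winner = max(a, b)
            self.data[key] = winner
            other.data[key] = winner

n1, n2 = APNode(), APNode()
n1.write("cart", ["book"], ts=1)         # both nodes accept writes
n2.write("cart", ["book", "pen"], ts=2)  # during a partition
n1.sync_with(n2)                         # partition heals
assert n1.data == n2.data                # replicas have converged
```

Real AP stores replace the timestamps with vector clocks or CRDTs, but the shape is the same: accept now, reconcile later.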
CA Systems: The Myth
You’ll sometimes see CA listed as an option. It’s not. A CA system would be consistent and available but not partition tolerant—meaning it would simply stop working during network issues. That’s not a distributed system; that’s a single-node database. The moment you add a second node, you must handle partitions.
Here’s how Redis behaves differently based on cluster configuration:
import redis

# CP-style configuration: Redis Cluster with WAIT
def cp_write(client, key, value):
    """
    Blocks until the write is replicated to the specified number of replicas.
    May time out and fail if replicas are unreachable.
    """
    pipe = client.pipeline()
    pipe.set(key, value)
    pipe.wait(2, 5000)  # WAIT for 2 replicas, 5s (5000ms) timeout
    results = pipe.execute()
    replicas_confirmed = results[1]
    if replicas_confirmed < 2:
        raise Exception(f"Only {replicas_confirmed} replicas confirmed")
    return True

# AP-style configuration: Fire and forget
def ap_write(client, key, value):
    """
    Returns immediately after the primary acknowledges.
    Replication happens asynchronously.
    """
    client.set(key, value)
    return True  # Always succeeds if the primary is up

# Reading with different consistency requirements
def cp_read(client, key):
    """
    Read from the primary only to ensure the latest value.
    Fails if the primary is unreachable.
    """
    return client.get(key)  # Redis reads from the primary by default

def ap_read(client, key):
    """
    Allow reading from replicas for higher availability.
    May return stale data.
    """
    client.readonly()  # Enable replica reads (cluster mode)
    return client.get(key)
Real-World System Analysis
MongoDB
MongoDB's classic w: 1 acknowledgment gives AP-leaning behavior (note that since version 5.0 the default write concern is "majority"), and you can push it toward either side through write concerns:
// AP configuration: Fast, potentially inconsistent
db.orders.insertOne(
{ orderId: "12345", status: "pending" },
{ writeConcern: { w: 1 } } // Acknowledge after primary write
)
// CP configuration: Slower, strongly consistent
db.orders.insertOne(
{ orderId: "12345", status: "pending" },
{ writeConcern: { w: "majority", j: true } } // Wait for majority + journal
)
// Read concern affects consistency too
db.orders.find({ orderId: "12345" }).readConcern("majority")
// Only returns data acknowledged by majority of nodes
PostgreSQL with Streaming Replication
PostgreSQL offers synchronous replication for CP semantics:
-- On primary: configure synchronous replication
ALTER SYSTEM SET synchronous_commit = 'on';
ALTER SYSTEM SET synchronous_standby_names = 'replica1, replica2';
SELECT pg_reload_conf();  -- Apply the changed settings

-- Transactions wait for replica acknowledgment
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- Blocks until a replica confirms
Apache Kafka
Kafka lets you tune consistency per producer:
// CP configuration
Properties cpProps = new Properties();
cpProps.put("acks", "all");  // Wait for all in-sync replicas
cpProps.put("retries", Integer.MAX_VALUE);
// Note: min.insync.replicas=2 is a broker/topic-level setting, not a
// producer setting; combined with acks=all it enforces the write quorum.

// AP configuration
Properties apProps = new Properties();
apProps.put("acks", "1");  // Only wait for the leader
apProps.put("retries", 0);
Beyond CAP: PACELC Theorem
CAP only describes behavior during partitions. But what about normal operation? The PACELC theorem extends CAP: if there’s a Partition, choose Availability or Consistency; Else, choose Latency or Consistency.
This matters because even when your network is healthy, you’re constantly trading latency for consistency. A synchronous write to three replicas takes longer than a write to one. A quorum read across data centers adds latency.
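The latency side of PACELC can be made concrete: given per-replica round-trip times, write latency is set by the slowest replica you choose to wait for. A sketch with illustrative numbers:

```python
def write_latency(replica_rtts_ms, acks_needed):
    """Latency is the ack from the k-th fastest replica, k = acks_needed."""
    return sorted(replica_rtts_ms)[acks_needed - 1]

rtts = [2, 5, 120]  # e.g. two local replicas and one in a remote region

print(write_latency(rtts, 1))  # fastest replica ack:  2 ms
print(write_latency(rtts, 2))  # majority of three:    5 ms
print(write_latency(rtts, 3))  # all replicas:       120 ms
```

Waiting for a majority shields you from one slow replica, but waiting for all replicas means every write pays for the slowest link, which is the latency cost the ELC half of PACELC describes.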
Consider this breakdown:
| System | During Partition (PAC) | Normal Operation (ELC) |
|---|---|---|
| DynamoDB | AP | EL (low latency, eventual consistency) |
| MongoDB (w: 1) | AP | EL |
| MongoDB (w:majority) | CP | EC |
| Spanner | CP | EC (pays latency cost) |
| Cassandra | AP | EL (tunable) |
Making Trade-off Decisions
Here’s my decision framework:
Choose CP when:
- Financial transactions where inconsistency means lost money
- Inventory systems where overselling is unacceptable
- Distributed locks and leader election
- Configuration that affects system behavior
Choose AP when:
- User-facing features where errors hurt engagement
- Caching layers where stale data is acceptable
- Analytics and logging where some loss is tolerable
- Shopping carts (with conflict resolution)
For AP systems, you need a conflict resolution strategy:
class ShoppingCart:
    """
    AP shopping cart with vector clock conflict resolution.
    Multiple nodes can accept concurrent updates.
    """
    def __init__(self):
        self.items = {}
        self.vector_clock = {}  # {node_id: counter}

    def add_item(self, node_id, item_id, quantity):
        # Increment this node's clock
        self.vector_clock[node_id] = self.vector_clock.get(node_id, 0) + 1
        # Add item with clock timestamp
        self.items[item_id] = {
            "quantity": quantity,
            "clock": self.vector_clock.copy()
        }

    def merge(self, other_cart):
        """
        Merge concurrent updates from another node.
        Uses vector clocks to detect and resolve conflicts.
        """
        for item_id, other_item in other_cart.items.items():
            if item_id not in self.items:
                self.items[item_id] = other_item
            else:
                my_item = self.items[item_id]
                if self._clock_greater(other_item["clock"], my_item["clock"]):
                    # Other is newer, take it
                    self.items[item_id] = other_item
                elif self._clock_concurrent(other_item["clock"], my_item["clock"]):
                    # Concurrent updates: merge by taking max quantity
                    self.items[item_id] = {
                        "quantity": max(my_item["quantity"], other_item["quantity"]),
                        "clock": self._merge_clocks(my_item["clock"], other_item["clock"])
                    }
                # else: my item is newer, keep it

    def _clock_greater(self, a, b):
        # a strictly dominates b: every counter in a >= its counter in b, a != b
        keys = set(a) | set(b)
        return a != b and all(a.get(k, 0) >= b.get(k, 0) for k in keys)

    def _clock_concurrent(self, a, b):
        # Neither clock dominates: the updates happened independently
        return a != b and not self._clock_greater(a, b) and not self._clock_greater(b, a)

    def _merge_clocks(self, a, b):
        # Element-wise maximum of the two clocks
        return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
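To see the clock comparisons in isolation: a vector clock dominates another when it has seen every update the other has, and two clocks are concurrent when neither dominates. A standalone sketch of that logic (function names are mine):

```python
def clock_greater(a, b):
    """True if vector clock a strictly dominates b."""
    keys = set(a) | set(b)
    return a != b and all(a.get(k, 0) >= b.get(k, 0) for k in keys)

def clock_concurrent(a, b):
    """True if neither clock dominates: a genuine conflict."""
    return a != b and not clock_greater(a, b) and not clock_greater(b, a)

# Node A and node B each incremented only their own counter: conflict.
assert clock_concurrent({"A": 1}, {"B": 1})

# {"A": 2, "B": 1} has seen everything in {"A": 1, "B": 1}: it is newer.
assert clock_greater({"A": 2, "B": 1}, {"A": 1, "B": 1})
```

Concurrent updates are exactly the case where the cart must apply a domain-specific rule, such as taking the larger quantity above; the clocks only tell you that a conflict exists, not how to resolve it.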
Conclusion
CAP theorem isn’t about permanently sacrificing one property—it’s about understanding what happens during the inevitable network partition. Most modern databases give you knobs to tune this trade-off per operation, letting you choose strong consistency for critical writes and high availability for reads.
When designing systems, ask these questions: What’s the cost of serving stale data? What’s the cost of being unavailable? How often do partitions actually occur in your infrastructure? Your answers should drive your database choice and configuration, not theoretical purity.
Remember that PACELC extends this thinking to normal operation. Even when everything is working, you’re trading latency for consistency on every request. Design for your actual access patterns and failure modes, not worst-case scenarios you’ll never encounter.