# System Design: Idempotency in Distributed Systems
Idempotency means that performing an operation multiple times produces the same result as performing it once. In distributed systems, this property isn't a nice-to-have—it's essential for correctness.
## Key Insights
- Idempotency keys combined with atomic storage operations are the foundation for preventing duplicate processing in distributed systems—without them, network retries will eventually cause data corruption.
- The hardest part isn’t storing idempotency state; it’s handling concurrent duplicate requests that arrive within milliseconds of each other, requiring distributed locking patterns.
- Idempotency must be designed into your operations from the start—retrofitting it onto non-idempotent workflows requires state machines and careful consideration of partial failure scenarios.
## Why Idempotency Matters
Consider what happens when a payment API times out. Did the payment succeed? The client doesn’t know. The rational response is to retry, but without idempotency, that retry might charge the customer twice. I’ve seen this happen in production systems, and the resulting customer support tickets and refund processes are painful.
The fundamental problem is that networks are unreliable. TCP connections drop. Load balancers time out. Services restart mid-request. When an acknowledgment is lost, the sender cannot know whether the operation actually ran, which is why true “exactly-once” delivery is impossible in practice. What we can achieve is “at-least-once” delivery with idempotent processing—which gives us the same practical outcome.
## Idempotency Keys: The Foundation
An idempotency key is a unique identifier that clients attach to requests, allowing servers to recognize and deduplicate retries. The key should be unique per logical operation, not per HTTP request.
There are two approaches to generating these keys:
Client-generated keys (typically UUIDs) give clients full control and work well for user-initiated actions. The client generates a key when the user clicks “Submit” and reuses it for all retries of that specific action.
Deterministic keys are computed from request content using hashing. This works when the request payload itself defines uniqueness—for example, hashing the combination of user ID, amount, and merchant ID for a payment.
Here’s a basic API endpoint that validates idempotency keys:
```python
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel
from typing import Optional
import hashlib

app = FastAPI()

class PaymentRequest(BaseModel):
    user_id: str
    amount: int
    currency: str
    merchant_id: str

def generate_deterministic_key(request: PaymentRequest) -> str:
    """Generate idempotency key from request content."""
    content = f"{request.user_id}:{request.amount}:{request.currency}:{request.merchant_id}"
    return hashlib.sha256(content.encode()).hexdigest()[:32]

@app.post("/payments")
async def create_payment(
    request: PaymentRequest,
    idempotency_key: Optional[str] = Header(None, alias="Idempotency-Key")
):
    if not idempotency_key:
        # Fall back to deterministic key generation
        idempotency_key = generate_deterministic_key(request)
    if len(idempotency_key) < 16 or len(idempotency_key) > 64:
        raise HTTPException(
            status_code=400,
            detail="Idempotency-Key must be between 16 and 64 characters"
        )
    # Process with idempotency key...
    return {"idempotency_key": idempotency_key, "status": "processing"}
```
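On the client side, the key must be generated once per logical action and then reused for every retry of that action. A minimal sketch of that retry discipline (the `send` callable and its timeout behavior are illustrative assumptions, not part of the API above):

```python
import uuid

def submit_with_retries(send, payload, max_attempts=3):
    """Generate ONE idempotency key per logical action and reuse it
    for every retry, so the server can deduplicate."""
    key = str(uuid.uuid4())  # generated once, when the user clicks "Submit"
    for _ in range(max_attempts):
        try:
            return send(payload, headers={"Idempotency-Key": key})
        except TimeoutError:
            continue  # retry with the SAME key
    raise RuntimeError("all attempts timed out")

# Simulated transport (assumption for illustration): times out once,
# then succeeds, recording every key it saw.
seen_keys = []
attempts = {"n": 0}

def fake_send(payload, headers):
    seen_keys.append(headers["Idempotency-Key"])
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise TimeoutError
    return {"status": "ok"}

result = submit_with_retries(fake_send, {"amount": 100})
```

The server sees two requests carrying the same key, so the retry is recognized as a duplicate rather than a second charge.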
## Storage Strategies for Idempotency State
You need to store idempotency state somewhere. The choice depends on your durability requirements and scale.
In-memory storage is fast but doesn’t survive restarts and doesn’t work across multiple server instances. Only use this for development or single-instance deployments.
Redis is the sweet spot for most applications. It’s fast, supports atomic operations, handles TTL expiration automatically, and is shared across all your application instances.
Database storage provides durability and transactional guarantees. Use this when idempotency state must survive Redis failures or when you need to query historical idempotency data.
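With a database, a common pattern is to put a UNIQUE constraint on the key so the insert itself is the atomic claim. A sketch using SQLite (table name and columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_records (
        idempotency_key TEXT PRIMARY KEY,  -- uniqueness IS the claim
        status TEXT NOT NULL,
        response TEXT
    )
""")

def claim(key: str) -> bool:
    """True if this request claimed the key; False if a duplicate already did."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO idempotency_records (idempotency_key, status) "
        "VALUES (?, 'processing')",
        (key,),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows inserted means the key already existed

first = claim("payment-123")   # claims the key
second = claim("payment-123")  # duplicate; the insert is ignored
```

Because the constraint is enforced by the database, this stays correct even when two application instances race on the same key.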
Here’s a Redis-based middleware implementation:
```python
import redis
import json
import time
from typing import Any

class IdempotencyMiddleware:
    def __init__(self, redis_client: redis.Redis, ttl_seconds: int = 86400):
        self.redis = redis_client
        self.ttl = ttl_seconds

    def _key(self, idempotency_key: str) -> str:
        return f"idempotency:{idempotency_key}"

    def check_and_set(self, idempotency_key: str) -> tuple[bool, Any]:
        """
        Returns (is_duplicate, cached_response).
        Uses atomic operations to prevent race conditions.
        """
        key = self._key(idempotency_key)
        # Try to get existing response
        cached = self.redis.get(key)
        if cached:
            data = json.loads(cached)
            if data.get("status") == "completed":
                return True, data.get("response")
            elif data.get("status") == "processing":
                # Request is in flight - caller should wait or retry
                return True, None
        # Atomically set "processing" status
        # NX = only set if not exists
        processing_data = json.dumps({
            "status": "processing",
            "started_at": time.time()
        })
        was_set = self.redis.set(key, processing_data, nx=True, ex=self.ttl)
        if not was_set:
            # Another request beat us - recheck
            return self.check_and_set(idempotency_key)
        return False, None

    def store_response(self, idempotency_key: str, response: Any):
        """Store the completed response."""
        key = self._key(idempotency_key)
        completed_data = json.dumps({
            "status": "completed",
            "response": response,
            "completed_at": time.time()
        })
        self.redis.set(key, completed_data, ex=self.ttl)
```
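The correctness of `check_and_set` hinges on `SET NX` being a single atomic claim. A stripped-down, in-memory analogue makes the state transitions visible (a plain dict stands in for Redis here; that substitution is an assumption for illustration only):

```python
import json

store = {}  # stands in for Redis

def set_nx(key, value):
    """Atomic 'set if not exists', like Redis SET ... NX."""
    if key in store:
        return False
    store[key] = value
    return True

def check_and_set(key):
    cached = store.get(key)
    if cached:
        data = json.loads(cached)
        if data["status"] == "completed":
            return True, data["response"]
        return True, None  # request still in flight
    if not set_nx(key, json.dumps({"status": "processing"})):
        return check_and_set(key)  # lost the race; re-read the state
    return False, None

first = check_and_set("k1")    # claims the key
second = check_and_set("k1")   # duplicate while in flight
store["k1"] = json.dumps({"status": "completed", "response": {"ok": True}})
third = check_and_set("k1")    # returns the cached response
```

The three return values trace the lifecycle: claim, in-flight duplicate, then cached replay after completion.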
## Handling Concurrent Duplicate Requests
The trickiest scenario is when two identical requests arrive within milliseconds of each other—before either has completed processing. Without proper handling, both might execute the operation.
Distributed locking solves this. The first request acquires a lock, processes the operation, and stores the result. Concurrent requests either wait for the lock or return immediately with a “request in progress” response.
```python
import redis
import time
from contextlib import contextmanager
from typing import Any, Callable

class DistributedLock:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    @contextmanager
    def acquire(
        self,
        lock_key: str,
        timeout_seconds: int = 30,
        retry_interval: float = 0.1,
        max_retries: int = 50
    ):
        """
        Acquire a single-key Redis lock with automatic expiration.
        Uses SET NX plus a token-checked release for safety.
        """
        lock_value = f"{time.time()}:{id(self)}"
        full_key = f"lock:{lock_key}"
        acquired = False
        for _ in range(max_retries):
            # SET NX with expiration - atomic acquire
            acquired = self.redis.set(
                full_key,
                lock_value,
                nx=True,
                ex=timeout_seconds
            )
            if acquired:
                break
            time.sleep(retry_interval)
        if not acquired:
            raise TimeoutError(f"Could not acquire lock for {lock_key}")
        try:
            yield
        finally:
            # Only release if we still own the lock
            # Use Lua script for atomic check-and-delete
            release_script = """
            if redis.call("get", KEYS[1]) == ARGV[1] then
                return redis.call("del", KEYS[1])
            else
                return 0
            end
            """
            self.redis.eval(release_script, 1, full_key, lock_value)

class IdempotentProcessor:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client
        self.lock = DistributedLock(redis_client)
        self.middleware = IdempotencyMiddleware(redis_client)

    def process(self, idempotency_key: str, operation: Callable) -> Any:
        # First, quick check without locking
        is_duplicate, cached = self.middleware.check_and_set(idempotency_key)
        if is_duplicate and cached:
            return cached
        # Acquire lock for processing
        with self.lock.acquire(idempotency_key):
            # Double-check after acquiring lock
            is_duplicate, cached = self.middleware.check_and_set(idempotency_key)
            if is_duplicate and cached:
                return cached
            # Execute the actual operation
            result = operation()
            # Store result
            self.middleware.store_response(idempotency_key, result)
            return result
```
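The same first-check / lock / double-check shape can be exercised in a single process, with a `threading.Lock` standing in for the Redis lock. This is an in-process analogue for demonstration, not the distributed version:

```python
import threading

class InMemoryIdempotentProcessor:
    """Single-process analogue of the check/lock/double-check pattern.
    (Assumption: a dict and threading.Lock stand in for Redis state
    and the distributed lock, to make the pattern testable locally.)"""
    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def process(self, key, operation):
        if key in self._results:          # fast path, no lock
            return self._results[key]
        with self._lock:
            if key in self._results:      # double-check after acquiring lock
                return self._results[key]
            result = operation()          # runs at most once per key
            self._results[key] = result
            return result

count = 0
def charge():
    global count
    count += 1
    return {"charged": True}

p = InMemoryIdempotentProcessor()
threads = [threading.Thread(target=p.process, args=("k1", charge)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# ten concurrent duplicates, but charge() executed exactly once
```

The double-check after acquiring the lock is the essential step: without it, two requests that both miss the fast path would each execute the operation once they take turns holding the lock.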
## Designing Idempotent Operations
Some operations are naturally idempotent. HTTP PUT (replace resource) and DELETE (remove resource) can be called multiple times safely. POST (create resource) is not—calling it twice creates two resources.
For non-idempotent operations, use state machines with conditional updates:
```python
from enum import Enum
from dataclasses import dataclass
from typing import Optional
import uuid

class PaymentFailedError(Exception):
    def __init__(self, payment):
        self.payment = payment

class PaymentGatewayError(Exception):
    pass

class PaymentStatus(Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class Payment:
    id: str
    idempotency_key: str
    amount: int
    status: PaymentStatus
    external_transaction_id: Optional[str] = None

class PaymentService:
    def __init__(self, db, payment_gateway):
        self.db = db
        self.gateway = payment_gateway

    def process_payment(self, idempotency_key: str, amount: int) -> Payment:
        # Check for existing payment with this idempotency key
        existing = self.db.find_payment_by_idempotency_key(idempotency_key)
        if existing:
            if existing.status == PaymentStatus.COMPLETED:
                return existing  # Already done, return cached result
            elif existing.status == PaymentStatus.FAILED:
                # Could allow retry of failed payments
                raise PaymentFailedError(existing)
            # Status is PENDING or PROCESSING - continue below
            payment = existing
        else:
            # Create new payment record
            payment = Payment(
                id=str(uuid.uuid4()),
                idempotency_key=idempotency_key,
                amount=amount,
                status=PaymentStatus.PENDING
            )
            self.db.insert_payment(payment)
        # Atomic status transition: PENDING -> PROCESSING
        updated = self.db.update_payment_status(
            payment_id=payment.id,
            expected_status=PaymentStatus.PENDING,
            new_status=PaymentStatus.PROCESSING
        )
        if not updated:
            # Another process is handling this - fetch and return
            return self.db.get_payment(payment.id)
        try:
            # Call external payment gateway
            result = self.gateway.charge(amount, reference=payment.id)
            # Atomic transition: PROCESSING -> COMPLETED
            self.db.update_payment_completed(
                payment_id=payment.id,
                external_transaction_id=result.transaction_id
            )
            payment.status = PaymentStatus.COMPLETED
            payment.external_transaction_id = result.transaction_id
        except PaymentGatewayError:
            self.db.update_payment_status(
                payment_id=payment.id,
                expected_status=PaymentStatus.PROCESSING,
                new_status=PaymentStatus.FAILED
            )
            raise
        return payment
```
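A conditional-update method like `update_payment_status` is typically implemented as a compare-and-swap: the expected state goes into the WHERE clause, and the row count tells you whether you won the transition. A SQLite sketch (the schema here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO payments VALUES ('p1', 'pending')")
conn.commit()

def transition(payment_id: str, expected: str, new: str) -> bool:
    """Atomic status transition: succeeds only if the row is still in the
    expected state (compare-and-swap via the WHERE clause)."""
    cur = conn.execute(
        "UPDATE payments SET status = ? WHERE id = ? AND status = ?",
        (new, payment_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means someone else transitioned first

won = transition("p1", "pending", "processing")   # first caller wins
lost = transition("p1", "pending", "processing")  # no matching row remains
```

Because the state check and the write happen in one statement, two processes can never both move the same payment from PENDING to PROCESSING.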
## Idempotency Across Service Boundaries
In microservice architectures, a single user request might trigger calls to multiple downstream services. Each service needs to handle idempotency, and you need to propagate idempotency keys through the chain.
```python
import httpx
from typing import Any, Dict

class ServiceClient:
    def __init__(self, base_url: str):
        self.base_url = base_url
        self.client = httpx.Client(timeout=30.0)

    def call_with_idempotency(
        self,
        method: str,
        path: str,
        parent_idempotency_key: str,
        operation_name: str,
        **kwargs
    ) -> Dict[Any, Any]:
        """
        Propagate idempotency through service chain.
        Derives child key from parent to maintain request lineage.
        """
        # Derive a unique key for this specific downstream call
        child_key = f"{parent_idempotency_key}:{operation_name}"
        headers = kwargs.pop("headers", {})
        headers["Idempotency-Key"] = child_key
        headers["X-Parent-Idempotency-Key"] = parent_idempotency_key
        response = self.client.request(
            method,
            f"{self.base_url}{path}",
            headers=headers,
            **kwargs
        )
        response.raise_for_status()
        return response.json()

class OrderService:
    def __init__(self, payment_client: ServiceClient, inventory_client: ServiceClient):
        self.payments = payment_client
        self.inventory = inventory_client

    def create_order(self, idempotency_key: str, order_data: dict) -> dict:
        # Reserve inventory - idempotent with derived key
        inventory_result = self.inventory.call_with_idempotency(
            "POST",
            "/reservations",
            parent_idempotency_key=idempotency_key,
            operation_name="reserve_inventory",
            json={"items": order_data["items"]}
        )
        # Process payment - idempotent with derived key
        payment_result = self.payments.call_with_idempotency(
            "POST",
            "/charges",
            parent_idempotency_key=idempotency_key,
            operation_name="charge_payment",
            json={"amount": order_data["total"]}
        )
        return {
            "order_id": idempotency_key,
            "inventory": inventory_result,
            "payment": payment_result
        }
```
For message queues, most platforms provide built-in deduplication. SQS content-based deduplication uses message body hashing. Kafka supports idempotent producers. Use these features—don’t reinvent them.
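Content-based deduplication boils down to hashing the message body into a dedup ID, in the spirit of what SQS FIFO queues do server-side. A minimal consumer-side sketch of the same idea (the `consume` function and its return values are illustrative):

```python
import hashlib

def dedup_id(body: str) -> str:
    # Identical bodies collapse to the same ID, so redeliveries are detectable
    return hashlib.sha256(body.encode()).hexdigest()

seen: set[str] = set()

def consume(body: str) -> str:
    d = dedup_id(body)
    if d in seen:
        return "duplicate"  # redelivery of a message we already processed
    seen.add(d)
    return "processed"

first = consume('{"order": 42}')
second = consume('{"order": 42}')  # same body redelivered
```

Note the limitation this inherits: two genuinely distinct messages with identical bodies also collapse, which is why explicit deduplication IDs are preferable when payloads can legitimately repeat.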
## Production Considerations
Key expiration: Set TTLs based on your retry window. 24 hours is reasonable for most APIs. Too short, and legitimate retries fail. Too long, and you waste storage.
Monitoring: Track duplicate detection rates. A sudden spike might indicate client bugs, network issues, or an attack. Alert on anomalies.
Testing: Use chaos engineering to verify idempotency works under failure. Kill services mid-request. Introduce network partitions. Replay requests from logs.
Common pitfalls: Non-deterministic responses break idempotency if clients expect identical responses. Timestamps, random IDs in responses, or different error messages for the same cached request will confuse clients. Partial failures are harder—if step 2 of 3 fails, ensure retries don’t re-execute step 1.
Idempotency isn’t optional in distributed systems. Build it in from the start, test it aggressively, and monitor it in production. Your future self—and your customers—will thank you.