System Design: Caching Strategies (Write-Through, Write-Back, Write-Around)

Key Insights

  • Write-through guarantees consistency but adds latency; write-back optimizes for speed but risks data loss; write-around prevents cache pollution but causes initial read misses
  • Your caching strategy should vary by data type—use write-through for financial transactions, write-back for session data, and write-around for analytics
  • Most production systems need hybrid approaches, not a single strategy applied uniformly across all data

The Sync Problem Every System Faces

Every caching layer introduces a fundamental challenge: how do you keep two data stores in sync when writes happen? Get this wrong and you’ll face stale reads, lost writes, or both. Get it right and you’ll have a system that’s both fast and reliable.

The three main write strategies—write-through, write-back, and write-around—represent different trade-offs along the consistency-performance spectrum. There’s no universally correct answer. The right choice depends on your data’s characteristics, your tolerance for inconsistency, and how much complexity you’re willing to manage.

Let’s break down each strategy with real implementation code.

Write-Through Caching

Write-through is the conservative choice. Every write goes to both the cache and the database synchronously. The operation only succeeds when both writes complete.

This gives you strong consistency. The cache always reflects the database state. Reads are fast and trustworthy. The downside is latency—every write operation now has the overhead of two storage operations.

import redis
import psycopg2
from contextlib import contextmanager
from typing import Any, Optional

class WriteThroughCache:
    def __init__(self, redis_client: redis.Redis, db_connection):
        self.cache = redis_client
        self.db = db_connection
    
    @contextmanager
    def _transaction(self):
        cursor = self.db.cursor()
        try:
            yield cursor
            self.db.commit()
        except Exception:
            self.db.rollback()
            raise
        finally:
            cursor.close()
    
    def set(self, key: str, value: Any, table: str = "cache_data") -> bool:
        """
        Write-through: write to both cache and DB atomically.
        Cache write only succeeds if DB write succeeds.
        """
        try:
            with self._transaction() as cursor:
                # Write to database first. Note: `table` is interpolated
                # directly into the SQL, so it must never come from user input.
                cursor.execute(
                    f"""
                    INSERT INTO {table} (key, value, updated_at)
                    VALUES (%s, %s, NOW())
                    ON CONFLICT (key) DO UPDATE
                        SET value = EXCLUDED.value, updated_at = NOW()
                    """,
                    (key, value)
                )
                
                # Only update cache after the DB write succeeds
                self.cache.set(key, value)
                return True
                
        except Exception:
            # If either write failed, the transaction was rolled back.
            # Drop the cache entry defensively so the worst case is a
            # cache miss, never stale data.
            self.cache.delete(key)
            raise
    
    def get(self, key: str) -> Optional[Any]:
        """Read from cache first, fall back to DB."""
        value = self.cache.get(key)
        if value is not None:
            return value
        
        # Cache miss - fetch from DB and populate cache
        with self._transaction() as cursor:
            cursor.execute(
                "SELECT value FROM cache_data WHERE key = %s",
                (key,)
            )
            row = cursor.fetchone()
            if row:
                self.cache.set(key, row[0])
                return row[0]
        return None

The critical detail here is the order of operations. Write to the database first. If that fails, the cache stays unchanged. If the cache write fails after the database succeeds, invalidate the cache entry. This ensures you never have stale data in the cache—at worst, you have a cache miss.

Write-through works best for read-heavy workloads where consistency matters more than write latency. Think user profile data, configuration settings, or any data where serving stale values causes real problems.

Write-Back (Write-Behind) Caching

Write-back flips the trade-off. Writes go to the cache immediately and return to the caller. The database update happens asynchronously, usually batched for efficiency.

This gives you excellent write performance. The caller doesn’t wait for the slow database operation. But you’re now accepting durability risk—if the cache node dies before the async flush, you lose data.

import redis
import threading
import time
from queue import Queue, Empty
from typing import Any, List
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DirtyEntry:
    key: str
    value: Any
    timestamp: datetime

class WriteBackCache:
    def __init__(self, redis_client: redis.Redis, db_connection, 
                 flush_interval: int = 5, batch_size: int = 100):
        self.cache = redis_client
        self.db = db_connection
        self.dirty_queue: Queue[DirtyEntry] = Queue()
        self.flush_interval = flush_interval
        self.batch_size = batch_size
        self._start_flush_worker()
    
    def _start_flush_worker(self):
        """Background worker that flushes dirty entries to DB."""
        def flush_loop():
            while True:
                time.sleep(self.flush_interval)
                try:
                    self._flush_batch()
                except Exception:
                    # A failed flush re-queues its batch; keep the worker
                    # alive so it can retry on the next interval.
                    pass
        
        worker = threading.Thread(target=flush_loop, daemon=True)
        worker.start()
    
    def _flush_batch(self):
        """Batch flush dirty entries to database."""
        batch: List[DirtyEntry] = []
        
        while not self.dirty_queue.empty() and len(batch) < self.batch_size:
            try:
                entry = self.dirty_queue.get_nowait()
                batch.append(entry)
            except Empty:
                break
        
        if not batch:
            return
        
        cursor = self.db.cursor()
        try:
            # Batch insert/update for efficiency
            values = [(e.key, e.value, e.timestamp) for e in batch]
            cursor.executemany(
                """
                INSERT INTO cache_data (key, value, updated_at)
                VALUES (%s, %s, %s)
                ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, 
                                                 updated_at = EXCLUDED.updated_at
                """,
                values
            )
            self.db.commit()
            
            # Mark entries as clean in cache metadata
            for entry in batch:
                self.cache.srem("dirty_keys", entry.key)
                
        except Exception:
            self.db.rollback()
            # Re-queue failed entries so the next flush retries them
            for entry in batch:
                self.dirty_queue.put(entry)
            raise
        finally:
            cursor.close()
    
    def set(self, key: str, value: Any) -> bool:
        """Write to cache immediately, queue for async DB write."""
        # Write to cache
        self.cache.set(key, value)
        
        # Track dirty state
        self.cache.sadd("dirty_keys", key)
        
        # Queue for async persistence
        self.dirty_queue.put(DirtyEntry(
            key=key,
            value=value,
            timestamp=datetime.now()
        ))
        
        return True
    
    def get(self, key: str) -> Any:
        """Read from cache - it's always the source of truth."""
        return self.cache.get(key)
    
    def force_flush(self):
        """Force immediate flush of all dirty entries."""
        while not self.dirty_queue.empty():
            self._flush_batch()

The dirty key tracking is essential. You need to know which cache entries haven’t been persisted yet. If a cache node fails, you can use this set to understand what data might be at risk.
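At startup or after a failover, a recovery check can read that set to see which writes may not have reached the database. Against a real client the same `sadd`/`srem`/`smembers` calls apply; the `FakeRedisSets` class below is only a stand-in so the sketch runs without a live server:

```python
class FakeRedisSets:
    """Tiny in-memory stand-in for the Redis set commands used below,
    so this sketch runs without a live Redis server."""
    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        self._sets.setdefault(name, set()).update(values)

    def srem(self, name, *values):
        self._sets.get(name, set()).difference_update(values)

    def smembers(self, name):
        return set(self._sets.get(name, set()))


def at_risk_keys(cache) -> set:
    """Keys written to the cache but not yet flushed to the database.
    After a cache-node restart, anything still in 'dirty_keys' may be
    lost or may need re-verification against the database."""
    return cache.smembers("dirty_keys")


cache = FakeRedisSets()
cache.sadd("dirty_keys", "session:42", "session:99")
cache.srem("dirty_keys", "session:99")   # this one flushed successfully
print(sorted(at_risk_keys(cache)))       # ['session:42']
```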

Write-back shines for high-throughput write scenarios where you can tolerate some data loss. Session data, analytics events, and real-time metrics are good candidates. Financial transactions are not.

Write-Around Caching

Write-around takes a different approach entirely. Writes go directly to the database, bypassing the cache completely. The cache only gets populated on subsequent reads.

This prevents cache pollution—data that’s written but rarely read never consumes cache space. The trade-off is that immediately after a write, the next read will be a cache miss.

import redis
from typing import Any, Optional

class WriteAroundCache:
    def __init__(self, redis_client: redis.Redis, db_connection, 
                 cache_ttl: int = 3600):
        self.cache = redis_client
        self.db = db_connection
        self.cache_ttl = cache_ttl
    
    def set(self, key: str, value: Any, table: str = "cache_data") -> bool:
        """Write directly to database, skip cache entirely."""
        cursor = self.db.cursor()
        try:
            # `table` is interpolated directly into the SQL, so it must
            # never come from user input.
            cursor.execute(
                f"""
                INSERT INTO {table} (key, value, updated_at)
                VALUES (%s, %s, NOW())
                ON CONFLICT (key) DO UPDATE
                    SET value = EXCLUDED.value, updated_at = NOW()
                """,
                (key, value)
            )
            self.db.commit()
            
            # Invalidate any cached copy so reads never see stale data
            self.cache.delete(key)
            
            return True
        except Exception:
            self.db.rollback()
            raise
        finally:
            cursor.close()
    
    def get(self, key: str) -> Optional[Any]:
        """
        Read from cache first. On miss, fetch from DB and populate cache.
        This is where cache gets populated - lazy loading.
        """
        # Try cache first
        value = self.cache.get(key)
        if value is not None:
            return value
        
        # Cache miss - fetch from database
        cursor = self.db.cursor()
        try:
            cursor.execute(
                "SELECT value FROM cache_data WHERE key = %s",
                (key,)
            )
            row = cursor.fetchone()
            
            if row:
                # Populate cache for future reads
                self.cache.setex(key, self.cache_ttl, row[0])
                return row[0]
            
            return None
        finally:
            cursor.close()

The cache invalidation on write is crucial. Without it, subsequent reads would return stale cached data until the TTL expires.
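A toy sequence with plain dicts standing in for the cache and the database makes the failure mode concrete:

```python
cache, db = {}, {}

# Initial state: value cached by an earlier read
db["price:sku1"] = 100
cache["price:sku1"] = 100

# Write-around update WITHOUT invalidation: only the DB changes
db["price:sku1"] = 120

# A cache-first read now serves the stale value
stale_read = cache.get("price:sku1", db["price:sku1"])
assert stale_read == 100  # stale!

# Same update WITH invalidation: drop the cached copy on write
db["price:sku1"] = 130
cache.pop("price:sku1", None)

fresh_read = cache.get("price:sku1", db["price:sku1"])
assert fresh_read == 130  # falls through to the DB
```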

Write-around is ideal for write-heavy workloads where most data is written once and rarely read. Log ingestion, audit trails, and bulk data imports are perfect use cases.

Comparison Matrix and Decision Framework

Strategy        Consistency   Write Latency   Read Latency   Durability   Complexity
Write-Through   Strong        High            Low            High         Low
Write-Back      Eventual      Low             Low            Medium       High
Write-Around    Strong        Medium          Variable       High         Low

Choose write-through when: Data consistency is critical, write volume is moderate, and you can tolerate higher write latency.

Choose write-back when: Write performance is critical, you can tolerate eventual consistency, and you have infrastructure to handle cache failures gracefully.

Choose write-around when: Write volume is high, most data is rarely re-read, and you want to prevent cache pollution.

Hybrid Approaches for Production Systems

Real systems rarely use a single strategy. Different data types have different requirements.

from typing import Any

class HybridCache:
    """
    Uses different strategies based on data classification.
    - Critical data: write-through (user balances, orders)
    - Session data: write-back (user preferences, cart)
    - Analytics: write-around (page views, click events)
    """
    
    def __init__(self, redis_client, db_connection):
        self.write_through = WriteThroughCache(redis_client, db_connection)
        self.write_back = WriteBackCache(redis_client, db_connection)
        self.write_around = WriteAroundCache(redis_client, db_connection)
        
        self.strategy_map = {
            "user_balance": "write_through",
            "order": "write_through",
            "session": "write_back",
            "preferences": "write_back",
            "page_view": "write_around",
            "analytics": "write_around",
        }
    
    def _get_strategy(self, data_type: str):
        strategy_name = self.strategy_map.get(data_type, "write_through")
        return getattr(self, strategy_name)
    
    def set(self, key: str, value: Any, data_type: str) -> bool:
        strategy = self._get_strategy(data_type)
        return strategy.set(key, value)
    
    def get(self, key: str, data_type: str) -> Any:
        strategy = self._get_strategy(data_type)
        return strategy.get(key)

Key Takeaways

The thundering herd problem hits write-around hardest. When a popular item gets written, every subsequent read becomes a cache miss until one request populates the cache. Use request coalescing or cache warming to mitigate this.
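One mitigation is request coalescing: when several readers miss on the same key at once, let one of them fetch from the database while the rest wait for its result. A minimal thread-based sketch (the `load_fn` callable and the memoized `_results` dict are simplifications for the demo; in production the result would go into the shared cache with a TTL):

```python
import threading

class CoalescedLoader:
    """Collapse concurrent cache-miss loads for the same key into a
    single backing-store fetch."""

    def __init__(self, load_fn):
        self.load_fn = load_fn
        self._lock = threading.Lock()
        self._in_flight = {}   # key -> Event that followers wait on
        self._results = {}

    def get(self, key):
        while True:
            with self._lock:
                if key in self._results:
                    return self._results[key]
                event = self._in_flight.get(key)
                if event is None:
                    # First requester for this key becomes the leader
                    event = threading.Event()
                    self._in_flight[key] = event
                    break
            # Follower: wait for the leader, then re-check the result
            event.wait()

        try:
            result = self.load_fn(key)
            with self._lock:
                self._results[key] = result
        finally:
            with self._lock:
                del self._in_flight[key]
            event.set()
        return result

# Ten concurrent misses for the same hot key trigger a single load
calls = []
loader = CoalescedLoader(lambda k: calls.append(k) or f"value-for-{k}")
threads = [threading.Thread(target=loader.get, args=("hot-key",))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
```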

Write-back requires robust failure handling. Your async flush worker needs retry logic, dead letter queues, and monitoring for queue depth. If the queue backs up, you’re accumulating durability risk.
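One way to sketch that failure handling: cap the retry count per entry and move anything that keeps failing into a dead-letter list you can alert on. The `flush_with_retry` and `flaky_write` names here are illustrative, not part of the classes above:

```python
from dataclasses import dataclass

@dataclass
class RetryEntry:
    key: str
    value: object
    attempts: int = 0

MAX_ATTEMPTS = 3  # illustrative threshold

def flush_with_retry(batch, write_fn, retry_queue, dead_letter):
    """Persist each entry; re-queue transient failures and park entries
    that keep failing in a dead-letter list for manual inspection."""
    for entry in batch:
        try:
            write_fn(entry.key, entry.value)
        except Exception:
            entry.attempts += 1
            if entry.attempts >= MAX_ATTEMPTS:
                dead_letter.append(entry)   # give up; alert on this
            else:
                retry_queue.append(entry)   # retry on the next flush cycle

def flaky_write(key, value):
    """Stand-in writer that always fails for one row."""
    if key == "bad":
        raise IOError("db unavailable for this row")

dead_letter = []
pending = [RetryEntry("ok", 1), RetryEntry("bad", 2)]
for _ in range(MAX_ATTEMPTS):          # drive several flush cycles
    next_round = []
    flush_with_retry(pending, flaky_write, next_round, dead_letter)
    pending = next_round

print([e.key for e in dead_letter])  # ['bad']
```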

Don’t optimize prematurely. Start with write-through for everything. It’s the simplest to reason about and debug. Only move to write-back or write-around when you have measured evidence that write latency or cache pollution is actually a problem.

Monitor cache-database drift. Even with write-through, network partitions and partial failures can cause inconsistencies. Implement periodic reconciliation jobs that compare cache and database state for critical data.
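A reconciliation pass can be as simple as diffing point-in-time snapshots; in practice you would page through keys in batches and re-fetch disagreements before alerting, since a value may change legitimately between the two reads. A minimal sketch with plain dicts standing in for the two stores:

```python
def find_drift(cache_snapshot: dict, db_snapshot: dict) -> dict:
    """Compare snapshots of cache and database for a set of critical
    keys and report mismatches. Keys absent from the cache are fine
    (just a miss); keys whose values differ are drift."""
    drift = {}
    for key, db_value in db_snapshot.items():
        cached = cache_snapshot.get(key)
        if cached is not None and cached != db_value:
            drift[key] = {"cache": cached, "db": db_value}
    return drift

cache_snapshot = {"user:1": "alice", "user:2": "bob-stale"}
db_snapshot = {"user:1": "alice", "user:2": "bob", "user:3": "carol"}

print(find_drift(cache_snapshot, db_snapshot))
# {'user:2': {'cache': 'bob-stale', 'db': 'bob'}}
```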
