System Design: CDN Architecture and Caching

Key Insights

  • CDN architecture is fundamentally about trading consistency for latency—understanding this trade-off helps you make better caching decisions and avoid the common pitfall of over-caching dynamic content or under-caching static assets.
  • Cache invalidation isn’t just a technical problem; it’s an organizational one. The best invalidation strategy depends on your deployment pipeline, content update frequency, and tolerance for stale data.
  • Edge computing has moved beyond static content delivery. Modern CDNs can execute code at the edge, enabling personalization, A/B testing, and API responses without round-trips to your origin.

Introduction to CDN Architecture

Content Delivery Networks solve a fundamental physics problem: the speed of light is finite, and your users are scattered across the globe. A request from Tokyo to a server in Virginia takes roughly 150ms just for the round trip—before your server even processes anything.

CDNs work by caching content at edge nodes distributed globally. These nodes live in Points of Presence (PoPs)—data centers strategically placed in major metropolitan areas. When a user requests content, they hit the nearest edge node instead of your origin server.

The business case is straightforward: faster sites convert better. Amazon found that every 100ms of latency costs 1% in sales. Google uses page speed as a ranking factor. Beyond performance, CDNs provide DDoS protection, reduce origin server load, and improve availability through redundancy.

But CDNs aren’t magic. They introduce complexity around cache invalidation, debugging, and consistency. Understanding the architecture helps you make informed trade-offs.

CDN Topology and Request Flow

When a user types your URL, the request doesn’t go directly to your server. Here’s what actually happens:

User Request Flow:
1. Browser resolves cdn.example.com via DNS
2. DNS returns IP of nearest edge node (via GeoDNS or Anycast)
3. Browser connects to edge node
4. Edge checks local cache
   → Cache HIT: Return cached response
   → Cache MISS: Forward to regional tier or origin
5. Response cached at edge for future requests

Most CDNs use DNS-based routing or Anycast to direct users to the nearest edge. DNS-based routing returns different IP addresses based on the resolver’s location. Anycast advertises the same IP from multiple locations, letting BGP routing find the shortest path.
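
The nearest-edge lookup behind GeoDNS can be sketched as a great-circle-distance calculation. A minimal illustration in Python (the PoP names and coordinates here are made up for the example; real GeoDNS also weighs node load and network topology, not just distance):

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical PoP locations as (lat, lon) -- illustrative only
POPS = {
    "tokyo": (35.68, 139.69),
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def nearest_pop(client_location):
    """Return the PoP closest to the client, as GeoDNS would."""
    return min(POPS, key=lambda name: haversine_km(client_location, POPS[name]))

# A resolver in Osaka gets routed to the Tokyo PoP
print(nearest_pop((34.69, 135.50)))  # tokyo
```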

The cache hierarchy typically has three tiers:

class CDNRequestRouter:
    def route_request(self, request, edge_node):
        # Level 1: Edge cache (closest to user)
        cached = edge_node.cache.get(request.cache_key)
        if cached and not cached.is_stale():
            return CacheResponse(cached, hit_level="edge")
        
        # Level 2: Regional/Shield cache (reduces origin load)
        regional = self.get_regional_node(edge_node)
        cached = regional.cache.get(request.cache_key)
        if cached and not cached.is_stale():
            edge_node.cache.set(request.cache_key, cached)
            return CacheResponse(cached, hit_level="regional")
        
        # Level 3: Origin fetch
        response = self.fetch_from_origin(request)
        regional.cache.set(request.cache_key, response)
        edge_node.cache.set(request.cache_key, response)
        return CacheResponse(response, hit_level="origin")

The regional tier (sometimes called “origin shield”) is crucial for reducing origin load. Without it, a cache miss at 50 edge nodes means 50 origin requests. With a regional shield, it’s one request to origin, then regional-to-edge distribution.
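
The shield's effect can be shown with a toy model: fifty cold edge caches all miss, but only one origin fetch happens (in-memory dicts stand in for real cache nodes here; a production shield also coalesces concurrent misses, which this sketch ignores):

```python
class ToyCache:
    """Minimal stand-in for an edge or shield cache node."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value):
        self.store[key] = value

origin_requests = 0

def fetch_from_origin(key):
    global origin_requests
    origin_requests += 1
    return f"content-for-{key}"

def request_via_shield(edge, shield, key):
    """Edge miss falls back to shield; shield miss goes to origin once."""
    if (value := edge.get(key)) is not None:
        return value
    if (value := shield.get(key)) is None:
        value = fetch_from_origin(key)
        shield.set(key, value)
    edge.set(key, value)
    return value

shield = ToyCache()
edges = [ToyCache() for _ in range(50)]
for edge in edges:          # 50 cold edges all miss...
    request_via_shield(edge, shield, "/index.html")
print(origin_requests)      # 1 -- the shield absorbed the other 49
```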

Caching Strategies and Cache Hierarchies

Cache behavior is controlled primarily through HTTP headers. Understanding these is non-negotiable for CDN work:

# Aggressive caching for static assets (1 year)
Cache-Control: public, max-age=31536000, immutable

# Short cache with revalidation for semi-dynamic content
Cache-Control: public, max-age=300, stale-while-revalidate=60

# No caching for sensitive/dynamic content
Cache-Control: private, no-store

# Cache but always revalidate (good for HTML pages)
Cache-Control: public, no-cache

The stale-while-revalidate directive is particularly powerful. It tells the CDN: “Serve stale content immediately, but fetch fresh content in the background.” Users get instant responses while content stays reasonably fresh.
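
The directive's decision logic can be sketched in a few lines of Python (a simplified model based on the object's age; real CDNs layer on stale-if-error handling and request collapsing):

```python
def swr_decision(age, max_age, swr_window):
    """Classify a cached object's age under stale-while-revalidate.

    Returns one of:
      'fresh'      -- serve from cache, no origin contact
      'stale+swr'  -- serve stale now, revalidate in the background
      'miss'       -- too stale even for SWR; fetch synchronously
    """
    if age <= max_age:
        return "fresh"
    if age <= max_age + swr_window:
        return "stale+swr"
    return "miss"

# Cache-Control: max-age=300, stale-while-revalidate=60
print(swr_decision(120, 300, 60))  # fresh
print(swr_decision(330, 300, 60))  # stale+swr
print(swr_decision(400, 300, 60))  # miss
```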

Here’s a practical Cloudflare Workers configuration that implements tiered caching logic:

// Cloudflare Worker: Custom caching logic
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    
    // Define caching rules by path
    const cacheRules = {
      '/api/': { ttl: 0, cacheability: 'private' },
      '/static/': { ttl: 31536000, cacheability: 'public' },
      '/images/': { ttl: 86400, cacheability: 'public' },
      '/': { ttl: 300, cacheability: 'public', swr: 60 }
    };
    
    // Find matching rule
    const rule = Object.entries(cacheRules)
      .find(([path]) => url.pathname.startsWith(path))?.[1] 
      || { ttl: 3600, cacheability: 'public' };
    
    // Check cache first
    const cacheKey = new Request(url.toString(), request);
    const cache = caches.default;
    let response = await cache.match(cacheKey);
    
    if (!response) {
      response = await fetch(request);
      response = new Response(response.body, response);
      
      // Set cache headers based on rules
      let cacheControl = `${rule.cacheability}, max-age=${rule.ttl}`;
      if (rule.swr) {
        cacheControl += `, stale-while-revalidate=${rule.swr}`;
      }
      response.headers.set('Cache-Control', cacheControl);
      
      if (rule.ttl > 0) {
        await cache.put(cacheKey, response.clone());
      }
    }
    
    return response;
  }
};

Cache Invalidation Patterns

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. He wasn’t wrong.

Here are the primary invalidation strategies, ranked by complexity:

Time-based expiration (TTL): The simplest approach. Set a TTL and accept that content might be stale for that duration. Works well when “eventually consistent” is acceptable.

Versioned URLs: Append a hash or version to asset URLs. When content changes, the URL changes, effectively bypassing the cache entirely.

// Build-time URL versioning (Node.js)
const crypto = require('crypto');
const fs = require('fs');

const assetUrl = (path) => {
  const hash = crypto
    .createHash('md5')
    .update(fs.readFileSync(path))
    .digest('hex')
    .slice(0, 8);
  return `${path}?v=${hash}`;
};

// Output: /static/app.js?v=a1b2c3d4

Event-driven purging: Trigger cache purges when content changes. This requires integration between your CMS/deployment pipeline and the CDN API.

import os

import requests

class CDNCacheManager:
    def __init__(self, api_token, zone_id):
        self.api_token = api_token
        self.zone_id = zone_id
        self.base_url = f"https://api.cloudflare.com/client/v4/zones/{zone_id}"
    
    def purge_urls(self, urls: list[str]):
        """Purge specific URLs from cache"""
        response = requests.post(
            f"{self.base_url}/purge_cache",
            headers={"Authorization": f"Bearer {self.api_token}"},
            json={"files": urls}
        )
        return response.json()
    
    def purge_by_tag(self, tags: list[str]):
        """Purge all content with specific cache tags"""
        response = requests.post(
            f"{self.base_url}/purge_cache",
            headers={"Authorization": f"Bearer {self.api_token}"},
            json={"tags": tags}
        )
        return response.json()
    
    def purge_everything(self):
        """Nuclear option - use sparingly"""
        response = requests.post(
            f"{self.base_url}/purge_cache",
            headers={"Authorization": f"Bearer {self.api_token}"},
            json={"purge_everything": True}
        )
        return response.json()

# Webhook handler for CMS content updates
def handle_content_update(event):
    cache = CDNCacheManager(os.environ['CF_TOKEN'], os.environ['CF_ZONE'])
    
    content_type = event['content_type']
    content_id = event['id']
    
    # Purge by cache tag (more surgical than URL purging)
    cache.purge_by_tag([f"{content_type}:{content_id}", content_type])

Cache tags deserve special attention. Instead of tracking every URL that might contain a piece of content, you tag responses with metadata. When that content changes, you purge by tag. This is how you invalidate a product page, all category pages containing that product, and the homepage—with a single API call.
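
The tag index a CDN maintains internally can be sketched as a reverse mapping from tag to cache keys (a hypothetical in-memory version; real implementations shard this index across nodes):

```python
from collections import defaultdict

class TaggedCache:
    """Cache keyed by URL, with a reverse index from tag -> URLs."""
    def __init__(self):
        self.entries = {}
        self.tag_index = defaultdict(set)

    def set(self, url, body, tags):
        self.entries[url] = body
        for tag in tags:
            self.tag_index[tag].add(url)

    def purge_by_tag(self, tag):
        """Evict every URL carrying `tag`; return how many were purged."""
        urls = self.tag_index.pop(tag, set())
        for url in urls:
            self.entries.pop(url, None)
        return len(urls)

cache = TaggedCache()
cache.set("/products/42", "<html>...", tags=["product:42", "product"])
cache.set("/category/shoes", "<html>...", tags=["product:42", "category:shoes"])
cache.set("/", "<html>...", tags=["product:42", "homepage"])

# One tag purge invalidates the product page, its category page,
# and the homepage in a single call
print(cache.purge_by_tag("product:42"))  # 3
```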

Dynamic Content at the Edge

Modern CDNs aren’t just caches—they’re distributed compute platforms. Edge functions let you run code milliseconds from your users, enabling:

  • Personalization without origin round-trips
  • A/B testing with consistent user bucketing
  • API responses for read-heavy endpoints
  • Authentication and authorization at the edge

Here's a Lambda@Edge function (Node.js) that buckets users deterministically:

// Lambda@Edge: A/B testing with consistent bucketing
const crypto = require('crypto');

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;
  
  // Get or create user bucket (consistent across requests)
  let userId = headers['cookie']?.[0]?.value
    ?.match(/user_id=([^;]+)/)?.[1];
  
  if (!userId) {
    // New user: mint an ID. For bucketing to stay consistent across
    // requests, a companion response handler must set this value as
    // the user_id cookie on the way back to the browser.
    userId = crypto.randomUUID();
  }
  
  // Deterministic bucket assignment based on user ID
  const bucket = hashToNumber(userId) % 100;
  
  // Route to experiment variant
  const experiment = {
    name: 'new-checkout-flow',
    variants: [
      { name: 'control', weight: 50, origin: 'origin-a.example.com' },
      { name: 'treatment', weight: 50, origin: 'origin-b.example.com' }
    ]
  };
  
  let cumulative = 0;
  for (const variant of experiment.variants) {
    cumulative += variant.weight;
    if (bucket < cumulative) {
      request.origin.custom.domainName = variant.origin;
      request.headers['x-experiment-variant'] = [{ value: variant.name }];
      break;
    }
  }
  
  return request;
};

function hashToNumber(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i);
    hash = hash & hash;
  }
  return Math.abs(hash);
}

Monitoring and Debugging CDN Performance

You can’t optimize what you don’t measure. Key metrics to track:

  • Cache Hit Ratio (CHR): Percentage of requests served from cache. Aim for 90%+ for static content.
  • Time to First Byte (TTFB): How long until the first byte reaches the user. Edge responses should be under 50ms.
  • Origin Shield Efficiency: How much traffic your shield absorbs before hitting origin.

The script below computes these metrics from JSON access logs:

#!/usr/bin/env python3
"""Analyze CDN logs for cache performance insights"""

import json
from collections import defaultdict
from datetime import datetime

def analyze_cdn_logs(log_file):
    stats = defaultdict(lambda: {'hits': 0, 'misses': 0, 'ttfb_sum': 0})
    
    with open(log_file) as f:
        for line in f:
            entry = json.loads(line)
            
            path_prefix = '/' + entry['uri'].split('/')[1]
            cache_status = entry['cache_status']  # HIT, MISS, EXPIRED, etc.
            ttfb = entry['time_to_first_byte_ms']
            
            if cache_status in ('HIT', 'STALE'):
                stats[path_prefix]['hits'] += 1
            else:
                stats[path_prefix]['misses'] += 1
            
            stats[path_prefix]['ttfb_sum'] += ttfb
    
    # Calculate and display metrics
    print(f"{'Path':<20} {'CHR':>8} {'Avg TTFB':>10} {'Requests':>10}")
    print("-" * 50)
    
    for path, data in sorted(stats.items()):
        total = data['hits'] + data['misses']
        chr_pct = (data['hits'] / total * 100) if total > 0 else 0
        avg_ttfb = data['ttfb_sum'] / total if total > 0 else 0
        
        # Flag low cache hit ratios
        flag = "⚠️" if chr_pct < 80 and total > 100 else ""
        print(f"{path:<20} {chr_pct:>7.1f}% {avg_ttfb:>9.1f}ms {total:>10} {flag}")

if __name__ == '__main__':
    analyze_cdn_logs('cdn-access.log')

Common causes of low cache hit ratios:

  • Vary header too broad: Vary: * makes responses effectively uncacheable; Vary: Cookie fragments the cache per user
  • Query string variations: Sort and normalize query params
  • Missing cache headers: Origin not sending Cache-Control
  • Short TTLs: Balance freshness against hit ratio
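
Query-string normalization can be done with the standard library alone: sort parameters and strip tracking params so equivalent URLs map to one cache key (the ignore list below is an example, not exhaustive):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Example tracking params to exclude from the cache key
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}

def normalize_cache_key(url):
    """Sort query params and drop tracking params for a stable cache key."""
    parts = urlsplit(url)
    params = sorted(
        (k, v) for k, v in parse_qsl(parts.query)
        if k not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(params), ""))

a = normalize_cache_key("https://example.com/p?b=2&a=1&utm_source=mail")
b = normalize_cache_key("https://example.com/p?a=1&b=2")
print(a == b)  # True -- both requests hit the same cache entry
```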

Design Considerations and Trade-offs

Choosing CDN architecture isn’t one-size-fits-all. Consider these trade-offs:

Consistency vs. Latency: Aggressive caching means faster responses but potentially stale content. For financial data, you want short TTLs or no caching. For blog posts, cache aggressively.

Cost vs. Coverage: More PoPs mean better latency but higher costs. If 80% of your users are in North America, you might not need presence in every Asian market.

Complexity vs. Control: Managed CDNs (Cloudflare, Fastly) are simpler but less customizable. Building on AWS CloudFront with Lambda@Edge gives more control but requires more operational expertise.

Decision checklist:

  1. What’s your read/write ratio? CDNs excel at read-heavy workloads.
  2. How stale can content be? This determines your TTL strategy.
  3. Where are your users? This determines PoP requirements.
  4. Do you need edge compute? This narrows your CDN options.
  5. What’s your invalidation trigger? CMS webhooks, deploy hooks, or manual?

CDN architecture is ultimately about understanding your content’s lifecycle and your users’ expectations. Get those right, and the technical implementation follows naturally.
