Design a Video Streaming Platform: Content Delivery
Key Insights
- Multi-tier CDN architecture with origin shields, regional caches, and edge servers reduces origin load by 95%+ while keeping video startup times under 2 seconds globally
- Adaptive bitrate streaming with properly sized segments (2-6 seconds) and strategic cache warming for popular content solves the fundamental tension between quality and reliability
- Cache key design and consistent hashing are the unsung heroes of video delivery—get them wrong and your cache hit ratios plummet, your origin servers melt, and users experience constant rebuffering
Introduction to Video CDN Architecture
Video streaming is the hardest content delivery problem you’ll face. Unlike static assets where you cache once and serve forever, video introduces unique challenges: files measured in gigabytes, sequential access patterns, real-time quality adaptation, and users who notice every hiccup.
The numbers tell the story. A single 4K stream consumes roughly 25 Mbps. Multiply that by millions of concurrent viewers, and you’re looking at hundreds of terabits, even petabits, of bandwidth. Serving this from origin servers isn’t just expensive; latency makes it untenable. A round trip between New York and London takes roughly 70 milliseconds over real-world fiber routes. Add server processing, TCP handshakes, and TLS negotiation, and you’re past 200 milliseconds before the first byte arrives. Users abandon streams that buffer for more than 2 seconds.
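The aggregate-bandwidth claim is easy to sanity-check; the viewer count below is an illustrative assumption, not a figure from any particular platform:

```python
# Back-of-envelope egress for concurrent 4K streams at 25 Mbps each.
def aggregate_tbps(concurrent_viewers: int, per_stream_mbps: float = 25) -> float:
    """Total egress in terabits per second."""
    return concurrent_viewers * per_stream_mbps / 1_000_000

# 10 million concurrent 4K viewers (illustrative):
print(aggregate_tbps(10_000_000))  # 250.0 Tbps, a quarter of a petabit per second
```

Even a fraction of that load at a single data center would saturate its uplinks, which is why the traffic has to be spread across edge locations.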
Content Delivery Networks solve this by pushing content to the edge—servers physically close to users. But video CDN architecture goes far beyond simple caching. You need intelligent routing, adaptive quality, and cache hierarchies that handle both viral content and the long tail of rarely-watched videos.
CDN Topology and Edge Server Design
Effective video CDNs use a multi-tier architecture. At the top sits your origin: the source of truth for all video content. Below that, regional cache servers (sometimes called mid-tier or parent caches) aggregate requests from multiple edge locations. Finally, edge servers in Points of Presence (PoPs) serve users directly.
This hierarchy exists because cache efficiency improves with aggregation. An edge server in a small city might see each video segment once per day. A regional cache serving 50 edge servers sees it 50 times. The origin only handles cache misses from regional servers—typically less than 5% of total requests.
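The compounding effect of the tiers is just multiplied miss rates. A minimal sketch (the per-tier hit ratios are assumptions for illustration):

```python
def origin_request_fraction(edge_hit_ratio: float, regional_hit_ratio: float) -> float:
    """Fraction of all client requests that miss both cache tiers
    and therefore fall through to the origin."""
    return (1 - edge_hit_ratio) * (1 - regional_hit_ratio)

# With 80% hits at the edge and 80% at the regional tier,
# only about 4% of requests ever reach the origin.
print(round(origin_request_fraction(0.80, 0.80), 3))  # 0.04
```

Each tier only needs a modest hit ratio on its own for the combined effect to push origin traffic into the low single digits.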
PoP placement follows user density and network topology. Major internet exchange points (IXPs) in cities like Frankfurt, Singapore, and Ashburn are obvious choices. But don’t ignore secondary markets—a PoP in Denver serves the entire Mountain West better than one in Los Angeles.
Anycast routing lets multiple servers share a single IP address, with BGP routing users to the nearest one. This provides automatic failover and geographic load balancing without client-side logic.
Here’s an Nginx configuration optimized for video segment caching with byte-range support:
proxy_cache_path /var/cache/nginx/video levels=1:2
                 keys_zone=video_cache:100m max_size=500g inactive=7d
                 use_temp_path=off;

# Strip query strings (e.g. auth tokens) so they don't fragment the cache,
# but keep the full segment path so each segment gets its own key
map $request_uri $cache_key {
    ~^(?<path>/videos/[^?]+) $path;
    default                  $request_uri;
}

server {
    listen 443 ssl http2;
    server_name edge.cdn.example.com;

    location ~ ^/videos/ {
        proxy_cache video_cache;
        proxy_cache_valid 200 206 7d;
        proxy_cache_use_stale error timeout updating;

        # Collapse concurrent misses into a single upstream fetch
        proxy_cache_lock on;
        proxy_cache_lock_timeout 5s;

        # Cache byte ranges as independent 1 MB slices
        slice 1m;
        proxy_set_header Range $slice_range;
        proxy_cache_key $cache_key$slice_range;

        # HTTP/1.1 with a cleared Connection header enables upstream keepalive
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://regional_upstream;

        add_header X-Cache-Status $upstream_cache_status;
        add_header Accept-Ranges bytes;
    }
}

upstream regional_upstream {
    least_conn;
    server regional-east.internal:80 weight=5;
    server regional-west.internal:80 weight=5 backup;
    keepalive 100;
}
The slice directive is critical—it lets Nginx cache byte-range requests independently, so seeking to the middle of a video doesn’t require fetching the entire file first.
Adaptive Bitrate Streaming Implementation
Adaptive Bitrate (ABR) streaming solves the fundamental tension between quality and reliability. Instead of delivering a single video file, you encode multiple quality variants and let the player switch between them based on network conditions.
HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) dominate the market. Both work similarly: video is split into small segments (typically 2-6 seconds), and a manifest file describes available quality levels and segment URLs.
Shorter segments mean faster quality adaptation, but also more HTTP request overhead and reduced compression efficiency, since each segment must start with a keyframe. Most platforms settle on 4-second segments as a reasonable compromise.
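The overhead side of that trade-off is easy to quantify: each segment is one HTTP request per viewer, so halving segment length doubles request volume. A quick sketch:

```python
def segment_requests_per_hour(segment_seconds: float) -> int:
    """HTTP segment requests per viewer-hour for a single quality level."""
    return int(3600 / segment_seconds)

for duration in (2, 4, 6):
    print(f"{duration}s segments -> {segment_requests_per_hour(duration)} requests/hour")
# 2s segments -> 1800 requests/hour
# 4s segments -> 900 requests/hour
# 6s segments -> 600 requests/hour
```

Multiply by millions of concurrent viewers and the difference between 2-second and 4-second segments is hundreds of millions of requests per hour against your edge tier.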
Here’s a Python script generating an HLS master manifest with multiple quality variants:
from dataclasses import dataclass
from typing import List


@dataclass
class VideoVariant:
    resolution: str
    bandwidth: int
    codecs: str
    filename: str


def generate_master_manifest(
    video_id: str,
    variants: List[VideoVariant],
    base_url: str,
) -> str:
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:4",
    ]
    # Sort by bandwidth so players see an ascending ladder
    sorted_variants = sorted(variants, key=lambda v: v.bandwidth)
    for variant in sorted_variants:
        lines.extend([
            f"#EXT-X-STREAM-INF:BANDWIDTH={variant.bandwidth},"
            f"RESOLUTION={variant.resolution},"
            f'CODECS="{variant.codecs}"',
            f"{base_url}/{video_id}/{variant.filename}",
        ])
    return "\n".join(lines)


def generate_media_playlist(
    video_id: str,
    variant: str,
    segment_duration: float,
    segment_count: int,
    base_url: str,
) -> str:
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:4",
        f"#EXT-X-TARGETDURATION:{int(segment_duration) + 1}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for i in range(segment_count):
        lines.extend([
            f"#EXTINF:{segment_duration:.3f},",
            f"{base_url}/{video_id}/{variant}/segment_{i:05d}.ts",
        ])
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines)


# Example usage
variants = [
    VideoVariant("640x360", 800_000, "avc1.4d401e,mp4a.40.2", "360p.m3u8"),
    VideoVariant("1280x720", 2_500_000, "avc1.4d401f,mp4a.40.2", "720p.m3u8"),
    VideoVariant("1920x1080", 5_000_000, "avc1.640028,mp4a.40.2", "1080p.m3u8"),
    VideoVariant("3840x2160", 15_000_000, "avc1.640033,mp4a.40.2", "4k.m3u8"),
]
master = generate_master_manifest(
    "video_abc123",
    variants,
    "https://cdn.example.com/hls",
)
print(master)
Video Segmentation and Encoding Pipeline
Transcoding is where compute costs live. A single 4K source video might generate 6+ output variants, each requiring full decode and re-encode. Parallel processing is essential.
Segment your encoding pipeline: one job per quality variant, distributed across worker nodes. Use keyframe-aligned segments so players can switch qualities cleanly at segment boundaries.
#!/bin/bash
# Multi-bitrate HLS encoding with FFmpeg
INPUT=$1
OUTPUT_DIR=$2
SEGMENT_DURATION=4
FPS=30  # assumed source frame rate; the GOP math below depends on it

mkdir -p "$OUTPUT_DIR"

# Encoding ladder: width:height:video bitrate:audio bitrate:name
declare -a VARIANTS=(
  "640:360:800k:96k:360p"
  "1280:720:2500k:128k:720p"
  "1920:1080:5000k:192k:1080p"
)

for variant in "${VARIANTS[@]}"; do
  IFS=':' read -r width height vbitrate abitrate name <<< "$variant"
  mkdir -p "$OUTPUT_DIR/${name}"
  # Buffer of 2x maxrate smooths short-term bitrate spikes
  bufsize="$(( 2 * ${vbitrate%k} ))k"
  ffmpeg -i "$INPUT" \
    -vf "scale=${width}:${height}:force_original_aspect_ratio=decrease,pad=${width}:${height}:(ow-iw)/2:(oh-ih)/2" \
    -c:v libx264 -preset medium -profile:v main -level 4.0 \
    -b:v "$vbitrate" -maxrate "$vbitrate" -bufsize "$bufsize" \
    -c:a aac -b:a "$abitrate" -ar 48000 \
    -g $((SEGMENT_DURATION * FPS)) -keyint_min $((SEGMENT_DURATION * FPS)) \
    -sc_threshold 0 \
    -hls_time "$SEGMENT_DURATION" \
    -hls_playlist_type vod \
    -hls_segment_filename "$OUTPUT_DIR/${name}/segment_%05d.ts" \
    "$OUTPUT_DIR/${name}/playlist.m3u8" &
done
wait
echo "Encoding complete"
The -sc_threshold 0 flag disables scene-change keyframe insertion, ensuring keyframes (and therefore segment boundaries) land at identical timestamps across all quality levels. The -g flag sets the GOP size to exactly one segment: segment duration multiplied by the frame rate, assumed here to be 30 fps.
Cache Optimization Strategies
Video traffic follows a power-law distribution: 10% of content generates 90% of views. Cache warming for popular content dramatically improves hit ratios.
import asyncio
from typing import List

import httpx
import redis


class CacheWarmer:
    def __init__(self, redis_url: str, cdn_edges: List[str]):
        self.redis = redis.from_url(redis_url)
        self.cdn_edges = cdn_edges

    def get_trending_videos(self, limit: int = 100) -> List[str]:
        """Fetch video IDs sorted by recent view count."""
        raw = self.redis.zrevrange("video:views:hourly", 0, limit - 1)
        return [video_id.decode() for video_id in raw]

    async def warm_edge(self, edge: str, video_id: str, variants: List[str]):
        """Pre-fetch video segments to a specific edge server."""
        async with httpx.AsyncClient(timeout=30.0) as client:
            for variant in variants:
                # Warm the first 10 segments (~40 seconds at 4s segments)
                for seg_num in range(10):
                    url = (
                        f"https://{edge}/videos/{video_id}/"
                        f"{variant}/segment_{seg_num:05d}.ts"
                    )
                    try:
                        # HEAD triggers a cache fill (a caching Nginx converts
                        # it to an upstream GET by default) without
                        # transferring the body back to the warmer
                        await client.head(url, headers={"X-Cache-Warm": "true"})
                    except httpx.RequestError:
                        continue

    async def warm_trending(self):
        """Warm all edges with trending content."""
        trending = self.get_trending_videos()
        variants = ["360p", "720p", "1080p"]
        tasks = []
        for video_id in trending:
            for edge in self.cdn_edges:
                tasks.append(self.warm_edge(edge, video_id, variants))
        await asyncio.gather(*tasks, return_exceptions=True)


# Run every 15 minutes, e.g. from a scheduler
warmer = CacheWarmer(
    "redis://analytics.internal:6379",
    ["edge-nyc.cdn.example.com", "edge-lax.cdn.example.com"],
)
asyncio.run(warmer.warm_trending())
Origin Shield and Load Balancing
Origin shields aggregate requests from multiple edge locations, collapsing duplicate requests and protecting origin servers from traffic spikes. Consistent hashing ensures requests for the same content always hit the same shield server, maximizing cache efficiency.
import hashlib
from bisect import bisect_left
from typing import List


class ConsistentHash:
    def __init__(self, nodes: List[str], virtual_nodes: int = 150):
        self.virtual_nodes = virtual_nodes
        self.ring: List[int] = []
        self.node_map: dict[int, str] = {}
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_val = self._hash(virtual_key)
            self.ring.append(hash_val)
            self.node_map[hash_val] = node
        self.ring.sort()

    def remove_node(self, node: str):
        for i in range(self.virtual_nodes):
            virtual_key = f"{node}:{i}"
            hash_val = self._hash(virtual_key)
            self.ring.remove(hash_val)
            del self.node_map[hash_val]

    def get_node(self, key: str) -> str:
        if not self.ring:
            raise ValueError("No nodes available")
        hash_val = self._hash(key)
        idx = bisect_left(self.ring, hash_val)
        if idx == len(self.ring):
            idx = 0
        return self.node_map[self.ring[idx]]


# Route video segment requests to consistent shield servers
shields = ConsistentHash([
    "shield-east-1.internal",
    "shield-east-2.internal",
    "shield-west-1.internal",
])

# The same segment always routes to the same shield
cache_key = "videos/abc123/1080p/segment_00042.ts"
target_shield = shields.get_node(cache_key)
Monitoring and Performance Metrics
You can’t optimize what you don’t measure. Track these metrics religiously: rebuffer ratio (percentage of playback time spent buffering), video startup time, bitrate switches per session, and average delivered bitrate.
class PlaybackMetrics {
  constructor(videoElement, analyticsEndpoint) {
    this.video = videoElement;
    this.endpoint = analyticsEndpoint;
    this.metrics = {
      sessionId: crypto.randomUUID(),
      startTime: null,
      firstFrameTime: null,
      bufferingEvents: [],
      bitrateChanges: [],
      errors: []
    };
    this.attachListeners();
  }

  attachListeners() {
    this.video.addEventListener('loadstart', () => {
      this.metrics.startTime = performance.now();
    });

    this.video.addEventListener('waiting', () => {
      this.bufferStart = performance.now();
    });

    this.video.addEventListener('playing', () => {
      // First 'playing' event marks the end of startup
      if (!this.metrics.firstFrameTime) {
        this.metrics.firstFrameTime = performance.now();
        this.sendMetric('startup_time',
          this.metrics.firstFrameTime - this.metrics.startTime);
      }
      // A 'playing' event after 'waiting' closes a rebuffer interval
      if (this.bufferStart) {
        const duration = performance.now() - this.bufferStart;
        this.metrics.bufferingEvents.push({
          timestamp: this.video.currentTime,
          duration
        });
        this.sendMetric('rebuffer', duration);
        this.bufferStart = null;
      }
    });
  }

  recordBitrateChange(fromBitrate, toBitrate) {
    this.metrics.bitrateChanges.push({
      timestamp: this.video.currentTime,
      from: fromBitrate,
      to: toBitrate
    });
    this.sendMetric('bitrate_change', { from: fromBitrate, to: toBitrate });
  }

  sendMetric(type, data) {
    navigator.sendBeacon(this.endpoint, JSON.stringify({
      type,
      sessionId: this.metrics.sessionId,
      timestamp: Date.now(),
      data
    }));
  }
}
Aggregate these client-side metrics to calculate Quality of Experience (QoE) scores. A good target: 95% of sessions should have startup time under 2 seconds and rebuffer ratio under 1%. When metrics slip, you’ll know exactly where to investigate—whether it’s a specific edge location, ISP, or content type causing problems.
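Server-side, that aggregation can be sketched as follows. The thresholds mirror the targets above; the session shape and field names are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    startup_ms: float    # time to first frame
    rebuffer_ms: float   # total time spent stalled
    playback_ms: float   # total time spent playing


def rebuffer_ratio(s: Session) -> float:
    """Fraction of wall-clock viewing time spent stalled."""
    return s.rebuffer_ms / (s.playback_ms + s.rebuffer_ms)


def fraction_meeting_targets(
    sessions: List[Session],
    max_startup_ms: float = 2_000,
    max_rebuffer: float = 0.01,
) -> float:
    """Share of sessions hitting both QoE targets (goal: >= 0.95)."""
    good = sum(
        1 for s in sessions
        if s.startup_ms <= max_startup_ms and rebuffer_ratio(s) <= max_rebuffer
    )
    return good / len(sessions)


sessions = [
    Session(1_200, 0, 600_000),      # fast start, no stalls
    Session(1_800, 3_000, 600_000),  # ~0.5% rebuffer ratio: still within target
    Session(2_500, 0, 600_000),      # startup too slow
]
print(fraction_meeting_targets(sessions))  # 2 of 3 sessions pass
```

Slicing the same computation by edge location, ISP, or content ID is what turns a slipping aggregate number into an actionable diagnosis.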