Design a Load Balancer: Layer 4 vs Layer 7

Key Insights

  • Layer 4 load balancers operate on TCP/UDP connections and offer superior throughput with minimal latency, making them ideal for high-volume, protocol-agnostic traffic like gaming servers or database connections.
  • Layer 7 load balancers inspect application-layer content (HTTP headers, URLs, cookies), enabling intelligent routing decisions at the cost of additional processing overhead.
  • Most production architectures benefit from a hybrid approach: L4 load balancers at the edge for raw performance, with L7 load balancers behind them for application-aware routing.

What is a Load Balancer?

A load balancer distributes incoming network traffic across multiple backend servers to ensure no single server becomes overwhelmed. This serves two critical purposes: scalability (handle more traffic by adding servers) and availability (if one server fails, others continue serving requests).

The distinction between Layer 4 and Layer 7 comes from the OSI model. Layer 4 (Transport) deals with TCP and UDP—it sees source/destination IPs and ports but nothing about the actual content. Layer 7 (Application) understands protocols like HTTP, meaning it can read headers, URLs, and even request bodies.

This difference fundamentally shapes what each type can do and what trade-offs you accept.

Layer 4 Load Balancing: Transport Level

L4 load balancers work with TCP/UDP packets. They make routing decisions based on:

  • Source IP and port
  • Destination IP and port
  • Protocol type

The load balancer doesn’t decrypt TLS or parse HTTP. It simply forwards packets to a selected backend. This simplicity translates to raw speed—L4 balancers can handle millions of connections per second with microsecond latency overhead.

NAT Modes

SNAT (Source NAT): The load balancer rewrites the source IP to its own. Backend servers see all traffic coming from the LB, which simplifies return routing but loses client IP information (unless you use proxy protocol).
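
To make the client-address problem concrete, here's a minimal sketch of a PROXY protocol v1 header writer (writeProxyHeader is a hypothetical helper, not part of the example below): the load balancer sends this one line before any application data, and the backend parses it to recover the original client IP and port.

package main

import (
	"fmt"
	"io"
	"net"
)

// writeProxyHeader would be called right after dialing the backend,
// before copying any client bytes.
func writeProxyHeader(backendConn, clientConn net.Conn) error {
	clientAddr := clientConn.RemoteAddr().(*net.TCPAddr)
	// The "destination" is the address the client originally connected to,
	// i.e. the load balancer's listening address for this connection.
	lbAddr := clientConn.LocalAddr().(*net.TCPAddr)
	header := fmt.Sprintf("PROXY TCP4 %s %s %d %d\r\n",
		clientAddr.IP, lbAddr.IP, clientAddr.Port, lbAddr.Port)
	_, err := io.WriteString(backendConn, header)
	return err
}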

DSR (Direct Server Return): The load balancer only handles inbound traffic. Backends respond directly to clients, bypassing the LB on the return path. This dramatically increases throughput since the LB doesn’t process response traffic.

Here’s a simplified L4 load balancer in Go that demonstrates TCP connection forwarding:

package main

import (
	"io"
	"log"
	"net"
	"sync"
	"sync/atomic"
)

type L4LoadBalancer struct {
	backends []string
	current  uint64
}

func NewL4LoadBalancer(backends []string) *L4LoadBalancer {
	return &L4LoadBalancer{backends: backends}
}

func (lb *L4LoadBalancer) nextBackend() string {
	// Round-robin selection
	idx := atomic.AddUint64(&lb.current, 1)
	return lb.backends[idx%uint64(len(lb.backends))]
}

func (lb *L4LoadBalancer) handleConnection(clientConn net.Conn) {
	defer clientConn.Close()

	backend := lb.nextBackend()
	backendConn, err := net.Dial("tcp", backend)
	if err != nil {
		log.Printf("Failed to connect to backend %s: %v", backend, err)
		return
	}
	defer backendConn.Close()

	var wg sync.WaitGroup
	wg.Add(2)

	// Client -> Backend
	go func() {
		defer wg.Done()
		io.Copy(backendConn, clientConn)
		// Half-close so the backend sees EOF once the client stops sending;
		// otherwise the reverse copy below could block forever.
		if tc, ok := backendConn.(*net.TCPConn); ok {
			tc.CloseWrite()
		}
	}()

	// Backend -> Client
	go func() {
		defer wg.Done()
		io.Copy(clientConn, backendConn)
		// Half-close so the client sees EOF once the backend stops sending.
		if tc, ok := clientConn.(*net.TCPConn); ok {
			tc.CloseWrite()
		}
	}()

	wg.Wait()
}

func main() {
	backends := []string{"127.0.0.1:8081", "127.0.0.1:8082", "127.0.0.1:8083"}
	lb := NewL4LoadBalancer(backends)

	listener, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("L4 Load Balancer listening on :8080")

	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Printf("Accept error: %v", err)
			continue
		}
		go lb.handleConnection(conn)
	}
}

This implementation uses io.Copy for efficient bidirectional streaming. In production, you’d add connection pooling, timeouts, and health checks.

Layer 7 Load Balancing: Application Level

L7 load balancers terminate the connection and inspect application-layer content. For HTTP traffic, this means access to:

  • URL paths and query parameters
  • HTTP headers (Host, Cookie, Authorization)
  • Request methods
  • Request/response bodies

This enables powerful routing capabilities: send /api/* requests to your API servers, /static/* to CDN origins, and route based on authentication headers or geographic cookies.

SSL Termination

L7 balancers typically handle TLS termination. Clients establish encrypted connections to the LB, which decrypts traffic, makes routing decisions, then optionally re-encrypts for backend communication. This centralizes certificate management but means the LB sees plaintext traffic.
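
As a minimal sketch of termination (the certificate paths and the single backend on 127.0.0.1:9001 are illustrative), the load balancer accepts HTTPS from clients and forwards plain HTTP to the backend:

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Traffic to the backend stays plain HTTP after the LB decrypts it;
	// re-encrypting would mean pointing the proxy at an https:// backend instead.
	backend, _ := url.Parse("http://127.0.0.1:9001")
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Terminate TLS at the load balancer.
	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
}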

Here’s an L7 load balancer with path-based routing:

package main

import (
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
	"sync/atomic"
)

type BackendPool struct {
	backends []*url.URL
	current  uint64
}

func (p *BackendPool) next() *url.URL {
	idx := atomic.AddUint64(&p.current, 1)
	return p.backends[idx%uint64(len(p.backends))]
}

type L7LoadBalancer struct {
	routes map[string]*BackendPool
}

func NewL7LoadBalancer() *L7LoadBalancer {
	return &L7LoadBalancer{
		routes: make(map[string]*BackendPool),
	}
}

func (lb *L7LoadBalancer) AddRoute(prefix string, backends []string) {
	pool := &BackendPool{}
	for _, b := range backends {
		u, err := url.Parse(b)
		if err != nil {
			log.Printf("Skipping invalid backend URL %s: %v", b, err)
			continue
		}
		pool.backends = append(pool.backends, u)
	}
	lb.routes[prefix] = pool
}

func (lb *L7LoadBalancer) findPool(path string) *BackendPool {
	// Match longest prefix first
	var bestMatch string
	var bestPool *BackendPool
	
	for prefix, pool := range lb.routes {
		if strings.HasPrefix(path, prefix) && len(prefix) > len(bestMatch) {
			bestMatch = prefix
			bestPool = pool
		}
	}
	return bestPool
}

func (lb *L7LoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	pool := lb.findPool(r.URL.Path)
	if pool == nil {
		http.Error(w, "No backend available", http.StatusBadGateway)
		return
	}

	backend := pool.next()
	proxy := httputil.NewSingleHostReverseProxy(backend)
	
	// Forward the original host and client IP to the backend.
	originalDirector := proxy.Director
	proxy.Director = func(req *http.Request) {
		originalDirector(req)
		req.Header.Set("X-Forwarded-Host", r.Host)
		if ip, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
			req.Header.Set("X-Real-IP", ip)
		}
	}

	proxy.ServeHTTP(w, r)
}

func main() {
	lb := NewL7LoadBalancer()
	
	// Route API traffic to API servers
	lb.AddRoute("/api/", []string{
		"http://127.0.0.1:9001",
		"http://127.0.0.1:9002",
	})
	
	// Route static content to file servers
	lb.AddRoute("/static/", []string{
		"http://127.0.0.1:9003",
	})
	
	// Default route
	lb.AddRoute("/", []string{
		"http://127.0.0.1:9004",
		"http://127.0.0.1:9005",
	})

	log.Println("L7 Load Balancer listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", lb))
}

Load Balancing Algorithms

Round-Robin: Requests distributed sequentially. Simple but ignores server capacity differences.

Weighted Round-Robin: Assign weights based on server capacity. A server with weight 3 receives three times the traffic of weight 1.
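
A minimal sketch of weighted selection (the types and weights here are illustrative): expand each backend into the rotation according to its weight, then round-robin over the expanded list. Production balancers typically use "smooth" weighted round-robin to avoid consecutive picks of the same server.

package main

import (
	"fmt"
	"sync/atomic"
)

type WeightedBackend struct {
	Addr   string
	Weight int
}

type WeightedRoundRobin struct {
	expanded []string // each backend repeated Weight times
	current  uint64
}

func NewWeightedRoundRobin(backends []WeightedBackend) *WeightedRoundRobin {
	wrr := &WeightedRoundRobin{}
	for _, b := range backends {
		for i := 0; i < b.Weight; i++ {
			wrr.expanded = append(wrr.expanded, b.Addr)
		}
	}
	return wrr
}

func (w *WeightedRoundRobin) Next() string {
	idx := atomic.AddUint64(&w.current, 1)
	return w.expanded[idx%uint64(len(w.expanded))]
}

func main() {
	wrr := NewWeightedRoundRobin([]WeightedBackend{
		{Addr: "127.0.0.1:9001", Weight: 3},
		{Addr: "127.0.0.1:9002", Weight: 1},
	})
	for i := 0; i < 4; i++ {
		fmt.Println(wrr.Next()) // three picks of :9001 for every pick of :9002
	}
}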

Least Connections: Route to the server with fewest active connections. Better for long-lived connections with varying request durations.
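
A sketch of least-connections selection (types and method names are illustrative): the balancer increments a backend's counter when it hands out a connection and decrements it when the connection closes.

package main

import (
	"fmt"
	"sync"
)

type CountedBackend struct {
	Addr        string
	ActiveConns int
}

type LeastConnections struct {
	mu       sync.Mutex
	backends []*CountedBackend
}

// Acquire returns the backend with the fewest active connections and
// increments its counter; the caller must call Release when the connection ends.
func (lc *LeastConnections) Acquire() *CountedBackend {
	lc.mu.Lock()
	defer lc.mu.Unlock()

	best := lc.backends[0]
	for _, b := range lc.backends[1:] {
		if b.ActiveConns < best.ActiveConns {
			best = b
		}
	}
	best.ActiveConns++
	return best
}

func (lc *LeastConnections) Release(b *CountedBackend) {
	lc.mu.Lock()
	defer lc.mu.Unlock()
	b.ActiveConns--
}

func main() {
	lc := &LeastConnections{backends: []*CountedBackend{
		{Addr: "127.0.0.1:9001"},
		{Addr: "127.0.0.1:9002"},
	}}

	b := lc.Acquire()
	fmt.Println("routing to", b.Addr)
	lc.Release(b)
}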

IP Hash: Hash client IP to consistently route to the same backend. Provides sticky sessions without cookies.

Consistent Hashing: Distribute load using a hash ring. When backends change, only a fraction of keys remap. Essential for caching layers.

Here’s a consistent hashing implementation:

package main

import (
	"hash/crc32"
	"sort"
	"strconv"
	"sync"
)

type ConsistentHash struct {
	ring       map[uint32]string
	sortedKeys []uint32
	vnodes     int
	mu         sync.RWMutex
}

func NewConsistentHash(vnodes int) *ConsistentHash {
	return &ConsistentHash{
		ring:   make(map[uint32]string),
		vnodes: vnodes,
	}
}

func (ch *ConsistentHash) hash(key string) uint32 {
	return crc32.ChecksumIEEE([]byte(key))
}

func (ch *ConsistentHash) AddNode(node string) {
	ch.mu.Lock()
	defer ch.mu.Unlock()

	for i := 0; i < ch.vnodes; i++ {
		vkey := node + "-" + strconv.Itoa(i)
		hash := ch.hash(vkey)
		ch.ring[hash] = node
		ch.sortedKeys = append(ch.sortedKeys, hash)
	}
	sort.Slice(ch.sortedKeys, func(i, j int) bool {
		return ch.sortedKeys[i] < ch.sortedKeys[j]
	})
}

func (ch *ConsistentHash) RemoveNode(node string) {
	ch.mu.Lock()
	defer ch.mu.Unlock()

	for i := 0; i < ch.vnodes; i++ {
		vkey := node + "-" + strconv.Itoa(i)
		hash := ch.hash(vkey)
		delete(ch.ring, hash)
	}

	// Rebuild sorted keys
	ch.sortedKeys = ch.sortedKeys[:0]
	for k := range ch.ring {
		ch.sortedKeys = append(ch.sortedKeys, k)
	}
	sort.Slice(ch.sortedKeys, func(i, j int) bool {
		return ch.sortedKeys[i] < ch.sortedKeys[j]
	})
}

func (ch *ConsistentHash) GetNode(key string) string {
	ch.mu.RLock()
	defer ch.mu.RUnlock()

	if len(ch.sortedKeys) == 0 {
		return ""
	}

	hash := ch.hash(key)
	idx := sort.Search(len(ch.sortedKeys), func(i int) bool {
		return ch.sortedKeys[i] >= hash
	})

	if idx >= len(ch.sortedKeys) {
		idx = 0
	}

	return ch.ring[ch.sortedKeys[idx]]
}

Virtual nodes (vnodes) ensure even distribution. Without them, adding or removing a node can cause significant imbalance.
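
A short usage sketch for the type above (node addresses are illustrative; "fmt" would need to be added to the imports):

func main() {
	ch := NewConsistentHash(100) // 100 virtual nodes per physical node
	ch.AddNode("10.0.0.1:6379")
	ch.AddNode("10.0.0.2:6379")
	ch.AddNode("10.0.0.3:6379")

	// The same key maps to the same node while the ring is unchanged.
	fmt.Println(ch.GetNode("user:42"))

	// Removing a node remaps only the keys that lived on its vnodes.
	ch.RemoveNode("10.0.0.2:6379")
	fmt.Println(ch.GetNode("user:42"))
}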

Health Checks and Failover

Load balancers must detect unhealthy backends and stop routing traffic to them.

Active Health Checks: The LB periodically probes backends (HTTP GET, TCP connect, or custom scripts). Configure intervals, timeouts, and failure thresholds.

Passive Health Checks: Monitor real traffic for errors. If a backend returns too many 5xx responses, mark it unhealthy.

package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

type Backend struct {
	URL              string
	Healthy          bool
	FailureCount     int
	FailureThreshold int
	mu               sync.RWMutex
}

type HealthChecker struct {
	backends []*Backend
	interval time.Duration
	timeout  time.Duration
	client   *http.Client
}

func NewHealthChecker(backends []*Backend, interval, timeout time.Duration) *HealthChecker {
	return &HealthChecker{
		backends: backends,
		interval: interval,
		timeout:  timeout,
		client:   &http.Client{Timeout: timeout},
	}
}

func (hc *HealthChecker) checkBackend(b *Backend) {
	resp, err := hc.client.Get(b.URL + "/health")
	if resp != nil {
		// Always release the response body, even when the status indicates failure.
		defer resp.Body.Close()
	}

	b.mu.Lock()
	defer b.mu.Unlock()

	if err != nil || resp.StatusCode >= 500 {
		b.FailureCount++
		if b.FailureCount >= b.FailureThreshold {
			if b.Healthy {
				log.Printf("Backend %s marked unhealthy", b.URL)
			}
			b.Healthy = false
		}
		return
	}

	if !b.Healthy {
		log.Printf("Backend %s recovered", b.URL)
	}
	b.Healthy = true
	b.FailureCount = 0
}

func (hc *HealthChecker) Start() {
	ticker := time.NewTicker(hc.interval)
	for range ticker.C {
		for _, b := range hc.backends {
			go hc.checkBackend(b)
		}
	}
}

func (hc *HealthChecker) GetHealthyBackends() []*Backend {
	var healthy []*Backend
	for _, b := range hc.backends {
		b.mu.RLock()
		if b.Healthy {
			healthy = append(healthy, b)
		}
		b.mu.RUnlock()
	}
	return healthy
}
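
A passive check can reuse the same Backend type by wrapping the proxy's transport. Here's a minimal sketch (the passiveChecker type is illustrative and assumes the reverse proxy from the L7 example): it counts transport errors and 5xx responses against the backend that served them.

type passiveChecker struct {
	next    http.RoundTripper
	backend *Backend
}

func (p *passiveChecker) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := p.next.RoundTrip(req)

	p.backend.mu.Lock()
	defer p.backend.mu.Unlock()

	if err != nil || resp.StatusCode >= 500 {
		p.backend.FailureCount++
		if p.backend.FailureCount >= p.backend.FailureThreshold {
			p.backend.Healthy = false
		}
		return resp, err
	}

	p.backend.FailureCount = 0
	p.backend.Healthy = true
	return resp, err
}

Wiring it in is one line on the reverse proxy: proxy.Transport = &passiveChecker{next: http.DefaultTransport, backend: b}.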

Connection Draining: When removing a backend, stop sending new connections but allow existing ones to complete. This prevents dropped requests during deployments.
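
On the backend side, Go's standard library supports draining directly. A minimal sketch of graceful shutdown on SIGTERM (the port and the 30-second timeout are illustrative):

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":9001", Handler: http.DefaultServeMux}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for the termination signal sent by the deployment system.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new connections; give in-flight requests 30s to finish.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}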

Architecture Decision Guide

Choose L4 when:

  • You need maximum throughput with minimal latency
  • Traffic is non-HTTP (databases, game servers, custom protocols)
  • You don’t need content-based routing
  • SSL passthrough is acceptable

Choose L7 when:

  • You need path-based or header-based routing
  • SSL termination should happen at the load balancer
  • You want to add/modify headers (X-Forwarded-For, etc.)
  • Caching, compression, or rate limiting at the LB level is required

Hybrid Approach: Many production systems use both. An L4 load balancer at the edge handles raw TCP distribution across multiple L7 load balancers. The L7 tier then performs intelligent routing to application backends. This gives you the throughput of L4 with the flexibility of L7.

Production Considerations

SSL/TLS: Decide between termination (decrypt at LB) and passthrough (decrypt at backend). Termination simplifies certificate management; passthrough provides end-to-end encryption.

DDoS Mitigation: L4 balancers can absorb volumetric attacks better due to lower per-connection overhead. Consider SYN cookies and connection rate limiting.
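
SYN cookies live in the kernel, but connection rate limiting can sit in the accept loop. A minimal sketch using golang.org/x/time/rate (the limits are illustrative):

package main

import (
	"log"
	"net"

	"golang.org/x/time/rate"
)

func handle(conn net.Conn) {
	defer conn.Close()
	// Forward to a backend here, as in the L4 example above.
}

func main() {
	// Allow 1000 new connections per second, with bursts of up to 2000.
	limiter := rate.NewLimiter(rate.Limit(1000), 2000)

	listener, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}

	for {
		conn, err := listener.Accept()
		if err != nil {
			continue
		}
		if !limiter.Allow() {
			conn.Close() // over the limit: shed load immediately
			continue
		}
		go handle(conn)
	}
}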

Observability: Expose metrics for connections, latency percentiles, error rates, and backend health. Structured logging with request IDs enables distributed tracing.
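
A minimal sketch of the metrics side using the standard library's expvar package (the metric names are illustrative); everything registered this way is exposed as JSON at /debug/vars:

package main

import (
	"expvar"
	"log"
	"net/http"
	"time"
)

var (
	activeRequests = expvar.NewInt("lb_active_requests")
	totalRequests  = expvar.NewInt("lb_requests_total")
	lastLatencyMs  = expvar.NewInt("lb_last_request_latency_ms")
)

// instrument wraps a handler (such as the L7 load balancer) with basic counters.
// A real setup would feed latencies into a histogram to get percentiles.
func instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		activeRequests.Add(1)
		defer activeRequests.Add(-1)
		totalRequests.Add(1)

		start := time.Now()
		next.ServeHTTP(w, r)
		lastLatencyMs.Set(time.Since(start).Milliseconds())
	})
}

func main() {
	// Importing expvar registers the /debug/vars endpoint on the default mux.
	http.Handle("/", instrument(http.NotFoundHandler()))
	log.Fatal(http.ListenAndServe(":8080", nil))
}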

Scaling the Load Balancer: Use DNS round-robin or anycast for multiple LB instances. For cloud deployments, managed load balancers (AWS ALB/NLB, GCP Load Balancing) handle this automatically.

Tool Comparison:

  • HAProxy: Battle-tested, excellent L4/L7 support, configuration-based
  • NGINX: Great L7 features, familiar to web developers, Lua scripting
  • Envoy: Modern, designed for service mesh, excellent observability
  • Cloud LBs: Managed, auto-scaling, integrated with cloud ecosystems

The right choice depends on your specific requirements. Start with the simplest solution that meets your needs, and evolve as traffic patterns become clearer.
