Design a Load Balancer: Layer 4 vs Layer 7
Key Insights
- Layer 4 load balancers operate on TCP/UDP connections and offer superior throughput with minimal latency, making them ideal for high-volume, protocol-agnostic traffic like gaming servers or database connections.
- Layer 7 load balancers inspect application-layer content (HTTP headers, URLs, cookies), enabling intelligent routing decisions at the cost of additional processing overhead.
- Most production architectures benefit from a hybrid approach: L4 load balancers at the edge for raw performance, with L7 load balancers behind them for application-aware routing.
What is a Load Balancer?
A load balancer distributes incoming network traffic across multiple backend servers to ensure no single server becomes overwhelmed. This serves two critical purposes: scalability (handle more traffic by adding servers) and availability (if one server fails, others continue serving requests).
The distinction between Layer 4 and Layer 7 comes from the OSI model. Layer 4 (Transport) deals with TCP and UDP—it sees source/destination IPs and ports but nothing about the actual content. Layer 7 (Application) understands protocols like HTTP, meaning it can read headers, URLs, and even request bodies.
This difference fundamentally shapes what each type can do and what trade-offs you accept.
Layer 4 Load Balancing: Transport Level
L4 load balancers work with TCP/UDP packets. They make routing decisions based on:
- Source IP and port
- Destination IP and port
- Protocol type
The load balancer doesn’t decrypt TLS or parse HTTP. It simply forwards packets to a selected backend. This simplicity translates to raw speed: a well-tuned L4 balancer can handle millions of concurrent connections while adding only microseconds of latency.
NAT Modes
SNAT (Source NAT): The load balancer rewrites the source IP to its own. Backend servers see all traffic coming from the LB, which simplifies return routing but loses client IP information (unless you use proxy protocol).
DSR (Direct Server Return): The load balancer only handles inbound traffic. Backends respond directly to clients, bypassing the LB on the return path. This dramatically increases throughput since the LB doesn’t process response traffic.
Here’s a simplified L4 load balancer in Go that demonstrates TCP connection forwarding:
```go
package main

import (
	"io"
	"log"
	"net"
	"sync"
	"sync/atomic"
)

type L4LoadBalancer struct {
	backends []string
	current  uint64
}

func NewL4LoadBalancer(backends []string) *L4LoadBalancer {
	return &L4LoadBalancer{backends: backends}
}

func (lb *L4LoadBalancer) nextBackend() string {
	// Round-robin selection
	idx := atomic.AddUint64(&lb.current, 1)
	return lb.backends[idx%uint64(len(lb.backends))]
}

func (lb *L4LoadBalancer) handleConnection(clientConn net.Conn) {
	defer clientConn.Close()

	backend := lb.nextBackend()
	backendConn, err := net.Dial("tcp", backend)
	if err != nil {
		log.Printf("Failed to connect to backend %s: %v", backend, err)
		return
	}
	defer backendConn.Close()

	var wg sync.WaitGroup
	wg.Add(2)

	// Client -> Backend. Half-close the backend side when the client
	// stops sending, so the backend sees EOF instead of a stalled copy.
	go func() {
		defer wg.Done()
		io.Copy(backendConn, clientConn)
		if tc, ok := backendConn.(*net.TCPConn); ok {
			tc.CloseWrite()
		}
	}()

	// Backend -> Client, with the symmetric half-close.
	go func() {
		defer wg.Done()
		io.Copy(clientConn, backendConn)
		if tc, ok := clientConn.(*net.TCPConn); ok {
			tc.CloseWrite()
		}
	}()

	wg.Wait()
}

func main() {
	backends := []string{"127.0.0.1:8081", "127.0.0.1:8082", "127.0.0.1:8083"}
	lb := NewL4LoadBalancer(backends)

	listener, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("L4 Load Balancer listening on :8080")

	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Printf("Accept error: %v", err)
			continue
		}
		go lb.handleConnection(conn)
	}
}
```
This implementation uses io.Copy for efficient bidirectional streaming. In production, you’d add connection pooling, timeouts, and health checks.
Layer 7 Load Balancing: Application Level
L7 load balancers terminate the connection and inspect application-layer content. For HTTP traffic, this means access to:
- URL paths and query parameters
- HTTP headers (Host, Cookie, Authorization)
- Request methods
- Request/response bodies
This enables powerful routing capabilities: send /api/* requests to your API servers, /static/* to CDN origins, and route based on authentication headers or geographic cookies.
SSL Termination
L7 balancers typically handle TLS termination. Clients establish encrypted connections to the LB, which decrypts traffic, makes routing decisions, then optionally re-encrypts for backend communication. This centralizes certificate management but means the LB sees plaintext traffic.
Here’s an L7 load balancer with path-based routing:
```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
	"sync/atomic"
)

type BackendPool struct {
	backends []*url.URL
	current  uint64
}

func (p *BackendPool) next() *url.URL {
	idx := atomic.AddUint64(&p.current, 1)
	return p.backends[idx%uint64(len(p.backends))]
}

type L7LoadBalancer struct {
	routes map[string]*BackendPool
}

func NewL7LoadBalancer() *L7LoadBalancer {
	return &L7LoadBalancer{
		routes: make(map[string]*BackendPool),
	}
}

func (lb *L7LoadBalancer) AddRoute(prefix string, backends []string) {
	pool := &BackendPool{}
	for _, b := range backends {
		u, err := url.Parse(b)
		if err != nil {
			log.Fatalf("invalid backend URL %q: %v", b, err)
		}
		pool.backends = append(pool.backends, u)
	}
	lb.routes[prefix] = pool
}

func (lb *L7LoadBalancer) findPool(path string) *BackendPool {
	// Match longest prefix first
	var bestMatch string
	var bestPool *BackendPool
	for prefix, pool := range lb.routes {
		if strings.HasPrefix(path, prefix) && len(prefix) > len(bestMatch) {
			bestMatch = prefix
			bestPool = pool
		}
	}
	return bestPool
}

func (lb *L7LoadBalancer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	pool := lb.findPool(r.URL.Path)
	if pool == nil {
		http.Error(w, "No backend available", http.StatusBadGateway)
		return
	}

	backend := pool.next()
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// Record forwarding metadata so backends can recover the original
	// host and client address.
	originalDirector := proxy.Director
	proxy.Director = func(req *http.Request) {
		originalDirector(req)
		req.Header.Set("X-Forwarded-Host", r.Host)
		req.Header.Set("X-Real-IP", r.RemoteAddr)
	}
	proxy.ServeHTTP(w, r)
}

func main() {
	lb := NewL7LoadBalancer()

	// Route API traffic to API servers
	lb.AddRoute("/api/", []string{
		"http://127.0.0.1:9001",
		"http://127.0.0.1:9002",
	})

	// Route static content to file servers
	lb.AddRoute("/static/", []string{
		"http://127.0.0.1:9003",
	})

	// Default route
	lb.AddRoute("/", []string{
		"http://127.0.0.1:9004",
		"http://127.0.0.1:9005",
	})

	log.Println("L7 Load Balancer listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", lb))
}
```
Load Balancing Algorithms
Round-Robin: Requests are distributed sequentially across backends. Simple, but it ignores differences in server capacity and request cost.
Weighted Round-Robin: Assign weights based on server capacity. A server with weight 3 receives three times the traffic of weight 1.
Least Connections: Route to the server with fewest active connections. Better for long-lived connections with varying request durations.
IP Hash: Hash client IP to consistently route to the same backend. Provides sticky sessions without cookies.
Consistent Hashing: Distribute load using a hash ring. When backends change, only a fraction of keys remap. Essential for caching layers.
Here’s a consistent hashing implementation:
```go
package main

import (
	"hash/crc32"
	"sort"
	"strconv"
	"sync"
)

type ConsistentHash struct {
	ring       map[uint32]string
	sortedKeys []uint32
	vnodes     int
	mu         sync.RWMutex
}

func NewConsistentHash(vnodes int) *ConsistentHash {
	return &ConsistentHash{
		ring:   make(map[uint32]string),
		vnodes: vnodes,
	}
}

func (ch *ConsistentHash) hash(key string) uint32 {
	return crc32.ChecksumIEEE([]byte(key))
}

func (ch *ConsistentHash) AddNode(node string) {
	ch.mu.Lock()
	defer ch.mu.Unlock()

	for i := 0; i < ch.vnodes; i++ {
		vkey := node + "-" + strconv.Itoa(i)
		hash := ch.hash(vkey)
		ch.ring[hash] = node
		ch.sortedKeys = append(ch.sortedKeys, hash)
	}
	sort.Slice(ch.sortedKeys, func(i, j int) bool {
		return ch.sortedKeys[i] < ch.sortedKeys[j]
	})
}

func (ch *ConsistentHash) RemoveNode(node string) {
	ch.mu.Lock()
	defer ch.mu.Unlock()

	for i := 0; i < ch.vnodes; i++ {
		vkey := node + "-" + strconv.Itoa(i)
		hash := ch.hash(vkey)
		delete(ch.ring, hash)
	}
	// Rebuild sorted keys
	ch.sortedKeys = ch.sortedKeys[:0]
	for k := range ch.ring {
		ch.sortedKeys = append(ch.sortedKeys, k)
	}
	sort.Slice(ch.sortedKeys, func(i, j int) bool {
		return ch.sortedKeys[i] < ch.sortedKeys[j]
	})
}

func (ch *ConsistentHash) GetNode(key string) string {
	ch.mu.RLock()
	defer ch.mu.RUnlock()

	if len(ch.sortedKeys) == 0 {
		return ""
	}
	hash := ch.hash(key)
	// Walk clockwise: the first vnode at or after the key's hash owns it.
	idx := sort.Search(len(ch.sortedKeys), func(i int) bool {
		return ch.sortedKeys[i] >= hash
	})
	if idx >= len(ch.sortedKeys) {
		idx = 0 // wrap around the ring
	}
	return ch.ring[ch.sortedKeys[idx]]
}
```
Virtual nodes (vnodes) smooth out the distribution. Without them, each physical node owns a single large arc of the ring, so adding or removing a node can shift a disproportionate share of keys onto one neighbor.
Health Checks and Failover
Load balancers must detect unhealthy backends and stop routing traffic to them.
Active Health Checks: The LB periodically probes backends (HTTP GET, TCP connect, or custom scripts). Configure intervals, timeouts, and failure thresholds.
Passive Health Checks: Monitor real traffic for errors. If a backend returns too many 5xx responses, mark it unhealthy.
```go
package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

type Backend struct {
	URL              string
	Healthy          bool
	FailureCount     int
	FailureThreshold int
	mu               sync.RWMutex
}

type HealthChecker struct {
	backends []*Backend
	interval time.Duration
	timeout  time.Duration
	client   *http.Client
}

func NewHealthChecker(backends []*Backend, interval, timeout time.Duration) *HealthChecker {
	return &HealthChecker{
		backends: backends,
		interval: interval,
		timeout:  timeout,
		client:   &http.Client{Timeout: timeout},
	}
}

func (hc *HealthChecker) checkBackend(b *Backend) {
	resp, err := hc.client.Get(b.URL + "/health")
	if resp != nil {
		// Always close the body, including on 5xx responses,
		// so probe connections are not leaked.
		defer resp.Body.Close()
	}

	b.mu.Lock()
	defer b.mu.Unlock()

	if err != nil || resp.StatusCode >= 500 {
		b.FailureCount++
		if b.FailureCount >= b.FailureThreshold {
			if b.Healthy {
				log.Printf("Backend %s marked unhealthy", b.URL)
			}
			b.Healthy = false
		}
		return
	}

	if !b.Healthy {
		log.Printf("Backend %s recovered", b.URL)
	}
	b.Healthy = true
	b.FailureCount = 0
}

func (hc *HealthChecker) Start() {
	ticker := time.NewTicker(hc.interval)
	for range ticker.C {
		for _, b := range hc.backends {
			go hc.checkBackend(b)
		}
	}
}

func (hc *HealthChecker) GetHealthyBackends() []*Backend {
	var healthy []*Backend
	for _, b := range hc.backends {
		b.mu.RLock()
		if b.Healthy {
			healthy = append(healthy, b)
		}
		b.mu.RUnlock()
	}
	return healthy
}
```
Connection Draining: When removing a backend, stop sending new connections but allow existing ones to complete. This prevents dropped requests during deployments.
Architecture Decision Guide
Choose L4 when:
- You need maximum throughput with minimal latency
- Traffic is non-HTTP (databases, game servers, custom protocols)
- You don’t need content-based routing
- SSL passthrough is acceptable
Choose L7 when:
- You need path-based or header-based routing
- SSL termination should happen at the load balancer
- You want to add/modify headers (X-Forwarded-For, etc.)
- Caching, compression, or rate limiting at the LB level is required
Hybrid Approach: Many production systems use both. An L4 load balancer at the edge handles raw TCP distribution across multiple L7 load balancers. The L7 tier then performs intelligent routing to application backends. This gives you the throughput of L4 with the flexibility of L7.
Production Considerations
SSL/TLS: Decide between termination (decrypt at LB) and passthrough (decrypt at backend). Termination simplifies certificate management; passthrough provides end-to-end encryption.
DDoS Mitigation: L4 balancers can absorb volumetric attacks better due to lower per-connection overhead. Consider SYN cookies and connection rate limiting.
Observability: Expose metrics for connections, latency percentiles, error rates, and backend health. Structured logging with request IDs enables distributed tracing.
Scaling the Load Balancer: Use DNS round-robin or anycast for multiple LB instances. For cloud deployments, managed load balancers (AWS ALB/NLB, GCP Load Balancing) handle this automatically.
Tool Comparison:
- HAProxy: Battle-tested, excellent L4/L7 support, configuration-based
- NGINX: Great L7 features, familiar to web developers, Lua scripting
- Envoy: Modern, designed for service mesh, excellent observability
- Cloud LBs: Managed, auto-scaling, integrated with cloud ecosystems
The right choice depends on your specific requirements. Start with the simplest solution that meets your needs, and evolve as traffic patterns become clearer.