Load Balancer Algorithms: Round Robin, Least Connections, Weighted
Key Insights
- Round robin works well for homogeneous server pools with similar request patterns, but fails when servers have different capacities or requests have varying durations
- Least connections prevents overloading servers handling long-running requests, making it superior for applications with unpredictable processing times like video streaming or file uploads
- Weighted algorithms are essential for heterogeneous infrastructure—a server with 16 cores shouldn’t receive the same traffic as one with 4 cores
Understanding Load Balancer Algorithm Selection
Load balancers distribute incoming traffic across multiple servers, but the algorithm that determines this distribution fundamentally impacts your system’s performance, reliability, and cost efficiency. Choose the wrong algorithm, and you’ll see cascading failures as some servers get overwhelmed while others sit idle. Choose correctly, and you’ll maximize resource utilization while maintaining consistent response times.
The three core algorithms—round robin, least connections, and weighted variants—each solve different problems. Understanding their mechanics and trade-offs lets you match the algorithm to your specific workload characteristics.
Round Robin: Simple Sequential Distribution
Round robin cycles through your server pool sequentially, sending the first request to server A, the second to server B, the third to server C, then back to server A. It’s the simplest load balancing algorithm and requires minimal state tracking.
Here’s a basic implementation:
```python
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current_index = 0

    def get_next_server(self):
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server

# Usage
balancer = RoundRobinBalancer(['server1', 'server2', 'server3'])
for i in range(9):
    print(f"Request {i+1} -> {balancer.get_next_server()}")

# Output:
# Request 1 -> server1
# Request 2 -> server2
# Request 3 -> server3
# Request 4 -> server1
# ...
```
Round robin excels when your servers are identical and requests take similar amounts of time to process. It provides perfect distribution fairness—each server receives exactly the same number of requests over time.
The problems emerge with heterogeneous workloads. If request 1 takes 10 seconds and request 2 takes 100 milliseconds, round robin doesn’t care. Server A could be processing five long-running requests while server B handles quick ones, leading to wildly different actual loads despite equal request counts.
Use round robin for stateless applications with predictable request durations: API endpoints that query cached data, static file servers, or microservices with consistent processing times.
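One caveat the sketch above glosses over: it is not safe when multiple threads route requests concurrently, because two threads can read the same index before either advances it. A minimal thread-safe variant (same interface, hypothetical class name) guards the index with a lock:

```python
import threading

class ThreadSafeRoundRobinBalancer:
    """Round robin with a lock so concurrent callers never race on the index."""
    def __init__(self, servers):
        self.servers = list(servers)
        self.current_index = 0
        self.lock = threading.Lock()

    def get_next_server(self):
        # Read and advance the index atomically
        with self.lock:
            server = self.servers[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.servers)
            return server
```

Real load balancers sidestep the lock with atomic counters, but the lock keeps the sketch obviously correct.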
Least Connections: Load-Aware Distribution
Least connections tracks active connections per server and routes new requests to the server currently handling the fewest connections. This adapts to actual server load rather than blindly distributing requests.
```python
class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.servers = {server: 0 for server in servers}

    def get_next_server(self):
        # Find server with minimum active connections
        server = min(self.servers.items(), key=lambda x: x[1])[0]
        self.servers[server] += 1
        return server

    def release_connection(self, server):
        if server in self.servers and self.servers[server] > 0:
            self.servers[server] -= 1

# Usage
balancer = LeastConnectionsBalancer(['server1', 'server2', 'server3'])

# Simulate varying request durations
print(f"Request 1 -> {balancer.get_next_server()}")  # server1
print(f"Request 2 -> {balancer.get_next_server()}")  # server2
print(f"Request 3 -> {balancer.get_next_server()}")  # server3

# Request 1 completes
balancer.release_connection('server1')
print(f"Request 4 -> {balancer.get_next_server()}")  # server1 (now has 0 connections)
# Current state: server1=1, server2=1, server3=1
```
Least connections shines when request processing times vary significantly. Consider a video transcoding service where some uploads take seconds and others take minutes. Round robin would eventually queue multiple long jobs on the same server while others finish their quick jobs and sit idle. Least connections naturally balances this by routing new requests away from busy servers.
The algorithm requires more overhead than round robin—the load balancer must track connection state and perform a minimum-finding operation for each request. For high-throughput systems handling tens of thousands of requests per second, this computational cost matters.
Use least connections for applications with unpredictable processing times: database query handlers, file processing services, WebSocket connections, or any scenario where connections persist for varying durations.
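One practical detail worth calling out: `release_connection` must run even when request handling fails, or the counts drift upward and a healthy server looks permanently busy. A minimal way to guarantee the release is a context manager (the `checked_out_server` helper is an illustrative name, wrapping the same balancer class from above):

```python
from contextlib import contextmanager

class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.servers = {server: 0 for server in servers}

    def get_next_server(self):
        server = min(self.servers.items(), key=lambda x: x[1])[0]
        self.servers[server] += 1
        return server

    def release_connection(self, server):
        if server in self.servers and self.servers[server] > 0:
            self.servers[server] -= 1

@contextmanager
def checked_out_server(balancer):
    """Yield a server and guarantee its count is decremented,
    even if request handling raises."""
    server = balancer.get_next_server()
    try:
        yield server
    finally:
        balancer.release_connection(server)

# Usage
balancer = LeastConnectionsBalancer(['server1', 'server2'])
with checked_out_server(balancer) as server:
    pass  # handle the request against `server`
```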
Weighted Algorithms: Handling Heterogeneous Infrastructure
Real infrastructure is rarely homogeneous. Your server pool might include older 4-core machines alongside new 32-core instances. Weighted algorithms assign each server a capacity value and distribute traffic proportionally.
Weighted round robin maintains the sequential distribution pattern but repeats servers according to their weights:
```python
class WeightedRoundRobinBalancer:
    def __init__(self, servers_with_weights):
        # servers_with_weights: [('server1', 3), ('server2', 2), ('server3', 1)]
        self.servers = []
        for server, weight in servers_with_weights:
            self.servers.extend([server] * weight)
        self.current_index = 0

    def get_next_server(self):
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server

# Usage with 3:2:1 weight ratio
balancer = WeightedRoundRobinBalancer([
    ('server1', 3),  # High-capacity server
    ('server2', 2),  # Medium-capacity server
    ('server3', 1)   # Low-capacity server
])

distribution = {}
for i in range(60):
    server = balancer.get_next_server()
    distribution[server] = distribution.get(server, 0) + 1

print(distribution)
# Output: {'server1': 30, 'server2': 20, 'server3': 10}
# Perfect 3:2:1 ratio
```
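A caveat with the expanded-list approach: it sends weight-sized bursts to the same server (three consecutive requests to server1 before server2 sees any). The common remedy is "smooth" weighted round robin, the interleaving scheme nginx uses internally; a sketch under the same interface:

```python
class SmoothWeightedRoundRobinBalancer:
    """Interleaved weighted round robin: each pick, every server gains its
    weight as credit; the highest-credit server is chosen and pays back the
    total weight. Preserves the weight ratio without bursts."""
    def __init__(self, servers_with_weights):
        self.states = [
            {'server': server, 'weight': weight, 'current': 0}
            for server, weight in servers_with_weights
        ]
        self.total_weight = sum(state['weight'] for state in self.states)

    def get_next_server(self):
        for state in self.states:
            state['current'] += state['weight']
        best = max(self.states, key=lambda s: s['current'])
        best['current'] -= self.total_weight
        return best['server']

balancer = SmoothWeightedRoundRobinBalancer([
    ('server1', 3), ('server2', 2), ('server3', 1)
])
print([balancer.get_next_server() for _ in range(6)])
# ['server1', 'server2', 'server1', 'server3', 'server2', 'server1']
```

Same 3:2:1 ratio over any window of six requests, but no server ever handles more than two in a row.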
Weighted least connections combines both approaches—it considers active connections but adjusts the comparison by server capacity:
```python
class WeightedLeastConnectionsBalancer:
    def __init__(self, servers_with_weights):
        self.servers = {server: {'weight': weight, 'connections': 0}
                        for server, weight in servers_with_weights}

    def get_next_server(self):
        # Calculate connection ratio (connections / weight)
        # Lower ratio = more available capacity
        server = min(self.servers.items(),
                     key=lambda x: x[1]['connections'] / x[1]['weight'])[0]
        self.servers[server]['connections'] += 1
        return server

    def release_connection(self, server):
        if server in self.servers and self.servers[server]['connections'] > 0:
            self.servers[server]['connections'] -= 1

# Usage
balancer = WeightedLeastConnectionsBalancer([
    ('server1', 4),  # Can handle 4x baseline load
    ('server2', 2),  # Can handle 2x baseline load
    ('server3', 1)   # Baseline capacity
])
```
Weight assignment should reflect actual server capacity. Common strategies include:
- CPU cores: Weight proportional to core count
- Memory: For memory-intensive applications
- Benchmarking: Run load tests and assign weights based on measured throughput
- Composite metrics: Combine CPU, memory, and network capacity
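As a sketch of the benchmarking strategy, weights can be derived by normalizing each server's measured throughput against the slowest one (the helper name and the requests-per-second figures below are illustrative, not measurements from the text):

```python
def weights_from_throughput(measured_rps):
    """Derive integer weights proportional to benchmarked throughput.
    measured_rps: {server_name: sustained requests per second}"""
    baseline = min(measured_rps.values())
    # Round to the nearest integer multiple of the slowest server,
    # never dropping below weight 1
    return {server: max(1, round(rps / baseline))
            for server, rps in measured_rps.items()}

# Hypothetical load-test results
weights = weights_from_throughput({'server1': 4200, 'server2': 2100, 'server3': 1050})
print(weights)  # {'server1': 4, 'server2': 2, 'server3': 1}
```

Rounding loses precision when ratios are far from integers; scaling all weights up (e.g. per 100 RPS) preserves finer ratios at the cost of a longer round-robin cycle.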
Performance Comparison and Selection Criteria
Here’s a simulation comparing all three algorithms under different load patterns:
```python
import random

class LoadBalancerSimulator:
    def __init__(self, balancer, num_servers):
        self.balancer = balancer
        self.server_loads = {f'server{i}': [] for i in range(1, num_servers + 1)}

    def simulate(self, num_requests, duration_generator):
        for i in range(num_requests):
            server = self.balancer.get_next_server()
            duration = duration_generator()
            self.server_loads[server].append(duration)
            # Simulate connection release for least connections
            if hasattr(self.balancer, 'release_connection'):
                # Simplified: immediate release for simulation
                self.balancer.release_connection(server)

        # Calculate metrics
        for server, loads in self.server_loads.items():
            total_time = sum(loads)
            avg_time = total_time / len(loads) if loads else 0
            print(f"{server}: {len(loads)} requests, "
                  f"total time: {total_time:.2f}s, avg: {avg_time:.2f}s")

# Test with uniform request durations
print("=== Uniform Load (Round Robin) ===")
rr_balancer = RoundRobinBalancer(['server1', 'server2', 'server3'])
sim = LoadBalancerSimulator(rr_balancer, 3)
sim.simulate(300, lambda: random.uniform(0.1, 0.3))

# Test with variable request durations
print("\n=== Variable Load (Least Connections) ===")
lc_balancer = LeastConnectionsBalancer(['server1', 'server2', 'server3'])
sim = LoadBalancerSimulator(lc_balancer, 3)
sim.simulate(300, lambda: random.choice([0.1, 0.1, 0.1, 5.0]))  # Occasional long request
```
Selection criteria:
| Scenario | Algorithm | Reasoning |
|---|---|---|
| Identical servers, predictable requests | Round Robin | Simplest, lowest overhead |
| Identical servers, variable request times | Least Connections | Adapts to actual load |
| Different server capacities | Weighted Round Robin | Proportional distribution |
| Different capacities + variable requests | Weighted Least Connections | Best utilization |
Production Implementation Considerations
Real-world load balancers need health checks to avoid routing traffic to failed servers:
```python
import random

class HealthAwareBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.health_status = {server: True for server in servers}
        self.current_index = 0

    def check_health(self, server):
        # In production, this would be an actual HTTP health check;
        # here we simulate occasional failures
        return random.random() > 0.1  # 90% success rate

    def get_next_server(self):
        attempts = 0
        max_attempts = len(self.servers)
        while attempts < max_attempts:
            server = self.servers[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.servers)
            # Update health status
            self.health_status[server] = self.check_health(server)
            if self.health_status[server]:
                return server
            attempts += 1
        raise RuntimeError("No healthy servers available")
```
In production, use proven tools rather than rolling your own:
- Nginx: Supports round robin, least connections, IP hash, and weighted variants
- HAProxy: Advanced algorithms including least connections, source IP hashing, and custom logic
- Cloud load balancers: AWS ALB, Google Cloud Load Balancing, Azure Load Balancer
Configure session persistence (sticky sessions) when your application maintains server-side state. This ensures a user’s requests consistently route to the same backend server.
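As a concrete example, here is a minimal nginx upstream block combining weighted least connections (the addresses are placeholders):

```nginx
upstream backend {
    least_conn;                      # least connections algorithm
    server 10.0.0.1:8080 weight=3;   # higher-capacity machine
    server 10.0.0.2:8080 weight=1;
}

server {
    location / {
        proxy_pass http://backend;
    }
}
```

For session persistence with open-source nginx, replace `least_conn` with `ip_hash`, which pins each client IP to one backend.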
Choosing the Right Algorithm
Start with round robin for homogeneous infrastructure and predictable workloads. It’s simple, fast, and works well for stateless microservices.
Upgrade to least connections when you observe uneven server utilization or handle requests with variable processing times. The additional overhead is negligible compared to the performance gains.
Implement weighted algorithms when your infrastructure includes servers with different capacities. Don’t waste money running oversized instances if your load balancer treats them identically to smaller ones.
Monitor your actual traffic patterns and server utilization. The best algorithm for your system depends on your specific workload characteristics, and those change as your application evolves. What works today might need adjustment as request patterns shift or infrastructure scales.