Load Balancer Algorithms: Round Robin, Least Connections, Weighted


Key Insights

  • Round robin works well for homogeneous server pools with similar request patterns, but fails when servers have different capacities or requests have varying durations
  • Least connections prevents overloading servers handling long-running requests, making it superior for applications with unpredictable processing times like video streaming or file uploads
  • Weighted algorithms are essential for heterogeneous infrastructure—a server with 16 cores shouldn’t receive the same traffic as one with 4 cores

Understanding Load Balancer Algorithm Selection

Load balancers distribute incoming traffic across multiple servers, but the algorithm that determines this distribution fundamentally impacts your system’s performance, reliability, and cost efficiency. Choose the wrong algorithm, and you’ll see cascading failures as some servers get overwhelmed while others sit idle. Choose correctly, and you’ll maximize resource utilization while maintaining consistent response times.

The three core algorithms—round robin, least connections, and weighted variants—each solve different problems. Understanding their mechanics and trade-offs lets you match the algorithm to your specific workload characteristics.

Round Robin: Simple Sequential Distribution

Round robin cycles through your server pool sequentially, sending the first request to server A, the second to server B, the third to server C, then back to server A. It’s the simplest load balancing algorithm and requires minimal state tracking.

Here’s a basic implementation:

class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current_index = 0
    
    def get_next_server(self):
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server

# Usage
balancer = RoundRobinBalancer(['server1', 'server2', 'server3'])
for i in range(9):
    print(f"Request {i+1} -> {balancer.get_next_server()}")

# Output:
# Request 1 -> server1
# Request 2 -> server2
# Request 3 -> server3
# Request 4 -> server1
# ...

Round robin excels when your servers are identical and requests take similar amounts of time to process. It distributes fairly by request count: over any full cycle through the pool, each server receives exactly the same number of requests.

The problems emerge with heterogeneous workloads. If request 1 takes 10 seconds and request 2 takes 100 milliseconds, round robin doesn’t care. Server A could be processing five long-running requests while server B handles quick ones, leading to wildly different actual loads despite equal request counts.
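A small sketch makes the imbalance concrete. The durations below are invented for illustration: with a repeating pattern of one slow request followed by two fast ones, round robin sends every slow job to the same server.

```python
durations = [10.0, 0.1, 0.1] * 3   # nine requests: slow, fast, fast, ...
servers = ['serverA', 'serverB', 'serverC']
busy_time = {s: 0.0 for s in servers}

for i, duration in enumerate(durations):
    # Round robin assignment: purely positional, blind to duration
    busy_time[servers[i % len(servers)]] += duration

print(busy_time)  # serverA accumulates 30s of work; B and C roughly 0.3s each
```

Equal request counts, wildly unequal load: exactly the failure mode described above.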

Use round robin for stateless applications with predictable request durations: API endpoints that query cached data, static file servers, or microservices with consistent processing times.

Least Connections: Load-Aware Distribution

Least connections tracks active connections per server and routes new requests to the server currently handling the fewest connections. This adapts to actual server load rather than blindly distributing requests.

class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.servers = {server: 0 for server in servers}
    
    def get_next_server(self):
        # Find server with minimum active connections
        server = min(self.servers.items(), key=lambda x: x[1])[0]
        self.servers[server] += 1
        return server
    
    def release_connection(self, server):
        if server in self.servers and self.servers[server] > 0:
            self.servers[server] -= 1

# Usage
balancer = LeastConnectionsBalancer(['server1', 'server2', 'server3'])

# Simulate varying request durations
print(f"Request 1 -> {balancer.get_next_server()}")  # server1
print(f"Request 2 -> {balancer.get_next_server()}")  # server2
print(f"Request 3 -> {balancer.get_next_server()}")  # server3

# Request 1 completes
balancer.release_connection('server1')

print(f"Request 4 -> {balancer.get_next_server()}")  # server1 (now has 0 connections)

# Current state: server1=1, server2=1, server3=1

Least connections shines when request processing times vary significantly. Consider a video transcoding service where some uploads take seconds and others take minutes. Round robin would eventually queue multiple long jobs on the same server while others finish their quick jobs and sit idle. Least connections naturally balances this by routing new requests away from busy servers.

The algorithm requires more overhead than round robin—the load balancer must track connection state and perform a minimum-finding operation for each request. For high-throughput systems handling tens of thousands of requests per second, this computational cost matters.
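If that per-request minimum scan becomes a bottleneck, one common workaround is a heap keyed on connection count with lazy invalidation of stale entries. The sketch below illustrates the idea; it is not how any particular production balancer implements it.

```python
import heapq

class HeapLeastConnections:
    """Least connections with O(log n) picks via a lazily-invalidated heap."""
    def __init__(self, servers):
        self.counts = {server: 0 for server in servers}
        self.heap = [(0, server) for server in servers]
        heapq.heapify(self.heap)

    def get_next_server(self):
        while True:
            count, server = heapq.heappop(self.heap)
            if count == self.counts[server]:  # entry still current?
                self.counts[server] += 1
                heapq.heappush(self.heap, (self.counts[server], server))
                return server
            # Stale entry left over from an earlier release; discard and retry

    def release_connection(self, server):
        self.counts[server] -= 1
        heapq.heappush(self.heap, (self.counts[server], server))
```

Stale entries accumulate between picks, but the current-count check discards them safely, so correctness is preserved while each pick stays logarithmic.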

Use least connections for applications with unpredictable processing times: database query handlers, file processing services, WebSocket connections, or any scenario where connections persist for varying durations.

Weighted Algorithms: Handling Heterogeneous Infrastructure

Real infrastructure is rarely homogeneous. Your server pool might include older 4-core machines alongside new 32-core instances. Weighted algorithms assign each server a capacity value and distribute traffic proportionally.

Weighted round robin maintains the sequential distribution pattern but repeats servers according to their weights:

class WeightedRoundRobinBalancer:
    def __init__(self, servers_with_weights):
        # servers_with_weights: [('server1', 3), ('server2', 2), ('server3', 1)]
        self.servers = []
        for server, weight in servers_with_weights:
            self.servers.extend([server] * weight)
        self.current_index = 0
    
    def get_next_server(self):
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server

# Usage with 3:2:1 weight ratio
balancer = WeightedRoundRobinBalancer([
    ('server1', 3),  # High-capacity server
    ('server2', 2),  # Medium-capacity server
    ('server3', 1)   # Low-capacity server
])

distribution = {}
for i in range(60):
    server = balancer.get_next_server()
    distribution[server] = distribution.get(server, 0) + 1

print(distribution)
# Output: {'server1': 30, 'server2': 20, 'server3': 10}
# Perfect 3:2:1 ratio
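One quirk of the expanded-list approach is that it sends bursts of consecutive requests to the heaviest server (server1, server1, server1, server2, ...). Nginx's weighted round robin smooths this out by tracking a running score per server; the sketch below follows that idea.

```python
class SmoothWeightedRoundRobin:
    """Each pick boosts every server's score by its weight; the winner
    then pays back the total weight, which interleaves the sequence."""
    def __init__(self, servers_with_weights):
        self.states = [{'name': server, 'weight': weight, 'current': 0}
                       for server, weight in servers_with_weights]
        self.total_weight = sum(w for _, w in servers_with_weights)

    def get_next_server(self):
        best = None
        for state in self.states:
            state['current'] += state['weight']
            if best is None or state['current'] > best['current']:
                best = state
        best['current'] -= self.total_weight
        return best['name']

balancer = SmoothWeightedRoundRobin([('server1', 3), ('server2', 2), ('server3', 1)])
print([balancer.get_next_server() for _ in range(6)])
# ['server1', 'server2', 'server1', 'server3', 'server2', 'server1']
```

Same 3:2:1 ratio over the cycle, but no server ever handles a burst larger than its fair share.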

Weighted least connections combines both approaches—it considers active connections but adjusts the comparison by server capacity:

class WeightedLeastConnectionsBalancer:
    def __init__(self, servers_with_weights):
        self.servers = {server: {'weight': weight, 'connections': 0} 
                       for server, weight in servers_with_weights}
    
    def get_next_server(self):
        # Calculate connection ratio (connections / weight)
        # Lower ratio = more available capacity
        server = min(self.servers.items(), 
                    key=lambda x: x[1]['connections'] / x[1]['weight'])[0]
        self.servers[server]['connections'] += 1
        return server
    
    def release_connection(self, server):
        if server in self.servers and self.servers[server]['connections'] > 0:
            self.servers[server]['connections'] -= 1

# Usage
balancer = WeightedLeastConnectionsBalancer([
    ('server1', 4),  # Can handle 4x baseline load
    ('server2', 2),  # Can handle 2x baseline load
    ('server3', 1)   # Baseline capacity
])

Weight assignment should reflect actual server capacity. Common strategies include:

  • CPU cores: Weight proportional to core count
  • Memory: For memory-intensive applications
  • Benchmarking: Run load tests and assign weights based on measured throughput
  • Composite metrics: Combine CPU, memory, and network capacity
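As a sketch of the benchmarking strategy, integer weights can be derived by normalizing measured throughput against the slowest server. The requests-per-second figures below are hypothetical.

```python
# Hypothetical benchmark results: sustained requests per second per server
measured_rps = {'server1': 4800, 'server2': 2400, 'server3': 1150}

baseline = min(measured_rps.values())  # slowest server gets weight 1
weights = {server: max(1, round(rps / baseline))
           for server, rps in measured_rps.items()}

print(weights)  # {'server1': 4, 'server2': 2, 'server3': 1}
```

Rounding to small integers keeps the expanded server list short for weighted round robin while staying close to the measured capacity ratios.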

Performance Comparison and Selection Criteria

Here’s a simulation comparing all three algorithms under different load patterns:

import heapq
import random

class LoadBalancerSimulator:
    def __init__(self, balancer, num_servers):
        self.balancer = balancer
        self.server_loads = {f'server{i}': [] for i in range(1, num_servers + 1)}
    
    def simulate(self, num_requests, duration_generator, arrival_interval=0.05):
        # Virtual clock: requests arrive at fixed intervals, and a
        # connection is released once its duration has elapsed
        clock = 0.0
        in_flight = []  # min-heap of (completion_time, server)
        
        for _ in range(num_requests):
            clock += arrival_interval
            
            # Release connections for requests that finished before this arrival
            while in_flight and in_flight[0][0] <= clock:
                _, done_server = heapq.heappop(in_flight)
                if hasattr(self.balancer, 'release_connection'):
                    self.balancer.release_connection(done_server)
            
            server = self.balancer.get_next_server()
            duration = duration_generator()
            self.server_loads[server].append(duration)
            heapq.heappush(in_flight, (clock + duration, server))
        
        # Calculate metrics
        for server, loads in self.server_loads.items():
            total_time = sum(loads)
            avg_time = total_time / len(loads) if loads else 0
            print(f"{server}: {len(loads)} requests, "
                  f"total time: {total_time:.2f}s, avg: {avg_time:.2f}s")

# Test with uniform request durations
print("=== Uniform Load (Round Robin) ===")
rr_balancer = RoundRobinBalancer(['server1', 'server2', 'server3'])
sim = LoadBalancerSimulator(rr_balancer, 3)
sim.simulate(300, lambda: random.uniform(0.1, 0.3))

# Test with variable request durations
print("\n=== Variable Load (Least Connections) ===")
lc_balancer = LeastConnectionsBalancer(['server1', 'server2', 'server3'])
sim = LoadBalancerSimulator(lc_balancer, 3)
sim.simulate(300, lambda: random.choice([0.1, 0.1, 0.1, 5.0]))  # Occasional long request

Selection criteria:

Scenario                                    Algorithm                   Reasoning
Identical servers, predictable requests     Round Robin                 Simplest, lowest overhead
Identical servers, variable request times   Least Connections           Adapts to actual load
Different server capacities                 Weighted Round Robin        Proportional distribution
Different capacities + variable requests    Weighted Least Connections  Best utilization

Production Implementation Considerations

Real-world load balancers need health checks to avoid routing traffic to failed servers:

import random

class HealthAwareBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.health_status = {server: True for server in servers}
        self.current_index = 0
    
    def check_health(self, server):
        # In production, this would be an actual HTTP health check with a
        # timeout; here we simulate a 90% success rate
        return random.random() > 0.1
    
    def get_next_server(self):
        # Try each server at most once per call, skipping unhealthy ones
        for _ in range(len(self.servers)):
            server = self.servers[self.current_index]
            self.current_index = (self.current_index + 1) % len(self.servers)
            
            self.health_status[server] = self.check_health(server)
            if self.health_status[server]:
                return server
        
        raise RuntimeError("No healthy servers available")

In production, use proven tools rather than rolling your own:

  • Nginx: Supports round robin, least connections, IP hash, and weighted variants
  • HAProxy: Advanced algorithms including least connections, source IP hashing, and custom logic
  • Cloud load balancers: AWS ALB, Google Cloud Load Balancing, Azure Load Balancer

Configure session persistence (sticky sessions) when your application maintains server-side state. This ensures a user’s requests consistently route to the same backend server.
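The simplest form of stickiness is the hash-based routing mentioned above (Nginx's ip_hash works on this principle): hashing a stable client attribute deterministically maps each client to one server, with no session table on the balancer. A minimal sketch:

```python
import hashlib

def pick_server(client_ip, servers):
    # A stable hash of the client IP always maps to the same index,
    # so the client sticks to one server without any stored state
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

servers = ['server1', 'server2', 'server3']
assert pick_server('203.0.113.7', servers) == pick_server('203.0.113.7', servers)
```

The catch: plain modulo hashing remaps most clients whenever the pool size changes, which is why some balancers offer consistent-hashing modes that limit that churn.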

Choosing the Right Algorithm

Start with round robin for homogeneous infrastructure and predictable workloads. It’s simple, fast, and works well for stateless microservices.

Upgrade to least connections when you observe uneven server utilization or handle requests with variable processing times. For most workloads, the additional overhead is negligible compared to the performance gains.

Implement weighted algorithms when your infrastructure includes servers with different capacities. Don’t waste money running oversized instances if your load balancer treats them identically to smaller ones.

Monitor your actual traffic patterns and server utilization. The best algorithm for your system depends on your specific workload characteristics, and those change as your application evolves. What works today might need adjustment as request patterns shift or infrastructure scales.
