System Design: Service Discovery Patterns

Key Insights

  • Client-side discovery offers maximum flexibility and eliminates proxy bottlenecks, but pushes complexity into every service—use it when you need fine-grained control over load balancing and can standardize on a single framework.
  • Server-side discovery simplifies clients dramatically but introduces an additional network hop; it’s the right choice when running on managed infrastructure like Kubernetes or AWS ECS.
  • DNS-based discovery works surprisingly well for stable services with long-lived instances, but falls apart when you need sub-second failover due to TTL caching behavior.

The Service Discovery Problem

Hardcoded endpoints are the first thing that breaks when you move from a monolith to distributed services. That http://localhost:8080 or even http://user-service.internal:8080 in your configuration file assumes the service lives at a fixed location. In production, this assumption fails constantly.

Services scale horizontally. Containers restart with new IP addresses. Deployments roll through instances one by one. Cloud VMs get terminated and replaced. A service that existed at 10.0.1.45:8080 five minutes ago might now be running at 10.0.2.112:8080, 10.0.2.113:8080, and 10.0.2.114:8080.

Service discovery solves this by introducing indirection: instead of knowing where a service is, you know how to find where it is right now. The patterns for implementing this indirection vary significantly in complexity, operational overhead, and failure modes.
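The indirection can be sketched in a few lines. The `Registry` class below is purely illustrative, not any real library's API:

```python
import random

class Registry:
    """Toy in-memory service registry: name -> set of live endpoints."""

    def __init__(self):
        self._instances = {}

    def register(self, service, endpoint):
        self._instances.setdefault(service, set()).add(endpoint)

    def deregister(self, service, endpoint):
        self._instances.get(service, set()).discard(endpoint)

    def resolve(self, service):
        """Return one currently registered endpoint, or fail loudly."""
        endpoints = self._instances.get(service)
        if not endpoints:
            raise LookupError(f"no instances of {service}")
        return random.choice(sorted(endpoints))
```

Every pattern below is a production-hardened version of resolve(): the lookup happens at request time, so the answer can track a changing fleet.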

Client-Side Discovery Pattern

In client-side discovery, the consuming service takes responsibility for finding healthy instances. The client queries a service registry, receives a list of available endpoints, and applies its own load balancing logic to select one.

This pattern gives you maximum control. You can implement sophisticated load balancing—weighted round-robin, least connections, latency-based routing—without depending on infrastructure components. There’s no proxy in the middle adding latency or becoming a bottleneck.

The tradeoff is complexity in every client. Each service needs discovery logic, and that logic needs to handle registry failures, stale data, and connection errors gracefully.

Spring Cloud Netflix Eureka popularized this pattern in the Java ecosystem:

# application.yml - Eureka client configuration
eureka:
  client:
    serviceUrl:
      defaultZone: http://eureka-server:8761/eureka/
    registryFetchIntervalSeconds: 5
  instance:
    preferIpAddress: true
    leaseRenewalIntervalInSeconds: 10
    leaseExpirationDurationInSeconds: 30

spring:
  application:
    name: order-service

With registration configured, the client can look up instances itself or delegate to Spring's load-balancer integration:

@Service
public class UserServiceClient {
    
    private final WebClient.Builder webClientBuilder;
    private final DiscoveryClient discoveryClient;
    
    public UserServiceClient(WebClient.Builder webClientBuilder,
                             DiscoveryClient discoveryClient) {
        this.webClientBuilder = webClientBuilder;
        this.discoveryClient = discoveryClient;
    }
    
    public User getUser(String userId) {
        // Direct discovery client usage
        List<ServiceInstance> instances = discoveryClient
            .getInstances("user-service");
        
        if (instances.isEmpty()) {
            throw new ServiceUnavailableException("user-service");
        }
        
        // Simple random selection - production code would use 
        // more sophisticated load balancing
        ServiceInstance instance = instances
            .get(ThreadLocalRandom.current().nextInt(instances.size()));
        
        String url = instance.getUri() + "/users/" + userId;
        
        return webClientBuilder.build()
            .get()
            .uri(url)
            .retrieve()
            .bodyToMono(User.class)
            .block();
    }
    
    // Or use a WebClient.Builder bean annotated with @LoadBalanced, which
    // resolves logical service names automatically. Note: a load-balanced
    // builder treats every hostname as a service name, so it should be a
    // separate bean from the one used for direct URLs above.
    public User getUserWithLoadBalancer(String userId) {
        return webClientBuilder.build()
            .get()
            .uri("http://user-service/users/" + userId)  // logical name
            .retrieve()
            .bodyToMono(User.class)
            .block();
    }
}

The @LoadBalanced annotation integrates with Spring Cloud LoadBalancer to resolve logical service names automatically. Under the hood, it intercepts requests, queries Eureka, and rewrites URLs to actual instance addresses.
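Outside the Spring ecosystem, the same pattern reduces to three steps: query the registry, cache the result briefly, and fall back to the last known instances when the registry is unreachable. A minimal sketch in Python (the fetch_instances callable and the 5-second TTL are illustrative assumptions, not any specific registry's client):

```python
import random
import time

class DiscoveringClient:
    """Client-side discovery: cached registry lookups with stale fallback."""

    def __init__(self, fetch_instances, ttl_seconds=5.0):
        self._fetch = fetch_instances  # callable: service name -> list of endpoints
        self._ttl = ttl_seconds
        self._cache = {}               # service -> (expires_at, endpoints)

    def pick(self, service):
        expires_at, endpoints = self._cache.get(service, (0.0, []))
        if time.monotonic() >= expires_at:
            try:
                endpoints = self._fetch(service)
                self._cache[service] = (time.monotonic() + self._ttl, endpoints)
            except Exception:
                pass  # registry down: keep serving the stale list
        if not endpoints:
            raise LookupError(f"no instances of {service}")
        return random.choice(endpoints)  # swap in smarter balancing here
```

The stale-fallback branch is the important part: a registry outage then degrades freshness, not availability.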

Server-Side Discovery Pattern

Server-side discovery moves the registry lookup behind a load balancer or API gateway. Clients send requests to a single known endpoint, and the infrastructure handles routing to healthy instances.

This dramatically simplifies clients—they just need to know one address. The complexity shifts to infrastructure, which is often managed by a platform team or cloud provider anyway.

Kubernetes implements this pattern natively:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
      - name: user-service
        image: myregistry/user-service:v1.2.0
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
---
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  selector:
    app: user-service
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP

With this configuration, any pod in the cluster can reach the user service at http://user-service or http://user-service.default.svc.cluster.local. The kube-proxy component maintains iptables rules (or IPVS entries) that load balance across healthy pods. The client code is blissfully unaware of discovery mechanics:

import requests

def get_user(user_id: str) -> dict:
    # Just use the service name - Kubernetes handles the rest
    response = requests.get(
        f"http://user-service/users/{user_id}",
        timeout=5.0
    )
    response.raise_for_status()
    return response.json()

Service Registry Implementations

The registry is the source of truth for service locations. Your choice of registry affects consistency guarantees, operational complexity, and integration options.

Consul offers service discovery plus key-value storage, with first-class health checking. It uses the Raft consensus protocol and supports multi-datacenter deployments.

etcd is a distributed key-value store that Kubernetes uses internally. It’s battle-tested at scale but requires you to build discovery abstractions on top.

ZooKeeper is the oldest option, originally built for Hadoop coordination. It’s powerful but operationally complex and showing its age.

Eureka is simple and works well in the Spring ecosystem, but it relies on client heartbeats rather than active health checks and offers only limited multi-datacenter support.

Here’s a Consul registration with health checks in Go:

package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"

    "github.com/hashicorp/consul/api"
)

func main() {
    // Create Consul client
    config := api.DefaultConfig()
    config.Address = "consul:8500"
    client, err := api.NewClient(config)
    if err != nil {
        log.Fatalf("Failed to create Consul client: %v", err)
    }

    serviceID := fmt.Sprintf("user-service-%s", os.Getenv("HOSTNAME"))
    
    // Register service with health check
    registration := &api.AgentServiceRegistration{
        ID:      serviceID,
        Name:    "user-service",
        Port:    8080,
        Address: os.Getenv("POD_IP"),
        Tags:    []string{"v1", "users"},
        Check: &api.AgentServiceCheck{
            HTTP:                           fmt.Sprintf("http://%s:8080/health", os.Getenv("POD_IP")),
            Interval:                       "10s",
            Timeout:                        "5s",
            DeregisterCriticalServiceAfter: "30s",
        },
        Meta: map[string]string{
            "version": "1.2.0",
            "region":  os.Getenv("AWS_REGION"),
        },
    }

    if err := client.Agent().ServiceRegister(registration); err != nil {
        log.Fatalf("Failed to register service: %v", err)
    }
    log.Printf("Registered service: %s", serviceID)

    // Start HTTP server
    http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte(`{"status": "healthy"}`))
    })
    
    go http.ListenAndServe(":8080", nil)

    // Graceful shutdown - deregister on termination
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    <-sigChan

    log.Println("Deregistering service...")
    if err := client.Agent().ServiceDeregister(serviceID); err != nil {
        log.Printf("Failed to deregister: %v", err)
    }
}

The DeregisterCriticalServiceAfter setting is crucial—it removes instances that fail health checks for too long, preventing stale entries from accumulating.
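On the consumer side, discovery against Consul is a single HTTP call: GET /v1/health/service/<name>?passing=true returns only instances whose checks are currently passing. A sketch using the requests library (the consul:8500 address mirrors the registration example above):

```python
import random
import requests

CONSUL = "http://consul:8500"

def parse_health_response(payload: list) -> list[str]:
    """Flatten Consul's health API payload into host:port strings."""
    return [f"{e['Service']['Address']}:{e['Service']['Port']}" for e in payload]

def healthy_endpoints(service: str) -> list[str]:
    # passing=true filters out instances with failing checks server-side
    resp = requests.get(
        f"{CONSUL}/v1/health/service/{service}",
        params={"passing": "true"},
        timeout=5.0,
    )
    resp.raise_for_status()
    return parse_health_response(resp.json())

def pick(service: str) -> str:
    endpoints = healthy_endpoints(service)
    if not endpoints:
        raise LookupError(f"no healthy instances of {service}")
    return random.choice(endpoints)
```

Pair this with client-side caching, as in the earlier sketch, so a Consul outage does not take consumers down with it.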

DNS-Based Discovery

DNS is the original service discovery mechanism. It’s universally supported, requires no client libraries, and works across any language or framework.

Kubernetes CoreDNS automatically creates DNS records for services:

# From any pod in the cluster
$ nslookup user-service
Server:    10.96.0.10
Address:   10.96.0.10#53

Name:      user-service.default.svc.cluster.local
Address:   10.100.200.15

# For headless services, you get individual pod IPs
$ nslookup user-service-headless
Server:    10.96.0.10
Address:   10.96.0.10#53

Name:      user-service-headless.default.svc.cluster.local
Address:   10.244.1.5
Address:   10.244.2.8
Address:   10.244.1.6

The limitation is TTL caching. DNS clients cache responses, and even with low TTLs, you can’t guarantee immediate failover. If an instance dies, clients might keep trying to connect to it until their cache expires. For services that scale frequently or have short-lived instances, this creates reliability problems.
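From the client's perspective, all of this is ordinary name resolution. A Python sketch using only the standard library (localhost stands in for a real service name here):

```python
import random
import socket

def resolve_endpoints(host: str, port: int) -> list[str]:
    """Resolve every A/AAAA record for a name; a headless service
    returns one record per pod."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr)
    return sorted({f"{sockaddr[0]}:{sockaddr[1]}" for *_, sockaddr in infos})

# Pick a fresh address per request; the resolver's cache (and its TTL)
# decides how stale this answer can be.
endpoints = resolve_endpoints("localhost", 8080)
print(random.choice(endpoints))
```

Resolving on every request, rather than once at startup, is the cheapest mitigation for the TTL problem, but the cache below you still sets the floor on failover time.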

Service Mesh Approach

Service meshes like Istio and Linkerd inject sidecar proxies alongside each service instance. These proxies handle discovery, load balancing, retries, and observability transparently.

# Istio VirtualService for advanced routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: user-service
spec:
  hosts:
  - user-service
  http:
  - match:
    - headers:
        x-api-version:
          exact: "v2"
    route:
    - destination:
        host: user-service
        subset: v2
  - route:
    - destination:
        host: user-service
        subset: v1
      weight: 90
    - destination:
        host: user-service
        subset: v2
      weight: 10
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: user-service
spec:
  host: user-service
  trafficPolicy:
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE
        maxRequestsPerConnection: 100
    loadBalancer:
      simple: LEAST_REQUEST
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2

This configuration implements canary deployments, automatic retries, and least-request load balancing—all without changing application code. The Envoy sidecar handles everything.
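The 90/10 split above is weighted random selection applied per request by the sidecar. The same math in Python (subset names mirror the DestinationRule):

```python
import random
from collections import Counter

def route(weights: dict[str, int]) -> str:
    """Pick a destination subset with probability proportional to its weight."""
    subsets = list(weights)
    return random.choices(subsets, weights=[weights[s] for s in subsets])[0]

random.seed(42)
counts = Counter(route({"v1": 90, "v2": 10}) for _ in range(10_000))
print(counts)  # roughly 9,000 v1 to 1,000 v2
```

Shifting the canary forward is then a matter of editing two weight fields and re-applying the VirtualService, with no redeploys.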

The cost is operational complexity. Running a mesh means managing control plane components, debugging proxy configurations, and understanding a significant abstraction layer. For teams under 50 engineers or systems with fewer than 20 services, the overhead rarely pays off.

Choosing the Right Pattern

Use client-side discovery when you need fine-grained control over load balancing, your team has standardized on a framework that supports it well (Spring Cloud, for example), and you’re willing to accept the coupling between services and discovery infrastructure.

Use server-side discovery when running on Kubernetes or managed container services, when you want to minimize client complexity, or when your services are polyglot and you can’t standardize on a single discovery library.

Use DNS-based discovery for stable services with predictable scaling patterns, external service integration, or as a fallback mechanism.

Use a service mesh when you need advanced traffic management (canary deployments, circuit breaking, mutual TLS), you have the operational capacity to run it, and you’re at a scale where the observability benefits justify the complexity.

In practice, most organizations use hybrid approaches. Kubernetes services for internal communication, DNS for external dependencies, and perhaps a mesh for critical user-facing paths. Start simple, measure what breaks, and add complexity only when you have evidence it’s needed.
