System Design: API Gateway Design

An API Gateway sits between your clients and your backend services, acting as the single entry point for all API traffic. Think of it as a smart reverse proxy that does far more than route requests.

Key Insights

  • An API Gateway is not optional at scale—it’s the control plane that transforms a chaotic mesh of client-to-service calls into a manageable, secure, and observable system.
  • The choice between single gateway, BFF, and multi-tier patterns depends on your client diversity and team autonomy; there’s no universal “best” approach.
  • Gateway resilience patterns (circuit breakers, rate limiting, timeouts) protect your entire system from cascading failures—implement them before you need them.

What is an API Gateway?

Without a gateway, clients must know about every service, handle different protocols, manage authentication with each endpoint, and deal with the complexity of service discovery. This direct client-to-service communication works fine with three services. It becomes a nightmare with thirty.

The gateway pattern consolidates cross-cutting concerns—authentication, rate limiting, logging, protocol translation—into one layer. Your services focus on business logic. The gateway handles the operational complexity.

Core Responsibilities and Features

A production API Gateway handles several distinct responsibilities:

Request Routing maps incoming requests to backend services based on path, headers, or other criteria. This is the fundamental job.

Protocol Translation converts between protocols. Your mobile app speaks HTTP/JSON while your legacy service expects SOAP. The gateway bridges that gap.

Request/Response Transformation reshapes payloads. You might strip internal fields from responses, add correlation IDs to requests, or aggregate multiple service calls into a single response.

Cross-Cutting Concerns include authentication, authorization, rate limiting, caching, and observability. These apply to all traffic and belong at the edge.
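As a sketch of the transformation responsibility, here are two small pure functions a gateway might apply on the way in and out. The field names (`internalId`, `shardKey`) and the header name are illustrative, not from any particular service:

```typescript
// transform.ts - a minimal sketch of gateway payload transformation.
import { randomUUID } from 'crypto';

type Json = Record<string, unknown>;

// Outbound: strip fields that should never leave the gateway.
// `internalId` and `shardKey` are hypothetical internal field names.
export function stripInternalFields(body: Json): Json {
  const { internalId, shardKey, ...publicFields } = body;
  return publicFields;
}

// Inbound: ensure every request carries a correlation ID for tracing,
// without clobbering one that's already set.
export function withCorrelationId(headers: Json): Json {
  return {
    ...headers,
    'x-correlation-id': headers['x-correlation-id'] ?? randomUUID(),
  };
}
```

In practice these hooks live in the gateway's request pipeline; the point is that no individual service ever contains the scrubbing or tracing logic.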

Here’s a basic routing configuration for Kong:

# kong.yml
_format_version: "3.0"

services:
  - name: user-service
    url: http://user-service:8080
    routes:
      - name: user-routes
        paths:
          - /api/v1/users
        strip_path: false
        
  - name: order-service
    url: http://order-service:8080
    routes:
      - name: order-routes
        paths:
          - /api/v1/orders
        strip_path: false

  - name: inventory-service
    url: http://inventory-service:8080
    routes:
      - name: inventory-routes
        paths:
          - /api/v1/inventory
        methods:
          - GET
        strip_path: false

plugins:
  - name: rate-limiting
    config:
      minute: 100
      policy: local
  - name: cors
    config:
      origins:
        - "https://app.example.com"
      methods:
        - GET
        - POST
        - PUT
        - DELETE

This configuration routes traffic to three services, applies global rate limiting, and handles CORS. Without the gateway, each service would implement these concerns independently—and inconsistently.

Common Architecture Patterns

Three patterns dominate API Gateway design. Your choice depends on client diversity and organizational structure.

Single Gateway routes all traffic through one gateway instance (or cluster). Simple to operate, but becomes a bottleneck as teams grow. Every change requires coordination.

Backend for Frontend (BFF) creates dedicated gateways for each client type. Your mobile app, web app, and third-party integrations each get their own gateway tailored to their needs.

Multi-Tier Gateway combines an edge gateway for security and rate limiting with internal gateways for service-to-service communication. Common in large organizations with strict security requirements.

Here’s a BFF implementation showing how mobile and web clients might receive different responses:

// bff-router.ts
import express, { Request, Response } from 'express';
import axios from 'axios';

const app = express();

interface UserProfile {
  id: string;
  email: string;
  name: string;
  avatar: string;
  preferences: Record<string, unknown>;
  activityHistory: Array<unknown>;
}

interface MobileUserResponse {
  id: string;
  name: string;
  avatarUrl: string;
}

interface WebUserResponse {
  id: string;
  email: string;
  name: string;
  avatar: string;
  preferences: Record<string, unknown>;
  recentActivity: Array<unknown>;
}

// Mobile BFF - optimized for bandwidth and battery
app.get('/mobile/api/user/:id', async (req: Request, res: Response) => {
  const userId = req.params.id;
  
  // Fetch only what mobile needs
  const user = await axios.get<UserProfile>(`http://user-service/users/${userId}`);
  
  // Transform for mobile consumption
  const mobileResponse: MobileUserResponse = {
    id: user.data.id,
    name: user.data.name,
    avatarUrl: user.data.avatar, // Pre-sized for mobile
  };
  
  res.json(mobileResponse);
});

// Web BFF - richer data, aggregated calls
app.get('/web/api/user/:id', async (req: Request, res: Response) => {
  const userId = req.params.id;
  
  // Parallel fetch - web can handle more data
  const [user, preferences, activity] = await Promise.all([
    axios.get<UserProfile>(`http://user-service/users/${userId}`),
    axios.get(`http://preference-service/users/${userId}/preferences`),
    axios.get(`http://activity-service/users/${userId}/recent`),
  ]);
  
  // Aggregate into single response
  const webResponse: WebUserResponse = {
    id: user.data.id,
    email: user.data.email,
    name: user.data.name,
    avatar: user.data.avatar,
    preferences: preferences.data,
    recentActivity: activity.data.slice(0, 10),
  };
  
  res.json(webResponse);
});

app.listen(3000);

The BFF pattern shines when clients have fundamentally different needs. Mobile clients need smaller payloads and fewer round trips. Web clients can handle richer data. Third-party integrations need stable, versioned contracts.

Authentication and Security

The gateway is your security perimeter. It validates credentials, terminates TLS, and ensures only authenticated requests reach your services.

JWT validation is the most common pattern. The gateway validates tokens, extracts claims, and passes user context to downstream services via headers.

// jwt-middleware.ts
import { Request, Response, NextFunction } from 'express';
import * as jwt from 'jsonwebtoken';
import jwksClient, { SigningKey } from 'jwks-rsa';

interface TokenPayload extends jwt.JwtPayload {
  sub: string;
  email: string;
  roles: string[];
}

const client = jwksClient({
  jwksUri: 'https://auth.example.com/.well-known/jwks.json',
  cache: true,
  cacheMaxAge: 600000, // 10 minutes
});

function getSigningKey(header: jwt.JwtHeader): Promise<string> {
  return new Promise((resolve, reject) => {
    if (!header.kid) {
      reject(new Error('No kid in token header'));
      return;
    }
    client.getSigningKey(header.kid, (err: Error | null, key?: SigningKey) => {
      if (err || !key) {
        reject(err || new Error('Key not found'));
        return;
      }
      resolve(key.getPublicKey());
    });
  });
}

export async function validateJwt(
  req: Request,
  res: Response,
  next: NextFunction
): Promise<void> {
  const authHeader = req.headers.authorization;
  
  if (!authHeader?.startsWith('Bearer ')) {
    res.status(401).json({ error: 'Missing or invalid authorization header' });
    return;
  }
  
  const token = authHeader.substring(7);
  
  try {
    const decoded = jwt.decode(token, { complete: true });
    if (!decoded || typeof decoded === 'string') {
      throw new Error('Invalid token structure');
    }
    
    const signingKey = await getSigningKey(decoded.header);
    
    const payload = jwt.verify(token, signingKey, {
      algorithms: ['RS256'],
      issuer: 'https://auth.example.com',
      audience: 'api.example.com',
    }) as TokenPayload;
    
    // Pass user context to downstream services
    req.headers['x-user-id'] = payload.sub;
    req.headers['x-user-email'] = payload.email;
    req.headers['x-user-roles'] = payload.roles.join(',');
    
    next();
  } catch (error) {
    res.status(401).json({ error: 'Invalid or expired token' });
  }
}

This middleware validates JWTs against a JWKS endpoint, caches signing keys, and propagates user context downstream. Your services trust these headers because they trust the gateway, which also means the gateway must strip any client-supplied x-user-* headers before this middleware runs so they can't be spoofed.

Rate Limiting and Throttling

Rate limiting protects your services from abuse and ensures fair resource allocation. The two dominant algorithms are token bucket and sliding window.

Token Bucket allows bursts up to bucket capacity, then enforces a steady rate. Good for APIs that need to handle occasional spikes.
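A token bucket can be sketched in a few lines. This in-memory version is illustrative only; capacity and refill rate are example values, and a production gateway would keep the buckets in shared storage such as Redis:

```typescript
// token-bucket.ts - minimal in-memory token bucket sketch
interface Bucket {
  tokens: number;
  lastRefill: number;
}

class TokenBucketLimiter {
  private buckets = new Map<string, Bucket>();

  constructor(
    private readonly capacity: number, // max burst size
    private readonly refillPerSecond: number // steady-state rate
  ) {}

  // Returns true if the request is allowed, false if it should be rejected.
  allow(clientId: string): boolean {
    const now = Date.now();
    let bucket = this.buckets.get(clientId);
    if (!bucket) {
      bucket = { tokens: this.capacity, lastRefill: now };
      this.buckets.set(clientId, bucket);
    }

    // Refill tokens based on elapsed time, capped at bucket capacity
    const elapsedSeconds = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(
      this.capacity,
      bucket.tokens + elapsedSeconds * this.refillPerSecond
    );
    bucket.lastRefill = now;

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A full bucket lets a client burst up to `capacity` requests at once; after that, requests are admitted at the refill rate.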

Sliding Window provides smoother rate enforcement without allowing bursts. Better for strict rate limiting requirements.

Here’s a Redis-based sliding window implementation:

// rate-limiter.ts
import Redis from 'ioredis';
import { Request, Response, NextFunction } from 'express';

const redis = new Redis(process.env.REDIS_URL || 'redis://localhost:6379');

interface RateLimitConfig {
  windowMs: number;
  maxRequests: number;
}

interface RateLimitTier {
  [key: string]: RateLimitConfig;
}

const tierLimits: RateLimitTier = {
  free: { windowMs: 60000, maxRequests: 100 },
  pro: { windowMs: 60000, maxRequests: 1000 },
  enterprise: { windowMs: 60000, maxRequests: 10000 },
};

export async function slidingWindowRateLimit(
  req: Request,
  res: Response,
  next: NextFunction
): Promise<void> {
  const clientId = req.headers['x-api-key'] as string || req.ip || 'anonymous';
  const tier = (req.headers['x-client-tier'] as string) || 'free';
  const config = tierLimits[tier] || tierLimits.free;
  
  const now = Date.now();
  const windowStart = now - config.windowMs;
  const key = `ratelimit:${clientId}`;
  
  const multi = redis.multi();
  
  // Remove old entries outside the window
  multi.zremrangebyscore(key, 0, windowStart);
  
  // Count requests in current window
  multi.zcard(key);
  
  // Add current request
  multi.zadd(key, now.toString(), `${now}-${Math.random()}`);
  
  // Set expiry to clean up old keys
  multi.expire(key, Math.ceil(config.windowMs / 1000));
  
  const results = await multi.exec();
  const requestCount = (results?.[1]?.[1] as number) || 0;
  
  // Set rate limit headers
  res.setHeader('X-RateLimit-Limit', config.maxRequests);
  res.setHeader('X-RateLimit-Remaining', Math.max(0, config.maxRequests - requestCount - 1));
  res.setHeader('X-RateLimit-Reset', Math.ceil((now + config.windowMs) / 1000));
  
  if (requestCount >= config.maxRequests) {
    res.status(429).json({
      error: 'Rate limit exceeded',
      retryAfter: Math.ceil(config.windowMs / 1000),
    });
    return;
  }
  
  next();
}

This implementation uses Redis sorted sets for efficient sliding window counting. It supports tiered access levels and returns standard rate limit headers so clients can self-throttle. One deliberate quirk: the current request is added to the window before the limit check, so rejected requests also count against the quota.

Resilience Patterns

Your gateway must protect backend services from cascading failures. Three patterns are essential: circuit breakers, retries with backoff, and timeouts.

// circuit-breaker.ts
interface CircuitState {
  failures: number;
  lastFailure: number;
  state: 'closed' | 'open' | 'half-open';
}

class CircuitBreaker {
  private circuits: Map<string, CircuitState> = new Map();
  private readonly failureThreshold: number;
  private readonly resetTimeout: number;

  constructor(failureThreshold = 5, resetTimeoutMs = 30000) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeoutMs;
  }

  async execute<T>(
    serviceId: string,
    request: () => Promise<T>,
    fallback?: () => T
  ): Promise<T> {
    const circuit = this.getCircuit(serviceId);
    
    if (circuit.state === 'open') {
      if (Date.now() - circuit.lastFailure > this.resetTimeout) {
        circuit.state = 'half-open';
        circuit.failures = 0;
      } else if (fallback) {
        return fallback();
      } else {
        throw new Error(`Circuit open for ${serviceId}`);
      }
    }

    try {
      const result = await this.withTimeout(request(), 5000);
      this.recordSuccess(serviceId);
      return result;
    } catch (error) {
      this.recordFailure(serviceId);
      if (fallback) {
        return fallback();
      }
      throw error;
    }
  }

  private async withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
    let timer: NodeJS.Timeout | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error('Request timeout')), ms);
    });
    // Clear the timer so it doesn't keep the event loop alive after the race settles
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
  }

  private getCircuit(serviceId: string): CircuitState {
    if (!this.circuits.has(serviceId)) {
      this.circuits.set(serviceId, { failures: 0, lastFailure: 0, state: 'closed' });
    }
    return this.circuits.get(serviceId)!;
  }

  private recordSuccess(serviceId: string): void {
    const circuit = this.getCircuit(serviceId);
    circuit.failures = 0;
    circuit.state = 'closed';
  }

  private recordFailure(serviceId: string): void {
    const circuit = this.getCircuit(serviceId);
    circuit.failures++;
    circuit.lastFailure = Date.now();
    
    if (circuit.failures >= this.failureThreshold) {
      circuit.state = 'open';
    }
  }
}

export const breaker = new CircuitBreaker();

The circuit breaker prevents your gateway from hammering a failing service. After five failures, it opens the circuit and returns fallback responses for 30 seconds before attempting recovery.
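Retries with backoff, the third essential pattern, can be sketched as a small helper. Attempt counts and delay values below are illustrative:

```typescript
// retry.ts - retry with exponential backoff and full jitter (sketch)
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function retryWithBackoff<T>(
  request: () => Promise<T>,
  { maxAttempts, baseDelayMs, maxDelayMs }: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await request();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts - 1) break;
      // Exponential backoff: base * 2^attempt, capped at maxDelayMs,
      // with full jitter so synchronized clients don't retry in lockstep
      const exponential = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * exponential);
    }
  }
  throw lastError;
}
```

Only retry idempotent requests: retrying a non-idempotent POST can duplicate side effects, and an uncapped retry policy amplifies the very outage the circuit breaker is trying to contain.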

Technology Selection and Trade-offs

Your gateway choice involves three trade-offs: control vs. operational burden, latency vs. features, and flexibility vs. vendor lock-in.

Self-Hosted Options:

  • Kong: Lua-based, extensive plugin ecosystem, optional PostgreSQL dependency (DB-less mode runs from declarative config like the example above)
  • NGINX: Raw performance, limited built-in features, requires custom modules
  • Envoy: Modern, gRPC-native, steep learning curve, excellent observability

Managed Services:

  • AWS API Gateway: Deep AWS integration, pay-per-request, added latency from Lambda cold starts
  • Azure APIM: Enterprise features, complex pricing, strong policy engine
  • Google Cloud Endpoints: Simple setup, limited customization

For most teams, I recommend starting with a managed service. The operational overhead of running your own gateway—high availability, zero-downtime deployments, certificate management—is substantial. Move to self-hosted when you hit the limits of managed offerings or when latency becomes critical.

If you choose self-hosted, Kong offers the best balance of features and operational simplicity. Envoy is superior for service mesh architectures where you need fine-grained traffic control.

The gateway is infrastructure. Treat it as such: automate everything, monitor aggressively, and plan for failure. Your entire system depends on it.
