System Design: Back Pressure Handling
Key Insights
- Back pressure is the mechanism that prevents fast producers from overwhelming slow consumers, and without it, your system will eventually collapse under load
- The four fundamental strategies—blocking, bounded buffering, dropping, and rate limiting—each have distinct trade-offs between latency, throughput, and data loss that must match your use case
- Effective back pressure requires end-to-end design: from API rate limits to message queue configurations to database connection pools, every component must participate in the flow control chain
What is Back Pressure?
Back pressure is a flow control mechanism that allows consumers to signal producers to slow down when they can’t keep up with incoming data. Think of it like a water pipe system: if you pump water faster than the drain can handle, pressure builds up. Without a release valve or flow regulator, pipes burst.
In software systems, “pipes bursting” means out-of-memory errors, cascading failures, and 3 AM pages. The fundamental problem is simple: producers and consumers rarely operate at the same speed. Networks have variable latency. Databases have connection limits. CPUs have finite cycles. When a fast upstream component meets a slow downstream component, something has to give.
Real-world scenarios where back pressure becomes critical:
- Message queues: A burst of events from user activity overwhelms your event processor
- API gateways: A viral moment sends 100x normal traffic to your service
- Streaming pipelines: A slow database write blocks a real-time analytics pipeline
- Microservices: Service A calls Service B, which calls Service C—any slowdown propagates upstream
Symptoms of Missing Back Pressure
Before diving into solutions, recognize the symptoms. Systems without proper back pressure handling exhibit predictable failure modes.
Memory exhaustion happens when unbounded queues or buffers grow indefinitely. The JVM runs out of heap, the container hits its memory limit, and the OOM killer terminates your process.
Cascading failures occur when one slow service causes upstream services to accumulate pending requests, eventually exhausting their resources too. This domino effect can take down entire clusters.
Timeout storms emerge when requests queue up so long that they exceed client timeouts. Clients retry, adding more load, creating a death spiral.
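One standard defence against that retry death spiral is client-side exponential backoff with jitter, so retries spread out over time instead of arriving in synchronized waves. A minimal sketch (the base and cap values are illustrative):

```python
import random

def backoff_delay(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: the retry window doubles with each
    attempt (up to `cap`), and the actual delay is sampled uniformly from it,
    so a crowd of timed-out clients does not retry at the same instant."""
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, window)
```

Called before each retry (`time.sleep(backoff_delay(attempt))`), this keeps retries from adding a synchronized second wave of load on an already struggling service.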
Here’s what a naive producer-consumer looks like without back pressure:
import java.util.LinkedList;
import java.util.Queue;

public class UnboundedProducerConsumer {
// This will grow until OOM
private final Queue<Event> queue = new LinkedList<>();
public void produce(Event event) {
// No limit, no waiting, just pile it on
queue.add(event);
}
public void consume() {
while (true) {
Event event = queue.poll();
if (event != null) {
// Slow processing: 100ms per event
processSlowly(event);
}
}
}
// Producer: 10,000 events/sec
// Consumer: 10 events/sec
// Result: Queue grows by 9,990 events/sec until memory exhaustion
}
This code works fine in testing with small loads. In production, it’s a time bomb.
Back Pressure Strategies
Four fundamental strategies exist for handling back pressure. Each makes different trade-offs.
Blocking/Synchronous
The producer blocks until the consumer is ready. This is the simplest approach and guarantees no data loss, but it couples producer throughput to consumer throughput.
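In Python this maps directly onto the standard-library queue.Queue: put() blocks once maxsize items are pending, so the producer can never get more than the buffer size ahead of the consumer. A minimal sketch:

```python
import queue
import threading

q = queue.Queue(maxsize=2)   # producer blocks when 2 items are pending
processed = []

def consumer():
    while True:
        item = q.get()
        if item is None:        # sentinel tells the consumer to stop
            break
        processed.append(item)  # imagine slow work happening here

t = threading.Thread(target=consumer)
t.start()
for i in range(10):
    q.put(i)    # blocks whenever the queue is full: no loss, coupled throughput
q.put(None)
t.join()
# processed is [0..9]: every item delivered, producer throttled to consumer pace
```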
Buffering with Bounds
Use a fixed-size buffer. When full, either block the producer or reject new items. This provides some elasticity for bursts while preventing unbounded growth.
Dropping
When overwhelmed, drop data. This could be random sampling, dropping oldest items, or intelligent load shedding. Appropriate when some data loss is acceptable.
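A drop-oldest buffer needs no custom code in Python: collections.deque with maxlen evicts from the head automatically when a new item arrives at a full buffer. A minimal sketch:

```python
from collections import deque

buf = deque(maxlen=3)   # drop-oldest policy: a full deque silently evicts the head
for event in range(6):
    buf.append(event)

# Only the 3 most recent events survive; 0, 1, and 2 were shed under load
assert list(buf) == [3, 4, 5]
```

In production the same policy should still increment a dropped-items counter; silent drops are one of the pitfalls covered at the end of this article.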
Rate Limiting
Explicitly limit the rate of incoming requests using algorithms like token bucket or leaky bucket. This smooths out bursts and provides predictable throughput.
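A full token-bucket limiter appears in the HTTP section later in this article; its sibling, the leaky bucket, can be sketched with a simple counter. This is a toy model driven by explicit tick() calls rather than wall-clock time, and the names are illustrative:

```python
class LeakyBucket:
    """Arriving requests fill the bucket; it drains at a fixed rate.
    Arrivals that would overflow the bucket are rejected."""
    def __init__(self, capacity: int, leak_per_tick: int):
        self.capacity = capacity
        self.leak_per_tick = leak_per_tick
        self.level = 0

    def tick(self):
        # One unit of time passes: the bucket drains
        self.level = max(0, self.level - self.leak_per_tick)

    def allow(self) -> bool:
        if self.level < self.capacity:
            self.level += 1
            return True
        return False

bucket = LeakyBucket(capacity=2, leak_per_tick=1)
results = [bucket.allow() for _ in range(3)]  # third arrival overflows
bucket.tick()                                  # one unit drains
results.append(bucket.allow())
# results == [True, True, False, True]
```

The effect is the smoothing the paragraph above describes: bursts beyond the bucket's capacity are rejected, and admitted traffic leaves at a fixed drain rate.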
Here’s a bounded queue implementation demonstrating different overflow policies:
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedQueue<T> {
private final Queue<T> queue = new LinkedList<>();
private final int maxSize;
private final OverflowPolicy policy;
private final Lock lock = new ReentrantLock();
private final Condition notFull = lock.newCondition();
private final Condition notEmpty = lock.newCondition();
public enum OverflowPolicy {
BLOCK, // Wait until space available
DROP_NEW, // Reject incoming item
DROP_OLD // Remove oldest, add new
}
public BoundedQueue(int maxSize, OverflowPolicy policy) {
this.maxSize = maxSize;
this.policy = policy;
}
public boolean offer(T item) throws InterruptedException {
lock.lock();
try {
if (queue.size() >= maxSize) {
switch (policy) {
case BLOCK:
while (queue.size() >= maxSize) {
notFull.await();
}
break;
case DROP_NEW:
return false; // Reject the item
case DROP_OLD:
queue.poll(); // Remove oldest
break;
}
}
queue.add(item);
notEmpty.signal();
return true;
} finally {
lock.unlock();
}
}
public T take() throws InterruptedException {
lock.lock();
try {
while (queue.isEmpty()) {
notEmpty.await();
}
T item = queue.poll();
notFull.signal();
return item;
} finally {
lock.unlock();
}
}
}
Implementation Patterns
Reactive Streams and the request(n) Protocol
The Reactive Streams specification provides a standard for asynchronous stream processing with back pressure. The key insight is the request(n) method: consumers explicitly request how many items they can handle.
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;
public class ReactiveBackPressure {
public void demonstrateBackPressure() {
Flux.range(1, 1_000_000)
.doOnNext(i -> System.out.println("Produced: " + i))
// Back pressure strategy: buffer up to 256, then error
.onBackpressureBuffer(256,
dropped -> System.out.println("Dropped: " + dropped))
.publishOn(Schedulers.boundedElastic())
.doOnNext(i -> {
// Simulate slow consumer
try {
Thread.sleep(10);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
System.out.println("Consumed: " + i);
})
.subscribe();
}
// Alternative strategies:
// .onBackpressureDrop() - Silently drop items
// .onBackpressureLatest() - Keep only most recent
// .onBackpressureError() - Throw exception
}
Circuit Breakers as Back Pressure Signals
Circuit breakers serve as back pressure signals in distributed systems. When a downstream service is overwhelmed, the circuit opens, immediately rejecting requests instead of queuing them.
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.time.Duration;
public class CircuitBreakerBackPressure {
// Request, Response, and downstreamClient are application-specific stand-ins
private final CircuitBreaker circuitBreaker;
public CircuitBreakerBackPressure() {
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
.failureRateThreshold(50)
.slowCallRateThreshold(80)
.slowCallDurationThreshold(Duration.ofSeconds(2))
.waitDurationInOpenState(Duration.ofSeconds(30))
.slidingWindowSize(100)
.build();
this.circuitBreaker = CircuitBreaker.of("downstream-service", config);
}
public Response callDownstream(Request request) {
return circuitBreaker.executeSupplier(() -> {
// When circuit is open, this throws CallNotPermittedException
// This IS back pressure - fast failure instead of queuing
return downstreamClient.call(request);
});
}
}
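Stripped of resilience4j's sliding windows and slow-call tracking, the underlying state machine is small enough to sketch directly. This is a toy model, not the library's implementation; the thresholds, names, and injected clock are illustrative:

```python
import time

class ToyCircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures;
    OPEN -> HALF_OPEN once `reset_timeout` seconds have passed;
    a success in HALF_OPEN closes the circuit again."""
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let one probe request through
            else:
                # Fast failure instead of queuing: this IS the back pressure
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "CLOSED"
        return result
```

Injecting the clock makes the OPEN to HALF_OPEN transition testable without sleeping for the full reset timeout.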
Back Pressure in Distributed Systems
Distributed systems require explicit back pressure signaling across network boundaries.
HTTP 429 and Retry-After
REST APIs should return HTTP 429 (Too Many Requests) with a Retry-After header to signal back pressure:
from flask import Flask, jsonify, request
from functools import wraps
import time
from collections import defaultdict
app = Flask(__name__)
# Simple rate limiter using token bucket
class RateLimiter:
def __init__(self, rate: float, capacity: int):
self.rate = rate # tokens per second
self.capacity = capacity
self.tokens = defaultdict(lambda: capacity)
self.last_update = defaultdict(time.time)
def allow(self, key: str) -> tuple[bool, int]:
now = time.time()
elapsed = now - self.last_update[key]
self.last_update[key] = now
# Replenish tokens
self.tokens[key] = min(
self.capacity,
self.tokens[key] + elapsed * self.rate
)
if self.tokens[key] >= 1:
self.tokens[key] -= 1
return True, 0
else:
# Calculate retry-after
retry_after = int((1 - self.tokens[key]) / self.rate) + 1
return False, retry_after
limiter = RateLimiter(rate=10, capacity=20) # 10 req/sec, burst of 20
def rate_limited(f):
@wraps(f)
def decorated(*args, **kwargs):
client_ip = request.remote_addr
allowed, retry_after = limiter.allow(client_ip)
if not allowed:
response = jsonify({
"error": "Rate limit exceeded",
"retry_after": retry_after
})
response.status_code = 429
response.headers["Retry-After"] = str(retry_after)
response.headers["X-RateLimit-Remaining"] = "0"
return response
return f(*args, **kwargs)
return decorated
@app.route("/api/resource")
@rate_limited
def get_resource():
return jsonify({"data": "your resource"})
Kafka Consumer Lag
In Kafka, consumer lag (the difference between the latest offset and the consumer’s current offset) serves as a back pressure indicator. Monitor it and scale consumers accordingly.
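The lag computation itself is simple per-partition arithmetic; the offsets below are stand-ins for values you would fetch from Kafka's admin API or from consumer metrics:

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Per-partition lag = latest (end) offset minus committed offset.
    Partitions with no committed offset count from offset 0."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

end = {0: 1_500, 1: 900, 2: 400}   # latest offset per partition
done = {0: 1_200, 1: 900}          # consumer's committed offsets
lag = consumer_lag(end, done)
# lag == {0: 300, 1: 0, 2: 400}
```

A steadily growing total lag means consumers cannot keep up: scale them out, or apply one of the strategies above further upstream.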
Monitoring and Observability
You can’t manage what you can’t measure. Track these key metrics:
from prometheus_client import Counter, Gauge, Histogram, start_http_server
# Queue metrics
queue_depth = Gauge(
'message_queue_depth',
'Current number of items in queue',
['queue_name']
)
queue_capacity = Gauge(
'message_queue_capacity',
'Maximum queue capacity',
['queue_name']
)
# Back pressure events
items_rejected = Counter(
'items_rejected_total',
'Items rejected due to back pressure',
['queue_name', 'reason']
)
items_dropped = Counter(
'items_dropped_total',
'Items dropped due to overflow',
['queue_name']
)
# Processing metrics
processing_time = Histogram(
'item_processing_seconds',
'Time to process each item',
['queue_name'],
buckets=[.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10]
)
# Usage in your queue
class InstrumentedQueue:
def __init__(self, name: str, max_size: int):
self.name = name
self.max_size = max_size
self.queue = []
queue_capacity.labels(queue_name=name).set(max_size)
def offer(self, item) -> bool:
if len(self.queue) >= self.max_size:
items_rejected.labels(
queue_name=self.name,
reason='queue_full'
).inc()
return False
self.queue.append(item)
queue_depth.labels(queue_name=self.name).set(len(self.queue))
return True
# Consumers should pop items, update queue_depth, and observe processing_time
# around each unit of work; call start_http_server(port) once at startup so
# Prometheus can scrape these metrics
Set alerts when queue depth exceeds 80% capacity or rejection rate spikes above baseline.
Trade-offs and Best Practices
Latency vs throughput: Blocking strategies minimize data loss but increase latency. Dropping strategies maintain low latency but lose data. Choose based on your requirements—financial transactions need different handling than analytics events.
Choose the right strategy:
- User-facing APIs: Rate limiting with clear 429 responses
- Event processing: Bounded buffers with monitoring
- Real-time analytics: Sampling/dropping is often acceptable
- Financial systems: Blocking with strict ordering guarantees
Common pitfalls to avoid:
- Unbounded queues: They always overflow eventually
- Silent drops: If you drop data, log and metric it
- Ignoring upstream: Back pressure must propagate all the way to the source
- Fixed timeouts: Use adaptive timeouts based on current load
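The last pitfall deserves a sketch: one way to make timeouts adaptive is to track an exponentially weighted moving average of observed latency plus a multiple of its deviation, the same scheme TCP uses for its retransmission timer. The constants and names here are illustrative:

```python
class AdaptiveTimeout:
    """Track smoothed latency (srtt) and deviation; timeout = srtt + k * dev,
    clamped to [floor, ceiling] so one outlier can't wedge the system."""
    def __init__(self, alpha=0.125, beta=0.25, k=4.0, floor=0.05, ceiling=10.0):
        self.alpha, self.beta, self.k = alpha, beta, k
        self.floor, self.ceiling = floor, ceiling
        self.srtt = None
        self.dev = 0.0

    def observe(self, latency: float):
        if self.srtt is None:
            self.srtt, self.dev = latency, latency / 2
        else:
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(latency - self.srtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * latency

    def timeout(self) -> float:
        if self.srtt is None:
            return self.ceiling  # no observations yet: be generous
        return min(self.ceiling, max(self.floor, self.srtt + self.k * self.dev))
```

Under load, observed latency rises and the timeout stretches with it, instead of a fixed deadline expiring en masse and triggering the retry storms described earlier.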
Back pressure isn’t optional—it’s a fundamental requirement for any system that handles variable load. Design it in from the start, instrument it thoroughly, and test it under realistic conditions. Your 3 AM self will thank you.