Rate Limiting: Protecting Against Brute Force

Key Insights

  • Rate limiting is your first line of defense against brute force attacks, but choosing the wrong algorithm or placement can leave gaps or frustrate legitimate users
  • Token bucket and sliding window algorithms offer the best balance of accuracy and performance for most applications, while Redis provides the distributed coordination needed for multi-server deployments
  • Effective rate limiting requires layered implementation—combine edge-level protection with application-aware limits that understand your authentication and API patterns

The Brute Force Threat

Every exposed endpoint is a target. Login forms get hammered with credential stuffing attacks using billions of leaked username/password combinations. APIs face enumeration attacks probing for valid user IDs or sensitive data. Password reset endpoints become vectors for account takeover.

The math is simple and terrifying: an attacker making 1,000 requests per second against your login endpoint can try 86 million password combinations per day. Without rate limiting, your authentication system is essentially an open invitation.

Rate limiting isn’t just about security—it’s about survival. A single attacker can exhaust your server resources, database connections, and third-party API quotas. Rate limiting protects your infrastructure while buying time for other security measures to kick in.

Rate Limiting Algorithms Explained

Not all rate limiting is created equal. The algorithm you choose affects memory usage, accuracy, and how users experience hitting limits.

Fixed Window divides time into discrete buckets (e.g., 100 requests per minute). Simple to implement, but suffers from boundary problems—a user can make 100 requests at 11:59:59 and another 100 at 12:00:01, effectively doubling their rate.
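
A minimal in-memory sketch (class and parameter names are my own, not from any library) shows both the simplicity and the boundary flaw:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per discrete window; the count resets at each boundary."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        bucket = (key, window)
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

# The boundary problem in action: a "100 per minute" limit admits
# 200 requests in about two seconds across a window edge.
limiter = FixedWindowLimiter(limit=100, window_seconds=60)
late = all(limiter.allow("1.2.3.4", now=59.9) for _ in range(100))
early = all(limiter.allow("1.2.3.4", now=60.1) for _ in range(100))
assert late and early
```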

Sliding Window tracks requests over a rolling time period, eliminating boundary issues. More accurate but requires storing individual request timestamps.

Token Bucket grants tokens at a fixed rate up to a maximum bucket size. Users can burst up to the bucket capacity, then must wait for tokens to regenerate. This naturally handles legitimate traffic spikes while preventing sustained abuse.

Leaky Bucket processes requests at a constant rate, queuing excess requests. Great for smoothing traffic but adds latency, which isn’t ideal for user-facing endpoints.
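
The queueing is what adds latency, but the accounting itself is simple. Here is a meter-style sketch (names are illustrative; a real traffic-shaping deployment would hold excess requests in a queue and release them at the leak rate, rather than rejecting them):

```python
import time

class LeakyBucket:
    """Meter-style leaky bucket: each request adds to a level that
    drains at a constant rate; a full bucket means rejection."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity    # maximum "water" the bucket holds
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0
        self.last_checked = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time that has passed
        self.level = max(0.0, self.level - (now - self.last_checked) * self.leak_rate)
        self.last_checked = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```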

For most applications, token bucket offers the best trade-off. Here’s a clean implementation:

import time
from dataclasses import dataclass, field
from threading import Lock

@dataclass
class TokenBucket:
    capacity: int
    refill_rate: float  # tokens per second
    tokens: float = field(init=False)
    last_refill: float = field(init=False)
    lock: Lock = field(init=False)
    
    def __post_init__(self):
        self.tokens = float(self.capacity)
        self.last_refill = time.monotonic()
        self.lock = Lock()
    
    def consume(self, tokens: int = 1) -> bool:
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            
            # Refill tokens based on elapsed time
            self.tokens = min(
                self.capacity,
                self.tokens + elapsed * self.refill_rate
            )
            self.last_refill = now
            
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False
    
    def time_until_available(self, tokens: int = 1) -> float:
        with self.lock:
            if self.tokens >= tokens:
                return 0.0
            return (tokens - self.tokens) / self.refill_rate


# Usage: 10-token burst, refilling at 1 token/second sustained
bucket = TokenBucket(capacity=10, refill_rate=1.0)

if bucket.consume():
    process_request()
else:
    retry_after = bucket.time_until_available()
    rate_limit_response(retry_after)  # 429 helper, shown later

Implementation Strategies by Layer

Rate limiting can happen at multiple points in your stack. Each has trade-offs.

CDN/Edge Level (Cloudflare, AWS WAF): Stops attacks before they hit your infrastructure. Lowest latency, but limited context about your application logic. Use for broad protection against volumetric attacks.

Reverse Proxy (Nginx, HAProxy): Efficient and language-agnostic. Good for per-IP limiting but lacks user context before authentication.

API Gateway: Combines infrastructure efficiency with API-key awareness. Ideal for protecting public APIs.

Application Code: Full context about users, endpoints, and business logic. More overhead but enables sophisticated rules.

The answer isn’t picking one—it’s layering them. Here’s Nginx configuration for first-line defense:

# Define rate limit zones
limit_req_zone $binary_remote_addr zone=ip_limit:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=login_limit:10m rate=1r/s;

server {
    # General API endpoints - allow bursting
    location /api/ {
        limit_req zone=ip_limit burst=20 nodelay;
        limit_req_status 429;
        proxy_pass http://backend;
    }
    
    # Strict limits on authentication
    location /api/auth/ {
        limit_req zone=login_limit burst=5;
        limit_req_status 429;
        proxy_pass http://backend;
    }
}

For application-level control, Express.js middleware provides flexibility:

const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

// Strict limiter for authentication endpoints
const authLimiter = rateLimit({
  store: new RedisStore({
    sendCommand: (...args) => redis.call(...args),
  }),
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5, // 5 attempts per window
  keyGenerator: (req) => {
    // Rate limit by IP + username (body parsing must run before this middleware)
    const username = req.body?.username || 'anonymous';
    return `auth:${req.ip}:${username}`;
  },
  handler: (req, res) => {
    // resetTime is a Date; report seconds until the window resets
    res.status(429).json({
      error: 'Too many login attempts',
      retryAfter: Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000),
    });
  },
});

app.post('/api/auth/login', authLimiter, loginHandler);

Designing Effective Rate Limit Rules

Different endpoints deserve different limits. A static asset endpoint can handle thousands of requests per second. A login endpoint should allow maybe five attempts per fifteen minutes.

Consider these tiers:

Endpoint Type          Rate Limit               Reasoning
Login/Password Reset   5 / 15 min per IP+user   High-value target, low legitimate volume
API (authenticated)    100 / min per user       Business-dependent, balance usability
API (public)           20 / min per IP          Prevent scraping, encourage authentication
Webhooks               1000 / min per source    High volume but trusted sources

The key challenge is identification. IP addresses are unreliable—NAT and VPNs mean many legitimate users share IPs, while attackers rotate through thousands. Use layered identification:

import hashlib
from redis import Redis

redis = Redis()

def get_rate_limit_keys(request, user=None):
    """Generate multiple rate limit keys for layered protection."""
    keys = []
    
    # Layer 1: IP-based (catches unsophisticated attacks)
    keys.append(f"rl:ip:{request.remote_addr}")
    
    # Layer 2: IP + User-Agent fingerprint
    fingerprint = hashlib.sha256(
        f"{request.remote_addr}:{request.headers.get('User-Agent', '')}".encode()
    ).hexdigest()[:16]
    keys.append(f"rl:fp:{fingerprint}")
    
    # Layer 3: Authenticated user (if available)
    if user:
        keys.append(f"rl:user:{user.id}")
    
    # Layer 4: API key (for API consumers)
    api_key = request.headers.get('X-API-Key')
    if api_key:
        keys.append(f"rl:api:{api_key}")
    
    return keys

def check_rate_limits(keys, limits):
    """Check all rate limit keys, fail if any is exceeded."""
    for key, (max_requests, window_seconds) in zip(keys, limits):
        current = redis.get(key)
        if current and int(current) >= max_requests:
            return False, redis.ttl(key)
    # All checks passed -- record this request against every key
    for key, (_, window_seconds) in zip(keys, limits):
        if redis.incr(key) == 1:
            redis.expire(key, window_seconds)  # start the window on first hit
    return True, 0
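
One detail left implicit above: check_rate_limits zips keys and limits positionally, so the limits list must mirror the key order. A small per-layer mapping (values here are illustrative, not recommendations) keeps the two in sync:

```python
# Per-layer limits: (max_requests, window_seconds)
LIMITS_BY_LAYER = {
    "ip":   (100, 60),   # broad ceiling per IP
    "fp":   (60, 60),    # tighter per IP+User-Agent fingerprint
    "user": (100, 60),   # per authenticated user
    "api":  (1000, 60),  # per API key
}

def limits_for(keys):
    """Map each generated key (e.g. 'rl:ip:1.2.3.4') to its layer's limit."""
    return [LIMITS_BY_LAYER[key.split(":")[1]] for key in keys]

assert limits_for(["rl:ip:1.2.3.4", "rl:user:42"]) == [(100, 60), (100, 60)]
```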

Handling Rate Limit Responses Gracefully

When you rate limit a request, communicate clearly. Use HTTP 429 (Too Many Requests), include a Retry-After header, and return a structured error body:

import time
from flask import jsonify, make_response

def rate_limit_response(retry_after_seconds, limit=100, limit_type="requests"):
    response = make_response(jsonify({
        "error": {
            "code": "RATE_LIMITED",
            "message": f"Too many {limit_type}. Please retry later.",
            "retryAfter": retry_after_seconds,
        }
    }), 429)
    
    response.headers['Retry-After'] = str(retry_after_seconds)
    response.headers['X-RateLimit-Limit'] = str(limit)
    response.headers['X-RateLimit-Remaining'] = '0'
    response.headers['X-RateLimit-Reset'] = str(int(time.time()) + retry_after_seconds)
    
    return response

Clients should implement exponential backoff with jitter:

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    
    if (response.status !== 429) {
      return response;
    }
    
    const retryAfter = parseInt(response.headers.get('Retry-After') || '1');
    const jitter = Math.random() * 1000;
    const delay = (retryAfter * 1000 * Math.pow(2, attempt)) + jitter;
    
    console.log(`Rate limited. Retrying in ${delay}ms`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  
  throw new Error('Max retries exceeded');
}

Advanced Techniques: Adaptive and Distributed Rate Limiting

Static limits are a starting point. Sophisticated systems adjust dynamically based on behavior patterns and coordinate across multiple servers.

Redis with Lua scripting provides atomic distributed rate limiting:

-- sliding_window_rate_limit.lua
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])

-- Remove old entries outside the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

-- Count current requests in window
local count = redis.call('ZCARD', key)

if count < limit then
    -- Add current request
    redis.call('ZADD', key, now, now .. ':' .. math.random())
    redis.call('EXPIRE', key, window)
    return {1, limit - count - 1}  -- allowed, remaining
else
    -- Get oldest entry to calculate retry time
    local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
    local retry_after = window - (now - oldest[2])
    return {0, retry_after}  -- denied, retry_after
end

# Python wrapper -- load the script once, then evaluate it atomically
SLIDING_WINDOW_SCRIPT = open('sliding_window_rate_limit.lua').read()

def sliding_window_limit(redis, key, window_seconds, max_requests):
    now = time.time()
    allowed, remaining_or_retry = redis.eval(
        SLIDING_WINDOW_SCRIPT,
        1, key,
        now, window_seconds, max_requests
    )
    return bool(allowed), remaining_or_retry
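
The adaptive half is about feedback: a client whose requests mostly fail (wrong passwords, 404 probing) looks like an attacker and earns a tighter limit. A minimal sketch, with thresholds that are assumptions rather than recommendations:

```python
def adaptive_limit(base_limit: int, recent_requests: int, recent_failures: int) -> int:
    """Scale a base rate limit down as the observed failure rate rises."""
    if recent_requests == 0:
        return base_limit
    failure_rate = recent_failures / recent_requests
    if failure_rate > 0.9:
        return max(1, base_limit // 10)  # almost everything fails: clamp hard
    if failure_rate > 0.5:
        return max(1, base_limit // 4)   # suspicious: tighten
    return base_limit                    # normal traffic: leave alone

assert adaptive_limit(100, 50, 48) == 10   # 96% failures
assert adaptive_limit(100, 50, 30) == 25   # 60% failures
assert adaptive_limit(100, 50, 5) == 100   # healthy client
```

Feed it the same per-key counters you already track for limiting, and recompute the effective limit on each check.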

Monitoring, Testing, and Tuning

Rate limiting without monitoring is flying blind. Track these metrics:

  • Rate limit hits by endpoint: Identifies attack targets and overly aggressive limits
  • Unique IPs hitting limits: Distinguishes attacks from misconfigured limits
  • False positive rate: Legitimate users blocked (requires user feedback channels)
  • Request latency impact: Ensure rate limiting doesn’t add significant overhead
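
The first two metrics need nothing more than counters keyed by endpoint and client. A stdlib sketch (in practice you would export these to Prometheus, StatsD, or similar):

```python
from collections import Counter

rate_limit_hits = Counter()  # (endpoint, limit_type) -> rejected-request count
ips_hitting_limits = set()   # rough unique-IP tally; reset per reporting interval

def record_rate_limit_hit(endpoint: str, limit_type: str, ip: str):
    """Record one rejected request for later export to your metrics system."""
    rate_limit_hits[(endpoint, limit_type)] += 1
    ips_hitting_limits.add(ip)

record_rate_limit_hit("/api/auth/login", "ip", "203.0.113.7")
record_rate_limit_hit("/api/auth/login", "ip", "203.0.113.7")
assert rate_limit_hits[("/api/auth/login", "ip")] == 2
assert len(ips_hitting_limits) == 1  # one noisy IP, not a distributed attack
```

Many hits from few IPs suggests a blunt attack; many hits from many IPs suggests either a distributed attack or a limit set too low.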

Load test your limits before attackers do. Use tools like wrk2 (whose -R flag sets a constant request rate) or k6 to simulate attack patterns and verify your limits hold:

# Simulate brute force against login endpoint
wrk -t4 -c100 -d30s -R1000 \
  -s post_login.lua \
  https://api.example.com/auth/login

Start conservative and loosen based on data. It’s easier to increase limits for legitimate users than to recover from a successful brute force attack. Review your limits quarterly, correlating with support tickets about rate limiting and security incident patterns.

Rate limiting is never “done.” It’s an ongoing conversation between your security requirements and your users’ needs. Get the fundamentals right, instrument everything, and iterate.
