Redis Cluster: Horizontal Scaling
Key Insights
- Redis Cluster automatically shards data across multiple nodes using 16,384 hash slots, enabling horizontal scaling beyond single-instance memory limits without application-level partitioning logic.
- A production Redis Cluster requires at least 6 nodes (3 masters with 3 replicas) to provide automatic failover and high availability, with clients handling MOVED/ASK redirections transparently.
- Multi-key operations only work when all keys map to the same hash slot, which hash tags like {user:1000}:profile and {user:1000}:settings guarantee, making data modeling crucial for cluster deployments.
Introduction to Redis Cluster Architecture
Redis Cluster is Redis’s native solution for horizontal scaling and high availability. Unlike standalone Redis, which limits you to a single instance’s memory capacity (typically 25-50GB in production), Redis Cluster distributes data across multiple master nodes, allowing you to scale to hundreds of gigabytes or terabytes.
You should consider Redis Cluster when you’re approaching 70-80% memory utilization on a standalone instance, experiencing more than 50,000 operations per second, or need built-in high availability without Sentinel. The architecture uses automatic sharding through 16,384 hash slots, with each master node owning a subset of these slots.
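The 70-80% threshold is easy to check programmatically. Below is a minimal sketch that computes utilization from the text of INFO memory output (the field names used_memory and maxmemory are real; the sample values are illustrative):

```python
def memory_utilization(info_text: str) -> float:
    # Pull used_memory and maxmemory out of INFO memory style output.
    fields = dict(line.split(':', 1) for line in info_text.splitlines()
                  if ':' in line)
    used = int(fields['used_memory'])
    limit = int(fields['maxmemory'])
    # With maxmemory unset (0), there is no hard ceiling to measure against.
    return used / limit if limit else 0.0

sample = "used_memory:42949672960\nmaxmemory:53687091200"
print(f"{memory_utilization(sample):.0%}")  # 80% — time to plan for a cluster
```

In practice you would feed this the output of `redis-cli INFO memory`, or read the same fields through your client library.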
The topology consists of master nodes (handling reads and writes) and replica nodes (providing redundancy). Each master should have at least one replica for automatic failover. When a master fails, the cluster automatically promotes a replica to master, maintaining availability without manual intervention.
# Basic cluster topology
# Master 1 (slots 0-5460) → Replica 1
# Master 2 (slots 5461-10922) → Replica 2
# Master 3 (slots 10923-16383) → Replica 3
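The three ranges above come from splitting the 16,384 slots as evenly as possible across the masters. A short helper makes the arithmetic explicit (a sketch, not redis-cli's exact algorithm — the tool may place the remainder slots on different masters, so boundaries can differ by one):

```python
def even_slot_ranges(n_masters: int, total_slots: int = 16384):
    # Split the slot space into n contiguous, near-equal ranges.
    base, extra = divmod(total_slots, n_masters)
    ranges, start = [], 0
    for i in range(n_masters):
        size = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(even_slot_ranges(3))  # [(0, 5461), (5462, 10922), (10923, 16383)]
```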
Setting Up a Redis Cluster
Redis Cluster requires a minimum of three master nodes to form a functional cluster. This isn’t arbitrary—it’s the minimum needed for the cluster to reach consensus during failover scenarios. For production, you’ll want six nodes: three masters and three replicas.
Here’s a complete Docker Compose setup that creates a 6-node cluster:
version: '3.8'

services:
  redis-node-1:
    image: redis:7.2-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --port 6379
    ports:
      - "6379:6379"
    volumes:
      - redis-node-1-data:/data

  redis-node-2:
    image: redis:7.2-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --port 6380
    ports:
      - "6380:6380"
    volumes:
      - redis-node-2-data:/data

  redis-node-3:
    image: redis:7.2-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --port 6381
    ports:
      - "6381:6381"
    volumes:
      - redis-node-3-data:/data

  redis-node-4:
    image: redis:7.2-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --port 6382
    ports:
      - "6382:6382"
    volumes:
      - redis-node-4-data:/data

  redis-node-5:
    image: redis:7.2-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --port 6383
    ports:
      - "6383:6383"
    volumes:
      - redis-node-5-data:/data

  redis-node-6:
    image: redis:7.2-alpine
    command: redis-server --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes --port 6384
    ports:
      - "6384:6384"
    volumes:
      - redis-node-6-data:/data

volumes:
  redis-node-1-data:
  redis-node-2-data:
  redis-node-3-data:
  redis-node-4-data:
  redis-node-5-data:
  redis-node-6-data:
After starting the containers, create the cluster. Use the Compose service names rather than 127.0.0.1, since each container has its own network namespace and can only reach the others through the Compose network:

docker exec -it redis-node-1 redis-cli --cluster create \
  redis-node-1:6379 redis-node-2:6380 redis-node-3:6381 \
  redis-node-4:6382 redis-node-5:6383 redis-node-6:6384 \
  --cluster-replicas 1
The --cluster-replicas 1 flag tells Redis to create one replica for each master.
Data Distribution and Hash Slots
Redis Cluster uses CRC16 hashing to map keys to one of 16,384 hash slots. The algorithm is straightforward: HASH_SLOT = CRC16(key) mod 16384. This deterministic mapping ensures that the same key always goes to the same slot.
import binascii

def calculate_hash_slot(key):
    # Redis uses the CRC16 XMODEM variant
    crc = binascii.crc_hqx(key.encode('utf-8'), 0)
    return crc % 16384

# Examples
print(f"user:1000 → slot {calculate_hash_slot('user:1000')}")   # slot 3918
print(f"user:2000 → slot {calculate_hash_slot('user:2000')}")   # slot 7537
print(f"order:5000 → slot {calculate_hash_slot('order:5000')}") # slot 1584
The problem with this approach is multi-key operations. Commands like MGET, MSET, or transactions that span multiple keys will fail if the keys don’t belong to the same hash slot. This is where hash tags become critical.
Hash tags force multiple keys into the same slot by only hashing the content between curly braces:
# These all map to the same slot: only the tag content 'user:1000' is hashed
print(f"{{user:1000}}:profile → slot {calculate_hash_slot('user:1000')}")
print(f"{{user:1000}}:settings → slot {calculate_hash_slot('user:1000')}")
print(f"{{user:1000}}:orders → slot {calculate_hash_slot('user:1000')}")
Use hash tags strategically. Group related data that you’ll query together, but don’t overuse them—you’ll create hotspots where one node handles disproportionate traffic.
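The tag-extraction rule can be made explicit in code. Per the Redis Cluster specification, only the content between the first { and the next } is hashed, and only if that content is non-empty; otherwise the whole key is hashed. A self-contained sketch:

```python
import binascii

def key_hash_slot(key: str) -> int:
    # Hash-tag rule: use the substring between the first '{' and the
    # following '}' when it is non-empty; otherwise hash the whole key.
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key.encode('utf-8'), 0) % 16384

# All tagged variants land on the slot of 'user:1000'
assert key_hash_slot('{user:1000}:profile') == key_hash_slot('user:1000')
assert key_hash_slot('{user:1000}:settings') == key_hash_slot('user:1000')
```

Note the edge case: a key like foo{}bar has an empty tag, so Redis hashes the entire key, not the empty string.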
Client Configuration and Connection Handling
Cluster-aware clients automatically handle the complexity of routing requests to the correct node. When you connect to any cluster node, the client downloads the cluster topology and caches the slot-to-node mapping.
Here’s a Python example using redis-py (cluster support is built into redis-py 4.0+):

from redis.cluster import RedisCluster

# Connect to the cluster (only one node is needed; the client discovers the rest)
rc = RedisCluster(
    host='localhost',
    port=6379,
    decode_responses=True,
    require_full_coverage=False
)

# The client automatically routes each command to the correct node
rc.set('user:1000:profile', '{"name": "Alice"}')
rc.set('user:2000:profile', '{"name": "Bob"}')

# Multi-key operations require hash tags
rc.mset({
    '{user:1000}:profile': '{"name": "Alice"}',
    '{user:1000}:settings': '{"theme": "dark"}',
    '{user:1000}:cart': '[]'
})

# Get values (the client handles routing)
profile = rc.get('{user:1000}:profile')
Node.js with ioredis handles clusters similarly:
const Redis = require('ioredis');

const cluster = new Redis.Cluster([
  { host: 'localhost', port: 6379 },
  { host: 'localhost', port: 6380 },
  { host: 'localhost', port: 6381 }
], {
  redisOptions: {
    maxRetriesPerRequest: 3,
    enableReadyCheck: true
  },
  clusterRetryStrategy: (times) => {
    return Math.min(times * 100, 2000);
  }
});

cluster.set('session:abc123', JSON.stringify({ userId: 1000 }));
When a client connects to the wrong node, Redis responds with a MOVED redirection containing the correct node’s address. The client updates its slot mapping and retries. During resharding, you’ll see ASK redirections, which are temporary.
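Both redirections arrive as error replies whose payload names the target node. A minimal sketch of how a client parses them (the error strings follow the documented "MOVED <slot> <host:port>" / "ASK <slot> <host:port>" format; the handling logic here is simplified):

```python
def parse_redirect(error: str):
    # "MOVED 3918 127.0.0.1:6380" → permanent: update the cached slot map.
    # "ASK 3918 127.0.0.1:6380"   → temporary: retry once on the target,
    #                               prefixed with an ASKING command.
    kind, slot, addr = error.split()
    host, port = addr.rsplit(':', 1)  # rsplit tolerates IPv6-style hosts
    return kind, int(slot), host, int(port)

print(parse_redirect("MOVED 3918 127.0.0.1:6380"))
# ('MOVED', 3918, '127.0.0.1', 6380)
```

A MOVED reply should update the client's slot table; an ASK reply should not, which is what makes it safe during an in-flight slot migration.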
Scaling Operations
Adding capacity to a cluster involves two steps: adding nodes and redistributing hash slots.
# Add new node to cluster
redis-cli --cluster add-node 127.0.0.1:6385 127.0.0.1:6379
# Add replica for the new master
redis-cli --cluster add-node 127.0.0.1:6386 127.0.0.1:6385 --cluster-slave
# Reshard slots to new node (redistribute 4096 slots)
redis-cli --cluster reshard 127.0.0.1:6379 \
--cluster-from all \
--cluster-to <new-node-id> \
--cluster-slots 4096 \
--cluster-yes
The resharding process migrates keys slot by slot. Redis uses the MIGRATE command internally, which transfers each key atomically; the rest of the cluster stays fully available while a slot is being moved. During migration, clients may receive ASK redirections pointing to the destination node for keys that have already moved.
Removing nodes requires draining their slots first:
# Reshard all slots away from node
redis-cli --cluster reshard 127.0.0.1:6379 \
--cluster-from <node-id-to-remove> \
--cluster-to <destination-node-id> \
--cluster-slots <slot-count> \
--cluster-yes
# Remove the empty node
redis-cli --cluster del-node 127.0.0.1:6379 <node-id>
High Availability and Failover
Redis Cluster’s failover mechanism activates when a master becomes unreachable. The cluster uses a gossip protocol in which nodes constantly exchange ping/pong messages. A node that stops responding is flagged PFAIL (possibly failing) by its peers; once a majority of masters report the same PFAIL flag, it is promoted to FAIL cluster-wide.
At this point, replicas of the failed master compete to become the new master. The replica with the most recent replication offset is favored in the election; on winning a majority of master votes, it promotes itself. The whole process typically completes within a few seconds, with detection time governed by cluster-node-timeout.
Monitor cluster health with these commands:
# Cluster overview
redis-cli -c -p 6379 CLUSTER INFO
# Node status and slot distribution
redis-cli -c -p 6379 CLUSTER NODES
# Slot ranges and the nodes serving them
redis-cli -c -p 6379 CLUSTER SLOTS
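CLUSTER NODES output is line-oriented text, so slot distribution is easy to audit in a script. Below is a sketch that tallies slots per node from that format (the field layout is the documented one; the sample node id is shortened for illustration):

```python
def parse_cluster_nodes(output: str):
    # Each line: <id> <ip:port@cport> <flags> <master-id> <ping-sent>
    #            <pong-recv> <config-epoch> <link-state> [<slot-range> ...]
    nodes = []
    for line in output.strip().splitlines():
        parts = line.split()
        slots = 0
        for rng in parts[8:]:
            if rng.startswith('['):        # slot mid-migration, e.g. [93->-<id>]
                continue
            lo, _, hi = rng.partition('-')
            slots += (int(hi) - int(lo) + 1) if hi else 1
        nodes.append({'id': parts[0],
                      'addr': parts[1].split('@')[0],
                      'flags': parts[2],
                      'slots': slots})
    return nodes

sample = "07c3 127.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-5460"
print(parse_cluster_nodes(sample)[0]['slots'])  # 5461
```

Running this over the full CLUSTER NODES output and comparing the per-master counts is a quick check that slots are still evenly distributed.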
For manual failover (during maintenance):
# Connect to replica and force promotion
redis-cli -c -p 6382 CLUSTER FAILOVER
Performance Considerations and Best Practices
Redis Cluster isn’t always the right choice. Don’t use it if:
- Your dataset fits comfortably in a single instance (< 20GB)
- You heavily rely on multi-key operations without hash tags
- Network latency between nodes exceeds 5ms
- You need Lua scripts that access multiple keys across slots
Benchmark your specific workload:
# Standalone Redis
redis-benchmark -h localhost -p 6379 -t set,get -n 1000000 -q
# Cluster
redis-benchmark -h localhost -p 6379 -t set,get -n 1000000 -q --cluster
Monitor these metrics in production:
- Slot distribution (should be roughly equal across masters)
- Memory usage per node (watch for hotspots)
- Network bandwidth between nodes
- Failover frequency (should be rare)
- Client redirection rate (high rates indicate topology issues)
Design your key schema with the cluster in mind. Use hash tags judiciously, avoid transactions spanning multiple slots, and test failover scenarios in staging. Redis Cluster is powerful when used correctly, but it introduces complexity that standalone Redis doesn’t have. Make sure you need that complexity before adopting it.