Redis Sentinel: High Availability Setup
Key Insights
- Redis Sentinel provides automatic failover and high availability by monitoring Redis master-replica topologies with a minimum of three Sentinel nodes to achieve quorum-based consensus
- Proper Sentinel configuration requires understanding quorum (typically N/2 + 1), down-after-milliseconds for failure detection, and parallel-syncs to control replica synchronization during failover
- Production deployments must distribute Sentinel nodes across failure domains and use Sentinel-aware clients that automatically discover the current master after failover events
Introduction to Redis Sentinel
Redis Sentinel solves a critical problem in production Redis deployments: the single point of failure inherent in standalone Redis instances. When your master Redis node crashes, your application loses its cache or data store, potentially causing cascading failures across your infrastructure.
Sentinel provides three essential capabilities: continuous monitoring of Redis instances, automatic failover when the master fails, and service discovery for clients to find the current master. Unlike manual failover processes that require human intervention (and the associated delays), Sentinel detects failures within seconds and promotes a replica to master automatically.
The system works through distributed consensus—multiple Sentinel processes monitor your Redis instances and agree on health status through quorum-based voting. This prevents false positives from network partitions or temporary issues that might affect a single Sentinel node.
Sentinel Architecture and Components
A minimum of three Sentinel nodes is required for production deployments. This isn’t arbitrary—it’s based on quorum mathematics. With three Sentinels, you need two to agree (quorum of 2) that the master is down before triggering failover. This configuration tolerates one Sentinel failure while maintaining the ability to achieve consensus.
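The quorum arithmetic above can be sketched directly. A minimal illustration (these helper functions are ours, not part of Redis):

```python
def quorum(sentinels: int) -> int:
    """Majority quorum: the smallest number of Sentinels that is more than half."""
    return sentinels // 2 + 1

def tolerated_failures(sentinels: int) -> int:
    """How many Sentinels can fail while a majority can still be formed."""
    return sentinels - quorum(sentinels)

# With three Sentinels, two must agree and one failure is tolerated.
print(quorum(3), tolerated_failures(3))  # 2 1
print(quorum(5), tolerated_failures(5))  # 3 2
```

Note one subtlety: the quorum value you configure in sentinel monitor only controls when the master is flagged as objectively down; electing the Sentinel that actually performs the failover always requires a majority of all Sentinels.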
The architecture consists of:
- One Redis master (accepts writes)
- Multiple Redis replicas (read-only copies of master data)
- Multiple Sentinel processes (monitoring and coordination layer)
Here’s how the components relate:
┌─────────────┐         ┌─────────────┐         ┌─────────────┐
│  Sentinel1  │◄───────►│  Sentinel2  │◄───────►│  Sentinel3  │
└──────┬──────┘         └──────┬──────┘         └──────┬──────┘
       │                       │                       │
       └───────────┬───────────┴───────────┬───────────┘
                   ▼                       ▼
              ┌─────────┐            ┌─────────┐
              │ Master  │───────────►│ Replica │
              │  :6379  │            │  :6380  │
              └─────────┘            └─────────┘
                   │
                   ▼
              ┌─────────┐
              │ Replica │
              │  :6381  │
              └─────────┘
Sentinels communicate using Redis’ pub/sub mechanism and the Sentinel protocol. They continuously exchange information about the master’s health, replica status, and their own availability.
Setting Up Redis Master and Replicas
Start with configuring the master Redis instance. Create a minimal production-ready configuration:
# redis-master.conf
port 6379
bind 0.0.0.0
protected-mode yes
requirepass "your-strong-password"
masterauth "your-strong-password"
# Persistence
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
# Replication
min-replicas-to-write 1
min-replicas-max-lag 10
The min-replicas-to-write setting is critical: combined with min-replicas-max-lag 10, it makes the master reject writes whenever no replica has acknowledged replication within the last 10 seconds, protecting against data loss during network partitions.
For replica instances, the configuration is nearly identical with one key addition:
# redis-replica1.conf
port 6380
bind 0.0.0.0
protected-mode yes
requirepass "your-strong-password"
masterauth "your-strong-password"
# Replication
replicaof 127.0.0.1 6379
# Persistence
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
For the second replica, change the port to 6381 and use the same replicaof directive pointing to the master.
Start the instances:
redis-server /path/to/redis-master.conf
redis-server /path/to/redis-replica1.conf
redis-server /path/to/redis-replica2.conf
Verify replication is working:
redis-cli -p 6379 -a "your-strong-password" INFO replication
You should see output like:
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=1234,lag=1
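Checks like this can be automated. A hedged sketch that parses raw INFO replication text (the parse_replication_info and replication_healthy helpers are illustrative, not part of redis-py):

```python
def parse_replication_info(info_text: str) -> dict:
    """Parse 'key:value' lines from Redis INFO output into a dict."""
    fields = {}
    for line in info_text.splitlines():
        if ':' in line and not line.startswith('#'):
            key, _, value = line.partition(':')
            fields[key] = value
    return fields

def replication_healthy(info_text: str, min_replicas: int = 2) -> bool:
    """True if this node is a master with at least min_replicas connected."""
    fields = parse_replication_info(info_text)
    return (fields.get('role') == 'master'
            and int(fields.get('connected_slaves', 0)) >= min_replicas)

sample = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1234,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=1234,lag=1"""
print(replication_healthy(sample))  # True
```

In practice, redis-py's `Redis.info('replication')` already returns a parsed dict, so the parsing step is only needed when working with raw CLI output.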
Configuring Redis Sentinel
Sentinel configuration requires careful tuning of several parameters that directly impact failover behavior. Create a Sentinel configuration file:
# sentinel.conf
port 26379
bind 0.0.0.0
protected-mode yes
# Monitor the master
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel auth-pass mymaster "your-strong-password"
# Failure detection
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
# Replication during failover
sentinel parallel-syncs mymaster 1
# Notification scripts (optional)
# sentinel notification-script mymaster /path/to/notify.sh
# sentinel client-reconfig-script mymaster /path/to/reconfig.sh
Key parameters explained:
- sentinel monitor mymaster 127.0.0.1 6379 2: monitor the master at this address, requiring a quorum of 2 Sentinels to agree before failover
- down-after-milliseconds 5000: consider the master down after 5 seconds of no response
- failover-timeout 60000: maximum time for failover completion (60 seconds)
- parallel-syncs 1: only one replica syncs from the new master at a time (reduces load)
Create three Sentinel configuration files (sentinel1.conf, sentinel2.conf, sentinel3.conf) with different ports (26379, 26380, 26381).
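Keeping three near-identical files in sync by hand is error-prone; a small generator sketch (the template string, file names, and render function are illustrative assumptions, not a Redis tool):

```python
# Illustrative template mirroring the sentinel.conf shown above.
SENTINEL_TEMPLATE = """port {port}
bind 0.0.0.0
protected-mode yes
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel auth-pass mymaster "your-strong-password"
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
"""

def render_sentinel_conf(port: int) -> str:
    """Render one Sentinel config differing only in its listen port."""
    return SENTINEL_TEMPLATE.format(port=port)

# Write sentinel1.conf, sentinel2.conf, sentinel3.conf
for i, port in enumerate([26379, 26380, 26381], start=1):
    with open(f"sentinel{i}.conf", "w") as f:
        f.write(render_sentinel_conf(port))
```

Generate these once before first start: Sentinel rewrites its own configuration file at runtime to record discovered replicas and peers, so regenerating a running Sentinel's file would discard that state.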
Start the Sentinels:
redis-sentinel /path/to/sentinel1.conf
redis-sentinel /path/to/sentinel2.conf
redis-sentinel /path/to/sentinel3.conf
Verify Sentinel status:
redis-cli -p 26379 SENTINEL masters
redis-cli -p 26379 SENTINEL slaves mymaster
redis-cli -p 26379 SENTINEL sentinels mymaster
The output shows detailed information about the monitored master, its replicas, and peer Sentinels.
Testing Failover and Client Integration
Testing failover is essential before trusting Sentinel in production. Simulate a master failure:
# Make the master unresponsive for 30 seconds (Sentinel treats this as a failure)
redis-cli -p 6379 -a "your-strong-password" DEBUG sleep 30
# Or kill the process
pkill -9 -f "redis-server.*6379"
Watch the Sentinel logs. You’ll see entries like:
+sdown master mymaster 127.0.0.1 6379
+odown master mymaster 127.0.0.1 6379 #quorum 2/2
+vote-for-leader abc123 2
+elected-leader master mymaster 127.0.0.1 6379
+failover-state-select-slave master mymaster 127.0.0.1 6379
+selected-slave slave 127.0.0.1:6380 mymaster
+failover-state-send-slaveof-noone slave 127.0.0.1:6380 mymaster
+failover-state-wait-promotion slave 127.0.0.1:6380 mymaster
+promoted-slave slave 127.0.0.1:6380 mymaster
+failover-end master mymaster 127.0.0.1 6379
+switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
Client applications must use Sentinel-aware connection libraries. Here’s a Python example using redis-py:
from redis.sentinel import Sentinel

# Connect to the Sentinels
sentinel = Sentinel([
    ('localhost', 26379),
    ('localhost', 26380),
    ('localhost', 26381),
], socket_timeout=0.5)

# Get a master connection
master = sentinel.master_for(
    'mymaster',
    socket_timeout=0.5,
    password='your-strong-password',
    db=0,
)

# Get a replica connection for reads
replica = sentinel.slave_for(
    'mymaster',
    socket_timeout=0.5,
    password='your-strong-password',
    db=0,
)

# Use the connections
master.set('key', 'value')
value = replica.get('key')
The client automatically discovers the current master by querying Sentinels. After failover, subsequent operations use the new master without application code changes.
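During the failover window itself, however, writes can still fail with connection errors before the client rediscovers the new master. A hedged retry sketch (the with_retry helper is our own, not part of redis-py):

```python
import time

def with_retry(operation, retries=3, delay=0.5, exceptions=(ConnectionError,)):
    """Run operation(), retrying on the given exceptions with a fixed delay."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except exceptions:
            if attempt == retries:
                raise
            time.sleep(delay)

# With redis-py you would typically pass its connection error class, e.g.:
# with_retry(lambda: master.set('key', 'value'),
#            exceptions=(redis.exceptions.ConnectionError,))
```

Keep the total retry budget (retries × delay) longer than your expected failover time, otherwise the operation gives up before the new master is elected.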
Node.js example using ioredis:
const Redis = require('ioredis');

const sentinel = new Redis({
  sentinels: [
    { host: 'localhost', port: 26379 },
    { host: 'localhost', port: 26380 },
    { host: 'localhost', port: 26381 },
  ],
  name: 'mymaster',
  password: 'your-strong-password',
});

sentinel.set('key', 'value');
sentinel.get('key', (err, result) => {
  console.log(result);
});
Monitoring and Troubleshooting
Production Sentinel deployments require continuous monitoring. Use these commands to check cluster health:
# Get current master address
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
# Check if master is down
redis-cli -p 26379 SENTINEL ckquorum mymaster
# Get detailed master info
redis-cli -p 26379 SENTINEL master mymaster
# Monitor failover events in real-time
redis-cli -p 26379 SUBSCRIBE +switch-master
Create a monitoring script:
#!/usr/bin/env python3
import time

import redis

def check_sentinel_health(sentinel_hosts):
    for host, port in sentinel_hosts:
        try:
            r = redis.Redis(host=host, port=port)
            # SENTINEL master returns a flat list of field/value pairs
            master_info = r.execute_command('SENTINEL', 'master', 'mymaster')
            status = dict(zip(master_info[::2], master_info[1::2]))
            print(f"Sentinel {host}:{port}")
            print(f"  Master: {status[b'ip'].decode()}:{status[b'port'].decode()}")
            print(f"  Status: {status[b'flags'].decode()}")
            print(f"  Replicas: {status[b'num-slaves'].decode()}")
            print(f"  Sentinels: {status[b'num-other-sentinels'].decode()}")
        except Exception as e:
            print(f"Error connecting to {host}:{port}: {e}")

sentinels = [('localhost', 26379), ('localhost', 26380), ('localhost', 26381)]
while True:
    check_sentinel_health(sentinels)
    time.sleep(10)
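The +switch-master events from the SUBSCRIBE command above can also be consumed programmatically. The message payload is "<master-name> <old-ip> <old-port> <new-ip> <new-port>"; a parsing sketch (the helper name is illustrative):

```python
def parse_switch_master(payload: str) -> dict:
    """Parse a '+switch-master' payload:
    '<master-name> <old-ip> <old-port> <new-ip> <new-port>'."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return {
        'master': name,
        'old': (old_ip, int(old_port)),
        'new': (new_ip, int(new_port)),
    }

event = parse_switch_master('mymaster 127.0.0.1 6379 127.0.0.1 6380')
print(event['new'])  # ('127.0.0.1', 6380)
```

Wiring this into a redis-py pub/sub listener on a Sentinel connection lets you trigger alerts or cache invalidation the moment a failover completes.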
Common issues and solutions:
Split-brain scenarios: Occur when network partitions prevent Sentinels from communicating. Always deploy Sentinels across different availability zones or racks.
Flapping failovers: If down-after-milliseconds is too aggressive, temporary network issues trigger unnecessary failovers. Start with 5000ms and increase if needed.
Replica lag: Monitor replication offset. Large lags indicate the replica won’t have recent data if promoted.
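That lag can be quantified by comparing the master's master_repl_offset against each replica's acknowledged offset from INFO replication. A sketch (the byte threshold and helper names are illustrative assumptions):

```python
def replica_lag_bytes(master_offset: int, replica_offset: int) -> int:
    """Bytes of the replication stream the replica has not yet applied."""
    return max(master_offset - replica_offset, 0)

def safe_to_promote(master_offset: int, replica_offset: int,
                    max_lag_bytes: int = 1024) -> bool:
    """Illustrative check: treat a replica as promotable only if it is
    within max_lag_bytes of the master's replication offset."""
    return replica_lag_bytes(master_offset, replica_offset) <= max_lag_bytes

print(replica_lag_bytes(10_000, 9_500))  # 500
print(safe_to_promote(10_000, 9_500))    # True
```

An alerting job that samples these offsets periodically gives early warning that a failover would lose recent writes.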
Production Best Practices
Deploy Sentinels and Redis instances across failure domains. Never run all Sentinels on the same physical host or availability zone. A typical production setup uses:
- Three Sentinel nodes in different availability zones
- One master and two replicas across zones
- Odd number of Sentinels (3, 5, or 7) for clear quorum
Set conservative timeouts initially:
- down-after-milliseconds: 5000-10000ms
- failover-timeout: 60000-180000ms
Use parallel-syncs 1 in production to prevent replica synchronization from overwhelming the new master during failover.
Configure notification scripts to alert your team when failovers occur. Even automatic failover deserves investigation to understand root causes.
Enable persistence on all instances, including replicas. Use both RDB snapshots and AOF for maximum durability. The configuration save 60 10000 with appendonly yes provides good balance.
Monitor Sentinel logs continuously. Failed quorum attempts, frequent reconnections, or repeated failovers indicate infrastructure problems requiring immediate attention.
Finally, test failover regularly in staging environments. Chaos engineering practices—randomly killing Redis instances—ensure your Sentinel configuration actually works when needed.