Redis Persistence: RDB and AOF
Key Insights
- RDB provides compact point-in-time snapshots ideal for backups but risks losing data between intervals, while AOF logs every write operation for maximum durability at the cost of larger files and slower restarts.
- Redis 4.0+ hybrid persistence combines RDB’s fast recovery with AOF’s durability by using snapshots as a base and replaying recent append-only logs during restart.
- Production deployments should enable hybrid persistence with appendfsync everysec, implement automated backups to separate storage, and monitor persistence metrics to detect rewrite failures or fork issues.
Introduction to Redis Persistence
Redis is fundamentally an in-memory database, which makes it blazingly fast. But memory is volatile—when your Redis server restarts, everything vanishes unless you’ve configured persistence. This creates a classic trade-off: pure in-memory operation gives you maximum performance, while persistence adds durability at the cost of some overhead.
Redis offers two persistence mechanisms that you can use independently or together: RDB (Redis Database) snapshots and AOF (Append-Only File) logging. Understanding how each works, their performance characteristics, and when to use them is critical for production deployments where data loss is unacceptable.
RDB (Redis Database) Snapshots
RDB persistence creates point-in-time snapshots of your dataset at specified intervals. Redis forks a child process that writes the entire dataset to a binary dump file (typically dump.rdb) while the parent process continues serving requests. This copy-on-write mechanism means snapshots happen in the background without blocking client operations.
The snapshot frequency is controlled by save directives in your redis.conf:
```
# Save if at least 1 key changed in 900 seconds
save 900 1
# Save if at least 10 keys changed in 300 seconds
save 300 10
# Save if at least 10000 keys changed in 60 seconds
save 60 10000
```
These rules are evaluated continuously. If any condition is met, Redis triggers a background save. You can also trigger snapshots manually:
```shell
# Blocking save (don't use in production)
redis-cli SAVE
# Background save (recommended)
redis-cli BGSAVE
```
Here’s a Python script for automated RDB backups with rotation:
```python
import redis
import time
import shutil
from datetime import datetime
from pathlib import Path

def backup_rdb(redis_client, backup_dir, retention_days=7):
    """Create RDB backup with timestamp and cleanup old backups"""
    backup_path = Path(backup_dir)
    backup_path.mkdir(exist_ok=True)

    # Trigger background save (raises ResponseError if one is already running)
    redis_client.bgsave()

    # Wait for save to complete
    while redis_client.info('persistence')['rdb_bgsave_in_progress']:
        time.sleep(1)

    # Get RDB file location
    config = redis_client.config_get('dir')
    rdb_dir = Path(config['dir'])
    rdb_file = rdb_dir / 'dump.rdb'

    # Copy with timestamp
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    backup_file = backup_path / f'dump_{timestamp}.rdb'
    shutil.copy2(rdb_file, backup_file)

    # Cleanup old backups
    cutoff = time.time() - (retention_days * 86400)
    for old_backup in backup_path.glob('dump_*.rdb'):
        if old_backup.stat().st_mtime < cutoff:
            old_backup.unlink()

    return backup_file

# Usage
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
backup_file = backup_rdb(r, '/var/backups/redis')
print(f"Backup created: {backup_file}")
```
RDB Advantages:
- Compact single-file format, perfect for backups
- Faster restarts compared to AOF (binary format loads quickly)
- Minimal performance impact during normal operation
- Good for disaster recovery and replication
RDB Disadvantages:
- Data loss potential between snapshots (if Redis crashes, you lose changes since last save)
- Fork operation can cause latency spikes on large datasets
- Not suitable when you can’t afford to lose any data
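Fork latency is observable: Redis reports the duration of the most recent fork in the latest_fork_usec field of the INFO stats section. A minimal sketch of a check built on that field (the function name and the 100 ms threshold are illustrative choices, not Redis conventions):

```python
def check_fork_latency(stats, threshold_ms=100):
    """Flag slow forks using latest_fork_usec from INFO's stats section.

    `stats` is the dict returned by redis_client.info('stats'); the
    threshold is an illustrative default, tune it to your latency budget.
    """
    fork_ms = stats.get('latest_fork_usec', 0) / 1000
    if fork_ms > threshold_ms:
        return f"Last fork took {fork_ms:.1f} ms (threshold {threshold_ms} ms)"
    return None
```

Against a live server you would call `check_fork_latency(r.info('stats'))` after a BGSAVE and alert on a non-None result.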
AOF (Append-Only File)
AOF takes a different approach: it logs every write operation received by the server to an append-only file. When Redis restarts, it replays these operations to reconstruct the dataset. This provides much better durability than RDB snapshots.
The critical configuration is the fsync policy, which controls when data is actually written to disk:
```
# Enable AOF
appendonly yes
appendfilename "appendonly.aof"

# Fsync policy (choose one)
appendfsync always    # Fsync after every write (slowest, safest)
appendfsync everysec  # Fsync every second (good balance)
appendfsync no        # Let OS decide (fastest, least safe)
```
The everysec policy is the recommended default—it provides good durability (at most 1 second of data loss) with minimal performance impact.
AOF files grow continuously, so Redis includes an automatic rewrite mechanism that rebuilds the AOF from the current dataset, eliminating redundant operations:
```
# Trigger rewrite when AOF is 100% larger than after last rewrite
auto-aof-rewrite-percentage 100
# Don't rewrite if AOF is smaller than 64MB
auto-aof-rewrite-min-size 64mb
```
You can also trigger rewrites manually:
```shell
redis-cli BGREWRITEAOF
```
Here’s what an AOF file looks like:
```
*2
$6
SELECT
$1
0
*3
$3
SET
$5
mykey
$7
myvalue
*2
$4
INCR
$7
counter
```
This is the Redis protocol (RESP) format—human-readable but verbose. Most commands are preserved exactly as received (note that INCR takes only a key, so it is logged as a two-element array); a few, such as expire-related commands, are rewritten into deterministic forms like PEXPIREAT so that replaying them later gives the same result.
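To make the framing concrete, here is a minimal sketch of a parser for this format. It handles only the `*` array and `$` bulk-string markers shown in the excerpt, assumes text payloads (real AOF arguments can contain raw bytes, including newlines), and `parse_aof_commands` is an illustrative name, not a Redis API:

```python
def parse_aof_commands(aof_text):
    """Parse RESP command arrays into lists of arguments.

    Simplified sketch: '*N' announces an N-element command, and each
    element is a '$len' header followed by the argument on the next line.
    """
    lines = aof_text.splitlines()
    commands, i = [], 0
    while i < len(lines):
        if not lines[i].startswith('*'):
            i += 1
            continue
        argc = int(lines[i][1:])  # number of elements in this command
        args = []
        i += 1
        for _ in range(argc):
            # skip the '$len' header, take the argument line after it
            args.append(lines[i + 1])
            i += 2
        commands.append(args)
    return commands
```

For example, `parse_aof_commands("*2\r\n$4\r\nINCR\r\n$7\r\ncounter\r\n")` returns `[['INCR', 'counter']]`.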
RDB vs AOF: Comparison and Use Cases
| Aspect | RDB | AOF |
|---|---|---|
| Durability | Lose data between snapshots | Lose at most 1 sec (everysec) |
| File Size | Compact binary format | Larger, grows continuously |
| Recovery Time | Fast (binary load) | Slower (replay operations) |
| Performance Impact | Periodic fork latency | Continuous write overhead |
| Use Case | Backups, can tolerate data loss | Critical data, max durability |
Here’s a benchmark comparing write performance:
```python
import redis
import time

def benchmark_writes(redis_client, num_operations=100000):
    """Benchmark write performance"""
    start = time.time()
    for i in range(num_operations):
        redis_client.set(f'key:{i}', f'value:{i}')
    duration = time.time() - start
    ops_per_sec = num_operations / duration
    return {
        'duration': duration,
        'ops_per_sec': ops_per_sec
    }

# Test different configurations
configs = {
    'RDB only': {'appendonly': 'no'},
    'AOF everysec': {'appendonly': 'yes', 'appendfsync': 'everysec'},
    'AOF always': {'appendonly': 'yes', 'appendfsync': 'always'}
}

r = redis.Redis(host='localhost', port=6379, decode_responses=True)
for name, config in configs.items():
    # Apply config
    for key, value in config.items():
        r.config_set(key, value)
    # Flush and benchmark
    r.flushall()
    result = benchmark_writes(r, 10000)
    print(f"{name}:")
    print(f"  Operations/sec: {result['ops_per_sec']:.2f}")
    print(f"  Duration: {result['duration']:.2f}s\n")
```
When to use RDB:
- You can tolerate some data loss (5-15 minutes)
- You need fast restarts
- You want simple, compact backups
- Caching layer where data can be regenerated
When to use AOF:
- Data loss is unacceptable
- You need audit trail of operations
- Dataset changes frequently
- Transactional or financial data
Hybrid Persistence (RDB + AOF)
Since Redis 4.0, you can enable both mechanisms simultaneously. When both are on, Redis recovers from the AOF at restart, since it is guaranteed to be the most complete log; in hybrid mode that file begins with an RDB-format snapshot that loads quickly, followed by the append-only tail of changes made since the last rewrite. This gives you RDB’s fast recovery with AOF’s durability.
Configuration for hybrid mode:
```
# Enable both mechanisms
appendonly yes
save 900 1
save 300 10
save 60 10000

# Use RDB format for AOF rewrites (hybrid mode)
aof-use-rdb-preamble yes

# Standard AOF settings
appendfsync everysec
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
```
With aof-use-rdb-preamble yes, when Redis rewrites the AOF, it writes an RDB snapshot at the beginning followed by incremental AOF data. This dramatically reduces AOF file size and restart time.
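You can see the preamble directly: RDB payloads start with the 5-byte magic string REDIS followed by a four-digit version number, so a hybrid-format AOF begins the same way, while a plain AOF begins with a `*` command marker. A small sketch of that check (it applies to the single-file AOF layout discussed here; Redis 7's multi-part AOF stores the base in a separate file):

```python
def aof_has_rdb_preamble(aof_path):
    """Return True if the file starts with the RDB magic bytes 'REDIS',
    which indicates it was rewritten with aof-use-rdb-preamble enabled."""
    with open(aof_path, 'rb') as f:
        return f.read(5) == b'REDIS'
```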
Recovery process demonstration:
```python
import redis
import time

def demonstrate_hybrid_recovery():
    """Show hybrid persistence recovery"""
    r = redis.Redis(host='localhost', port=6379, decode_responses=True)

    # Write test data
    print("Writing test data...")
    for i in range(1000):
        r.set(f'key:{i}', f'value:{i}')

    # Force save and AOF rewrite
    print("Creating RDB snapshot...")
    r.bgsave()
    time.sleep(2)  # crude wait; poll rdb_bgsave_in_progress in real code

    print("Rewriting AOF...")
    r.bgrewriteaof()
    time.sleep(2)  # crude wait; poll aof_rewrite_in_progress in real code

    # Write more data (only in AOF)
    print("Writing additional data...")
    for i in range(1000, 1100):
        r.set(f'key:{i}', f'value:{i}')

    # Check persistence info
    info = r.info('persistence')
    print("\nPersistence status:")
    print(f"  RDB last save: {info['rdb_last_save_time']}")
    print(f"  AOF size: {info['aof_current_size']} bytes")
    print(f"  AOF base size: {info['aof_base_size']} bytes")
    return info

demonstrate_hybrid_recovery()
```
Best Practices and Production Recommendations
For production deployments, follow these guidelines:
- Enable hybrid persistence with appendfsync everysec for the best balance of durability and performance
- Monitor persistence health continuously
- Implement off-server backups of RDB files
- Size your server to handle fork operations (need 2x memory headroom)
- Use separate disks for AOF writes when possible
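The 2x headroom guideline can be turned into a quick check using the used_memory and total_system_memory fields from the INFO memory section. The factor is a rule of thumb, not a Redis-enforced limit: copy-on-write means a fork usually needs far less than a full second copy, but write-heavy workloads during a BGSAVE can approach one. A sketch:

```python
def fork_headroom_ok(used_memory, total_system_memory, factor=2.0):
    """Return True if the host could absorb a worst-case fork,
    i.e. used_memory * factor fits within total system memory.
    `factor=2.0` is the rule-of-thumb multiplier from the text."""
    return used_memory * factor <= total_system_memory
```

Against a live server: `mem = r.info('memory')`, then `fork_headroom_ok(mem['used_memory'], mem['total_system_memory'])` (total_system_memory may be absent on some platforms).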
Here’s a monitoring script:
```python
import redis
from datetime import datetime

def check_persistence_health(redis_client):
    """Monitor Redis persistence status"""
    info = redis_client.info('persistence')
    issues = []

    # Check RDB status
    if info['rdb_last_bgsave_status'] != 'ok':
        issues.append(f"RDB save failed: {info['rdb_last_bgsave_status']}")

    # Check AOF status
    if info.get('aof_enabled') and info.get('aof_last_bgrewrite_status') != 'ok':
        issues.append(f"AOF rewrite failed: {info['aof_last_bgrewrite_status']}")

    # Check for ongoing operations
    if info['rdb_bgsave_in_progress']:
        issues.append("RDB save in progress")
    if info.get('aof_rewrite_in_progress'):
        issues.append("AOF rewrite in progress")

    # Check last save time
    last_save = datetime.fromtimestamp(info['rdb_last_save_time'])
    age_minutes = (datetime.now() - last_save).total_seconds() / 60
    if age_minutes > 60:
        issues.append(f"Last RDB save was {age_minutes:.0f} minutes ago")

    return {
        'healthy': len(issues) == 0,
        'issues': issues,
        'info': info
    }

# Usage in monitoring system
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
health = check_persistence_health(r)
if not health['healthy']:
    print("ALERT: Persistence issues detected:")
    for issue in health['issues']:
        print(f"  - {issue}")
else:
    print("Persistence healthy")
```
For critical production systems, configure hybrid persistence, implement automated backups to separate storage (S3, network storage), and monitor both RDB and AOF health metrics. Test your recovery procedures regularly—persistence configurations are worthless if you can’t actually restore from them when disaster strikes.