Redis Data Structures: Strings, Lists, Sets, Hashes, Sorted Sets

Key Insights

• Redis provides five core data structures—strings, lists, sets, hashes, and sorted sets—each optimized for specific access patterns and use cases that go far beyond simple key-value storage.
• Understanding the time complexity and memory characteristics of each data structure is critical for building performant applications, as the wrong choice can turn O(1) operations into O(n) scans.
• Redis data structures support atomic operations that enable building distributed-systems primitives like rate limiters, leaderboards, and session stores without external coordination.

Strings: The Foundation

Redis strings are binary-safe byte sequences up to 512MB, serving as the foundation for all Redis operations. Despite the name, they store any binary data—text, serialized JSON, Protocol Buffers, or raw bytes.
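Because values are just bytes, a common pattern is caching serialized objects in a string key. A minimal sketch (the helper names and key are my own, not part of redis-py):

```python
import json

def cache_object(r, key, obj, ttl=300):
    # Serialize to JSON and store with a TTL; any byte string works the same way
    r.setex(key, ttl, json.dumps(obj))

def get_cached_object(r, key):
    raw = r.get(key)
    return json.loads(raw) if raw is not None else None
```

With decode_responses=True the stored JSON comes back as a str, so json.loads works on it directly.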

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Basic operations
r.set('user:1000:name', 'Alice')
r.get('user:1000:name')  # 'Alice'

# Atomic increment - critical for counters
r.set('page:views', 0)
r.incr('page:views')  # Returns 1
r.incrby('page:views', 10)  # Returns 11

# Set with expiration (TTL)
r.setex('session:abc123', 3600, 'user_data')

# Set if not exists - distributed locking primitive
# (SETNX alone has no TTL; a crashed holder would leave the lock stuck.
# SET with nx + ex sets the value and expiry atomically.)
acquired = r.set('lock:resource', 'owner_id', nx=True, ex=30)
if acquired:
    # Critical section
    r.delete('lock:resource')

Strings excel at caching, session storage, and atomic counters. Use INCR/DECR for rate limiting:

import time

def rate_limit(user_id, limit=100, window=60):
    key = f'rate:{user_id}:{int(time.time() // window)}'
    current = r.incr(key)
    if current == 1:
        # First hit in this window: attach a TTL so the counter expires with it
        r.expire(key, window)
    return current <= limit

The GETSET operation atomically sets a new value and returns the old one, useful for implementing flags (since Redis 6.2 it is deprecated in favor of SET with the GET option):

# Atomic flag swap
old_value = r.getset('maintenance_mode', 'true')
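The same swap can be written with SET plus the GET option, which redis-py exposes as a keyword argument. A sketch (the helper name is mine):

```python
def enter_maintenance(r):
    # SET ... GET atomically writes the new value and returns the previous one
    return r.set('maintenance_mode', 'true', get=True)
```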

Lists: Ordered Collections

Redis lists are linked lists of strings, optimized for operations at both ends. They maintain insertion order and allow duplicate elements.

# Push operations - O(1)
r.lpush('queue:tasks', 'task1', 'task2')  # Left push
r.rpush('queue:tasks', 'task3')  # Right push

# Pop operations - O(1)
task = r.lpop('queue:tasks')  # Returns 'task2'
task = r.rpop('queue:tasks')  # Returns 'task3'

# Blocking pop - waits until an element is available
task = r.blpop('queue:tasks', timeout=5)  # Blocks up to 5s; returns (key, value) or None

# Range access - O(n)
recent_items = r.lrange('feed:user:1000', 0, 9)  # First 10 items

# Trim to maintain fixed size - O(n)
r.lpush('logs:app', new_log)  # new_log: the entry being recorded
r.ltrim('logs:app', 0, 999)  # Keep only the 1000 most recent

Lists are ideal for queues, activity feeds, and recent item lists. Implement a simple job queue:

import json

class JobQueue:
    def __init__(self, redis_client, queue_name):
        self.r = redis_client
        self.queue = queue_name
    
    def enqueue(self, job_data):
        self.r.rpush(self.queue, json.dumps(job_data))
    
    def dequeue(self, timeout=0):
        result = self.r.blpop(self.queue, timeout=timeout)
        if result:
            _, job_data = result
            return json.loads(job_data)
        return None
    
    def size(self):
        return self.r.llen(self.queue)

The LINSERT command allows insertion before or after specific elements, but requires O(n) traversal:

r.linsert('mylist', 'BEFORE', 'pivot_value', 'new_value')

Sets: Unique Unordered Collections

Sets store unique strings with O(1) membership testing. They’re unordered and support powerful set algebra operations.

# Add members - O(1)
r.sadd('tags:article:100', 'python', 'redis', 'database')
r.sadd('tags:article:101', 'python', 'django')

# Membership test - O(1)
exists = r.sismember('tags:article:100', 'python')  # True

# Get all members - O(n)
all_tags = r.smembers('tags:article:100')

# Remove member - O(1)
r.srem('tags:article:100', 'database')

# Set operations
common_tags = r.sinter('tags:article:100', 'tags:article:101')  # {'python'}
all_tags = r.sunion('tags:article:100', 'tags:article:101')
unique_to_100 = r.sdiff('tags:article:100', 'tags:article:101')

# Store result of set operation
r.sinterstore('common_tags', 'tags:article:100', 'tags:article:101')

Sets excel at tracking unique items, managing tags, and finding relationships:

# Track unique daily visitors
def track_visitor(date, user_id):
    key = f'visitors:{date}'
    r.sadd(key, user_id)
    r.expire(key, 86400 * 7)  # Keep for 7 days

def unique_visitors(date):
    return r.scard(f'visitors:{date}')

# Find mutual friends
def mutual_friends(user1_id, user2_id):
    return r.sinter(f'friends:{user1_id}', f'friends:{user2_id}')

The SPOP and SRANDMEMBER commands enable random selection, useful for sampling:

# Random selection without removal
random_user = r.srandmember('active_users')

# Random selection with removal
winner = r.spop('lottery_participants')
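Both commands also accept a count for sampling several members at once. A sketch (the function names and keys are hypothetical):

```python
def sample_members(r, key, n):
    # Positive count: up to n distinct members, set left intact
    return r.srandmember(key, n)

def draw_winners(r, key, n):
    # SPOP with a count atomically removes and returns up to n distinct members
    return r.spop(key, n)
```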

Hashes: Field-Value Maps

Hashes map field names to values within a single key, perfect for representing objects. They’re memory-efficient for storing structured data.

# Set fields - O(1) per field
r.hset('user:1000', 'name', 'Alice')
r.hset('user:1000', 'email', 'alice@example.com')

# Set multiple fields atomically
r.hset('user:1000', mapping={
    'name': 'Alice',
    'email': 'alice@example.com',
    'age': '30'
})

# Get field - O(1)
name = r.hget('user:1000', 'name')

# Get all fields - O(n)
user_data = r.hgetall('user:1000')

# Increment numeric field - O(1)
r.hincrby('user:1000', 'login_count', 1)

# Check field exists - O(1)
has_email = r.hexists('user:1000', 'email')

# Delete field - O(1)
r.hdel('user:1000', 'age')

Hashes are optimal for storing objects with multiple attributes:

class UserRepository:
    def __init__(self, redis_client):
        self.r = redis_client
    
    def save(self, user_id, user_data):
        key = f'user:{user_id}'
        self.r.hset(key, mapping=user_data)
    
    def get(self, user_id):
        key = f'user:{user_id}'
        return self.r.hgetall(key)
    
    def update_field(self, user_id, field, value):
        key = f'user:{user_id}'
        self.r.hset(key, field, value)
    
    def increment_counter(self, user_id, counter_name):
        key = f'user:{user_id}'
        return self.r.hincrby(key, counter_name, 1)

Hashes use less memory than separate string keys for small objects (fewer than 128 fields by default, per hash-max-listpack-entries):

# Memory efficient: single hash
r.hset('users:1000', mapping={'name': 'Alice', 'age': '30'})

# Memory inefficient: multiple strings
r.set('users:1000:name', 'Alice')
r.set('users:1000:age', '30')

Sorted Sets: Ranked Collections

Sorted sets combine sets with scores, maintaining elements in score order. They enable O(log n) insertion and range queries.

# Add members with scores - O(log n)
r.zadd('leaderboard', {'player1': 1000, 'player2': 1500, 'player3': 1200})

# Increment score - O(log n)
r.zincrby('leaderboard', 100, 'player1')

# Get rank (0-based) - O(log n)
rank = r.zrank('leaderboard', 'player1')  # Rank 0 = lowest score
rev_rank = r.zrevrank('leaderboard', 'player1')  # Rank 0 = highest score

# Range by rank - O(log n + m) where m is range size
top_10 = r.zrevrange('leaderboard', 0, 9, withscores=True)

# Range by score - O(log n + m)
mid_tier = r.zrangebyscore('leaderboard', 1000, 2000, withscores=True)

# Count in score range - O(log n)
count = r.zcount('leaderboard', 1000, 2000)

# Remove by rank - O(log n + m)
r.zremrangebyrank('leaderboard', 0, 9)  # Remove bottom 10

Sorted sets power leaderboards, time-series data, and priority queues:

# Gaming leaderboard
def update_score(player_id, points):
    r.zincrby('global:leaderboard', points, player_id)

def get_leaderboard(start=0, end=9):
    return r.zrevrange('global:leaderboard', start, end, withscores=True)

def get_player_rank(player_id):
    rank = r.zrevrank('global:leaderboard', player_id)
    return rank + 1 if rank is not None else None

# Time-series events using timestamp as score
def add_event(event_id, timestamp):
    r.zadd('events:timeline', {event_id: timestamp})

def get_recent_events(minutes=60):
    cutoff = time.time() - (minutes * 60)
    return r.zrangebyscore('events:timeline', cutoff, '+inf')

# Priority queue
def add_task(task_id, priority):
    r.zadd('tasks:queue', {task_id: priority})

def get_next_task():
    # Note: ZRANGE followed by ZREM is not atomic; under concurrency two
    # workers can grab the same task. Prefer ZPOPMIN (below) instead.
    tasks = r.zrange('tasks:queue', 0, 0)
    if tasks:
        task_id = tasks[0]
        r.zrem('tasks:queue', task_id)
        return task_id
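The events timeline above also grows without bound; ZREMRANGEBYSCORE prunes entries by timestamp score. A sketch (the 30-day retention window is an assumption):

```python
import time

def prune_old_events(r, days=30):
    # Remove every event whose score (timestamp) falls before the cutoff
    cutoff = time.time() - days * 86400
    return r.zremrangebyscore('events:timeline', '-inf', cutoff)
```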

Use ZPOPMIN and ZPOPMAX for atomic pop operations:

# Atomic pop lowest score - returns a list of (member, score) pairs
task = r.zpopmin('tasks:queue')

# Atomic pop highest score
task = r.zpopmax('tasks:queue')
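For worker loops, BZPOPMIN (Redis 5.0+) blocks until a member arrives, the sorted-set analogue of BLPOP. A sketch (the queue name and timeout are assumptions):

```python
def wait_for_task(r, queue='tasks:queue', timeout=5):
    # Blocks up to `timeout` seconds; returns (key, member, score) or None
    result = r.bzpopmin(queue, timeout=timeout)
    if result:
        _, task_id, priority = result
        return task_id, priority
    return None
```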

Performance and Memory Considerations

Each data structure has different performance characteristics. Strings use the least overhead for single values. Lists perform best at the ends (avoid LINDEX on large lists). Sets provide O(1) membership testing but use more memory than lists. Hashes are memory-efficient for objects with many fields. Sorted sets use the most memory but provide powerful range queries.

Monitor memory usage with MEMORY USAGE key and use appropriate data structures based on access patterns, not just data shape. A leaderboard needs a sorted set, not a list you sort application-side. Session data fits in hashes, not individual string keys. Choose the structure that makes your operations atomic and efficient.
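redis-py exposes MEMORY USAGE as memory_usage. A small auditing sketch (the helper name and key list are mine):

```python
def report_memory(r, keys):
    # memory_usage returns the bytes a key consumes, or None if it is missing
    return {key: r.memory_usage(key) for key in keys}
```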
