Design a Chat Application: Real-Time Messaging System
Key Insights
- WebSockets are essential for real-time chat, but the real complexity lies in horizontal scaling—use a pub/sub layer like Redis to coordinate messages across multiple server instances.
- Choose your fanout strategy based on group size: fan-out on write works for small groups (under 500 members), while fan-out on read is necessary for large channels to avoid write amplification.
- Message ordering and delivery guarantees require careful schema design—use composite keys with conversation ID and timestamp, and implement idempotency keys to handle network retries gracefully.
Introduction & Requirements Analysis
Building a chat application seems straightforward until you hit scale. What starts as a simple “send message, receive message” flow quickly becomes a distributed systems challenge involving real-time delivery, message persistence, presence tracking, and graceful degradation.
Let’s establish our requirements upfront.
Functional Requirements:
- One-to-one direct messaging
- Group chats (up to 500 members for standard groups, unlimited for broadcast channels)
- Online/offline presence indicators
- Typing indicators
- Message history with pagination
- Read receipts
- Offline message sync
Non-Functional Requirements:
- Sub-200ms message delivery latency
- 99.9% availability
- Message ordering within conversations
- At-least-once delivery guarantee
- Support for 10 million daily active users
- 1 billion messages per day at peak
These numbers inform every architectural decision. At 1 billion messages daily, that’s roughly 12,000 messages per second sustained, with peaks potentially 3-5x higher.
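As a sanity check, the back-of-envelope numbers can be reproduced directly. The per-message storage figure below is an illustrative assumption, not part of the stated requirements:

```python
# Back-of-envelope capacity estimate for the stated requirements.
MESSAGES_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 86_400

sustained_qps = MESSAGES_PER_DAY / SECONDS_PER_DAY  # ~11,600 msg/s sustained
peak_qps_low = sustained_qps * 3                    # ~35,000 msg/s
peak_qps_high = sustained_qps * 5                   # ~58,000 msg/s

# Storage: assume ~100 bytes of raw payload per message (an assumption),
# giving roughly 100 GB/day of message data before replication.
raw_bytes_per_day = MESSAGES_PER_DAY * 100

print(round(sustained_qps), round(peak_qps_low), round(peak_qps_high))
```

Peak capacity, not the daily average, is what the WebSocket tier and the storage tier must be provisioned for.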
High-Level Architecture
The architecture separates concerns into distinct services that can scale independently:
┌─────────────┐     ┌─────────────────┐     ┌──────────────────┐
│   Clients   │────▶│   API Gateway   │────▶│   Auth Service   │
└─────────────┘     └─────────────────┘     └──────────────────┘
       │                     │
       │               ┌─────┴─────┐
       ▼               ▼           ▼
┌─────────────┐   ┌─────────┐   ┌─────────────┐
│  WebSocket  │   │  Chat   │   │  Presence   │
│   Servers   │   │ Service │   │   Service   │
└─────────────┘   └─────────┘   └─────────────┘
       │               │               │
       └───────────────┼───────────────┘
                       ▼
              ┌───────────────┐
              │ Redis Pub/Sub │
              └───────────────┘
                       │
       ┌───────────────┼───────────────┐
       ▼               ▼               ▼
┌─────────────┐   ┌───────────┐   ┌─────────────┐
│  Cassandra  │   │   Redis   │   │    Kafka    │
│ (Messages)  │   │  (Cache)  │   │   (Queue)   │
└─────────────┘   └───────────┘   └─────────────┘
- API Gateway: handles HTTP requests for non-real-time operations — fetching message history, managing groups, updating profiles.
- WebSocket Servers: maintain persistent connections for real-time message delivery.
- Chat Service: processes message sending, storage, and fanout logic.
- Presence Service: tracks online status and typing indicators.
- Redis Pub/Sub: coordinates messages across WebSocket server instances.
Real-Time Communication Layer
For real-time messaging, WebSockets are the clear choice. Server-Sent Events (SSE) only support server-to-client communication, requiring a separate HTTP channel for sending messages. Long polling wastes resources with constant connection cycling. WebSockets provide full-duplex communication with minimal overhead.
The challenge is horizontal scaling. When User A connects to Server 1 and User B connects to Server 2, how does A’s message reach B? You need a pub/sub coordination layer.
// WebSocket server with Redis pub/sub coordination
const WebSocket = require('ws');
const Redis = require('ioredis');

const publisher = new Redis(process.env.REDIS_URL);
const subscriber = new Redis(process.env.REDIS_URL);

// Map of userId -> WebSocket connection (local to this server)
const connections = new Map();

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', async (ws, req) => {
  const userId = await authenticateConnection(req);
  connections.set(userId, ws);

  // Subscribe to user's personal channel
  await subscriber.subscribe(`user:${userId}`);

  ws.on('message', async (data) => {
    const message = JSON.parse(data);
    if (message.type === 'chat') {
      await handleChatMessage(userId, message);
    } else if (message.type === 'typing') {
      await handleTypingIndicator(userId, message);
    }
  });

  ws.on('close', () => {
    connections.delete(userId);
    subscriber.unsubscribe(`user:${userId}`);
    updatePresence(userId, 'offline');
  });

  // Heartbeat for connection health
  ws.isAlive = true;
  ws.on('pong', () => { ws.isAlive = true; });
});

// Heartbeat interval to detect dead connections
setInterval(() => {
  wss.clients.forEach((ws) => {
    if (!ws.isAlive) return ws.terminate();
    ws.isAlive = false;
    ws.ping();
  });
}, 30000);

// Handle incoming pub/sub messages
subscriber.on('message', (channel, message) => {
  const userId = channel.split(':')[1];
  const ws = connections.get(userId);
  if (ws && ws.readyState === WebSocket.OPEN) {
    ws.send(message);
  }
});

async function handleChatMessage(senderId, message) {
  const { conversationId, content, recipientIds } = message;

  // Persist message first
  const savedMessage = await chatService.saveMessage({
    id: generateMessageId(),
    conversationId,
    senderId,
    content,
    timestamp: Date.now(),
    idempotencyKey: message.idempotencyKey
  });

  // Fan out to all recipients via Redis pub/sub
  for (const recipientId of recipientIds) {
    await publisher.publish(`user:${recipientId}`, JSON.stringify({
      type: 'new_message',
      message: savedMessage
    }));
  }
}
The key insight: each WebSocket server only knows about its local connections. Redis pub/sub broadcasts messages to all servers, and each server delivers to any locally-connected recipients.
Message Storage & Delivery
Message storage needs to handle high write throughput while supporting efficient reads by conversation. Cassandra excels here due to its write-optimized architecture and flexible partitioning.
-- Cassandra schema for messages
CREATE TABLE messages (
    conversation_id UUID,
    message_id TIMEUUID,
    sender_id UUID,
    content TEXT,
    content_type TEXT,  -- 'text', 'image', 'file'
    created_at TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Index for fetching user's conversations
CREATE TABLE user_conversations (
    user_id UUID,
    last_message_at TIMESTAMP,
    conversation_id UUID,
    last_message_preview TEXT,
    unread_count INT,
    PRIMARY KEY (user_id, last_message_at, conversation_id)
) WITH CLUSTERING ORDER BY (last_message_at DESC);
The messages table partitions by conversation_id, ensuring all messages in a conversation live on the same node for efficient range queries. The clustering order by message_id (a TIMEUUID) provides automatic chronological ordering.
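Why a TIMEUUID orders chronologically: it is a version-1 UUID whose high bits encode a 100-nanosecond timestamp counted from the Gregorian epoch (1582-10-15). A standard-library sketch of extracting that timestamp — the constant is the Gregorian-to-Unix epoch offset:

```python
import uuid

# Offset between the Gregorian epoch (1582-10-15) and the Unix epoch
# (1970-01-01), in 100-nanosecond intervals.
GREGORIAN_UNIX_OFFSET = 0x01B21DD213814000

def timeuuid_to_unix_ms(u: uuid.UUID) -> int:
    """Extract the embedded timestamp from a version-1 (time-based) UUID."""
    if u.version != 1:
        raise ValueError("not a time-based UUID")
    return (u.time - GREGORIAN_UNIX_OFFSET) // 10_000

first = uuid.uuid1()
second = uuid.uuid1()
# Later UUIDs carry later (or equal) embedded timestamps — this is what
# gives the messages table its chronological clustering order.
assert timeuuid_to_unix_ms(second) >= timeuuid_to_unix_ms(first)
```

Because the timestamp is baked into the ID, clients can also use the last-seen message_id as a pagination cursor without a separate timestamp column.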
For reliable delivery, use Kafka as a durable message queue:
# Message processing pipeline
from kafka import KafkaProducer, KafkaConsumer
import json

producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',  # Wait for all replicas
    retries=3
)

def send_message(message):
    # Publish to Kafka for reliable processing
    producer.send(
        'chat-messages',
        key=message['conversation_id'].encode(),  # Partition by conversation
        value=message
    )

# Consumer for message processing
consumer = KafkaConsumer(
    'chat-messages',
    bootstrap_servers=['kafka:9092'],
    group_id='message-processors',
    enable_auto_commit=False
)

for record in consumer:
    message = json.loads(record.value)
    # Idempotency check
    if not is_duplicate(message['idempotency_key']):
        save_to_cassandra(message)
        fanout_to_recipients(message)
        mark_processed(message['idempotency_key'])
    consumer.commit()
Kafka’s partitioning by conversation ID ensures message ordering within conversations while allowing parallel processing across different conversations.
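The ordering guarantee follows from how keyed partitioning works: the key is hashed, and the hash deterministically selects a partition, so every message in a conversation lands on the same partition and is consumed in append order. A simplified sketch of that determinism (Kafka's default partitioner actually uses murmur2; CRC-32 here only illustrates the property):

```python
from zlib import crc32

NUM_PARTITIONS = 12  # illustrative topic configuration

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Any stable hash demonstrates the property:
    # same key -> same partition, every time.
    return crc32(key) % num_partitions

conv_a = b"conversation-123"
conv_b = b"conversation-456"

# All messages for one conversation map to one partition (ordered)...
assert partition_for(conv_a) == partition_for(conv_a)
# ...while different conversations can land on different partitions,
# which is what allows parallel consumption across conversations.
print(partition_for(conv_a), partition_for(conv_b))
```

The trade-off: one very hot conversation cannot be processed faster than a single partition's consumer, which is usually acceptable for chat workloads.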
Group Chat & Fanout Strategies
The fanout strategy dramatically impacts system performance. Consider a group with 500 members: fan-out on write means 500 write operations per message. For active groups, this creates massive write amplification.
# Fanout strategy selection
async def fanout_message(message, group):
    member_count = len(group.members)
    if member_count <= 500:
        # Fan-out on write: push to each member's inbox
        await fanout_on_write(message, group.members)
    else:
        # Fan-out on read: store once, members pull
        await fanout_on_read(message, group.id)

async def fanout_on_write(message, members):
    """
    Best for small groups. Each member gets the message
    pushed to their personal queue immediately.
    Pros: Fast reads, simple client logic
    Cons: Write amplification, storage duplication
    """
    pipeline = redis.pipeline()
    for member_id in members:
        pipeline.publish(f'user:{member_id}', json.dumps(message))
        pipeline.lpush(f'inbox:{member_id}', json.dumps(message))
    await pipeline.execute()

async def fanout_on_read(message, group_id):
    """
    Best for large channels. Message stored once,
    clients fetch from group timeline.
    Pros: Efficient writes, no duplication
    Cons: More complex reads, potential hotspots
    """
    await redis.lpush(f'group:{group_id}:messages', json.dumps(message))

    # Notify online members that new message exists
    online_members = await get_online_members(group_id)
    for member_id in online_members:
        await redis.publish(f'user:{member_id}', json.dumps({
            'type': 'group_update',
            'group_id': group_id
        }))
For large channels (think Slack’s #general with 10,000 members), fan-out on read is essential. The “thundering herd” problem—thousands of clients simultaneously fetching after a notification—requires rate limiting and caching at the edge.
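A standard server-side mitigation for the thundering herd is a token bucket in front of the timeline fetch, combined with client-side jitter before the fetch fires. A minimal token-bucket sketch (the rate and capacity values are illustrative):

```python
import time

class TokenBucket:
    """Simple token bucket: allow `rate` requests/second with bursts
    up to `capacity`. Usable as a per-group fetch limiter."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=10)  # 100 req/s, burst of 10
results = [bucket.allow() for _ in range(20)]
# The initial burst is absorbed; excess requests are rejected until refill.
print(results.count(True))
```

Edge caches in front of the group timeline absorb most of what the limiter lets through, since every member of a channel is fetching the same recent messages.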
Presence & Typing Indicators
Presence tracking uses Redis with TTL-based expiration. Users send periodic heartbeats; absence of heartbeats means offline.
# Presence service (asyncio Redis client, since all calls are awaited)
import asyncio
import redis.asyncio as aioredis

redis = aioredis.Redis()

PRESENCE_TTL = 60  # seconds
HEARTBEAT_INTERVAL = 30

async def handle_heartbeat(user_id: str):
    """Update user's online status with TTL"""
    await redis.setex(f'presence:{user_id}', PRESENCE_TTL, 'online')

async def get_presence(user_ids: list[str]) -> dict:
    """Batch fetch presence for multiple users"""
    pipeline = redis.pipeline()
    for user_id in user_ids:
        pipeline.get(f'presence:{user_id}')
    results = await pipeline.execute()
    return {
        user_id: 'online' if result else 'offline'
        for user_id, result in zip(user_ids, results)
    }

async def handle_typing(user_id: str, conversation_id: str):
    """Typing indicator with short TTL and debouncing"""
    key = f'typing:{conversation_id}:{user_id}'
    # Only publish if not already typing (debounce)
    if not await redis.exists(key):
        await notify_conversation_members(conversation_id, {
            'type': 'typing_start',
            'user_id': user_id
        })
    # Reset TTL
    await redis.setex(key, 3, '1')  # 3 second TTL
Typing indicators are ephemeral—they don’t need persistence or guaranteed delivery. A 3-second TTL automatically clears stale typing states without explicit “stopped typing” messages.
Scaling & Reliability Considerations
Sharding Strategy: Shard by conversation ID for message storage (keeps conversation data co-located) and by user ID for user data and presence (keeps user’s data together).
Caching: Cache recent messages per conversation, group membership lists, and user profiles. Invalidate on writes using cache-aside pattern.
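The cache-aside pattern mentioned above, sketched with plain dicts standing in for Redis and Cassandra (names and data are illustrative; in production these would be network clients):

```python
# Stand-ins for the cache (Redis) and the store of record (Cassandra).
cache: dict = {}
database = {"conv-1": ["hello", "world"]}

def get_messages(conversation_id: str):
    """Cache-aside read: try the cache, fall back to the DB, populate."""
    if conversation_id in cache:
        return cache[conversation_id]          # hit
    value = database.get(conversation_id, [])  # miss -> store of record
    cache[conversation_id] = value             # populate for next read
    return value

def append_message(conversation_id: str, content: str):
    """Write to the DB, then invalidate the cached entry."""
    database.setdefault(conversation_id, []).append(content)
    cache.pop(conversation_id, None)  # invalidate rather than update in place

assert get_messages("conv-1") == ["hello", "world"]            # miss, then cached
append_message("conv-1", "again")
assert get_messages("conv-1") == ["hello", "world", "again"]   # fresh after invalidation
```

Invalidate-on-write is preferred over update-in-place because concurrent writers updating the cache directly can leave it holding a stale value indefinitely.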
Geographic Distribution: Deploy WebSocket servers in multiple regions. Route users to nearest region, but ensure messages can cross regions via the Kafka backbone.
Message Deduplication: Clients should send idempotency keys with each message. The server checks against a short-lived cache (Redis with 24-hour TTL) before processing.
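This dedup check maps naturally onto Redis's `SET key value NX EX ttl`: the first writer wins, and the key expires after 24 hours. A local sketch of the same set-if-absent-with-expiry semantics using an in-memory store (illustrative, not the Redis client):

```python
import time

class TTLSet:
    """Set-if-absent with expiry — mimics Redis SET NX EX for dedup."""

    def __init__(self):
        self._store: dict = {}  # key -> expiry timestamp

    def add_if_absent(self, key: str, ttl_seconds: float) -> bool:
        now = time.monotonic()
        expiry = self._store.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate seen within the TTL window
        self._store[key] = now + ttl_seconds
        return True

seen = TTLSet()
# First delivery is processed; a network-level retry with the same
# idempotency key is recognized and dropped.
assert seen.add_if_absent("msg-abc-001", ttl_seconds=86_400) is True
assert seen.add_if_absent("msg-abc-001", ttl_seconds=86_400) is False
```

The 24-hour TTL bounds memory while comfortably covering realistic client retry windows; a retry arriving after the key expires would be reprocessed, which at-least-once delivery already tolerates.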
End-to-End Encryption: For E2E encryption, the server never sees plaintext. Clients exchange keys via the server, but encryption/decryption happens client-side. This complicates features like server-side search but is essential for privacy-focused applications.
The architecture outlined here handles the stated requirements while leaving room for optimization. Start with the simpler approaches (fan-out on write, single-region deployment) and add complexity only when metrics demand it. Premature optimization in distributed systems often creates more problems than it solves.