Design a Distributed ID Generator: Unique Identifiers at Scale
Key Insights
- Auto-incrementing IDs create a single point of failure and coordination bottleneck that breaks down at scale—distributed ID generators trade some simplicity for independence and availability.
- Snowflake-style IDs pack timestamp, worker identity, and sequence into 64 bits, giving you time-ordering, uniqueness, and high throughput without coordination between nodes.
- Your choice between UUID, Snowflake, and database-based approaches depends on whether you prioritize simplicity, sortability, or existing infrastructure—there’s no universal best answer.
Why Distributed IDs Matter
Auto-incrementing database IDs work beautifully until they don’t. The moment you add a second database server, you’ve introduced a coordination problem. Every insert needs to ask: “What’s the next ID?” That question requires either a single authority (creating a bottleneck and single point of failure) or consensus between nodes (adding latency and complexity).
Distributed systems need ID generators that satisfy four requirements:
- Global uniqueness: No two records should ever share an ID, across all nodes, all time
- Rough ordering: Ideally, IDs generated later should be larger (helps with database indexing)
- Low latency: ID generation shouldn’t add meaningful delay to writes
- High availability: The system should keep generating IDs even when components fail
The solutions range from “just use UUIDs” to sophisticated bit-packing schemes. Each makes different trade-offs, and understanding those trade-offs is essential for making the right choice.
UUID: The Simple Baseline
UUIDs are the path of least resistance. They require zero coordination—any node can generate one independently with near-zero collision probability. But not all UUIDs are equal.
UUIDv4 uses 122 random bits. It’s simple and collision-resistant, but the randomness destroys database index locality. Inserting random keys into a B-tree causes page splits and fragmentation, degrading write performance over time.
UUIDv7 (RFC 9562, finalized in 2024) fixes this by putting a Unix timestamp in the first 48 bits. IDs generated later sort after earlier ones, preserving index locality while maintaining the coordination-free property.
```python
import os
import time
import uuid


def generate_uuidv7() -> uuid.UUID:
    """Generate a UUIDv7 with a millisecond-precision timestamp."""
    # 48 bits of Unix timestamp in milliseconds
    timestamp_ms = int(time.time() * 1000)
    timestamp_bytes = timestamp_ms.to_bytes(6, byteorder='big')

    # 80 random bits; 6 of them are overwritten below by the version and
    # variant fields, leaving the 74 random bits UUIDv7 specifies
    random_bytes = os.urandom(10)

    # Combine: 6 bytes timestamp + 10 bytes random
    uuid_bytes = bytearray(timestamp_bytes + random_bytes)

    # Set version (7) in bits 48-51
    uuid_bytes[6] = (uuid_bytes[6] & 0x0F) | 0x70
    # Set variant (RFC 4122) in bits 64-65
    uuid_bytes[8] = (uuid_bytes[8] & 0x3F) | 0x80

    return uuid.UUID(bytes=bytes(uuid_bytes))


def extract_timestamp(uuidv7: uuid.UUID) -> float:
    """Extract the Unix timestamp from a UUIDv7."""
    timestamp_ms = int.from_bytes(uuidv7.bytes[:6], byteorder='big')
    return timestamp_ms / 1000.0


# Usage
id1 = generate_uuidv7()
print(f"Generated: {id1}")
print(f"Timestamp: {extract_timestamp(id1)}")
```
The trade-off with UUIDs is size: 128 bits versus 64 bits for integer-based schemes. That’s double the storage, double the index size, and slower comparisons. For most applications, this doesn’t matter. For high-volume systems with tight latency budgets, it might.
Snowflake Architecture Deep Dive
Twitter’s Snowflake (2010) established the template that most custom ID generators follow. It packs everything into 64 bits:
| 1 bit unused | 41 bits timestamp | 10 bits worker ID | 12 bits sequence |
- 41 bits timestamp: Milliseconds since a custom epoch. Gives you ~69 years of IDs.
- 10 bits worker ID: Supports 1,024 unique workers. Often split into datacenter (5 bits) + machine (5 bits).
- 12 bits sequence: 4,096 IDs per millisecond per worker.
This design guarantees uniqueness through partitioning: each worker owns its ID space, and the sequence counter handles bursts within a millisecond.
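The capacity claims behind this layout can be checked directly with a few lines of arithmetic (a sanity check on the bit widths, not part of any particular implementation):

```python
MILLIS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

# 41 timestamp bits: how many years of millisecond timestamps fit?
years = (1 << 41) / MILLIS_PER_YEAR
print(f"Timestamp range: {years:.1f} years")  # ~69.7 years

# 10 worker bits and 12 sequence bits
print(f"Workers: {1 << 10}")                  # 1024
print(f"IDs per ms per worker: {1 << 12}")    # 4096
print(f"Peak rate: {(1 << 12) * 1000:,} IDs/sec/worker")
```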
```python
import threading
import time


class ClockDriftError(Exception):
    pass


class SnowflakeGenerator:
    # Custom epoch: 2024-01-01 00:00:00 UTC
    EPOCH = 1704067200000

    TIMESTAMP_BITS = 41
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    MAX_WORKER_ID = (1 << WORKER_BITS) - 1
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    WORKER_SHIFT = SEQUENCE_BITS
    TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= self.MAX_WORKER_ID:
            raise ValueError(f"Worker ID must be between 0 and {self.MAX_WORKER_ID}")
        self.worker_id = worker_id
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()

    def _current_millis(self) -> int:
        return int(time.time() * 1000)

    def _wait_next_millis(self, last_ts: int) -> int:
        ts = self._current_millis()
        while ts <= last_ts:
            ts = self._current_millis()
        return ts

    def generate(self) -> int:
        with self.lock:
            timestamp = self._current_millis()

            if timestamp < self.last_timestamp:
                # Clock went backwards - this is bad
                raise ClockDriftError(
                    f"Clock moved backwards by {self.last_timestamp - timestamp}ms"
                )

            if timestamp == self.last_timestamp:
                # Same millisecond - increment sequence
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted - wait for next millisecond
                    timestamp = self._wait_next_millis(self.last_timestamp)
            else:
                # New millisecond - reset sequence
                self.sequence = 0

            self.last_timestamp = timestamp

            # Pack the ID
            return (
                ((timestamp - self.EPOCH) << self.TIMESTAMP_SHIFT) |
                (self.worker_id << self.WORKER_SHIFT) |
                self.sequence
            )


# Parsing an ID back into components
def parse_snowflake(snowflake_id: int) -> dict:
    sequence = snowflake_id & SnowflakeGenerator.MAX_SEQUENCE
    worker_id = (snowflake_id >> SnowflakeGenerator.WORKER_SHIFT) & SnowflakeGenerator.MAX_WORKER_ID
    timestamp = (snowflake_id >> SnowflakeGenerator.TIMESTAMP_SHIFT) + SnowflakeGenerator.EPOCH
    return {
        "timestamp": timestamp,
        "worker_id": worker_id,
        "sequence": sequence,
        "generated_at": time.strftime(
            "%Y-%m-%d %H:%M:%S",
            time.gmtime(timestamp / 1000)
        ),
    }
```
The key insight is that uniqueness comes from the worker ID assignment, not from any runtime coordination. As long as no two workers share an ID, collisions are impossible.
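A quick way to see this property concretely: pack IDs for every worker at the same timestamp and sequence boundaries, and note that they all differ because the fields occupy disjoint bit ranges. This standalone sketch (the `pack` helper is illustrative, independent of the class above) uses the same 41/10/12 layout:

```python
def pack(timestamp_ms: int, worker_id: int, sequence: int,
         epoch: int = 1704067200000) -> int:
    # Layout: 41-bit timestamp | 10-bit worker | 12-bit sequence
    return ((timestamp_ms - epoch) << 22) | (worker_id << 12) | sequence

ts = 1704067200000 + 12345  # the same millisecond for every worker
ids = {pack(ts, w, s) for w in range(1024) for s in (0, 4095)}
# 1024 workers x 2 sequence values -> 2048 distinct IDs, zero collisions
print(len(ids))  # 2048
```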
Database-Based Approaches
Sometimes you don’t need a custom service. Databases can generate distributed IDs with some clever configuration.
Flickr’s Ticket Server approach uses dedicated MySQL instances that hand out ID ranges:
```sql
-- Ticket server schema
CREATE TABLE id_tickets (
    stub CHAR(1) NOT NULL PRIMARY KEY,
    id BIGINT UNSIGNED NOT NULL
) ENGINE=InnoDB;

INSERT INTO id_tickets (stub, id) VALUES ('a', 0);

-- Application requests a batch of IDs
-- UPDATE atomically increments and records the new value
UPDATE id_tickets SET id = LAST_INSERT_ID(id + 1000) WHERE stub = 'a';
SELECT LAST_INSERT_ID();  -- Returns the end of your 1000-ID batch:
                          -- the batch is (value - 999) through value
```
The application caches the range locally and allocates from it without further database calls. When the range is exhausted, it fetches another batch.
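A client-side allocator for this pattern can be sketched as follows. The `_fetch_batch` method stands in for the UPDATE/SELECT round trip above; here it is simulated with a local counter so the sketch is self-contained:

```python
import threading


class RangeAllocator:
    """Hands out IDs from a cached range, refetching when exhausted."""

    def __init__(self, batch_size: int = 1000):
        self.batch_size = batch_size
        self.next_id = 0           # next ID to hand out
        self.limit = 0             # first ID beyond the cached range
        self._server_counter = 0   # simulated ticket-server state
        self.lock = threading.Lock()

    def _fetch_batch(self) -> int:
        # Simulated; the real version would run:
        #   UPDATE id_tickets SET id = LAST_INSERT_ID(id + batch_size) ...
        #   SELECT LAST_INSERT_ID();
        self._server_counter += self.batch_size
        return self._server_counter  # end of the newly reserved range

    def allocate(self) -> int:
        with self.lock:
            if self.next_id >= self.limit:
                end = self._fetch_batch()
                self.next_id = end - self.batch_size + 1
                self.limit = end + 1
            allocated = self.next_id
            self.next_id += 1
            return allocated


alloc = RangeAllocator(batch_size=1000)
print([alloc.allocate() for _ in range(3)])  # [1, 2, 3]
```

One consequence of this design: IDs cached by a crashed process are simply lost, which trades gaps in the sequence for far fewer database round trips.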
Multi-master with offsets works when you have multiple database primaries:
```sql
-- Server 1: generates 1, 3, 5, 7, ...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 1;

-- Server 2: generates 2, 4, 6, 8, ...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 2;
```
This is simple but inflexible—adding a third server requires reconfiguration and careful migration.
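The scheme's arithmetic is easy to simulate, which also makes the inflexibility visible: with an increment of 2, both offsets are taken, so a third server cannot join without reconfiguring every primary (a quick illustrative sketch):

```python
def offset_sequence(offset: int, increment: int, count: int) -> list:
    """IDs a primary assigns under auto_increment_offset/increment."""
    return [offset + increment * i for i in range(count)]

server1 = offset_sequence(offset=1, increment=2, count=4)  # [1, 3, 5, 7]
server2 = offset_sequence(offset=2, increment=2, count=4)  # [2, 4, 6, 8]

# The two sequences are disjoint by construction
assert not set(server1) & set(server2)
print(server1, server2)
```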
Handling Failure Modes
Distributed ID generators have three main failure modes, and your production system needs strategies for each.
Clock skew is the most insidious. If a server’s clock jumps backward (NTP correction, VM migration, manual adjustment), a Snowflake-style generator can revisit timestamps it has already used and emit duplicate IDs:
```python
class RobustSnowflakeGenerator(SnowflakeGenerator):
    MAX_CLOCK_DRIFT_MS = 5  # Tolerate up to 5ms drift

    def generate(self) -> int:
        # Check drift before delegating; the parent's generate() takes the
        # (non-reentrant) lock itself and re-checks for backward movement,
        # so we must not hold the lock here.
        drift = self.last_timestamp - self._current_millis()
        if drift > 0:
            if drift <= self.MAX_CLOCK_DRIFT_MS:
                # Small drift - wait it out
                time.sleep(drift / 1000.0)
            else:
                # Large drift - refuse to generate
                raise ClockDriftError(
                    f"Clock drift of {drift}ms exceeds threshold"
                )
        return super().generate()
```
Worker ID collision happens when two processes think they’re the same worker. Prevent this with coordination services:
```python
import etcd3


def acquire_worker_id(etcd_client: etcd3.Etcd3Client,
                      service_name: str,
                      max_workers: int = 1024) -> int:
    """Acquire a unique worker ID using etcd."""
    # The caller must keep this lease refreshed (e.g. lease.refresh() on a
    # timer); if it expires, another process may claim the same worker ID.
    lease = etcd_client.lease(ttl=30)  # 30-second lease

    for worker_id in range(max_workers):
        key = f"/snowflake/{service_name}/workers/{worker_id}"
        # Try to claim this worker ID; succeeds only if the key doesn't exist
        success, _ = etcd_client.transaction(
            compare=[etcd_client.transactions.create(key) == 0],
            success=[etcd_client.transactions.put(key, "claimed", lease=lease)],
            failure=[]
        )
        if success:
            return worker_id

    raise RuntimeError("No available worker IDs")
```
Sequence exhaustion occurs when you generate more than 4,096 IDs in a single millisecond on one worker. The standard solution is to wait for the next millisecond, but you can also allocate more sequence bits at the cost of worker ID space.
Production Considerations
Bit allocation should match your actual scale. Running 50 servers? You don’t need 10 bits for worker IDs—reallocate some to sequence for higher throughput:
| Configuration | Workers | IDs/ms/worker | Use Case |
|---|---|---|---|
| 10-bit worker, 12-bit sequence | 1,024 | 4,096 | Large distributed system |
| 6-bit worker, 16-bit sequence | 64 | 65,536 | Fewer nodes, high throughput |
| 5-bit worker, 5-bit DC, 12-bit sequence | 32 × 32 | 4,096 | Multi-datacenter |
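The table’s capacity figures fall directly out of the bit widths; a few lines reproduce them (a derivation, not tied to any particular implementation):

```python
def capacity(worker_bits: int, sequence_bits: int) -> dict:
    """Workers and per-millisecond throughput for a given bit split."""
    return {"workers": 1 << worker_bits, "ids_per_ms": 1 << sequence_bits}

print(capacity(10, 12))     # {'workers': 1024, 'ids_per_ms': 4096}
print(capacity(6, 16))      # {'workers': 64, 'ids_per_ms': 65536}
# 5-bit datacenter + 5-bit machine = 10 worker bits split 32 x 32
print(capacity(5 + 5, 12))  # {'workers': 1024, 'ids_per_ms': 4096}
```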
Monitoring should track:
- ID generation latency (p50, p99)
- Sequence utilization per millisecond (approaching 4,096 means you’re at risk)
- Clock drift events
- Worker ID lease renewals
Migration from existing schemes requires a transition period. Generate new IDs in a non-overlapping range, update application code to handle both formats, then backfill old records if needed.
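During the transition, application code needs to tell the two formats apart. One common trick, sketched here under stated assumptions: legacy auto-increment IDs are small, while any Snowflake ID generated after the custom epoch is enormous (its timestamp is shifted left 22 bits), so a single threshold separates them. The `LEGACY_MAX` value below is illustrative; pick one safely above your largest legacy ID and below your smallest new ID:

```python
# Assumption (illustrative): legacy auto-increment IDs stayed below 10 billion,
# and new Snowflake IDs were first generated well after the custom epoch,
# so even the smallest new ID is orders of magnitude larger.
LEGACY_MAX = 10_000_000_000

def is_snowflake(record_id: int) -> bool:
    """Classify an ID by range during the migration window."""
    return record_id > LEGACY_MAX

print(is_snowflake(123456))              # False: legacy auto-increment ID
print(is_snowflake(86_400_000 << 22))    # True: ID from one day past the epoch
```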
Decision Framework
| Criteria | UUID (v7) | Snowflake | DB Ticket Server |
|---|---|---|---|
| Coordination required | None | Worker ID assignment | Range fetching |
| Size | 128 bits | 64 bits | 64 bits |
| Time-ordered | Yes | Yes | Roughly (batch granularity) |
| Throughput ceiling | Unlimited | ~4M/sec/worker | Batch-dependent |
| Operational complexity | Trivial | Medium | Low |
| Clock sensitivity | Low | High | None |
Choose UUIDv7 when you want simplicity, don’t have extreme performance requirements, and can tolerate larger keys.
Choose Snowflake when you need compact IDs, high throughput, and can manage worker ID coordination.
Choose database-based when you already have reliable database infrastructure and moderate throughput needs.
Most systems should start with UUIDv7. Move to Snowflake when you have concrete evidence that UUID size or generation overhead is a problem. The added operational complexity of Snowflake is only worth it at significant scale.