Design a Distributed ID Generator: Unique Identifiers at Scale
Key Insights
- Auto-incrementing IDs create a single point of failure and coordination bottleneck that breaks down at scale—distributed ID generators trade some simplicity for independence and availability.
- Snowflake-style IDs pack timestamp, worker identity, and sequence into 64 bits, giving you time-ordering, uniqueness, and high throughput without coordination between nodes.
- Your choice between UUID, Snowflake, and database-based approaches depends on whether you prioritize simplicity, sortability, or existing infrastructure—there’s no universal best answer.
Why Distributed IDs Matter
Auto-incrementing database IDs work beautifully until they don’t. The moment you add a second database server, you’ve introduced a coordination problem. Every insert needs to ask: “What’s the next ID?” That question requires either a single authority (creating a bottleneck and single point of failure) or consensus between nodes (adding latency and complexity).
Distributed systems need ID generators that satisfy four requirements:
- Global uniqueness: No two records should ever share an ID, across all nodes, all time
- Rough ordering: Ideally, IDs generated later should be larger (helps with database indexing)
- Low latency: ID generation shouldn’t add meaningful delay to writes
- High availability: The system should keep generating IDs even when components fail
The solutions range from “just use UUIDs” to sophisticated bit-packing schemes. Each makes different trade-offs, and understanding those trade-offs is essential for making the right choice.
UUID: The Simple Baseline
UUIDs are the path of least resistance. They require zero coordination—any node can generate one independently with near-zero collision probability. But not all UUIDs are equal.
UUIDv4 uses 122 random bits. It’s simple and collision-resistant, but the randomness destroys database index locality. Inserting random keys into a B-tree causes page splits and fragmentation, degrading write performance over time.
UUIDv7 (RFC 9562, finalized in 2024) fixes this by putting a Unix timestamp in the first 48 bits. IDs generated later sort after earlier ones, preserving index locality while maintaining the coordination-free property.
```python
import os
import time
import uuid


def generate_uuidv7() -> uuid.UUID:
    """Generate a UUIDv7 with a millisecond-precision timestamp."""
    # 48 bits of Unix timestamp in milliseconds
    timestamp_ms = int(time.time() * 1000)
    timestamp_bytes = timestamp_ms.to_bytes(6, byteorder='big')

    # 80 random bits; 6 of them are overwritten below by the version and
    # variant fields, leaving the 74 random bits UUIDv7 specifies
    random_bytes = os.urandom(10)

    # Combine: 6 bytes timestamp + 10 bytes random
    uuid_bytes = bytearray(timestamp_bytes + random_bytes)

    # Set version (7) in bits 48-51
    uuid_bytes[6] = (uuid_bytes[6] & 0x0F) | 0x70
    # Set variant (RFC 4122) in bits 64-65
    uuid_bytes[8] = (uuid_bytes[8] & 0x3F) | 0x80

    return uuid.UUID(bytes=bytes(uuid_bytes))


def extract_timestamp(uuidv7: uuid.UUID) -> float:
    """Extract the Unix timestamp from a UUIDv7."""
    timestamp_ms = int.from_bytes(uuidv7.bytes[:6], byteorder='big')
    return timestamp_ms / 1000.0


# Usage
id1 = generate_uuidv7()
print(f"Generated: {id1}")
print(f"Timestamp: {extract_timestamp(id1)}")
```
The trade-off with UUIDs is size: 128 bits versus 64 bits for integer-based schemes. That’s double the storage, double the index size, and slower comparisons. For most applications, this doesn’t matter. For high-volume systems with tight latency budgets, it might.
Snowflake Architecture Deep Dive
Twitter’s Snowflake (2010) established the template that most custom ID generators follow. It packs everything into 64 bits:
| 1 bit unused | 41 bits timestamp | 10 bits worker ID | 12 bits sequence |
- 41 bits timestamp: Milliseconds since a custom epoch. Gives you ~69 years of IDs.
- 10 bits worker ID: Supports 1,024 unique workers. Often split into datacenter (5 bits) + machine (5 bits).
- 12 bits sequence: 4,096 IDs per millisecond per worker.
This design guarantees uniqueness through partitioning: each worker owns its ID space, and the sequence counter handles bursts within a millisecond.
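The capacity claims behind this layout can be checked directly with a few lines of arithmetic (a sanity check on the bit widths, not part of any particular implementation):

```python
MILLIS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

# 41 timestamp bits: how many years of millisecond timestamps fit?
years = (1 << 41) / MILLIS_PER_YEAR
print(f"Timestamp range: {years:.1f} years")  # ~69.7 years

# 10 worker bits and 12 sequence bits
print(f"Workers: {1 << 10}")                  # 1024
print(f"IDs per ms per worker: {1 << 12}")    # 4096
print(f"Peak rate: {(1 << 12) * 1000:,} IDs/sec/worker")
```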
```python
import threading
import time


class ClockDriftError(Exception):
    pass


class SnowflakeGenerator:
    # Custom epoch: 2024-01-01 00:00:00 UTC
    EPOCH = 1704067200000

    TIMESTAMP_BITS = 41
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    MAX_WORKER_ID = (1 << WORKER_BITS) - 1
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    WORKER_SHIFT = SEQUENCE_BITS
    TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= self.MAX_WORKER_ID:
            raise ValueError(f"Worker ID must be between 0 and {self.MAX_WORKER_ID}")
        self.worker_id = worker_id
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()

    def _current_millis(self) -> int:
        return int(time.time() * 1000)

    def _wait_next_millis(self, last_ts: int) -> int:
        ts = self._current_millis()
        while ts <= last_ts:
            ts = self._current_millis()
        return ts

    def generate(self) -> int:
        with self.lock:
            timestamp = self._current_millis()

            if timestamp < self.last_timestamp:
                # Clock went backwards - this is bad
                raise ClockDriftError(
                    f"Clock moved backwards by {self.last_timestamp - timestamp}ms"
                )

            if timestamp == self.last_timestamp:
                # Same millisecond - increment sequence
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted - wait for next millisecond
                    timestamp = self._wait_next_millis(self.last_timestamp)
            else:
                # New millisecond - reset sequence
                self.sequence = 0

            self.last_timestamp = timestamp

            # Pack the ID
            return (
                ((timestamp - self.EPOCH) << self.TIMESTAMP_SHIFT) |
                (self.worker_id << self.WORKER_SHIFT) |
                self.sequence
            )


# Parsing an ID back into components
def parse_snowflake(snowflake_id: int) -> dict:
    sequence = snowflake_id & SnowflakeGenerator.MAX_SEQUENCE
    worker_id = (snowflake_id >> SnowflakeGenerator.WORKER_SHIFT) & SnowflakeGenerator.MAX_WORKER_ID
    timestamp = (snowflake_id >> SnowflakeGenerator.TIMESTAMP_SHIFT) + SnowflakeGenerator.EPOCH
    return {
        "timestamp": timestamp,
        "worker_id": worker_id,
        "sequence": sequence,
        "generated_at": time.strftime(
            "%Y-%m-%d %H:%M:%S",
            time.gmtime(timestamp / 1000)
        ),
    }
```
The key insight is that uniqueness comes from the worker ID assignment, not from any runtime coordination. As long as no two workers share an ID, collisions are impossible.
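A quick way to see this property concretely: pack IDs for every worker at the same timestamp and sequence boundaries, and note that they all differ because the fields occupy disjoint bit ranges. This standalone sketch (the `pack` helper is illustrative, independent of the class above) uses the same 41/10/12 layout:

```python
def pack(timestamp_ms: int, worker_id: int, sequence: int,
         epoch: int = 1704067200000) -> int:
    # Layout: 41-bit timestamp | 10-bit worker | 12-bit sequence
    return ((timestamp_ms - epoch) << 22) | (worker_id << 12) | sequence

ts = 1704067200000 + 12345  # the same millisecond for every worker
ids = {pack(ts, w, s) for w in range(1024) for s in (0, 4095)}
# 1024 workers x 2 sequence values -> 2048 distinct IDs, zero collisions
print(len(ids))  # 2048
```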
Database-Based Approaches
Sometimes you don’t need a custom service. Databases can generate distributed IDs with some clever configuration.
Flickr’s Ticket Server approach uses dedicated MySQL instances that hand out ID ranges:
```sql
-- Ticket server schema
CREATE TABLE id_tickets (
    stub CHAR(1) NOT NULL PRIMARY KEY,
    id BIGINT UNSIGNED NOT NULL
) ENGINE=InnoDB;

INSERT INTO id_tickets (stub, id) VALUES ('a', 0);

-- Application requests a batch of IDs
-- UPDATE atomically increments and records the new value
UPDATE id_tickets SET id = LAST_INSERT_ID(id + 1000) WHERE stub = 'a';
SELECT LAST_INSERT_ID();  -- Returns the end of your 1000-ID batch:
                          -- the batch is (value - 999) through value
```
The application caches the range locally and allocates from it without further database calls. When the range is exhausted, it fetches another batch.
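A client-side allocator for this pattern can be sketched as follows. The `_fetch_batch` method stands in for the UPDATE/SELECT round trip above; here it is simulated with a local counter so the sketch is self-contained:

```python
import threading


class RangeAllocator:
    """Hands out IDs from a cached range, refetching when exhausted."""

    def __init__(self, batch_size: int = 1000):
        self.batch_size = batch_size
        self.next_id = 0           # next ID to hand out
        self.limit = 0             # first ID beyond the cached range
        self._server_counter = 0   # simulated ticket-server state
        self.lock = threading.Lock()

    def _fetch_batch(self) -> int:
        # Simulated; the real version would run:
        #   UPDATE id_tickets SET id = LAST_INSERT_ID(id + batch_size) ...
        #   SELECT LAST_INSERT_ID();
        self._server_counter += self.batch_size
        return self._server_counter  # end of the newly reserved range

    def allocate(self) -> int:
        with self.lock:
            if self.next_id >= self.limit:
                end = self._fetch_batch()
                self.next_id = end - self.batch_size + 1
                self.limit = end + 1
            allocated = self.next_id
            self.next_id += 1
            return allocated


alloc = RangeAllocator(batch_size=1000)
print([alloc.allocate() for _ in range(3)])  # [1, 2, 3]
```

One consequence of this design: IDs cached by a crashed process are simply lost, which trades gaps in the sequence for far fewer database round trips.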
Multi-master with offsets works when you have multiple database primaries:
```sql
-- Server 1: generates 1, 3, 5, 7, ...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 1;

-- Server 2: generates 2, 4, 6, 8, ...
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 2;
```
This is simple but inflexible—adding a third server requires reconfiguration and careful migration.
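The scheme's arithmetic is easy to simulate, which also makes the inflexibility visible: with an increment of 2, both offsets are taken, so a third server cannot join without reconfiguring every primary (a quick illustrative sketch):

```python
def offset_sequence(offset: int, increment: int, count: int) -> list:
    """IDs a primary assigns under auto_increment_offset/increment."""
    return [offset + increment * i for i in range(count)]

server1 = offset_sequence(offset=1, increment=2, count=4)  # [1, 3, 5, 7]
server2 = offset_sequence(offset=2, increment=2, count=4)  # [2, 4, 6, 8]

# The two sequences are disjoint by construction
assert not set(server1) & set(server2)
print(server1, server2)
```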
Handling Failure Modes
Distributed ID generators have three main failure modes, and your production system needs strategies for each.
Clock skew is the most insidious. If a server’s clock jumps backward (NTP correction, VM migration, manual adjustment), a Snowflake-style generator can revisit timestamps it has already used and emit duplicate IDs:
```python
class RobustSnowflakeGenerator(SnowflakeGenerator):
    MAX_CLOCK_DRIFT_MS = 5  # Tolerate up to 5ms drift

    def generate(self) -> int:
        # Check drift before delegating; the parent's generate() takes the
        # (non-reentrant) lock itself and re-checks for backward movement,
        # so we must not hold the lock here.
        drift = self.last_timestamp - self._current_millis()
        if drift > 0:
            if drift <= self.MAX_CLOCK_DRIFT_MS:
                # Small drift - wait it out
                time.sleep(drift / 1000.0)
            else:
                # Large drift - refuse to generate
                raise ClockDriftError(
                    f"Clock drift of {drift}ms exceeds threshold"
                )
        return super().generate()
```
Worker ID collision happens when two processes think they’re the same worker. Prevent this with coordination services:
```python
import etcd3


def acquire_worker_id(etcd_client: etcd3.Etcd3Client,
                      service_name: str,
                      max_workers: int = 1024) -> int:
    """Acquire a unique worker ID using etcd."""
    # The caller must keep this lease refreshed (e.g. lease.refresh() on a
    # timer); if it expires, another process may claim the same worker ID.
    lease = etcd_client.lease(ttl=30)  # 30-second lease

    for worker_id in range(max_workers):
        key = f"/snowflake/{service_name}/workers/{worker_id}"
        # Try to claim this worker ID; succeeds only if the key doesn't exist
        success, _ = etcd_client.transaction(
            compare=[etcd_client.transactions.create(key) == 0],
            success=[etcd_client.transactions.put(key, "claimed", lease=lease)],
            failure=[]
        )
        if success:
            return worker_id

    raise RuntimeError("No available worker IDs")
```
Sequence exhaustion occurs when you generate more than 4,096 IDs in a single millisecond on one worker. The standard solution is to wait for the next millisecond, but you can also allocate more sequence bits at the cost of worker ID space.
Production Considerations
Bit allocation should match your actual scale. Running 50 servers? You don’t need 10 bits for worker IDs—reallocate some to sequence for higher throughput:
| Configuration | Workers | IDs/ms/worker | Use Case |
|---|---|---|---|
| 10-bit worker, 12-bit sequence | 1,024 | 4,096 | Large distributed system |
| 6-bit worker, 16-bit sequence | 64 | 65,536 | Fewer nodes, high throughput |
| 5-bit worker, 5-bit DC, 12-bit sequence | 32 × 32 | 4,096 | Multi-datacenter |
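The table’s capacity figures fall directly out of the bit widths; a few lines reproduce them (a derivation, not tied to any particular implementation):

```python
def capacity(worker_bits: int, sequence_bits: int) -> dict:
    """Workers and per-millisecond throughput for a given bit split."""
    return {"workers": 1 << worker_bits, "ids_per_ms": 1 << sequence_bits}

print(capacity(10, 12))     # {'workers': 1024, 'ids_per_ms': 4096}
print(capacity(6, 16))      # {'workers': 64, 'ids_per_ms': 65536}
# 5-bit datacenter + 5-bit machine = 10 worker bits split 32 x 32
print(capacity(5 + 5, 12))  # {'workers': 1024, 'ids_per_ms': 4096}
```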
Monitoring should track:
- ID generation latency (p50, p99)
- Sequence utilization per millisecond (approaching 4,096 means you’re at risk)
- Clock drift events
- Worker ID lease renewals
Migration from existing schemes requires a transition period. Generate new IDs in a non-overlapping range, update application code to handle both formats, then backfill old records if needed.
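During the transition, application code needs to tell the two formats apart. One common trick, sketched here under stated assumptions: legacy auto-increment IDs are small, while any Snowflake ID generated after the custom epoch is enormous (its timestamp is shifted left 22 bits), so a single threshold separates them. The `LEGACY_MAX` value below is illustrative; pick one safely above your largest legacy ID and below your smallest new ID:

```python
# Assumption (illustrative): legacy auto-increment IDs stayed below 10 billion,
# and new Snowflake IDs were first generated well after the custom epoch,
# so even the smallest new ID is orders of magnitude larger.
LEGACY_MAX = 10_000_000_000

def is_snowflake(record_id: int) -> bool:
    """Classify an ID by range during the migration window."""
    return record_id > LEGACY_MAX

print(is_snowflake(123456))              # False: legacy auto-increment ID
print(is_snowflake(86_400_000 << 22))    # True: ID from one day past the epoch
```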
Decision Framework
| Criteria | UUID (v7) | Snowflake | DB Ticket Server |
|---|---|---|---|
| Coordination required | None | Worker ID assignment | Range fetching |
| Size | 128 bits | 64 bits | 64 bits |
| Time-ordered | Yes | Yes | Roughly (batch granularity) |
| Throughput ceiling | Unlimited | ~4M/sec/worker | Batch-dependent |
| Operational complexity | Trivial | Medium | Low |
| Clock sensitivity | Low | High | None |
Choose UUIDv7 when you want simplicity, don’t have extreme performance requirements, and can tolerate larger keys.
Choose Snowflake when you need compact IDs, high throughput, and can manage worker ID coordination.
Choose database-based when you already have reliable database infrastructure and moderate throughput needs.
Most systems should start with UUIDv7. Move to Snowflake when you have concrete evidence that UUID size or generation overhead is a problem. The added operational complexity of Snowflake is only worth it at significant scale.