UUIDs: Generation and Use Cases

Key Insights

  • UUID v7 should be your default choice for new projects—it combines the distributed generation benefits of UUIDs with database-friendly sequential ordering that dramatically improves index performance.
  • Store UUIDs as binary (16 bytes) rather than strings (36 characters) in your database; the storage savings compound into significant performance gains at scale.
  • Never expose UUID v1 in public APIs—the embedded MAC address and timestamp leak information about your infrastructure and record creation times.

What is a UUID?

A Universally Unique Identifier (UUID) is a 128-bit value designed to be unique across space and time without requiring a central authority. The standard format looks like this: 550e8400-e29b-41d4-a716-446655440000—32 hexadecimal digits separated by four hyphens into five groups.

The “universally unique” claim isn’t hyperbole. A UUID has 2^128 possible bit patterns (approximately 340 undecillion), and even a version 4 UUID, which fixes 6 of those bits, leaves 2^122 random values, so the probability of generating duplicates is effectively zero for any practical application. You would need to generate about one billion version 4 UUIDs per second for roughly 86 years to reach a 50% chance of a single collision.
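The collision odds can be checked with the standard birthday approximation, p ≈ 1 − exp(−n² / 2N). The sketch below assumes version 4 UUIDs with 122 random bits drawn uniformly; the function names are illustrative, not part of any library.

```python
import math

RANDOM_BITS_V4 = 122  # 128 bits minus 6 fixed version/variant bits


def collision_probability(n: float, bits: int = RANDOM_BITS_V4) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n * n) / (2.0 * 2.0 ** bits))


def count_for_probability(p: float, bits: int = RANDOM_BITS_V4) -> float:
    """Number of random UUIDs needed to reach collision probability p."""
    return math.sqrt(2.0 * 2.0 ** bits * -math.log1p(-p))


seconds_per_year = 365.25 * 24 * 3600
n_half = count_for_probability(0.5)
years = n_half / 1e9 / seconds_per_year  # at one billion UUIDs per second

print(f"UUIDs needed for a 50% collision chance: {n_half:.2e}")
print(f"Years at one billion per second: {years:.0f}")  # about 86
print(f"Collision chance after one trillion UUIDs: {collision_probability(1e12):.1e}")
```

At realistic volumes (billions or trillions of identifiers), the estimated probability stays many orders of magnitude below one.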

This matters enormously in distributed systems. When you have multiple application servers, microservices, or database nodes that need to create identifiers independently, UUIDs eliminate coordination overhead. No central ID server, no distributed locks, no sequence gaps from failed transactions.

UUID Versions Explained

Not all UUIDs are created equal. The version number (the digit after the second hyphen) tells you how the UUID was generated.

Version 1 uses the current timestamp and the generating machine’s MAC address. This guarantees uniqueness but leaks information—anyone can extract when and where the UUID was created.

Version 4 is purely random. Each of the 122 available bits (6 bits are reserved for version and variant) comes from a cryptographic random source. This is the most common version in web applications.

Version 5 generates deterministic UUIDs by hashing a namespace UUID with an input string using SHA-1. Given the same inputs, you always get the same UUID. Useful for creating consistent identifiers from natural keys.

Version 7 is one of the newest additions (RFC 9562, 2024). It embeds a Unix timestamp in milliseconds in the first 48 bits, followed by the version and variant bits and random data. This gives you the best of both worlds: distributed generation with natural chronological sorting.

import uuid

# Version 4: Pure random
v4 = uuid.uuid4()
print(f"v4: {v4}")
# Output: v4: 7c9e6679-7425-40de-944b-e07fc1f90ae7

# Version 7: Timestamp + random (Python 3.14+)
v7 = uuid.uuid7()
print(f"v7: {v7}")
# Output: v7: 018f6b5e-4d2a-7000-8000-1b4f5a2c3d4e

# Version 5: Deterministic from namespace + name
namespace = uuid.NAMESPACE_DNS
v5 = uuid.uuid5(namespace, "example.com")
print(f"v5: {v5}")
# Output: v5: cfbff0d1-9375-5685-968c-48ce8b15ae17 (always the same)

// Node.js with uuid package
import { v4 as uuidv4, v7 as uuidv7, v5 as uuidv5 } from 'uuid';

// Version 4
const id4 = uuidv4();
console.log(`v4: ${id4}`);

// Version 7
const id7 = uuidv7();
console.log(`v7: ${id7}`);

// Version 5 with DNS namespace
const DNS_NAMESPACE = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
const id5 = uuidv5('example.com', DNS_NAMESPACE);
console.log(`v5: ${id5}`);

The structural difference becomes clear when you generate multiple v7 UUIDs in sequence: the leading 48 bits (the first two groups) track the clock and increase over time, while v4 UUIDs show no pattern.
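To make the v7 layout concrete, here is a minimal hand-rolled sketch that runs on Python versions without uuid.uuid7(). The helpers uuid7_sketch and timestamp_ms are hypothetical names for illustration; the sketch fills everything after the timestamp with random bits and does not attempt the optional same-millisecond ordering that RFC 9562 describes, so prefer a library implementation in production.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a version 7 UUID: 48-bit Unix ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & 0xFFFF_FFFF_FFFF) << 80) | rand
    # Overwrite the 4 version bits (76-79) and the 2 variant bits (62-63).
    value &= ~(0xF << 76)
    value |= 0x7 << 76  # version 7
    value &= ~(0x3 << 62)
    value |= 0x2 << 62  # RFC variant, binary 10
    return uuid.UUID(int=value)


def timestamp_ms(u: uuid.UUID) -> int:
    """Read the millisecond timestamp back out of the top 48 bits."""
    return u.int >> 80


# Fixed timestamps one millisecond apart, so the ordering is easy to see.
ids = [uuid7_sketch(ms) for ms in (1_700_000_000_000, 1_700_000_000_001, 1_700_000_000_002)]
for u in ids:
    print(u, timestamp_ms(u))
print("already sorted:", sorted(ids) == ids)
```

The same 48 bits that make the values sortable also make the creation time readable by anyone who holds the identifier.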

UUIDs vs. Auto-Increment IDs

The choice between UUIDs and auto-incrementing integers involves real trade-offs.

Auto-increment advantages: 4 or 8 bytes versus 16 bytes for UUIDs. Sequential insertion is optimal for B-tree indexes. Human-readable and easy to communicate verbally.

UUID advantages: Generate anywhere without database coordination. No information leakage about record count or creation order (with v4). Safe to expose publicly. Enables database sharding without ID conflicts.

A hybrid approach often works best:

-- PostgreSQL: Internal integer PK, external UUID
CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    public_id UUID NOT NULL DEFAULT gen_random_uuid(),
    customer_id BIGINT NOT NULL,
    total_cents INTEGER NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE UNIQUE INDEX idx_orders_public_id ON orders(public_id);

-- Use 'id' for joins and foreign keys (fast, compact)
-- Use 'public_id' in API responses (secure, portable)

-- MySQL: Same pattern with binary UUID storage
CREATE TABLE orders (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    public_id BINARY(16) NOT NULL,
    customer_id BIGINT UNSIGNED NOT NULL,
    total_cents INT UNSIGNED NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY idx_public_id (public_id)
);

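-- MySQL's UUID() returns a version 1 value; swap_flag=1 moves its
-- timestamp high bits first so rows are stored in roughly insert order.
-- Generate v4 or v7 in the application instead if public_id is exposed.
INSERT INTO orders (public_id, customer_id, total_cents)
VALUES (UUID_TO_BIN(UUID(), 1), 12345, 9999);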
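On the application side, the hybrid pattern comes down to translating a public UUID into the internal integer key at the API boundary. The sketch below uses an in-memory SQLite database purely as a stand-in for PostgreSQL or MySQL; the table is trimmed down, and create_order and find_order are hypothetical helper names.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal key for joins
        public_id BLOB NOT NULL UNIQUE,        -- 16-byte UUID for the API
        total_cents INTEGER NOT NULL
    )
    """
)


def create_order(total_cents: int) -> uuid.UUID:
    """Insert a row and return the identifier that is safe to hand out."""
    public_id = uuid.uuid4()
    conn.execute(
        "INSERT INTO orders (public_id, total_cents) VALUES (?, ?)",
        (public_id.bytes, total_cents),
    )
    return public_id


def find_order(public_id: str):
    """Resolve a public UUID string to (internal id, total_cents), or None."""
    # Parsing first means malformed input raises ValueError before any query runs.
    key = uuid.UUID(public_id).bytes
    return conn.execute(
        "SELECT id, total_cents FROM orders WHERE public_id = ?", (key,)
    ).fetchone()


pid = create_order(9999)
print(find_order(str(pid)))           # (1, 9999)
print(find_order(str(uuid.uuid4())))  # None
```

Only the UUID ever leaves the service; the integer id stays internal and is used for joins and foreign keys.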

Common Use Cases

API Resource Identifiers: Random UUIDs make enumeration attacks impractical. Users can’t guess other resource IDs by incrementing numbers, though unguessable IDs complement authorization checks rather than replace them.

from fastapi import FastAPI, HTTPException
from uuid import UUID
import uuid

app = FastAPI()

# In-memory store for demonstration
orders = {}

@app.post("/orders")
def create_order(customer_id: int, items: list[dict]):
    order_id = uuid.uuid7()  # Sortable, no coordination needed (Python 3.14+)
    orders[order_id] = {
        "id": str(order_id),
        "customer_id": customer_id,
        "items": items,
        "status": "pending"
    }
    return orders[order_id]

@app.get("/orders/{order_id}")
def get_order(order_id: UUID):
    if order_id not in orders:
        raise HTTPException(status_code=404, detail="Order not found")
    return orders[order_id]

Idempotency Keys: Clients include a UUID with requests to prevent duplicate processing. The server stores processed keys and returns cached responses for repeats.
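A minimal sketch of the idempotency-key flow follows, assuming an in-memory dictionary as the key store. The names handle_once and charge are hypothetical, and a real service would need an atomic insert in shared storage plus key expiry, which this sketch leaves out.

```python
import uuid
from typing import Callable, Dict

_responses: Dict[uuid.UUID, dict] = {}  # hypothetical in-memory key store


def handle_once(idempotency_key: str, action: Callable[[], dict]) -> dict:
    """Run action at most once per key; repeats get the stored response."""
    key = uuid.UUID(idempotency_key)  # rejects malformed keys, ignores case
    if key in _responses:
        return _responses[key]
    response = action()
    _responses[key] = response
    return response


charges = []


def charge() -> dict:
    """Stand-in for a side effect that must not happen twice."""
    charges.append(1)
    return {"status": "charged", "attempt": len(charges)}


key = str(uuid.uuid4())                   # generated by the client
first = handle_once(key, charge)
retry = handle_once(key.upper(), charge)  # same key, different case
print(first == retry, len(charges))       # True 1
```

Parsing the key into a UUID object before the lookup means a retry that differs only in letter case still matches the stored entry.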

Distributed Database Keys: When sharding across multiple database nodes, UUIDs eliminate the need for a central ID allocation service.

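Object Storage Naming: UUIDs give object keys that do not collide across independent writers. Older S3 guidance also recommended random key prefixes to spread load across partitions; AWS has since said that prefix randomization is no longer necessary, so treat any performance benefit as workload-dependent.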

Session Tokens: A v4 UUID from a cryptographically secure generator carries 122 random bits, which is enough entropy for a session identifier, though dedicated token libraries often add further security features.
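For comparison, Python's standard secrets module is one example of a dedicated token source. The sketch below assumes 32 random bytes per token, which is a common choice rather than a requirement; new_session_token is a hypothetical helper name.

```python
import secrets
import uuid


def new_session_token(num_bytes: int = 32) -> str:
    """Return a URL-safe token built from num_bytes of CSPRNG output."""
    return secrets.token_urlsafe(num_bytes)


token = new_session_token()
session_uuid = uuid.uuid4()

print(len(token), "characters from", 32 * 8, "random bits")
print(len(str(session_uuid)), "characters from 122 random bits")
```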

Database Considerations

Storage format dramatically impacts performance. A UUID stored as a 36-character string consumes more than twice the space of binary storage and requires character-by-character comparison.
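The size difference is easy to confirm from application code. This small sketch uses only the standard uuid module and shows the two representations of the same value, along with the round trip between them.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

text_form = str(u)     # canonical string with hyphens
binary_form = u.bytes  # raw big-endian bytes, suitable for BINARY(16) or BLOB

print(len(text_form), len(binary_form))  # 36 16

# Both forms round-trip to the same UUID object.
restored = uuid.UUID(bytes=binary_form)
print(restored == uuid.UUID(text_form))  # True
```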

-- PostgreSQL: Native UUID type (16 bytes, optimized comparison)
CREATE TABLE events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- The primary key already has an index, so no separate index on id is needed.
-- gen_random_uuid() returns v4; if the application supplies v7 values instead,
-- ORDER BY id is chronological.

-- MySQL: Binary storage; v7 needs no byte swap
CREATE TABLE events (
    id BINARY(16) PRIMARY KEY,
    event_type VARCHAR(50) NOT NULL,
    payload JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Application layer: generate the v7 value and store it with swap_flag=0.
-- v7 already puts its timestamp bits first; swap_flag=1 is meant for v1
-- and would move the version and random bits ahead of the timestamp.
INSERT INTO events (id, event_type, payload)
VALUES (UUID_TO_BIN('018f6b5e-4d2a-7000-8000-1b4f5a2c3d4e', 0), 'order.created', '{}');
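The effect of MySQL's swap flag on v7 values can be checked without a database. The sketch below emulates the documented behavior of UUID_TO_BIN with swap_flag=1, which exchanges the first and third hex groups; mysql_swapped is a hypothetical helper written for this illustration, and the two UUIDs are made-up v7 values.

```python
import uuid


def mysql_swapped(u: uuid.UUID) -> bytes:
    """Emulate UUID_TO_BIN(u, 1): group 3, group 2, group 1, then the rest."""
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


# `later` has the larger 48-bit timestamp (0x018f6b5e4d2b vs 0x018f6b5e4d2a).
earlier = uuid.UUID("018f6b5e-4d2a-7fff-8000-000000000001")
later = uuid.UUID("018f6b5e-4d2b-7000-8000-000000000002")

# Plain big-endian bytes keep chronological order.
print(earlier.bytes < later.bytes)                    # True

# After the swap, the random bits in group 3 decide the order instead.
print(mysql_swapped(earlier) < mysql_swapped(later))  # False
```

The swap exists to repair version 1 ordering, where the low timestamp bits come first in the text form, so it is best left off for version 7 values.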

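UUID v7’s timestamp prefix means sequential inserts cluster together in B-tree indexes, largely avoiding the random I/O pattern that makes v4 UUIDs problematic for write-heavy workloads. This single change can improve insert performance by as much as 10-20x in some scenarios, though the gain depends on table size, available memory, and storage engine, so benchmark with your own workload.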

Implementation Pitfalls

String comparison issues: UUIDs are case-insensitive, but string comparison is usually case-sensitive. Normalize to lowercase before comparing, or better, parse both sides into UUID objects.

from uuid import UUID

# Wrong: raw strings differ when only the case differs
user_input = "550E8400-E29B-41D4-A716-446655440000"
stored = "550e8400-e29b-41d4-a716-446655440000"
print(user_input == stored)  # False

# Correct: Parse as UUID objects
assert UUID(user_input) == UUID(stored)  # True

Timestamp extraction from v1: Never use v1 UUIDs in public APIs. The timestamp is trivially extractable.

import uuid
from datetime import datetime, timezone

# Generate a v1 UUID
v1 = uuid.uuid1()
print(f"UUID v1: {v1}")

# Extract the timestamp (this is the vulnerability)
timestamp = (v1.time - 0x01b21dd213814000) / 1e7
creation_time = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(f"Created at: {creation_time}")
# Attacker now knows exactly when this record was created
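The node field is just as easy to read as the timestamp. In this sketch the node value is passed in explicitly so the output is reproducible; 0x001A2B3C4D5E is a made-up address, and on a real host uuid1() would normally use the machine's own hardware address.

```python
import uuid

# Made-up 48-bit node value standing in for a real MAC address.
v1 = uuid.uuid1(node=0x001A2B3C4D5E, clock_seq=0)

# The last 48 bits of a v1 UUID are the node, stored unmodified.
node_hex = f"{v1.node:012x}"
mac = ":".join(node_hex[i:i + 2] for i in range(0, 12, 2))

print(f"UUID v1: {v1}")
print(f"Node:    {mac}")  # 00:1a:2b:3c:4d:5e
```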

Insufficient entropy: In containerized environments, ensure your random source is properly seeded. Check that /dev/urandom is available and the system has gathered sufficient entropy.

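Validation gaps: Validate UUID format before database operations so malformed input is rejected early. Validation complements parameterized queries as a defense against injection; it does not replace them.

import re
from uuid import UUID

# Versions 1-8 (RFC 9562) with the RFC variant; the nil and max UUIDs are rejected
UUID_PATTERN = re.compile(
    r'[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}',
    re.IGNORECASE
)

def validate_uuid(value: str) -> UUID:
    # fullmatch rejects a trailing newline, which '$' with match() would allow
    if not UUID_PATTERN.fullmatch(value):
        raise ValueError(f"Invalid UUID format: {value!r}")
    return UUID(value)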

Best Practices Summary

Choose UUID v7 for new projects. The timestamp ordering solves the primary performance objection to UUIDs while maintaining all their distributed generation benefits.

Store as binary. The 16-byte native format is faster to compare and index than 36-character strings. PostgreSQL’s UUID type handles this automatically; MySQL requires explicit BINARY(16) with conversion functions.

Use the hybrid pattern when you need both internal efficiency and external safety. Integer primary keys for joins, UUID columns for API exposure.

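Never expose v1 UUIDs publicly. The embedded timestamp and MAC address leak operational information. Stick to v4 or v7 for any client-facing identifiers, and prefer v4 where record creation time is itself sensitive, since v7 also embeds a millisecond timestamp.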

Consider alternatives for specific needs: ULID provides similar benefits to UUID v7 with a more compact string representation (26 characters). Nano ID offers customizable length and alphabet when you need shorter identifiers. Snowflake IDs work well when you need 64-bit integers with embedded timestamps.

The UUID ecosystem has matured significantly with the v7 specification. For most applications, UUID v7 with binary storage provides an excellent balance of distributed generation, database performance, and operational simplicity.
