Cryptographic Random Numbers: Secure Generation

Key Insights

Standard random number generators like Math.random() are predictable and must never be used for security-sensitive applications—always use your platform’s cryptographically secure PRNG.
Entropy is the foundation of secure randomness; modern operating systems continuously gather entropy from hardware events, and the /dev/urandom interface is safe for virtually all cryptographic purposes.
Common mistakes like modulo bias, insufficient token length, and time-based seeding have caused real-world security breaches—understanding these pitfalls is essential for secure implementation.

Why Randomness Matters in Security

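In 2012, researchers found that roughly 0.2% of the RSA public keys they gathered from public sources such as HTTPS certificates shared a prime factor with another key, which made the matching private keys trivial to compute. The root cause was weak random number generation during key creation. The PlayStation 3’s master signing key was extracted because Sony’s code used the same ECDSA nonce for every signature instead of a fresh random one. Cloudflare famously uses a wall of lava lamps as an extra entropy source. These aren’t edge cases; they’re reminders that randomness is foundational to security.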

When you’re shuffling a playlist or running a Monte Carlo simulation, “random enough” works fine. When you’re generating session tokens, encryption keys, or password reset links, predictable randomness means complete system compromise. An attacker who can guess your next token can hijack sessions. An attacker who can predict your key generation can decrypt traffic. The stakes are binary: either your randomness is cryptographically secure, or your security is theater.

PRNG vs CSPRNG: Understanding the Difference

A pseudo-random number generator (PRNG) produces sequences that appear random but are entirely deterministic. Given the same seed, you get the same sequence. This is useful for reproducibility in testing or simulations, but catastrophic for security.

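A cryptographically secure PRNG (CSPRNG) adds critical properties: unpredictability and backtracking resistance. Unpredictability means an attacker who observes any number of previous outputs still cannot predict the next one. Backtracking resistance means that even an attacker who learns the generator’s current internal state cannot reconstruct the outputs it produced earlier.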
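The determinism of an ordinary PRNG is easy to see in Python. A minimal sketch comparing the standard library’s seedable random.Random (a Mersenne Twister) with the secrets module, which exposes no seed at all; the helper names seeded_sequence and csprng_sequence are illustrative:

```python
import random
import secrets

def seeded_sequence(seed: int, n: int = 5) -> list:
    """Draw n values from a PRNG seeded with `seed` (fully deterministic)."""
    rng = random.Random(seed)  # Mersenne Twister: NOT cryptographically secure
    return [rng.randrange(1_000_000) for _ in range(n)]

def csprng_sequence(n: int = 5) -> list:
    """Draw n values from the OS CSPRNG; there is no seed to replay."""
    return [secrets.randbelow(1_000_000) for _ in range(n)]

# Same seed, same "random" numbers: handy for reproducible simulations,
# fatal for tokens if an attacker can learn or guess the seed.
assert seeded_sequence(42) == seeded_sequence(42)
print(seeded_sequence(42))
print(csprng_sequence())
```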

Consider a linear congruential generator (LCG), the algorithm behind many standard library random functions:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

// Simple LCG using the well-known Numerical Recipes constants
static uint32_t lcg_state = 1;

uint32_t lcg_next(void) {
    // state = state * a + c (mod 2^32); unsigned overflow wraps by definition
    lcg_state = lcg_state * 1664525u + 1013904223u;
    return lcg_state;
}

// This LCG returns its entire internal state, so a single observed output
// is enough to predict every future value. Implementations that return only
// the high bits hide part of the state, but the missing bits can be brute-forced.
uint32_t predict_next(uint32_t observed) {
    return observed * 1664525u + 1013904223u;
}

int main(void) {
    uint32_t first = lcg_next();
    uint32_t second = lcg_next();
    uint32_t third = lcg_next();

    printf("Actual sequence: %" PRIu32 ", %" PRIu32 ", %" PRIu32 "\n", first, second, third);
    printf("Predicted from %" PRIu32 ": %" PRIu32 "\n", second, predict_next(second));
    // The prediction matches 'third' exactly
    return 0;
}

This isn’t a theoretical attack. JavaScript’s Math.random() in V8 used xorshift128+, which researchers demonstrated could be fully recovered from just a few outputs. If you’re generating tokens with Math.random(), an attacker observing a handful of tokens can predict every future token your system will generate.
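V8 is not unusual here. As a concrete illustration in Python (not the V8 attack itself), CPython’s random module is a Mersenne Twister (MT19937) whose 624-word internal state can be rebuilt from 624 consecutive 32-bit outputs by inverting its output “tempering” step. A minimal sketch, assuming the attacker sees full getrandbits(32) values starting right after seeding, and relying on CPython’s getstate()/setstate() tuple layout (an implementation detail); the helper names are illustrative:

```python
import random

MASK32 = 0xFFFFFFFF

def _undo_right_shift_xor(y: int, shift: int) -> int:
    """Invert y ^= y >> shift (one of MT19937's tempering steps)."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x & MASK32

def _undo_left_shift_xor_and(y: int, shift: int, mask: int) -> int:
    """Invert y ^= (y << shift) & mask."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & MASK32

def untemper(y: int) -> int:
    """Recover one internal state word from one 32-bit output."""
    y = _undo_right_shift_xor(y, 18)
    y = _undo_left_shift_xor_and(y, 15, 0xEFC60000)
    y = _undo_left_shift_xor_and(y, 7, 0x9D2C5680)
    y = _undo_right_shift_xor(y, 11)
    return y

def clone_mt(outputs: list) -> random.Random:
    """Build a Random instance whose future output matches the victim's.

    Assumes `outputs` are 624 consecutive getrandbits(32) values that start
    right after seeding (aligned with one full state refresh).
    """
    if len(outputs) != 624:
        raise ValueError("need exactly 624 outputs")
    state = [untemper(o) for o in outputs]
    clone = random.Random()
    # CPython state format: (version, 624 words + index, gauss_next)
    clone.setstate((3, tuple(state + [624]), None))
    return clone

victim = random.Random(12345)  # seed unknown to the attacker
observed = [victim.getrandbits(32) for _ in range(624)]
attacker = clone_mt(observed)

# From here on, every value the victim produces is predictable.
print([victim.getrandbits(32) for _ in range(3)] ==
      [attacker.getrandbits(32) for _ in range(3)])
```

The point carries over to any non-cryptographic PRNG: keeping the seed secret does not help, because the outputs themselves leak the state.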

Entropy Sources and Seeding

Cryptographic randomness starts with entropy—genuine unpredictability harvested from the physical world. Modern operating systems continuously collect entropy from multiple sources:

Timing variations in hardware interrupts
Mouse movements and keyboard timing
Disk I/O timing jitter
Hardware RNG instructions (RDRAND/RDSEED on x86, RNDR on ARMv8.5-A and later)
Dedicated hardware random number generators

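Linux maintains an entropy pool that mixes these sources together. The historic distinction between /dev/random (blocking when entropy is “low”) and /dev/urandom (non-blocking) has caused confusion for years. Modern consensus: use /dev/urandom. Once the system has gathered initial entropy after boot, /dev/urandom is cryptographically secure and won’t block your application. The one caveat is very early boot: /dev/urandom has historically not waited for that initial seeding, whereas the getrandom() system call (Linux 3.17+) blocks only until the pool is initialized and never afterward.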

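#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

// Fill buffer with exactly `length` bytes from /dev/urandom.
// Returns 0 on success, -1 on error.
int generate_secure_bytes(uint8_t *buffer, size_t length) {
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0) {
        perror("Failed to open /dev/urandom");
        return -1;
    }

    // read() may return fewer bytes than requested or be interrupted by a
    // signal, so keep reading until the buffer is full.
    size_t total = 0;
    while (total < length) {
        ssize_t n = read(fd, buffer + total, length - total);
        if (n < 0) {
            if (errno == EINTR) continue;
            perror("Failed to read /dev/urandom");
            close(fd);
            return -1;
        }
        if (n == 0) {  // unexpected end of file
            close(fd);
            return -1;
        }
        total += (size_t)n;
    }
    close(fd);
    return 0;
}

int main(void) {
    uint8_t token[32];
    if (generate_secure_bytes(token, sizeof(token)) != 0) {
        return 1;
    }
    printf("Secure random bytes: ");
    for (size_t i = 0; i < sizeof(token); i++) {
        printf("%02x", token[i]);
    }
    printf("\n");
    return 0;
}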
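For comparison, a Python sketch of the same /dev/urandom read as the C function above, plus the getrandom() route on Linux. Both helpers are illustrative names; in everyday Python code, os.urandom(length) already does this for you (on Linux it uses getrandom() when available).

```python
import os

def read_urandom_exact(length: int) -> bytes:
    """Read exactly `length` bytes from /dev/urandom (Unix-only sketch).

    A single read() is allowed to return fewer bytes than requested,
    so keep reading until the buffer is full.
    """
    buf = bytearray()
    with open("/dev/urandom", "rb", buffering=0) as f:
        while len(buf) < length:
            chunk = f.read(length - len(buf))
            if not chunk:
                raise OSError("unexpected EOF from /dev/urandom")
            buf += chunk
    return bytes(buf)

def getrandom_exact(length: int) -> bytes:
    """Same result via the getrandom() syscall (Linux, Python 3.6+).

    os.getrandom may also return fewer bytes than requested, so loop.
    With the default flags=0 it blocks only until the kernel CSPRNG
    has been initialized.
    """
    buf = bytearray()
    while len(buf) < length:
        buf += os.getrandom(length - len(buf))
    return bytes(buf)

print(read_urandom_exact(32).hex())
if hasattr(os, "getrandom"):
    print(getrandom_exact(32).hex())
```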

You can check available entropy on Linux, though this number is less meaningful than historically believed:

# Check available entropy (bits)
cat /proc/sys/kernel/random/entropy_avail

# Watch entropy pool in real-time
watch -n 1 cat /proc/sys/kernel/random/entropy_avail

Since Linux 5.6, /dev/random no longer blocks once the CSPRNG has been initialized, so on modern kernels the two interfaces behave the same after early boot.

Platform-Specific Secure APIs

Every major platform provides a CSPRNG. Use it. Here are the correct APIs:

Python (3.6+):

import secrets
import os

# Generate secure random bytes
secure_bytes = secrets.token_bytes(32)

# Generate URL-safe token (base64 encoded)
url_token = secrets.token_urlsafe(32)

# Generate hex token
hex_token = secrets.token_hex(32)

# Secure random integer in range [0, n)
secure_int = secrets.randbelow(1000000)

# For raw bytes, os.urandom also works
raw_bytes = os.urandom(32)

print(f"URL-safe token: {url_token}")
print(f"Hex token: {hex_token}")

JavaScript (Node.js):

const crypto = require('crypto');

// Generate secure random bytes
const secureBytes = crypto.randomBytes(32);
console.log('Hex:', secureBytes.toString('hex'));
console.log('Base64:', secureBytes.toString('base64'));

// Async version for non-blocking generation
crypto.randomBytes(32, (err, buffer) => {
    if (err) throw err;
    console.log('Async hex:', buffer.toString('hex'));
});

// Secure random integer in [0, 1000000): min inclusive, max exclusive (Node.js 14.10+)
const randomInt = crypto.randomInt(0, 1000000);
console.log('Random integer:', randomInt);

JavaScript (Browser):

// Generate secure random bytes in browser
const array = new Uint8Array(32);
crypto.getRandomValues(array);

// Convert to hex string
const hexToken = Array.from(array)
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
console.log('Browser token:', hexToken);

Go:

package main

import (
    "crypto/rand"
    "encoding/base64"
    "encoding/hex"
    "fmt"
    "math/big"
)

func main() {
    // Generate secure random bytes
    bytes := make([]byte, 32)
    _, err := rand.Read(bytes)
    if err != nil {
        panic(err)
    }
    
    fmt.Println("Hex:", hex.EncodeToString(bytes))
    fmt.Println("Base64:", base64.URLEncoding.EncodeToString(bytes))
    
    // Secure random integer in range [0, limit)
    limit := big.NewInt(1000000)
    n, err := rand.Int(rand.Reader, limit)
    if err != nil {
        panic(err)
    }
    fmt.Println("Random int:", n)
}

Note that Go’s math/rand is not cryptographically secure—always use crypto/rand for security purposes.

Common Use Cases and Implementation Patterns

Different security contexts require different token lengths and formats:

Use Case	Minimum	Recommended
Session tokens	128 bits	256 bits
CSRF tokens	128 bits	128 bits
Password reset tokens	128 bits	256 bits
API keys	128 bits	256 bits
Encryption keys (AES-256)	256 bits	256 bits
IVs (AES-CBC)	128 bits	128 bits
Nonces (AES-GCM)	96 bits	96 bits
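The table counts bits of randomness, not characters. How many characters that takes depends on the encoding: each character of a uniformly random string over an alphabet of N symbols carries log2(N) bits. A small sketch of the arithmetic; chars_needed is a hypothetical helper name:

```python
import math

def chars_needed(bits: int, alphabet_size: int) -> int:
    """Characters needed for a uniformly random string over an alphabet of
    `alphabet_size` symbols to carry at least `bits` bits of entropy."""
    return math.ceil(bits / math.log2(alphabet_size))

for label, size in [("hex", 16), ("base64", 64), ("alphanumeric", 62), ("digits", 10)]:
    print(f"{label:>12}: 128 bits -> {chars_needed(128, size)} chars, "
          f"256 bits -> {chars_needed(256, size)} chars")
```

For example, 256 bits needs 64 hex characters but only 43 base64 characters, which is why secrets.token_urlsafe(32) returns a 43-character string.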

Here’s a practical token generator in Python:

import base64
import secrets
import string

def generate_token(
    length: int = 32,
    encoding: str = 'hex'
) -> str:
    """
    Generate a cryptographically secure token.
    
    Args:
        length: Number of random bytes for 'hex', 'base64', and 'urlsafe';
                number of output characters for 'alphanumeric'
        encoding: 'hex', 'base64', 'urlsafe', or 'alphanumeric'
    
    Returns:
        Encoded token string
    """
    if encoding == 'hex':
        return secrets.token_hex(length)
    elif encoding == 'base64':
        # Standard base64 (may contain '+', '/', and '=' padding)
        return base64.b64encode(secrets.token_bytes(length)).decode('ascii')
    elif encoding == 'urlsafe':
        # URL-safe base64 without padding
        return secrets.token_urlsafe(length)
    elif encoding == 'alphanumeric':
        alphabet = string.ascii_letters + string.digits
        # secrets.choice picks uniformly, so there is no modulo bias;
        # each character carries log2(62) ≈ 5.95 bits
        return ''.join(secrets.choice(alphabet) for _ in range(length))
    else:
        raise ValueError(f"Unknown encoding: {encoding}")

# Generate various token types
session_token = generate_token(32, 'urlsafe')  # 256 bits
csrf_token = generate_token(16, 'hex')          # 128 bits
api_key = generate_token(32, 'alphanumeric')    # ~190 bits

print(f"Session: {session_token}")
print(f"CSRF: {csrf_token}")
print(f"API Key: {api_key}")

Pitfalls and Anti-Patterns

Time-based seeding: Seeding with the current timestamp reduces your entropy to the number of plausible timestamps. If an attacker knows to within an hour when a token was generated, a seed with one-second resolution leaves only 7,201 candidates (about 13 bits), which can be tried exhaustively in seconds.
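To make this concrete, a minimal sketch assuming a hypothetical service that derives tokens from Python’s random module seeded with the current Unix time in whole seconds. The names weak_token and recover_seed are illustrative:

```python
import random
import time

def weak_token(seed: int) -> str:
    """ANTI-PATTERN: a 128-bit-looking token from a timestamp-seeded PRNG."""
    rng = random.Random(seed)
    return "%032x" % rng.getrandbits(128)

def recover_seed(token: str, approx_time: int, window: int = 3600):
    """Try every one-second timestamp within +/- window of approx_time."""
    for candidate in range(approx_time - window, approx_time + window + 1):
        if weak_token(candidate) == token:
            return candidate
    return None

issued_at = int(time.time())
token = weak_token(issued_at)   # looks like 128 random bits...

# ...but an attacker who knows the issue time to within an hour
# needs at most 7,201 guesses.
found = recover_seed(token, approx_time=issued_at + 1234)
print(found == issued_at)
```

Generating the token with secrets.token_hex(16) instead removes the seed entirely, and with it this search.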

Modulo bias: When you need a random number in a range, naive modulo operations create bias:

import secrets

def biased_random(max_val: int) -> int:
    """DON'T DO THIS - demonstrates modulo bias"""
    random_byte = secrets.token_bytes(1)[0]  # 0-255
    return random_byte % max_val

def unbiased_random(max_val: int) -> int:
    """Correct approach using rejection sampling"""
    return secrets.randbelow(max_val)

# Demonstrate bias with max_val = 100
# 256 % 100 = 56, so values 0-55 each have 3 byte preimages while 56-99 have only 2
from collections import Counter

biased_results = Counter(biased_random(100) for _ in range(100000))
unbiased_results = Counter(unbiased_random(100) for _ in range(100000))

# Expect roughly 1,170 hits for each of 0-55 but only about 780 for 56-99 (1.5x); unbiased: about 1,000 each
print(f"Biased - value 0 count: {biased_results[0]}")
print(f"Biased - value 99 count: {biased_results[99]}")
print(f"Unbiased - value 0 count: {unbiased_results[0]}")
print(f"Unbiased - value 99 count: {unbiased_results[99]}")
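The size of the bias can be computed exactly, and the fix can be sketched directly on bytes. This is a simplified illustration of rejection sampling, not CPython’s code: secrets.randbelow() applies the same idea with a rejection loop over getrandbits(). The helper names are hypothetical:

```python
import secrets
from collections import Counter

def preimage_counts(max_val: int, space: int = 256) -> Counter:
    """Exact number of byte values that map to each result under b % max_val."""
    return Counter(b % max_val for b in range(space))

def unbiased_from_bytes(max_val: int) -> int:
    """Rejection sampling from single bytes (assumes 1 <= max_val <= 256).

    Accept only bytes below the largest multiple of max_val that fits in 256,
    so every residue has exactly the same number of accepted preimages.
    """
    limit = 256 - (256 % max_val)   # 200 when max_val == 100
    while True:
        b = secrets.token_bytes(1)[0]
        if b < limit:
            return b % max_val

counts = preimage_counts(100)
print(counts[0], counts[55], counts[56], counts[99])  # 3 3 2 2
print(unbiased_from_bytes(100))
```

With max_val = 100, bytes 200 to 255 are rejected, so about 22% of draws are retried; the price is a few extra calls, and in exchange every output is equally likely.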

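Insufficient token length: A 64-bit token sounds large, but it’s only 8 bytes. Its 2^64 ≈ 1.8 × 10^19 possible values are within reach of a well-resourced offline brute-force attack, and randomly generated 64-bit tokens start to collide (the birthday bound) after on the order of 2^32, or a few billion, tokens. Always use at least 128 bits for security tokens.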
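The arithmetic behind that warning, as a rough sketch. The guess rate of 10^12 per second is an assumed figure for a large offline attack, not a measured benchmark; online attacks against a rate-limited endpoint are far slower. The birthday estimate uses the standard approximation n ≈ sqrt(2 · 2^bits · ln(1/(1 − p))):

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_search(bits: int, guesses_per_second: float) -> float:
    """Expected time to hit one specific token: half the keyspace on average."""
    return 2 ** (bits - 1) / guesses_per_second / SECONDS_PER_YEAR

def birthday_bound(bits: int, p: float = 0.5) -> float:
    """Approximate number of random tokens at which a collision has probability p."""
    return math.sqrt(2 * 2 ** bits * math.log(1 / (1 - p)))

RATE = 1e12  # assumed guesses per second (illustrative)
for bits in (64, 128):
    print(bits, f"{years_to_search(bits, RATE):.3g} years to search,",
          f"{birthday_bound(bits):.3g} tokens for a 50% collision chance")
```

At the assumed rate a specific 64-bit token falls in a few months, while 128 bits stays out of reach by many orders of magnitude.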

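Reusing IVs/nonces: Encryption modes like AES-GCM fail catastrophically if you reuse a nonce with the same key: the confidentiality of both messages is lost and the authentication key can be recovered. Generate a fresh nonce for every encryption operation under a given key.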
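GCM is built on counter (CTR) mode, so reusing a (key, nonce) pair reuses the same keystream. A minimal sketch of why that is fatal, simulating the keystream with random bytes rather than running real AES (so no third-party crypto library is needed); the messages are made up:

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p1 = b"PAY 0100 USD TO ACCT 11111111"
p2 = b"PAY 9999 USD TO ACCT 22222222"

# Stand-in for the keystream AES-CTR would derive from one (key, nonce) pair.
keystream = secrets.token_bytes(len(p1))

c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)   # nonce reused -> same keystream

# The keystream cancels out: c1 ^ c2 == p1 ^ p2, no key required.
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)

# An attacker who knows (or guesses) p1 recovers p2 outright.
recovered = xor_bytes(xor_bytes(c1, c2), p1)
print(recovered)
```

Real AES-GCM nonce reuse is worse than this sketch shows: it also exposes the GHASH authentication subkey, which lets an attacker forge messages.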

Testing and Validation

Statistical test suites like NIST SP 800-22 and Diehard can detect obvious flaws in random number generators. However, passing these tests doesn’t prove cryptographic security—they only catch statistical anomalies.
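To illustrate, a sketch of one simple statistical check, a chi-square test on byte frequencies (a single crude test, far weaker than the NIST or Diehard batteries), applied to a seeded Mersenne Twister, to the OS CSPRNG, and to an obviously broken all-zero source. The helper name is hypothetical; randbytes requires Python 3.9+:

```python
import random
import secrets
from collections import Counter

def chi_square_bytes(data: bytes) -> float:
    """Pearson chi-square statistic of byte frequencies vs. uniform (255 dof)."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))

n = 1_000_000
mt_bytes = random.Random(2024).randbytes(n)   # predictable PRNG
os_bytes = secrets.token_bytes(n)             # CSPRNG
zero_bytes = bytes(n)                         # obviously broken source

# The first two land near 255 (the mean for 255 degrees of freedom), so the
# test cannot tell them apart; only the broken source stands out.
print(round(chi_square_bytes(mt_bytes)),
      round(chi_square_bytes(os_bytes)),
      round(chi_square_bytes(zero_bytes)))
```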

For practical validation:

Code review: Verify you’re using the correct CSPRNG API
Static analysis: Tools like Semgrep can flag insecure random usage
Entropy verification: Ensure your system has adequate entropy at boot
Integration testing: Verify tokens have expected length and format

# Check for insecure random usage in Python
grep -rnE "random\.(random|randint|randrange|choice|shuffle)\(" --include="*.py" src/

# Better: use semgrep rules for security
semgrep --config p/security-audit src/
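For the integration-testing item in the list above, a minimal sketch of pytest-style checks on token length and format. new_session_token and the 32-byte policy are hypothetical stand-ins for your application’s own token function:

```python
import re
import secrets

TOKEN_BYTES = 32  # hypothetical policy: 256-bit session tokens
URLSAFE_RE = re.compile(r"[A-Za-z0-9_-]+")

def new_session_token() -> str:
    """Hypothetical application function under test."""
    return secrets.token_urlsafe(TOKEN_BYTES)

def test_session_token_format():
    tok = new_session_token()
    # 32 bytes -> 43 base64url characters (padding stripped)
    assert len(tok) == 43
    assert URLSAFE_RE.fullmatch(tok)

def test_session_tokens_do_not_repeat():
    # A cheap smoke test: uniqueness says nothing about unpredictability.
    tokens = {new_session_token() for _ in range(10_000)}
    assert len(tokens) == 10_000

test_session_token_format()
test_session_tokens_do_not_repeat()
print("ok")
```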

The most important test is the simplest: review your code and confirm every security-sensitive random value comes from a CSPRNG. No statistical test can distinguish a CSPRNG from a well-designed non-cryptographic PRNG; security comes from the algorithm’s construction, not its output distribution.

The bottom line: Use your platform’s CSPRNG. Don’t seed it yourself. Don’t reduce its output through biased operations. When in doubt, generate more bits than you think you need. Cryptographic randomness is a solved problem; your job is not to unsolve it.