How to Create an Array of Random Numbers in NumPy

Key Insights

  • NumPy’s modern Generator API (via default_rng()) replaces the legacy numpy.random functions with better statistical properties and a cleaner interface—use it for all new code.
  • Seeding your random generator with a fixed value ensures reproducible results, which is essential for debugging, testing, and scientific experiments.
  • The Generator API provides specialized methods for different use cases: integers() for whole numbers, random() for uniform floats, and distribution-specific methods like normal() for statistical sampling.

Introduction

Random number generation is foundational to modern computing. Whether you’re running Monte Carlo simulations, initializing neural network weights, generating synthetic test data, or bootstrapping statistical samples, you need reliable random arrays. NumPy’s random module is the standard tool for this job in Python.

NumPy underwent a significant overhaul of its random number generation in version 1.17. The old approach—calling functions directly on numpy.random—still works but is considered legacy. The modern approach uses a Generator object created via default_rng(). This article focuses on the modern API because it’s what you should use in new code. It offers better statistical properties, clearer semantics, and more flexibility.

Let’s get practical and build arrays of random numbers for real-world scenarios.

Setting Up the Random Generator

The entry point for modern NumPy random number generation is numpy.random.default_rng(). This function returns a Generator instance configured with NumPy’s default bit generator (PCG64), which has excellent statistical properties and performance.

import numpy as np

# Create a Generator with a random seed (different results each run)
rng = np.random.default_rng()

# Create a Generator with a fixed seed (reproducible results)
rng_seeded = np.random.default_rng(seed=42)

# Verify reproducibility
rng1 = np.random.default_rng(seed=12345)
rng2 = np.random.default_rng(seed=12345)

print(rng1.random(5))
# [0.22733602 0.31675834 0.79736546 0.67625467 0.39110955]

print(rng2.random(5))
# [0.22733602 0.31675834 0.79736546 0.67625467 0.39110955]

Always seed your generator when you need reproducible results. This is non-negotiable for scientific computing, unit tests, and debugging. For production systems where you want fresh, unpredictable results on every run, omit the seed (or pass seed=None explicitly); NumPy will then seed the generator from operating-system entropy.

A common pattern is to create one generator at the module level and reuse it throughout your code. This avoids the overhead of creating new generators and makes it easy to control reproducibility from a single location.

# At the top of your module
RNG = np.random.default_rng(seed=42)

def generate_sample_data():
    return RNG.random(100)

def generate_test_labels():
    return RNG.integers(0, 10, size=100)
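An alternative to the module-level constant is to pass the generator in as a parameter, so tests can inject a seeded one while production code uses a fresh stream. A minimal sketch (the function name is illustrative; it relies on default_rng() passing an existing Generator through unchanged):

```python
import numpy as np

def generate_sample_data(rng=None):
    # default_rng(None) creates a fresh generator; default_rng(generator)
    # returns the generator unaltered, so callers can inject a seeded one.
    rng = np.random.default_rng(rng)
    return rng.random(100)

# Reproducible in a test: inject a generator with a fixed seed
data = generate_sample_data(np.random.default_rng(seed=42))
print(data[:3])
```

This keeps reproducibility under the caller's control without any global state.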

Generating Arrays of Random Integers

The integers() method generates random integers within a specified range. By default, the lower bound is inclusive and the upper bound is exclusive (like Python’s range()).

rng = np.random.default_rng(seed=42)

# Generate 10 random integers from 0 to 99
random_ints = rng.integers(0, 100, size=10)
print(random_ints)
# [51 92 14 71 60 20 82 86 74 74]

# Generate a 3x4 matrix of random integers from 1 to 6 (simulating dice rolls)
dice_rolls = rng.integers(1, 7, size=(3, 4))
print(dice_rolls)
# [[5 2 2 6]
#  [4 1 6 2]
#  [4 5 3 3]]

# Include the endpoint with endpoint=True
inclusive_range = rng.integers(1, 10, size=5, endpoint=True)
print(inclusive_range)  # Values from 1 to 10, inclusive
# [ 4  7  9  2 10]

The size parameter accepts integers for 1D arrays or tuples for multi-dimensional arrays. This flexibility lets you generate exactly the shape you need without reshaping afterward.
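A quick illustration of the size semantics: omitting size returns a single scalar, an integer gives a 1-D array, and a tuple gives that exact shape.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

scalar = rng.integers(0, 10)               # size omitted: a single scalar
vector = rng.integers(0, 10, size=4)       # int: 1-D array of length 4
matrix = rng.integers(0, 10, size=(2, 3))  # tuple: 2x3 array

print(np.ndim(scalar), vector.shape, matrix.shape)
# 0 (4,) (2, 3)
```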

For simulating categorical data or indices, integers() is your go-to method. Need to sample random rows from a dataset, say for bootstrap resampling? Generate array indices directly:

rng = np.random.default_rng(seed=42)

# Generate 20 random indices for a dataset of 1000 samples
sample_indices = rng.integers(0, 1000, size=20)
print(sample_indices)
# [510 920 143 717 606 201 820 864 746 742 707 803 132 284 650 628 581 922 986 256]
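Keep in mind that integers() samples with replacement, so the same index can appear more than once. For an actual train/test split, where every sample must land in exactly one set, rng.permutation is the safer tool. A minimal sketch (the 80/20 ratio is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n_samples = 1000
perm = rng.permutation(n_samples)  # indices 0..999, each exactly once, shuffled

split = int(0.8 * n_samples)
train_idx, test_idx = perm[:split], perm[split:]
print(len(train_idx), len(test_idx))  # 800 200
```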

Generating Arrays of Random Floats

For floating-point random numbers, NumPy provides two primary methods: random() for uniform values in [0.0, 1.0), and uniform() for values in a custom range.

rng = np.random.default_rng(seed=42)

# Generate 5 random floats between 0 and 1
unit_floats = rng.random(5)
print(unit_floats)
# [0.77395605 0.43887844 0.85859792 0.69736803 0.09417735]

# Generate a 2x3 matrix of random floats between 0 and 1
float_matrix = rng.random((2, 3))
print(float_matrix)
# [[0.97562235 0.7611397  0.78606431]
#  [0.12811363 0.45038594 0.37079802]]

# Generate floats in a custom range using uniform()
# Random temperatures between -10 and 35 degrees
temperatures = rng.uniform(low=-10, high=35, size=7)
print(temperatures)
# [23.98638977 29.02224657 15.03498553 -3.66498081 22.08401032 16.91498367 11.73498702]

The difference between random() and uniform() is purely about convenience. You can achieve custom ranges with random() through arithmetic:

rng = np.random.default_rng(seed=42)

# These are equivalent
low, high = -10, 35
method1 = rng.uniform(low, high, size=5)

rng = np.random.default_rng(seed=42)  # Reset for comparison
method2 = low + (high - low) * rng.random(5)

print(np.allclose(method1, method2))  # True

Use uniform() when explicit bounds make your intent clearer. Use random() when you’re scaling values yourself or need raw [0, 1) values.

Generating Arrays from Statistical Distributions

Real-world data rarely follows a uniform distribution. NumPy’s Generator provides methods for sampling from dozens of statistical distributions. The most commonly used are normal (Gaussian), binomial, and Poisson distributions.

rng = np.random.default_rng(seed=42)

# Normal distribution: mean=0, standard deviation=1
standard_normal = rng.normal(loc=0, scale=1, size=1000)
print(f"Mean: {standard_normal.mean():.4f}, Std: {standard_normal.std():.4f}")
# Mean: -0.0039, Std: 1.0014

# Normal distribution with custom parameters
# Heights in cm: mean=170, std=10
heights = rng.normal(loc=170, scale=10, size=100)
print(f"Height range: {heights.min():.1f} to {heights.max():.1f}")
# Height range: 143.2 to 196.8

# Binomial distribution: 10 trials, 50% success probability
coin_flips = rng.binomial(n=10, p=0.5, size=1000)
print(f"Average heads in 10 flips: {coin_flips.mean():.2f}")
# Average heads in 10 flips: 4.99

# Poisson distribution: average of 5 events
events_per_hour = rng.poisson(lam=5, size=24)
print(events_per_hour)
# [5 4 6 4 5 3 6 8 4 5 4 5 7 3 5 4 6 5 4 6 5 4 5 6]

For sampling from existing data, use choice():

rng = np.random.default_rng(seed=42)

categories = np.array(['red', 'green', 'blue', 'yellow'])

# Sample with replacement (default)
samples = rng.choice(categories, size=10)
print(samples)
# ['yellow' 'blue' 'red' 'yellow' 'yellow' 'red' 'yellow' 'yellow' 'yellow' 'yellow']

# Sample without replacement
unique_samples = rng.choice(categories, size=3, replace=False)
print(unique_samples)
# ['blue' 'yellow' 'red']

# Weighted sampling
weights = [0.5, 0.3, 0.15, 0.05]  # red is most likely
weighted_samples = rng.choice(categories, size=1000, p=weights)
print(np.unique(weighted_samples, return_counts=True))
# (array(['blue', 'green', 'red', 'yellow'], dtype='<U6'), array([154, 296, 499, 51]))

Controlling Array Shape and Data Types

NumPy’s random methods support arbitrary dimensions through the size parameter. For integers, you can also control the output data type.

rng = np.random.default_rng(seed=42)

# 3D array: 2 batches of 3x4 matrices
tensor = rng.random((2, 3, 4))
print(f"Shape: {tensor.shape}")
# Shape: (2, 3, 4)

# Specify dtype for integers
small_ints = rng.integers(0, 100, size=(3, 3), dtype=np.int8)
print(small_ints.dtype)  # int8

large_ints = rng.integers(0, 1000000, size=(3, 3), dtype=np.int64)
print(large_ints.dtype)  # int64

# For memory-critical applications, use the smallest dtype that fits your range
byte_array = rng.integers(0, 256, size=1000000, dtype=np.uint8)
print(f"Memory: {byte_array.nbytes / 1024:.1f} KB")
# Memory: 976.6 KB

Choosing appropriate data types matters for large arrays. A million int64 values consume 8 MB; the same data as uint8 uses just 1 MB.
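Those figures are easy to verify with the nbytes attribute (a quick check; note that integers() defaults to int64 when no dtype is given):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

a64 = rng.integers(0, 256, size=1_000_000)                 # default dtype: int64
a8 = rng.integers(0, 256, size=1_000_000, dtype=np.uint8)  # 1 byte per value

print(a64.nbytes, a8.nbytes)  # 8000000 1000000
```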

Quick Reference: Legacy vs. Modern API

You’ll encounter legacy code using the old API. Here’s how to translate:

Legacy API                         Modern API                        Notes
np.random.seed(42)                 rng = np.random.default_rng(42)   Create generator once, reuse
np.random.randint(0, 10, size=5)   rng.integers(0, 10, size=5)       Same behavior
np.random.rand(3, 4)               rng.random((3, 4))                Note the tuple for shape
np.random.randn(3, 4)              rng.standard_normal((3, 4))       Standard normal
np.random.uniform(0, 1, 5)         rng.uniform(0, 1, 5)              Same interface
np.random.choice(arr, 5)           rng.choice(arr, 5)                Same interface
np.random.shuffle(arr)             rng.shuffle(arr)                  In-place shuffle

The legacy API uses global state, which causes problems in multithreaded code and makes reproducibility harder to manage. The modern API’s explicit Generator objects solve these issues. Use the modern API for all new projects, and consider migrating legacy code when you have the opportunity.
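For example, instead of sharing one global state across workers, you can derive statistically independent streams from a single parent seed using SeedSequence.spawn. A sketch (the four-worker count is an arbitrary choice for illustration):

```python
import numpy as np

# One parent seed yields independent child streams, e.g. one per worker.
parent = np.random.SeedSequence(42)
children = parent.spawn(4)
generators = [np.random.default_rng(seq) for seq in children]

# Each stream produces different numbers, yet the whole setup is
# reproducible from the single parent seed.
for i, g in enumerate(generators):
    print(i, g.random(2))
```

Each worker gets its own Generator, so there is no contention on shared state and no accidental coupling between streams.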
