How to Generate Random Numbers from a Normal Distribution in Python

Key Insights

  • NumPy’s random.normal() is the go-to choice for most applications—it’s fast, flexible, and handles arrays natively without loops
  • Python’s built-in random.gauss() works perfectly for simple use cases where you want to avoid external dependencies
  • Always set a random seed when your results need to be reproducible, and prefer the newer np.random.Generator API over the legacy np.random.seed() approach

Introduction to Normal Distributions

The normal distribution (also called Gaussian distribution) is the backbone of statistical analysis. It’s that familiar bell-shaped curve where values cluster around a central mean, with probability decreasing symmetrically as you move away from the center. About 68% of values fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
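The 68-95-99.7 rule is easy to verify empirically. A quick sketch (the sample size and seed here are arbitrary choices):

```python
import numpy as np

# Draw a large standard-normal sample and check the empirical rule
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0, scale=1, size=100_000)

for k, expected in [(1, 68.27), (2, 95.45), (3, 99.73)]:
    within = np.mean(np.abs(samples) <= k) * 100
    print(f"Within {k} std dev: {within:.2f}% (theory: {expected}%)")
```

With 100,000 samples, the empirical percentages land within a few tenths of a point of the theoretical values.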

You’ll encounter normal distributions constantly in practical work: modeling measurement errors, simulating stock price movements, generating synthetic test data, implementing machine learning algorithms, and running Monte Carlo simulations. If you’re doing any kind of statistical modeling or data science in Python, you need to know how to generate these numbers efficiently.

Let’s look at the three main approaches, when to use each, and how to do it right.

Using NumPy’s random.normal()

NumPy is the standard choice for numerical computing in Python, and its random number generation is both fast and flexible. The numpy.random.normal() function is what you’ll use 90% of the time.

The function takes three parameters:

  • loc: The mean (center) of the distribution (default: 0.0)
  • scale: The standard deviation (spread) of the distribution (default: 1.0)
  • size: The output shape—can be an integer or tuple (default: None, returns single value)

import numpy as np

# Generate a single random number from standard normal (mean=0, std=1)
single_value = np.random.normal()
print(f"Single value: {single_value}")

# Generate a single value with custom mean and standard deviation
# Example: heights with mean 170cm and std dev 10cm
height = np.random.normal(loc=170, scale=10)
print(f"Random height: {height:.1f} cm")

# Generate an array of 1000 values
samples = np.random.normal(loc=0, scale=1, size=1000)
print(f"Array shape: {samples.shape}")
print(f"Sample mean: {samples.mean():.3f}")
print(f"Sample std: {samples.std():.3f}")

# Generate a 2D array (useful for batch operations)
matrix = np.random.normal(loc=50, scale=15, size=(100, 5))
print(f"Matrix shape: {matrix.shape}")

The power of NumPy becomes obvious when you need large quantities of random numbers. Generating a million values takes milliseconds, and you get back a proper NumPy array that integrates seamlessly with the rest of the scientific Python ecosystem.

# Generate 1 million samples efficiently
large_sample = np.random.normal(loc=100, scale=25, size=1_000_000)
print(f"Generated {len(large_sample):,} values in memory")
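To see the speed difference concretely, here is a rough timing sketch comparing vectorized NumPy generation against a pure-Python loop. Exact numbers vary by machine, but NumPy is typically an order of magnitude faster or more:

```python
import time
import random
import numpy as np

n = 1_000_000

# Vectorized NumPy generation
start = time.perf_counter()
arr = np.random.normal(loc=100, scale=25, size=n)
numpy_time = time.perf_counter() - start

# Pure-Python loop for comparison
start = time.perf_counter()
lst = [random.gauss(100, 25) for _ in range(n)]
loop_time = time.perf_counter() - start

print(f"NumPy: {numpy_time * 1000:.1f} ms")
print(f"Loop:  {loop_time * 1000:.1f} ms")
```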

Using Python’s Built-in random.gauss()

Sometimes you don’t want to add NumPy as a dependency. Maybe you’re writing a simple script, building a lightweight package, or working in an environment where installing packages is complicated. Python’s standard library has you covered.

The random module provides two functions for normal distributions: gauss() and normalvariate(). They’re functionally identical, but gauss() is slightly faster because it’s not thread-safe (it reuses a cached value internally).

import random

# Generate a single value using gauss()
value = random.gauss(mu=0, sigma=1)
print(f"gauss(): {value}")

# Generate using normalvariate() - thread-safe alternative
value_nv = random.normalvariate(mu=0, sigma=1)
print(f"normalvariate(): {value_nv}")

# Generate multiple values (requires a loop or comprehension)
samples = [random.gauss(mu=100, sigma=15) for _ in range(1000)]
print(f"Generated {len(samples)} values")
print(f"Mean: {sum(samples)/len(samples):.2f}")

The main limitation is obvious: no array support. You need loops or comprehensions to generate multiple values, which is slower than NumPy for large datasets. For generating a handful of values in a simple script, though, it’s perfectly adequate.

# Practical example: simulating repeated measurements with Gaussian error
def simulate_measurements(true_value, measurement_error, n_measurements):
    """Simulate n measurements of a true value with Gaussian error."""
    return [random.gauss(true_value, measurement_error) 
            for _ in range(n_measurements)]

measurements = simulate_measurements(true_value=42.0, 
                                      measurement_error=0.5, 
                                      n_measurements=10)
print(f"Measurements: {[f'{m:.2f}' for m in measurements]}")
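Worth knowing if you're staying dependency-free: since Python 3.8, the standard library also includes statistics.NormalDist, which offers a distribution object with sampling plus pdf/cdf methods, no SciPy required:

```python
from statistics import NormalDist

# A distribution object from the standard library (Python 3.8+)
dist = NormalDist(mu=100, sigma=15)

# Draw samples; an optional seed makes the draw reproducible
samples = dist.samples(1000, seed=42)
print(f"Generated {len(samples)} values")
print(f"Mean: {sum(samples) / len(samples):.2f}")

# pdf/cdf without SciPy
print(f"CDF at 115: {dist.cdf(115):.4f}")
```

The CDF at one standard deviation above the mean comes out to about 0.8413, matching the 68% rule (half of the remaining 32% sits in the upper tail).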

Using SciPy’s stats.norm

SciPy takes an object-oriented approach to probability distributions. Instead of calling a function directly, you work with a distribution object that provides a full suite of statistical methods.

from scipy import stats

# Create a normal distribution object
dist = stats.norm(loc=50, scale=10)

# Generate random samples using .rvs() (random variates)
samples = dist.rvs(size=1000)
print(f"Generated {len(samples)} samples")

# The same object gives you access to other useful methods
x = 60
print(f"PDF at x={x}: {dist.pdf(x):.4f}")  # Probability density
print(f"CDF at x={x}: {dist.cdf(x):.4f}")  # Cumulative probability
print(f"Percentile for p=0.95: {dist.ppf(0.95):.2f}")  # Inverse CDF

# Get distribution statistics
print(f"Mean: {dist.mean()}")
print(f"Variance: {dist.var()}")
print(f"Standard deviation: {dist.std()}")

This approach shines when you need more than just random samples. If you’re doing statistical analysis—calculating probabilities, finding percentiles, or fitting distributions to data—SciPy’s interface is cleaner than juggling separate functions.

# Practical example: confidence intervals
dist = stats.norm(loc=100, scale=15)

# What range contains 95% of values?
lower, upper = dist.interval(0.95)
print(f"95% of values fall between {lower:.1f} and {upper:.1f}")

# What's the probability a value exceeds 130?
prob_exceed = dist.sf(130)  # survival function: 1 - CDF, more accurate in the tails
print(f"P(X > 130) = {prob_exceed:.4f}")
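The fitting direction works too: stats.norm.fit recovers distribution parameters from data via maximum likelihood. A sketch using synthetic data with known parameters:

```python
import numpy as np
from scipy import stats

# Generate data with known parameters, then try to recover them
rng = np.random.default_rng(seed=7)
data = rng.normal(loc=100, scale=15, size=5000)

# norm.fit returns maximum-likelihood estimates of loc and scale
est_mean, est_std = stats.norm.fit(data)
print(f"Estimated mean: {est_mean:.2f} (true: 100)")
print(f"Estimated std:  {est_std:.2f} (true: 15)")
```

With 5,000 samples, the estimates typically land within a fraction of a unit of the true values.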

Visualizing Your Generated Data

Always verify your random numbers look right. A histogram should show that characteristic bell curve, and the sample statistics should be close to your specified parameters.

import numpy as np
import matplotlib.pyplot as plt

# Generate samples
mean, std = 100, 15
samples = np.random.normal(loc=mean, scale=std, size=10000)

# Create histogram with theoretical curve overlay
fig, ax = plt.subplots(figsize=(10, 6))

# Plot histogram (density=True normalizes to match PDF)
ax.hist(samples, bins=50, density=True, alpha=0.7, 
        color='steelblue', edgecolor='white', label='Generated samples')

# Overlay theoretical normal curve
x = np.linspace(mean - 4*std, mean + 4*std, 200)
theoretical_pdf = (1/(std * np.sqrt(2*np.pi))) * np.exp(-0.5*((x-mean)/std)**2)
ax.plot(x, theoretical_pdf, 'r-', linewidth=2, label='Theoretical PDF')

ax.set_xlabel('Value')
ax.set_ylabel('Density')
ax.set_title(f'Normal Distribution (μ={mean}, σ={std})')
ax.legend()

plt.tight_layout()
plt.savefig('normal_distribution.png', dpi=150)
plt.show()

# Print verification statistics
print(f"Sample mean: {samples.mean():.2f} (expected: {mean})")
print(f"Sample std: {samples.std():.2f} (expected: {std})")

If your histogram doesn’t look right, you’ve probably mixed up your parameters or have a bug in your code. This visual check takes 30 seconds and can save hours of debugging downstream.
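If you'd rather not rely on eyeballing alone, a formal normality test makes a quick programmatic complement. One option is SciPy's D'Agostino-Pearson test (the 0.05 threshold below is just the usual convention):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
samples = rng.normal(loc=100, scale=15, size=10_000)

# Null hypothesis: the data comes from a normal distribution
stat, p_value = stats.normaltest(samples)
print(f"Test statistic: {stat:.3f}, p-value: {p_value:.3f}")
if p_value > 0.05:
    print("No evidence against normality")
else:
    print("Data looks non-normal; check your parameters")
```

A large p-value doesn't prove normality, but a tiny one on data you generated yourself is a strong hint something is wrong.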

Setting Seeds for Reproducibility

Random numbers aren’t actually random—they’re generated by deterministic algorithms from a starting value called a seed. Set the same seed, get the same sequence. This is essential for debugging, testing, and scientific reproducibility.

The legacy approach uses np.random.seed(), but NumPy now recommends the Generator API:

import numpy as np

# Legacy approach (still works, but not recommended for new code)
np.random.seed(42)
old_style = np.random.normal(0, 1, 5)
print(f"Legacy method: {old_style}")

# Reset and verify reproducibility
np.random.seed(42)
old_style_again = np.random.normal(0, 1, 5)
print(f"Same results: {np.array_equal(old_style, old_style_again)}")

# Modern approach: use Generator (recommended)
rng = np.random.default_rng(seed=42)
modern_style = rng.normal(0, 1, 5)
print(f"Generator method: {modern_style}")

# Create new generator with same seed for reproducibility
rng2 = np.random.default_rng(seed=42)
modern_style_again = rng2.normal(0, 1, 5)
print(f"Same results: {np.array_equal(modern_style, modern_style_again)}")

The Generator approach is better because it’s explicit (no global state), supports better algorithms, and plays nicely with parallel code. Use it for new projects.
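One way the parallel-friendliness shows up, sketched here with SeedSequence.spawn: from a single root seed you can derive statistically independent child seeds, one per worker, with no shared global state:

```python
import numpy as np

# Spawn independent child seeds from one root seed
root = np.random.SeedSequence(42)
child_seeds = root.spawn(4)
generators = [np.random.default_rng(s) for s in child_seeds]

# Each generator produces its own independent, reproducible stream
for i, rng in enumerate(generators):
    print(f"Worker {i}: {rng.normal(0, 1, 3)}")
```

Each worker can own one generator, and rerunning with the same root seed reproduces every stream.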

# For the standard library
import random

random.seed(42)
values = [random.gauss(0, 1) for _ in range(5)]
print(f"Reproducible values: {values}")

Choosing the Right Method

Here’s a practical decision framework:

Method                 Speed     Dependencies  Best For
numpy.random.normal()  Fast      NumPy         Arrays, large datasets, scientific computing
random.gauss()         Moderate  None          Simple scripts, minimal dependencies
scipy.stats.norm       Moderate  SciPy         Statistical analysis, PDF/CDF calculations

My recommendations:

  • Default choice: Use NumPy. It’s already a dependency in most data science projects, it’s fast, and the array support is invaluable.
  • Lightweight scripts: Use random.gauss() when you’re generating a few values and don’t want to import NumPy.
  • Statistical work: Use SciPy when you need the full distribution object—percentiles, probabilities, fitting, and so on.

For reproducibility, always set seeds in research code, tests, and anywhere results need to be consistent across runs. Use the modern Generator API for NumPy projects.

That’s it. Pick the method that fits your use case, set a seed if you need reproducibility, and verify your output with a quick histogram. Normal distribution generation is a solved problem—now go build something interesting with it.
