How to Create Arrays in NumPy

Key Insights

  • NumPy arrays are up to 50x faster than Python lists for numerical operations because they store data in contiguous memory blocks with fixed types, enabling vectorized operations.
  • Choose your array creation method based on your use case: np.array() for existing data, np.zeros()/np.ones() for initialization, np.arange()/np.linspace() for sequences, and np.random for simulations.
  • Always specify dtype explicitly when memory or precision matters. Functions such as np.zeros() and np.ones() default to float64, and integer input usually becomes a 64-bit integer, which may be overkill for many applications.

Introduction to NumPy Arrays

NumPy arrays are the foundation of scientific computing in Python. While Python lists are flexible and convenient, they’re terrible for numerical work. Each element in a list is a full Python object with type information, reference counting, and other overhead. NumPy arrays store raw numerical data in contiguous memory blocks, enabling operations that are orders of magnitude faster.

import numpy as np

# Python list approach - slow and verbose
python_list = [1, 2, 3, 4, 5]
squared_list = [x ** 2 for x in python_list]

# NumPy approach - fast and clean
numpy_array = np.array([1, 2, 3, 4, 5])
squared_array = numpy_array ** 2

print(squared_array)  # [ 1  4  9 16 25]

The difference becomes stark with larger datasets. A million-element operation that takes around 500ms with Python lists can complete in under 10ms with NumPy, although exact figures depend on the hardware and the operation. This isn’t a minor optimization; it’s the difference between interactive analysis and waiting for your coffee to brew.
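
To check the speedup on your own machine, a rough timing sketch like the one below can help. It uses the standard-library timeit module; the array size and repeat count are arbitrary choices, and absolute timings will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

n = 1_000_000  # arbitrary size chosen for illustration
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)  # explicit 64-bit ints to avoid overflow

# Time the same squaring operation with both approaches
list_time = timeit.timeit(lambda: [x ** 2 for x in python_list], number=3)
numpy_time = timeit.timeit(lambda: numpy_array ** 2, number=3)

print(f"list comprehension: {list_time / 3 * 1000:.1f} ms per run")
print(f"NumPy vectorized:   {numpy_time / 3 * 1000:.1f} ms per run")

# Both approaches compute the same values
squared_list = [x ** 2 for x in python_list]
squared_array = numpy_array ** 2
```

Treat the printed numbers as a ballpark rather than a benchmark; a tool like IPython’s %timeit gives steadier measurements.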

Creating Arrays from Python Lists

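The most straightforward way to create a NumPy array is to convert an existing Python list with np.array(). This works for any sequence, including tuples and nested lists. Generators and other one-shot iterables are not unpacked (they end up wrapped in a 0-dimensional object array), so use np.fromiter() for those.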

import numpy as np

# 1D array from a list
one_dimensional = np.array([1, 2, 3, 4, 5])
print(one_dimensional)
# Output: [1 2 3 4 5]
print(one_dimensional.shape)  # (5,)

# 2D array from nested lists
two_dimensional = np.array([[1, 2, 3], 
                            [4, 5, 6]])
print(two_dimensional)
# Output:
# [[1 2 3]
#  [4 5 6]]
print(two_dimensional.shape)  # (2, 3)

# 3D array - think of it as a stack of 2D arrays
three_dimensional = np.array([[[1, 2], [3, 4]], 
                               [[5, 6], [7, 8]]])
print(three_dimensional.shape)  # (2, 2, 2)
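
np.array() treats tuples the same way as lists, but a generator behaves differently. The sketch below shows both cases, with np.fromiter() as one way to build an array from a one-shot iterable. The variable names are illustrative.

```python
import numpy as np

# Tuples (and nested tuples) convert just like lists
from_tuple = np.array((1, 2, 3))
from_nested = np.array(((1, 2), (3, 4)))
print(from_tuple.shape, from_nested.shape)  # (3,) (2, 2)

# A generator is NOT unpacked: the result is a 0-d array holding the generator object
wrapped = np.array(x * x for x in range(5))
print(wrapped.shape, wrapped.dtype)  # () object

# np.fromiter() consumes the iterable element by element; dtype is required
squares = np.fromiter((x * x for x in range(5)), dtype=np.int64)
print(squares)  # [ 0  1  4  9 16]
```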

A critical detail: NumPy arrays are homogeneous. If you pass mixed types, NumPy will upcast everything to the most general type.

# Mixed types get upcast
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64 - integers became floats

# Explicit dtype control
integers_only = np.array([1.9, 2.5, 3.1], dtype=np.int32)
print(integers_only)  # [1 2 3] - truncated, not rounded

Watch out for that truncation behavior. If you need rounding, use np.round() before converting.
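
As a small illustration of the difference, the sketch below converts the same floats both ways. The sample values are made up, and astype() is used for the conversion.

```python
import numpy as np

values = np.array([1.9, 2.5, 3.1, -1.7])

truncated = values.astype(np.int32)          # drops the fractional part (toward zero)
rounded = np.round(values).astype(np.int32)  # rounds first, then converts

print(truncated)  # [ 1  2  3 -1]
print(rounded)    # [ 2  2  3 -2]
```

Note that np.round() rounds halves to the nearest even value, which is why 2.5 becomes 2 rather than 3.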

Creating Arrays with Built-in Functions

When you need arrays of a specific shape filled with initial values, NumPy provides specialized functions that are faster than creating a list and converting it.

import numpy as np

# Zeros - great for accumulators and initialization
zeros_1d = np.zeros(5)
print(zeros_1d)  # [0. 0. 0. 0. 0.]

zeros_2d = np.zeros((3, 4))  # Note: shape is a tuple
print(zeros_2d)
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  [0. 0. 0. 0.]]

# Ones - useful for masks and multiplicative identities
ones_array = np.ones((2, 3), dtype=np.int32)
print(ones_array)
# [[1 1 1]
#  [1 1 1]]

# Full - fill with any value
sevens = np.full((2, 2), 7)
print(sevens)
# [[7 7]
#  [7 7]]

# Fill with a specific value including special floats
infinity_array = np.full((3,), np.inf)
print(infinity_array)  # [inf inf inf]
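
Each of these functions accepts a dtype argument, which is where memory savings come from. As a rough illustration (the array size is arbitrary), the nbytes attribute reports how much space the data occupies:

```python
import numpy as np

n = 1_000_000  # arbitrary size for illustration

default_zeros = np.zeros(n)                    # float64 by default
compact_zeros = np.zeros(n, dtype=np.float32)  # half the bytes per element
tiny_flags = np.ones(n, dtype=np.uint8)        # one byte per element

print(default_zeros.dtype, default_zeros.nbytes)  # float64 8000000
print(compact_zeros.dtype, compact_zeros.nbytes)  # float32 4000000
print(tiny_flags.dtype, tiny_flags.nbytes)        # uint8 1000000
```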

There’s also np.empty(), which allocates memory without initializing it. This is marginally faster but dangerous—the array contains whatever garbage was in memory.

# Empty - uninitialized, contains garbage values
empty_array = np.empty((2, 2))
print(empty_array)  # Arbitrary leftover values - never assume these are zeros!

# Only use np.empty() when you're immediately overwriting all values
buffer = np.empty((1000, 1000))
buffer[:] = np.arange(1000)  # Every row filled right away (stand-in for real computed values)

My recommendation: avoid np.empty() unless you’re optimizing a tight loop and have profiled to confirm it matters. The bugs it can introduce aren’t worth the nanoseconds saved.
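
If profiling does justify np.empty(), one reasonably safe pattern is to hand the buffer straight to a ufunc through its out parameter, so every element is written before anything reads it. A minimal sketch, with made-up input arrays:

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)
b = np.full((2, 3), 2.0)

# Allocate without initializing, then let the ufunc write every element
result = np.empty(a.shape)
np.multiply(a, b, out=result)

print(result)
# [[ 0.  2.  4.]
#  [ 6.  8. 10.]]
```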

Creating Sequential and Ranged Arrays

For numerical sequences, np.arange() and np.linspace() are your workhorses. They serve different purposes despite seeming similar.

import numpy as np

# arange - like Python's range(), but returns an array
# Syntax: np.arange(start, stop, step)
integers = np.arange(0, 10, 2)
print(integers)  # [0 2 4 6 8]

# Works with floats, but beware of precision issues
floats = np.arange(0, 1, 0.1)
print(floats)  # [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
print(len(floats))  # 10 here, but other float steps can yield one more element than expected

# linspace - specify number of points, not step size
# Syntax: np.linspace(start, stop, num)
linear = np.linspace(0, 1, 5)
print(linear)  # [0.   0.25 0.5  0.75 1.  ]

# linspace includes the endpoint by default
# This is usually what you want for plotting
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

Here’s the practical rule: use np.arange() for integer sequences and np.linspace() for floating-point ranges. The floating-point precision issues with np.arange() have bitten countless developers.
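
One commonly cited case of that precision problem is sketched below: the stop value should be excluded, yet rounding error in the length calculation lets it through. The specific numbers are just an example.

```python
import numpy as np

# The stop value 1.3 is supposed to be excluded, but rounding error lets it in
stepped = np.arange(1, 1.3, 0.1)
print(len(stepped))  # 4, not the 3 you might expect

# linspace fixes the count up front, so the length is never a surprise
spaced = np.linspace(1, 1.2, 3)
print(len(spaced))  # 3
```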

# Logarithmic spacing - essential for log-scale plots
log_space = np.logspace(0, 3, 4)  # 10^0 to 10^3
print(log_space)  # [   1.   10.  100. 1000.]

# Geometric spacing - constant ratio between elements
geo_space = np.geomspace(1, 1000, 4)
print(geo_space)  # [   1.   10.  100. 1000.]
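
The two functions differ in what their arguments mean: np.logspace() takes exponents (with a base parameter that defaults to 10), while np.geomspace() takes the endpoint values themselves. A short sketch of the equivalence:

```python
import numpy as np

# logspace: start and stop are exponents of the base
powers_of_ten = np.logspace(0, 3, 4)          # 10**0 ... 10**3
powers_of_two = np.logspace(0, 3, 4, base=2)  # 2**0 ... 2**3

# geomspace: start and stop are the endpoint values themselves
same_as_ten = np.geomspace(1, 1000, 4)
same_as_two = np.geomspace(1, 8, 4)

print(powers_of_two)  # [1. 2. 4. 8.]
print(np.allclose(powers_of_ten, same_as_ten))  # True
print(np.allclose(powers_of_two, same_as_two))  # True
```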

Creating Random Arrays

The np.random module is essential for simulations, testing, and machine learning. NumPy 1.17 introduced a new random API, built around the Generator class, that’s faster and more statistically robust.

import numpy as np

# Modern approach: create a Generator
rng = np.random.default_rng(seed=42)  # Seed for reproducibility

# Uniform random floats in [0, 1)
uniform = rng.random((3, 3))
print(uniform)

# Random integers
dice_rolls = rng.integers(1, 7, size=10)  # 1 to 6; the upper bound 7 is exclusive
print(dice_rolls)  # [6 1 5 2 4 4 6 2 4 3] (example)

# Normal (Gaussian) distribution
normal = rng.normal(loc=0, scale=1, size=1000)  # mean=0, std=1
print(f"Mean: {normal.mean():.3f}, Std: {normal.std():.3f}")

# Other distributions
exponential = rng.exponential(scale=2.0, size=100)
poisson = rng.poisson(lam=5, size=100)
binomial = rng.binomial(n=10, p=0.5, size=100)

The legacy API (np.random.rand(), np.random.randint()) still works and is not formally deprecated, but it is frozen and NumPy’s documentation recommends the Generator-based approach for new code.
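
For code that still uses the legacy functions, the Generator equivalents are mostly one-to-one. The sketch below lists a few common translations as comments and shows that two generators built from the same seed produce identical streams. The seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=123)  # arbitrary seed

# Legacy call                      -> Generator equivalent
# np.random.rand(2, 3)             -> rng.random((2, 3))
# np.random.randn(2, 3)            -> rng.standard_normal((2, 3))
# np.random.randint(0, 10, size=5) -> rng.integers(0, 10, size=5)
uniform = rng.random((2, 3))
gaussian = rng.standard_normal((2, 3))
ints = rng.integers(0, 10, size=5)

# Same seed, same sequence: useful for reproducible experiments and tests
rng_a = np.random.default_rng(seed=123)
rng_b = np.random.default_rng(seed=123)
print(np.array_equal(rng_a.random(5), rng_b.random(5)))  # True
```

Passing a Generator into functions explicitly, rather than relying on global state, tends to make randomized code easier to test.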

# Shuffling and sampling
data = np.array([1, 2, 3, 4, 5])

# Shuffle in place
rng.shuffle(data)
print(data)  # [3 1 5 2 4] (example)

# Random sample without replacement
sample = rng.choice(data, size=3, replace=False)
print(sample)

# Random permutation (returns new array)
perm = rng.permutation(10)
print(perm)  # [2 8 4 9 1 6 7 3 0 5] (example)
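
The practical difference between shuffle() and permutation() is whether the input is modified. A minimal sketch, assuming an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed
original = np.arange(5)

# permutation() returns a shuffled copy and leaves its input untouched
shuffled_copy = rng.permutation(original)
print(original)  # [0 1 2 3 4]

# shuffle() reorders the array in place and returns None
in_place = original.copy()
returned = rng.shuffle(in_place)
print(returned)  # None

# Either way the elements are the same, only the order changes
print(sorted(shuffled_copy.tolist()) == sorted(in_place.tolist()))  # True
```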

Creating Identity and Diagonal Matrices

Linear algebra operations frequently require identity matrices and diagonal matrices. NumPy makes these trivial to create.

import numpy as np

# Identity matrix - ones on diagonal, zeros elsewhere
identity_3x3 = np.eye(3)
print(identity_3x3)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Non-square "identity" matrix
rectangular = np.eye(3, 4)
print(rectangular)
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]]

# Offset diagonal with k parameter
upper_diagonal = np.eye(3, k=1)
print(upper_diagonal)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]

# Create diagonal matrix from values
diagonal_values = np.diag([1, 2, 3, 4])
print(diagonal_values)
# [[1 0 0 0]
#  [0 2 0 0]
#  [0 0 3 0]
#  [0 0 0 4]]

# Extract diagonal from existing matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
diagonal = np.diag(matrix)
print(diagonal)  # [1 5 9]

Notice that np.diag() does double duty: it creates a diagonal matrix from a 1D array and extracts the diagonal from a 2D array.
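
That two-way behavior means the calls can be chained, and the k parameter applies in both directions. A short sketch:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# 2D in, 1D out: extract the main diagonal
main_diagonal = np.diag(matrix)
print(main_diagonal)  # [1 5 9]

# 1D in, 2D out: build a matrix with those values on the diagonal
rebuilt = np.diag(main_diagonal)
print(rebuilt)
# [[1 0 0]
#  [0 5 0]
#  [0 0 9]]

# The k parameter selects an offset diagonal in both directions
print(np.diag(matrix, k=1))        # [2 6]
print(np.diag([1, 2], k=1).shape)  # (3, 3)
```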

Quick Reference Summary

Here’s a consolidated reference for all array creation methods:

Method          Syntax                       Use Case
--------------  ---------------------------  --------------------------
np.array()      np.array([1, 2, 3])          Convert existing data
np.zeros()      np.zeros((3, 4))             Initialize accumulators
np.ones()       np.ones((3, 4))              Masks, multiplicative init
np.full()       np.full((3, 4), 7)           Fill with specific value
np.empty()      np.empty((3, 4))             Pre-allocate (use carefully)
np.arange()     np.arange(0, 10, 2)          Integer sequences
np.linspace()   np.linspace(0, 1, 50)        Float ranges for plotting
np.eye()        np.eye(3)                    Identity matrices
np.diag()       np.diag([1, 2, 3])           Diagonal matrices
rng.random()    rng.random((3, 4))           Uniform random
rng.integers()  rng.integers(0, 10, (3, 4))  Random integers
rng.normal()    rng.normal(0, 1, (3, 4))     Gaussian distribution

import numpy as np

# All methods in one snippet
rng = np.random.default_rng(42)

arrays = {
    'from_list': np.array([1, 2, 3]),
    'zeros': np.zeros((2, 3)),
    'ones': np.ones((2, 3)),
    'full': np.full((2, 3), 5),
    'arange': np.arange(0, 10, 2),
    'linspace': np.linspace(0, 1, 5),
    'eye': np.eye(3),
    'diag': np.diag([1, 2, 3]),
    'random': rng.random((2, 3)),
    'randint': rng.integers(0, 10, (2, 3)),
}

for name, arr in arrays.items():
    print(f"{name}: shape={arr.shape}, dtype={arr.dtype}")

Master these creation methods and you’ll handle 95% of NumPy initialization tasks. The remaining 5% involves specialized functions like np.fromfunction(), np.fromfile(), and np.frombuffer()—worth learning when you need them, but not essential for everyday work.
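
For a first taste of two of those specialized functions, here is a minimal sketch. The lambda and the byte values are made up for illustration, and np.fromfile() is left out because it needs a file on disk.

```python
import numpy as np

# fromfunction: each element is computed from its own indices
index_sum = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
print(index_sum)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# frombuffer: reinterpret raw bytes as an array without copying
raw = bytes([1, 2, 3, 4])
as_uint8 = np.frombuffer(raw, dtype=np.uint8)
print(as_uint8)  # [1 2 3 4]
```

An array created by np.frombuffer() from a bytes object is read-only, so copy it before modifying it.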
