NumPy - Create Array (np.array) with Examples

Key Insights

  • np.array() creates NumPy arrays from Python sequences with automatic dtype inference, while np.asarray() avoids copying data when the input is already an array
  • Explicit dtype specification prevents silent type conversions and wasted memory; set dtype whenever an operation depends on a specific precision
  • Multi-dimensional arrays require consistent inner sequence lengths; jagged input raises ValueError in NumPy 1.24 and later, while older versions silently built slow object arrays

Creating Arrays from Python Lists

The np.array() function converts Python sequences into NumPy arrays. The simplest case takes a flat list:

import numpy as np

# Basic array creation
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # [1 2 3 4 5]
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.dtype)  # int64 (int32 on Windows with NumPy < 2.0)

NumPy infers the data type from input elements. Mixed numeric types promote to the most general type:

# Integer and float mix
arr = np.array([1, 2, 3.5, 4])
print(arr)  # [1.  2.  3.5 4. ]
print(arr.dtype)  # float64

# All integers stay integer
arr = np.array([1, 2, 3, 4])
print(arr.dtype)  # int64
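
Promotion is not limited to integers and floats. The sketch below, with illustrative variable names that are not from the original text, shows three more cases. The string case is the easiest to miss because no error is raised:

```python
import numpy as np

# Booleans mixed with integers are promoted to the default integer type
mixed_bool_int = np.array([True, 2, 3])
print(mixed_bool_int.dtype.kind)  # i

# Any complex element promotes the whole array to complex
mixed_complex = np.array([1, 2.5, 3 + 4j])
print(mixed_complex.dtype)  # complex128

# A single string turns every element into a string
mixed_str = np.array([1, 2.5, "three"])
print(mixed_str.dtype.kind)  # U
```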

Explicit Data Type Specification

Specify dtype explicitly in production code. Inference depends on the values, so a single float or string in the input silently changes the type of the whole array:

# Explicit integer type
arr = np.array([1, 2, 3, 4], dtype=np.int32)
print(arr.dtype)  # int32

# Explicit float type
arr = np.array([1, 2, 3, 4], dtype=np.float64)
print(arr)  # [1. 2. 3. 4.]

# Unsigned integers for non-negative data
arr = np.array([1, 2, 3, 4], dtype=np.uint8)
print(arr.dtype)  # uint8

# Complex numbers
arr = np.array([1, 2, 3], dtype=np.complex128)
print(arr)  # [1.+0.j 2.+0.j 3.+0.j]

Type specification prevents memory waste. An array of small integers doesn’t need 64 bits per element:

# Memory comparison
arr_default = np.array(range(1000))
arr_optimized = np.array(range(1000), dtype=np.int16)

print(arr_default.nbytes)  # 8000 bytes (8 bytes per element with the default int64)
print(arr_optimized.nbytes)  # 2000 bytes
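
A smaller dtype is only safe when every value fits in it. One way to check, sketched here with np.iinfo() and a made-up values list, is to compare the data range against the limits of the candidate type before creating the array:

```python
import numpy as np

values = list(range(1000))  # illustrative data, not from a real workload

# np.iinfo reports the representable range of an integer dtype
limits = np.iinfo(np.int16)
fits = limits.min <= min(values) and max(values) <= limits.max

# Fall back to int64 when the data would not fit in int16
arr = np.array(values, dtype=np.int16 if fits else np.int64)
print(arr.dtype, arr.nbytes)  # int16 2000
```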

Multi-Dimensional Arrays

Nested sequences create multi-dimensional arrays. Each nesting level adds a dimension:

# 2D array (matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
# [[1 2 3]
#  [4 5 6]]
print(arr_2d.shape)  # (2, 3)
print(arr_2d.ndim)  # 2

# 3D array
arr_3d = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])
print(arr_3d.shape)  # (2, 2, 2)
print(arr_3d.ndim)  # 3

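Inconsistent inner lengths cannot form a rectangular array. NumPy 1.24 and later raise ValueError for such input unless you request dtype=object explicitly; older versions built the object array silently. Either way, object arrays are a performance disaster:

# Jagged array - BAD
jagged = np.array([[1, 2, 3], [4, 5]], dtype=object)  # ValueError without dtype=object
print(jagged.dtype)  # object
print(jagged.shape)  # (2,)
print(jagged[0])  # [1, 2, 3] - still a list!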

# Proper rectangular array - GOOD
proper = np.array([[1, 2, 3], [4, 5, 6]])
print(proper.dtype)  # int64
print(proper.shape)  # (2, 3)
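
When the input really is jagged, one common workaround is to pad the short rows to a rectangle. The helper below, pad_rows, is a hypothetical name used only for illustration, and the fill value of 0 is an assumption that may not suit every dataset:

```python
import numpy as np

def pad_rows(rows, fill_value=0):
    """Pad variable-length rows of integers into a rectangular int64 array."""
    width = max(len(row) for row in rows)
    out = np.full((len(rows), width), fill_value, dtype=np.int64)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row  # copy each row into the left part of its slot
    return out

padded = pad_rows([[1, 2, 3], [4, 5]])
print(padded.shape)  # (2, 3)
print(padded.dtype)  # int64
```

The result has a numeric dtype, so vectorized operations run at full speed; the cost is that padding values have to be handled (or masked) downstream.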

Array from Tuples and Other Sequences

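np.array() accepts any sequence type, not just lists. A generator is not a sequence, so it must be materialized first:

# From tuple
arr = np.array((1, 2, 3, 4))
print(arr)  # [1 2 3 4]

# From range
arr = np.array(range(5))
print(arr)  # [0 1 2 3 4]

# From generator: convert to a list first, otherwise np.array()
# wraps the generator object itself in a 0-dimensional object array
arr = np.array(list(x**2 for x in range(5)))
print(arr)  # [ 0  1  4  9 16]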

Copy vs View Behavior

np.array() copies its input by default. Use np.asarray() to avoid unnecessary copies:

# np.array copies by default
original = np.array([1, 2, 3, 4])
copy = np.array(original)
copy[0] = 999
print(original[0])  # 1 (unchanged)

# np.asarray returns the input array itself when no conversion is needed
original = np.array([1, 2, 3, 4])
view = np.asarray(original)
view[0] = 999
print(original[0])  # 999 (changed!)

# np.asarray copies when converting from list
list_data = [1, 2, 3, 4]
arr = np.asarray(list_data)
list_data[0] = 999
print(arr[0])  # 1 (independent)
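
Rather than mutating an element to find out whether two arrays are linked, the relationship can be checked directly. This sketch uses np.shares_memory() and the is operator; the variable names are illustrative:

```python
import numpy as np

original = np.array([1, 2, 3, 4])

copied = np.array(original)                         # default: new buffer
same = np.asarray(original)                         # no conversion needed
converted = np.asarray(original, dtype=np.float64)  # dtype change forces a copy

print(np.shares_memory(original, copied))     # False
print(same is original)                       # True
print(np.shares_memory(original, converted))  # False
```

Note that np.asarray() still copies whenever the requested dtype differs from the input dtype, so it is not a guarantee of shared memory.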

String and Boolean Arrays

String arrays have a fixed-width dtype sized to the longest input string. Strings longer than that width are truncated:

# String array
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)  # <U6 (Unicode, 6 characters max)

# Truncation happens silently when the dtype is narrower than the data
arr = np.array(['short', 'this is a very long string'])
print(arr.dtype)  # <U26
arr2 = np.array(['short', 'this is a very long string'], dtype='U5')
print(arr2)  # ['short' 'this ']

# Boolean arrays
arr = np.array([True, False, True, True])
print(arr.dtype)  # bool
print(arr.nbytes)  # 4 (1 byte per element)
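
Truncation also happens on assignment into an existing string array, which is where it tends to cause surprises. A minimal sketch, assuming a width of 20 characters is enough for the wider array:

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry'])  # inferred dtype: <U6
fruits[0] = 'pineapple'                           # 9 characters into a 6-character slot
print(fruits[0])  # pineap

# Reserving extra width up front avoids the truncation
wide = np.array(['apple', 'banana', 'cherry'], dtype='U20')
wide[0] = 'pineapple'
print(wide[0])  # pineapple
```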

Structured Arrays with Mixed Types

Structured arrays store heterogeneous data with named fields:

# Define structure
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Create structured array
data = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.2),
    ('Charlie', 35, 68.8)
], dtype=dt)

print(data['name'])  # ['Alice' 'Bob' 'Charlie']
print(data['age'])  # [25 30 35]
print(data[0])  # ('Alice', 25, 55.5)
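
Field access combines with ordinary boolean indexing and sorting. The following sketch repeats the definition above so it runs on its own, then filters by one field and sorts by another:

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
data = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.2),
    ('Charlie', 35, 68.8)
], dtype=dt)

# Boolean mask built from one field selects whole records
older = data[data['age'] > 28]
print(older['name'])  # ['Bob' 'Charlie']

# Sort records by a named field
by_weight = np.sort(data, order='weight')
print(by_weight['name'])  # ['Alice' 'Charlie' 'Bob']
```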

Common Pitfalls

Scalar Input

Passing a scalar creates a 0-dimensional array:

# 0-dimensional array
arr = np.array(42)
print(arr.shape)  # ()
print(arr.ndim)  # 0
print(arr[()])  # 42

# To create 1D array with single element
arr = np.array([42])
print(arr.shape)  # (1,)
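
When a function may receive either a scalar or a sequence, np.atleast_1d() normalizes both to at least one dimension, and .item() goes the other way by extracting a plain Python value. A short sketch with illustrative names:

```python
import numpy as np

scalar_arr = np.array(42)

# Extract the Python scalar from a 0-dimensional array
value = scalar_arr.item()
print(type(value).__name__)  # int

# Promote to 1D without caring whether the input was a scalar or a list
as_1d = np.atleast_1d(scalar_arr)
print(as_1d.shape)  # (1,)
print(np.atleast_1d([1, 2, 3]).shape)  # (3,)
```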

ndmin Parameter

Force minimum dimensions with ndmin:

# Force 2D array from 1D input
arr = np.array([1, 2, 3], ndmin=2)
print(arr.shape)  # (1, 3)
print(arr)  # [[1 2 3]]

# Force 3D
arr = np.array([1, 2, 3], ndmin=3)
print(arr.shape)  # (1, 1, 3)
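
ndmin always prepends the new axes, so it produces row-shaped results. For a column shape, the axis has to be added at the end instead. A brief sketch of the contrast:

```python
import numpy as np

# ndmin adds axes on the left: one row, three columns
row = np.array([1, 2, 3], ndmin=2)
print(row.shape)  # (1, 3)

# reshape (or indexing with np.newaxis) adds the axis on the right
column = np.array([1, 2, 3]).reshape(-1, 1)
print(column.shape)  # (3, 1)

same_column = np.array([1, 2, 3])[:, np.newaxis]
print(same_column.shape)  # (3, 1)
```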

Copy Flag

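The copy parameter changed meaning in NumPy 2.0: copy=False now raises ValueError when a copy cannot be avoided, while copy=None copies only when needed. Use np.asarray() for copy-only-if-needed behavior on any version: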

# Modern approach
original = [1, 2, 3, 4]
arr = np.asarray(original)  # Copies from list

original_arr = np.array([1, 2, 3, 4])
view = np.asarray(original_arr)  # No copy

Performance Considerations

Pre-allocate arrays when the size is known. np.append() copies the entire array on every call, so growing an array in a loop takes quadratic time:

import time

# BAD: Growing array in loop
start = time.time()
arr = np.array([])
for i in range(10000):
    arr = np.append(arr, i)
print(f"Append time: {time.time() - start:.3f}s")

# GOOD: Pre-allocate
start = time.time()
arr = np.empty(10000, dtype=np.int64)
for i in range(10000):
    arr[i] = i
print(f"Pre-allocate time: {time.time() - start:.3f}s")

# BEST: Use built-in constructors
start = time.time()
arr = np.arange(10000)
print(f"Built-in time: {time.time() - start:.3f}s")
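
When the final size is not known in advance, pre-allocation is not an option. A common alternative, sketched below with an arbitrary filter condition chosen only for illustration, is to collect values in a Python list and convert once at the end:

```python
import numpy as np

# Python lists grow cheaply, so append there instead of calling np.append
values = []
for i in range(10000):
    if i % 3 == 0:  # illustrative condition; the result count is not known up front
        values.append(i)

# A single conversion at the end copies the data exactly once
arr = np.array(values, dtype=np.int64)
print(arr.shape)  # (3334,)
```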

Use np.fromiter() for generators when you know the count and dtype:

# Efficient generator consumption
arr = np.fromiter((x**2 for x in range(1000)), dtype=np.int64, count=1000)
print(arr.shape)  # (1000,)
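
The count argument also bounds how much of the iterator is consumed, and it can be omitted when the length is unknown. A small sketch of both behaviors, using itertools.count() as a stand-in for an unbounded source:

```python
import itertools
import numpy as np

# count stops consumption after 5 items, so an infinite iterator is safe here
first_five = np.fromiter(itertools.count(), dtype=np.int64, count=5)
print(first_five)  # [0 1 2 3 4]

# Without count, fromiter reads until the iterator is exhausted
evens = np.fromiter((x for x in range(10) if x % 2 == 0), dtype=np.int64)
print(evens)  # [0 2 4 6 8]
```

Supplying count lets NumPy allocate the output once; without it the buffer is grown as items arrive.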

Verification and Validation

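Validate array properties after creation. Checking the inferred dtype before converting catches inputs that would otherwise be cast silently:

def create_validated_array(data, expected_shape, expected_dtype):
    # Let NumPy infer the dtype first; forcing dtype here would make
    # the dtype check below impossible to fail
    arr = np.array(data)

    if arr.shape != expected_shape:
        raise ValueError(f"Expected shape {expected_shape}, got {arr.shape}")

    # Reject lossy conversions such as float -> int or str -> int
    if not np.can_cast(arr.dtype, expected_dtype, casting="safe"):
        raise ValueError(f"Expected dtype {expected_dtype}, got {arr.dtype}")

    return arr.astype(expected_dtype, copy=False)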

# Usage
arr = create_validated_array(
    [[1, 2, 3], [4, 5, 6]], 
    expected_shape=(2, 3), 
    expected_dtype=np.int64
)

The np.array() function is the foundation of NumPy operations. Master dtype specification, understand copy semantics, and avoid object arrays for production code. Pre-allocation beats dynamic growth every time.
