NumPy - Create Array (np.array) with Examples
Key Insights
- np.array() creates NumPy arrays from Python sequences with automatic dtype inference, while np.asarray() avoids copying data when the input is already an array of the requested dtype.
- An explicit dtype keeps the array's type from shifting with the input data and avoids wasting memory. Set dtype whenever an operation depends on a specific precision.
- Multi-dimensional arrays need inner sequences of equal length. Since NumPy 1.24, ragged input raises ValueError unless dtype=object is passed, and object arrays give up NumPy's performance benefits.
Creating Arrays from Python Lists
The np.array() function converts Python sequences into NumPy arrays. The simplest case takes a flat list:
import numpy as np
# Basic array creation
arr = np.array([1, 2, 3, 4, 5])
print(arr) # [1 2 3 4 5]
print(type(arr)) # <class 'numpy.ndarray'>
print(arr.dtype) # int64 on most platforms (int32 on Windows with NumPy 1.x)
NumPy infers the data type from input elements. Mixed numeric types promote to the most general type:
# Integer and float mix
arr = np.array([1, 2, 3.5, 4])
print(arr) # [1.  2.  3.5 4. ]
print(arr.dtype) # float64
# All integers stay integer
arr = np.array([1, 2, 3, 4])
print(arr.dtype) # int64
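To check which dtype NumPy will settle on before building an array, `np.result_type()` applies the same promotion rules directly to dtypes. A small sketch; the particular dtype pairs are only illustrations:

```python
import numpy as np

# np.result_type applies NumPy's promotion rules to dtypes without building an array
print(np.result_type(np.int64, np.float64))     # float64
print(np.result_type(np.int32, np.float32))     # float64 (float32 cannot hold every int32 exactly)
print(np.result_type(np.uint8, np.int8))        # int16 (smallest type covering both ranges)
print(np.result_type(np.int64, np.complex128))  # complex128

# The same promotion happens when arrays of different dtypes are combined
ints = np.array([1, 2], dtype=np.int32)
floats = np.array([0.5], dtype=np.float32)
print(np.concatenate([ints, floats]).dtype)     # float64
```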
Explicit Data Type Specification
Specify dtype explicitly in production code. An inferred dtype follows the data, so a single float in otherwise integer input silently turns the whole array into float64:
# Explicit integer type
arr = np.array([1, 2, 3, 4], dtype=np.int32)
print(arr.dtype) # int32
# Explicit float type
arr = np.array([1, 2, 3, 4], dtype=np.float64)
print(arr) # [1. 2. 3. 4.]
# Unsigned integers for non-negative data
arr = np.array([1, 2, 3, 4], dtype=np.uint8)
print(arr.dtype) # uint8
# Complex numbers
arr = np.array([1, 2, 3], dtype=np.complex128)
print(arr) # [1.+0.j 2.+0.j 3.+0.j]
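An explicit dtype is a cast, not a check: floats passed with an integer dtype are truncated toward zero without any warning. One way to guard against that, sketched below, is to let NumPy infer the dtype first and test the conversion with `np.can_cast`. The helper name `strict_array` is hypothetical.

```python
import numpy as np

# Floats cast to an integer dtype are truncated toward zero, silently
truncated = np.array([1.7, -2.9], dtype=np.int64)
print(truncated)  # [ 1 -2]

def strict_array(data, dtype):
    """Hypothetical helper: refuse casts that are not guaranteed lossless."""
    inferred = np.asarray(data)  # dtype inferred from the data itself
    if not np.can_cast(inferred.dtype, dtype, casting="safe"):
        raise TypeError(f"unsafe cast from {inferred.dtype} to {np.dtype(dtype)}")
    return inferred.astype(dtype)

print(strict_array([1, 2, 3], np.float64))  # [1. 2. 3.]
try:
    strict_array([1.7, -2.9], np.int64)
except TypeError as exc:
    print(exc)  # unsafe cast from float64 to int64
```

The "safe" casting rule is deliberately strict: it also rejects int64 to int16 even when every value would fit, so a range check is the alternative when narrowing is intended.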
Type specification prevents memory waste. An array of small integers doesn’t need 64 bits per element:
# Memory comparison
arr_default = np.array(range(1000))
arr_optimized = np.array(range(1000), dtype=np.int16)
print(arr_default.nbytes) # 8000 bytes with a default int64 dtype
print(arr_optimized.nbytes) # 2000 bytes
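The byte count is simply element count times element size, so `nbytes` can be predicted from `size` and `itemsize`. To pick the narrowest integer type for a known value range, `np.iinfo` reports each type's limits. The helper below is a hypothetical sketch of that idea, and the candidate list of signed types is an assumption:

```python
import numpy as np

arr = np.array(range(1000), dtype=np.int16)
# nbytes is always size * itemsize
print(arr.size, arr.itemsize, arr.nbytes)  # 1000 2 2000

def smallest_int_dtype(lo, hi):
    """Hypothetical helper: narrowest signed integer dtype covering [lo, hi]."""
    for dt in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(dt)  # min/max representable values for this type
        if info.min <= lo and hi <= info.max:
            return np.dtype(dt)
    raise OverflowError(f"range [{lo}, {hi}] does not fit in int64")

print(smallest_int_dtype(0, 999))     # int16
print(smallest_int_dtype(-100, 100))  # int8
```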
Multi-Dimensional Arrays
Nested sequences create multi-dimensional arrays. Each nesting level adds a dimension:
# 2D array (matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
# [[1 2 3]
# [4 5 6]]
print(arr_2d.shape) # (2, 3)
print(arr_2d.ndim) # 2
# 3D array
arr_3d = np.array([
[[1, 2], [3, 4]],
[[5, 6], [7, 8]]
])
print(arr_3d.shape) # (2, 2, 2)
print(arr_3d.ndim) # 3
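The attributes are tied together: `ndim` equals `len(shape)`, and `size` is the product of the shape. A flat list can also be turned into a multi-dimensional array with `reshape` instead of writing nested lists by hand:

```python
import math
import numpy as np

arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(arr_3d.ndim == len(arr_3d.shape))        # True
print(arr_3d.size == math.prod(arr_3d.shape))  # True

# Same data built from a flat list; -1 lets NumPy infer that axis length
flat = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(np.array_equal(flat.reshape(2, 2, -1), arr_3d))  # True
```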
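Inconsistent inner lengths do not produce a regular array. Since NumPy 1.24, np.array() raises ValueError for ragged input; older releases fell back to an object array. Forcing dtype=object still works, but the result is a performance disaster:
# Jagged array - BAD
# np.array([[1, 2, 3], [4, 5]]) raises ValueError on NumPy >= 1.24
jagged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(jagged.dtype) # object
print(jagged.shape) # (2,)
print(jagged[0]) # [1, 2, 3] - still a list!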
# Proper rectangular array - GOOD
proper = np.array([[1, 2, 3], [4, 5, 6]])
print(proper.dtype) # int64
print(proper.shape) # (2, 3)
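When the input really is ragged, one common workaround is to pad every row to the same length so the result is a regular numeric array. A minimal sketch; the helper name `pad_ragged` and the fill value `0` are assumptions, and padding only makes sense when the fill value cannot be confused with real data:

```python
import numpy as np

def pad_ragged(rows, fill=0, dtype=np.int64):
    """Hypothetical helper: pad variable-length rows into a 2D array."""
    width = max((len(r) for r in rows), default=0)
    out = np.full((len(rows), width), fill, dtype=dtype)  # start with all fill values
    for i, row in enumerate(rows):
        out[i, :len(row)] = row  # copy the real values into the left part of the row
    return out

padded = pad_ragged([[1, 2, 3], [4, 5]])
print(padded)
# [[1 2 3]
#  [4 5 0]]
print(padded.dtype, padded.shape)  # int64 (2, 3)
```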
Array from Tuples and Other Sequences
np.array() accepts any sequence type, not just lists:
# From tuple
arr = np.array((1, 2, 3, 4))
print(arr) # [1 2 3 4]
# From range
arr = np.array(range(5))
print(arr) # [0 1 2 3 4]
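# From generator: np.array() does NOT consume it
gen_arr = np.array(x**2 for x in range(5))
print(gen_arr.dtype, gen_arr.shape) # object () - a 0-d array holding the generator
# Materialize the generator first (or use np.fromiter)
arr = np.array(list(x**2 for x in range(5)))
print(arr) # [ 0  1  4  9 16]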
Copy vs View Behavior
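By default (copy=True), np.array() copies its input, even when the input is already an array. np.asarray() skips the copy when the input is already an ndarray of the requested dtype: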
# np.array always copies
original = np.array([1, 2, 3, 4])
copy = np.array(original)
copy[0] = 999
print(original[0]) # 1 (unchanged)
# np.asarray returns the input array itself when no conversion is needed
original = np.array([1, 2, 3, 4])
view = np.asarray(original)
view[0] = 999
print(original[0]) # 999 (changed!)
# np.asarray copies when converting from list
list_data = [1, 2, 3, 4]
arr = np.asarray(list_data)
list_data[0] = 999
print(arr[0]) # 1 (independent)
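Rather than mutating data to find out whether two arrays share memory, `np.shares_memory()` answers the question directly. The sketch below also shows that `np.asarray()` still copies when a dtype conversion is requested, because the existing buffer cannot be reused:

```python
import numpy as np

original = np.array([1, 2, 3, 4])

print(np.shares_memory(original, np.array(original)))  # False: np.array copied
print(np.asarray(original) is original)                # True: same object returned
converted = np.asarray(original, dtype=np.float64)
print(np.shares_memory(original, converted))           # False: dtype change forces a copy
```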
String and Boolean Arrays
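String arrays use a fixed-width dtype sized to the longest string at creation. Strings that exceed the width, whether from a narrower explicit dtype or from a later assignment, are truncated silently: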
# String array
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype) # <U6 (Unicode, 6 characters max)
# Truncation happens silently
arr = np.array(['short', 'this is a very long string'])
print(arr.dtype) # <U26
arr2 = np.array(['short', 'this is a very long string'], dtype='U5')
print(arr2) # ['short' 'this ']
# Boolean arrays
arr = np.array([True, False, True, True])
print(arr.dtype) # bool
print(arr.nbytes) # 4 (1 byte per element)
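Truncation also applies after creation: the width is fixed when the array is built, so assigning a longer string later cuts it to fit. If longer values are expected, the dtype can be widened with `astype` first (the width `U10` below is an arbitrary choice). Boolean arrays, for their part, work directly as counts and masks:

```python
import numpy as np

fruits = np.array(['apple', 'banana', 'cherry'])  # dtype <U6
fruits[0] = 'watermelon'
print(fruits[0])  # waterm (cut to 6 characters)

wider = fruits.astype('U10')  # new array with room for 10 characters
wider[0] = 'watermelon'
print(wider[0])   # watermelon

# Boolean arrays sum as 0/1, so sum() counts the True values
flags = np.array([True, False, True, True])
print(flags.sum())  # 3
```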
Structured Arrays with Mixed Types
Structured arrays store heterogeneous data with named fields:
# Define structure
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
# Create structured array
data = np.array([
('Alice', 25, 55.5),
('Bob', 30, 75.2),
('Charlie', 35, 68.8)
], dtype=dt)
print(data['name']) # ['Alice' 'Bob' 'Charlie']
print(data['age']) # [25 30 35]
print(data[0]) # ('Alice', 25, 55.5)
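Fields of a structured array behave like ordinary arrays, so they can drive boolean masks and sorting. A short sketch reusing the same `dt` layout and records:

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
data = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.2),
    ('Charlie', 35, 68.8),
], dtype=dt)

# Select whole records with a mask built from one field
older = data[data['age'] > 28]
print(older['name'])  # ['Bob' 'Charlie']

# Sort records by a named field
by_weight = np.sort(data, order='weight')
print(by_weight['name'])  # ['Alice' 'Charlie' 'Bob']
```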
Common Pitfalls
Scalar Input
Passing a scalar creates a 0-dimensional array:
# 0-dimensional array
arr = np.array(42)
print(arr.shape) # ()
print(arr.ndim) # 0
print(arr[()]) # 42
# To create 1D array with single element
arr = np.array([42])
print(arr.shape) # (1,)
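To get a plain Python number back from a 0-dimensional array, `.item()` is the usual route. In the other direction, `np.atleast_1d()` promotes scalars and 0-d arrays to 1D while leaving arrays that already have a dimension alone:

```python
import numpy as np

zero_d = np.array(42)
value = zero_d.item()  # plain Python int, not a NumPy scalar
print(type(value), value)  # <class 'int'> 42

print(np.atleast_1d(42).shape)         # (1,)
print(np.atleast_1d(zero_d).shape)     # (1,)
print(np.atleast_1d([1, 2, 3]).shape)  # (3,)
```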
ndmin Parameter
Force minimum dimensions with ndmin:
# Force 2D array from 1D input
arr = np.array([1, 2, 3], ndmin=2)
print(arr.shape) # (1, 3)
print(arr) # [[1 2 3]]
# Force 3D
arr = np.array([1, 2, 3], ndmin=3)
print(arr.shape) # (1, 1, 3)
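`ndmin` always prepends the new axes. To put a new axis somewhere else, for example to turn a 1D array into a column vector, indexing with `np.newaxis` or calling `reshape` is the usual alternative:

```python
import numpy as np

arr = np.array([1, 2, 3])
row = arr[np.newaxis, :]  # same shape as np.array(arr, ndmin=2)
col = arr[:, np.newaxis]  # new axis appended instead of prepended
print(row.shape, col.shape)  # (1, 3) (3, 1)
print(np.array_equal(col, arr.reshape(-1, 1)))  # True
```

Unlike `np.array(arr, ndmin=2)`, which copies by default, the `np.newaxis` indexing returns views of `arr`.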
Copy Flag
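The copy parameter is not deprecated, but its meaning changed in NumPy 2.0: copy=True (the default) always copies, copy=None copies only when needed, and copy=False never copies and raises ValueError if a copy cannot be avoided. np.asarray() copies only when needed in both 1.x and 2.x, which makes it the portable no-copy option:
# Portable approach
original = [1, 2, 3, 4]
arr = np.asarray(original) # Copies from list
original_arr = np.array([1, 2, 3, 4])
same = np.asarray(original_arr) # No copy: returns original_arr itself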
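The effect of the copy argument can be checked directly with `np.shares_memory()`. The last case depends on the installed version: NumPy 2.x raises ValueError for copy=False when a copy is unavoidable, while NumPy 1.x copies quietly, so this sketch catches the error instead of assuming either behavior.

```python
import numpy as np

original = np.array([1, 2, 3, 4])

# copy=True (the default) always allocates a new buffer
fresh = np.array(original, copy=True)
print(np.shares_memory(fresh, original))   # False

# copy=False on an ndarray that needs no conversion reuses the buffer
reused = np.array(original, copy=False)
print(np.shares_memory(reused, original))  # True

# A list always has to be converted, so copy=False cannot be honored
try:
    np.array([1, 2, 3], copy=False)
    print("copied silently (NumPy 1.x behavior)")
except ValueError:
    print("ValueError (NumPy 2.x behavior)")
```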
Performance Considerations
Pre-allocate arrays when the size is known. np.append() returns a new array on every call, so growing an array in a loop copies all existing elements each time and the total work grows quadratically:
import time
# BAD: Growing array in loop (each np.append copies the whole array)
start = time.perf_counter()
arr = np.array([], dtype=np.int64)
for i in range(10000):
    arr = np.append(arr, i)
print(f"Append time: {time.perf_counter() - start:.3f}s")
# GOOD: Pre-allocate once, then fill in place
start = time.perf_counter()
arr = np.empty(10000, dtype=np.int64)
for i in range(10000):
    arr[i] = i
print(f"Pre-allocate time: {time.perf_counter() - start:.3f}s")
# BEST: Use built-in constructors (no Python-level loop at all)
start = time.perf_counter()
arr = np.arange(10000)
print(f"Built-in time: {time.perf_counter() - start:.3f}s")
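When the final size is not known in advance, a common middle ground is to collect values in a Python list, whose appends are amortized O(1), and convert once at the end. A sketch with an arbitrary stopping rule standing in for real data:

```python
import numpy as np

values = []
n = 1
while n < 10_000:  # stand-in for "read until the data runs out"
    values.append(n)
    n *= 3
arr = np.array(values, dtype=np.int64)  # one conversion, one allocation
print(arr.tolist())  # [1, 3, 9, 27, 81, 243, 729, 2187, 6561]
```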
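np.fromiter() builds an array straight from a generator without an intermediate list. dtype is required; passing count when the length is known lets NumPy allocate the output once: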
# Efficient generator consumption
arr = np.fromiter((x**2 for x in range(1000)), dtype=np.int64, count=1000)
print(arr.shape) # (1000,)
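The count argument has two effects worth knowing: a smaller count stops reading early, and a count larger than the iterator can supply raises ValueError. Leaving it out also works, at the cost of NumPy resizing its output as it reads. The `squares` generator function below is just an illustration:

```python
import numpy as np

def squares(n):
    return (x**2 for x in range(n))

first_three = np.fromiter(squares(10), dtype=np.int64, count=3)
print(first_three)  # [0 1 4]

unknown_length = np.fromiter(squares(5), dtype=np.int64)  # count omitted
print(unknown_length)  # [ 0  1  4  9 16]

try:
    np.fromiter(squares(5), dtype=np.int64, count=10)
except ValueError as exc:
    print("ValueError:", exc)  # iterator ran out before 10 items
```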
Verification and Validation
Always validate array properties after creation:
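def create_validated_array(data, expected_shape, expected_dtype):
    # Infer the dtype first: np.array(data, dtype=...) would cast silently
    arr = np.asarray(data)
    if arr.shape != tuple(expected_shape):
        raise ValueError(f"Expected shape {expected_shape}, got {arr.shape}")
    # Reject casts that are not guaranteed lossless (e.g. float64 -> int64)
    if not np.can_cast(arr.dtype, expected_dtype, casting="safe"):
        raise TypeError(f"Cannot safely convert {arr.dtype} to {np.dtype(expected_dtype)}")
    return arr.astype(expected_dtype, copy=False)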
# Usage
arr = create_validated_array(
[[1, 2, 3], [4, 5, 6]],
expected_shape=(2, 3),
expected_dtype=np.int64
)
The np.array() function is the foundation of NumPy operations. Master dtype specification, understand copy semantics, and avoid object arrays in production code. When the size is known, pre-allocation (or a vectorized constructor such as np.arange) beats growing an array element by element.