NumPy - Create Array (np.array) with Examples

Key Insights

np.array() creates NumPy arrays from Python sequences with automatic dtype inference, while np.asarray() avoids copying data when the input is already an array of the requested dtype
Explicit dtype specification prevents silent type conversions and wasted memory; set dtype whenever an operation needs a specific precision
Multi-dimensional arrays require consistent inner sequence lengths; jagged input raises ValueError on NumPy 1.24+ unless dtype=object is requested, and object arrays lose NumPy's performance advantages

Creating Arrays from Python Lists

The np.array() function converts Python sequences into NumPy arrays. The simplest case takes a flat list:

import numpy as np

# Basic array creation
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # [1 2 3 4 5]
print(type(arr))  # <class 'numpy.ndarray'>
print(arr.dtype)  # int64 (int32 on Windows with NumPy < 2.0)

NumPy infers the data type from input elements. Mixed numeric types promote to the most general type:

# Integer and float mix
arr = np.array([1, 2, 3.5, 4])
print(arr)  # [1.  2.  3.5 4. ]
print(arr.dtype)  # float64

# All integers stay integer
arr = np.array([1, 2, 3, 4])
print(arr.dtype)  # int64
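
The promotion rules can be checked directly. The sketch below uses np.result_type, which reports the dtype NumPy would choose for a combination of inputs; the variable names are illustrative only.

```python
import numpy as np

# np.result_type reports the dtype that a combination of dtypes promotes to
promoted = np.result_type(np.int64, np.float64)
print(promoted)  # float64

# A single complex value promotes the whole array to complex
mixed_complex = np.array([1, 2.5, 3 + 0j])
print(mixed_complex.dtype)  # complex128

# A single string turns every element into a string
mixed_text = np.array([1, 2, "three"])
print(mixed_text.dtype.kind)  # U (Unicode string)
print(mixed_text[0] == "1")   # True: the integer 1 was stored as text
```

The last case is the one most likely to cause trouble: one stray string in numeric input produces an array on which arithmetic no longer works.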

Explicit Data Type Specification

Specify dtype explicitly in production code. Inference depends on the input values, so a single float or string in otherwise integer data silently changes the dtype of the whole array:

# Explicit integer type
arr = np.array([1, 2, 3, 4], dtype=np.int32)
print(arr.dtype)  # int32

# Explicit float type
arr = np.array([1, 2, 3, 4], dtype=np.float64)
print(arr)  # [1. 2. 3. 4.]

# Unsigned integers for non-negative data
arr = np.array([1, 2, 3, 4], dtype=np.uint8)
print(arr.dtype)  # uint8

# Complex numbers
arr = np.array([1, 2, 3], dtype=np.complex128)
print(arr)  # [1.+0.j 2.+0.j 3.+0.j]

Type specification prevents memory waste. An array of small integers doesn’t need 64 bits per element:

# Memory comparison
arr_default = np.array(range(1000))
arr_optimized = np.array(range(1000), dtype=np.int16)

print(arr_default.nbytes)  # 8000 bytes (with the default int64)
print(arr_optimized.nbytes)  # 2000 bytes
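
A smaller dtype is only safe when every value fits in its range. The following is a minimal sketch of a range check using np.iinfo; the helper name fits_in is hypothetical, and it assumes non-empty integer input.

```python
import numpy as np

def fits_in(values, dtype):
    """Return True if every value lies within the range of an integer dtype."""
    info = np.iinfo(dtype)      # min/max representable values for the dtype
    arr = np.asarray(values)    # assumes non-empty integer data
    return bool(arr.min() >= info.min and arr.max() <= info.max)

data = range(1000)
print(fits_in(data, np.int16))  # True: int16 holds -32768..32767
print(fits_in(data, np.uint8))  # False: uint8 holds only 0..255
```

Running a check like this before downcasting avoids overflow surprises when the data later grows beyond the chosen range.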

Multi-Dimensional Arrays

Nested sequences create multi-dimensional arrays. Each nesting level adds a dimension:

# 2D array (matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)
# [[1 2 3]
#  [4 5 6]]
print(arr_2d.shape)  # (2, 3)
print(arr_2d.ndim)  # 2

# 3D array
arr_3d = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])
print(arr_3d.shape)  # (2, 2, 2)
print(arr_3d.ndim)  # 3

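Inconsistent inner lengths cannot form a rectangular array. NumPy 1.24 and later raise ValueError for ragged input unless dtype=object is given; older versions emitted a deprecation warning (or, before 1.19, silently built an object array). Either way the result is an array of Python lists with none of NumPy's vectorized speed:

# Jagged array - BAD (dtype=object is required on NumPy >= 1.24)
jagged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(jagged.dtype)  # object
print(jagged.shape)  # (2,)
print(jagged[0])  # [1, 2, 3] - still a list!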

# Proper rectangular array - GOOD
proper = np.array([[1, 2, 3], [4, 5, 6]])
print(proper.dtype)  # int64
print(proper.shape)  # (2, 3)
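
When the input really is ragged, one common workaround is to pad the short rows to a common width. This is a sketch under stated assumptions: the helper name pad_rows and the fill value 0 are illustrative choices, and whether padding is appropriate depends on what the data means.

```python
import numpy as np

def pad_rows(rows, fill=0):
    """Pad ragged integer rows with `fill` so they form a rectangular array."""
    width = max(len(row) for row in rows)
    out = np.full((len(rows), width), fill, dtype=np.int64)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row   # copy each row into the left part of its slot
    return out

padded = pad_rows([[1, 2, 3], [4, 5]])
print(padded.shape)     # (2, 3)
print(padded.tolist())  # [[1, 2, 3], [4, 5, 0]]
```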

Array from Tuples and Other Sequences

np.array() accepts any sequence type, not just lists:

# From tuple
arr = np.array((1, 2, 3, 4))
print(arr)  # [1 2 3 4]

# From range
arr = np.array(range(5))
print(arr)  # [0 1 2 3 4]

# From generator: np.array() does not iterate generators, so build a list first
arr = np.array(list(x**2 for x in range(5)))
print(arr)  # [ 0  1  4  9 16]
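
Generators deserve extra care because they are iterables, not sequences. The sketch below shows what np.array() does with a bare generator and two ways to get the intended result; the variable names are illustrative.

```python
import numpy as np

# A bare generator is NOT consumed: it is wrapped as a single Python object
wrapped = np.array(x**2 for x in range(5))
print(wrapped.shape)  # ()
print(wrapped.dtype)  # object

# Option 1: materialize into a list first
from_list = np.array([x**2 for x in range(5)])

# Option 2: let np.fromiter consume the generator with an explicit dtype
from_iter = np.fromiter((x**2 for x in range(5)), dtype=np.int64)
print(from_iter)  # [ 0  1  4  9 16]
```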

Copy vs View Behavior

np.array() copies its input by default (copy=True). Use np.asarray() to avoid unnecessary copies:

# np.array copies by default
original = np.array([1, 2, 3, 4])
copy = np.array(original)
copy[0] = 999
print(original[0])  # 1 (unchanged)

# np.asarray returns the same array object when no conversion is needed
original = np.array([1, 2, 3, 4])
alias = np.asarray(original)
alias[0] = 999
print(original[0])  # 999 (changed!)

# np.asarray copies when converting from list
list_data = [1, 2, 3, 4]
arr = np.asarray(list_data)
list_data[0] = 999
print(arr[0])  # 1 (independent)
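
Whether two arrays share a buffer can be checked rather than guessed. The sketch below uses np.shares_memory to compare the cases described above; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

same = np.asarray(a)                       # no conversion needed
converted = np.asarray(a, dtype=np.float64)  # dtype change forces a copy
copied = np.array(a)                       # default copy=True
sliced = a[1:3]                            # basic slicing returns a view

print(same is a)                        # True: the very same object
print(np.shares_memory(a, converted))   # False
print(np.shares_memory(a, copied))      # False
print(np.shares_memory(a, sliced))      # True
```

Note that np.asarray() stops being copy-free as soon as a dtype conversion is required.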

String and Boolean Arrays

String arrays have a fixed-width dtype. By default the width matches the longest input string; strings longer than an explicitly requested width are truncated without warning:

# String array
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)  # <U6 (Unicode, 6 characters max)

# Inferred width fits the longest string
arr = np.array(['short', 'this is a very long string'])
print(arr.dtype)  # <U26

# An explicit narrower width truncates silently
arr2 = np.array(['short', 'this is a very long string'], dtype='U5')
print(arr2)  # ['short' 'this ']

# Boolean arrays
arr = np.array([True, False, True, True])
print(arr.dtype)  # bool
print(arr.nbytes)  # 4 (1 byte per element)
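
Truncation also applies to later assignments, which is where it tends to go unnoticed. A short sketch, with illustrative variable names and an arbitrarily chosen width of 20:

```python
import numpy as np

fruits = np.array(["apple", "banana"])  # inferred dtype <U6
fruits[0] = "pineapple"                 # 9 characters into a 6-character slot
print(fruits[0])  # pineap

# Reserving a wider dtype up front avoids the loss
wide = np.array(["apple", "banana"], dtype="U20")
wide[0] = "pineapple"
print(wide[0])  # pineapple
```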

Structured Arrays with Mixed Types

Structured arrays store heterogeneous data with named fields:

# Define structure
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])

# Create structured array
data = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.2),
    ('Charlie', 35, 68.8)
], dtype=dt)

print(data['name'])  # ['Alice' 'Bob' 'Charlie']
print(data['age'])  # [25 30 35]
print(data[0])  # ('Alice', 25, 55.5)
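
Named fields combine with ordinary NumPy operations. The sketch below rebuilds the same records (under the name people, chosen here for illustration) and filters, sorts, and aggregates by field.

```python
import numpy as np

dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f4")])
people = np.array(
    [("Alice", 25, 55.5), ("Bob", 30, 75.2), ("Charlie", 35, 68.8)],
    dtype=dt,
)

# Boolean mask built from one field selects whole records
over_28 = people[people["age"] > 28]
print(over_28["name"])  # ['Bob' 'Charlie']

# np.sort accepts a field name through the order argument
by_weight = np.sort(people, order="weight")
print(by_weight["name"])  # ['Alice' 'Charlie' 'Bob']

# A single field behaves like a regular numeric array
print(people["age"].mean())  # 30.0
```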

Common Pitfalls

Scalar Input

Passing a scalar creates a 0-dimensional array:

# 0-dimensional array
arr = np.array(42)
print(arr.shape)  # ()
print(arr.ndim)  # 0
print(arr[()])  # 42

# To create 1D array with single element
arr = np.array([42])
print(arr.shape)  # (1,)
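
When a function may receive either a scalar or a sequence, np.atleast_1d normalizes both to at least one dimension. A brief sketch with illustrative names:

```python
import numpy as np

scalar_arr = np.array(42)

# .item() extracts the value of a 0-dimensional array as a Python scalar
value = scalar_arr.item()
print(value, type(value))  # 42 <class 'int'>

# np.atleast_1d promotes 0-d input and leaves 1-D input unchanged
as_1d = np.atleast_1d(scalar_arr)
print(as_1d.shape)  # (1,)
print(np.atleast_1d([1, 2, 3]).shape)  # (3,)
```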

ndmin Parameter

Force minimum dimensions with ndmin:

# Force 2D array from 1D input
arr = np.array([1, 2, 3], ndmin=2)
print(arr.shape)  # (1, 3)
print(arr)  # [[1 2 3]]

# Force 3D
arr = np.array([1, 2, 3], ndmin=3)
print(arr.shape)  # (1, 1, 3)
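
ndmin always adds the new axes at the front, so it produces row vectors but never column vectors. For a column, reshape or index with np.newaxis instead, as in this short sketch:

```python
import numpy as np

row = np.array([1, 2, 3], ndmin=2)
print(row.shape)  # (1, 3): the new axis is prepended

# Column vector: add the new axis at the end instead
col = np.array([1, 2, 3]).reshape(-1, 1)
print(col.shape)  # (3, 1)

# Equivalent form using np.newaxis
col_alt = np.array([1, 2, 3])[:, np.newaxis]
print(col_alt.shape)  # (3, 1)
```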

Copy Flag

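The copy parameter is not deprecated, but NumPy 2.0 changed its meaning: copy=False now raises ValueError when a copy cannot be avoided, and copy=None copies only when necessary. For copy-only-if-needed behavior that works across versions, use np.asarray():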

# Modern approach
original = [1, 2, 3, 4]
arr = np.asarray(original)  # Copies from list

original_arr = np.array([1, 2, 3, 4])
view = np.asarray(original_arr)  # No copy

Performance Considerations

Pre-allocate arrays when the size is known. np.append() allocates a new array and copies every element on each call, so growing an array in a loop takes quadratic time:

import time

# BAD: Growing array in loop
start = time.time()
arr = np.array([])
for i in range(10000):
    arr = np.append(arr, i)
print(f"Append time: {time.time() - start:.3f}s")

# GOOD: Pre-allocate
start = time.time()
arr = np.empty(10000, dtype=np.int64)
for i in range(10000):
    arr[i] = i
print(f"Pre-allocate time: {time.time() - start:.3f}s")

# BEST: Use built-in constructors
start = time.time()
arr = np.arange(10000)
print(f"Built-in time: {time.time() - start:.3f}s")
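
When the final size is not known in advance, pre-allocation is not available. A common alternative, sketched here with an arbitrary filter condition, is to collect values in a Python list and convert once at the end, since list.append is amortized constant time.

```python
import numpy as np

# Collect an unknown number of values in a list, then convert once
multiples = []
for i in range(10000):
    if i % 3 == 0:          # illustrative condition; the count is not known up front
        multiples.append(i)
arr = np.array(multiples, dtype=np.int64)
print(arr.shape)  # (3334,)

# Where a vectorized form exists, it avoids the Python loop entirely
vectorized = np.arange(0, 10000, 3, dtype=np.int64)
print(np.array_equal(arr, vectorized))  # True
```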

Use np.fromiter() for generators when you know the count and dtype:

# Efficient generator consumption
arr = np.fromiter((x**2 for x in range(1000)), dtype=np.int64, count=1000)
print(arr.shape)  # (1000,)

Verification and Validation

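Validate array properties after creation. Passing dtype to np.array() casts the data, so a dtype check made after the cast can never fail; check the inferred dtype before casting instead:

def create_validated_array(data, expected_shape, expected_dtype):
    arr = np.asarray(data)  # let NumPy infer the dtype first

    if arr.shape != expected_shape:
        raise ValueError(f"Expected shape {expected_shape}, got {arr.shape}")

    # Reject inputs that would need a lossy cast (e.g. float -> int)
    if not np.can_cast(arr.dtype, expected_dtype, casting="safe"):
        raise ValueError(f"Expected dtype {expected_dtype}, got {arr.dtype}")

    return arr.astype(expected_dtype, copy=False)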

# Usage
arr = create_validated_array(
    [[1, 2, 3], [4, 5, 6]], 
    expected_shape=(2, 3), 
    expected_dtype=np.int64
)
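
Shape and dtype checks do not catch bad values. For floating-point data, a complementary check for NaN and infinity is often useful; the helper name require_finite below is hypothetical and the sketch assumes the data should be float64.

```python
import numpy as np

def require_finite(data):
    """Convert to float64 and raise if any element is NaN or infinite."""
    arr = np.asarray(data, dtype=np.float64)
    finite = np.isfinite(arr)
    if not finite.all():
        bad = int((~finite).sum())   # number of offending elements
        raise ValueError(f"{bad} non-finite value(s) found")
    return arr

clean = require_finite([1.0, 2.5, 3.0])
print(clean.dtype)  # float64

try:
    require_finite([1.0, float("nan"), float("inf")])
except ValueError as exc:
    message = str(exc)
    print(message)  # 2 non-finite value(s) found
```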

The np.array() function is the foundation of NumPy operations. Master dtype specification, understand copy semantics, and avoid object arrays in production code. When the size is known, pre-allocate or use a built-in constructor rather than growing an array in a loop.