NumPy - Complete Tutorial for Beginners

Key Insights

  • NumPy array operations are often 10-100x faster than equivalent loops over native Python lists, thanks to vectorization and contiguous memory allocation; the exact speedup depends on the operation, the array size, and the hardware
  • Broadcasting rules allow arithmetic operations between arrays of different shapes without explicit loops, reducing code complexity and execution time
  • Universal functions (ufuncs) apply element-wise operations across entire arrays using optimized C code under the hood

Why NumPy Matters

NumPy is the foundation of Python’s scientific computing ecosystem. While Python lists are flexible, they’re slow for numerical operations because they store pointers to objects scattered across memory. NumPy arrays store homogeneous data in contiguous memory blocks, enabling CPU-level optimizations and vectorized operations.

import numpy as np
import time

# Performance comparison
size = 1000000

# Python list approach
# time.perf_counter() is a monotonic, high-resolution clock intended for timing
python_list = list(range(size))
start = time.perf_counter()
result_list = [x * 2 for x in python_list]
python_time = time.perf_counter() - start

# NumPy approach
numpy_array = np.arange(size)
start = time.perf_counter()
result_array = numpy_array * 2
numpy_time = time.perf_counter() - start

print(f"Python list: {python_time:.4f}s")
print(f"NumPy array: {numpy_time:.4f}s")
print(f"Speedup: {python_time/numpy_time:.1f}x")
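
The memory-layout argument above can also be checked directly. The following is a rough sketch rather than a precise measurement: it estimates the footprint of a list by adding the size of the list container to the size of each integer object it points to, and compares that with the raw data buffer of an array. The names `py_list`, `np_arr`, `list_bytes`, and `array_bytes` are illustrative, and the list figure is only an estimate because CPython caches and shares small integers.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list holds one pointer per element; every int is a separate Python object.
# This sum is an estimate: CPython shares small cached integers between users.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values back to back in a single buffer.
array_bytes = np_arr.nbytes  # n * itemsize

print(f"list (container + int objects): {list_bytes} bytes")
print(f"ndarray data buffer:            {array_bytes} bytes")
print(f"C-contiguous: {np_arr.flags['C_CONTIGUOUS']}, strides: {np_arr.strides}")
```

The stride of 8 bytes means each element sits immediately after the previous one, which is what lets NumPy hand the whole buffer to compiled loops.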

Creating Arrays

NumPy offers multiple methods for array creation, each optimized for different use cases.

# From Python sequences
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# Specify data type explicitly
arr_float = np.array([1, 2, 3], dtype=np.float64)
arr_int = np.array([1.5, 2.7, 3.9], dtype=np.int32)  # Truncates toward zero: [1, 2, 3]

# Built-in creation functions
zeros = np.zeros((3, 4))           # 3x4 array of zeros
ones = np.ones((2, 3, 4))          # 2x3x4 array of ones
empty = np.empty((2, 2))           # Uninitialized memory (fast); contents are arbitrary until assigned
full = np.full((3, 3), 7)          # Fill with specific value

# Ranges and sequences
arange = np.arange(0, 10, 2)       # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5)    # 5 evenly spaced values from 0 to 1

# Random arrays (legacy global-state functions; still supported)
random_uniform = np.random.random((3, 3))       # Uniform floats in [0, 1)
random_normal = np.random.randn(3, 3)           # Standard normal distribution
random_int = np.random.randint(0, 100, (3, 3))  # Integers in [0, 100)

# Identity and diagonal matrices
identity = np.eye(4)
diagonal = np.diag([1, 2, 3, 4])
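
For new code, NumPy's documentation recommends the `Generator` interface obtained from `np.random.default_rng` over the module-level random functions. The sketch below shows rough equivalents of the three random arrays above; the seed value 42 is an arbitrary choice made here so the output is reproducible.

```python
import numpy as np

# Seed chosen arbitrarily for reproducibility (an assumption of this sketch)
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))                   # Uniform floats in [0, 1)
normal = rng.standard_normal((3, 3))           # Standard normal distribution
integers = rng.integers(0, 100, size=(3, 3))   # Integers in [0, 100)

# A second generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(seed=42)
same_stream = np.array_equal(uniform, rng_again.random((3, 3)))
print(same_stream)  # True
```

Passing a generator object around explicitly, instead of relying on global state, makes it easier to keep experiments reproducible.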

Array Attributes and Reshaping

Understanding array structure is critical for efficient operations.

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(arr.shape)      # (3, 4) - dimensions
print(arr.ndim)       # 2 - number of dimensions
print(arr.size)       # 12 - total elements
print(arr.dtype)      # int64 on most platforms (int32 on Windows with NumPy < 2.0)
print(arr.itemsize)   # 8 - bytes per element (for int64)
print(arr.nbytes)     # 96 - total bytes (12 elements x 8 bytes)

# Reshaping (must preserve total size)
reshaped = arr.reshape(4, 3)
flattened = arr.flatten()          # Returns copy
raveled = arr.ravel()              # Returns view when possible

# Transpose
transposed = arr.T
transposed_explicit = np.transpose(arr)

# Adding/removing dimensions
expanded = arr[np.newaxis, :]      # Add dimension at start: shape (1, 3, 4)
squeezed = np.squeeze(expanded)    # Remove length-1 dimensions: back to shape (3, 4)
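
Whether a reshaping call returns a view or a copy matters as soon as you write to the result. The sketch below uses `np.shares_memory` to check this for a small contiguous array; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

flat_copy = arr.flatten()      # Always a copy
flat_view = arr.ravel()        # A view here, because arr is contiguous
reshaped = arr.reshape(4, 3)   # Also a view for a contiguous array

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True
print(np.shares_memory(arr, reshaped))   # True

# Writing through a view changes the original array
flat_view[0] = 99
print(arr[0, 0])  # 99

# Raveling a transposed (non-contiguous) array has to copy
print(np.shares_memory(arr, arr.T.ravel()))  # False
```

When you need an independent array regardless of layout, calling `.copy()` explicitly avoids surprises.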

Indexing and Slicing

NumPy extends Python’s slicing syntax with powerful multi-dimensional indexing.

arr = np.arange(24).reshape(4, 6)

# Basic indexing
element = arr[2, 3]               # Single element
row = arr[1]                      # Entire row
column = arr[:, 2]                # Entire column
subarray = arr[1:3, 2:5]         # 2D slice

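# Boolean indexing
mask = arr > 10
filtered = arr[mask]              # 1D array of all elements > 10
zeroed = arr.copy()               # Work on a copy so arr stays intact for the examples below
zeroed[mask] = 0                  # Set matching elements to 0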

# Fancy indexing with arrays
rows = np.array([0, 2, 3])
cols = np.array([1, 3, 5])
selected = arr[rows, cols]        # Elements at (0,1), (2,3), (3,5)

# Where clause
indices = np.where(arr > 15)      # Returns tuple of arrays
values = arr[indices]

# Conditional replacement
result = np.where(arr > 10, arr, -1)  # Replace values <= 10 with -1
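
One detail worth checking for yourself is that basic slices return views, while boolean and fancy indexing return copies. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)

block = arr[1:3, 2:5]      # Basic slice: a view into arr
picked = arr[[0, 2], :]    # Fancy indexing: a copy
large = arr[arr > 10]      # Boolean indexing: a copy

# Writing to the slice changes the original
block[0, 0] = -1
print(arr[1, 2])   # -1

# Writing to the fancy-indexed result does not
picked[0, 0] = 500
print(arr[0, 0])   # 0
```

Assignment through an index expression, such as `arr[arr > 10] = 0`, still modifies `arr` in place; the copy only arises when the indexed result is bound to a new name.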

Broadcasting

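Broadcasting lets NumPy combine arrays of different shapes by virtually stretching size-1 dimensions, without copying data. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or when one of them is 1; missing leading dimensions are treated as 1.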

# Scalar broadcasting
arr = np.array([1, 2, 3, 4])
result = arr * 2                  # Scalar broadcasts to each element

# 1D to 2D broadcasting
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
vector = np.array([10, 20, 30])
broadcasted = matrix + vector     # Vector broadcasts to each row

# Column vector broadcasting
col_vector = np.array([[1], [2], [3]])
result = matrix + col_vector      # Broadcasts to each column

# Explicit broadcasting with newaxis
a = np.arange(3)                  # Shape: (3,)
b = np.arange(3)[:, np.newaxis]   # Shape: (3, 1)
outer_product = a * b             # Shape: (3, 3)

# Broadcasting rules example
x = np.ones((3, 4, 5))
y = np.ones((4, 5))
z = x + y                         # Works: (3,4,5) + (4,5) -> (3,4,5)

# This fails - incompatible shapes
try:
    bad = np.ones((3, 4)) + np.ones((3, 5))
except ValueError as e:
    print(f"Error: {e}")
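
The shape-matching rule can be written out in a few lines of plain Python. The helper `broadcast_shape` below is a hypothetical function written for illustration, not part of NumPy; it is compared against `np.broadcast_shapes`, which is available in NumPy 1.20 and later.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: shape produced by broadcasting two shapes together."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same length
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(result)

print(broadcast_shape((3, 4, 5), (4, 5)))   # (3, 4, 5)
print(broadcast_shape((3,), (3, 1)))        # (3, 3)

# Cross-check against NumPy's own implementation (NumPy >= 1.20)
print(np.broadcast_shapes((3, 4, 5), (4, 5)))  # (3, 4, 5)
```

Calling `broadcast_shape((3, 4), (3, 5))` raises `ValueError`, matching the failing addition shown above: the trailing dimensions 4 and 5 differ and neither is 1.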

Universal Functions (ufuncs)

Ufuncs perform element-wise operations using optimized C code.

arr = np.array([1, 4, 9, 16, 25])

# Mathematical operations
sqrt = np.sqrt(arr)
exp = np.exp(arr)
log = np.log(arr)
power = np.power(arr, 2)

# Trigonometric functions
angles = np.array([0, np.pi/4, np.pi/2])
sin_vals = np.sin(angles)
cos_vals = np.cos(angles)

# Binary ufuncs
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
added = np.add(a, b)
maximum = np.maximum(a, b)

# Aggregation functions (reductions over an array, rather than element-wise ufuncs)
arr = np.random.randn(100)
mean = np.mean(arr)
std = np.std(arr)
sum_val = np.sum(arr)
min_val = np.min(arr)
max_val = np.max(arr)

# Axis-specific operations
matrix = np.random.randn(3, 4)
row_means = np.mean(matrix, axis=1)    # Mean of each row
col_sums = np.sum(matrix, axis=0)      # Sum of each column

# Cumulative operations
cumsum = np.cumsum(np.array([1, 2, 3, 4]))  # [1, 3, 6, 10]
cumprod = np.cumprod(np.array([1, 2, 3, 4])) # [1, 2, 6, 24]
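
Binary ufuncs such as `np.add` and `np.multiply` also carry methods of their own, and several of the aggregations above correspond to them. The sketch below shows `reduce`, `accumulate`, and `outer`, plus the `out=` argument for writing a result into a preallocated array; the variable names are illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

total = np.add.reduce(values)               # 10, same result as np.sum(values)
running = np.add.accumulate(values)         # [1, 3, 6, 10], same as np.cumsum(values)
table = np.multiply.outer(values, values)   # 4x4 multiplication table

# out= stores the result in an existing array instead of allocating a new one
buf = np.empty(4, dtype=np.float64)
np.sqrt(values, out=buf)

print(total)
print(running)
print(table.shape)  # (4, 4)
print(buf)
```

Reusing a buffer with `out=` can reduce allocations in tight loops over large arrays, though for small arrays the difference is usually negligible.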

Array Manipulation

Common operations for transforming array structure and content.

# Stacking arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

vstacked = np.vstack([a, b])      # Stack as rows: shape (2, 3)
hstacked = np.hstack([a, b])      # Concatenate end to end: shape (6,)
dstacked = np.dstack([a, b])      # Stack along a third axis: shape (1, 3, 2)

# Splitting arrays
arr = np.arange(12)
split = np.split(arr, 3)          # Split into 3 equal parts
vsplit = np.vsplit(np.arange(12).reshape(4, 3), 2)

# Repeating and tiling
repeated = np.repeat([1, 2, 3], 3)          # [1,1,1,2,2,2,3,3,3]
tiled = np.tile([1, 2, 3], 3)              # [1,2,3,1,2,3,1,2,3]

# Sorting
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
sorted_arr = np.sort(arr)
indices = np.argsort(arr)         # Indices that would sort array

# Unique values
unique_vals = np.unique(arr)
vals, counts = np.unique(arr, return_counts=True)
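
A common use of `np.argsort` is reordering several parallel arrays by the values in one of them, and `np.unique` with counts gives a quick way to find the most frequent value. The names and scores below are made-up sample data for illustration.

```python
import numpy as np

# Hypothetical sample data
names = np.array(["carol", "alice", "dave", "bob"])
scores = np.array([72, 95, 60, 88])

order = np.argsort(scores)[::-1]   # Indices that sort scores, highest first
ranked_names = names[order]
ranked_scores = scores[order]

print(ranked_names)    # ['alice' 'bob' 'carol' 'dave']
print(ranked_scores)   # [95 88 72 60]

# Most frequent value via unique counts
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
vals, counts = np.unique(arr, return_counts=True)
most_common = vals[np.argmax(counts)]
print(most_common)     # 1
```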

Linear Algebra Operations

NumPy provides comprehensive linear algebra functionality through numpy.linalg.

# Matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

matmul = np.matmul(A, B)          # Matrix multiplication
matmul_op = A @ B                 # Same using @ operator
elementwise = A * B               # Element-wise multiplication

# Dot product
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
dot = np.dot(v1, v2)              # 32

# Linear algebra operations
det = np.linalg.det(A)            # Determinant
inv = np.linalg.inv(A)            # Inverse
eigenvals, eigenvecs = np.linalg.eig(A)

# Solving linear systems (Ax = b)
A = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])
x = np.linalg.solve(A, b)         # Solution to system

# Matrix decompositions
U, s, Vt = np.linalg.svd(A)       # Singular Value Decomposition
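
Because these routines work in floating point, it is good practice to verify results with `np.allclose` rather than exact equality. The sketch below reuses the system from above and checks the solution, the SVD factors, and the eigenpairs.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Solution of Ax = b should reproduce b when multiplied back
x = np.linalg.solve(A, b)
print(x)                                   # approximately [2. 3.]
print(np.allclose(A @ x, b))               # True

# SVD factors should multiply back to A
U, s, Vt = np.linalg.svd(A)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True

# Each eigenvector column v satisfies A v = lambda v
eigenvals, eigenvecs = np.linalg.eig(A)
print(np.allclose(A @ eigenvecs, eigenvecs * eigenvals))  # True
```

Note that `np.linalg.eig` returns eigenvectors as columns, which is why multiplying by `eigenvals` broadcasts across columns here.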

Practical Example: Image Processing

NumPy excels at batch operations on grid-shaped data such as images.

# Simulate a grayscale image (256x256 pixels, values 0-255)
image = np.random.randint(0, 256, (256, 256), dtype=np.uint8)

# Normalize to 0-1 range
normalized = image.astype(np.float32) / 255.0

# Apply brightness adjustment
brightened = np.clip(normalized + 0.2, 0, 1)

# Apply gamma correction (gamma > 1 darkens midtones for values in the 0-1 range)
gamma = 1.5
contrasted = np.power(normalized, gamma)

# Create a simple blur kernel (3x3 averaging)
kernel = np.ones((3, 3)) / 9

# Apply convolution-style operation (explicit Python loops for clarity; slow on large images)
padded = np.pad(normalized, 1, mode='edge')
blurred = np.zeros_like(normalized)

for i in range(normalized.shape[0]):
    for j in range(normalized.shape[1]):
        blurred[i, j] = np.sum(padded[i:i+3, j:j+3] * kernel)

# Threshold operation
threshold = 0.5
binary = (normalized > threshold).astype(np.uint8)

# Calculate histogram
histogram, bins = np.histogram(image, bins=256, range=(0, 256))
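
The nested loop used for the blur can be replaced with a vectorized version. One possible approach, sketched here under the assumption that NumPy 1.20 or later is available, uses `sliding_window_view` to expose every 3x3 neighborhood as a view and then takes the weighted sum over the window axes. The 64x64 random array is a stand-in for the normalized image above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Stand-in for the normalized image (assumption: any 2D float array works)
rng = np.random.default_rng(0)
normalized = rng.random((64, 64)).astype(np.float32)
kernel = np.ones((3, 3), dtype=np.float32) / 9

padded = np.pad(normalized, 1, mode="edge")

# View of every 3x3 neighborhood: shape (64, 64, 3, 3), no data copied
windows = sliding_window_view(padded, (3, 3))

# Weighted sum over the two window axes replaces the double loop
blurred_fast = (windows * kernel).sum(axis=(2, 3))

print(windows.shape)        # (64, 64, 3, 3)
print(blurred_fast.shape)   # (64, 64)
```

The multiplication does materialize a temporary array of the window shape, so for large images or kernels a dedicated routine such as `scipy.ndimage.uniform_filter` may be a better fit; that is a separate library and not something the example above depends on.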

Writing efficient NumPy code comes down to three habits: knowing when an operation creates a view versus a copy, leveraging broadcasting to avoid explicit loops, and using vectorized ufuncs instead of Python-level iteration. Master these concepts and you’ll write faster, cleaner numerical code.
