NumPy - Split Array (np.split, np.hsplit, np.vsplit)

Key Insights

• NumPy provides three primary splitting functions: np.split() for arbitrary axis splitting, np.hsplit() for horizontal (column-wise) splits, and np.vsplit() for vertical (row-wise) splits
• Split operations return views of the original array rather than copies, which makes them memory-efficient, but modifications to a split array also change the original data
• Understanding axis parameters is critical: axis=0 splits across rows, axis=1 across columns, and higher axes follow the same pattern for multi-dimensional arrays

Understanding np.split() Fundamentals

The np.split() function divides an array into multiple sub-arrays along a specified axis. It accepts three key parameters: the input array, indices or number of sections, and the axis along which to split.

import numpy as np

# Basic split into equal sections
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
result = np.split(arr, 3)
print(f"Original: {arr}")
print(f"Split into 3: {result}")
# Output: [array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]

# Split at specific indices
arr2 = np.array([10, 20, 30, 40, 50, 60])
result2 = np.split(arr2, [2, 4])
print(f"Split at indices [2, 4]: {result2}")
# Output: [array([10, 20]), array([30, 40]), array([50, 60])]
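
To make the index form concrete, the sketch below checks that splitting at [2, 4] gives the same pieces as the slices arr[:2], arr[2:4], and arr[4:]. It also shows that an index past the end of the array yields an empty sub-array instead of an error. The variable names are illustrative only.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Splitting at [2, 4] is the same as taking three consecutive slices
parts = np.split(arr, [2, 4])
slices = [arr[:2], arr[2:4], arr[4:]]
same_as_slices = all(np.array_equal(p, s) for p, s in zip(parts, slices))
print(same_as_slices)  # True

# An index beyond the array length produces an empty trailing sub-array
beyond = np.split(arr, [4, 10])
print([b.tolist() for b in beyond])  # [[10, 20, 30, 40], [50, 60], []]
```

Each index marks where a new sub-array starts, so k indices always produce k + 1 sub-arrays.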

When splitting into N sections, the array's length along the split axis must be evenly divisible by N. Otherwise, NumPy raises a ValueError. For uneven splits, use index-based splitting or np.array_split().

# This works - 12 elements divisible by 3
arr3 = np.arange(12)
result3 = np.split(arr3, 3)
print(f"Even split: {[r.tolist() for r in result3]}")

# This raises ValueError - 10 not divisible by 3
try:
    arr4 = np.arange(10)
    result4 = np.split(arr4, 3)
except ValueError as e:
    print(f"Error: {e}")

# Use array_split for uneven divisions
result5 = np.array_split(arr4, 3)
print(f"Uneven split with array_split: {[r.tolist() for r in result5]}")
# Output: [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
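
The sizes that np.array_split() produces follow a simple rule from its documentation: for length l and n sections, the first l % n sub-arrays get l // n + 1 elements and the rest get l // n. The sketch below compares that rule with the real output. The helper expected_sizes is a hypothetical name introduced here for illustration.

```python
import numpy as np

def expected_sizes(length, sections):
    """Sub-array sizes predicted by the documented array_split rule (illustrative helper)."""
    base, extra = divmod(length, sections)
    # The first `extra` sections absorb one leftover element each
    return [base + 1] * extra + [base] * (sections - extra)

arr = np.arange(10)
actual = [len(part) for part in np.array_split(arr, 3)]
print(actual)                 # [4, 3, 3]
print(expected_sizes(10, 3))  # [4, 3, 3]
```

The larger sections always come first, which matters if downstream code assumes a particular ordering of sizes.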

Splitting Multi-Dimensional Arrays

For 2D and higher-dimensional arrays, the axis parameter determines the split direction. Axis 0 splits along rows (vertical split), while axis 1 splits along columns (horizontal split).

# 2D array splitting
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]])

# Split along axis 0 (rows) - creates horizontal bands
row_splits = np.split(matrix, 2, axis=0)
print("Split along rows (axis=0):")
for i, split in enumerate(row_splits):
    print(f"Section {i}:\n{split}\n")

# Split along axis 1 (columns) - creates vertical bands
col_splits = np.split(matrix, 2, axis=1)
print("Split along columns (axis=1):")
for i, split in enumerate(col_splits):
    print(f"Section {i}:\n{split}\n")
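
As a quick check of the axis behaviour described above, this sketch prints the shapes of both splits and confirms that concatenating the pieces along the same axis rebuilds the original matrix. It uses an equivalent 4x4 matrix built with np.arange.

```python
import numpy as np

matrix = np.arange(1, 17).reshape(4, 4)

row_splits = np.split(matrix, 2, axis=0)
col_splits = np.split(matrix, 2, axis=1)

# Only the split axis shrinks; the other axis keeps its length
row_shapes = [s.shape for s in row_splits]
col_shapes = [s.shape for s in col_splits]
print(row_shapes)  # [(2, 4), (2, 4)]
print(col_shapes)  # [(4, 2), (4, 2)]

# Concatenating along the same axis restores the original array
rebuilt_rows = np.concatenate(row_splits, axis=0)
rebuilt_cols = np.concatenate(col_splits, axis=1)
print(np.array_equal(rebuilt_rows, matrix), np.array_equal(rebuilt_cols, matrix))  # True True
```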

For 3D arrays, np.split() can divide any of the three axes; axis 2 is the third (depth) dimension:

# 3D array example
cube = np.arange(24).reshape(2, 3, 4)
print(f"Original shape: {cube.shape}")

# Split along different axes
axis0_split = np.split(cube, 2, axis=0)  # Divide the first dimension (size 2)
print(f"Split on axis 0: {len(axis0_split)} arrays of shape {axis0_split[0].shape}")

axis1_split = np.split(cube, 3, axis=1)  # Divide the second dimension (size 3)
print(f"Split on axis 1: {len(axis1_split)} arrays of shape {axis1_split[0].shape}")

axis2_split = np.split(cube, 2, axis=2)  # Divide the third (depth) dimension (size 4)
print(f"Split on axis 2: {len(axis2_split)} arrays of shape {axis2_split[0].shape}")
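
NumPy also ships np.dsplit(), a shorthand for splitting along the third axis. This article does not otherwise cover it, so treat the sketch below as a side note: it checks that np.dsplit(cube, 2) matches np.split(cube, 2, axis=2) for the same cube.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

via_split = np.split(cube, 2, axis=2)
via_dsplit = np.dsplit(cube, 2)  # requires an array with at least 3 dimensions

shapes = [part.shape for part in via_dsplit]
matches = all(np.array_equal(a, b) for a, b in zip(via_split, via_dsplit))
print(shapes)   # [(2, 3, 2), (2, 3, 2)]
print(matches)  # True
```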

Horizontal Splitting with np.hsplit()

np.hsplit() is equivalent to np.split() with axis=1 for arrays with two or more dimensions and axis=0 for 1D arrays. It splits arrays horizontally (column-wise).

# 1D array - splits along axis 0
arr_1d = np.arange(12)
h_split_1d = np.hsplit(arr_1d, 4)
print(f"1D hsplit: {[s.tolist() for s in h_split_1d]}")

# 2D array - splits along axis 1 (columns)
matrix_2d = np.arange(20).reshape(4, 5)
print(f"Original matrix:\n{matrix_2d}\n")

# Split into columns at specific indices
h_split_indices = np.hsplit(matrix_2d, [2, 4])
print("Split at column indices [2, 4]:")
for i, section in enumerate(h_split_indices):
    print(f"Section {i} shape {section.shape}:\n{section}\n")

# Practical example: separating features from labels
data = np.array([[1, 2, 3, 100],
                 [4, 5, 6, 200],
                 [7, 8, 9, 300]])

features, labels = np.hsplit(data, [3])
print(f"Features:\n{features}")
print(f"Labels:\n{labels}")
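
One detail of the features/labels example is easy to miss: np.hsplit() keeps both pieces 2D, so the labels come back as a (3, 1) column rather than a flat vector. The sketch below flattens the column with ravel() and compares the result with plain slicing, assuming downstream code expects a 1D label array.

```python
import numpy as np

data = np.array([[1, 2, 3, 100],
                 [4, 5, 6, 200],
                 [7, 8, 9, 300]])

features, labels = np.hsplit(data, [3])
print(labels.shape)  # (3, 1): still a 2D column

# Flatten to 1D when a plain label vector is expected
label_vector = labels.ravel()
print(label_vector.tolist())  # [100, 200, 300]

# Plain slicing yields the same pieces
same_features = np.array_equal(features, data[:, :3])
same_labels = np.array_equal(label_vector, data[:, -1])
print(same_features, same_labels)  # True True
```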

Vertical Splitting with np.vsplit()

np.vsplit() splits arrays vertically (row-wise) and is equivalent to np.split() with axis=0. It requires an array with at least two dimensions and raises a ValueError for 1D input.

# Create a dataset
dataset = np.arange(30).reshape(6, 5)
print(f"Dataset:\n{dataset}\n")

# Split into equal sections
v_split_equal = np.vsplit(dataset, 3)
print("Split into 3 equal sections:")
for i, section in enumerate(v_split_equal):
    print(f"Section {i}:\n{section}\n")

# Split at specific row indices
v_split_indices = np.vsplit(dataset, [2, 5])
print("Split at row indices [2, 5]:")
for i, section in enumerate(v_split_indices):
    print(f"Section {i} ({section.shape[0]} rows):\n{section}\n")

# Practical example: train/validation/test split
data_rows = np.random.rand(100, 10)
train, val, test = np.vsplit(data_rows, [70, 85])
print(f"Train set: {train.shape}")
print(f"Validation set: {val.shape}")
print(f"Test set: {test.shape}")
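
np.vsplit() cuts rows in their existing order, so if the data is sorted or grouped, the three sets will not be random samples. A minimal sketch of one way to handle that is to shuffle the rows before splitting. The seed and the 70/15/15 proportions are assumptions chosen to match the example above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, used only for repeatability
data_rows = rng.random((100, 10))

# Shuffle the row order first so each subset is a random sample
shuffled = data_rows[rng.permutation(len(data_rows))]

# Derive split points from the assumed 70/15/15 proportions
n = len(shuffled)
train_end = n * 70 // 100
val_end = n * 85 // 100
train, val, test = np.vsplit(shuffled, [train_end, val_end])
print(train.shape, val.shape, test.shape)  # (70, 10) (15, 10) (15, 10)
```

Indexing with a permutation creates a copy, so the three subsets here are views of shuffled, not of data_rows.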

Memory Considerations and Views

np.split(), np.hsplit(), and np.vsplit() return views of the original array, not copies. This saves memory, but it also means that writing to a sub-array changes the original data.

# Demonstrate view behavior
original = np.arange(12).reshape(3, 4)
splits = np.hsplit(original, 2)

print(f"Original:\n{original}\n")

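# Modify a split section
splits[0][0, 0] = 999
print("After modifying splits[0]:")
print(f"splits[0]:\n{splits[0]}")
print(f"Original:\n{original}\n")  # Original is also modified

# Check whether the split shares memory with the original.
# (splits[0].base is original would be False here: original is itself a
# reshaped view, so .base points at the underlying arange buffer.)
print(f"Is split a view? {np.shares_memory(splits[0], original)}")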

# Create independent copies if needed
independent_splits = [s.copy() for s in np.hsplit(original, 2)]
independent_splits[0][0, 0] = 777
print(f"After modifying independent copy:")
print(f"Copy:\n{independent_splits[0]}")
print(f"Original unchanged:\n{original}")
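
A check based on the .base attribute can mislead when the source array is itself a view, as with np.arange(...).reshape(...). The sketch below shows that .base refers to the array that owns the memory, while np.shares_memory() answers the practical question of whether two arrays overlap. The names owner, reshaped, left, and right are illustrative.

```python
import numpy as np

owner = np.arange(12)           # owns its data
reshaped = owner.reshape(3, 4)  # already a view of owner
left, right = np.hsplit(reshaped, 2)

# .base refers to the array that owns the memory, not the array passed to hsplit
base_is_reshaped = left.base is reshaped
base_is_owner = left.base is owner
print(base_is_reshaped, base_is_owner)

# np.shares_memory reports whether the two arrays overlap in memory
view_shares = np.shares_memory(left, reshaped)
copy_shares = np.shares_memory(left.copy(), reshaped)
print(view_shares, copy_shares)
```

For deciding whether a write will propagate to the original, np.shares_memory() is the more dependable test of the two.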

Combining Splits for Complex Partitioning

Combine multiple split operations to create complex partitioning schemes for data processing pipelines.

# Create a sample image-like array
image_data = np.arange(256).reshape(16, 16)

# Split into quadrants
top, bottom = np.vsplit(image_data, 2)
top_left, top_right = np.hsplit(top, 2)
bottom_left, bottom_right = np.hsplit(bottom, 2)

print(f"Top-left quadrant:\n{top_left}\n")
print(f"Quadrant shapes: {top_left.shape}")

# More complex: split into a grid
rows = np.vsplit(image_data, 4)
grid = [np.hsplit(row, 4) for row in rows]

print(f"Created {len(grid)}x{len(grid[0])} grid")
print(f"Each cell shape: {grid[0][0].shape}")
print(f"Grid[2][3]:\n{grid[2][3]}")

# Practical batch processing
batch_data = np.random.rand(128, 784)  # 128 samples, 784 features
batch_size = 32
batches = np.vsplit(batch_data, batch_data.shape[0] // batch_size)

for i, batch in enumerate(batches):
    print(f"Processing batch {i+1}: {batch.shape}")
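
The batch example works because 128 is an exact multiple of 32. When the sample count is not a multiple of the batch size, one option is to pass explicit split points so the remainder becomes a shorter final batch. This is a sketch under that assumption, with 130 samples chosen to force a remainder.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
batch_data = rng.random((130, 784))  # 130 is not a multiple of 32
batch_size = 32

# Split points at 32, 64, 96, 128; the remaining rows form a short last batch
split_points = np.arange(batch_size, len(batch_data), batch_size)
batches = np.vsplit(batch_data, split_points)

batch_lengths = [len(batch) for batch in batches]
print(batch_lengths)  # [32, 32, 32, 32, 2]
```

Unlike np.array_split(), which spreads the leftover rows across the first batches, this keeps every batch at the full size except the last.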

Error Handling and Edge Cases

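The most common failures are section counts that do not divide the array evenly, axes that are out of range, and requests for more sections than there are elements. The helpers below guard against each.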

# Handle uneven splits gracefully
def safe_split(array, sections, axis=0):
    """Split array, handling uneven divisions"""
    try:
        return np.split(array, sections, axis=axis)
    except ValueError:
        return np.array_split(array, sections, axis=axis)

arr = np.arange(10)
result = safe_split(arr, 3)
print(f"Safe split result: {[r.tolist() for r in result]}")

# Validate dimensions before splitting
def validate_and_split(array, sections, axis=0):
    """Validate array dimensions before splitting"""
    if axis >= array.ndim:
        raise ValueError(f"Axis {axis} out of bounds for {array.ndim}D array")
    
    if array.shape[axis] < sections:
        raise ValueError(f"Cannot split {array.shape[axis]} elements into {sections} sections")
    
    return safe_split(array, sections, axis)

# Test validation
try:
    validate_and_split(np.arange(5), 10)
except ValueError as e:
    print(f"Validation caught: {e}")

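# Empty array handling: a length of 0 is divisible by any section count,
# so np.split() returns empty sub-arrays instead of raising
empty_arr = np.array([])
empty_parts = np.split(empty_arr, 2)
print(f"Empty array split: {[p.tolist() for p in empty_parts]}")
# Output: [[], []]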
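
The validation shown above assumes a non-negative axis and an integer section count. A slightly more tolerant variant might also accept negative axes and index lists. The helper below, split_checked, is a hypothetical name and only a sketch of that idea, not a drop-in replacement.

```python
import numpy as np

def split_checked(array, indices_or_sections, axis=0):
    """Validate the axis, then split (illustrative helper).

    Accepts negative axes and either an integer section count or a list of indices.
    """
    array = np.asarray(array)
    if not -array.ndim <= axis < array.ndim:
        raise ValueError(f"Axis {axis} out of bounds for {array.ndim}D array")
    if isinstance(indices_or_sections, (int, np.integer)):
        if indices_or_sections < 1:
            raise ValueError("Number of sections must be at least 1")
        # array_split tolerates section counts that do not divide evenly
        return np.array_split(array, indices_or_sections, axis=axis)
    # A sequence of indices never requires even division
    return np.split(array, indices_or_sections, axis=axis)

grid = np.arange(12).reshape(3, 4)
last_axis = split_checked(grid, 2, axis=-1)
by_index = split_checked(grid, [1], axis=0)
print([p.shape for p in last_axis])  # [(3, 2), (3, 2)]
print([p.shape for p in by_index])   # [(1, 4), (2, 4)]
```

Routing integer counts through np.array_split() is a design choice here: it trades the strictness of np.split() for never failing on uneven lengths.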

NumPy’s split functions provide essential tools for array partitioning in data processing workflows. Choose np.split() for maximum flexibility, np.hsplit() for column-wise operations, and np.vsplit() for row-wise splits. Always consider whether you need views or copies, and handle edge cases appropriately for robust production code.
