NumPy - Split Array (np.split, np.hsplit, np.vsplit)
Key Insights
• NumPy provides three primary splitting functions: np.split() for arbitrary axis splitting, np.hsplit() for horizontal (column-wise) splits, and np.vsplit() for vertical (row-wise) splits
• Split operations return views of the original array rather than copies, which makes them memory-efficient, but modifying a split piece also modifies the original data
• Understanding axis parameters is critical: axis=0 operates on rows, axis=1 on columns, and higher dimensions follow the same pattern for multi-dimensional arrays
Understanding np.split() Fundamentals
The np.split() function divides an array into multiple sub-arrays along a specified axis. It accepts three key parameters: the input array, indices or number of sections, and the axis along which to split.
import numpy as np
# Basic split into equal sections
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
result = np.split(arr, 3)
print(f"Original: {arr}")
print(f"Split into 3: {result}")
# Output: [array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
# Split at specific indices
arr2 = np.array([10, 20, 30, 40, 50, 60])
result2 = np.split(arr2, [2, 4])
print(f"Split at indices [2, 4]: {result2}")
# Output: [array([10, 20]), array([30, 40]), array([50, 60])]
When splitting into N sections, the array length must be evenly divisible by N. Otherwise, NumPy raises a ValueError. For uneven splits, use index-based splitting or np.array_split().
# This works - 12 elements divisible by 3
arr3 = np.arange(12)
result3 = np.split(arr3, 3)
print(f"Even split: {[r.tolist() for r in result3]}")
# This raises ValueError - 10 is not divisible by 3
arr4 = np.arange(10)
try:
    result4 = np.split(arr4, 3)
except ValueError as e:
    print(f"Error: {e}")
# Use array_split for uneven divisions
result5 = np.array_split(arr4, 3)
print(f"Uneven split with array_split: {[r.tolist() for r in result5]}")
# Output: [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
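The sizes that np.array_split() produces follow a simple rule: for an axis of length l split into n sections, the first l % n pieces get l // n + 1 elements and the remaining pieces get l // n. The sketch below checks that rule against the example above. Note that `expected_sizes` is a hypothetical helper written for this illustration, not a NumPy function.

```python
import numpy as np

def expected_sizes(length, sections):
    """Hypothetical helper: piece sizes predicted by the array_split rule."""
    base, extras = divmod(length, sections)
    # The first `extras` pieces are one element longer than the rest
    return [base + 1] * extras + [base] * (sections - extras)

arr = np.arange(10)
parts = np.array_split(arr, 3)
sizes = [len(p) for p in parts]

print(sizes)                  # [4, 3, 3]
print(expected_sizes(10, 3))  # [4, 3, 3]
```

When the length divides evenly, the rule gives equal sizes, so np.array_split() and np.split() return the same pieces.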
Splitting Multi-Dimensional Arrays
For 2D and higher-dimensional arrays, the axis parameter determines the split direction. Axis 0 splits along rows (vertical split), while axis 1 splits along columns (horizontal split).
# 2D array splitting
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]])
# Split along axis 0 (rows) - creates horizontal bands
row_splits = np.split(matrix, 2, axis=0)
print("Split along rows (axis=0):")
for i, split in enumerate(row_splits):
    print(f"Section {i}:\n{split}\n")
# Split along axis 1 (columns) - creates vertical bands
col_splits = np.split(matrix, 2, axis=1)
print("Split along columns (axis=1):")
for i, split in enumerate(col_splits):
    print(f"Section {i}:\n{split}\n")
For 3D arrays, any of the three axes can be split. With shape (2, 3, 4), axis 0 indexes the 2 blocks, axis 1 the 3 rows within each block, and axis 2 the 4 columns:
# 3D array example
cube = np.arange(24).reshape(2, 3, 4)
print(f"Original shape: {cube.shape}")
# Split along different axes
block_split = np.split(cube, 2, axis=0)  # Split the 2 blocks -> pieces of shape (1, 3, 4)
print(f"Split on axis 0: {len(block_split)} arrays of shape {block_split[0].shape}")
row_split = np.split(cube, 3, axis=1)  # Split the 3 rows -> pieces of shape (2, 1, 4)
print(f"Split on axis 1: {len(row_split)} arrays of shape {row_split[0].shape}")
col_split = np.split(cube, 2, axis=2)  # Split the 4 columns -> pieces of shape (2, 3, 2)
print(f"Split on axis 2: {len(col_split)} arrays of shape {col_split[0].shape}")
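As a quick sanity check on axis handling, the following sketch (using the same 2x3x4 array) illustrates two properties that are easy to rely on: a negative axis counts from the end, and concatenating the pieces along the split axis restores the original array. It is a minimal illustration, not an exhaustive test.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

# axis=-1 refers to the last axis, so it matches axis=2 for a 3D array
last_axis = np.split(cube, 2, axis=-1)
explicit = np.split(cube, 2, axis=2)
same = all(np.array_equal(a, b) for a, b in zip(last_axis, explicit))
print(same)  # True

# Only the split axis shrinks; the other dimensions are untouched
piece_shapes = [p.shape for p in explicit]
print(piece_shapes)  # [(2, 3, 2), (2, 3, 2)]

# Concatenating the pieces along the same axis restores the original
restored = np.concatenate(explicit, axis=2)
print(np.array_equal(restored, cube))  # True
```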
Horizontal Splitting with np.hsplit()
np.hsplit() is equivalent to np.split() with axis=1 for arrays with two or more dimensions, and with axis=0 for 1D arrays. For a 2D array this is a horizontal (column-wise) split.
# 1D array - splits along axis 0
arr_1d = np.arange(12)
h_split_1d = np.hsplit(arr_1d, 4)
print(f"1D hsplit: {[s.tolist() for s in h_split_1d]}")
# 2D array - splits along axis 1 (columns)
matrix_2d = np.arange(20).reshape(4, 5)
print(f"Original matrix:\n{matrix_2d}\n")
# Split into columns at specific indices
h_split_indices = np.hsplit(matrix_2d, [2, 4])
print("Split at column indices [2, 4]:")
for i, section in enumerate(h_split_indices):
    print(f"Section {i} shape {section.shape}:\n{section}\n")
# Practical example: separating features from labels
data = np.array([[1, 2, 3, 100],
                 [4, 5, 6, 200],
                 [7, 8, 9, 300]])
features, labels = np.hsplit(data, [3])
print(f"Features:\n{features}")
print(f"Labels:\n{labels}")
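The equivalence between np.hsplit() and np.split() described above can be checked directly. In the sketch below, `same_pieces` is a hypothetical helper introduced only for this comparison.

```python
import numpy as np

def same_pieces(left, right):
    """Hypothetical helper: True if both lists hold equal arrays in order."""
    return len(left) == len(right) and all(
        np.array_equal(a, b) for a, b in zip(left, right)
    )

vec = np.arange(12)
mat = np.arange(20).reshape(4, 5)

# 1D input: hsplit behaves like split along axis 0
eq_1d = same_pieces(np.hsplit(vec, 4), np.split(vec, 4, axis=0))
# 2D input: hsplit behaves like split along axis 1
eq_2d = same_pieces(np.hsplit(mat, [2, 4]), np.split(mat, [2, 4], axis=1))

print(eq_1d, eq_2d)  # True True
```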
Vertical Splitting with np.vsplit()
np.vsplit() splits arrays vertically (row-wise), equivalent to np.split() with axis=0. It requires at least a 2D array.
# Create a dataset
dataset = np.arange(30).reshape(6, 5)
print(f"Dataset:\n{dataset}\n")
# Split into equal sections
v_split_equal = np.vsplit(dataset, 3)
print("Split into 3 equal sections:")
for i, section in enumerate(v_split_equal):
    print(f"Section {i}:\n{section}\n")
# Split at specific row indices
v_split_indices = np.vsplit(dataset, [2, 5])
print("Split at row indices [2, 5]:")
for i, section in enumerate(v_split_indices):
    print(f"Section {i} (rows {section.shape[0]}):\n{section}\n")
# Practical example: train/validation/test split
data_rows = np.random.rand(100, 10)
train, val, test = np.vsplit(data_rows, [70, 85])
print(f"Train set: {train.shape}")
print(f"Validation set: {val.shape}")
print(f"Test set: {test.shape}")
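The 2D requirement is worth seeing in practice. The sketch below shows np.vsplit() rejecting a 1D array, followed by two possible workarounds: calling np.split() directly, or reshaping to a single column first. Which one fits depends on the shape the downstream code expects.

```python
import numpy as np

flat = np.arange(6)

# vsplit refuses 1D input with a ValueError
try:
    np.vsplit(flat, 2)
    vsplit_failed = False
except ValueError as exc:
    vsplit_failed = True
    print(f"vsplit on 1D input failed: {exc}")

# Option 1: np.split works directly on 1D input
halves = np.split(flat, 2)

# Option 2: reshape to a column, then vsplit
column_halves = np.vsplit(flat.reshape(-1, 1), 2)

print([h.shape for h in halves])         # [(3,), (3,)]
print([c.shape for c in column_halves])  # [(3, 1), (3, 1)]
```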
Memory Considerations and Views
Split operations typically return views of the original array, not copies. This has important implications for memory usage and data modification.
# Demonstrate view behavior
original = np.arange(12).reshape(3, 4)
splits = np.hsplit(original, 2)
print(f"Original:\n{original}\n")
# Modify a split section
splits[0][0, 0] = 999
print(f"After modifying split[0]:")
print(f"Split[0]:\n{splits[0]}")
print(f"Original:\n{original}\n") # Original is also modified
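# Check whether the split shares memory with the original.
# Comparing splits[0].base to `original` would print False here, because
# `original` is itself a reshaped view of the array created by np.arange.
print(f"Is split a view? {np.shares_memory(splits[0], original)}")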
# Create independent copies if needed
independent_splits = [s.copy() for s in np.hsplit(original, 2)]
independent_splits[0][0, 0] = 777
print(f"After modifying independent copy:")
print(f"Copy:\n{independent_splits[0]}")
print(f"Original unchanged by the copy:\n{original}")  # still holds 999 from the earlier edit
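One caveat when testing for views: the `.base` attribute points at the array that owns the memory, which is not necessarily the array that was split. When the source array is itself a view (for example, the result of `reshape`), an identity check against it fails even though the data is shared. The sketch below illustrates this, using np.shares_memory() as the more reliable check. The variable names are chosen for this example only.

```python
import numpy as np

owner = np.arange(12)           # owns its data
reshaped = owner.reshape(3, 4)  # already a view of `owner`
left, right = np.hsplit(reshaped, 2)

# The identity check against the array we split is misleading
print(left.base is reshaped)  # False

# On current NumPy releases, .base refers to the owning array
print(left.base is owner)     # True

# shares_memory answers the question we actually care about
print(np.shares_memory(left, reshaped))         # True
print(np.shares_memory(left.copy(), reshaped))  # False
```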
Combining Splits for Complex Partitioning
Combine multiple split operations to create complex partitioning schemes for data processing pipelines.
# Create a sample image-like array
image_data = np.arange(256).reshape(16, 16)
# Split into quadrants
top, bottom = np.vsplit(image_data, 2)
top_left, top_right = np.hsplit(top, 2)
bottom_left, bottom_right = np.hsplit(bottom, 2)
print(f"Top-left quadrant:\n{top_left}\n")
print(f"Quadrant shapes: {top_left.shape}")
# More complex: split into a grid
rows = np.vsplit(image_data, 4)
grid = [np.hsplit(row, 4) for row in rows]
print(f"Created {len(grid)}x{len(grid[0])} grid")
print(f"Each cell shape: {grid[0][0].shape}")
print(f"Grid[2][3]:\n{grid[2][3]}")
# Practical batch processing
batch_data = np.random.rand(128, 784) # 128 samples, 784 features
batch_size = 32
batches = np.vsplit(batch_data, batch_data.shape[0] // batch_size)
for i, batch in enumerate(batches):
    print(f"Processing batch {i+1}: {batch.shape}")
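A useful check on any grid partition is that nothing was lost or reordered. One way to verify this, sketched below under the same 16x16 setup, is to reassemble the tiles with np.block() and compare against the source, and to confirm that a tile matches the equivalent slice.

```python
import numpy as np

image_data = np.arange(256).reshape(16, 16)

# Build a 4x4 grid of 4x4 tiles: split into row bands, then split each band
grid = [np.hsplit(band, 4) for band in np.vsplit(image_data, 4)]

# np.block reassembles a nested list of tiles (a list of rows of tiles)
reassembled = np.block(grid)
print(reassembled.shape)                        # (16, 16)
print(np.array_equal(reassembled, image_data))  # True

# Tile (r, c) corresponds to a plain slice of the original
r, c = 2, 3
tile_matches = np.array_equal(
    grid[r][c], image_data[4 * r:4 * r + 4, 4 * c:4 * c + 4]
)
print(tile_matches)  # True
```

Unlike the splits themselves, the reassembled array is a new copy, so writing to it does not affect `image_data`.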
Error Handling and Edge Cases
The checks below cover the most common failure cases: an uneven section count, an out-of-range axis, and a request for more sections than there are elements.
# Handle uneven splits gracefully
def safe_split(array, sections, axis=0):
    """Split array, falling back to np.array_split for uneven divisions"""
    try:
        return np.split(array, sections, axis=axis)
    except ValueError:
        return np.array_split(array, sections, axis=axis)
arr = np.arange(10)
result = safe_split(arr, 3)
print(f"Safe split result: {[r.tolist() for r in result]}")
# Validate dimensions before splitting
def validate_and_split(array, sections, axis=0):
    """Validate the axis and an integer section count before splitting"""
    # Accept negative axes too, as NumPy does
    if not -array.ndim <= axis < array.ndim:
        raise ValueError(f"Axis {axis} out of bounds for {array.ndim}D array")
    if array.shape[axis] < sections:
        raise ValueError(f"Cannot split {array.shape[axis]} elements into {sections} sections")
    return safe_split(array, sections, axis)
# Test validation
try:
    validate_and_split(np.arange(5), 10)
except ValueError as e:
    print(f"Validation caught: {e}")
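# Empty array handling: a length of 0 is divisible by any section count,
# so np.split does not raise here and returns empty pieces instead
empty_arr = np.array([])
empty_parts = np.split(empty_arr, 2)
print(f"Empty array split: {[p.tolist() for p in empty_parts]}")
# Output: [[], []]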
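Index-based splitting has its own edge case: an index past the end of the axis does not raise an error. NumPy returns an empty sub-array for that position instead, which can surprise downstream code. The sketch below shows the behavior and one possible way to filter the empty pieces out.

```python
import numpy as np

arr = np.arange(5)

# An index beyond the end of the axis yields an empty piece, not an error
pieces = np.split(arr, [3, 10])
print([p.tolist() for p in pieces])  # [[0, 1, 2], [3, 4], []]

# Drop empty pieces if downstream code cannot handle them
non_empty = [p for p in pieces if p.size > 0]
print(len(non_empty))  # 2
```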
NumPy’s split functions provide essential tools for array partitioning in data processing workflows. Choose np.split() for maximum flexibility, np.hsplit() for column-wise operations, and np.vsplit() for row-wise splits. Always consider whether you need views or copies, and handle edge cases appropriately for robust production code.