NumPy - Stack Arrays (np.vstack, np.hstack, np.dstack)

Key Insights

• NumPy provides three primary stacking functions (vstack, hstack, and dstack) that concatenate arrays along different axes, with vstack stacking vertically (rows), hstack horizontally (columns), and dstack depth-wise (3rd dimension)
• Understanding array dimensions and shapes is critical: vstack requires matching column counts, hstack needs matching row counts, and dstack demands matching row and column dimensions
• Modern NumPy offers np.stack() and np.concatenate() as more flexible alternatives that explicitly specify axis parameters, providing better control and clearer intent in production code

Understanding Array Stacking Fundamentals

Array stacking combines multiple arrays into a single array. NumPy’s stacking functions differ in which axis they operate on. Before diving into specific functions, you need to understand how NumPy represents dimensions.

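A 1D array has shape (n,) and a 2D array has shape (rows, columns). A 3D array has three axes, numbered 0, 1, and 2. The stacking functions covered here treat axis 0 as rows, axis 1 as columns, and axis 2 as depth, so a depth-stacked array has shape (rows, columns, depth). Stacking operations either join arrays along an existing axis or add a new one.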

import numpy as np

# Create sample arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(f"Array a shape: {a.shape}")  # (3,)
print(f"Array b shape: {b.shape}")  # (3,)
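To make the axis numbering concrete, here is a minimal sketch that prints ndim and shape for arrays of one, two, and three dimensions. The array names (vector, matrix, cube) are illustrative only.

```python
import numpy as np

vector = np.arange(3)                   # 1D: a single axis (axis 0)
matrix = np.arange(6).reshape(2, 3)     # 2D: axis 0 = rows, axis 1 = columns
cube = np.arange(24).reshape(2, 3, 4)   # 3D: axes 0, 1, and 2

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}")
```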

Vertical Stacking with np.vstack

np.vstack() stacks arrays vertically, placing them on top of each other along axis 0 (rows). It is equivalent to concatenation along the first axis after each 1D array of shape (N,) has been reshaped to a single row of shape (1, N).

import numpy as np

# Stack 1D arrays vertically
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.vstack((a, b))

print(result)
print(f"Shape: {result.shape}")
# Output:
# [[1 2 3]
#  [4 5 6]]
# Shape: (2, 3)
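The equivalence described above can be checked directly. The sketch below promotes each 1D array to a row with np.atleast_2d and concatenates along axis 0. It is meant as an illustration of the behavior, not a copy of NumPy's internal implementation.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Promote each 1D array of shape (3,) to a 2D row of shape (1, 3)
rows = [np.atleast_2d(a), np.atleast_2d(b)]
manual = np.concatenate(rows, axis=0)

print(manual.shape)
print(np.array_equal(manual, np.vstack((a, b))))
```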

For 2D arrays, vstack requires all arrays to have the same number of columns:

# Stack 2D arrays vertically
matrix1 = np.array([[1, 2, 3],
                    [4, 5, 6]])
matrix2 = np.array([[7, 8, 9]])

result = np.vstack((matrix1, matrix2))
print(result)
print(f"Shape: {result.shape}")
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# Shape: (3, 3)
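When the column counts differ, vstack raises a ValueError. A short sketch of the failure case, with illustrative array names:

```python
import numpy as np

three_cols = np.array([[1, 2, 3],
                       [4, 5, 6]])
two_cols = np.array([[7, 8]])  # only 2 columns, so it cannot sit under three_cols

try:
    np.vstack((three_cols, two_cols))
    stacked_ok = True
except ValueError as exc:
    stacked_ok = False
    print(f"vstack failed: {exc}")
```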

Common use case—adding new observations to a dataset:

# Simulating adding new data rows
existing_data = np.array([[100, 85, 92],
                          [88, 79, 95]])
new_measurement = np.array([[91, 88, 87]])

updated_data = np.vstack((existing_data, new_measurement))
print(f"Updated dataset shape: {updated_data.shape}")  # (3, 3)

Horizontal Stacking with np.hstack

np.hstack() stacks arrays horizontally, placing them side by side along axis 1 (columns). A 1D array has no axis 1, so for 1D inputs hstack concatenates along axis 0 and returns a longer 1D array.

import numpy as np

# Stack 1D arrays horizontally
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.hstack((a, b))

print(result)
print(f"Shape: {result.shape}")
# Output: [1 2 3 4 5 6]
# Shape: (6,)
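If the goal is to place 1D arrays side by side as columns of a 2D array rather than join them end to end, np.column_stack is one option. It is not covered elsewhere in this article and is shown here only for contrast with hstack.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

flat = np.hstack((a, b))            # 1D result: inputs joined end to end
columns = np.column_stack((a, b))   # 2D result: each input becomes one column

print(flat.shape)
print(columns)
print(columns.shape)
```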

For 2D arrays, hstack requires matching row counts:

# Stack 2D arrays horizontally
matrix1 = np.array([[1, 2],
                    [3, 4],
                    [5, 6]])
matrix2 = np.array([[7],
                    [8],
                    [9]])

result = np.hstack((matrix1, matrix2))
print(result)
print(f"Shape: {result.shape}")
# Output:
# [[1 2 7]
#  [3 4 8]
#  [5 6 9]]
# Shape: (3, 3)

Practical example—adding feature columns to a dataset:

# Adding a calculated feature column
features = np.array([[100, 85],
                     [88, 79],
                     [91, 88]])

# Calculate mean as new feature
mean_feature = np.mean(features, axis=1, keepdims=True)
enhanced_features = np.hstack((features, mean_feature))

print(enhanced_features)
# Output:
# [[100.   85.   92.5]
#  [ 88.   79.   83.5]
#  [ 91.   88.   89.5]]

Depth Stacking with np.dstack

np.dstack() stacks arrays along the third axis (depth), creating or extending the depth dimension. This is particularly useful for image processing and creating 3D datasets.

import numpy as np

# Stack 1D arrays depth-wise
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.dstack((a, b))

print(result)
print(f"Shape: {result.shape}")
# Output: [[[1 4]
#           [2 5]
#           [3 6]]]
# Shape: (1, 3, 2)
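The (1, 3, 2) shape follows from how 1D inputs are promoted before stacking: each array of shape (N,) is treated as shape (1, N, 1), and the results are joined along axis 2. A minimal sketch using np.atleast_3d to reproduce this by hand:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

a3 = np.atleast_3d(a)   # (3,) becomes (1, 3, 1)
b3 = np.atleast_3d(b)
manual = np.concatenate((a3, b3), axis=2)

print(a3.shape)
print(manual.shape)
print(np.array_equal(manual, np.dstack((a, b))))
```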

For 2D arrays, dstack creates layers:

# Stack 2D arrays depth-wise (like RGB channels)
red_channel = np.array([[255, 0],
                        [0, 255]])
green_channel = np.array([[0, 255],
                          [255, 0]])
blue_channel = np.array([[0, 0],
                         [0, 255]])

rgb_image = np.dstack((red_channel, green_channel, blue_channel))
print(f"RGB image shape: {rgb_image.shape}")  # (2, 2, 3)
print(rgb_image[0, 0])  # First pixel: [255   0   0] (red)
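For 2D inputs of identical shape, the same result can be written with np.stack and an explicit axis, and a single channel can be recovered by indexing the last axis. A short sketch with abbreviated, illustrative names:

```python
import numpy as np

red = np.array([[255, 0], [0, 255]])
green = np.array([[0, 255], [255, 0]])
blue = np.array([[0, 0], [0, 255]])

image = np.dstack((red, green, blue))
same_image = np.stack((red, green, blue), axis=-1)  # new last axis holds the channels

recovered_green = image[:, :, 1]  # slice channel 1 back out

print(np.array_equal(image, same_image))
print(np.array_equal(recovered_green, green))
```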

Real-world example—combining time-series data:

# Temperature readings from different sensors
sensor1 = np.array([[20.1, 20.3, 20.5],
                    [21.0, 21.2, 21.4]])
sensor2 = np.array([[19.8, 20.1, 20.3],
                    [20.7, 20.9, 21.1]])
sensor3 = np.array([[20.0, 20.2, 20.4],
                    [20.9, 21.1, 21.3]])

# Stack into 3D array: (time_steps, measurements, sensors)
all_sensors = np.dstack((sensor1, sensor2, sensor3))
print(f"Combined sensor data shape: {all_sensors.shape}")  # (2, 3, 3)

# Access all sensor readings for first time step, first measurement
print(f"All sensors at t=0, m=0: {all_sensors[0, 0]}")
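Once the sensors share an axis, reductions across them are a single call with axis=2. The sketch below rebuilds the same data and computes a per-reading mean and spread. The variable names mean_reading and spread are illustrative.

```python
import numpy as np

sensor1 = np.array([[20.1, 20.3, 20.5],
                    [21.0, 21.2, 21.4]])
sensor2 = np.array([[19.8, 20.1, 20.3],
                    [20.7, 20.9, 21.1]])
sensor3 = np.array([[20.0, 20.2, 20.4],
                    [20.9, 21.1, 21.3]])

all_sensors = np.dstack((sensor1, sensor2, sensor3))  # shape (2, 3, 3)

# Reduce over the sensor axis (axis 2); the result has shape (2, 3)
mean_reading = all_sensors.mean(axis=2)
spread = all_sensors.max(axis=2) - all_sensors.min(axis=2)

print(mean_reading.shape)
print(spread.shape)
```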

Modern Alternatives: np.stack and np.concatenate

While vstack, hstack, and dstack are convenient, np.stack() and np.concatenate() offer more explicit control: np.stack() joins arrays along a new axis, while np.concatenate() joins them along an existing one.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# np.stack creates a new axis
stacked_axis0 = np.stack((a, b), axis=0)  # Same as vstack for 1D
stacked_axis1 = np.stack((a, b), axis=1)

print("Stack axis=0:")
print(stacked_axis0)
print(f"Shape: {stacked_axis0.shape}\n")

print("Stack axis=1:")
print(stacked_axis1)
print(f"Shape: {stacked_axis1.shape}")
# Output:
# Stack axis=0:
# [[1 2 3]
#  [4 5 6]]
# Shape: (2, 3)
#
# Stack axis=1:
# [[1 4]
#  [2 5]
#  [3 6]]
# Shape: (3, 2)
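Because np.stack always inserts a new axis, it also works on 2D inputs and requires every input to have exactly the same shape. A brief sketch of both points, using placeholder arrays:

```python
import numpy as np

m1 = np.zeros((2, 3))
m2 = np.ones((2, 3))

leading = np.stack((m1, m2), axis=0)    # new axis in front
trailing = np.stack((m1, m2), axis=-1)  # new axis at the end
print(leading.shape, trailing.shape)

# Inputs with different shapes are rejected
try:
    np.stack((m1, np.ones((1, 3))), axis=0)
    mismatch_ok = True
except ValueError as exc:
    mismatch_ok = False
    print(f"stack failed: {exc}")
```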

Using np.concatenate() for explicit axis control:

# Concatenate requires matching dimensions except along concat axis
matrix1 = np.array([[1, 2, 3],
                    [4, 5, 6]])
matrix2 = np.array([[7, 8, 9]])

# Concatenate along axis 0 (equivalent to vstack)
result_axis0 = np.concatenate((matrix1, matrix2), axis=0)
print(f"Concatenate axis=0 shape: {result_axis0.shape}")  # (3, 3)

# For axis 1 concatenation (like hstack)
matrix3 = np.array([[10],
                    [11]])
result_axis1 = np.concatenate((matrix1, matrix3), axis=1)
print(f"Concatenate axis=1 shape: {result_axis1.shape}")  # (2, 4)
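One practical difference from vstack is that np.concatenate does not promote 1D inputs, so mixing a 1D row with a 2D matrix fails until the row is reshaped. A minimal sketch with illustrative names:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([7, 8, 9])  # 1D, shape (3,)

try:
    np.concatenate((matrix, row), axis=0)
    concat_ok = True
except ValueError as exc:
    concat_ok = False
    print(f"concatenate failed: {exc}")

# Add the missing axis explicitly, then concatenate
fixed = np.concatenate((matrix, row[np.newaxis, :]), axis=0)

# vstack performs the same promotion automatically
via_vstack = np.vstack((matrix, row))
print(np.array_equal(fixed, via_vstack))
```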

Performance Considerations and Best Practices

Pre-allocate arrays when possible instead of repeatedly stacking:

import numpy as np
import time

# Inefficient: every hstack call allocates a new array and copies all previous data
start = time.perf_counter()
result = np.array([])
for i in range(1000):
    result = np.hstack((result, np.array([i])))
slow_time = time.perf_counter() - start

# Efficient: allocate once, then fill in place
start = time.perf_counter()
result = np.empty(1000)
for i in range(1000):
    result[i] = i
fast_time = time.perf_counter() - start

print(f"Repeated stacking: {slow_time:.4f}s")
print(f"Pre-allocation: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.1f}x")
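When the final size is not known in advance, a common alternative is to collect the pieces in a Python list and stack once at the end. This pattern is an addition to the benchmark above, and the per-iteration array here is a hypothetical stand-in for real work.

```python
import numpy as np

rows = []
for i in range(1000):
    # Hypothetical per-iteration result; appending to a list is cheap
    rows.append(np.array([i, i * 2]))

# A single stacking call copies the data only once
result = np.vstack(rows)
print(result.shape)
```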

When stacking arrays of different types, NumPy promotes the result to a common dtype that can represent every input:

int_array = np.array([1, 2, 3])
float_array = np.array([4.0, 5.0, 6.0])

result = np.vstack((int_array, float_array))
print(f"Result dtype: {result.dtype}")  # float64
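Promotion can also produce a dtype that none of the inputs had. In the sketch below, int8 and uint8 inputs yield int16, and np.result_type can be used to predict the outcome before stacking. The array names are illustrative.

```python
import numpy as np

small_signed = np.array([1, 2, 3], dtype=np.int8)
small_unsigned = np.array([4, 5, 6], dtype=np.uint8)

# Ask NumPy which dtype the combination will promote to
expected = np.result_type(small_signed, small_unsigned)
stacked = np.vstack((small_signed, small_unsigned))

print(expected)
print(stacked.dtype)
```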

For large-scale data pipelines, validate shapes before stacking to catch errors early:

import numpy as np

def safe_vstack(arrays):
    """Stack arrays vertically after checking that their shapes are compatible."""
    if len(arrays) == 0:
        raise ValueError("Empty array list")

    # vstack treats a 1D array of shape (N,) as a single row of shape (1, N)
    arrays = [np.atleast_2d(arr) for arr in arrays]
    expected = arrays[0].shape[1:]  # every axis except axis 0 must match

    for i, arr in enumerate(arrays[1:], 1):
        if arr.shape[1:] != expected:
            raise ValueError(
                f"Array {i} has trailing shape {arr.shape[1:]}, expected {expected}"
            )

    return np.vstack(arrays)

# Usage
try:
    result = safe_vstack([np.array([1, 2, 3]), np.array([4, 5])])
except ValueError as e:
    print(f"Error caught: {e}")

Choose the right function based on your intent: use vstack/hstack/dstack for quick prototyping and clear semantic meaning, but prefer np.concatenate() or np.stack() with explicit axis parameters in production code for better maintainability and debugging.