How to Concatenate Arrays in NumPy

Key Insights

  • np.concatenate() is the foundational function for joining arrays, while vstack(), hstack(), and dstack() provide semantic shortcuts for specific axis operations
  • Understanding the difference between hstack() and column_stack() for 1D arrays prevents subtle bugs—column_stack() treats 1D arrays as columns, while hstack() joins them end-to-end
  • Avoid concatenating arrays in loops; pre-allocate with np.empty() or collect in a list first, then concatenate once for dramatically better performance

Introduction

Array concatenation is one of the most frequent operations in data manipulation. Whether you’re merging datasets, combining feature matrices, or assembling image channels, you’ll reach for NumPy’s concatenation functions regularly.

NumPy provides several functions for this purpose, each optimized for different scenarios. Choosing the right one improves both code readability and performance. This guide covers all the major concatenation methods, explains when to use each, and highlights the performance pitfalls that catch many developers off guard.

Using np.concatenate() — The Core Function

np.concatenate() is the fundamental array-joining function in NumPy. All other stacking functions are essentially convenience wrappers around it. Understanding concatenate() thoroughly gives you the foundation to use everything else effectively.

The function signature is straightforward:

np.concatenate((a1, a2, ...), axis=0, out=None, dtype=None, casting="same_kind")

The axis parameter determines along which dimension the arrays are joined. For 1D arrays, there’s only one axis (0). For 2D arrays, axis=0 joins along the rows, so the result gains rows, and axis=1 joins along the columns, so the result gains columns.

import numpy as np

# Concatenating 1D arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

result = np.concatenate((a, b))
print(result)
# Output: [1 2 3 4 5 6]

# Concatenating 2D arrays along axis 0 (rows)
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

row_concat = np.concatenate((matrix_a, matrix_b), axis=0)
print(row_concat)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Concatenating 2D arrays along axis 1 (columns)
col_concat = np.concatenate((matrix_a, matrix_b), axis=1)
print(col_concat)
# Output:
# [[1 2 5 6]
#  [3 4 7 8]]

A critical requirement: the arrays must have the same number of dimensions and the same shape along every axis except the concatenation axis. Attempting to concatenate arrays whose other dimensions do not match raises a ValueError.

# This fails - different number of columns
a = np.array([[1, 2, 3]])
b = np.array([[4, 5]])

# np.concatenate((a, b), axis=0)  # ValueError!
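
To make the shape rule concrete, here is a minimal sketch that checks compatibility before joining and shows the error you would otherwise hit. The helper name can_concatenate is hypothetical (it is not part of NumPy) and exists only to illustrate the rule.

```python
import numpy as np

def can_concatenate(arrays, axis=0):
    """Hypothetical helper: True if shapes agree on every axis except `axis`."""
    shapes = [np.shape(arr) for arr in arrays]
    ndim = len(shapes[0])
    # Zero-dimensional inputs and mixed dimensionality cannot be concatenated
    if ndim == 0 or any(len(shape) != ndim for shape in shapes):
        return False
    axis = axis % ndim  # normalise negative axes such as -1
    trimmed = [shape[:axis] + shape[axis + 1:] for shape in shapes]
    return all(shape == trimmed[0] for shape in trimmed)

a = np.array([[1, 2, 3]])  # shape (1, 3)
b = np.array([[4, 5]])     # shape (1, 2)

print(can_concatenate((a, b), axis=0))  # False: column counts differ
print(can_concatenate((a, b), axis=1))  # True: both have exactly one row

try:
    np.concatenate((a, b), axis=0)
except ValueError as exc:
    print(f"ValueError: {exc}")

# Joining along axis 1 works because the row counts match
joined = np.concatenate((a, b), axis=1)
print(joined)  # [[1 2 3 4 5]]
```

In practice you rarely need a helper like this; the point is simply that only the length of the concatenation axis is allowed to differ.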

Vertical Stacking with np.vstack() and np.row_stack()

np.vstack() stacks arrays vertically (along axis 0). For inputs with two or more dimensions it’s equivalent to np.concatenate() with axis=0, but it has one key advantage: it handles 1D arrays more intuitively by first reshaping each array of shape (N,) to a row of shape (1, N).

import numpy as np

# vstack with 1D arrays - converts them to row vectors first
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.vstack((a, b))
print(stacked)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Compare with concatenate - it does not fail, but it keeps the result 1D
# np.concatenate((a, b), axis=0)  # Gives [1 2 3 4 5 6], not a 2x3 matrix

# Stacking multiple 2D arrays
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6]])
matrix_c = np.array([[7, 8], [9, 10]])

combined = np.vstack((matrix_a, matrix_b, matrix_c))
print(combined)
# Output:
# [[ 1  2]
#  [ 3  4]
#  [ 5  6]
#  [ 7  8]
#  [ 9 10]]

np.row_stack() is simply an alias for np.vstack(). The alias is deprecated as of NumPy 2.0, so prefer np.vstack() in new code.

Use vstack() when you’re thinking in terms of “adding more rows to a table” or when working with 1D arrays that represent row vectors.
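
As a rough sketch of what vstack() does for 1D inputs, the snippet below reproduces its result with np.atleast_2d() plus np.concatenate(), then appends a 1D row to an existing table. The variable names (table, new_row) are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Promote each 1D array to shape (1, 3), then join along axis 0
manual = np.concatenate((np.atleast_2d(a), np.atleast_2d(b)), axis=0)

print(manual.shape)                                # (2, 3)
print(np.array_equal(manual, np.vstack((a, b))))   # True

# Appending a 1D row to an existing 2D table
table = np.array([[1, 2, 3], [4, 5, 6]])
new_row = np.array([7, 8, 9])
extended = np.vstack((table, new_row))
print(extended.shape)                              # (3, 3)
```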

Horizontal Stacking with np.hstack() and np.column_stack()

np.hstack() stacks arrays horizontally (along axis 1 for 2D arrays, axis 0 for 1D arrays). This is where things get interesting, because hstack() and column_stack() behave differently with 1D inputs.

import numpy as np

# hstack with 1D arrays - joins them end-to-end
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

h_result = np.hstack((a, b))
print(h_result)
# Output: [1 2 3 4 5 6]  - Still 1D!

# column_stack with 1D arrays - treats each as a column
c_result = np.column_stack((a, b))
print(c_result)
# Output:
# [[1 4]
#  [2 5]
#  [3 6]]

This distinction matters enormously. If you have feature vectors and want to combine them into a feature matrix where each original array becomes a column, use column_stack(). If you want to extend a 1D array with more elements, use hstack().

# For 2D arrays, hstack and column_stack behave identically
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5], [6]])

print(np.hstack((matrix_a, matrix_b)))
# Output:
# [[1 2 5]
#  [3 4 6]]

print(np.column_stack((matrix_a, matrix_b)))
# Output:
# [[1 2 5]
#  [3 4 6]]

My recommendation: prefer column_stack() when you conceptually want to add columns to a dataset, even if hstack() would work. The intent is clearer.
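
Here is a small sketch of that recommendation in a feature-matrix setting. The feature names and values are made up for illustration; the last lines show the extra reshaping hstack() would need to produce the same result.

```python
import numpy as np

# Hypothetical feature vectors, one value per sample
height_cm = np.array([170.0, 182.0, 165.0])
weight_kg = np.array([65.0, 80.0, 54.0])
age_years = np.array([31.0, 45.0, 28.0])

# Each 1D array becomes one column of the feature matrix
features = np.column_stack((height_cm, weight_kg))
print(features.shape)  # (3, 2)

# A 1D array can be appended to an existing 2D matrix as a new column
features = np.column_stack((features, age_years))
print(features.shape)  # (3, 3)

# hstack() gives the same matrix only after reshaping each vector to (3, 1)
same = np.hstack((height_cm[:, np.newaxis],
                  weight_kg[:, np.newaxis],
                  age_years[:, np.newaxis]))
print(np.array_equal(features, same))  # True
```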

Depth Stacking with np.dstack()

np.dstack() stacks arrays along the third axis (depth, axis 2). This creates 3D arrays from 2D inputs, which is particularly useful for image processing where you might combine separate color channels.

import numpy as np

# Stacking 2D arrays into a 3D array
red_channel = np.array([[255, 0], [128, 64]])
green_channel = np.array([[0, 255], [64, 128]])
blue_channel = np.array([[0, 0], [192, 32]])

rgb_image = np.dstack((red_channel, green_channel, blue_channel))
print(rgb_image.shape)
# Output: (2, 2, 3)

print(rgb_image[0, 0])  # First pixel
# Output: [255   0   0]  - RGB values for top-left pixel

# Works with 1D arrays too - promotes to 3D
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

depth_stacked = np.dstack((a, b))
print(depth_stacked.shape)
# Output: (1, 3, 2)

While dstack() is less commonly used than its vertical and horizontal counterparts, it’s invaluable when working with multi-channel data like RGB images, time-series with multiple features, or any scenario requiring a third dimension.
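
For 2D inputs, dstack() can be related to two other NumPy functions, as the sketch below suggests: np.stack() with axis=-1 builds the same array, and np.dsplit() takes it apart again. The channel values are the same toy numbers used earlier, not real image data.

```python
import numpy as np

red = np.array([[255, 0], [128, 64]])
green = np.array([[0, 255], [64, 128]])
blue = np.array([[0, 0], [192, 32]])

rgb = np.dstack((red, green, blue))

# For 2D inputs, stacking along a new last axis gives the same result
print(np.array_equal(rgb, np.stack((red, green, blue), axis=-1)))  # True

# Indexing the last axis recovers a single channel
print(np.array_equal(rgb[:, :, 1], green))  # True

# np.dsplit() splits along axis 2; each piece keeps a length-1 depth axis
channels = np.dsplit(rgb, 3)
print(len(channels), channels[0].shape)  # 3 (2, 2, 1)
```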

Performance Considerations and Best Practices

Here’s where many developers hurt their application’s performance: concatenating arrays inside loops.

import numpy as np
import time

# BAD: Concatenating in a loop
def slow_concatenation(n):
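    # Start with an integer array; a bare np.array([]) is float64 and would
    # silently turn every value into a float
    result = np.array([], dtype=np.int64)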
    for i in range(n):
        result = np.concatenate((result, np.array([i * 2])))
    return result

# GOOD: Pre-allocate with np.empty()
def fast_preallocated(n):
    result = np.empty(n, dtype=np.int64)
    for i in range(n):
        result[i] = i * 2
    return result

# GOOD: Collect in list, concatenate once
def fast_list_approach(n):
    chunks = []
    for i in range(n):
        chunks.append(np.array([i * 2]))
    return np.concatenate(chunks)

# Benchmark (time.perf_counter() is better suited to short intervals than time.time())
n = 10000

start = time.perf_counter()
slow_concatenation(n)
print(f"Loop concatenation: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
fast_preallocated(n)
print(f"Pre-allocated: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
fast_list_approach(n)
print(f"List + single concat: {time.perf_counter() - start:.4f}s")

# Example output (absolute timings vary by machine and NumPy version):
# Loop concatenation: 0.1842s
# Pre-allocated: 0.0031s
# List + single concat: 0.0089s

The difference is dramatic. Loop concatenation is O(n²) because each call allocates a new array and copies all existing data into it. Pre-allocation and the collect-then-concatenate approach are both O(n), since each element is written only once.
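
To see where the quadratic cost comes from, here is a back-of-the-envelope model in plain Python. It only counts how many elements get written, assuming one element is appended per iteration; it does not measure real NumPy timings, and both function names are hypothetical.

```python
def elements_copied_loop(n):
    """Elements written when an array grows by one element per concatenation."""
    copied = 0
    size = 0
    for _ in range(n):
        # Each concatenation writes the old contents plus the new element
        # into a freshly allocated array
        copied += size + 1
        size += 1
    return copied

def elements_copied_single(n):
    """Elements written when everything is joined in one final concatenation."""
    return n

for n in (10, 100, 1000):
    print(n, elements_copied_loop(n), elements_copied_single(n))
# 10 55 10
# 100 5050 100
# 1000 500500 1000
```

The loop version writes n(n + 1)/2 elements in total, which is why multiplying n by 10 makes it roughly 100 times more expensive.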

Additional best practices:

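  1. Use the out parameter when possible: If you already have a destination array of the correct shape, pass it as out so the result is written there instead of into a newly allocated array.

  2. Match dtypes: Concatenating arrays with different dtypes promotes the result to a common dtype. This can silently change your data type (for example, integers become floats) and adds conversion work.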

  3. Consider np.block(): For complex arrangements of arrays into grids, np.block() can be more readable than nested stacking calls.

# np.block for grid arrangements
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[9, 10], [11, 12]])
D = np.array([[13, 14], [15, 16]])

grid = np.block([[A, B],
                 [C, D]])
print(grid)
# Output:
# [[ 1  2  5  6]
#  [ 3  4  7  8]
#  [ 9 10 11 12]
#  [13 14 15 16]]
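
Returning to the first two tips in the list above, the following sketch shows the out parameter and dtype promotion in action. The array contents are arbitrary, and the final call assumes NumPy 1.20 or newer, where concatenate() accepts a dtype argument.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
b = np.array([4, 5, 6], dtype=np.int64)

# Tip 1: write the result into an existing buffer of the right shape
dest = np.empty(6, dtype=np.int64)
np.concatenate((a, b), out=dest)
print(dest)  # [1 2 3 4 5 6]

# Tip 2: mixed dtypes are promoted to a common type
ints = np.array([1, 2], dtype=np.int32)
floats = np.array([0.5, 1.5], dtype=np.float64)
mixed = np.concatenate((ints, floats))
print(mixed.dtype)  # float64

# If you want a specific result type, request it explicitly
# (assumes NumPy >= 1.20; the default casting rule is "same_kind")
as_float32 = np.concatenate((ints, floats), dtype=np.float32)
print(as_float32.dtype)  # float32
```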

Quick Reference Summary

Function                Axis             1D Array Behavior                   Primary Use Case
concatenate()           Any (default 0)  Joins end-to-end                    General-purpose, explicit axis control
vstack() / row_stack()  0                Converts to row vectors, stacks     Adding rows to a matrix
hstack()                1 (0 for 1D)     Joins end-to-end                    Extending arrays horizontally
column_stack()          1                Converts to column vectors, stacks  Adding columns to a matrix
dstack()                2                Promotes to 3D                      Creating 3D arrays, image channels
block()                 Grid-based       Follows grid structure              Complex 2D arrangements

Choose concatenate() when you need explicit axis control or are working with higher-dimensional arrays. Use the specialized stacking functions when their names match your intent—they make code more self-documenting and handle edge cases with 1D arrays more gracefully.
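
As a quick sanity check of the 1D behavior summarized above, this sketch feeds the same two 1D arrays to each function and records the shape that comes back.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

shapes = {
    "concatenate": np.concatenate((a, b)).shape,    # (6,)      joined end-to-end
    "vstack": np.vstack((a, b)).shape,              # (2, 3)    each array is a row
    "hstack": np.hstack((a, b)).shape,              # (6,)      joined end-to-end
    "column_stack": np.column_stack((a, b)).shape,  # (3, 2)    each array is a column
    "dstack": np.dstack((a, b)).shape,              # (1, 3, 2) promoted to 3D
}

for name, shape in shapes.items():
    print(f"{name:13s} -> {shape}")
```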
