NumPy - Concatenate Arrays (np.concatenate)

import numpy as np

Key Insights

  • np.concatenate() joins arrays along an existing axis and requires all inputs to have the same number of dimensions and the same shape except along the concatenation axis, while np.stack() creates a new axis
  • The axis parameter sets the concatenation direction: for 2D arrays, axis=0 joins vertically (adding rows) and axis=1 horizontally (adding columns); the default is axis=0
  • For performance-critical code, pre-allocating an array (or collecting pieces in a list and concatenating once) is faster than repeated concatenation in a loop, because each np.concatenate() call allocates a new array and copies every input into it

Basic Concatenation Syntax

np.concatenate() takes a sequence of arrays and joins them along an existing axis. The fundamental requirement: all arrays must have identical shapes except along the concatenation axis.

import numpy as np

# 1D arrays - simple concatenation
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.concatenate([a, b])
print(result)  # [1 2 3 4 5 6]

# Multiple arrays at once
c = np.array([7, 8, 9])
result = np.concatenate([a, b, c])
print(result)  # [1 2 3 4 5 6 7 8 9]

The function accepts a tuple or list of arrays (or other array-like objects) as its first argument. For one-dimensional arrays the operation is straightforward: the arrays are joined end-to-end in the order given.
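A quick sketch of what "sequence of arrays" means in practice: a tuple of ndarrays and a list of plain Python lists both work, because each element is converted to an array before joining.

```python
import numpy as np

# A tuple of arrays behaves exactly like a list of arrays
from_tuple = np.concatenate((np.array([1, 2]), np.array([3, 4])))

# Plain Python lists are array-like, so NumPy converts each one automatically
from_lists = np.concatenate([[1, 2], [3, 4]])

print(from_tuple)  # [1 2 3 4]
print(from_lists)  # [1 2 3 4]
```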

Axis Parameter for Multi-Dimensional Arrays

The axis parameter controls which dimension grows. For a 2D array, axis=0 indexes rows, so concatenating along it adds rows; axis=1 indexes columns, so concatenating along it adds columns.

# 2D arrays - vertical stacking (axis=0, default)
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

vertical = np.concatenate([x, y], axis=0)
print(vertical)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Horizontal stacking (axis=1)
horizontal = np.concatenate([x, y], axis=1)
print(horizontal)
# [[1 2 5 6]
#  [3 4 7 8]]

# 3D arrays - concatenate along depth (axis=2)
a = np.array([[[1, 2]], [[3, 4]]])
b = np.array([[[5, 6]], [[7, 8]]])
depth_concat = np.concatenate([a, b], axis=2)
print(depth_concat.shape)  # (2, 1, 4)

Negative axis values work as expected: axis=-1 refers to the last axis, axis=-2 to the second-to-last, and so on.
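A small check of the negative-axis equivalence on the 2D arrays from above. It also shows axis=None, a documented option not covered in the text, which flattens every input before joining them end-to-end.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# For 2D arrays, axis=-1 is the same as axis=1, and axis=-2 the same as axis=0
last = np.concatenate([x, y], axis=-1)
second_last = np.concatenate([x, y], axis=-2)
print(np.array_equal(last, np.concatenate([x, y], axis=1)))         # True
print(np.array_equal(second_last, np.concatenate([x, y], axis=0)))  # True

# axis=None flattens each input first, then joins the flattened results
flat = np.concatenate([x, y], axis=None)
print(flat)  # [1 2 3 4 5 6 7 8]
```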

Shape Compatibility Requirements

Arrays must have the same number of dimensions, and their sizes must match on every axis except the concatenation axis, which is the only one allowed to differ.

# Compatible shapes
a = np.array([[1, 2, 3]])      # Shape: (1, 3)
b = np.array([[4, 5, 6],       # Shape: (2, 3)
              [7, 8, 9]])
result = np.concatenate([a, b], axis=0)
print(result.shape)  # (3, 3)

# Incompatible shapes - this will fail
try:
    a = np.array([[1, 2]])      # Shape: (1, 2)
    b = np.array([[3, 4, 5]])   # Shape: (1, 3)
    np.concatenate([a, b], axis=0)
except ValueError as e:
    print(f"Error: {e}")
    # Error: all the input array dimensions except for the concatenation axis must match exactly
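The compatibility rule is mechanical enough to write down as code. The helper below, concat_result_shape, is hypothetical (it is not part of NumPy); it is a minimal sketch that predicts the output shape from the input shapes, or raises ValueError when the rule is broken, and cross-checks itself against np.concatenate.

```python
import numpy as np


def concat_result_shape(shapes, axis=0):
    """Hypothetical helper: predict np.concatenate's output shape, or raise ValueError.

    Applies the rule described above: every input has the same number of
    dimensions, and every dimension except `axis` matches exactly.
    """
    ndim = len(shapes[0])
    if any(len(s) != ndim for s in shapes):
        raise ValueError("all inputs must have the same number of dimensions")
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of bounds for {ndim} dimensions")
    axis = axis % ndim  # normalize negative axes, e.g. -1 -> ndim - 1
    for d in range(ndim):
        if d != axis and len({s[d] for s in shapes}) > 1:
            raise ValueError(f"sizes differ along non-concatenation axis {d}")
    result = list(shapes[0])
    result[axis] = sum(s[axis] for s in shapes)  # only the joined axis grows
    return tuple(result)


print(concat_result_shape([(1, 3), (2, 3)], axis=0))   # (3, 3)
print(concat_result_shape([(2, 2), (2, 2)], axis=-1))  # (2, 4)

# Cross-check against NumPy itself
actual = np.concatenate([np.ones((1, 3)), np.ones((2, 3))], axis=0).shape
print(actual == concat_result_shape([(1, 3), (2, 3)]))  # True
```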

When arrays differ in size along one axis, concatenate along that axis; every other axis must still line up.

# Working with different row counts but same column count
data_batch1 = np.random.rand(100, 5)  # 100 samples, 5 features
data_batch2 = np.random.rand(50, 5)   # 50 samples, 5 features
combined = np.concatenate([data_batch1, data_batch2], axis=0)
print(combined.shape)  # (150, 5)
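Arrays with a different number of dimensions fail the same check, even when the sizes that do exist agree. A minimal sketch with made-up arrays: a 1D row cannot be appended to a 2D matrix until it gains a leading axis.

```python
import numpy as np

matrix = np.zeros((2, 3))          # Shape: (2, 3)
row = np.array([1.0, 2.0, 3.0])    # Shape: (3,) - only one dimension

# Mixing 2D and 1D inputs raises ValueError even though the column counts agree
try:
    np.concatenate([matrix, row], axis=0)
except ValueError as e:
    print(f"Error: {e}")

# row[np.newaxis, :] has shape (1, 3), which lines up with the matrix
fixed = np.concatenate([matrix, row[np.newaxis, :]], axis=0)
print(fixed.shape)  # (3, 3)
```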

Practical Use Cases

Building Datasets Incrementally

# Simulating streaming data collection
results = np.empty((0, 3))  # Empty array with 0 rows and the correct column count

for i in range(5):
    # Simulate getting a new batch of data
    new_data = np.random.rand(10, 3)
    # Each call copies all accumulated rows; fine for a few batches (see Performance Considerations)
    results = np.concatenate([results, new_data], axis=0)

print(results.shape)  # (50, 3)

Adding Padding or Borders

# Add border to image-like array
image = np.random.randint(0, 256, (28, 28))  # upper bound is exclusive, so values are 0-255
border_width = 2
border_value = 0

# Top and bottom borders span the image width
top_border = np.full((border_width, image.shape[1]), border_value)
bottom_border = np.full((border_width, image.shape[1]), border_value)
image_with_tb = np.concatenate([top_border, image, bottom_border], axis=0)

# Left and right borders span the new height (28 + 2 * border_width = 32)
left_border = np.full((image_with_tb.shape[0], border_width), border_value)
right_border = np.full((image_with_tb.shape[0], border_width), border_value)
image_bordered = np.concatenate([left_border, image_with_tb, right_border], axis=1)

print(image_bordered.shape)  # (32, 32)
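For a uniform border, np.pad (a separate NumPy function, swapped in here as an alternative) does the same job in one call. This sketch checks that it matches the manual concatenation approach for a constant zero border.

```python
import numpy as np

image = np.random.randint(0, 256, (28, 28))
border_width = 2

# mode="constant" pads every side of every axis with constant_values
padded = np.pad(image, border_width, mode="constant", constant_values=0)
print(padded.shape)  # (32, 32)

# Manual version using np.concatenate, as in the example above
tb = np.full((border_width, image.shape[1]), 0)
with_tb = np.concatenate([tb, image, tb], axis=0)
lr = np.full((with_tb.shape[0], border_width), 0)
manual = np.concatenate([lr, with_tb, lr], axis=1)
print(np.array_equal(padded, manual))  # True
```

np.concatenate remains the more flexible choice when the border pieces differ from one side to another or come from other data.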

Combining Feature Sets

# Machine learning scenario: combining different feature types
numerical_features = np.random.rand(1000, 10)
categorical_encoded = np.random.randint(0, 2, (1000, 5))
text_embeddings = np.random.randn(1000, 50)

all_features = np.concatenate([
    numerical_features,
    categorical_encoded,
    text_embeddings
], axis=1)

print(all_features.shape)  # (1000, 65)
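Two details of the feature example are worth checking. The integer one-hot block is promoted to float64 along with everything else, and the column boundaries are the running totals of the block widths (10, then 10 + 5 = 15). The sketch below uses np.split, the inverse operation, to recover the blocks; it is an illustration, not part of the original pipeline.

```python
import numpy as np

numerical_features = np.random.rand(1000, 10)
categorical_encoded = np.random.randint(0, 2, (1000, 5))
text_embeddings = np.random.randn(1000, 50)
all_features = np.concatenate(
    [numerical_features, categorical_encoded, text_embeddings], axis=1
)

# Mixed integer and float inputs are promoted to a common dtype
print(all_features.dtype)  # float64

# Split at the cumulative column offsets 10 and 15 to undo the concatenation
num_back, cat_back, text_back = np.split(all_features, [10, 15], axis=1)
print(num_back.shape, cat_back.shape, text_back.shape)  # (1000, 10) (1000, 5) (1000, 50)
print(np.array_equal(cat_back, categorical_encoded))   # True
```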

Performance Considerations

Repeated concatenation in a loop is a common bottleneck: each call allocates a new array and copies every element accumulated so far, so the total copying grows quadratically with the number of iterations.

import time

# Inefficient: repeated concatenation copies the growing array every iteration
start = time.perf_counter()
result = np.array([])
for i in range(1000):
    result = np.concatenate([result, np.array([i])])
inefficient_time = time.perf_counter() - start

# Efficient: pre-allocate once and assign in place
start = time.perf_counter()
result = np.empty(1000)
for i in range(1000):
    result[i] = i
efficient_time = time.perf_counter() - start

print(f"Inefficient: {inefficient_time:.4f}s")
print(f"Efficient: {efficient_time:.4f}s")
print(f"Speedup: {inefficient_time/efficient_time:.1f}x")  # exact figure varies by machine

When the final size is unknown, use Python lists for accumulation, then convert once:

# Better approach for unknown sizes
data_list = []
for i in range(1000):
    # Simulate variable-length processing
    batch = np.random.rand(np.random.randint(1, 10), 5)
    data_list.append(batch)

final_array = np.concatenate(data_list, axis=0)
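When the final size is known but the data still arrives as separate arrays, np.concatenate's out parameter can write the result straight into a pre-allocated array. A minimal sketch with made-up batch sizes:

```python
import numpy as np

a = np.random.rand(100, 5)
b = np.random.rand(50, 5)

# Allocate the destination once; its shape must equal the concatenated result's shape
buffer = np.empty((a.shape[0] + b.shape[0], a.shape[1]))

# out= makes np.concatenate fill the existing buffer instead of allocating a new array
np.concatenate([a, b], axis=0, out=buffer)

print(buffer.shape)  # (150, 5)
print(np.array_equal(buffer[:100], a), np.array_equal(buffer[100:], b))  # True True
```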

Alternative Stacking Functions

NumPy provides specialized functions that may be more appropriate depending on your use case:

# np.vstack - vertical stacking; 1D inputs are first promoted to shape (1, N), then joined along axis=0
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.vstack([a, b]))
# [[1 2 3]
#  [4 5 6]]

# np.hstack - horizontal stacking; joins 1D inputs along axis=0 and inputs with 2+ dimensions along axis=1
print(np.hstack([a, b]))  # [1 2 3 4 5 6]

# np.column_stack - stacks 1D arrays as columns
print(np.column_stack([a, b]))
# [[1 4]
#  [2 5]
#  [3 6]]

# np.stack - creates NEW axis
stacked = np.stack([a, b], axis=0)
print(stacked.shape)  # (2, 3) - new dimension created

The key difference: np.concatenate() joins along an existing axis, while np.stack() creates a new one.
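The difference is easiest to see in the shapes. This sketch joins the same two 2D arrays both ways, then rebuilds np.stack and np.vstack (for 1D inputs) from np.concatenate by adding an axis with np.newaxis, which is one way to see what those helpers do, not necessarily how NumPy implements them.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

joined = np.concatenate([x, y], axis=0)  # existing axis grows
stacked = np.stack([x, y], axis=0)       # new leading axis
print(joined.shape, stacked.shape)  # (4, 2) (2, 2, 2)

# Adding a new axis to each input, then concatenating along it, matches np.stack
manual_stack = np.concatenate([x[np.newaxis], y[np.newaxis]], axis=0)
print(np.array_equal(stacked, manual_stack))  # True

# For 1D inputs, np.vstack matches concatenating (1, N) versions along axis=0
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
manual_vstack = np.concatenate([a[np.newaxis], b[np.newaxis]], axis=0)
print(np.array_equal(np.vstack([a, b]), manual_vstack))  # True
```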

Common Pitfalls

Empty Array Handling

# Concatenating with empty arrays
a = np.array([1, 2, 3])
b = np.array([])

# This works, but np.array([]) defaults to float64, so the integers are upcast
result = np.concatenate([a, b])
print(result)  # [1. 2. 3.] - note dtype conversion

# Better: check before concatenating
arrays = [a, b]
non_empty = [arr for arr in arrays if arr.size > 0]
if non_empty:
    result = np.concatenate(non_empty)
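Two refinements to the filtering approach, sketched below: create placeholder empty arrays with a matching dtype so nothing is upcast, and handle the case where filtering leaves nothing, since np.concatenate raises ValueError for an empty sequence.

```python
import numpy as np

a = np.array([1, 2, 3])

# Giving the empty array the same dtype as `a` avoids the float64 upcast
b = np.array([], dtype=a.dtype)
matched = np.concatenate([a, b])
print(matched, matched.dtype == a.dtype)  # [1 2 3] True

# np.concatenate needs at least one array, so an empty sequence raises ValueError
try:
    np.concatenate([])
except ValueError as e:
    print(f"Error: {e}")

# If filtering can remove every array, supply an explicit fallback
arrays = [np.array([]), np.array([])]
non_empty = [arr for arr in arrays if arr.size > 0]
combined = np.concatenate(non_empty) if non_empty else np.empty(0)
print(combined.size)  # 0
```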

Dtype Mismatches

# Different dtypes get promoted
int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([4.5, 5.5, 6.5], dtype=np.float64)

result = np.concatenate([int_array, float_array])
print(result.dtype)  # float64 - integers promoted to float

# Casting the promoted result back to int32
result = np.concatenate([int_array, float_array]).astype(np.int32)
print(result)  # [1 2 3 4 5 6] - fractional parts truncated toward zero
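Since NumPy 1.20, np.concatenate also accepts dtype and casting keyword arguments, so the output dtype can be set in the call itself. A short sketch: the default casting="same_kind" refuses a float64 to int32 conversion, and casting="unsafe" opts in to it explicitly.

```python
import numpy as np

int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([4.5, 5.5, 6.5], dtype=np.float64)

# With the default casting="same_kind", float64 -> int32 is rejected with TypeError
try:
    np.concatenate([int_array, float_array], dtype=np.int32)
except TypeError as e:
    print(f"Error: {e}")

# casting="unsafe" allows the lossy conversion directly, with no intermediate float64 array
result = np.concatenate([int_array, float_array], dtype=np.int32, casting="unsafe")
print(result, result.dtype)  # [1 2 3 4 5 6] int32
```

Requiring casting="unsafe" makes the data loss visible at the call site instead of hiding it in a later .astype().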

np.concatenate() is the workhorse for array joining operations. Master axis manipulation, understand shape compatibility, and choose the right tool for your specific use case. For production code handling large datasets, always profile and consider pre-allocation strategies to avoid performance degradation.
