NumPy - Concatenate Arrays (np.concatenate)
Key Insights
- np.concatenate() joins arrays along existing axes and requires all input arrays to have the same shape except in the concatenation dimension, while np.stack() creates a new axis
- The axis parameter determines the concatenation direction: axis=0 stacks vertically (rows), axis=1 horizontally (columns), with default axis=0
- For performance-critical applications, pre-allocating arrays is faster than repeated concatenation in loops, as each np.concatenate() call creates a new array in memory
Basic Concatenation Syntax
np.concatenate() takes a sequence of arrays and joins them along an existing axis. The fundamental requirement: all arrays must have identical shapes except along the concatenation axis.
import numpy as np
# 1D arrays - simple concatenation
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.concatenate([a, b])
print(result) # [1 2 3 4 5 6]
# Multiple arrays at once
c = np.array([7, 8, 9])
result = np.concatenate([a, b, c])
print(result) # [1 2 3 4 5 6 7 8 9]
The function accepts a tuple or list of arrays as its first argument. For one-dimensional arrays the operation is straightforward: the arrays are joined end to end.
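As a small illustration of the paragraph above, the following sketch passes the same arrays once as a list and once as a tuple and checks that the results match. It also shows axis=None, which flattens every input before joining. The variable names (from_list, from_tuple, m, flat) are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# A list and a tuple of arrays are interchangeable as the first argument
from_list = np.concatenate([a, b])
from_tuple = np.concatenate((a, b))
print(np.array_equal(from_list, from_tuple))  # True

# axis=None flattens each input first, so the shapes need not match
m = np.array([[1, 2], [3, 4]])
flat = np.concatenate([m, a], axis=None)
print(flat)  # [1 2 3 4 1 2 3]
```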
Axis Parameter for Multi-Dimensional Arrays
The axis parameter controls which dimension to concatenate along. Understanding axis orientation is critical for correct array manipulation.
# 2D arrays - vertical stacking (axis=0, default)
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
vertical = np.concatenate([x, y], axis=0)
print(vertical)
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]
# Horizontal stacking (axis=1)
horizontal = np.concatenate([x, y], axis=1)
print(horizontal)
# [[1 2 5 6]
#  [3 4 7 8]]
# 3D arrays - concatenate along depth (axis=2)
a = np.array([[[1, 2]], [[3, 4]]])
b = np.array([[[5, 6]], [[7, 8]]])
depth_concat = np.concatenate([a, b], axis=2)
print(depth_concat.shape) # (2, 1, 4)
Negative axis values work as expected: axis=-1 refers to the last axis, axis=-2 to the second-to-last, and so on.
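To make the negative-axis rule concrete, here is a minimal sketch on 3D arrays. The shapes are arbitrary choices for illustration.

```python
import numpy as np

a = np.zeros((2, 3, 4))
b = np.ones((2, 3, 5))

# axis=-1 is the last axis, which is axis=2 for a 3D array
last = np.concatenate([a, b], axis=-1)
same = np.concatenate([a, b], axis=2)
print(last.shape)                  # (2, 3, 9)
print(np.array_equal(last, same))  # True

# axis=-2 is the second-to-last axis, which is axis=1 here
c = np.ones((2, 1, 4))
second_last = np.concatenate([a, c], axis=-2)
print(second_last.shape)           # (2, 4, 4)
```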
Shape Compatibility Requirements
Arrays must have compatible shapes. The only dimension that can differ is the concatenation axis.
# Compatible shapes
a = np.array([[1, 2, 3]])    # Shape: (1, 3)
b = np.array([[4, 5, 6],     # Shape: (2, 3)
              [7, 8, 9]])
result = np.concatenate([a, b], axis=0)
print(result.shape) # (3, 3)
# Incompatible shapes - this will fail
try:
    a = np.array([[1, 2]])     # Shape: (1, 2)
    b = np.array([[3, 4, 5]])  # Shape: (1, 3)
    np.concatenate([a, b], axis=0)
except ValueError as e:
    print(f"Error: {e}")
# Error: all the input array dimensions except for the concatenation axis must match exactly
When the arrays differ in size, make sure they differ only along the concatenation axis; every other axis must have the same length in all inputs.
# Working with different row counts but same column count
data_batch1 = np.random.rand(100, 5) # 100 samples, 5 features
data_batch2 = np.random.rand(50, 5) # 50 samples, 5 features
combined = np.concatenate([data_batch1, data_batch2], axis=0)
print(combined.shape) # (150, 5)
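A related case is an input with a different number of dimensions, such as a 1D row that should be appended to a 2D table. np.concatenate() rejects this, so one option is to add the missing axis first with np.newaxis. The sketch below assumes small made-up arrays (table, row, col).

```python
import numpy as np

table = np.arange(6).reshape(2, 3)  # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# Passing a 1D array together with a 2D array raises ValueError
try:
    np.concatenate([table, row], axis=0)
except ValueError as exc:
    print(f"Error: {exc}")

# Add a leading axis so the row has shape (1, 3), then append it
row_2d = row[np.newaxis, :]
extended = np.concatenate([table, row_2d], axis=0)
print(extended.shape)  # (3, 3)

# The same idea appends a column: give it shape (2, 1) and use axis=1
col = np.array([7, 8])
col_2d = col[:, np.newaxis]
widened = np.concatenate([table, col_2d], axis=1)
print(widened.shape)   # (2, 4)
```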
Practical Use Cases
Building Datasets Incrementally
# Simulating streaming data collection
results = np.empty((0, 3))  # Start with zero rows and the correct column count
for i in range(5):
    # Simulate getting a new batch of data
    new_data = np.random.rand(10, 3)
    results = np.concatenate([results, new_data], axis=0)
print(results.shape) # (50, 3)
Adding Padding or Borders
# Add border to image-like array
image = np.random.randint(0, 256, (28, 28))  # upper bound is exclusive, so values span 0-255
border_width = 2
border_value = 0
# Top and bottom borders
top_border = np.full((border_width, 28), border_value)
bottom_border = np.full((border_width, 28), border_value)
image_with_tb = np.concatenate([top_border, image, bottom_border], axis=0)
# Left and right borders
left_border = np.full((32, border_width), border_value)
right_border = np.full((32, border_width), border_value)
image_bordered = np.concatenate([left_border, image_with_tb, right_border], axis=1)
print(image_bordered.shape) # (32, 32)
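For constant borders like this one, np.pad() may be a shorter alternative to building the border arrays by hand. The sketch below uses a deterministic stand-in image (an assumption made so the comparison is reproducible) and checks that both approaches give the same result.

```python
import numpy as np

image = np.arange(28 * 28).reshape(28, 28)  # stand-in for an image
border_width = 2

# Manual approach: concatenate zero-filled borders on each side
top_bottom = np.zeros((border_width, 28), dtype=image.dtype)
sides = np.zeros((28 + 2 * border_width, border_width), dtype=image.dtype)
manual = np.concatenate([top_bottom, image, top_bottom], axis=0)
manual = np.concatenate([sides, manual, sides], axis=1)

# np.pad adds the same border on every side in one call
padded = np.pad(image, pad_width=border_width, mode="constant", constant_values=0)
print(padded.shape)                    # (32, 32)
print(np.array_equal(manual, padded))  # True
```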
Combining Feature Sets
# Machine learning scenario: combining different feature types
numerical_features = np.random.rand(1000, 10)
categorical_encoded = np.random.randint(0, 2, (1000, 5))
text_embeddings = np.random.randn(1000, 50)
all_features = np.concatenate([
    numerical_features,
    categorical_encoded,
    text_embeddings
], axis=1)
print(all_features.shape) # (1000, 65)
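Once feature blocks are joined, it is often useful to recover them later. One possible way, sketched below with a tiny 4-row dataset and hypothetical variable names, is to record the column boundaries and pass them to np.split(). Note that the integer block comes back as float64 because concatenation promoted it.

```python
import numpy as np

rng = np.random.default_rng(0)
numerical = rng.random((4, 10))
categorical = rng.integers(0, 2, (4, 5))
embeddings = rng.standard_normal((4, 50))

all_features = np.concatenate([numerical, categorical, embeddings], axis=1)
print(all_features.dtype)  # float64

# Column indices at which one block ends and the next begins
widths = [numerical.shape[1], categorical.shape[1], embeddings.shape[1]]
boundaries = np.cumsum(widths)[:-1]  # [10 15]

# np.split reverses the concatenation along the same axis
num_back, cat_back, emb_back = np.split(all_features, boundaries, axis=1)
print(num_back.shape, cat_back.shape, emb_back.shape)  # (4, 10) (4, 5) (4, 50)
```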
Performance Considerations
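Repeated concatenation in loops creates a performance bottleneck: each call allocates a new array and copies every existing element into it, so building an array of n elements one at a time costs on the order of n² element copies.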
import time
# Inefficient: repeated concatenation copies the whole array on every iteration
start = time.perf_counter()
result = np.array([])
for i in range(1000):
    result = np.concatenate([result, np.array([i])])
inefficient_time = time.perf_counter() - start
# Efficient: pre-allocate once and assign in place
start = time.perf_counter()
result = np.empty(1000)
for i in range(1000):
    result[i] = i
efficient_time = time.perf_counter() - start
print(f"Inefficient: {inefficient_time:.4f}s")
print(f"Efficient: {efficient_time:.4f}s")
print(f"Speedup: {inefficient_time/efficient_time:.1f}x")
When the final size is unknown, use Python lists for accumulation, then convert once:
# Better approach for unknown sizes
data_list = []
for i in range(1000):
    # Simulate variable-length processing
    batch = np.random.rand(np.random.randint(1, 10), 5)
    data_list.append(batch)
final_array = np.concatenate(data_list, axis=0)
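When the batch sizes can be computed before any data is produced, a third option is to pre-allocate the full output and write each batch into its slice. This is only a sketch: the batch sizes and feature count below are hypothetical, and the batches list is kept solely to verify the result against a single np.concatenate() call.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_sizes = [3, 5, 2]  # hypothetical sizes, assumed known up front
n_features = 5

# Allocate the final array once
out = np.empty((sum(batch_sizes), n_features))

offset = 0
batches = []  # kept only for the comparison below
for size in batch_sizes:
    batch = rng.random((size, n_features))
    out[offset:offset + size] = batch  # copy into the reserved rows
    offset += size
    batches.append(batch)

print(out.shape)  # (10, 5)
print(np.array_equal(out, np.concatenate(batches, axis=0)))  # True
```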
Alternatives and Related Functions
NumPy provides specialized functions that may be more appropriate depending on your use case:
# np.vstack - vertical stacking (1D inputs are first reshaped to (1, N), then concatenated along axis=0)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.vstack([a, b]))
# [[1 2 3]
#  [4 5 6]]
# np.hstack - horizontal stacking (concatenates along axis=1, or along axis=0 for 1D inputs)
print(np.hstack([a, b])) # [1 2 3 4 5 6]
# np.column_stack - stacks 1D arrays as columns
print(np.column_stack([a, b]))
# [[1 4]
#  [2 5]
#  [3 6]]
# np.stack - creates NEW axis
stacked = np.stack([a, b], axis=0)
print(stacked.shape) # (2, 3) - new dimension created
The key difference: np.concatenate() joins along an existing axis, while np.stack() creates a new one.
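The difference can be checked directly from the result shapes. The sketch below also shows that stacking 1D arrays matches concatenating them after each has been given a new leading axis, which is one way to think about what np.stack() does.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate([a, b])     # existing axis grows
stacked = np.stack([a, b], axis=0)  # new axis of length 2
print(joined.shape)   # (6,)
print(stacked.shape)  # (2, 3)

# Same result: add the new axis by hand, then concatenate along it
via_concat = np.concatenate([a[np.newaxis, :], b[np.newaxis, :]], axis=0)
print(np.array_equal(stacked, via_concat))  # True

# Stacking 1D arrays along axis=1 matches np.column_stack
stacked_cols = np.stack([a, b], axis=1)
print(np.array_equal(stacked_cols, np.column_stack([a, b])))  # True
```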
Common Pitfalls
Empty Array Handling
# Concatenating with empty arrays
a = np.array([1, 2, 3])
b = np.array([])
# This works, but np.array([]) defaults to float64, so the integers are promoted
result = np.concatenate([a, b])
print(result) # [1. 2. 3.] - note dtype conversion
# Better: filter out empty arrays before concatenating
arrays = [a, b]
non_empty = [arr for arr in arrays if arr.size > 0]
if non_empty:
    result = np.concatenate(non_empty)
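Two details behind this pitfall can be shown in a short sketch. First, an empty array created with an explicit dtype does not trigger promotion. Second, the if non_empty: guard matters because np.concatenate() raises ValueError when given an empty sequence. The names untyped_empty and typed_empty are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

untyped_empty = np.array([])             # float64 by default
typed_empty = np.array([], dtype=a.dtype)

# The untyped empty array promotes the result to float64
print(np.concatenate([a, untyped_empty]).dtype)           # float64
# The typed empty array leaves the integer dtype unchanged
print(np.concatenate([a, typed_empty]).dtype == a.dtype)  # True

# An empty sequence of arrays is an error, hence the guard
try:
    np.concatenate([])
except ValueError as exc:
    print(f"Error: {exc}")
```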
Dtype Mismatches
# Different dtypes get promoted
int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([4.5, 5.5, 6.5], dtype=np.float64)
result = np.concatenate([int_array, float_array])
print(result.dtype) # float64 - integers promoted to float
# Explicit dtype control
result = np.concatenate([int_array, float_array]).astype(np.int32)
print(result) # [1 2 3 4 5 6] - truncation occurred
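In NumPy 1.20 and later, np.concatenate() also accepts dtype and casting arguments, which can avoid the intermediate float64 array. Because the default casting rule is "same_kind", a float-to-integer conversion has to be requested explicitly with casting="unsafe". The sketch below assumes a recent enough NumPy version and compares the result with the two-step astype() approach.

```python
import numpy as np

int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([4.5, 5.5, 6.5], dtype=np.float64)

# Requires NumPy >= 1.20; casting="unsafe" permits float -> int truncation
direct = np.concatenate([int_array, float_array], dtype=np.int32, casting="unsafe")
two_step = np.concatenate([int_array, float_array]).astype(np.int32)

print(direct)                            # [1 2 3 4 5 6]
print(direct.dtype)                      # int32
print(np.array_equal(direct, two_step))  # True
```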
np.concatenate() is the workhorse for array joining operations. Master axis manipulation, understand shape compatibility, and choose the right tool for your specific use case. For production code handling large datasets, always profile and consider pre-allocation strategies to avoid performance degradation.