NumPy - Flatten Array (flatten vs ravel)
Key Insights
- flatten() always returns a copy in newly allocated memory, while ravel() returns a view whenever the memory layout permits, which makes it more memory-efficient for large datasets
- ravel() does not fail on non-contiguous arrays; it silently falls back to a copy, so only flatten() guarantees independence from the input regardless of memory layout
- For performance-critical code processing millions of elements, ravel() on a contiguous array skips the allocation and copy entirely, so it can be orders of magnitude faster than flatten()
Understanding Array Flattening in NumPy
Array flattening converts a multi-dimensional array into a one-dimensional array. NumPy provides two primary methods: flatten() and ravel(). While both produce the same output shape, their underlying behavior differs significantly in memory management and performance characteristics.
import numpy as np
# Create a 2D array
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
flat_result = arr.flatten()
ravel_result = arr.ravel()
print(f"Original shape: {arr.shape}")
print(f"Flattened: {flat_result}") # [1 2 3 4 5 6 7 8 9]
print(f"Raveled: {ravel_result}") # [1 2 3 4 5 6 7 8 9]
Memory Behavior: Copy vs View
The critical difference lies in memory allocation. flatten() always creates a copy, while ravel() returns a view when the array’s memory layout permits.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# flatten() creates a copy
flat = arr.flatten()
flat[0] = 999
print(f"Original after flatten modification: {arr[0, 0]}") # 1 (unchanged)
# ravel() returns a view (for contiguous arrays)
rav = arr.ravel()
rav[0] = 999
print(f"Original after ravel modification: {arr[0, 0]}") # 999 (changed!)
This behavior has important implications. Modifying a ravel() result affects the original array, while flatten() modifications remain isolated. Use flatten() when you need guaranteed independence from the source array.
Checking View vs Copy
You can verify whether ravel() returned a view or copy by checking the base attribute:
import numpy as np
arr = np.array([[1, 2], [3, 4]])
flat = arr.flatten()
rav = arr.ravel()
print(f"flatten() is a copy: {flat.base is None}") # True
print(f"ravel() is a view: {rav.base is not None}") # True
print(f"ravel() base is arr: {rav.base is arr}") # True
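The base check can be harder to read when the source array is itself a view, because base may refer to the array that owns the memory rather than to the immediate parent. A minimal sketch, assuming np.shares_memory as a more direct test (the variable names here are illustrative):

```python
import numpy as np

owner = np.arange(12)          # owns its memory
arr = owner.reshape(3, 4)      # arr is itself a view of owner

rav = arr.ravel()              # view: arr is C-contiguous
flat = arr.flatten()           # always a copy

# rav.base may point at owner instead of arr, so compare memory directly
print(np.shares_memory(arr, rav))    # True
print(np.shares_memory(arr, flat))   # False
```

np.shares_memory answers the question that usually matters in practice: whether writing to one array can change the other.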
Performance Comparison
The performance difference becomes dramatic with large arrays:
import numpy as np
import time
# Create a large array
large_arr = np.random.rand(1000, 1000)
# Benchmark flatten()
start = time.perf_counter()
for _ in range(1000):
    _ = large_arr.flatten()
flatten_time = time.perf_counter() - start
# Benchmark ravel()
start = time.perf_counter()
for _ in range(1000):
    _ = large_arr.ravel()
ravel_time = time.perf_counter() - start
print(f"flatten() time: {flatten_time:.4f}s")
print(f"ravel() time: {ravel_time:.4f}s")
print(f"ravel() is {flatten_time/ravel_time:.1f}x faster")
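On a 1000x1000 array of float64 values, flatten() allocates and copies about 8 MB on every call, while ravel() only creates a new array object over the existing buffer. ravel() is therefore typically orders of magnitude faster here; the exact ratio depends on hardware and NumPy version.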
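Hand-rolled timing loops are sensitive to noise. As a hedged alternative, the same comparison can be sketched with the standard library's timeit module; the repeat and number values below are arbitrary choices, and the absolute timings will vary by machine:

```python
import timeit
import numpy as np

large_arr = np.random.rand(1000, 1000)
calls = 20  # calls per trial (arbitrary, kept small because flatten() copies 8 MB)

# timeit.repeat returns one total time per trial; the minimum is usually
# the least noisy estimate of the cost.
flatten_best = min(timeit.repeat(large_arr.flatten, number=calls, repeat=5))
ravel_best = min(timeit.repeat(large_arr.ravel, number=calls, repeat=5))

print(f"flatten(): {flatten_best / calls * 1e6:.1f} us per call")
print(f"ravel():   {ravel_best / calls * 1e6:.3f} us per call")
```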
Order Parameters: C vs Fortran
Both methods support order parameters that control how multi-dimensional data maps to one dimension:
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6]])
# C-order (row-major, default)
c_order = arr.flatten(order='C')
print(f"C-order: {c_order}") # [1 2 3 4 5 6]
# Fortran-order (column-major)
f_order = arr.flatten(order='F')
print(f"F-order: {f_order}") # [1 4 2 5 3 6]
# Also works with ravel()
rav_f = arr.ravel(order='F')
print(f"Ravel F-order: {rav_f}") # [1 4 2 5 3 6]
C-order reads elements row by row, while Fortran-order reads column by column. This matters when interfacing with libraries that expect specific memory layouts.
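The order argument also interacts with the view/copy behavior: ravel() can only return a view when the requested traversal order matches how the data is laid out in memory. A small sketch, using np.asfortranarray to build a column-major copy of the same values:

```python
import numpy as np

c_arr = np.array([[1, 2, 3],
                  [4, 5, 6]])        # C-contiguous (row-major) by default
f_arr = np.asfortranarray(c_arr)     # same values, column-major memory

# Column-major traversal of row-major memory requires a copy
c_as_f = c_arr.ravel(order='F')
# Column-major traversal of column-major memory can be a view
f_as_f = f_arr.ravel(order='F')

print(c_as_f, np.shares_memory(c_arr, c_as_f))  # [1 4 2 5 3 6] False
print(f_as_f, np.shares_memory(f_arr, f_as_f))  # [1 4 2 5 3 6] True
```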
When ravel() Returns a Copy
ravel() doesn’t always return a view. Non-contiguous arrays force a copy:
import numpy as np
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]])
# Slicing creates a non-contiguous array
sliced = arr[:, ::2] # Every other column
print(f"Sliced array:\n{sliced}")
rav = sliced.ravel()
print(f"ravel() returned a copy: {rav.base is None}") # True
# Modifying doesn't affect original
rav[0] = 999
print(f"Original unchanged: {sliced[0, 0]}") # 1
Transposing a C-contiguous 2D array generally leaves it non-C-contiguous, so ravel() with the default order has to copy here as well:
import numpy as np
arr = np.array([[1, 2], [3, 4]])
transposed = arr.T
rav_t = transposed.ravel()
print(f"Transposed ravel() is copy: {rav_t.base is None}") # True (arr.T is not C-contiguous)
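One way to anticipate which case applies is to inspect the array's flags before calling ravel(). The sketch below assumes the default order='C' and compares flags.c_contiguous with what ravel() actually does for a few illustrative inputs:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8]])

candidates = {
    "full array": arr,
    "first row (arr[:1])": arr[:1],
    "every other column": arr[:, ::2],
    "transpose": arr.T,
}

for name, a in candidates.items():
    # np.shares_memory is True only if ravel() returned a view
    is_view = np.shares_memory(a, a.ravel())
    print(f"{name}: c_contiguous={a.flags.c_contiguous}, ravel is view={is_view}")
```

For these inputs the two columns agree: ravel() returns a view exactly when the array is C-contiguous.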
Practical Use Cases
Data Preprocessing for Machine Learning
import numpy as np
# Flatten image data for neural network input
images = np.random.rand(100, 28, 28) # 100 grayscale images
# reshape(100, -1) flattens each image into one row; for a contiguous
# array it returns a view, so no pixel data is copied
flattened = images.reshape(100, -1)
# Or manually: [img.ravel() for img in images]
print(f"Shape: {flattened.shape}") # (100, 784)
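To make the difference between the two approaches concrete, the following sketch compares the single reshape call with the per-image ravel() variant. np.stack is used here only as one possible way to reassemble the rows into a 2D array:

```python
import numpy as np

images = np.random.rand(100, 28, 28)

batch = images.reshape(images.shape[0], -1)           # one reshape call
stacked = np.stack([img.ravel() for img in images])   # per-image ravel, then stack

print(np.array_equal(batch, stacked))      # True
print(np.shares_memory(images, batch))     # True: no pixel data copied
print(np.shares_memory(images, stacked))   # False: np.stack allocates a new array
```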
Matrix Operations
import numpy as np
# Calculate dot product of flattened matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Use ravel() for performance
dot_product = np.dot(A.ravel(), B.ravel())
print(f"Dot product: {dot_product}") # 70
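For this particular calculation, explicit flattening is optional. As a point of comparison, np.vdot flattens its arguments itself, and an elementwise product followed by a sum gives the same value (note that np.vdot conjugates its first argument, which only matters for complex input):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

via_ravel = np.dot(A.ravel(), B.ravel())
via_vdot = np.vdot(A, B)      # flattens both arguments internally
via_sum = np.sum(A * B)       # elementwise product, then sum

print(via_ravel, via_vdot, via_sum)  # 70 70 70
```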
Safe Data Transformation
import numpy as np
def process_data(arr):
    # Use flatten() to ensure original data isn't modified
    working_copy = arr.flatten()
    working_copy *= 2
    return working_copy.reshape(arr.shape)
original = np.array([[1, 2], [3, 4]])
result = process_data(original)
print(f"Original preserved: {original[0, 0]}") # 1
print(f"Result modified: {result[0, 0]}") # 2
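For contrast, here is what can go wrong when the same function uses ravel() instead. process_data_unsafe is a hypothetical variant written only to illustrate the failure mode:

```python
import numpy as np

def process_data_unsafe(arr):
    # Hypothetical variant of process_data that swaps flatten() for ravel()
    working = arr.ravel()    # a view when arr is C-contiguous
    working *= 2             # in-place: writes through to arr
    return working.reshape(arr.shape)

original = np.array([[1, 2], [3, 4]])
result = process_data_unsafe(original)

print(original[0, 0])                       # 2: the caller's data changed
print(np.shares_memory(original, result))   # True
```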
Alternative: reshape(-1)
NumPy’s reshape(-1) provides similar functionality to ravel():
import numpy as np
arr = np.array([[1, 2], [3, 4]])
reshaped = arr.reshape(-1)
raveled = arr.ravel()
print(f"Results equal: {np.array_equal(reshaped, raveled)}") # True
print(f"Both are views: {reshaped.base is arr and raveled.base is arr}") # True
The -1 tells NumPy to infer the dimension size from the total number of elements. Like ravel(), reshape(-1) returns a view when the memory layout allows it and a copy otherwise, and the same method also covers other reshaping operations.
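The two are not identical in every case. A one-dimensional strided slice is a small example where, as far as NumPy's documented behavior goes, reshape(-1) can still return a view while ravel() copies because the input is not contiguous:

```python
import numpy as np

source = np.arange(10)
strided = source[::2]            # 1-D but not contiguous: [0 2 4 6 8]

reshaped = strided.reshape(-1)
raveled = strided.ravel()

print(np.array_equal(reshaped, raveled))      # True
print(np.shares_memory(strided, reshaped))    # True: still a view
print(np.shares_memory(strided, raveled))     # False: ravel() copied
```

When a view is wanted in as many cases as possible, reshape(-1) may therefore be the better choice.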
Decision Framework
Choose flatten() when:
- You need guaranteed data independence
- Modifying the result shouldn’t affect the original
- Memory overhead is acceptable
- Working with small to medium datasets
Choose ravel() when:
- Performance is critical
- Working with large arrays (millions of elements)
- You understand the view/copy implications
- Memory efficiency matters
For production code processing large datasets, ravel() typically provides the best performance. For data integrity and predictable behavior, flatten() offers safety at a performance cost.
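When ravel() is chosen for speed but accidental writes are a concern, one possible middle ground is to mark the view read-only. This is a sketch of that idea rather than a recommendation from the framework above; it relies on the writeable flag of ndarray.flags:

```python
import numpy as np

data = np.random.rand(1000, 1000)

flat_view = data.ravel()             # no copy for a C-contiguous array
flat_view.flags.writeable = False    # guard the view against in-place writes

try:
    flat_view[0] = 0.0
except ValueError as exc:
    print(f"Write blocked: {exc}")

print(data.flags.writeable)          # True: only the view is locked
```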