Python - Flatten a Nested List | Application Architect

Key Insights

Python offers multiple approaches to flatten nested lists, from simple recursive solutions to high-performance itertools-based implementations that can be 3-5x faster for large datasets
The choice of flattening method depends on nesting depth requirements: some solutions handle arbitrary depth while others work only for single-level nesting
Generator-based approaches provide memory-efficient streaming for large nested structures, avoiding the overhead of building intermediate lists

Basic Recursive Approach

The most intuitive way to flatten a nested list uses recursion. This method works for arbitrarily deep nesting levels and handles mixed data types gracefully.

def flatten_recursive(nested_list):
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten_recursive(item))
        else:
            result.append(item)
    return result

# Example usage
nested = [1, [2, 3, [4, 5]], 6, [7, [8, 9]]]
flattened = flatten_recursive(nested)
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

This approach checks each element’s type and recursively processes sublists. The isinstance() check determines whether to recurse or append the element directly. While readable, this method creates multiple intermediate lists during recursion, which impacts memory efficiency.

Generator-Based Flattening

For memory-intensive applications, generators provide lazy evaluation that processes elements on-demand without building intermediate structures.

def flatten_generator(nested_list):
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten_generator(item)
        else:
            yield item

# Example usage
nested = [1, [2, [3, [4, [5]]]]]
flattened = list(flatten_generator(nested))
print(flattened)  # [1, 2, 3, 4, 5]

# Stream processing without materializing full list
for num in flatten_generator(nested):
    print(num, end=' ')  # 1 2 3 4 5

The yield from syntax delegates iteration to recursive calls, creating a memory-efficient pipeline. This approach excels when processing large datasets where you don’t need all elements in memory simultaneously.

Single-Level Flattening with List Comprehension

When dealing with lists nested only one level deep, list comprehensions offer concise, performant solutions.

# Method 1: Nested comprehension
nested = [[1, 2], [3, 4], [5, 6]]
flattened = [item for sublist in nested for item in sublist]
print(flattened)  # [1, 2, 3, 4, 5, 6]

# Method 2: Using sum() with empty list
flattened = sum(nested, [])
print(flattened)  # [1, 2, 3, 4, 5, 6]

# Method 3: itertools.chain.from_iterable
from itertools import chain

flattened = list(chain.from_iterable(nested))
print(flattened)  # [1, 2, 3, 4, 5, 6]

The sum() approach works by using list concatenation as the operation, starting with an empty list. However, itertools.chain.from_iterable() significantly outperforms both alternatives for large datasets due to optimized C implementation.

Performance-Optimized Implementation

For production systems handling large nested structures, performance matters. Here’s a comparison of different approaches:

import timeit
from itertools import chain

# Test data
nested = [[i, i+1, i+2] for i in range(0, 10000, 3)]

# Recursive approach
def flatten_recursive(nested_list):
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten_recursive(item))
        else:
            result.append(item)
    return result

# Generator approach
def flatten_generator(nested_list):
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten_generator(item)
        else:
            yield item

# Benchmark
print("Recursive:", timeit.timeit(
    lambda: flatten_recursive(nested), number=100))

print("Generator:", timeit.timeit(
    lambda: list(flatten_generator(nested)), number=100))

print("chain.from_iterable:", timeit.timeit(
    lambda: list(chain.from_iterable(nested)), number=100))

print("List comprehension:", timeit.timeit(
    lambda: [item for sublist in nested for item in sublist], number=100))

Results typically show chain.from_iterable() as the fastest for single-level nesting, often 3-5x faster than recursive approaches. List comprehensions come in second, while recursive methods trade performance for flexibility with arbitrary nesting depth.

Handling Mixed Types and Edge Cases

Real-world data often contains mixed types requiring careful handling:

def flatten_safe(nested_list, exclude_types=(str, bytes)):
    """
    Flatten nested lists while treating certain iterables as atomic.
    
    Args:
        nested_list: The nested structure to flatten
        exclude_types: Types to treat as non-iterable
    """
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten_safe(item, exclude_types))
        elif isinstance(item, exclude_types):
            result.append(item)
        else:
            try:
                # Handle other iterables (tuples, sets, etc.)
                result.extend(flatten_safe(list(item), exclude_types))
            except TypeError:
                # Item is not iterable
                result.append(item)
    return result

# Example usage
mixed = [1, "hello", [2, 3], ("a", "b"), [[4, 5]], {"key": "value"}]
flattened = flatten_safe(mixed)
print(flattened)  # [1, 'hello', 2, 3, 'a', 'b', 4, 5, {'key': 'value'}]

This implementation prevents strings from being split into characters and handles tuples, sets, and other iterables. The exclude_types parameter provides flexibility for domain-specific requirements.

Stack-Based Iterative Solution

Recursive solutions risk stack overflow with deeply nested structures. An iterative approach using an explicit stack eliminates this limitation:

def flatten_iterative(nested_list):
    stack = [nested_list]
    result = []
    
    while stack:
        current = stack.pop()
        
        if isinstance(current, list):
            # Add items in reverse order to maintain original order
            stack.extend(reversed(current))
        else:
            result.append(current)
    
    return result

# Example with deep nesting
deep_nested = [1, [2, [3, [4, [5, [6, [7, [8, [9, [10]]]]]]]]]]]
flattened = flatten_iterative(deep_nested)
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The stack-based approach processes elements iteratively, avoiding recursion depth limits. Items are added in reverse order to maintain the original sequence when popping from the stack.

NumPy Alternative for Numeric Data

When working exclusively with numeric arrays, NumPy provides optimized flattening:

import numpy as np

# Create nested structure
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Convert to NumPy array and flatten
arr = np.array(nested)
flattened = arr.flatten()
print(flattened)  # [1 2 3 4 5 6 7 8 9]

# Alternative: ravel() for view instead of copy
flattened_view = arr.ravel()
print(flattened_view)  # [1 2 3 4 5 6 7 8 9]

NumPy’s flatten() creates a copy while ravel() returns a view when possible, offering better performance for read-only operations. This approach works only for regular, rectangular nested structures with uniform data types.

Choosing the Right Approach

Select your flattening method based on these criteria:

Single-level nesting + performance critical: Use itertools.chain.from_iterable()
Arbitrary depth + memory constraints: Use generator-based approach
Arbitrary depth + simplicity: Use recursive approach
Very deep nesting: Use stack-based iterative approach
Numeric data only: Use NumPy’s flatten() or ravel()
Mixed types with special handling: Use safe flattening with type exclusions

Each method has distinct trade-offs between performance, memory usage, and flexibility. Profile your specific use case to make informed decisions.