Python - Shallow vs Deep Copy

Python uses reference semantics for object assignment. When you assign one variable to another, both point to the same object in memory.

Key Insights

  • Shallow copies create a new object but reference the same nested objects, while deep copies recursively duplicate all nested structures, preventing unintended side effects in complex data structures
  • Python’s assignment operator creates references, not copies—modifications affect all references, making explicit copying essential for data isolation
  • Performance and memory trade-offs exist: shallow copies are faster and lighter but can cause subtle bugs with mutable nested objects, while deep copies provide safety at the cost of resources

Understanding References vs Copies

Python uses reference semantics for object assignment. When you assign one variable to another, both point to the same object in memory.

original_list = [1, 2, 3]
reference = original_list
reference.append(4)

print(original_list)  # [1, 2, 3, 4]
print(reference)      # [1, 2, 3, 4]
print(id(original_list) == id(reference))  # True

This behavior causes unexpected mutations when you intend to work with independent copies. The solution requires explicit copying mechanisms.

Shallow Copy Mechanics

A shallow copy creates a new container object but populates it with references to the original’s contents. For single-level structures with immutable elements, this works perfectly.

import copy

# Shallow copy methods
original = [1, 2, 3, 4, 5]

# Method 1: Slice notation
copy1 = original[:]

# Method 2: list() constructor
copy2 = list(original)

# Method 3: copy module
copy3 = copy.copy(original)

# Method 4: list.copy() method
copy4 = original.copy()

copy1.append(6)
print(original)  # [1, 2, 3, 4, 5]
print(copy1)     # [1, 2, 3, 4, 5, 6]

The problem emerges with nested mutable objects. The outer container is new, but nested objects remain shared references.

original = [[1, 2], [3, 4], [5, 6]]
shallow = original.copy()

# Modifying the outer list is safe
shallow.append([7, 8])
print(len(original))  # 3
print(len(shallow))   # 4

# Modifying nested lists affects both
shallow[0].append(99)
print(original[0])  # [1, 2, 99]
print(shallow[0])   # [1, 2, 99]

Deep Copy Implementation

Deep copies recursively duplicate all nested objects, creating completely independent structures.

import copy

original = [[1, 2], [3, 4], [5, 6]]
deep = copy.deepcopy(original)

deep[0].append(99)
print(original[0])  # [1, 2]
print(deep[0])      # [1, 2, 99]

# Verify different memory addresses
print(id(original[0]) == id(deep[0]))  # False

Deep copying handles complex nested structures including dictionaries, sets, and custom objects.

import copy

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

# Create nested structure
root = Node(1, [
    Node(2, [Node(4), Node(5)]),
    Node(3, [Node(6)])
])

shallow_copy = copy.copy(root)
deep_copy = copy.deepcopy(root)

# Shallow copy shares children
shallow_copy.children[0].value = 999
print(root.children[0].value)  # 999

# Deep copy is independent
deep_copy.children[1].value = 888
print(root.children[1].value)  # 3

Dictionary and Set Copying

Dictionaries and sets follow the same shallow/deep copy principles.

import copy

# Shallow dictionary copy
original_dict = {
    'name': 'Alice',
    'scores': [85, 90, 95],
    'metadata': {'attempts': 3}
}

shallow_dict = original_dict.copy()
deep_dict = copy.deepcopy(original_dict)

# Shallow copy shares nested objects
shallow_dict['scores'].append(100)
print(original_dict['scores'])  # [85, 90, 95, 100]

# Deep copy is independent
deep_dict['scores'].append(100)
print(original_dict['scores'])  # [85, 90, 95, 100] (unchanged from deep_dict)

# Modifying top-level keys is always safe
shallow_dict['name'] = 'Bob'
print(original_dict['name'])  # 'Alice'

Custom Objects and copy Methods

Control copying behavior for custom classes by implementing __copy__ and __deepcopy__ methods.

import copy

class Database:
    def __init__(self, connection_string):
        self.connection_string = connection_string
        self.cache = {}
    
    def __copy__(self):
        # Create new instance with same connection
        new_db = Database(self.connection_string)
        # Share cache reference (shallow)
        new_db.cache = self.cache
        return new_db
    
    def __deepcopy__(self, memo):
        # Create new instance
        new_db = Database(self.connection_string)
        # Deep copy the cache
        new_db.cache = copy.deepcopy(self.cache, memo)
        return new_db

db = Database("postgresql://localhost")
db.cache['users'] = [{'id': 1, 'name': 'Alice'}]

shallow = copy.copy(db)
shallow.cache['users'].append({'id': 2, 'name': 'Bob'})
print(len(db.cache['users']))  # 2 (shared cache)

deep = copy.deepcopy(db)
deep.cache['users'].append({'id': 3, 'name': 'Charlie'})
print(len(db.cache['users']))  # 2 (independent cache)

The memo dictionary in __deepcopy__ prevents infinite recursion with circular references.

Circular References and Copy Behavior

Deep copy handles circular references automatically by tracking copied objects.

import copy

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

# Create circular reference
root = TreeNode(1)
child = TreeNode(2)
root.children.append(child)
child.parent = root  # Circular reference

# Deep copy handles this correctly
root_copy = copy.deepcopy(root)
print(root_copy.children[0].parent.value)  # 1
print(root_copy.children[0].parent is root_copy)  # True

Performance Considerations

Benchmark shallow vs deep copy operations to understand performance implications.

import copy
import time

# Create nested structure
data = [[i] * 100 for i in range(1000)]

# Shallow copy timing
start = time.perf_counter()
for _ in range(1000):
    shallow = data.copy()
shallow_time = time.perf_counter() - start

# Deep copy timing
start = time.perf_counter()
for _ in range(1000):
    deep = copy.deepcopy(data)
deep_time = time.perf_counter() - start

print(f"Shallow: {shallow_time:.4f}s")
print(f"Deep: {deep_time:.4f}s")
print(f"Deep is {deep_time/shallow_time:.1f}x slower")

Deep copies typically run 10-100x slower depending on structure complexity and nesting depth.

Practical Decision Framework

Choose shallow copy when:

  • Working with single-level collections of immutable objects (strings, numbers, tuples)
  • Performance is critical and you control all mutations
  • You explicitly want to share nested object references

Choose deep copy when:

  • Nested mutable objects exist (lists of lists, dictionaries of lists)
  • Complete data isolation is required
  • Debugging mysterious shared-state bugs
  • Working with user input or untrusted data structures
import copy

def process_config(base_config, overrides):
    """
    Safe config processing requires deep copy to prevent
    modifications from affecting the base configuration.
    """
    config = copy.deepcopy(base_config)
    
    for key, value in overrides.items():
        if isinstance(value, dict) and key in config:
            config[key].update(value)
        else:
            config[key] = value
    
    return config

base = {
    'database': {'host': 'localhost', 'port': 5432},
    'cache': {'ttl': 3600}
}

prod_config = process_config(base, {'database': {'host': 'prod.db'}})
print(base['database']['host'])  # 'localhost' (unchanged)

Understanding copy semantics prevents entire classes of bugs related to unintended mutations and shared state. Choose the appropriate copying strategy based on your data structure complexity and performance requirements.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.