Python - Shallow vs Deep Copy
Key Insights
- Shallow copies create a new object but reference the same nested objects, while deep copies recursively duplicate all nested structures, preventing unintended side effects in complex data structures
- Python’s assignment operator creates references, not copies—modifications affect all references, making explicit copying essential for data isolation
- Performance and memory trade-offs exist: shallow copies are faster and lighter but can cause subtle bugs with mutable nested objects, while deep copies provide safety at the cost of resources
Understanding References vs Copies
Python uses reference semantics for object assignment. When you assign one variable to another, both point to the same object in memory.
original_list = [1, 2, 3]
reference = original_list
reference.append(4)
print(original_list) # [1, 2, 3, 4]
print(reference) # [1, 2, 3, 4]
print(id(original_list) == id(reference)) # True
This behavior causes unexpected mutations when you intend to work with independent copies. The solution requires explicit copying mechanisms.
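Note that only in-place mutation is affected; rebinding a name to a new object never changes what other references see. A quick sketch of the difference:

```python
a = [1, 2, 3]
b = a

# Rebinding points the name at a new object; b is untouched
a = [9, 9, 9]
print(b)  # [1, 2, 3]

# In-place mutation changes the shared object for every reference
c = b
c.append(4)
print(b)  # [1, 2, 3, 4]
```

Rebinding is why integers and strings never exhibit this problem: you can only replace them, not mutate them in place.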
Shallow Copy Mechanics
A shallow copy creates a new container object but populates it with references to the original’s contents. For single-level structures with immutable elements, this works perfectly.
import copy
# Shallow copy methods
original = [1, 2, 3, 4, 5]
# Method 1: Slice notation
copy1 = original[:]
# Method 2: list() constructor
copy2 = list(original)
# Method 3: copy module
copy3 = copy.copy(original)
# Method 4: list.copy() method
copy4 = original.copy()
copy1.append(6)
print(original) # [1, 2, 3, 4, 5]
print(copy1) # [1, 2, 3, 4, 5, 6]
The problem emerges with nested mutable objects. The outer container is new, but nested objects remain shared references.
original = [[1, 2], [3, 4], [5, 6]]
shallow = original.copy()
# Modifying the outer list is safe
shallow.append([7, 8])
print(len(original)) # 3
print(len(shallow)) # 4
# Modifying nested lists affects both
shallow[0].append(99)
print(original[0]) # [1, 2, 99]
print(shallow[0]) # [1, 2, 99]
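When the nesting is exactly one level deep, a comprehension that copies each inner list gives deep-copy safety at close to shallow-copy cost. A small sketch:

```python
original = [[1, 2], [3, 4], [5, 6]]

# Copy the outer list AND each one-level-nested inner list
one_deep = [list(inner) for inner in original]

one_deep[0].append(99)
print(original[0])  # [1, 2]
print(one_deep[0])  # [1, 2, 99]
```

This only protects one level; if the inner lists themselves contained mutable objects, copy.deepcopy would still be the safe choice.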
Deep Copy Implementation
Deep copies recursively duplicate all nested objects, creating completely independent structures.
import copy
original = [[1, 2], [3, 4], [5, 6]]
deep = copy.deepcopy(original)
deep[0].append(99)
print(original[0]) # [1, 2]
print(deep[0]) # [1, 2, 99]
# Verify different memory addresses
print(id(original[0]) == id(deep[0])) # False
Deep copying handles complex nested structures including dictionaries, sets, and custom objects.
import copy
class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

# Create nested structure
root = Node(1, [
    Node(2, [Node(4), Node(5)]),
    Node(3, [Node(6)])
])
shallow_copy = copy.copy(root)
deep_copy = copy.deepcopy(root)
# Shallow copy shares children
shallow_copy.children[0].value = 999
print(root.children[0].value) # 999
# Deep copy is independent
deep_copy.children[1].value = 888
print(root.children[1].value) # 3
Dictionary and Set Copying
Dictionaries and sets follow the same shallow/deep copy principles.
import copy
# Shallow dictionary copy
original_dict = {
    'name': 'Alice',
    'scores': [85, 90, 95],
    'metadata': {'attempts': 3}
}
shallow_dict = original_dict.copy()
deep_dict = copy.deepcopy(original_dict)
# Shallow copy shares nested objects
shallow_dict['scores'].append(100)
print(original_dict['scores']) # [85, 90, 95, 100]
# Deep copy is independent
deep_dict['scores'].append(100)
print(original_dict['scores']) # [85, 90, 95, 100] (unaffected by the deep copy's append)
# Modifying top-level keys is always safe
shallow_dict['name'] = 'Bob'
print(original_dict['name']) # 'Alice'
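Sets behave the same way at the top level: set.copy() is shallow, and because set elements must be hashable, the shared-element problem only surfaces with mutable-but-hashable values such as custom class instances. A minimal sketch (the Tag class is a hypothetical example):

```python
import copy

original_set = {1, 2, 3}
shallow_set = original_set.copy()

# Top-level additions never leak between copies
shallow_set.add(4)
print(original_set)  # {1, 2, 3}
print(shallow_set)   # {1, 2, 3, 4}

# deepcopy matters when elements are mutable custom objects
class Tag:
    def __init__(self, name):
        self.name = name

t = Tag('draft')
deep_tags = copy.deepcopy({t})
next(iter(deep_tags)).name = 'final'
print(t.name)  # 'draft' (original element untouched)
```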
Custom Objects and copy Methods
Control copying behavior for custom classes by implementing __copy__ and __deepcopy__ methods.
import copy
class Database:
    def __init__(self, connection_string):
        self.connection_string = connection_string
        self.cache = {}

    def __copy__(self):
        # Create new instance with same connection
        new_db = Database(self.connection_string)
        # Share cache reference (shallow)
        new_db.cache = self.cache
        return new_db

    def __deepcopy__(self, memo):
        # Create new instance
        new_db = Database(self.connection_string)
        # Deep copy the cache
        new_db.cache = copy.deepcopy(self.cache, memo)
        return new_db
db = Database("postgresql://localhost")
db.cache['users'] = [{'id': 1, 'name': 'Alice'}]
shallow = copy.copy(db)
shallow.cache['users'].append({'id': 2, 'name': 'Bob'})
print(len(db.cache['users'])) # 2 (shared cache)
deep = copy.deepcopy(db)
deep.cache['users'].append({'id': 3, 'name': 'Charlie'})
print(len(db.cache['users'])) # 2 (independent cache)
The memo dictionary in __deepcopy__ prevents infinite recursion with circular references.
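Beyond recursion safety, the memo dictionary also preserves shared identity inside a structure: an object referenced twice is copied once, and both references in the copy point at the same new object. A quick illustration:

```python
import copy

inner = [1, 2]
outer = [inner, inner]  # the same list referenced twice

dup = copy.deepcopy(outer)
print(dup[0] is dup[1])  # True: copied once, shared identity preserved
print(dup[0] is inner)   # False: still a brand-new object
```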
Circular References and Copy Behavior
Deep copy handles circular references automatically by tracking copied objects.
import copy
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []
# Create circular reference
root = TreeNode(1)
child = TreeNode(2)
root.children.append(child)
child.parent = root # Circular reference
# Deep copy handles this correctly
root_copy = copy.deepcopy(root)
print(root_copy.children[0].parent.value) # 1
print(root_copy.children[0].parent is root_copy) # True
Performance Considerations
Benchmark shallow vs deep copy operations to understand performance implications.
import copy
import time
# Create nested structure
data = [[i] * 100 for i in range(1000)]
# Shallow copy timing
start = time.perf_counter()
for _ in range(1000):
    shallow = data.copy()
shallow_time = time.perf_counter() - start
# Deep copy timing
start = time.perf_counter()
for _ in range(1000):
    deep = copy.deepcopy(data)
deep_time = time.perf_counter() - start
print(f"Shallow: {shallow_time:.4f}s")
print(f"Deep: {deep_time:.4f}s")
print(f"Deep is {deep_time/shallow_time:.1f}x slower")
Deep copies typically run 10-100x slower depending on structure complexity and nesting depth.
Practical Decision Framework
Choose shallow copy when:
- Working with single-level collections of immutable objects (strings, numbers, tuples)
- Performance is critical and you control all mutations
- You explicitly want to share nested object references
Choose deep copy when:
- Nested mutable objects exist (lists of lists, dictionaries of lists)
- Complete data isolation is required
- Debugging mysterious shared-state bugs
- Working with user input or untrusted data structures
import copy
def process_config(base_config, overrides):
    """
    Safe config processing requires deep copy to prevent
    modifications from affecting the base configuration.
    """
    config = copy.deepcopy(base_config)
    for key, value in overrides.items():
        if isinstance(value, dict) and key in config:
            config[key].update(value)
        else:
            config[key] = value
    return config

base = {
    'database': {'host': 'localhost', 'port': 5432},
    'cache': {'ttl': 3600}
}
prod_config = process_config(base, {'database': {'host': 'prod.db'}})
print(base['database']['host']) # 'localhost' (unchanged)
Understanding copy semantics prevents entire classes of bugs related to unintended mutations and shared state. Choose the appropriate copying strategy based on your data structure complexity and performance requirements.