Python Copy: Shallow vs Deep Copy Explained
Python's assignment operator doesn't copy objects—it creates new references to existing objects. This behavior catches many developers off guard, especially when working with mutable data structures...
Key Insights
- Assignment in Python creates references to the same object, not copies—modifying one variable affects all references to that object
- Shallow copies duplicate the outer container but share references to nested objects, making them fast but potentially dangerous with mutable nested data
- Deep copies recursively duplicate all nested objects for complete independence, at the cost of performance and memory overhead
The Problem with Assignment
Python’s assignment operator doesn’t copy objects—it creates new references to existing objects. This behavior catches many developers off guard, especially when working with mutable data structures like lists and dictionaries.
original = [1, 2, 3]
copy = original
copy.append(4)
print(original) # [1, 2, 3, 4]
print(copy) # [1, 2, 3, 4]
print(id(original) == id(copy)) # True
Both variables point to the same object in memory. When you modify copy, you’re modifying original because they’re the same object. The id() function confirms they share the same memory address.
This behavior is intentional and efficient—Python avoids unnecessary duplication. But when you actually need independent copies, you need to be explicit about it.
Shallow Copy: Copying the Surface
A shallow copy creates a new object but doesn’t recursively copy nested objects. Instead, it copies references to those nested objects. Think of it as copying the container but sharing the contents.
Python provides several ways to create shallow copies:
import copy
# Method 1: List's copy() method
original = [[1, 2], [3, 4]]
shallow1 = original.copy()
# Method 2: Constructor
shallow2 = list(original)
# Method 3: Slicing
shallow3 = original[:]
# Method 4: copy.copy()
shallow4 = copy.copy(original)
All four methods produce the same result. Here’s where shallow copying gets interesting:
original = [[1, 2], [3, 4]]
shallow = original.copy()
# Modifying the outer list
shallow.append([5, 6])
print(original) # [[1, 2], [3, 4]]
print(shallow) # [[1, 2], [3, 4], [5, 6]]
# Modifying a nested list
shallow[0].append(99)
print(original) # [[1, 2, 99], [3, 4]]
print(shallow) # [[1, 2, 99], [3, 4], [5, 6]]
# Memory addresses prove it
print(id(original[0]) == id(shallow[0])) # True
The outer list is copied, but the nested lists are shared. Appending to the outer structure works independently, but modifying nested objects affects both copies.
Deep Copy: Copying All the Way Down
Deep copy recursively duplicates every object in the structure, creating complete independence. Use copy.deepcopy() from the copy module:
import copy
original = [[1, 2], [3, 4]]
deep = copy.deepcopy(original)
# Modify nested list
deep[0].append(99)
print(original) # [[1, 2], [3, 4]]
print(deep) # [[1, 2, 99], [3, 4]]
# Memory addresses are different
print(id(original[0]) == id(deep[0])) # False
Now the nested lists are truly independent. Deep copy traverses the entire object graph, creating new copies of everything it finds.
Practical Comparison with Common Data Structures
Let’s compare all three approaches with a complex nested dictionary:
import copy
original = {
'name': 'John',
'scores': [85, 90, 78],
'metadata': {
'attempts': 3,
'tags': ['python', 'advanced']
}
}
# Assignment (reference)
assigned = original
# Shallow copy
shallow = original.copy()
# Deep copy
deep = copy.deepcopy(original)
# Modify nested list
original['scores'].append(95)
# Modify nested dictionary
original['metadata']['attempts'] = 5
print("Assigned:", assigned['scores']) # [85, 90, 78, 95]
print("Shallow:", shallow['scores']) # [85, 90, 78, 95]
print("Deep:", deep['scores']) # [85, 90, 78]
print("Assigned attempts:", assigned['metadata']['attempts']) # 5
print("Shallow attempts:", shallow['metadata']['attempts']) # 5
print("Deep attempts:", deep['metadata']['attempts']) # 3
The assignment shares everything. The shallow copy creates a new outer dictionary but shares nested objects. Only the deep copy is truly independent.
When to Use Each Approach
Use assignment when you want multiple variables to reference the same object. This is common when passing objects to functions or creating aliases:
def process_data(data):
# data is just a reference, no copying
return [x * 2 for x in data]
Use shallow copy for flat data structures or when you intentionally want to share nested objects. It’s fast and memory-efficient:
# Configuration with shared references
default_config = {
'timeout': 30,
'retry': 3,
'endpoints': ['api.example.com'] # Shared list
}
# Multiple services share endpoint list
service1_config = default_config.copy()
service2_config = default_config.copy()
# Adding endpoint affects all services (intentional)
default_config['endpoints'].append('backup.example.com')
Use deep copy when you need complete independence, especially for complex state management:
import copy
class GameState:
def __init__(self, player_positions, inventory):
self.player_positions = player_positions
self.inventory = inventory
# Save game state for undo functionality
current_state = GameState(
player_positions={'player1': [10, 20], 'player2': [15, 25]},
inventory={'player1': ['sword', 'shield']}
)
saved_state = copy.deepcopy(current_state)
# Modify current state
current_state.player_positions['player1'][0] = 50
current_state.inventory['player1'].append('potion')
# Saved state remains unchanged
print(saved_state.player_positions['player1']) # [10, 20]
print(saved_state.inventory['player1']) # ['sword', 'shield']
Common Pitfalls and Edge Cases
Circular references are handled automatically by deepcopy():
import copy
original = {'name': 'Node'}
original['self'] = original # Circular reference
deep = copy.deepcopy(original)
print(deep['self'] is deep) # True - maintains structure
Immutable objects like strings, numbers, and tuples are never copied—Python reuses them for efficiency:
original = (1, 2, [3, 4])
shallow = copy.copy(original)
print(id(original) == id(shallow)) # True - tuples are immutable
print(id(original[2]) == id(shallow[2])) # True - nested list is shared
Custom objects need special handling if they have non-standard copying requirements:
import copy
class Connection:
def __init__(self, host):
self.host = host
self.socket = self._connect()
def _connect(self):
return f"socket_to_{self.host}"
def __copy__(self):
# Shallow copy: new object, same socket
new_obj = Connection.__new__(Connection)
new_obj.host = self.host
new_obj.socket = self.socket
return new_obj
def __deepcopy__(self, memo):
# Deep copy: new object, new connection
new_obj = Connection.__new__(Connection)
new_obj.host = copy.deepcopy(self.host, memo)
new_obj.socket = new_obj._connect()
return new_obj
conn = Connection('localhost')
shallow = copy.copy(conn)
deep = copy.deepcopy(conn)
print(id(conn.socket) == id(shallow.socket)) # True
print(id(conn.socket) == id(deep.socket)) # False
Performance Benchmarks and Best Practices
Deep copying is significantly slower than shallow copying, especially with large nested structures:
import copy
import timeit
# Flat list
flat_data = list(range(10000))
# Nested list
nested_data = [[i] * 100 for i in range(100)]
# Benchmark
shallow_flat = timeit.timeit(lambda: flat_data.copy(), number=10000)
deep_flat = timeit.timeit(lambda: copy.deepcopy(flat_data), number=10000)
shallow_nested = timeit.timeit(lambda: nested_data.copy(), number=10000)
deep_nested = timeit.timeit(lambda: copy.deepcopy(nested_data), number=10000)
print(f"Flat - Shallow: {shallow_flat:.4f}s, Deep: {deep_flat:.4f}s")
print(f"Nested - Shallow: {shallow_nested:.4f}s, Deep: {deep_nested:.4f}s")
# Typical output:
# Flat - Shallow: 0.0234s, Deep: 0.0456s
# Nested - Shallow: 0.0012s, Deep: 0.2341s
Best practices:
- Default to assignment unless you specifically need a copy
- Use shallow copy for simple data structures without nested mutables
- Reserve deep copy for complex state management, undo systems, or when complete independence is required
- Profile your code if copying becomes a bottleneck—sometimes restructuring data is better than copying
- Document your intent when using shallow copies with nested structures to avoid confusion
Choose the right copying strategy based on your data structure and requirements. Assignment is fastest but shares references. Shallow copy provides a middle ground. Deep copy guarantees independence at the cost of performance. Understanding these trade-offs makes you a more effective Python developer.