Python - Merge Two Dictionaries
Key Insights
- Python 3.9+ offers the merge operator (|) as the most concise way to combine dictionaries, while the ** unpacking operator works on Python 3.5 and later
- Dictionary merge operations handle key conflicts by prioritizing values from the rightmost dictionary, making order critical when merging data with overlapping keys
- For complex merging scenarios requiring custom conflict resolution or deep merging of nested structures, the ChainMap class and recursive approaches provide production-ready solutions
Basic Dictionary Merging Techniques
Python provides multiple approaches to merge dictionaries, each with distinct performance characteristics and use cases. The most straightforward method uses the update() method, which modifies the original dictionary in place:
user_defaults = {'theme': 'dark', 'language': 'en', 'notifications': True}
user_preferences = {'theme': 'light', 'timezone': 'UTC'}
user_defaults.update(user_preferences)
print(user_defaults)
# Output: {'theme': 'light', 'language': 'en', 'notifications': True, 'timezone': 'UTC'}
The update() method mutates the original dictionary, which may not be desirable when you need to preserve the original data structures. When the inputs must stay unchanged, dictionary unpacking with the ** operator (Python 3.5+) creates a new dictionary instead:
config_defaults = {'timeout': 30, 'retries': 3, 'debug': False}
config_overrides = {'timeout': 60, 'log_level': 'INFO'}
merged_config = {**config_defaults, **config_overrides}
print(merged_config)
# Output: {'timeout': 60, 'retries': 3, 'debug': False, 'log_level': 'INFO'}
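The difference between mutating and non-mutating merges can be checked directly. The following is a minimal sketch with made-up sample data; it shows that copying before calling update() gives the same result as unpacking and leaves the inputs untouched.

```python
# Hypothetical sample data, chosen only for illustration.
config_defaults = {'timeout': 30, 'retries': 3}
config_overrides = {'timeout': 60, 'log_level': 'INFO'}

# Copy first, then update the copy: neither input is modified.
merged_copy = config_defaults.copy()
merged_copy.update(config_overrides)

# Unpacking builds a new dictionary with the same contents.
merged_unpacked = {**config_defaults, **config_overrides}

print(merged_copy == merged_unpacked)  # True
print(config_defaults)                 # {'timeout': 30, 'retries': 3}
```

Both forms produce shallow copies: nested dictionaries or lists stored as values are still shared with the inputs.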
The Merge Operator (Python 3.9+)
Python 3.9 introduced the | operator for dictionary merging, providing cleaner syntax with the same semantics as unpacking:
database_config = {'host': 'localhost', 'port': 5432, 'pool_size': 10}
production_overrides = {'host': 'prod.db.example.com', 'ssl': True}
final_config = database_config | production_overrides
print(final_config)
# Output: {'host': 'prod.db.example.com', 'port': 5432, 'pool_size': 10, 'ssl': True}
The augmented assignment operator |= provides an in-place alternative:
settings = {'api_version': 'v2', 'cache_enabled': False}
environment_vars = {'cache_enabled': True, 'rate_limit': 1000}
settings |= environment_vars
print(settings)
# Output: {'api_version': 'v2', 'cache_enabled': True, 'rate_limit': 1000}
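One difference between the two operators is worth knowing: | expects a dictionary on both sides, while |= accepts any iterable of key/value pairs, as update() does. A short sketch, assuming Python 3.9+ and illustrative data:

```python
settings = {'api_version': 'v2'}
pairs = [('cache_enabled', True)]  # hypothetical data: an iterable of key/value pairs

# |= accepts any iterable of key/value pairs, like update().
settings |= pairs
print(settings)  # {'api_version': 'v2', 'cache_enabled': True}

# | requires a dict on both sides, so the same list is rejected.
try:
    settings | pairs
    binary_or_accepted_list = True
except TypeError:
    binary_or_accepted_list = False
print(binary_or_accepted_list)  # False
```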
Merging Multiple Dictionaries
Real-world applications often require merging more than two dictionaries, such as combining default settings, environment-specific configs, and user preferences:
defaults = {'timeout': 30, 'retries': 3, 'compression': 'gzip'}
environment = {'timeout': 60, 'endpoint': 'https://api.example.com'}
user_settings = {'retries': 5, 'custom_header': 'X-Client-ID'}
# Using unpacking
final = {**defaults, **environment, **user_settings}
print(final)
# Output: {'timeout': 60, 'retries': 5, 'compression': 'gzip',
# 'endpoint': 'https://api.example.com', 'custom_header': 'X-Client-ID'}
# Using merge operator (Python 3.9+)
final = defaults | environment | user_settings
print(final)
# Same output as above
For dynamic scenarios with an arbitrary number of dictionaries, combine reduce() from the functools module with operator.or_, the function form of | (and therefore also Python 3.9+):
from functools import reduce
import operator
config_layers = [
    {'service': 'api', 'version': '1.0'},
    {'version': '2.0', 'auth': 'bearer'},
    {'endpoint': '/v2/data', 'auth': 'oauth2'},
    {'timeout': 45}
]
merged = reduce(operator.or_, config_layers)
print(merged)
# Output: {'service': 'api', 'version': '2.0', 'auth': 'oauth2',
# 'endpoint': '/v2/data', 'timeout': 45}
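Called without an initial value, reduce() raises TypeError on an empty list. The sketch below passes an empty dictionary as the initializer and adds an update()-based variant for interpreters older than 3.9. The names merge_all and merge_all_compat are hypothetical helpers, not standard library functions.

```python
from functools import reduce
import operator


def merge_all(layers):
    """Merge an iterable of dicts left to right; later layers win (Python 3.9+)."""
    # The {} initializer makes an empty input return {} instead of raising TypeError.
    return reduce(operator.or_, layers, {})


def merge_all_compat(layers):
    """Same result using update(), for interpreters older than 3.9."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


layers = [{'version': '1.0'}, {'version': '2.0', 'auth': 'bearer'}]
print(merge_all(layers))         # {'version': '2.0', 'auth': 'bearer'}
print(merge_all_compat(layers))  # {'version': '2.0', 'auth': 'bearer'}
print(merge_all([]))             # {}
```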
ChainMap for Layered Configuration
The collections.ChainMap class provides a memory-efficient approach for layered dictionaries without creating copies:
from collections import ChainMap
system_defaults = {'log_level': 'WARNING', 'max_connections': 100}
app_config = {'log_level': 'INFO', 'app_name': 'DataProcessor'}
runtime_overrides = {'max_connections': 200}
config = ChainMap(runtime_overrides, app_config, system_defaults)
print(config['log_level']) # Output: INFO
print(config['max_connections']) # Output: 200
print(config['app_name']) # Output: DataProcessor
# Convert to regular dict if needed
final_dict = dict(config)
print(final_dict)
# Output: {'log_level': 'INFO', 'max_connections': 200, 'app_name': 'DataProcessor'}
ChainMap searches through the dictionaries in order and returns the first match, making it ideal for configuration hierarchies where you want to maintain separate layers:
from collections import ChainMap
def get_config(user_id):
    global_settings = {'theme': 'light', 'language': 'en', 'timeout': 30}
    user_prefs = load_user_preferences(user_id)  # Hypothetical function
    session_data = {'timeout': 60, 'session_id': 'abc123'}
    return ChainMap(session_data, user_prefs, global_settings)
def load_user_preferences(user_id):
    return {'theme': 'dark', 'notifications': True}
config = get_config('user_42')
print(dict(config))
# Output: {'theme': 'dark', 'language': 'en', 'timeout': 60,
# 'notifications': True, 'session_id': 'abc123'}
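Because a ChainMap holds references to its layers rather than copies, it behaves as a live view. The sketch below, using made-up layers, shows three consequences: later changes to an underlying dictionary are visible through the chain, writes land only in the first mapping, and new_child() adds a temporary layer in front.

```python
from collections import ChainMap

# Hypothetical layers, chosen only for illustration.
defaults = {'theme': 'light', 'timeout': 30}
overrides = {'timeout': 60}
config = ChainMap(overrides, defaults)

# Lookups see later changes to the underlying dictionaries.
defaults['language'] = 'en'
print(config['language'])  # en

# Writes and deletions only affect the first mapping in the chain.
config['theme'] = 'dark'
print(overrides)           # {'timeout': 60, 'theme': 'dark'}
print(defaults['theme'])   # light

# new_child() pushes a fresh layer in front without touching the others.
scoped = config.new_child({'timeout': 5})
print(scoped['timeout'], config['timeout'])  # 5 60
```

If the layers must not change after the merge, converting with dict(config) takes a snapshot at that point.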
Deep Merging Nested Dictionaries
Simple merge operations don’t handle nested dictionaries recursively. When merging configuration objects with nested structures, you need a custom deep merge function:
def deep_merge(base, override):
    """Recursively merge override into base."""
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result
base_config = {
    'database': {
        'host': 'localhost',
        'port': 5432,
        'credentials': {
            'user': 'admin',
            'password': 'default'
        }
    },
    'cache': {
        'enabled': True
    }
}
override_config = {
    'database': {
        'host': 'prod.example.com',
        'credentials': {
            'password': 'secure_password'
        }
    },
    'cache': {
        'ttl': 3600
    }
}
merged = deep_merge(base_config, override_config)
print(merged)
# Output: {
# 'database': {
# 'host': 'prod.example.com',
# 'port': 5432,
# 'credentials': {'user': 'admin', 'password': 'secure_password'}
# },
# 'cache': {'enabled': True, 'ttl': 3600}
# }
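One caveat with a recursive merge built on a shallow copy(): nested dictionaries that appear only in the base are shared between the input and the result, so editing the merged value can silently change the base. The variant below is a sketch of one way to avoid that, using copy.deepcopy. The name deep_merge_isolated is hypothetical, and the extra copying costs time and memory on large structures.

```python
import copy


def deep_merge_isolated(base, override):
    """Deep merge whose result shares no nested objects with its inputs."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge_isolated(result[key], value)
        else:
            # Copy override values too, so the result is fully independent.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical sample data.
base = {'cache': {'enabled': True}, 'database': {'port': 5432}}
override = {'cache': {'ttl': 3600}}
merged = deep_merge_isolated(base, override)

# Changing the merged result leaves the base untouched.
merged['database']['port'] = 6543
print(base['database']['port'])  # 5432
print(merged['cache'])           # {'enabled': True, 'ttl': 3600}
```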
Custom Merge Strategies
For production applications requiring specific conflict resolution logic, implement custom merge functions:
def merge_with_strategy(dict1, dict2, strategy='override'):
    """
    Merge dictionaries with configurable conflict resolution.
    Strategies:
    - override: dict2 values take precedence (default)
    - keep: dict1 values take precedence
    - combine: combine values into lists
    """
    if strategy not in ('override', 'keep', 'combine'):
        raise ValueError(f"Unknown strategy: {strategy!r}")
    result = dict1.copy()
    for key, value in dict2.items():
        if key not in result:
            result[key] = value
        elif strategy == 'override':
            result[key] = value
        elif strategy == 'keep':
            pass  # Keep existing value
        elif strategy == 'combine':
            if isinstance(result[key], list):
                # Build a new list so the list stored in dict1 is not mutated
                result[key] = result[key] + [value]
            else:
                result[key] = [result[key], value]
    return result
metrics_a = {'requests': 1000, 'errors': 5, 'endpoint': '/api/v1'}
metrics_b = {'requests': 1500, 'latency': 250, 'endpoint': '/api/v2'}
print(merge_with_strategy(metrics_a, metrics_b, 'override'))
# Output: {'requests': 1500, 'errors': 5, 'endpoint': '/api/v2', 'latency': 250}
print(merge_with_strategy(metrics_a, metrics_b, 'keep'))
# Output: {'requests': 1000, 'errors': 5, 'endpoint': '/api/v1', 'latency': 250}
print(merge_with_strategy(metrics_a, metrics_b, 'combine'))
# Output: {'requests': [1000, 1500], 'errors': 5, 'endpoint': ['/api/v1', '/api/v2'],
# 'latency': 250}
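When the set of strategies is open-ended, an alternative is to pass the conflict rule in as a function. The sketch below assumes a hypothetical helper named merge_with that calls resolve(old, new) for every key present in both inputs; the sample counters are invented for illustration.

```python
import operator


def merge_with(dict1, dict2, resolve):
    """Merge two dicts, calling resolve(old, new) for keys present in both."""
    result = dict(dict1)
    for key, value in dict2.items():
        result[key] = resolve(result[key], value) if key in result else value
    return result


# Hypothetical counters from two sources.
counts_a = {'requests': 1000, 'errors': 5}
counts_b = {'requests': 1500, 'timeouts': 2}

totals = merge_with(counts_a, counts_b, operator.add)
print(totals)  # {'requests': 2500, 'errors': 5, 'timeouts': 2}

peaks = merge_with(counts_a, counts_b, max)
print(peaks)   # {'requests': 1500, 'errors': 5, 'timeouts': 2}
```

Any two-argument callable works as the resolver, so summing, taking a maximum, or keeping the first value needs no change to the merge function itself.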
Performance Considerations
For performance-critical applications, benchmark different merge approaches:
import timeit
setup = """
dict1 = {f'key_{i}': i for i in range(1000)}
dict2 = {f'key_{i}': i * 2 for i in range(500, 1500)}
"""
print("update():", timeit.timeit('d = dict1.copy(); d.update(dict2)', setup=setup, number=10000))
print("unpacking:", timeit.timeit('d = {**dict1, **dict2}', setup=setup, number=10000))
print("merge operator:", timeit.timeit('d = dict1 | dict2', setup=setup, number=10000))
# Typical results (times vary by system):
# update(): ~0.15s
# unpacking: ~0.16s
# merge operator: ~0.16s
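In these figures the three approaches land within a few percent of each other: the | operator and ** unpacking are effectively tied, and update() on a copy is only marginally faster. Differences this small can easily be reordered by system noise, so choose based on readability, the Python versions you support, and whether you need to preserve the original dictionary.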
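Single timeit runs are noisy, so a comparison this close is easier to trust when each statement is timed several times and the minimum is kept. The following is a sketch of that approach with the same dictionary sizes as above; the repeat and number values are arbitrary choices, and no particular ranking is implied.

```python
import sys
import timeit

dict1 = {f'key_{i}': i for i in range(1000)}
dict2 = {f'key_{i}': i * 2 for i in range(500, 1500)}

candidates = {
    'copy + update': 'd = dict1.copy(); d.update(dict2)',
    'unpacking': 'd = {**dict1, **dict2}',
}
if sys.version_info >= (3, 9):
    # The merge operator only exists on Python 3.9 and later.
    candidates['merge operator'] = 'd = dict1 | dict2'

results = {}
for label, stmt in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy estimate.
    runs = timeit.repeat(
        stmt,
        globals={'dict1': dict1, 'dict2': dict2},
        repeat=5,
        number=1000,
    )
    results[label] = min(runs)

# Print the fastest approach first.
for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f'{label}: {seconds:.4f}s')
```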