Python - DefaultDict with Examples | Application Architect

Key Insights

• DefaultDict avoids KeyError on missing-key lookups by creating the key with a default value, which removes boilerplate and makes dictionary operations more concise
• The factory function passed to DefaultDict determines the type and initial value of missing keys, enabling patterns like auto-vivification and grouping without explicit key checks
• DefaultDict behaves identically to a regular dict for existing keys, making it a drop-in replacement that only changes what happens when a missing key is read with d[key]

Understanding DefaultDict Basics

DefaultDict (collections.defaultdict) is a subclass of Python's built-in dict that overrides one method, __missing__, and adds one writable attribute, default_factory. When you read a non-existent key with d[key], instead of raising a KeyError it calls the factory function, stores the result under that key, and returns it.

from collections import defaultdict

# Regular dict raises KeyError
regular_dict = {}
try:
    value = regular_dict['missing_key']
except KeyError:
    print("KeyError raised")

# DefaultDict returns default value
default_dict = defaultdict(int)
value = default_dict['missing_key']
print(value)  # Output: 0
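To make the mechanism concrete, here is a minimal sketch of how a defaultdict-like class can be built on the dict __missing__ hook. SimpleDefaultDict is a hypothetical name used only for illustration; the real collections.defaultdict is implemented in C and also handles copying, pickling, and repr.

```python
class SimpleDefaultDict(dict):
    """Illustrative re-implementation; not the real collections.defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when `key` is absent
        if self.default_factory is None:
            raise KeyError(key)
        value = self.default_factory()
        self[key] = value  # store the default so later lookups find it
        return value


counts = SimpleDefaultDict(int)
counts['a'] += 1
print(counts)            # {'a': 1}
print(counts.get('b'))   # None: get() does not call __missing__
print('b' in counts)     # False
```

The key detail is that __missing__ both returns and stores the default, which is why later lookups (and the += above) see the same object.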

The factory function can be any callable that takes no arguments. Common choices include int, list, set, str, or custom lambda functions.

from collections import defaultdict

# Different factory functions
int_dict = defaultdict(int)          # Default: 0
list_dict = defaultdict(list)        # Default: []
set_dict = defaultdict(set)          # Default: set()
str_dict = defaultdict(str)          # Default: ''
custom_dict = defaultdict(lambda: 'N/A')  # Default: 'N/A'
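Because the factory is invoked with no arguments each time a missing key is read, every key gets a fresh object, and passing None as the factory switches the missing-key behavior off. The sketch below shows both points; make_counter_pair is a hypothetical helper, not part of the standard library.

```python
from collections import defaultdict

# The factory is called once per missing key, so each key gets its own object
groups = defaultdict(list)
groups['a'].append(1)
groups['b'].append(2)
print(groups['a'] is groups['b'])  # False: two separate lists

# Any zero-argument callable works, including a named function
def make_counter_pair():
    # hypothetical factory returning a fresh [count, total] pair
    return [0, 0]

pairs = defaultdict(make_counter_pair)
pairs['x'][0] += 1
print(pairs['x'])  # [1, 0]

# With default_factory=None, missing keys raise KeyError like a plain dict
plain_like = defaultdict(None)
try:
    plain_like['missing']
except KeyError:
    print("KeyError raised")

# A non-callable first argument is rejected up front
try:
    defaultdict([])
except TypeError:
    print("TypeError: factory must be callable or None")
```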

Counting and Aggregation Patterns

A typical use of DefaultDict is counting occurrences without first checking whether each key exists.

from collections import defaultdict

# Count word frequencies
text = "the quick brown fox jumps over the lazy dog the fox"
word_count = defaultdict(int)

for word in text.split():
    word_count[word] += 1

print(dict(word_count))
# Output: {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, ...}

# Compare with regular dict
regular_count = {}
for word in text.split():
    if word not in regular_count:
        regular_count[word] = 0
    regular_count[word] += 1
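For comparison, a plain dict can also skip the explicit membership test by using dict.get, and the standard library's collections.Counter is purpose-built for tallies. This sketch checks that all three approaches agree on the same text; which one to prefer is a judgment call, with Counter adding helpers such as most_common().

```python
from collections import Counter, defaultdict

text = "the quick brown fox jumps over the lazy dog the fox"
words = text.split()

# defaultdict: missing keys start at int() == 0
dd_count = defaultdict(int)
for word in words:
    dd_count[word] += 1

# Plain dict: dict.get supplies the default without inserting it
get_count = {}
for word in words:
    get_count[word] = get_count.get(word, 0) + 1

# Counter: a dict subclass built specifically for counting
counter = Counter(words)

print(dict(dd_count) == get_count == dict(counter))  # True
print(counter.most_common(2))  # [('the', 3), ('fox', 2)]
```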

The same pattern extends beyond counting to other aggregations, such as summing values per category:

# Group items by category with sum
sales_data = [
    ('Electronics', 100),
    ('Clothing', 50),
    ('Electronics', 200),
    ('Food', 30),
    ('Clothing', 75)
]

category_totals = defaultdict(int)
for category, amount in sales_data:
    category_totals[category] += amount

print(dict(category_totals))
# Output: {'Electronics': 300, 'Clothing': 125, 'Food': 30}
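When one number per key is not enough, the factory can return a small mutable structure. As a sketch, the sales data above can track a running total and a count per category in a [total, count] list and then derive an average; the list layout is an illustrative choice, and a small dataclass would work equally well.

```python
from collections import defaultdict

sales_data = [
    ('Electronics', 100),
    ('Clothing', 50),
    ('Electronics', 200),
    ('Food', 30),
    ('Clothing', 75),
]

# Each new category starts with a fresh [total, count] pair
stats = defaultdict(lambda: [0, 0])
for category, amount in sales_data:
    stats[category][0] += amount
    stats[category][1] += 1

averages = {category: total / count for category, (total, count) in stats.items()}
print(averages)
# {'Electronics': 150.0, 'Clothing': 62.5, 'Food': 30.0}
```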

Grouping and Collection Operations

DefaultDict with list or set as the factory function enables elegant grouping operations.

from collections import defaultdict

# Group students by grade
students = [
    ('Alice', 'A'),
    ('Bob', 'B'),
    ('Charlie', 'A'),
    ('David', 'C'),
    ('Eve', 'B')
]

grade_groups = defaultdict(list)
for name, grade in students:
    grade_groups[grade].append(name)

print(dict(grade_groups))
# Output: {'A': ['Alice', 'Charlie'], 'B': ['Bob', 'Eve'], 'C': ['David']}

# Group with sets to avoid duplicates
tags_by_article = defaultdict(set)
article_tags = [
    ('article1', 'python'),
    ('article1', 'coding'),
    ('article2', 'python'),
    ('article1', 'python'),  # Duplicate
]

for article, tag in article_tags:
    tags_by_article[article].add(tag)

print(dict(tags_by_article))
# Output (set order may vary): {'article1': {'python', 'coding'}, 'article2': {'python'}}
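The grouping key does not have to be a field already present in the data; it can be computed. A classic sketch is grouping anagrams by their sorted letters (the word list is made up for illustration):

```python
from collections import defaultdict

words = ['listen', 'silent', 'enlist', 'cat', 'act', 'dog', 'god', 'bird']

# Words that are anagrams share the same sorted-letter key
anagram_groups = defaultdict(list)
for word in words:
    key = ''.join(sorted(word))
    anagram_groups[key].append(word)

# Keep only groups with more than one member
multi = [group for group in anagram_groups.values() if len(group) > 1]
print(multi)
# [['listen', 'silent', 'enlist'], ['cat', 'act'], ['dog', 'god']]
```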

Nested DefaultDict Structures

DefaultDict supports nesting for multi-level data structures, useful for sparse matrices or hierarchical data.

from collections import defaultdict

# Track user actions by date
user_actions = [
    ('user1', '2024-01-01', 5),
    ('user1', '2024-01-02', 3),
    ('user2', '2024-01-01', 7),
    ('user1', '2024-01-01', 2),  # Additional actions
]

# Nested defaultdict: each new user gets its own defaultdict(int) keyed by date
action_tracker = defaultdict(lambda: defaultdict(int))
for user, date, count in user_actions:
    action_tracker[user][date] += count

# dict() converts only the outer level, so convert the inner dicts too
print({user: dict(dates) for user, dates in action_tracker.items()})
# Output: {'user1': {'2024-01-01': 7, '2024-01-02': 3}, 'user2': {'2024-01-01': 7}}
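The sparse-matrix use case can be sketched the same way: store only non-zero entries in a two-level defaultdict keyed by row and column. The read helper uses .get() rather than subscripting, because subscripting a missing row would insert an empty row as a side effect. get_entry and multiply_vector are hypothetical helper names.

```python
from collections import defaultdict

# Only non-zero entries are stored: matrix[row][col] = value
matrix = defaultdict(lambda: defaultdict(float))
for row, col, value in [(0, 0, 1.5), (0, 3, 4.0), (2, 3, -2.0)]:
    matrix[row][col] = value

def get_entry(m, row, col):
    # .get() reads without inserting empty rows or columns
    return m.get(row, {}).get(col, 0.0)

def multiply_vector(m, vec):
    # Compute m @ vec, visiting only the stored (non-zero) entries
    result = defaultdict(float)
    for row, cols in m.items():
        for col, value in cols.items():
            result[row] += value * vec[col]
    return dict(result)

print(get_entry(matrix, 1, 1))  # 0.0
print(1 in matrix)              # False: the lookup did not create row 1
print(multiply_vector(matrix, [1.0, 1.0, 1.0, 2.0]))  # {0: 9.5, 2: -4.0}
```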

# Recursive defaultdict ("auto-vivification") for hierarchies of any depth
def tree():
    return defaultdict(tree)
taxonomy = tree()
taxonomy['Animal']['Mammal']['Dog'] = 'Canis familiaris'
taxonomy['Animal']['Mammal']['Cat'] = 'Felis catus'
taxonomy['Plant']['Flower']['Rose'] = 'Rosa'

print(taxonomy['Animal']['Mammal']['Dog'])  # Output: Canis familiaris
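Because dict() converts only the outermost level, printing or serializing a nested structure still exposes inner defaultdict objects. A small recursive helper (the name to_plain_dict is hypothetical) turns the whole tree into ordinary dicts. It also highlights an auto-vivification pitfall: a misspelled key silently creates a new empty branch instead of raising an error.

```python
from collections import defaultdict

def tree():
    # Every missing key produces another tree, to any depth
    return defaultdict(tree)

def to_plain_dict(node):
    # Recursively replace defaultdicts (and dicts) with plain dicts
    if isinstance(node, dict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

taxonomy = tree()
taxonomy['Animal']['Mammal']['Dog'] = 'Canis familiaris'
taxonomy['Plant']['Flower']['Rose'] = 'Rosa'

# A typo does not raise; it quietly adds an empty 'Anmal' branch
_ = taxonomy['Anmal']['Mammal']

print(to_plain_dict(taxonomy))
# {'Animal': {'Mammal': {'Dog': 'Canis familiaris'}}, 'Plant': {'Flower': {'Rose': 'Rosa'}}, 'Anmal': {'Mammal': {}}}
```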

Graph and Adjacency List Representations

DefaultDict simplifies graph implementations by automatically handling missing vertices.

from collections import defaultdict

# Build an adjacency list for a directed graph
graph = defaultdict(list)

edges = [
    ('A', 'B'),
    ('A', 'C'),
    ('B', 'D'),
    ('C', 'D'),
    ('D', 'E')
]

for source, dest in edges:
    graph[source].append(dest)

print(dict(graph))
# Output: {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E']}

# Weighted graph with nested defaultdict
weighted_graph = defaultdict(lambda: defaultdict(int))
weighted_edges = [
    ('A', 'B', 5),
    ('A', 'C', 3),
    ('B', 'C', 2)
]

for source, dest, weight in weighted_edges:
    weighted_graph[source][dest] = weight

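# BFS traversal using the graph
from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])  # deque gives O(1) pops from the left
    result = []

    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            result.append(node)
            # .get() avoids inserting empty entries for sink nodes such as 'E'
            queue.extend(graph.get(node, []))

    return result

print(bfs(graph, 'A'))  # Output: ['A', 'B', 'C', 'D', 'E']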
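Counting is useful on graphs too. As a sketch, Kahn's topological sort uses defaultdict(list) for the adjacency list and defaultdict(int) for in-degree counts, so nodes with no incoming edges simply read as 0. The function name topological_sort and the cycle check are illustrative choices, not part of the original examples.

```python
from collections import defaultdict, deque

def topological_sort(edges):
    """Kahn's algorithm: return nodes so that every edge points forward."""
    graph = defaultdict(list)      # adjacency list
    in_degree = defaultdict(int)   # number of incoming edges per node
    for source, dest in edges:
        graph[source].append(dest)
        in_degree[dest] += 1

    # Every node in first-seen order, which keeps the output deterministic
    nodes = list(dict.fromkeys(node for edge in edges for node in edge))

    # Nodes never seen as a destination read as 0 here
    queue = deque(node for node in nodes if in_degree[node] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            in_degree[neighbor] -= 1
            if in_degree[neighbor] == 0:
                queue.append(neighbor)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order

edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'E')]
print(topological_sort(edges))  # ['A', 'B', 'C', 'D', 'E']
```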

Converting Between DefaultDict and Regular Dict

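Converting a DefaultDict to a regular dict is useful when you want missing-key lookups to raise KeyError again, when code expects an exact dict type, or when you need to pickle a structure whose factory is a lambda (lambdas cannot be pickled).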

from collections import defaultdict
import json

# Create and populate defaultdict
dd = defaultdict(list)
dd['fruits'].extend(['apple', 'banana'])
dd['vegetables'].append('carrot')

# Convert to regular dict
regular = dict(dd)
print(type(regular))  # Output: <class 'dict'>

# json.dumps also accepts a defaultdict directly, since it is a dict subclass;
# converting first simply hands it an ordinary dict
json_string = json.dumps(regular)
print(json_string)
# Output: {"fruits": ["apple", "banana"], "vegetables": ["carrot"]}

# Converting back to defaultdict
loaded = json.loads(json_string)
dd_restored = defaultdict(list, loaded)
dd_restored['new_category'].append('item')  # Works with default behavior
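Pickling is one place where conversion is genuinely required. A defaultdict is pickled together with its factory, so a built-in factory such as list round-trips, but a lambda factory (as in the nested examples earlier) cannot be pickled. The sketch below demonstrates both cases; the exact exception type raised for the lambda has varied between Python versions, so it catches the plausible ones.

```python
import pickle
from collections import defaultdict

# A built-in factory such as `list` pickles fine
dd = defaultdict(list, {'fruits': ['apple']})
restored = pickle.loads(pickle.dumps(dd))
print(type(restored).__name__, restored.default_factory)  # defaultdict <class 'list'>

# A lambda factory cannot be pickled, so the defaultdict cannot either
nested = defaultdict(lambda: defaultdict(int))
nested['user1']['2024-01-01'] += 5
try:
    pickle.dumps(nested)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("cannot pickle:", type(exc).__name__)

# Converting every level to a plain dict removes the factories entirely
plain = {key: dict(inner) for key, inner in nested.items()}
print(pickle.loads(pickle.dumps(plain)))  # {'user1': {'2024-01-01': 5}}
```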

Performance Considerations and Gotchas

DefaultDict provides the same O(1) average-case performance for key access as a regular dict; the only extra work is one factory call the first time a missing key is read. However, be aware of these behaviors:

from collections import defaultdict

# Missing key access creates the key
dd = defaultdict(int)
value = dd['never_assigned']  # Key is now created
print('never_assigned' in dd)  # Output: True
print(len(dd))  # Output: 1

# This can lead to unintended key creation
dd = defaultdict(list)
if dd['check']:  # Creates empty list, which is falsy
    print("This won't execute")
print(len(dd))  # Output: 1 - key was created!

# Use the 'in' operator (or .get()) to check without creating the key
dd = defaultdict(int)
if 'key' in dd:
    value = dd['key']
else:
    print("Key doesn't exist and wasn't created")

# Access default_factory attribute
dd = defaultdict(list)
print(dd.default_factory)  # Output: <class 'list'>

# Can modify default_factory
dd.default_factory = set
dd['new_key'].add('item')  # Now creates sets instead of lists
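Two more points about side-effect-free reads: .get() never calls the factory, and subscripting missing keys while iterating over the same defaultdict changes its size mid-loop, which Python reports as a RuntimeError. Iterating over a snapshot of the keys avoids it. The '_related' suffix below is purely illustrative.

```python
from collections import defaultdict

dd = defaultdict(list)
dd['a'].append(1)

# .get() and `in` read without triggering the default factory
print(dd.get('b'))      # None
print(dd.get('b', []))  # []
print('b' in dd)        # False
print(len(dd))          # 1

# Subscripting a missing key while iterating inserts keys mid-loop
try:
    for key in dd:
        _ = dd[key + '_related']  # creates a new key during iteration
except RuntimeError as exc:
    print("RuntimeError:", exc)  # dictionary changed size during iteration

# Iterating over a snapshot of the keys avoids the error
dd = defaultdict(list, {'a': [1]})
for key in list(dd):
    _ = dd[key + '_related']
print(sorted(dd))  # ['a', 'a_related']
```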

DefaultDict excels at eliminating conditional key checks and reducing code complexity. Use it when you have predictable default values and want cleaner aggregation, grouping, or counting logic. For cases requiring explicit key existence checks or custom missing key handling, stick with regular dictionaries or implement __missing__ in a custom dict subclass.
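The __missing__ route is useful when the default depends on the key itself, which a defaultdict factory cannot see because it is called with no arguments. Below is a minimal sketch; KeyDefaultDict, slow_square, and the caching behavior are illustrative choices, not a standard-library API.

```python
class KeyDefaultDict(dict):
    """Hypothetical dict subclass whose default value depends on the key."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory  # called as factory(key), unlike defaultdict

    def __missing__(self, key):
        value = self.factory(key)
        self[key] = value  # cache so the factory runs once per key
        return value


calls = []

def slow_square(n):
    calls.append(n)  # record calls to show the caching
    return n * n

squares = KeyDefaultDict(slow_square)
print(squares[4], squares[4], squares[5])  # 16 16 25
print(calls)  # [4, 5]: each key was computed only once
```

Like defaultdict, this only affects d[key]; .get() and the in operator still see just the keys that have actually been stored.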