Composite Pattern in Python: File System Example
Key Insights
- The Composite pattern lets you treat individual objects and groups of objects uniformly through a shared interface, eliminating scattered type-checking logic
- File systems are the canonical example because directories contain both files and other directories, yet both need common operations like calculating size or searching
- The pattern trades some type safety for flexibility: add() can be called on any component, so runtime checks or careful design are needed to prevent invalid operations
Introduction to the Composite Pattern
The Composite pattern is a structural design pattern that lets you compose objects into tree structures and then work with those structures as if they were individual objects. The core insight is simple: when you have a hierarchy where containers hold both primitive elements and other containers, give them all the same interface.
This pattern shows up constantly in software. GUI frameworks use it for nested widgets. Document processors use it for paragraphs containing text and images. Organization charts use it for departments containing employees and sub-departments. But the clearest example—and the one we’ll build today—is a file system.
A directory can contain files. It can also contain other directories. Yet when you ask “what’s the total size of this folder?” you don’t want to care about that distinction. You want to call get_size() and get an answer.
The Problem: Managing Hierarchical Data
Let’s start with the naive approach. You have files and directories, and you need to calculate sizes, search for items, and display the structure. Here’s what happens without the Composite pattern:
class File:
    def __init__(self, name: str, size: int):
        self.name = name
        self.size = size


class Directory:
    def __init__(self, name: str):
        self.name = name
        self.contents: list = []

    def add(self, item):
        self.contents.append(item)


def calculate_size(item) -> int:
    if isinstance(item, File):
        return item.size
    elif isinstance(item, Directory):
        total = 0
        for child in item.contents:
            if isinstance(child, File):
                total += child.size
            elif isinstance(child, Directory):
                total += calculate_size(child)
        return total
    else:
        raise TypeError(f"Unknown type: {type(item)}")


def search(item, query: str) -> list:
    results = []
    if isinstance(item, File):
        if query in item.name:
            results.append(item)
    elif isinstance(item, Directory):
        if query in item.name:
            results.append(item)
        for child in item.contents:
            results.extend(search(child, query))
    return results
This works, but it’s ugly. Every operation requires type checking. The logic for traversing the hierarchy is duplicated in each function. Adding a new type (say, a symlink) means updating every function. And the operations live outside the classes, separated from the data they act on, which works against encapsulation.
The Composite pattern eliminates this mess by pushing the behavior into the objects themselves behind a common interface.
Core Components of the Pattern
The Composite pattern has three participants:
- Component: The abstract base class or interface that declares operations common to both simple and complex objects
- Leaf: A primitive object with no children (files in our case)
- Composite: A container that holds children and implements operations by delegating to those children (directories)
Here’s our abstract base:
from abc import ABC, abstractmethod
from typing import Iterator, Optional


class FileSystemComponent(ABC):
    def __init__(self, name: str):
        self.name = name
        self.parent: Optional['FileSystemComponent'] = None

    @abstractmethod
    def get_size(self) -> int:
        """Return size in bytes."""
        pass

    @abstractmethod
    def display(self, indent: int = 0) -> str:
        """Return a string representation with proper indentation."""
        pass

    @abstractmethod
    def search(self, query: str) -> Iterator['FileSystemComponent']:
        """Yield all components matching the query."""
        pass

    def get_path(self) -> str:
        """Return the full path to this component."""
        if self.parent is None:
            return self.name
        return f"{self.parent.get_path()}/{self.name}"

    # These methods only make sense for composites, but we define them here
    # with default implementations that raise errors for leaves
    def add(self, component: 'FileSystemComponent') -> None:
        raise NotImplementedError("Cannot add to a leaf component")

    def remove(self, component: 'FileSystemComponent') -> None:
        raise NotImplementedError("Cannot remove from a leaf component")
Notice the add() and remove() methods on the base class. This is a design decision with trade-offs. By putting them here with default implementations that raise errors, we maintain a uniform interface. The alternative—putting them only on Directory—is more type-safe but forces clients to know which type they’re dealing with.
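To make that trade-off concrete, here is a minimal sketch of the alternative, where child management lives only on the composite. The Component class is a trimmed-down stand-in for FileSystemComponent, and add_readme() is a hypothetical client helper invented for this illustration; neither is part of the design built in this article.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Trimmed-down stand-in for FileSystemComponent: no add()/remove() here."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def get_size(self) -> int:
        """Return size in bytes."""


class File(Component):
    def __init__(self, name: str, size: int):
        super().__init__(name)
        self._size = size

    def get_size(self) -> int:
        return self._size


class Directory(Component):
    def __init__(self, name: str):
        super().__init__(name)
        self._children: list[Component] = []

    # Child management exists only on the composite.
    def add(self, component: Component) -> None:
        self._children.append(component)

    def get_size(self) -> int:
        return sum(child.get_size() for child in self._children)


def add_readme(target: Component) -> bool:
    """Hypothetical client helper: it has to check the type before adding."""
    if isinstance(target, Directory):
        target.add(File("README.md", 4096))
        return True
    return False


docs = Directory("docs")
notes = File("notes.txt", 100)

print(add_readme(docs), docs.get_size())         # True 4096
print(add_readme(notes), hasattr(notes, "add"))  # False False
```

A type checker can now reject add() on a File, but the isinstance() check has moved back into client code, which is exactly what the uniform interface avoids.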
Implementing the File System
Now let’s build the concrete classes. The File class is straightforward—it’s a leaf with no children:
class File(FileSystemComponent):
    def __init__(self, name: str, size: int):
        super().__init__(name)
        self._size = size

    def get_size(self) -> int:
        return self._size

    def display(self, indent: int = 0) -> str:
        return " " * indent + f"📄 {self.name} ({self._size} bytes)"

    def search(self, query: str) -> Iterator[FileSystemComponent]:
        if query.lower() in self.name.lower():
            yield self
The Directory class is where the pattern shines. It implements operations by delegating to its children:
class Directory(FileSystemComponent):
    def __init__(self, name: str):
        super().__init__(name)
        self._children: list[FileSystemComponent] = []

    def get_size(self) -> int:
        # Recursive: sum of all children's sizes
        return sum(child.get_size() for child in self._children)

    def display(self, indent: int = 0) -> str:
        lines = [" " * indent + f"📁 {self.name}/"]
        for child in self._children:
            lines.append(child.display(indent + 2))
        return "\n".join(lines)

    def search(self, query: str) -> Iterator[FileSystemComponent]:
        # Check self first
        if query.lower() in self.name.lower():
            yield self
        # Then delegate to children
        for child in self._children:
            yield from child.search(query)

    def add(self, component: FileSystemComponent) -> None:
        self._children.append(component)
        component.parent = self

    def remove(self, component: FileSystemComponent) -> None:
        self._children.remove(component)
        component.parent = None

    def __iter__(self) -> Iterator[FileSystemComponent]:
        return iter(self._children)

    def __len__(self) -> int:
        return len(self._children)
The key insight is in get_size(). A directory’s size is the sum of its children’s sizes. Each child might be a file (returning its own size) or another directory (recursively computing its subtree’s size). The caller doesn’t care. They just call get_size().
Putting It Together: Usage Examples
Let’s build a realistic directory structure and see the pattern in action:
def create_sample_filesystem() -> Directory:
    # Create root
    root = Directory("project")

    # Source directory
    src = Directory("src")
    src.add(File("main.py", 2048))
    src.add(File("utils.py", 1024))

    # Nested module
    models = Directory("models")
    models.add(File("user.py", 512))
    models.add(File("product.py", 768))
    models.add(File("__init__.py", 64))
    src.add(models)

    # Tests directory
    tests = Directory("tests")
    tests.add(File("test_main.py", 1536))
    tests.add(File("test_utils.py", 1024))

    # Assemble the root: directories first, then top-level files
    root.add(src)
    root.add(tests)
    root.add(File("README.md", 4096))
    root.add(File("pyproject.toml", 512))
    return root

# Build the filesystem
fs = create_sample_filesystem()

# Display the entire tree
print(fs.display())
# Output:
# 📁 project/
#   📁 src/
#     📄 main.py (2048 bytes)
#     📄 utils.py (1024 bytes)
#     📁 models/
#       📄 user.py (512 bytes)
#       📄 product.py (768 bytes)
#       📄 __init__.py (64 bytes)
#   📁 tests/
#     📄 test_main.py (1536 bytes)
#     📄 test_utils.py (1024 bytes)
#   📄 README.md (4096 bytes)
#   📄 pyproject.toml (512 bytes)

# Calculate total size - works on any component
print(f"Total project size: {fs.get_size()} bytes")  # 11584 bytes

# Search works uniformly across the tree
print("\nFiles containing 'test':")
for item in fs.search("test"):
    print(f" {item.get_path()}")
# Output:
#  project/tests
#  project/tests/test_main.py
#  project/tests/test_utils.py

# Get size of just the src directory
src_dir = next(child for child in fs if child.name == "src")
print(f"\nSource code size: {src_dir.get_size()} bytes")  # 4416 bytes
The beauty is that every operation works identically whether you call it on a single file, a directory, or the entire filesystem. No type checking. No special cases. Just polymorphism doing its job.
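As a small illustration of that uniformity, the sketch below passes a directory and a file to the same client function. The classes are minimal stand-ins that keep only get_size(), and largest() is a hypothetical helper made up for this example.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class Component(ABC):
    """Minimal stand-in for FileSystemComponent (size only)."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def get_size(self) -> int:
        """Return size in bytes."""


class File(Component):
    def __init__(self, name: str, size: int):
        super().__init__(name)
        self._size = size

    def get_size(self) -> int:
        return self._size


class Directory(Component):
    def __init__(self, name: str):
        super().__init__(name)
        self._children: list[Component] = []

    def add(self, component: Component) -> None:
        self._children.append(component)

    def get_size(self) -> int:
        return sum(child.get_size() for child in self._children)


def largest(components: Iterable[Component]) -> Component:
    """Hypothetical client helper: pick the biggest item, file or directory."""
    return max(components, key=lambda c: c.get_size())


src = Directory("src")
src.add(File("main.py", 2048))
src.add(File("utils.py", 1024))
readme = File("README.md", 4096)

# The helper never asks which kind of component it was given.
winner = largest([src, readme])
print(winner.name, winner.get_size())  # README.md 4096
```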
Trade-offs and When to Use
The Composite pattern isn’t free. Here are the trade-offs:
Benefits:
- Uniform treatment of individual objects and compositions
- Easy to add new component types without changing existing code
- Client code stays simple—no type checking or special cases
- Recursive operations become trivial to implement
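The second benefit can be sketched with a new leaf type. Symlink below is hypothetical, and its sizing rule (the length of the target’s name) is an assumption chosen only to keep the example small. The stand-in Directory class needs no changes to account for it.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Minimal stand-in for FileSystemComponent (size only)."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def get_size(self) -> int:
        """Return size in bytes."""


class File(Component):
    def __init__(self, name: str, size: int):
        super().__init__(name)
        self._size = size

    def get_size(self) -> int:
        return self._size


class Directory(Component):
    def __init__(self, name: str):
        super().__init__(name)
        self._children: list[Component] = []

    def add(self, component: Component) -> None:
        self._children.append(component)

    def get_size(self) -> int:
        return sum(child.get_size() for child in self._children)


class Symlink(Component):
    """Hypothetical new leaf added after the fact."""

    def __init__(self, name: str, target: Component):
        super().__init__(name)
        self.target = target

    def get_size(self) -> int:
        # Assumption for this sketch: a link costs only the length of its
        # target's name, not the size of the target itself.
        return len(self.target.name)


root = Directory("project")
main = File("main.py", 2048)
root.add(main)
root.add(Symlink("latest", main))

# Directory.get_size() picks up the new type through the shared interface.
print(root.get_size())  # 2055
```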
Drawbacks:
- Can make designs overly general when you don’t need that flexibility
- Type safety suffers: nothing stops you from calling add() on a file, and with the base-class defaults shown earlier the mistake only surfaces at runtime
- The shared interface might include methods that don’t make sense for all types (like add() on a file)
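The last two drawbacks are easy to reproduce. In this stripped-down sketch the base class provides the raising add() default described earlier, and safe_add() is a hypothetical wrapper showing one way a caller might guard against the failure.

```python
class Component:
    """Stripped-down stand-in: only the name and the raising add() default."""

    def __init__(self, name: str):
        self.name = name

    def add(self, component: "Component") -> None:
        raise NotImplementedError("Cannot add to a leaf component")


class File(Component):
    pass  # inherits the raising add()


class Directory(Component):
    def __init__(self, name: str):
        super().__init__(name)
        self.children: list[Component] = []

    def add(self, component: Component) -> None:
        self.children.append(component)


def safe_add(parent: Component, child: Component) -> bool:
    """Hypothetical guard: report failure instead of propagating the error."""
    try:
        parent.add(child)
    except NotImplementedError:
        return False
    return True


docs = Directory("docs")
readme = File("README.md")

print(safe_add(docs, readme))            # True
print(safe_add(readme, File("x.txt")))   # False: the leaf rejected the call
```

Nothing flags the second call before the program runs; the error appears only when add() is actually invoked on the leaf.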
When to use it:
- You have a tree-like hierarchical structure
- You want clients to treat leaves and composites uniformly
- You expect to traverse the structure and apply operations recursively
Alternatives:
- If you need to add operations without modifying classes, combine Composite with the Visitor pattern
- If the hierarchy is shallow or operations differ significantly between types, simple inheritance might suffice
- For immutable structures, consider a functional approach with pattern matching
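The first alternative can be sketched as follows. This is one possible way to combine the two patterns, not the only one: the accept() method, the Visitor interface, and the ExtensionCounter operation are all names invented for this illustration.

```python
from abc import ABC, abstractmethod


class Visitor(ABC):
    @abstractmethod
    def visit_file(self, file: "File") -> None: ...

    @abstractmethod
    def visit_directory(self, directory: "Directory") -> None: ...


class Component(ABC):
    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def accept(self, visitor: Visitor) -> None: ...


class File(Component):
    def __init__(self, name: str, size: int):
        super().__init__(name)
        self.size = size

    def accept(self, visitor: Visitor) -> None:
        visitor.visit_file(self)


class Directory(Component):
    def __init__(self, name: str):
        super().__init__(name)
        self.children: list[Component] = []

    def add(self, component: Component) -> None:
        self.children.append(component)

    def accept(self, visitor: Visitor) -> None:
        # Visit this directory, then let each child accept the same visitor.
        visitor.visit_directory(self)
        for child in self.children:
            child.accept(visitor)


class ExtensionCounter(Visitor):
    """Hypothetical new operation, added without touching File or Directory."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def visit_file(self, file: File) -> None:
        ext = file.name.rsplit(".", 1)[-1] if "." in file.name else ""
        self.counts[ext] = self.counts.get(ext, 0) + 1

    def visit_directory(self, directory: Directory) -> None:
        pass  # directories are not counted


root = Directory("project")
src = Directory("src")
src.add(File("main.py", 2048))
src.add(File("utils.py", 1024))
root.add(src)
root.add(File("README.md", 4096))

counter = ExtensionCounter()
root.accept(counter)
print(counter.counts)  # {'py': 2, 'md': 1}
```

The tree classes only need accept(); each further operation becomes a new Visitor subclass.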
Conclusion
The Composite pattern solves a specific problem elegantly: treating individual objects and groups of objects uniformly. The file system example demonstrates this perfectly—calculating size, searching, and displaying work identically whether you’re dealing with a single file or a deeply nested directory tree.
The pattern’s power comes from pushing behavior into the objects themselves and letting polymorphism handle the recursion. Your client code calls get_size() and doesn’t care whether it’s asking a file or a directory containing thousands of files.
Beyond file systems, consider the Composite pattern for:
- UI component trees (panels containing buttons, labels, and other panels)
- Organization hierarchies (departments containing employees and sub-departments)
- Menu systems (menus containing items and sub-menus)
- Document structures (sections containing paragraphs, images, and sub-sections)
- Mathematical expressions (operations containing operands and sub-expressions)
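To show how directly the structure carries over, here is a minimal sketch of the last item. The Expression, Number, Sum, and Product classes are hypothetical; Number plays the role of File and the two operations play the role of Directory.

```python
from abc import ABC, abstractmethod


class Expression(ABC):
    """Component: anything that can be evaluated to a number."""

    @abstractmethod
    def evaluate(self) -> float: ...


class Number(Expression):
    """Leaf: a literal value."""

    def __init__(self, value: float):
        self.value = value

    def evaluate(self) -> float:
        return self.value


class Sum(Expression):
    """Composite: adds up its operands, which may themselves be composites."""

    def __init__(self, *operands: Expression):
        self.operands = operands

    def evaluate(self) -> float:
        return sum(op.evaluate() for op in self.operands)


class Product(Expression):
    """Composite: multiplies its operands."""

    def __init__(self, *operands: Expression):
        self.operands = operands

    def evaluate(self) -> float:
        result = 1.0
        for op in self.operands:
            result *= op.evaluate()
        return result


# (2 + 3) * 4
expr = Product(Sum(Number(2), Number(3)), Number(4))
print(expr.evaluate())  # 20.0
```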
The next time you find yourself writing isinstance() checks to handle containers differently from their contents, reach for the Composite pattern. Your future self will thank you.