Python - Working with Paths (pathlib)

The `pathlib` module, introduced in Python 3.4, replaces string-based path manipulation with Path objects. This eliminates common errors from manual string concatenation and platform-specific...

Key Insights

  • pathlib provides object-oriented path manipulation that’s more intuitive and safer than string concatenation with os.path
  • Path objects work seamlessly across Windows, Linux, and macOS without manual separator handling
  • The module combines path construction, file I/O, and filesystem operations in a single, chainable interface

Why pathlib Over os.path

The pathlib module, introduced in Python 3.4, replaces string-based path manipulation with Path objects. This eliminates common errors from manual string concatenation and platform-specific separators.

from pathlib import Path
import os

# Old way - error-prone
old_path = os.path.join('data', 'users', 'profile.json')

# New way - explicit and safe
new_path = Path('data') / 'users' / 'profile.json'

# Works on Windows (data\users\profile.json) and Unix (data/users/profile.json)
print(new_path)

The / operator overload makes path construction readable and prevents separator mistakes. Path objects also provide type safety - you can’t accidentally pass a string where a Path is expected without explicit conversion.

Creating and Navigating Paths

Path objects support both absolute and relative paths, with methods to resolve, navigate, and inspect the filesystem hierarchy.

from pathlib import Path

# Current working directory
cwd = Path.cwd()
print(f"Working directory: {cwd}")

# Home directory
home = Path.home()
print(f"Home: {home}")

# Absolute path
abs_path = Path('/var/log/app.log')

# Relative path
rel_path = Path('config/settings.yaml')

# Resolve relative to absolute
resolved = rel_path.resolve()
print(f"Resolved: {resolved}")

# Navigate up the tree
parent = cwd.parent
grandparent = cwd.parent.parent

# Or use parents indexing
print(cwd.parents[0])  # Same as parent
print(cwd.parents[1])  # Same as grandparent

The resolve() method is critical for converting relative paths to absolute ones, following symlinks, and eliminating . and .. components.

Path Components and Properties

Path objects expose individual components as properties, eliminating regex or string splitting.

from pathlib import Path

path = Path('/home/user/projects/app/src/main.py')

print(f"Name: {path.name}")           # main.py
print(f"Stem: {path.stem}")           # main
print(f"Suffix: {path.suffix}")       # .py
print(f"Parent: {path.parent}")       # /home/user/projects/app/src
print(f"Parts: {path.parts}")         # ('/', 'home', 'user', 'projects', 'app', 'src', 'main.py')

# Multiple suffixes
archive = Path('backup.tar.gz')
print(f"Suffixes: {archive.suffixes}")  # ['.tar', '.gz']

# Check if path is absolute
print(f"Is absolute: {path.is_absolute()}")  # True

# Get anchor (root on Unix, drive on Windows)
print(f"Anchor: {path.anchor}")  # /

These properties are immutable - modifying a path creates a new Path object rather than mutating the original.

File and Directory Operations

pathlib integrates common filesystem operations directly into Path objects, reducing imports and improving discoverability.

from pathlib import Path

# Check existence and type
config_file = Path('config.json')

if config_file.exists():
    print(f"Is file: {config_file.is_file()}")
    print(f"Is directory: {config_file.is_dir()}")
    print(f"Is symlink: {config_file.is_symlink()}")

# Create directories
data_dir = Path('data/processed/2024')
data_dir.mkdir(parents=True, exist_ok=True)

# parents=True creates intermediate directories
# exist_ok=True prevents errors if directory exists

# Create a file
log_file = Path('logs/app.log')
log_file.parent.mkdir(parents=True, exist_ok=True)
log_file.touch()  # Creates empty file or updates timestamp

# Delete operations
temp_file = Path('temp.txt')
temp_file.unlink(missing_ok=True)  # Delete file, no error if missing

empty_dir = Path('empty')
empty_dir.rmdir()  # Only works on empty directories

For recursive directory deletion, use shutil.rmtree() with a Path object - pathlib intentionally omits this dangerous operation.

Reading and Writing Files

Path objects provide methods for common file I/O operations without explicitly opening file handles.

from pathlib import Path
import json

# Read entire file as string
config_path = Path('config.txt')
content = config_path.read_text(encoding='utf-8')

# Read as bytes
binary_data = config_path.read_bytes()

# Write text
output_path = Path('output.txt')
output_path.write_text('Processing complete\n', encoding='utf-8')

# Write bytes
output_path.write_bytes(b'\x00\x01\x02')

# For line-by-line processing, use open()
log_path = Path('app.log')
with log_path.open('r', encoding='utf-8') as f:
    for line in f:
        if 'ERROR' in line:
            print(line.strip())

# JSON example
data = {'status': 'active', 'count': 42}
json_path = Path('data.json')
json_path.write_text(json.dumps(data, indent=2))

# Read JSON
loaded_data = json.loads(json_path.read_text())

The read_text() and write_text() methods are convenient for small files. For large files, use open() to avoid loading everything into memory.

Glob Patterns and Iteration

pathlib provides powerful pattern matching for finding files, replacing glob.glob() with an object-oriented interface.

from pathlib import Path

project_root = Path('.')

# Find all Python files recursively
py_files = list(project_root.rglob('*.py'))
print(f"Found {len(py_files)} Python files")

# Non-recursive glob
config_files = list(project_root.glob('*.json'))

# Pattern matching
test_files = list(project_root.rglob('test_*.py'))

# Iterate over directory contents
src_dir = Path('src')
if src_dir.exists():
    for item in src_dir.iterdir():
        if item.is_file():
            print(f"File: {item.name} ({item.stat().st_size} bytes)")
        elif item.is_dir():
            print(f"Directory: {item.name}")

# Filter with generator expression
large_files = (f for f in project_root.rglob('*') 
               if f.is_file() and f.stat().st_size > 1_000_000)

for large_file in large_files:
    print(f"{large_file}: {large_file.stat().st_size / 1_000_000:.2f} MB")

The glob() method matches patterns in the current directory, while rglob() searches recursively. Both return generators for memory efficiency with large result sets.

File Metadata and Permissions

Path objects expose file metadata through the stat() method, returning an os.stat_result object.

from pathlib import Path
import time

file_path = Path('data.csv')

if file_path.exists():
    stats = file_path.stat()
    
    # Size in bytes
    print(f"Size: {stats.st_size:,} bytes")
    
    # Timestamps
    modified = time.ctime(stats.st_mtime)
    created = time.ctime(stats.st_ctime)
    accessed = time.ctime(stats.st_atime)
    
    print(f"Modified: {modified}")
    print(f"Created: {created}")
    print(f"Accessed: {accessed}")
    
    # Permissions (Unix-like systems)
    mode = oct(stats.st_mode)[-3:]
    print(f"Permissions: {mode}")

# Change permissions (Unix-like)
script_path = Path('deploy.sh')
if script_path.exists():
    script_path.chmod(0o755)  # rwxr-xr-x

On Windows, some metadata like st_ctime behaves differently - it represents creation time rather than metadata change time.

Practical Pattern: Configuration File Discovery

A common pattern is searching upward through directories to find configuration files, similar to how Git finds .git directories.

from pathlib import Path

def find_config(start_path: Path, config_name: str) -> Path | None:
    """Search upward for configuration file."""
    current = start_path.resolve()
    
    while True:
        config_path = current / config_name
        if config_path.exists():
            return config_path
        
        # Reached filesystem root
        if current == current.parent:
            return None
            
        current = current.parent

# Usage
config = find_config(Path.cwd(), 'pyproject.toml')
if config:
    print(f"Found config: {config}")
    settings = config.read_text()
else:
    print("No config found in tree")

This pattern is more robust than hardcoded relative paths and adapts to different project structures.

Cross-Platform Path Handling

pathlib automatically handles platform differences, but you can force specific path types when needed.

from pathlib import Path, PurePosixPath, PureWindowsPath

# Current platform
native_path = Path('data/file.txt')

# Force Unix-style (no filesystem access)
posix_path = PurePosixPath('data/file.txt')
print(posix_path)  # data/file.txt

# Force Windows-style (no filesystem access)
windows_path = PureWindowsPath('data/file.txt')
print(windows_path)  # data\file.txt

# Convert between types
path_str = str(native_path)
reconstructed = Path(path_str)

# Use PurePath variants for path manipulation without filesystem access
# Useful for generating paths for remote systems

The PurePath variants perform path operations without filesystem access, making them safe for manipulating paths for different platforms or remote systems.

pathlib’s object-oriented approach eliminates entire classes of path-related bugs while providing a more discoverable API than string-based alternatives. The integration of path construction, inspection, and filesystem operations into a single interface reduces cognitive load and improves code maintainability.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.