Memory-Mapped Files: Direct File Access
Key Insights
- Memory-mapped files eliminate explicit read/write syscalls by mapping file contents directly to virtual memory, letting you treat files as byte arrays—but this trades syscall overhead for page fault overhead.
- Mmap excels at random access patterns and shared data between processes, but traditional I/O often wins for sequential reads and small files due to lower page fault costs.
- Proper error handling is critical: file truncation causes SIGBUS crashes, 32-bit systems hit address space limits quickly, and msync() is required for durability guarantees.
What Are Memory-Mapped Files?
Traditional file I/O follows a predictable pattern: open a file, read bytes into a buffer, process them, write results back. Every read and write involves a syscall—a context switch into kernel mode where the OS copies data between kernel buffers and your application’s memory.
Memory-mapped files flip this model. Instead of copying data, you ask the OS to map the file’s contents directly into your process’s virtual address space. The file appears as a contiguous byte array. Reading byte 1,000,000 is just a pointer dereference. Writing is a memory store. No explicit syscalls, no buffer management, no copying.
This isn’t magic—it’s virtual memory doing what it was designed for. The same mechanism that lets your program use more memory than physically available also lets files masquerade as RAM.
How Memory Mapping Works Under the Hood
When you call mmap(), the kernel doesn’t immediately load the file into memory. It creates entries in your process’s page table that point to the file on disk. The actual memory pages are marked as “not present.”
When your code first accesses a mapped address, the CPU triggers a page fault. The kernel intercepts this, reads the corresponding file block from disk into a physical memory page, updates the page table to point to this new page, and resumes your program. This is demand paging—data loads only when touched.
Subsequent accesses to the same page hit physical RAM directly. The Translation Lookaside Buffer (TLB) caches page table entries, making repeated accesses nearly as fast as regular memory operations.
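Demand paging is observable from user space: POSIX tracks per-process fault counters, which Python exposes through the `resource` module. A minimal sketch (Unix only; the 64-page file size and temp-file setup are arbitrary choices for illustration):

```python
import mmap
import os
import resource
import tempfile

# Create a throwaway file spanning 64 pages
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\x00" * (64 * mmap.PAGESIZE))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

faults_before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
for off in range(0, len(mm), mmap.PAGESIZE):
    _ = mm[off]  # first touch of each page faults it in
faults_after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt

mm.close()
os.remove(path)
```

The minor-fault counter climbs as pages are touched for the first time; repeated passes over the same pages add essentially nothing.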
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
int main(void) {
    int fd = open("data.bin", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        close(fd);
        return 1;
    }

    // Map entire file into memory
    char *mapped = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return 1;
    }

    // Direct pointer access—no read() calls
    printf("First byte: 0x%02x\n", (unsigned char)mapped[0]);
    printf("Byte at offset 1000: 0x%02x\n", (unsigned char)mapped[1000]);

    munmap(mapped, sb.st_size);
    close(fd);
    return 0;
}
The file descriptor can be closed immediately after mapping—the mapping maintains its own reference to the underlying file.
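The same property holds in Python: once the `mmap` object exists, the file object that supplied the descriptor can be closed. A small sketch with a throwaway temp file:

```python
import mmap
import os
import tempfile

# Write a small file to map
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"hello mmap")

f = open(path, "rb")
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
f.close()  # safe: the mapping holds its own reference to the file

first = mm[:5]  # reads still work after the file is closed
mm.close()
os.remove(path)
```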
Performance Characteristics and Trade-offs
Mmap’s performance advantage comes from eliminating syscall overhead and data copying. Each read() call requires a context switch and a copy from kernel buffer to user buffer. With mmap, data moves directly from disk to your address space.
But mmap isn’t universally faster. Page faults have overhead too—each fault involves a kernel trap, page table manipulation, and potentially disk I/O. For sequential access, the OS can’t predict your access pattern as effectively as it can with read(), which triggers aggressive readahead.
Mmap wins when:
- Random access dominates (databases, search indices)
- Multiple processes share the same file (shared libraries, IPC)
- You’re accessing a small subset of a large file
- The working set fits in RAM and gets reused
Traditional I/O wins when:
- Sequential streaming through a file once
- Files are small (syscall overhead is negligible)
- You need precise control over caching behavior
- Portability across exotic filesystems matters
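The readahead point above can be acted on explicitly: madvise() lets you declare your access pattern so the kernel tunes (or suppresses) readahead. A Python sketch, assuming a platform where madvise is available (it is exposed on Python 3.8+ where the OS supports it; the 1 MiB file is throwaway data):

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of throwaway data

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Hint that access will be random, suppressing sequential readahead
if hasattr(mm, "madvise"):
    mm.madvise(mmap.MADV_RANDOM)

sample = mm[512 * 1024]  # touch one page mid-file
mm.close()
os.remove(path)
```

MADV_SEQUENTIAL is the mirror-image hint, asking for aggressive readahead when you do plan to stream.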
// Benchmark: Random access comparison
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

#define FILE_SIZE (1L << 30) // 1 GB
#define NUM_READS 100000

// Combine two rand() calls so offsets cover the full 1 GB range
// even where RAND_MAX is only 32767.
static size_t random_offset(void) {
    return (((size_t)rand() << 16) ^ (size_t)rand()) % FILE_SIZE;
}

void benchmark_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return; }
    char *map = mmap(NULL, FILE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return; }

    srand(42); // same seed in both benchmarks for identical offsets
    clock_t start = clock();
    volatile char c;
    for (int i = 0; i < NUM_READS; i++) {
        c = map[random_offset()]; // direct memory access
    }
    (void)c;
    printf("mmap: %.3f seconds\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    munmap(map, FILE_SIZE);
    close(fd);
}

void benchmark_fread(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) { perror("fopen"); return; }
    char buf[1];

    srand(42); // same seed in both benchmarks for identical offsets
    clock_t start = clock();
    for (int i = 0; i < NUM_READS; i++) {
        fseek(f, (long)random_offset(), SEEK_SET);
        if (fread(buf, 1, 1, f) != 1) break;
    }
    printf("fread: %.3f seconds\n", (double)(clock() - start) / CLOCKS_PER_SEC);
    fclose(f);
}
On typical systems with warm caches, random access through mmap tends to run several times faster than fread for this pattern, and the gap widens as the number of reads grows; the exact ratio depends on hardware, cache state, and filesystem.
Practical Use Cases
Database storage engines are the canonical mmap use case. SQLite offers an mmap mode. LMDB uses mmap exclusively. The random access patterns of B-tree traversal align perfectly with mmap’s strengths.
Inter-process communication becomes trivial with MAP_SHARED. Two processes mapping the same file see each other’s writes immediately (after appropriate synchronization). No serialization, no pipes, no sockets—just shared memory backed by a file.
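A minimal sketch of that idea in Python, assuming a POSIX system (it relies on `os.fork`, so Unix only): a child process stores an integer through a shared mapping, and the parent reads it back, with `waitpid` standing in as crude synchronization. The one-page file and the value 42 are arbitrary.

```python
import mmap
import os
import struct
import tempfile

# Back the shared mapping with a one-page file
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\x00" * mmap.PAGESIZE)

f = open(path, "r+b")
mm = mmap.mmap(f.fileno(), 0)  # MAP_SHARED is the default
f.close()

pid = os.fork()
if pid == 0:
    # Child: store a value through the shared mapping
    mm[0:4] = struct.pack("<I", 42)
    os._exit(0)

os.waitpid(pid, 0)  # crude synchronization: wait for the child to exit
value = struct.unpack("<I", mm[0:4])[0]
mm.close()
os.remove(path)
```

Real IPC needs a proper synchronization primitive (a semaphore, futex, or lock file); waiting for process exit only works for this one-shot handoff.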
Large file processing benefits when you only need portions of the data. A 100GB log file can be mapped entirely; the OS loads only the pages you touch.
import mmap
import re
def search_large_log(filepath: str, pattern: str) -> list[tuple[int, str]]:
    """Search a multi-gigabyte log file without loading it entirely."""
    matches = []
    regex = re.compile(pattern.encode())
    with open(filepath, 'rb') as f:
        # Memory-map the file; OS handles paging
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in regex.finditer(mm):
                # Find line boundaries
                line_start = mm.rfind(b'\n', 0, match.start()) + 1
                line_end = mm.find(b'\n', match.end())
                if line_end == -1:  # match on a final, unterminated line
                    line_end = len(mm)
                line = mm[line_start:line_end].decode('utf-8', errors='replace')
                matches.append((match.start(), line))
    return matches

# Search a 50GB log file—only touched pages load into RAM
results = search_large_log('/var/log/massive.log', r'ERROR.*timeout')
Python’s mmap supports slicing, searching, and even regex directly on the mapped region.
Common Pitfalls and Safety Considerations
Address space exhaustion kills 32-bit applications. A 32-bit process has roughly 3GB of usable address space. Map a 4GB file and you’re dead before you start. Even on 64-bit systems, mapping terabytes fragments your address space.
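One mitigation is to map a sliding window over the file rather than the whole thing. A Python sketch of the idea, scanning a file window by window (the window size, file layout, and "MARK" marker are arbitrary; offsets passed to mmap must be multiples of mmap.ALLOCATIONGRANULARITY):

```python
import mmap
import os
import tempfile

CHUNK = mmap.ALLOCATIONGRANULARITY * 4  # window size, properly aligned

# Build a 3-window file with one marker at the start of window 2
fd, path = tempfile.mkstemp()
os.close(fd)
size = CHUNK * 3
with open(path, "wb") as f:
    f.truncate(size)
    f.seek(CHUNK)
    f.write(b"MARK")

total_marks = 0
with open(path, "rb") as f:
    for offset in range(0, size, CHUNK):
        # Only one window's worth of address space is consumed at a time
        with mmap.mmap(f.fileno(), CHUNK, access=mmap.ACCESS_READ,
                       offset=offset) as mm:
            total_marks += mm[:].count(b"MARK")
os.remove(path)
```

A real implementation would overlap the windows slightly so a record straddling a window boundary is not missed.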
File truncation crashes your program. If another process truncates a mapped file, accessing the now-invalid region raises SIGBUS. The signal can technically be caught with a handler, but by default it kills your process, and there is no portable way to recover gracefully.
Writes aren’t durable without msync(). Modified pages sit in the page cache indefinitely. The OS flushes them eventually, but “eventually” might be after a power failure. Call msync() with MS_SYNC for durability guarantees.
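In Python, `mmap.flush()` wraps msync(): it returns only after the dirty pages have been written back to the file. A minimal sketch with a throwaway one-page file (the "DURABLE" payload is arbitrary):

```python
import mmap
import os
import tempfile

# One-page backing file
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\x00" * mmap.PAGESIZE)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:  # writable, shared mapping
        mm[0:7] = b"DURABLE"
        # flush() issues msync(MS_SYNC): blocks until the dirty pages
        # reach the backing file
        mm.flush()

with open(path, "rb") as f:
    on_disk = f.read(7)
os.remove(path)
```

Note that msync() alone does not guarantee the drive's own write cache has been drained; fully durable systems pair it with device-level barriers.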
#include <sys/mman.h>
#include <signal.h>
#include <setjmp.h>
#include <stddef.h>

static sigjmp_buf jump_buffer;

static void sigbus_handler(int sig) {
    (void)sig;
    siglongjmp(jump_buffer, 1);
}

int safe_mmap_access(char *mapped, size_t offset) {
    struct sigaction sa = {0};
    sa.sa_handler = sigbus_handler;
    sigemptyset(&sa.sa_mask);
    struct sigaction old_sa;
    sigaction(SIGBUS, &sa, &old_sa);

    if (sigsetjmp(jump_buffer, 1) == 0) {
        // Try to access the mapped region
        volatile char c = mapped[offset];
        (void)c;
        sigaction(SIGBUS, &old_sa, NULL);
        return 0; // Success
    } else {
        // SIGBUS caught—file was truncated or mapping invalid
        sigaction(SIGBUS, &old_sa, NULL);
        return -1; // Failure
    }
}
This SIGBUS handling is fragile and non-portable. The real solution: don’t let files get truncated while mapped, or accept that truncation means process death.
Cross-Platform Implementation Notes
POSIX mmap() and Windows CreateFileMapping()/MapViewOfFile() achieve the same result with different APIs. Windows requires creating an explicit file mapping object before creating views—an extra step that enables some additional sharing semantics.
High-level languages abstract these differences. Rust’s memmap2 crate provides safe, cross-platform memory mapping:
use memmap2::{Mmap, MmapOptions};
use std::fs::File;
use std::io::Result;
fn process_large_file(path: &str) -> Result<u64> {
    let file = File::open(path)?;
    // Map the file; `unsafe` because the underlying file could be
    // modified or truncated while mapped
    let mmap: Mmap = unsafe { MmapOptions::new().map(&file)? };
    // Treat the file as a byte slice
    let bytes: &[u8] = &mmap[..];
    // Count newlines without loading the entire file
    let count = bytes.iter().filter(|&&b| b == b'\n').count();
    Ok(count as u64)
}

fn main() -> Result<()> {
    let lines = process_large_file("huge_dataset.csv")?;
    println!("Lines: {}", lines);
    Ok(())
}
The unsafe block is required because Rust can’t guarantee the underlying file won’t be modified or truncated—the same SIGBUS problem exists regardless of language.
Java’s MappedByteBuffer and Python’s mmap module provide similar abstractions with their own trade-offs around garbage collection and buffer management.
Conclusion
Memory-mapped files are a powerful tool when your access patterns align with their strengths: random access, shared data, and working sets that benefit from OS-managed caching. They’re not a universal performance win—sequential streaming and small files often fare better with traditional I/O.
Use mmap when you’re building databases, search indices, or IPC mechanisms. Avoid it when you need precise control over caching, when files might be truncated externally, or when you’re targeting 32-bit systems with large files.
The abstraction is elegant—files as memory—but the implementation details matter. Understand page faults, respect address space limits, and always have a plan for SIGBUS.