Processes: Process Creation and IPC

Key Insights

  • Process creation via fork() uses copy-on-write semantics, making it efficient for spawning child processes, but understanding the parent-child relationship is critical to avoiding zombie processes and resource leaks.
  • IPC mechanism selection should be driven by your specific requirements: pipes for simple parent-child streaming, shared memory for high-throughput data exchange, and message queues or sockets when you need structured communication between unrelated processes.
  • Modern applications often benefit from higher-level abstractions like Unix domain sockets or gRPC, but understanding the underlying primitives helps you debug performance issues and make informed architectural decisions.

Introduction to Processes

A process is an instance of a running program with its own memory space, file descriptors, and system resources. Unlike threads, which share memory within a process, processes are isolated from each other by the operating system’s memory protection mechanisms. This isolation provides security and stability—a crash in one process won’t bring down another—but it also means communication between processes requires explicit mechanisms.

For application architects, understanding process management matters because many systems rely on multi-process architectures: web servers forking worker processes, databases using separate processes for different tasks, and microservices communicating over IPC. Getting these fundamentals wrong leads to resource leaks, zombie processes accumulating in your process table, and subtle race conditions that only manifest under load.

Process Creation Fundamentals

Unix-like systems create processes primarily through fork() and the exec() family of functions. The fork() system call creates a near-exact copy of the calling process, while exec() replaces the current process image with a new program.

The key insight is copy-on-write (COW) semantics. When you call fork(), the kernel doesn’t immediately duplicate all memory pages. Instead, parent and child share the same physical pages, which are marked read-only. Only when either process attempts to write does the kernel copy that specific page. This makes fork() far cheaper than a full copy, even for processes with large memory footprints, although the kernel still has to duplicate the page tables, so the cost grows with the size of the address space.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    pid_t pid = fork();
    
    if (pid < 0) {
        perror("fork failed");
        exit(1);
    } else if (pid == 0) {
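        // Child process
        printf("Child PID: %d, Parent PID: %d\n", getpid(), getppid());
        fflush(stdout);  // exec discards any output still sitting in stdio buffers
        
        // Replace child with a new program
        execlp("ls", "ls", "-la", (char *)NULL);
        
        // Only reached if exec fails
        perror("exec failed");
        _exit(127);  // Exit without running atexit handlers inherited from the parent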
    } else {
        // Parent process
        printf("Parent PID: %d, created child: %d\n", getpid(), pid);
        
        int status;
        waitpid(pid, &status, 0);
        
        if (WIFEXITED(status)) {
            printf("Child exited with status: %d\n", WEXITSTATUS(status));
        }
    }
    
    return 0;
}
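The isolation that fork() preserves can also be observed from Python. The sketch below is a minimal illustration, not a measurement of copy-on-write itself: Python cannot show which physical pages are shared, so it only demonstrates that a write in the child never reaches the parent. The function name fork_and_modify is a hypothetical helper, and the example assumes a Unix-like system where os.fork() is available.

```python
import os


def fork_and_modify():
    """Fork a child that mutates a list; return what the parent still sees."""
    data = [0] * 4  # The child inherits this as its own private copy

    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's address space only.
        data[0] = 99
        # Report what the child saw through its exit status and leave
        # immediately, without running the parent's cleanup handlers.
        os._exit(data[0])

    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    child_saw = os.WEXITSTATUS(status)
    return data[0], child_saw


if __name__ == "__main__":
    parent_value, child_value = fork_and_modify()
    print(f"parent sees {parent_value}, child saw {child_value}")
```

The child reports its view through the exit status because, after the fork, the two processes no longer have any memory in common that could carry the value back.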

Python’s multiprocessing module abstracts these details while providing similar functionality:

import multiprocessing
import os

def worker(name):
    print(f"Worker {name}: PID={os.getpid()}, Parent={os.getppid()}")
    return f"Result from {name}"

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(worker, ["A", "B", "C", "D"])
        print(f"Results: {results}")

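The multiprocessing module supports several start methods, and which one it uses by default depends on the platform. The spawn method (the default on Windows and macOS, and selectable on other Unix systems) starts a fresh Python interpreter rather than forking. This avoids the hazards of forking a multi-threaded process but incurs higher startup costs, and it requires that everything passed to the child be picklable.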
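Selecting a start method explicitly might look like the sketch below. It uses multiprocessing.get_all_start_methods() and multiprocessing.get_context(), both part of the standard library; choose_context is a hypothetical helper name introduced here for illustration.

```python
import multiprocessing


def choose_context(preferred="spawn"):
    """Return a context for `preferred`, or the platform default if unavailable."""
    available = multiprocessing.get_all_start_methods()
    if preferred in available:
        return multiprocessing.get_context(preferred)
    return multiprocessing.get_context()  # Fall back to the platform default


if __name__ == "__main__":
    ctx = choose_context("spawn")
    print("available:", multiprocessing.get_all_start_methods())
    print("chosen:", ctx.get_start_method())
```

A context object mirrors the module-level API, so ctx.Pool(...) and ctx.Process(...) can replace multiprocessing.Pool and multiprocessing.Process without changing the process-wide default.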

Process Lifecycle Management

Processes move through several states: running, sleeping (waiting for I/O or a signal), stopped, and zombie. The zombie state is particularly important to understand—it occurs when a child process terminates but its parent hasn’t yet called wait() to retrieve its exit status.

Zombie processes consume minimal resources (just a process table entry and the PID it holds), but they accumulate if not reaped and can eventually exhaust the available PIDs. Orphan processes, those whose parent terminates first, are adopted by init (PID 1) or, on Linux, by a designated subreaper, which reaps them when they exit.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Written only by the signal handler; main() only reads it, so no
// read-modify-write in main() can race with the handler
volatile sig_atomic_t reaped_count = 0;

void sigchld_handler(int sig) {
    (void)sig;
    int saved_errno = errno;  // waitpid() may overwrite errno
    int status;

    // Reap all available zombie children
    while (waitpid(-1, &status, WNOHANG) > 0) {
        reaped_count++;
        // Record the exit status as needed (async-signal-safe calls only)
    }

    errno = saved_errno;
}

int main() {
    struct sigaction sa = {0};
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;

    if (sigaction(SIGCHLD, &sa, NULL) == -1) {
        perror("sigaction");
        exit(1);
    }

    // Fork multiple children
    int spawned = 0;
    for (int i = 0; i < 5; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            sleep(i + 1);
            _exit(i);
        }
        spawned++;
    }

    // Parent continues working while children are reaped asynchronously
    while (reaped_count < spawned) {
        printf("Working... %d children remaining\n", spawned - (int)reaped_count);
        sleep(1);  // Returns early if SIGCHLD arrives
    }

    return 0;
}

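The WNOHANG flag in waitpid() is crucial: it makes the call return immediately when no exited child is waiting, so the handler never blocks. The loop matters just as much. Standard signals are not queued, so several children exiting close together may produce a single SIGCHLD, and one handler invocation has to reap them all.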
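The effect of reaping can be observed directly. The sketch below, which assumes a Unix-like system, forks a child that exits at once, collects its status with a blocking wait, and then shows that a second wait on the same PID fails because the process table entry is gone. The name spawn_and_reap is a hypothetical helper.

```python
import os


def spawn_and_reap(exit_code=7):
    """Fork a child that exits at once, reap it, and return its exit code."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # Child: terminate immediately

    # Until the parent waits, the terminated child remains a zombie.
    _, status = os.waitpid(pid, 0)  # Blocks until the child has exited

    # The entry has been released, so waiting on the same PID again fails.
    try:
        os.waitpid(pid, os.WNOHANG)
        already_reaped = False
    except ChildProcessError:
        already_reaped = True

    return os.WEXITSTATUS(status), already_reaped
```

Python raises ChildProcessError where C code would see waitpid() return -1 with errno set to ECHILD.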

Inter-Process Communication: Pipes and FIFOs

Pipes are the simplest IPC mechanism. Anonymous pipes work between related processes (parent-child), while named pipes (FIFOs) allow communication between unrelated processes.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    int pipefd[2];  // pipefd[0] for reading, pipefd[1] for writing
    
    if (pipe(pipefd) == -1) {
        perror("pipe");
        exit(1);
    }
    
    pid_t pid = fork();
    
    if (pid < 0) {
        perror("fork");
        exit(1);
    } else if (pid == 0) {
        // Child: producer
        close(pipefd[0]);  // Close unused read end
        
        const char *messages[] = {"Hello", "World", "From", "Child"};
        for (int i = 0; i < 4; i++) {
            write(pipefd[1], messages[i], strlen(messages[i]) + 1);
            usleep(100000);
        }
        
        close(pipefd[1]);
        exit(0);
    } else {
        // Parent: consumer
        close(pipefd[1]);  // Close unused write end
        
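        char buffer[256];
        ssize_t bytes;
        
        // A pipe is a byte stream: one read() may return several messages,
        // so print every NUL-terminated string found in the buffer
        while ((bytes = read(pipefd[0], buffer, sizeof(buffer) - 1)) > 0) {
            buffer[bytes] = '\0';
            for (ssize_t off = 0; off < bytes; off += strlen(buffer + off) + 1) {
                printf("Received: %s\n", buffer + off);
            }
        }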
        
        close(pipefd[0]);
        wait(NULL);
    }
    
    return 0;
}

For bidirectional communication, you need two pipes. Named pipes are created with mkfifo() and appear as special files in the filesystem, enabling communication between any processes that can access the file.
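A named pipe can be sketched in Python with os.mkfifo(). The example assumes a Unix-like system and uses fork() only to keep it self-contained; any process able to open the path could be the writer. The file name demo.fifo and the function name fifo_round_trip are hypothetical.

```python
import os
import tempfile


def fifo_round_trip(message=b"hello over a FIFO"):
    """Send `message` from a forked child to the parent through a named pipe."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.fifo")  # Hypothetical file name
        os.mkfifo(path, 0o600)

        pid = os.fork()
        if pid == 0:
            # Child: opening for writing blocks until a reader opens the FIFO.
            with open(path, "wb") as writer:
                writer.write(message)
            os._exit(0)  # Skip the parent's temporary-directory cleanup

        # Parent: opening for reading blocks until a writer opens the FIFO.
        with open(path, "rb") as reader:
            received = reader.read()  # Returns at EOF, once the writer closes
        os.waitpid(pid, 0)
        return received
```

Both open() calls block until the other end is opened, which is why neither side needs a sleep to wait for its peer.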

Shared Memory and Memory-Mapped Files

When you need high-throughput data sharing, shared memory eliminates the copy overhead inherent in pipes and message queues. POSIX shared memory uses shm_open() to create a shared memory object and mmap() to map it into the process’s address space.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <semaphore.h>
#include <unistd.h>

#define SHM_NAME "/my_shared_mem"
#define SEM_NAME "/my_semaphore"
#define SHM_SIZE 4096

typedef struct {
    int counter;
    char message[256];
} SharedData;

int main() {
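    // Create shared memory and set its size
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    if (shm_fd == -1 || ftruncate(shm_fd, SHM_SIZE) == -1) {
        perror("shm_open/ftruncate");
        exit(1);
    }
    
    SharedData *shared = mmap(NULL, SHM_SIZE, 
                               PROT_READ | PROT_WRITE, 
                               MAP_SHARED, shm_fd, 0);
    if (shared == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }
    close(shm_fd);  // The mapping stays valid after the descriptor is closed
    
    // Create semaphore for synchronization (initial value 1, used as a mutex)
    sem_t *sem = sem_open(SEM_NAME, O_CREAT, 0666, 1);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        exit(1);
    }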
    
    if (fork() == 0) {
        // Child: writer
        for (int i = 0; i < 10; i++) {
            sem_wait(sem);
            shared->counter = i;
            snprintf(shared->message, 256, "Update %d from child", i);
            sem_post(sem);
            usleep(50000);
        }
        exit(0);
    } else {
        // Parent: reader
        for (int i = 0; i < 10; i++) {
            sem_wait(sem);
            printf("Counter: %d, Message: %s\n", 
                   shared->counter, shared->message);
            sem_post(sem);
            usleep(50000);
        }
        wait(NULL);
    }
    
    // Cleanup
    munmap(shared, SHM_SIZE);
    shm_unlink(SHM_NAME);
    sem_close(sem);
    sem_unlink(SEM_NAME);
    
    return 0;
}

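The semaphore is essential: without synchronization, the reader could observe a half-written message. Note that this semaphore provides only mutual exclusion, not ordering, so the reader may print the same update twice or miss one entirely; strict turn-taking would need a second semaphore or a condition variable. For more complex scenarios, consider using read-write locks or lock-free data structures.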
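The same pattern can be sketched in Python with multiprocessing.shared_memory (Python 3.8 or later). The example assumes a Unix-like system, since it relies on os.fork() and the fork start method so that the lock and the mapping are inherited by the children. The function name shared_counter and the 8-byte counter layout are choices made for this illustration.

```python
import multiprocessing
import os
import struct
from multiprocessing import shared_memory


def shared_counter(workers=4, increments=1000):
    """Have several forked children increment one counter in shared memory."""
    lock = multiprocessing.get_context("fork").Lock()  # Inherited across fork()
    shm = shared_memory.SharedMemory(create=True, size=8)
    try:
        struct.pack_into("q", shm.buf, 0, 0)  # 64-bit counter starts at zero

        pids = []
        for _ in range(workers):
            pid = os.fork()
            if pid == 0:
                status = 1
                try:
                    # Child: the mapping is shared, so writes reach the parent.
                    for _ in range(increments):
                        with lock:  # The read-modify-write must not interleave
                            (value,) = struct.unpack_from("q", shm.buf, 0)
                            struct.pack_into("q", shm.buf, 0, value + 1)
                    status = 0
                finally:
                    os._exit(status)  # Never run the parent's cleanup below
            pids.append(pid)

        for pid in pids:
            os.waitpid(pid, 0)
        (total,) = struct.unpack_from("q", shm.buf, 0)
        return total
    finally:
        shm.close()
        shm.unlink()  # Remove the named segment so it does not outlive the run
```

Removing the lock would let two children read the same value and both write back value + 1, losing an increment. That is the race the synchronization guards against.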

Message-Based IPC: Queues and Sockets

POSIX message queues provide structured message passing with priority support. Unlike pipes, messages have defined boundaries, eliminating the need to implement your own framing protocol.
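To make the contrast concrete, here is a minimal sketch of the framing a byte stream such as a pipe needs and a message queue provides for free. The 4-byte big-endian length prefix is an assumed wire format chosen for illustration, and send_message, read_exact, recv_message, and demo are hypothetical helper names.

```python
import os
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed format)


def send_message(fd, payload):
    """Write one length-prefixed message to a pipe or stream socket."""
    data = HEADER.pack(len(payload)) + payload
    while data:
        written = os.write(fd, data)  # May write fewer bytes than requested
        data = data[written:]


def read_exact(fd, count):
    """Read exactly `count` bytes, or return None if EOF arrives first."""
    chunks = []
    while count > 0:
        chunk = os.read(fd, count)
        if not chunk:
            return None
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def recv_message(fd):
    """Read one length-prefixed message; return None at end of stream."""
    header = read_exact(fd, HEADER.size)
    if header is None:
        return None
    (length,) = HEADER.unpack(header)
    return read_exact(fd, length)


def demo():
    """Round-trip three messages, including an empty one, through a pipe."""
    read_fd, write_fd = os.pipe()
    for payload in (b"Hello", b"", b"World"):
        send_message(write_fd, payload)
    os.close(write_fd)

    received = []
    while (msg := recv_message(read_fd)) is not None:
        received.append(msg)
    os.close(read_fd)
    return received
```

The reader recovers each message intact, including the empty one, even though the pipe itself carries only an undifferentiated run of bytes.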

Unix domain sockets offer the most flexible IPC mechanism. They support both stream (like TCP) and datagram (like UDP) modes, work with select(), poll(), and epoll, and can pass open file descriptors between processes using SCM_RIGHTS ancillary messages.

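#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

#define SOCKET_PATH "/tmp/my_socket"

// Print the failing call's error and exit
static void die(const char *msg) {
    perror(msg);
    exit(1);
}

void run_server() {
    int server_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (server_fd == -1) die("socket");

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    unlink(SOCKET_PATH);  // Remove a stale socket file left by an earlier run
    if (bind(server_fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) die("bind");
    if (listen(server_fd, 5) == -1) die("listen");

    int client_fd = accept(server_fd, NULL, NULL);
    if (client_fd == -1) die("accept");

    char buffer[256];
    ssize_t bytes = read(client_fd, buffer, sizeof(buffer) - 1);
    if (bytes < 0) die("read");
    buffer[bytes] = '\0';  // Never print an unterminated buffer
    printf("Server received: %s\n", buffer);

    write(client_fd, "ACK", 4);  // 3 characters plus the terminating NUL

    close(client_fd);
    close(server_fd);
    unlink(SOCKET_PATH);
}

void run_client() {
    sleep(1);  // Crude wait for the server to start listening

    int client_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (client_fd == -1) die("socket");

    struct sockaddr_un addr = {0};
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    if (connect(client_fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) die("connect");

    write(client_fd, "Hello from client", 18);  // 17 characters plus NUL

    char buffer[256];
    ssize_t bytes = read(client_fd, buffer, sizeof(buffer) - 1);
    if (bytes < 0) die("read");
    buffer[bytes] = '\0';
    printf("Client received: %s\n", buffer);

    close(client_fd);
}

// Run the client in a child process and the server in the parent
int main() {
    pid_t pid = fork();
    if (pid < 0) die("fork");
    if (pid == 0) {
        run_client();
        exit(0);
    }
    run_server();
    wait(NULL);
    return 0;
}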
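Descriptor passing, mentioned above as a capability of Unix domain sockets, can be sketched with socket.send_fds() and socket.recv_fds(), which exist in Python 3.9 and later on Unix. To stay short, the example keeps both ends of the socket pair in one process; in practice the receiver would be a separate process. The name pass_descriptor is a hypothetical helper.

```python
import os
import socket


def pass_descriptor():
    """Send a pipe's read end across a Unix socket pair and read via the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()
    try:
        # Send one byte of ordinary data alongside the descriptor; some
        # systems will not deliver ancillary data without a payload.
        socket.send_fds(left, [b"F"], [read_fd])
        _, fds, _, _ = socket.recv_fds(right, 1, 1)
        received_fd = fds[0]  # A new descriptor for the same open pipe

        os.write(write_fd, b"via passed fd")
        data = os.read(received_fd, 64)
        os.close(received_fd)
        return received_fd != read_fd, data
    finally:
        os.close(read_fd)
        os.close(write_fd)
        left.close()
        right.close()
```

The receiver gets a different descriptor number that refers to the same underlying pipe. This is how a privileged process can open a resource and hand it to a less privileged worker.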

Choosing the Right IPC Mechanism

Mechanism       Latency  Throughput  Complexity  Best For
Pipes           Low      Medium      Low         Parent-child streaming
Shared Memory   Lowest   Highest     High        High-frequency data sharing
Message Queues  Medium   Medium      Medium      Structured messages with priorities
Unix Sockets    Low      High        Medium      Flexible bidirectional communication

For modern applications, consider higher-level abstractions. D-Bus provides a standardized message bus for desktop applications. gRPC with Unix domain sockets gives you strongly-typed RPC with excellent tooling. These add overhead but dramatically reduce development time and bugs.

The fundamental rule: start with the simplest mechanism that meets your requirements. Pipes and Unix sockets handle most cases. Only reach for shared memory when profiling proves you need the performance—the synchronization complexity rarely justifies premature optimization.
