Garbage Collection: Mark-Sweep, Generational, Reference Counting
Key Insights
- Reference counting provides immediate cleanup but fails on circular references—understanding this limitation explains why Python combines it with a cycle-detecting collector.
- Generational GC exploits the empirical observation that most objects die young, making minor collections fast and major collections rare.
- Modern production runtimes use hybrid approaches; tuning GC requires understanding your application’s allocation patterns, not memorizing magic flags.
Why Garbage Collection Matters
Manual memory management kills projects. Not dramatically, but slowly—through use-after-free bugs that corrupt data, memory leaks that accumulate over weeks, and double-free errors that crash production at 3 AM. C and C++ developers spend significant cognitive overhead tracking ownership and lifetimes. The rest of us traded that burden for garbage collection.
Languages like Java, Python, Go, JavaScript, and C# all use automatic memory management. The runtime tracks object allocations and reclaims memory when objects become unreachable. This isn’t free—GC introduces pauses, consumes CPU cycles, and requires memory overhead. Understanding how your runtime’s collector works helps you write code that cooperates with it rather than fighting it.
The three fundamental approaches—reference counting, mark-sweep, and generational collection—form the building blocks of every production garbage collector. Let’s examine each.
Reference Counting: The Intuitive Approach
Reference counting is conceptually simple. Every object maintains a count of how many references point to it. When you assign a reference, increment the target’s count. When a reference goes out of scope or gets reassigned, decrement the old target’s count. When the count hits zero, the object is garbage—free it immediately.
Python uses reference counting as its primary collection mechanism:
```python
import sys

class Widget:
    def __init__(self, name):
        self.name = name

w = Widget("primary")
print(f"Reference count: {sys.getrefcount(w)}")  # 2 (w + getrefcount's arg)
w2 = w
print(f"Reference count: {sys.getrefcount(w)}")  # 3
del w2
print(f"Reference count: {sys.getrefcount(w)}")  # 2
```
The appeal is obvious: deterministic destruction. When an object’s count reaches zero, its destructor runs immediately. This matters for resources like file handles or database connections where prompt cleanup prevents resource exhaustion.
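Under CPython's reference counting, this determinism is directly observable. The sketch below (a hypothetical `ManagedFile` wrapper, not a standard library class) shows a file handle being closed the instant the last reference disappears; note that implementations without reference counting, such as PyPy, make no such timing guarantee:

```python
import os
import tempfile

class ManagedFile:
    """Hypothetical wrapper that closes its file when the last reference dies."""
    def __init__(self, path, mode):
        self.f = open(path, mode)

    def write(self, data):
        self.f.write(data)

    def __del__(self):
        # In CPython this runs as soon as the refcount hits zero
        if not self.f.closed:
            self.f.close()

path = os.path.join(tempfile.gettempdir(), "refcount_demo.txt")
mf = ManagedFile(path, "w")
mf.write("hello")
handle = mf.f   # keep a reference to the inner file object only
del mf          # ManagedFile's refcount drops to zero; __del__ runs now
print(handle.closed)  # True under CPython
```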
But reference counting has a fatal flaw—circular references:
```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# Create a cycle
a = Node(1)
b = Node(2)
a.next = b
b.next = a  # Circular reference

# Even after deleting our references, the cycle persists
del a
del b
# Both nodes still have refcount=1 (from each other)
# Pure reference counting would leak this memory
```
Each node references the other, so neither count ever reaches zero. This is why Python supplements reference counting with a cycle-detecting collector. Languages that rely solely on reference counting (like Swift and Objective-C under ARC) require developers to manually break cycles using weak references—references that don't increment the count.
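Python exposes the same escape hatch through the standard `weakref` module. A minimal sketch: the back-link in a two-node list is held weakly, so it neither keeps its target alive nor forms a leaking cycle:

```python
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None  # will hold a weak reference

a = Node(1)
b = Node(2)
a.next = b
b.prev = weakref.ref(a)   # weak back-reference: does not bump a's refcount

print(b.prev() is a)      # True: dereference while the target is alive
del a                     # refcount hits zero despite the back-link
print(b.prev())           # None: the weak reference reports a dead target
```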
Reference counting also carries performance overhead. Every assignment mutates a counter, and in multithreaded runtimes those updates must be atomic, which is expensive on modern CPUs. And when the last reference to a large object graph dies, the cascade of decrements and frees can cause latency spikes.
Mark-Sweep: The Classic Algorithm
Mark-sweep takes a different approach. Instead of tracking references continuously, it periodically traces the object graph from known roots (global variables, stack frames, CPU registers) and marks everything reachable. Then it sweeps through the heap, freeing anything unmarked.
```python
# Pseudocode implementation of mark-sweep
class MarkSweepCollector:
    def __init__(self):
        self.heap = []   # All allocated objects
        self.roots = []  # Global refs, stack refs

    def collect(self):
        # Phase 1: Mark - trace everything reachable from the roots
        for obj in self.heap:
            obj.marked = False
        worklist = list(self.roots)
        while worklist:
            obj = worklist.pop()
            if not obj.marked:
                obj.marked = True
                worklist.extend(obj.references)
        # Phase 2: Sweep - discard everything left unmarked
        self.heap = [obj for obj in self.heap if obj.marked]

    def needs_collection(self):
        # Placeholder policy; real collectors trigger on heap occupancy
        return len(self.heap) > 10_000

    def allocate(self, obj):
        if self.needs_collection():
            self.collect()
        self.heap.append(obj)
        return obj
```
Mark-sweep handles cycles naturally—if objects in a cycle are unreachable from roots, they simply don’t get marked. No special handling required.
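You can watch CPython's tracing collector do exactly this. Recreate the cycle from the reference-counting example, then trigger an explicit collection; `gc.collect()` reports how many unreachable objects it found:

```python
import gc

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

gc.disable()           # pause automatic cycle collection so we control timing
a = Node(1)
b = Node(2)
a.next, b.next = b, a  # the same cycle as before
del a
del b                  # refcounts stay at 1; pure refcounting would leak

collected = gc.collect()  # explicit tracing pass finds the unreachable cycle
print(collected >= 2)     # True: both Node objects were reclaimed
gc.enable()
```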
The downside is stop-the-world pauses. During collection, the application freezes while the collector traces the entire heap. For large heaps, this can mean pauses of hundreds of milliseconds—unacceptable for interactive applications.
Mark-sweep also causes memory fragmentation. After sweeping, the heap contains gaps where freed objects lived. Either you accept fragmentation (and slower allocation), or you add a compaction phase that moves surviving objects together—but compaction requires updating all references, extending the pause.
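The compaction idea can be sketched in a few lines. This toy version (list indices stand in for addresses, and `Obj` is a made-up record type) slides marked objects to the front and builds the forwarding table a real collector would use to rewrite every pointer to a moved object:

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.marked = False

def compact(heap):
    """Slide marked objects to the front of the heap, preserving order.
    A real compactor would also rewrite every reference to a moved object
    using the forwarding table; here 'addresses' are just list indices."""
    survivors = [o for o in heap if o.marked]
    forwarding = {id(o): i for i, o in enumerate(survivors)}  # old -> new slot
    return survivors, forwarding

heap = [Obj("a"), Obj("b"), Obj("c")]
heap[0].marked = True
heap[2].marked = True
new_heap, fwd = compact(heap)
print([o.name for o in new_heap])  # ['a', 'c'] - the gap left by 'b' is gone
```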
Generational Collection: Exploiting Object Lifetimes
Here’s an empirical observation that revolutionized garbage collection: most objects die young. A function allocates temporary objects, uses them briefly, and discards them. Long-lived objects—cached data, connection pools, configuration—are the exception.
Generational collectors exploit this by partitioning the heap into generations. New objects go into the young generation (often called the nursery or eden space). When the young generation fills up, the collector runs a minor collection—tracing only young objects. Survivors get promoted to the old generation. Major collections, which trace the entire heap, happen rarely.
This works because minor collections are fast. The young generation is small, and most of its objects are garbage. You get frequent, quick pauses instead of rare, long ones.
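CPython's own cycle collector is generational too, and the `gc` module exposes the machinery: three generations, each with an allocation-count threshold, and the ability to run a minor or major collection on demand. (The exact default thresholds vary by version; commonly `(700, 10, 10)`.)

```python
import gc

# Three generations, each with a threshold that triggers a collection
# of that generation and all younger ones.
thresholds = gc.get_threshold()
print(thresholds)        # commonly (700, 10, 10)

counts = gc.get_count()  # pending allocations per generation
gc.collect(0)            # minor collection: youngest generation only
gc.collect(2)            # major collection: the entire managed heap
```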
Here’s how to observe this in the JVM:
```java
// GenerationalDemo.java
import java.util.ArrayList;
import java.util.List;

public class GenerationalDemo {
    static List<byte[]> oldGenObjects = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        // Create long-lived objects (will be promoted to old gen)
        for (int i = 0; i < 100; i++) {
            oldGenObjects.add(new byte[1024 * 100]); // 100KB each
        }
        // Create short-lived garbage (stays in young gen)
        for (int i = 0; i < 10000; i++) {
            byte[] garbage = new byte[1024 * 10]; // 10KB, immediately garbage
            if (i % 1000 == 0) {
                Thread.sleep(100);
            }
        }
    }
}
```
Run with GC logging enabled:
```shell
java -Xms256m -Xmx256m -Xlog:gc*:stdout:time,level,tags GenerationalDemo
```
You’ll see many minor GC events (young generation) and few major GC events. The short-lived allocations never burden the old generation.
Hybrid Approaches in Production Runtimes
No production runtime uses a single algorithm. They combine techniques based on workload characteristics.
JVM G1 (Garbage-First): Divides the heap into regions rather than fixed generations. It prioritizes collecting regions with the most garbage first, targeting predictable pause times. G1 uses concurrent marking to reduce stop-the-world pauses.
V8 (Chrome/Node.js): Uses generational collection with a parallel scavenger for the young generation and concurrent mark-sweep for the old generation. It also employs incremental marking, spreading marking work across multiple small pauses.
Go: Uses a concurrent, tri-color mark-sweep collector. Go prioritizes low latency over throughput, targeting sub-millisecond pauses. It achieves this through write barriers that track mutations during concurrent marking.
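The tri-color abstraction behind Go's collector is easy to state: white objects are unvisited, gray objects are reached but not yet scanned, black objects are fully scanned. Collection ends when no gray remains; whatever is still white is garbage. A simplified sequential sketch (the real collector runs this concurrently, with write barriers keeping the invariant intact while the program mutates the graph):

```python
def tricolor_mark(roots, references):
    """references: dict mapping each object to the objects it points to."""
    white = set(references)          # every object starts white
    gray = set(roots)                # roots are the initial gray set
    black = set()
    while gray:
        obj = gray.pop()
        black.add(obj)               # obj is now fully scanned
        white.discard(obj)
        for child in references.get(obj, []):
            if child not in black and child not in gray:
                gray.add(child)      # newly discovered: shade it gray
    return black, white              # reachable, garbage

refs = {"root": ["a"], "a": ["b"], "b": ["a"], "c": ["d"], "d": []}
live, dead = tricolor_mark(["root"], refs)
print(sorted(live))  # ['a', 'b', 'root']
print(sorted(dead))  # ['c', 'd'] - an unreachable cycle's neighborhood
```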
Compare GC behavior across JVM collectors:
```shell
# G1 (default in modern JVMs) - balanced latency/throughput
java -XX:+UseG1GC -Xlog:gc*:file=g1.log -jar app.jar

# ZGC - ultra-low latency (sub-millisecond pauses)
java -XX:+UseZGC -Xlog:gc*:file=zgc.log -jar app.jar

# Parallel GC - maximum throughput, longer pauses
java -XX:+UseParallelGC -Xlog:gc*:file=parallel.log -jar app.jar
```
Analyze the logs to compare pause time distributions. ZGC will show consistent sub-millisecond pauses; Parallel GC will show longer but less frequent pauses with higher overall throughput.
Tuning and Debugging GC Performance
Before tuning, profile. Premature optimization of GC is as wasteful as any other premature optimization.
Signs of GC pressure:
- High CPU usage from GC threads
- Frequent long pauses visible in latency percentiles
- Out-of-memory errors despite available heap space (fragmentation)
- Throughput degradation under load
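In Python, you can measure collection pauses directly: `gc.callbacks` invokes registered hooks at the start and stop of every collection, which is enough to build a crude pause histogram. A sketch:

```python
import gc
import time

pauses = []

def track(phase, info):
    # Called by the runtime with phase "start" or "stop" for each collection
    if phase == "start":
        track.t0 = time.perf_counter()
    else:
        pauses.append(time.perf_counter() - track.t0)

gc.callbacks.append(track)

# Allocate cyclic garbage to force the cycle collector to run
for _ in range(100_000):
    a, b = [], []
    a.append(b)
    b.append(a)

gc.callbacks.remove(track)
print(len(pauses) > 0)  # True: at least one collection was observed
```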
In Java, use jstat for live monitoring:
```shell
jstat -gcutil <pid> 1000  # Sample every 1000ms
```
For allocation profiling, async-profiler captures allocation hot spots:
```shell
./profiler.sh -e alloc -d 30 -f allocations.html <pid>
```
In Go, the runtime exposes GC statistics:
```go
package main

import (
	"fmt"
	"runtime"
)

func printGCStats() {
	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	fmt.Printf("Alloc: %d MB\n", stats.Alloc/1024/1024)
	fmt.Printf("NumGC: %d\n", stats.NumGC)
	fmt.Printf("PauseTotal: %d ms\n", stats.PauseTotalNs/1000000)
}

func main() {
	printGCStats()
}
```
Allocation patterns that help GC:
- Reuse objects via pools for hot paths
- Preallocate slices/arrays when size is known
- Avoid boxing primitives in tight loops
- Use value types where possible (structs vs. classes in Go/C#)
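The first two points can be sketched as a minimal free-list pool (a toy `BufferPool`, not a library API): buffers on a hot path are recycled instead of allocated fresh each call, so the young generation churns less.

```python
class BufferPool:
    """Toy free-list pool: hand out recycled buffers before allocating new ones."""
    def __init__(self, size):
        self.size = size
        self.free = []

    def acquire(self):
        # Reuse a released buffer if available, otherwise allocate
        return self.free.pop() if self.free else bytearray(self.size)

    def release(self, buf):
        self.free.append(buf)

pool = BufferPool(4096)
buf = pool.acquire()   # fresh allocation the first time
pool.release(buf)
buf2 = pool.acquire()  # recycled: no new allocation
print(buf is buf2)     # True
```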
Patterns that hurt:
- Creating closures in hot loops (each closure allocates)
- String concatenation in loops (use builders)
- Excessive use of finalizers/destructors
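The string-concatenation point, sketched in Python: repeated `+=` on a string may copy the accumulated result each iteration (CPython sometimes optimizes this in place, but you shouldn't rely on it), while the builder pattern collects parts and joins once.

```python
def concat_loop(n):
    s = ""
    for i in range(n):
        s += str(i)   # may copy the whole accumulated string each time
    return s

def join_builder(n):
    parts = []        # the "builder": collect pieces, allocate once at the end
    for i in range(n):
        parts.append(str(i))
    return "".join(parts)

print(concat_loop(100) == join_builder(100))  # True: same result, less churn
```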
Choosing Your Runtime
Your choice of language and runtime should consider GC characteristics:
- Throughput-critical batch processing: JVM with Parallel GC or high-throughput G1 tuning
- Low-latency services: Go, JVM with ZGC/Shenandoah, or careful C++ with custom allocators
- Scripting and prototyping: Python/JavaScript; GC overhead rarely matters
- Systems programming with no GC tolerance: Rust’s ownership model eliminates GC entirely through compile-time lifetime tracking
The future points toward concurrent, low-pause collectors becoming standard. ZGC and Shenandoah have made sub-millisecond pauses routine on the JVM. Go continues improving its concurrent collector. And Rust demonstrates that for some domains, compile-time memory management eliminates the GC question entirely.
Understanding these fundamentals—reference counting’s simplicity and limitations, mark-sweep’s cycle handling, generational collection’s exploitation of object lifetimes—gives you the mental model to reason about memory behavior in any managed runtime. When GC becomes a bottleneck, you’ll know where to look.