Go Profiling: pprof Performance Analysis

Key Insights

  • pprof adds negligible overhead when enabled but not actively profiling, making it safe to include in production applications with HTTP endpoints exposed behind authentication
  • CPU profiling reveals hot paths in execution, while memory profiling distinguishes between total allocations (alloc_space) and current memory usage (inuse_space)—understanding this difference is critical for diagnosing leaks versus allocation churn
  • Goroutine and block profiles expose concurrency issues that CPU and memory profiling miss entirely, such as goroutine leaks from forgotten context cancellations or mutex contention in high-throughput code paths

Introduction to Go Profiling

Performance issues in production are inevitable. Your Go application might handle traffic fine during development, then crawl under real-world load. The question isn’t whether you’ll need profiling—it’s when.

Go’s pprof package provides runtime profiling for CPU usage, memory allocations, goroutine execution, and synchronization primitives. Unlike external profilers that require instrumentation or sampling overhead, pprof is built into the Go runtime. This means you get accurate, low-overhead profiling with minimal setup.

Common performance bottlenecks fall into four categories: CPU-bound operations consuming excessive cycles, memory allocations causing GC pressure, goroutine leaks exhausting resources, and lock contention serializing concurrent operations. Each requires different profiling approaches, and pprof handles all of them.

Setting Up pprof

For HTTP services, enabling pprof is a single import statement. The blank import registers HTTP handlers automatically:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("Hello, World!"))
    })
    
    log.Fatal(http.ListenAndServe(":8080", nil))
}

This exposes profiling endpoints at /debug/pprof/. For production, mount them on a separate admin server or behind authentication. Note that referencing the handlers directly, as below, requires a named import of "net/http/pprof" rather than the blank one:

adminMux := http.NewServeMux()
adminMux.HandleFunc("/debug/pprof/", pprof.Index)
adminMux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
adminMux.HandleFunc("/debug/pprof/profile", pprof.Profile)
adminMux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
adminMux.HandleFunc("/debug/pprof/trace", pprof.Trace)

go http.ListenAndServe("localhost:6060", adminMux)

For standalone applications without HTTP servers, use runtime/pprof directly:

package main

import (
    "log"
    "os"
    "runtime/pprof"
)

func main() {
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()
    
    // Your application code here
    doWork()
}

CPU Profiling

CPU profiling identifies where your program spends execution time. It samples the call stack at regular intervals (100Hz by default) to build a statistical picture of hot paths.

Consider this inefficient string concatenation:

func processLogs(logs []string) string {
    var result string
    for _, log := range logs {
        result += log + "\n"  // Creates new string each iteration
    }
    return result
}

To profile this, either hit the HTTP endpoint or programmatically capture:

func main() {
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()
    
    logs := make([]string, 10000)
    for i := range logs {
        logs[i] = "Log entry with some content"
    }
    
    result := processLogs(logs)
    _ = result
}

For HTTP services, collect a 30-second profile:

curl "http://localhost:8080/debug/pprof/profile?seconds=30" > cpu.prof

Analyze with the interactive tool:

go tool pprof cpu.prof

Inside pprof’s interactive mode, use top10 to see the most expensive functions:

(pprof) top10
Showing nodes accounting for 2.50s, 89.29% of 2.80s total
      flat  flat%   sum%        cum   cum%
     1.20s 42.86% 42.86%      2.30s 82.14%  main.processLogs
     0.80s 28.57% 71.43%      0.80s 28.57%  runtime.concatstrings
     0.30s 10.71% 82.14%      0.30s 10.71%  runtime.mallocgc

The flat column shows time spent in the function itself, while cum includes time spent in called functions. Use list processLogs to see annotated source code showing which lines consume the most time.

Memory Profiling

Memory profiling reveals allocation patterns and potential leaks. Go provides two perspectives: alloc_space (total allocations) and inuse_space (current heap usage).

Here’s code that allocates aggressively:

func generateData() [][]byte {
    var data [][]byte
    for i := 0; i < 100000; i++ {
        // Each iteration allocates without reusing
        entry := make([]byte, 1024)
        data = append(data, entry)
    }
    return data
}

Capture a heap profile:

func main() {
    data := generateData()
    
    f, err := os.Create("mem.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    
    runtime.GC() // Force a GC so the snapshot reflects live memory
    if err := pprof.WriteHeapProfile(f); err != nil {
        log.Fatal(err)
    }
    
    _ = data // Keep data reachable so it shows up as inuse_space
}

Or via HTTP:

curl http://localhost:8080/debug/pprof/heap > mem.prof

Analyze allocation sources:

go tool pprof -alloc_space mem.prof

Inside pprof:

(pprof) top5
Showing nodes accounting for 97.66MB, 100% of 97.66MB total
      flat  flat%   sum%        cum   cum%
   97.66MB   100%   100%    97.66MB   100%  main.generateData

Compare alloc_space (total allocated) versus inuse_space (currently held):

go tool pprof -inuse_space mem.prof

If alloc_space is high but inuse_space is low, you have allocation churn—lots of temporary allocations that GC must handle. If both are high, you’re holding onto memory unnecessarily.

Use -base to compare profiles over time:

go tool pprof -base mem1.prof mem2.prof

This shows what changed between captures, essential for tracking down memory leaks.

Goroutine and Block Profiling

Goroutine leaks are insidious. They consume memory and scheduler resources without obvious symptoms until your application exhausts file descriptors or memory.

This code leaks goroutines:

func handleRequests() {
    for i := 0; i < 1000; i++ {
        go func() {
            // No context cancellation, blocks forever
            select {}
        }()
    }
}

Capture goroutine profile via HTTP:

curl http://localhost:8080/debug/pprof/goroutine > goroutine.prof

Or programmatically:

f, err := os.Create("goroutine.prof")
if err != nil {
    log.Fatal(err)
}
pprof.Lookup("goroutine").WriteTo(f, 0)
f.Close()

Analyze:

go tool pprof goroutine.prof

The profile groups active goroutines by identical stack traces. Look for unexpectedly large counts. The human-readable text form (append ?debug=1 to the endpoint URL) makes leaks obvious:

1000 @ 0x43a8e6 0x44b245 0x44b220 0x469c85
#   0x469c84    main.handleRequests.func1+0x24    /app/main.go:15

Here 1000 goroutines share the same stack, all parked inside handleRequests.func1.

Block profiling reveals synchronization bottlenecks:

var mu sync.Mutex
var counter int

func incrementCounter() {
    mu.Lock()
    time.Sleep(10 * time.Millisecond) // Simulate work under lock
    counter++
    mu.Unlock()
}

Enable block profiling:

runtime.SetBlockProfileRate(1) // Capture every blocking event; use a larger rate in production to cut overhead

Then capture and analyze:

curl http://localhost:8080/debug/pprof/block > block.prof
go tool pprof block.prof

High block times indicate lock contention. Consider using finer-grained locking, lock-free data structures, or sharding.

Interactive Analysis with pprof Web UI

The web UI provides far richer visualization than the terminal interface. Launch it on any free port (here :8081, since the example app already occupies :8080):

go tool pprof -http=:8081 cpu.prof

This opens a browser with multiple views:

  • Flame Graph: Shows call stack hierarchy with width representing time/allocations
  • Top: Tabular view of most expensive functions
  • Source: Annotated source code with per-line metrics
  • Graph: Call graph visualization

The flame graph is particularly powerful. Each box represents a function, with width proportional to resource consumption. Click boxes to zoom into specific call paths.

Use the “View” dropdown to switch between different metrics. For memory profiles, toggle between alloc_space, alloc_objects, inuse_space, and inuse_objects.

The “Refine” menu lets you focus analysis:

Focus: main.processLogs

This filters the profile to only show paths through processLogs, eliminating noise from unrelated code.

Best Practices and Production Tips

Always enable pprof in production applications, but control access. Expose profiling endpoints on a separate port bound to localhost or behind authentication:

if os.Getenv("ENABLE_PPROF") == "true" {
    go func() {
        log.Println("Starting pprof on :6060")
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
}

Profile collection has minimal overhead, but continuous profiling in high-traffic services should be rate-limited:

var lastProfile time.Time
var profileMutex sync.Mutex

func maybeProfile(w http.ResponseWriter, r *http.Request) {
    profileMutex.Lock()
    defer profileMutex.Unlock()
    
    if time.Since(lastProfile) < 5*time.Minute {
        http.Error(w, "Rate limited", http.StatusTooManyRequests)
        return
    }
    
    lastProfile = time.Now()
    pprof.Profile(w, r)
}

For continuous profiling, integrate with services like Grafana Pyroscope or Google Cloud Profiler:

import "cloud.google.com/go/profiler"

func main() {
    if err := profiler.Start(profiler.Config{
        Service:        "my-service",
        ServiceVersion: "1.0.0",
    }); err != nil {
        log.Printf("Failed to start profiler: %v", err)
    }
    
    // Rest of application
}

This automatically collects and uploads profiles, enabling historical analysis and comparison across deployments.

Profile before and after optimizations. Intuition about performance is often wrong—measure, don’t guess. Collect baseline profiles, make changes, then compare with -base to verify improvements.

Finally, combine profiling with benchmarks. Microbenchmarks validate specific optimizations, while profiling reveals systemic issues in real workloads. Use both tools together for comprehensive performance analysis.
