Go Benchmark Tests: Performance Measurement in Go
Key Insights
- Go's built-in benchmark framework uses an adaptive iteration count (b.N) that automatically scales to produce statistically meaningful results, eliminating guesswork about sample sizes.
- Memory allocation tracking with -benchmem often reveals more optimization opportunities than raw execution time, since allocations directly impact garbage collection pressure.
- The benchstat tool transforms benchmark output from anecdotal observations into statistically rigorous comparisons, making it essential for validating performance improvements.
Introduction to Go Benchmarks
Performance measurement separates professional Go code from hobbyist projects. You can’t optimize what you don’t measure, and Go’s standard library provides a robust benchmarking framework that most developers underutilize.
The testing package includes benchmark capabilities alongside unit tests, meaning you don’t need external dependencies to measure performance. This integration encourages treating benchmarks as first-class citizens in your codebase rather than afterthoughts.
Write benchmarks when you’re optimizing hot paths, comparing algorithm implementations, or establishing performance contracts for critical code. Don’t benchmark everything—focus on code that runs frequently, handles large data sets, or sits in latency-sensitive paths. If you’re debating whether a clever optimization actually helps, a benchmark gives you the answer in seconds.
Benchmark Function Basics
A benchmark function follows a specific signature: it starts with Benchmark, takes a *testing.B parameter, and contains a loop that runs b.N times. Go’s test runner adjusts b.N automatically until the benchmark runs long enough to produce reliable measurements.
package stringutil

import (
	"strings"
	"testing"
)

func ConcatWithPlus(strs []string) string {
	result := ""
	for _, s := range strs {
		result += s
	}
	return result
}

func ConcatWithBuilder(strs []string) string {
	var builder strings.Builder
	for _, s := range strs {
		builder.WriteString(s)
	}
	return builder.String()
}

func BenchmarkConcatWithPlus(b *testing.B) {
	strs := []string{"hello", "world", "this", "is", "a", "test"}
	for i := 0; i < b.N; i++ {
		ConcatWithPlus(strs)
	}
}

func BenchmarkConcatWithBuilder(b *testing.B) {
	strs := []string{"hello", "world", "this", "is", "a", "test"}
	for i := 0; i < b.N; i++ {
		ConcatWithBuilder(strs)
	}
}
The b.N loop is non-negotiable. Go starts with a small b.N value and increases it exponentially until the benchmark runs for at least one second (by default). This adaptive approach means your benchmarks automatically adjust to both fast and slow operations without manual tuning.
Running and Interpreting Benchmarks
Run benchmarks with go test -bench. The -bench flag takes a regular expression matching benchmark names. Use -bench=. to run all benchmarks.
go test -bench=. -benchmem ./...
Key flags you’ll use regularly:
- -benchtime=5s: Run each benchmark for 5 seconds instead of the default 1 second
- -count=10: Run each benchmark 10 times for statistical analysis
- -benchmem: Report memory allocation statistics
Here’s typical output:
BenchmarkConcatWithPlus-8 1000000 1052 ns/op 530 B/op 6 allocs/op
BenchmarkConcatWithBuilder-8 5000000 234 ns/op 112 B/op 2 allocs/op
Breaking down this output:
- BenchmarkConcatWithPlus-8: Benchmark name with the GOMAXPROCS value (8 cores)
- 1000000: Number of iterations (the final b.N)
- 1052 ns/op: Nanoseconds per operation
- 530 B/op: Bytes allocated per operation
- 6 allocs/op: Number of allocations per operation
The strings.Builder approach runs roughly 4.5x faster and allocates about one-fifth the memory. This is the kind of insight benchmarks provide instantly.
Advanced Benchmark Techniques
Real-world benchmarking requires more than basic loops. Sub-benchmarks let you test multiple scenarios in a single function, while timer controls exclude setup costs from measurements.
func BenchmarkConcatSizes(b *testing.B) {
	// Requires "fmt" in this file's imports.
	sizes := []int{10, 100, 1000, 10000}
	for _, size := range sizes {
		// Generate test data outside the timed section
		strs := make([]string, size)
		for i := range strs {
			strs[i] = "word"
		}
		b.Run(fmt.Sprintf("Plus/size=%d", size), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				ConcatWithPlus(strs)
			}
		})
		b.Run(fmt.Sprintf("Builder/size=%d", size), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				ConcatWithBuilder(strs)
			}
		})
	}
}
When setup is expensive and must happen inside the benchmark function, use timer controls:
func BenchmarkWithExpensiveSetup(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		data := generateLargeDataset() // Expensive setup
		b.StartTimer()
		processData(data) // Only this gets measured
	}
}
For one-time setup, use b.ResetTimer() after initialization:
func BenchmarkWithOneTimeSetup(b *testing.B) {
	data := generateLargeDataset() // Runs once
	b.ResetTimer()                 // Reset timer after setup
	for i := 0; i < b.N; i++ {
		processData(data)
	}
}
Call b.ReportAllocs() to enable memory reporting for a specific benchmark without the -benchmem flag:
var sink []byte

func BenchmarkMemoryTracking(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		// Assign to a package-level sink so the allocation escapes to the
		// heap and can't be optimized away (see the pitfalls section).
		sink = make([]byte, 1024)
	}
}
Comparing and Tracking Performance
Single benchmark runs provide limited value. Statistical comparison between versions reveals whether changes actually improve performance or just introduce noise.
Install benchstat:
go install golang.org/x/perf/cmd/benchstat@latest
Capture baseline measurements, make changes, then compare:
# Before optimization
go test -bench=BenchmarkConcat -count=10 > old.txt
# After optimization
go test -bench=BenchmarkConcat -count=10 > new.txt
# Compare
benchstat old.txt new.txt
Output shows statistical significance:
name              old time/op    new time/op    delta
ConcatWithPlus-8  1.05µs ± 2%    0.23µs ± 1%    -78.10%  (p=0.000 n=10+10)

name              old alloc/op   new alloc/op   delta
ConcatWithPlus-8  530B ± 0%      112B ± 0%      -78.87%  (p=0.000 n=10+10)

name              old allocs/op  new allocs/op  delta
ConcatWithPlus-8  6.00 ± 0%      2.00 ± 0%      -66.67%  (p=0.000 n=10+10)
The p=0.000 indicates high statistical confidence. The ± 2% shows measurement variance. Without benchstat, you’re essentially guessing whether performance differences are real.
Common Pitfalls and Best Practices
The Go compiler aggressively optimizes code, including eliminating computations whose results aren’t used. This defeats benchmarks that don’t consume their results.
// BAD: Compiler may optimize away the entire computation
func BenchmarkBadExample(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ComputeExpensiveResult() // Result discarded
	}
}

// GOOD: Use a package-level variable to prevent optimization
var result int

func BenchmarkGoodExample(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = ComputeExpensiveResult()
	}
	result = r // Assign to package-level variable
}
Alternatively, use runtime.KeepAlive:
func BenchmarkWithKeepAlive(b *testing.B) {
	// Requires "runtime" in this file's imports.
	for i := 0; i < b.N; i++ {
		r := ComputeExpensiveResult()
		runtime.KeepAlive(r)
	}
}
Other pitfalls to avoid:
Cache warming: The first iteration may be slower due to cold caches. Run benchmarks multiple times with -count to get stable measurements.
Unrealistic data: Benchmarking with trivial inputs produces misleading results. Use representative data sizes and patterns.
Shared state: Ensure each iteration operates independently. Accumulated state across iterations skews results.
// BAD: Buffer grows across iterations
func BenchmarkSharedState(b *testing.B) {
	var buf bytes.Buffer
	for i := 0; i < b.N; i++ {
		buf.WriteString("data") // Buffer keeps growing
	}
}

// GOOD: Reset state each iteration
func BenchmarkCleanState(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		buf.WriteString("data")
	}
}
Integrating Benchmarks into CI/CD
Automated benchmark tracking catches performance regressions before they reach production. The simplest approach runs benchmarks on every PR and compares against the main branch.
# .github/workflows/benchmark.yml
name: Benchmark

on:
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # fetch full history so main can be checked out below
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Run benchmarks
        run: go test -bench=. -count=5 -benchmem ./... | tee new.txt
      - name: Checkout main
        run: git checkout main
      - name: Run baseline benchmarks
        run: go test -bench=. -count=5 -benchmem ./... | tee old.txt
      - name: Compare
        run: |
          go install golang.org/x/perf/cmd/benchstat@latest
          benchstat old.txt new.txt
For more sophisticated tracking, consider tools like gobench for historical storage or continuous profiling services that track performance trends over time.
Set performance budgets for critical paths. If a core function exceeds 1ms or allocates more than 1KB, fail the build. This prevents gradual performance degradation that’s easy to miss in individual commits.
Benchmarks aren’t just for optimization—they’re documentation of performance expectations. When you establish that a function runs in 50 microseconds, you’ve created a contract that future changes must respect. Treat your benchmarks with the same care as your unit tests, and your production performance will thank you.