Go Benchmarking: Performance Testing Guide

Key Insights

  • Go’s built-in benchmarking framework automatically determines optimal iteration counts and provides memory allocation metrics alongside execution time, making it superior to manual timing code
  • Compiler optimizations can invalidate benchmarks by eliminating “dead” code—always assign results to package-level variables or use the result in meaningful ways to prevent false measurements
  • Statistical analysis with tools like benchstat is essential for detecting real performance changes versus noise, especially when comparing implementations or tracking regressions in CI/CD pipelines

Introduction to Go Benchmarking

Performance matters. Whether you’re optimizing a hot path in your API or choosing between two implementation approaches, you need data. Go’s testing package includes a robust benchmarking framework that integrates seamlessly with the standard toolchain—no third-party dependencies required.

Unlike profiling tools that show you where time is spent in existing code, benchmarks let you measure specific operations in isolation. Use benchmarks when you want to compare algorithms, validate optimization attempts, or establish performance baselines. Use profiling when you need to understand where your application spends time in production-like scenarios.

The beauty of Go’s approach is its simplicity. Benchmarks live alongside your tests, run with the same go test command, and follow familiar conventions. If you can write a Go test, you can write a benchmark.

Writing Your First Benchmark

Benchmark functions follow a simple pattern: start with Benchmark, accept a *testing.B parameter, and loop b.N times. The testing framework automatically adjusts b.N until it can produce stable timing measurements.

package stringops

import (
    "strings"
    "testing"
)

func BenchmarkStringConcatOperator(b *testing.B) {
    for i := 0; i < b.N; i++ {
        result := ""
        for j := 0; j < 100; j++ {
            result += "x"
        }
        _ = result
    }
}

func BenchmarkStringBuilder(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var builder strings.Builder
        for j := 0; j < 100; j++ {
            builder.WriteString("x")
        }
        _ = builder.String()
    }
}

Run these with go test -bench=.:

BenchmarkStringConcatOperator-8    50000    35421 ns/op
BenchmarkStringBuilder-8          500000     2847 ns/op

The strings.Builder approach is roughly 12x faster. The -8 suffix indicates GOMAXPROCS. The framework ran the first benchmark 50,000 times and the second 500,000 times to get stable measurements.

Benchmark Configuration and Options

The -benchmem flag adds memory allocation statistics—critical for understanding the full performance picture:

go test -bench=. -benchmem
BenchmarkStringConcatOperator-8    50000    35421 ns/op    53248 B/op    99 allocs/op
BenchmarkStringBuilder-8          500000     2847 ns/op      248 B/op     5 allocs/op

Now we see why strings.Builder wins: it allocates 200x less memory and performs 20x fewer allocations. String concatenation with += creates a new string on each iteration, while Builder pre-allocates and grows efficiently.
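
When the final length is known up front, Builder's Grow method can eliminate even the remaining growth reallocations. A minimal sketch of the idea — `buildPrealloc` is an illustrative helper, not code from the benchmarks above, and it uses `testing.Benchmark` so it runs as a plain program:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// buildPrealloc reserves the full capacity once, so appending
// never triggers a reallocation inside the Builder.
func buildPrealloc(n int) string {
	var b strings.Builder
	b.Grow(n) // one allocation for the entire result
	for i := 0; i < n; i++ {
		b.WriteByte('x')
	}
	return b.String() // no copy: String reuses the internal buffer
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = buildPrealloc(100)
		}
	})
	fmt.Println(res.AllocsPerOp(), "allocs/op")
}
```

testing.Benchmark runs a benchmark function outside go test and returns a BenchmarkResult, which is handy for quick experiments like this one.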

Other useful flags:

  • -benchtime=10s or -benchtime=1000000x: Run for a specific duration or iteration count
  • -count=10: Run each benchmark multiple times for statistical analysis
  • -cpu=1,2,4: Test with different GOMAXPROCS values
  • -bench=StringBuilder: Run only benchmarks matching the pattern

For reliable results, always use -count with at least 5-10 runs:

go test -bench=. -benchmem -count=10 | tee results.txt

Common Benchmarking Patterns

Excluding Setup Costs

When your benchmark requires expensive setup, use b.ResetTimer() to exclude it from measurements:

func BenchmarkJSONMarshal(b *testing.B) {
    data := generateLargeStruct() // Expensive setup
    
    b.ResetTimer() // Start timing here
    for i := 0; i < b.N; i++ {
        bytes, err := json.Marshal(data)
        if err != nil {
            b.Fatal(err)
        }
        _ = bytes
    }
}

Table-Driven Sub-Benchmarks

Test performance across different input sizes with b.Run():

func BenchmarkMapOperations(b *testing.B) {
    sizes := []int{10, 100, 1000, 10000}
    
    for _, size := range sizes {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                m := make(map[int]int, size)
                for j := 0; j < size; j++ {
                    m[j] = j * 2
                }
                _ = m
            }
        })
    }
}

This produces separate results for each size:

BenchmarkMapOperations/size=10-8      2000000    856 ns/op
BenchmarkMapOperations/size=100-8      200000   7234 ns/op
BenchmarkMapOperations/size=1000-8      20000  72451 ns/op

Parallel Benchmarks

Test concurrent performance with b.RunParallel():

func BenchmarkSyncMapReadWrite(b *testing.B) {
    var m sync.Map
    
    // Pre-populate
    for i := 0; i < 1000; i++ {
        m.Store(i, i)
    }
    
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            if i%2 == 0 {
                m.Load(i % 1000)
            } else {
                m.Store(i%1000, i)
            }
            i++
        }
    })
}

This distributes work across GOMAXPROCS goroutines, revealing contention issues.
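
A useful baseline for the same workload is a plain map behind a sync.RWMutex, which often beats sync.Map when writes are frequent. A sketch of that comparison, runnable as a plain program via testing.Benchmark:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// rwMap is a plain map guarded by an RWMutex -- the usual
// alternative to sync.Map.
type rwMap struct {
	mu sync.RWMutex
	m  map[int]int
}

func (r *rwMap) Load(k int) (int, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.m[k]
	return v, ok
}

func (r *rwMap) Store(k, v int) {
	r.mu.Lock()
	r.m[k] = v
	r.mu.Unlock()
}

func main() {
	r := &rwMap{m: make(map[int]int)}
	for i := 0; i < 1000; i++ {
		r.Store(i, i)
	}
	res := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			i := 0
			for pb.Next() {
				if i%2 == 0 {
					r.Load(i % 1000)
				} else {
					r.Store(i%1000, i)
				}
				i++
			}
		})
	})
	fmt.Println(res)
}
```

sync.Map is optimized for keys written once and read many times, or for goroutines working on disjoint key sets; for mixed read/write on shared keys, benchmark both before choosing.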

Avoiding Benchmarking Pitfalls

The Go compiler is aggressive about optimizing away unused code. This benchmark looks valid but measures nothing:

func BenchmarkBroken(b *testing.B) {
    for i := 0; i < b.N; i++ {
        expensiveCalculation() // Result unused, might be optimized away
    }
}

The compiler sees the result isn’t used and may eliminate the entire function call. Always consume benchmark results:

var result int // Package-level variable

func BenchmarkFixed(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = expensiveCalculation()
    }
    result = r // Prevent optimization
}

Note that assigning to the blank identifier (_ = result) does not prevent dead-code elimination. Since Go 1.24, the official solution is b.Loop, which replaces the b.N loop and keeps each call's arguments and results alive so the compiler cannot optimize the work away:

func BenchmarkModern(b *testing.B) {
    for b.Loop() {
        expensiveCalculation()
    }
}

b.Loop also starts timing on its first call, so setup done before the loop is excluded automatically without b.ResetTimer.

Another pitfall: unrealistic test data. Don’t benchmark with tiny, cache-friendly inputs if production handles megabytes. Don’t use sequential integers if real data is random. Match your benchmark conditions to reality.
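
One way to keep inputs realistic yet reproducible is to generate them from a fixed seed. A sketch — the record shape and sizes are assumptions; match them to your actual data:

```go
package main

import (
	"fmt"
	"math/rand"
)

// record mimics production-shaped data; the field mix is
// illustrative -- use whatever your service actually handles.
type record struct {
	ID    int
	Name  string
	Score float64
}

// genRecords builds n records from a fixed seed so every
// benchmark run sees the same "random" data.
func genRecords(n int, seed int64) []record {
	rng := rand.New(rand.NewSource(seed))
	out := make([]record, n)
	for i := range out {
		out[i] = record{
			ID:    rng.Intn(1 << 20),
			Name:  fmt.Sprintf("user-%d", rng.Intn(10000)),
			Score: rng.Float64() * 100,
		}
	}
	return out
}

func main() {
	a := genRecords(3, 1)
	b := genRecords(3, 1)
	fmt.Println(a[0] == b[0] && a[1] == b[1] && a[2] == b[2]) // prints true
}
```

A fixed seed keeps runs comparable across benchmark invocations while avoiding the artificially cache-friendly patterns of sequential data.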

Comparative Benchmarking and Analysis

Running benchmarks once tells you nothing about whether a change improved performance. You need statistical comparison. Save baseline results:

go test -bench=. -benchmem -count=10 > old.txt

Make your changes, then benchmark again:

go test -bench=. -benchmem -count=10 > new.txt

Install and use benchstat:

go install golang.org/x/perf/cmd/benchstat@latest
benchstat old.txt new.txt
name                  old time/op  new time/op  delta
StringConcatOperator  35.4µs ± 2%  34.8µs ± 1%   -1.69%  (p=0.000 n=10+10)
StringBuilder         2.85µs ± 1%  2.12µs ± 2%  -25.61%  (p=0.000 n=10+10)

name                  old alloc/op  new alloc/op  delta
StringConcatOperator  53.2kB ± 0%   53.2kB ± 0%      ~   (all equal)
StringBuilder          248B ± 0%     128B ± 0%   -48.39%  (p=0.000 n=10+10)

The p value indicates statistical significance. Values below 0.05 suggest real changes, not noise. The ± shows variance—high variance means unreliable results (close other applications, disable CPU frequency scaling).

Integrate this into CI/CD to catch regressions. One caveat: benchstat exits zero even when it reports a slowdown, so appending || exit 1 only catches tool errors. To fail the build on an actual regression, parse the delta column in a follow-up step:

# In CI pipeline
git checkout main
go test -bench=. -count=10 > base.txt
git checkout feature-branch
go test -bench=. -count=10 > feature.txt
benchstat base.txt feature.txt  # review, or parse the delta column to enforce a threshold

Advanced Techniques

Profiling Integration

Benchmarks generate profiling data for deeper analysis:

go test -bench=StringBuilder -cpuprofile=cpu.prof -memprofile=mem.prof
go tool pprof cpu.prof

Inside pprof: top, list, web commands show where time is spent.

Benchmarking HTTP Handlers

Use httptest for realistic HTTP benchmarks:

func BenchmarkHTTPHandler(b *testing.B) {
    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        data := processRequest(r)
        json.NewEncoder(w).Encode(data)
    })
    
    req := httptest.NewRequest("GET", "/api/data", nil)
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        w := httptest.NewRecorder()
        handler.ServeHTTP(w, req)
        if w.Code != 200 {
            b.Fatalf("unexpected status: %d", w.Code)
        }
    }
}

Database Query Benchmarks

For database operations, use test containers or in-memory databases:

func BenchmarkDatabaseQuery(b *testing.B) {
    db := setupTestDB(b) // Returns *sql.DB with test data
    defer db.Close()
    
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        var count int
        err := db.QueryRow("SELECT COUNT(*) FROM users WHERE active = ?", true).Scan(&count)
        if err != nil {
            b.Fatal(err)
        }
    }
}

Benchmark different query strategies, index configurations, or connection pool settings to optimize database performance with real measurements.

Conclusion

Go’s benchmarking framework gives you the tools to make performance decisions based on data, not guesses. Write benchmarks early—before optimizing—to establish baselines. Use -benchmem always, because memory allocations matter as much as speed. Apply statistical analysis with benchstat to separate real improvements from measurement noise.

The patterns covered here—sub-benchmarks, parallel execution, setup exclusion—handle most real-world scenarios. Integrate benchmarks into your development workflow and CI/CD pipeline to catch regressions before they reach production. Performance isn’t an accident; it’s the result of measurement, iteration, and validation.
