Go sync.WaitGroup: Waiting for Goroutines

Key Insights

  • sync.WaitGroup solves the fundamental problem of waiting for multiple goroutines to complete by maintaining an internal counter that tracks active goroutines
  • Always pass WaitGroup by pointer to goroutines and call Add() before launching them to avoid race conditions
  • For complex scenarios requiring error handling or cancellation, consider errgroup.Group instead of plain WaitGroup

Introduction to Goroutine Synchronization

Go’s goroutines make concurrent programming accessible, but they introduce a critical challenge: how do you know when your concurrent work is done? The naive approach of using time.Sleep() is fundamentally broken because you’re guessing at execution time rather than actually coordinating with your goroutines.

Here’s the problem in action:

package main

import (
    "fmt"
    "time"
)

func main() {
    for i := 0; i < 5; i++ {
        go func(n int) {
            time.Sleep(time.Duration(n*100) * time.Millisecond)
            fmt.Printf("Goroutine %d finished\n", n)
        }(i)
    }
    
    time.Sleep(time.Second) // Hope this is enough?
    fmt.Println("Main exiting")
}

This code has multiple problems. The sleep duration is arbitrary. Too short, and goroutines get cut off: a Go program exits as soon as main returns, terminating any goroutines that are still running. Too long, and you’re wasting time. Either way, main never actually learns when the work completes. In production systems, this approach leads to lost work and flaky, timing-dependent bugs.

sync.WaitGroup provides the correct solution: rather than guessing, main blocks until every goroutine has explicitly signaled that it is finished.

WaitGroup Basics

A WaitGroup maintains an internal counter. You increment it when starting work, decrement it when work completes, and block until the counter reaches zero. It’s that simple.

The API consists of three methods:

  • Add(delta int): Adds delta to the counter (usually a positive count of new goroutines; a negative delta decrements it)
  • Done(): Decrements the counter by one (equivalent to Add(-1))
  • Wait(): Blocks until the counter reaches zero
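To make the counter concrete, here is a minimal sketch that steps it by hand inside a single function. Calling Done() outside a goroutine like this is only for illustration, and counterWalkthrough is a hypothetical name, not part of any API.

```go
package main

import (
	"fmt"
	"sync"
)

// counterWalkthrough is a hypothetical helper that steps the WaitGroup
// counter by hand to show how Add, Done, and Wait interact.
func counterWalkthrough() string {
	var wg sync.WaitGroup

	wg.Add(3)  // counter: 3
	wg.Done()  // counter: 2
	wg.Add(-1) // counter: 1 (same effect as Done)

	go func() {
		defer wg.Done() // counter: 0 once this goroutine returns
	}()

	wg.Wait() // blocks until the goroutine above has called Done
	return "counter reached zero"
}

func main() {
	fmt.Println(counterWalkthrough())
}
```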

Here’s the corrected version of our earlier example:

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    
    for i := 0; i < 5; i++ {
        wg.Add(1) // Increment counter before launching goroutine
        go func(n int) {
            defer wg.Done() // Decrement when done
            time.Sleep(time.Duration(n*100) * time.Millisecond)
            fmt.Printf("Goroutine %d finished\n", n)
        }(i)
    }
    
    wg.Wait() // Block until all goroutines call Done()
    fmt.Println("All goroutines completed")
}

This code guarantees that “All goroutines completed” only prints after every goroutine finishes. No guessing, no arbitrary delays.
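The guarantee also covers data, not just the order of prints. The sync package documents that a call to Done "synchronizes before" the return of any Wait call it unblocks, so values a goroutine writes before calling Done are safely visible after Wait returns. The sketch below relies on that to collect results into a slice with no channel or mutex (each goroutine writes only its own index), and calls Add once with the total because the count is known up front. squareAll is a hypothetical example function.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll is a hypothetical example: it squares every input concurrently,
// one goroutine per element, and returns the results in input order.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	wg.Add(len(nums)) // one Add call for the whole batch, before any goroutine starts

	for i, n := range nums {
		go func(i, n int) {
			defer wg.Done()
			out[i] = n * n // each goroutine writes a distinct slot, so there is no data race
		}(i, n)
	}

	wg.Wait() // every write above is visible once Wait returns
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```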

Common Patterns and Best Practices

The most critical rule: always pass WaitGroup by pointer. If you pass by value, each goroutine gets a copy, and calling Done() on a copy doesn’t affect the original counter.

Always call Add() before launching the goroutine, not inside it. If Add() runs inside the goroutine, main may reach Wait() while the counter is still zero, and Wait() returns immediately, before the work has even started.

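Always use defer for Done(). This ensures the counter decrements on every return path, including early returns from error handling. Deferred calls also run while a goroutine panics, although an unrecovered panic in any goroutine still crashes the whole program.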
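One way to keep all three rules in a single place is a small helper that owns the WaitGroup. This is a sketch, and runAll is a hypothetical name. Note that the goroutine closure captures the wg variable itself, so every goroutine shares the same counter, which has the same effect as passing &wg.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll is a hypothetical helper: it runs every task in its own goroutine
// and returns only after all of them have finished.
func runAll(tasks []func()) {
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1) // Rule: Add before the go statement
		go func(task func()) {
			defer wg.Done() // Rule: defer Done so every return path decrements
			task()
		}(task)
	}
	wg.Wait() // Rule: one shared counter (captured by reference, never copied)
}

func main() {
	done := make(chan int, 3)
	runAll([]func(){
		func() { done <- 1 },
		func() { done <- 2 },
		func() { done <- 3 },
	})
	fmt.Println(len(done)) // 3: every task finished before runAll returned
}
```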

Here’s a fan-out example (one goroutine per URL) that demonstrates these principles:

package main

import (
    "fmt"
    "io"
    "net/http"
    "sync"
    "time"
)

func fetchURL(url string, wg *sync.WaitGroup, results chan<- string) {
    defer wg.Done() // Guaranteed to execute
    
    client := &http.Client{Timeout: 5 * time.Second}
    resp, err := client.Get(url)
    if err != nil {
        results <- fmt.Sprintf("%s: ERROR - %v", url, err)
        return
    }
    defer resp.Body.Close()
    
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        results <- fmt.Sprintf("%s: ERROR reading body - %v", url, err)
        return
    }
    
    results <- fmt.Sprintf("%s: %d bytes", url, len(body))
}

func main() {
    urls := []string{
        "https://golang.org",
        "https://go.dev",
        "https://pkg.go.dev",
    }
    
    var wg sync.WaitGroup
    results := make(chan string, len(urls))
    
    for _, url := range urls {
        wg.Add(1) // Increment BEFORE launching goroutine
        go fetchURL(url, &wg, results) // Pass WaitGroup by pointer
    }
    
    wg.Wait()        // Wait for all fetches to complete
    close(results)   // Safe to close now
    
    for result := range results {
        fmt.Println(result)
    }
}

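This pattern is robust. The defer wg.Done() ensures the counter decrements regardless of success or failure. Passing &wg ensures all goroutines work with the same counter. Calling Add() before go keeps Wait() from returning early. Waiting before reading also depends on the channel being buffered with room for every result: if it were unbuffered, each goroutine would block on its send, never reach Done(), and main would wait forever in Wait().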
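The example above starts one goroutine per URL. When the input is large, a common alternative is a fixed pool of workers reading from a shared jobs channel. A WaitGroup fits the same way, except that it counts workers rather than jobs. This is a minimal sketch: squaring stands in for real work, and sumOfSquares and the worker count are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares is a hypothetical example: a fixed pool of workers squares
// each job, and the results are summed as they arrive.
func sumOfSquares(jobs []int, workers int) int {
	jobCh := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1) // one count per worker, not per job
		go func() {
			defer wg.Done()
			for n := range jobCh { // ends when jobCh is closed and drained
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close jobCh so the workers' range loops end.
	go func() {
		for _, n := range jobs {
			jobCh <- n
		}
		close(jobCh)
	}()

	// Close results only after every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4, 5}, 3)) // 55
}
```

Because main drains results while the workers send, the results channel can stay unbuffered here.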

Common Pitfalls and How to Avoid Them

The most common mistake is mismatching Add() and Done() calls. If you call Done() more times than Add(), the counter goes negative and you get a panic:

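// WRONG - two Done() calls for a single Add(1)
func badExample() {
    var wg sync.WaitGroup
    wg.Add(1) // Counter is 1, but two goroutines will call Done()
    
    go func() {
        defer wg.Done()
        fmt.Println("First done")
    }()
    
    go func() {
        defer wg.Done() // No matching Add() for this goroutine
        fmt.Println("Second done")
    }()
    
    // Wait() may return after the first Done(). Whichever Done() runs second
    // drives the counter to -1 and panics with
    // "sync: negative WaitGroup counter" (if the program is still running).
    wg.Wait()
}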

The fix is straightforward—match every goroutine with an Add(1):

// CORRECT
func goodExample() {
    var wg sync.WaitGroup
    
    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Println("First done")
    }()
    
    wg.Add(1) // Added this
    go func() {
        defer wg.Done()
        fmt.Println("Second done")
    }()
    
    wg.Wait()
}

Another pitfall is copying a WaitGroup after use. The sync.WaitGroup documentation explicitly states that a WaitGroup must not be copied after first use. Pass it by pointer, and if a struct contains one, pass that struct by pointer too. Running go vet catches many accidental copies through its copylocks check.
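When a WaitGroup lives inside a struct, the usual way to avoid copies is to give the struct pointer receivers and share a single pointer to it. Here is a sketch with a hypothetical Batch type; the mutex protects the shared results slice, and reading it after Wait() is safe because every Done() happens before Wait() returns.

```go
package main

import (
	"fmt"
	"sync"
)

// Batch is a hypothetical type that groups related goroutines.
// It must not be copied after first use, so every method has a pointer receiver.
type Batch struct {
	wg      sync.WaitGroup
	mu      sync.Mutex
	results []string
}

// Run starts fn in a new goroutine tracked by the batch's WaitGroup.
func (b *Batch) Run(fn func() string) {
	b.wg.Add(1)
	go func() {
		defer b.wg.Done()
		r := fn()
		b.mu.Lock()
		b.results = append(b.results, r)
		b.mu.Unlock()
	}()
}

// Wait blocks until every started goroutine has finished, then returns the results.
func (b *Batch) Wait() []string {
	b.wg.Wait()
	return b.results
}

func main() {
	b := &Batch{} // create once, then share the pointer
	b.Run(func() string { return "a" })
	b.Run(func() string { return "b" })
	fmt.Println(len(b.Wait())) // 2
}
```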

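Forgetting Done() is catastrophic: Wait() blocks forever. If every goroutine ends up blocked, the runtime aborts with "fatal error: all goroutines are asleep - deadlock!"; otherwise (in a server, for example) the program simply hangs. Making defer wg.Done() the first line of every goroutine makes this hard to get wrong.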

Real-World Use Case

Let’s build a practical file processor that reads multiple files concurrently, counts words in each, and aggregates results:

package main

import (
    "fmt"
    "os"
    "strings"
    "sync"
)

type FileResult struct {
    Filename  string
    WordCount int
    Error     error
}

func processFile(filename string, wg *sync.WaitGroup, results chan<- FileResult) {
    defer wg.Done()
    
    content, err := os.ReadFile(filename)
    if err != nil {
        results <- FileResult{Filename: filename, Error: err}
        return
    }
    
    words := strings.Fields(string(content))
    results <- FileResult{
        Filename:  filename,
        WordCount: len(words),
    }
}

func main() {
    files := []string{
        "document1.txt",
        "document2.txt",
        "document3.txt",
        "document4.txt",
    }
    
    var wg sync.WaitGroup
    results := make(chan FileResult, len(files))
    
    // Launch all file processors
    for _, file := range files {
        wg.Add(1)
        go processFile(file, &wg, results)
    }
    
    // Wait for completion in a separate goroutine
    go func() {
        wg.Wait()
        close(results)
    }()
    
    // Aggregate results
    totalWords := 0
    errorCount := 0
    
    for result := range results {
        if result.Error != nil {
            fmt.Printf("Error processing %s: %v\n", result.Filename, result.Error)
            errorCount++
            continue
        }
        fmt.Printf("%s: %d words\n", result.Filename, result.WordCount)
        totalWords += result.WordCount
    }
    
    fmt.Printf("\nTotal: %d words across %d files (%d errors)\n",
        totalWords, len(files)-errorCount, errorCount)
}

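This pattern handles per-file errors without aborting the run, aggregates results, and waits in a separate goroutine that closes the channel once every processor has finished. That lets the main goroutine process results as they arrive, and it keeps the code deadlock-free even with an unbuffered channel, because main is already receiving while the processors send.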
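If you don't need to handle results as they arrive, another option is to let each goroutine update shared totals under a sync.Mutex and read them after Wait(). The sketch below shows that variant; it counts words in in-memory strings rather than files so it runs on its own, and countWords and the inputs are illustrative.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countWords is a hypothetical example: it counts words across all texts
// concurrently and aggregates the total under a mutex.
func countWords(texts []string) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, text := range texts {
		wg.Add(1)
		go func(text string) {
			defer wg.Done()
			n := len(strings.Fields(text)) // do the work outside the lock
			mu.Lock()
			total += n
			mu.Unlock()
		}(text)
	}
	wg.Wait() // every update to total happens before Wait returns
	return total
}

func main() {
	fmt.Println(countWords([]string{"one two", "three", "four five six"})) // 6
}
```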

Alternatives and When to Use Them

For simple synchronization, WaitGroup is perfect. But when you need error handling, cancellation, or limiting concurrency, consider alternatives.

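The golang.org/x/sync/errgroup package builds on the same idea. Its Group waits for goroutines like a WaitGroup, but each task returns an error, Wait() reports the first one, and errgroup.WithContext cancels a shared context as soon as any task fails: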

package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/sync/errgroup"
)

func main() {
    // With WaitGroup - no error handling
    // var wg sync.WaitGroup
    // for i := 0; i < 3; i++ {
    //     wg.Add(1)
    //     go func(n int) {
    //         defer wg.Done()
    //         // Can't return errors
    //     }(i)
    // }
    // wg.Wait()
    
    // With errgroup - clean error handling. WithContext returns a ctx that
    // is canceled as soon as any task returns a non-nil error.
    g, ctx := errgroup.WithContext(context.Background())
    
    for i := 0; i < 3; i++ {
        i := i // Capture loop variable (redundant since Go 1.22, harmless)
        g.Go(func() error {
            if i == 2 {
                return fmt.Errorf("task %d failed", i)
            }
            select {
            case <-time.After(100 * time.Millisecond):
                fmt.Printf("Task %d completed\n", i)
                return nil
            case <-ctx.Done():
                // Another task failed, so stop early instead of finishing
                return ctx.Err()
            }
        })
    }
    
    // Wait blocks until every task has returned, then reports the first error
    if err := g.Wait(); err != nil {
        fmt.Printf("Error occurred: %v\n", err)
    }
}

Use WaitGroup when you just need to wait for goroutines. Use errgroup when you need error handling, cancellation, or a cap on concurrency (Group.SetLimit). Use channels when goroutines need to communicate results. Use context when you need timeouts or cancellation signals.
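To limit concurrency with only the standard library, a buffered channel can act as a counting semaphore alongside a WaitGroup. This is a sketch: maxInFlight is a hypothetical function that also records the highest number of tasks it observed running at once, purely so the limit can be checked.

```go
package main

import (
	"fmt"
	"sync"
)

// maxInFlight is a hypothetical example: it runs n tasks with at most limit
// running at once and returns the highest concurrency actually observed.
func maxInFlight(n, limit int) int {
	sem := make(chan struct{}, limit) // capacity = maximum concurrent tasks
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		running  int
		observed int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while all limit slots are taken
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task ends

			mu.Lock()
			running++
			if running > observed {
				observed = running
			}
			mu.Unlock()

			// ... real work would go here ...

			mu.Lock()
			running--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return observed
}

func main() {
	fmt.Println(maxInFlight(20, 3) <= 3) // true
}
```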

sync.WaitGroup is a fundamental building block in Go’s concurrency toolkit. Master it, understand its patterns, avoid its pitfalls, and you’ll write robust concurrent code. The key is simplicity—increment before starting work, decrement when done, wait until zero. Everything else is just careful application of these principles.
