Go sync.WaitGroup: Waiting for Goroutines
Key Insights
- `sync.WaitGroup` solves the fundamental problem of waiting for multiple goroutines to complete by maintaining an internal counter that tracks active goroutines
- Always pass `WaitGroup` by pointer to goroutines and call `Add()` before launching them to avoid race conditions
- For complex scenarios requiring error handling or cancellation, consider `errgroup.Group` instead of a plain `WaitGroup`
Introduction to Goroutine Synchronization
Go’s goroutines make concurrent programming accessible, but they introduce a critical challenge: how do you know when your concurrent work is done? The naive approach of using time.Sleep() is fundamentally broken because you’re guessing at execution time rather than actually coordinating with your goroutines.
Here’s the problem in action:
```go
package main

import (
    "fmt"
    "time"
)

func main() {
    for i := 0; i < 5; i++ {
        go func(n int) {
            time.Sleep(time.Duration(n*100) * time.Millisecond)
            fmt.Printf("Goroutine %d finished\n", n)
        }(i)
    }
    time.Sleep(time.Second) // Hope this is enough?
    fmt.Println("Main exiting")
}
```
This code has multiple problems. The sleep duration is arbitrary—too short and goroutines get cut off, too long and you’re wasting time. You have no idea when work actually completes. In production systems, this approach is a recipe for data loss and race conditions.
sync.WaitGroup provides the correct solution by giving you explicit synchronization primitives.
WaitGroup Basics
A WaitGroup maintains an internal counter. You increment it when starting work, decrement it when work completes, and block until the counter reaches zero. It’s that simple.
The API consists of three methods:
- `Add(delta int)`: Increments the counter by `delta`
- `Done()`: Decrements the counter by one (equivalent to `Add(-1)`)
- `Wait()`: Blocks until the counter reaches zero
Here’s the corrected version of our earlier example:
```go
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ {
        wg.Add(1) // Increment counter before launching goroutine
        go func(n int) {
            defer wg.Done() // Decrement when done
            time.Sleep(time.Duration(n*100) * time.Millisecond)
            fmt.Printf("Goroutine %d finished\n", n)
        }(i)
    }
    wg.Wait() // Block until all goroutines call Done()
    fmt.Println("All goroutines completed")
}
```
This code guarantees that “All goroutines completed” only prints after every goroutine finishes. No guessing, no arbitrary delays.
Common Patterns and Best Practices
The most critical rule: always pass WaitGroup by pointer. If you pass by value, each goroutine gets a copy, and calling Done() on a copy doesn’t affect the original counter.
Always call Add() before launching the goroutine, not inside it. This prevents a race condition where Wait() might return before Add() is called.
Always use defer for Done(). This ensures the counter decrements even if the goroutine panics or returns early.
Here’s a worker pool pattern that demonstrates these principles:
```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "sync"
    "time"
)

func fetchURL(url string, wg *sync.WaitGroup, results chan<- string) {
    defer wg.Done() // Guaranteed to execute

    client := &http.Client{Timeout: 5 * time.Second}
    resp, err := client.Get(url)
    if err != nil {
        results <- fmt.Sprintf("%s: ERROR - %v", url, err)
        return
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        results <- fmt.Sprintf("%s: ERROR reading body - %v", url, err)
        return
    }
    results <- fmt.Sprintf("%s: %d bytes", url, len(body))
}

func main() {
    urls := []string{
        "https://golang.org",
        "https://go.dev",
        "https://pkg.go.dev",
    }

    var wg sync.WaitGroup
    results := make(chan string, len(urls))

    for _, url := range urls {
        wg.Add(1)                      // Increment BEFORE launching goroutine
        go fetchURL(url, &wg, results) // Pass WaitGroup by pointer
    }

    wg.Wait()      // Wait for all fetches to complete
    close(results) // Safe to close now

    for result := range results {
        fmt.Println(result)
    }
}
```
This pattern is robust. The defer wg.Done() ensures the counter decrements regardless of success or failure. Passing &wg ensures all goroutines work with the same counter. Calling Add() before go prevents race conditions.
Common Pitfalls and How to Avoid Them
The most common mistake is mismatching Add() and Done() calls. If you call Done() more times than Add(), the counter goes negative and you get a panic:
```go
// WRONG - This will panic
func badExample() {
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Println("First done")
    }()
    go func() {
        defer wg.Done() // Second Done() without Add()
        fmt.Println("Second done")
    }()
    wg.Wait() // panic: sync: negative WaitGroup counter
}
```
The fix is straightforward—match every goroutine with an Add(1):
```go
// CORRECT
func goodExample() {
    var wg sync.WaitGroup
    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Println("First done")
    }()
    wg.Add(1) // Added this
    go func() {
        defer wg.Done()
        fmt.Println("Second done")
    }()
    wg.Wait()
}
```
Another pitfall is copying a WaitGroup after use. The sync.WaitGroup documentation explicitly states it must not be copied after first use. Always pass by pointer and never embed it in structs that get copied.
Forgetting Done() is catastrophic—your program hangs forever at Wait(). Using defer makes this nearly impossible to mess up.
Real-World Use Case
Let’s build a practical file processor that reads multiple files concurrently, counts words in each, and aggregates results:
```go
package main

import (
    "fmt"
    "os"
    "strings"
    "sync"
)

type FileResult struct {
    Filename  string
    WordCount int
    Error     error
}

func processFile(filename string, wg *sync.WaitGroup, results chan<- FileResult) {
    defer wg.Done()

    content, err := os.ReadFile(filename)
    if err != nil {
        results <- FileResult{Filename: filename, Error: err}
        return
    }

    words := strings.Fields(string(content))
    results <- FileResult{
        Filename:  filename,
        WordCount: len(words),
    }
}

func main() {
    files := []string{
        "document1.txt",
        "document2.txt",
        "document3.txt",
        "document4.txt",
    }

    var wg sync.WaitGroup
    results := make(chan FileResult, len(files))

    // Launch all file processors
    for _, file := range files {
        wg.Add(1)
        go processFile(file, &wg, results)
    }

    // Wait for completion in a separate goroutine
    go func() {
        wg.Wait()
        close(results)
    }()

    // Aggregate results
    totalWords := 0
    errorCount := 0
    for result := range results {
        if result.Error != nil {
            fmt.Printf("Error processing %s: %v\n", result.Filename, result.Error)
            errorCount++
            continue
        }
        fmt.Printf("%s: %d words\n", result.Filename, result.WordCount)
        totalWords += result.WordCount
    }

    fmt.Printf("\nTotal: %d words across %d files (%d errors)\n",
        totalWords, len(files)-errorCount, errorCount)
}
```
This pattern is production-ready. We handle errors gracefully, aggregate results, and use a separate goroutine to wait and close the channel, allowing the main goroutine to process results as they arrive.
Alternatives and When to Use Them
For simple synchronization, WaitGroup is perfect. But when you need error handling, cancellation, or limiting concurrency, consider alternatives.
The golang.org/x/sync/errgroup package extends WaitGroup with error propagation and context support:
```go
package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/sync/errgroup"
)

func main() {
    // With WaitGroup - no error handling
    // var wg sync.WaitGroup
    // for i := 0; i < 3; i++ {
    //     wg.Add(1)
    //     go func(n int) {
    //         defer wg.Done()
    //         // Can't return errors
    //     }(i)
    // }
    // wg.Wait()

    // With errgroup - clean error handling
    g, ctx := errgroup.WithContext(context.Background())
    for i := 0; i < 3; i++ {
        i := i // Capture loop variable
        g.Go(func() error {
            if i == 2 {
                return fmt.Errorf("task %d failed", i)
            }
            time.Sleep(100 * time.Millisecond)
            fmt.Printf("Task %d completed\n", i)
            return nil
        })
    }
    if err := g.Wait(); err != nil {
        fmt.Printf("Error occurred: %v\n", err)
    }
    _ = ctx // Context available for cancellation
}
```
Use WaitGroup when you just need to wait for goroutines. Use errgroup when you need error handling or cancellation. Use channels when goroutines need to communicate results. Use context when you need timeouts or cancellation signals.
sync.WaitGroup is a fundamental building block in Go’s concurrency toolkit. Master it, understand its patterns, avoid its pitfalls, and you’ll write robust concurrent code. The key is simplicity—increment before starting work, decrement when done, wait until zero. Everything else is just careful application of these principles.