Go Graceful Shutdown: Signal Handling
Key Insights
- Graceful shutdown prevents data loss and corruption by allowing in-flight requests to complete and ensuring proper resource cleanup before process termination
- Go’s context package combined with sync.WaitGroup provides the foundation for coordinating shutdown across HTTP servers, background workers, and database connections
- Production systems should implement shutdown timeouts to prevent indefinite hangs while still attempting graceful cleanup of all resources
Introduction to Graceful Shutdown
When a production application receives a termination signal—whether from a deployment, autoscaling event, or manual intervention—how it shuts down matters significantly. An abrupt termination can leave requests half-processed, database transactions uncommitted, and connections dangling. These aren’t theoretical concerns; they manifest as corrupted data, failed payments, and angry customers.
Consider an HTTP server handling a file upload when it receives SIGTERM. Without graceful shutdown, the upload terminates mid-stream, leaving partial data in your storage system. With graceful shutdown, the server stops accepting new connections, completes the upload, closes the connection cleanly, and only then exits.
Here’s the difference in practice:
// Bad: Abrupt shutdown
func main() {
	http.HandleFunc("/upload", handleUpload)
	log.Fatal(http.ListenAndServe(":8080", nil))
	// Process dies immediately on SIGTERM
}

// Good: Graceful shutdown
func main() {
	srv := &http.Server{Addr: ":8080"}
	http.HandleFunc("/upload", handleUpload)
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("Server error: %v", err)
		}
	}()
	// Wait for interrupt signal
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatalf("Shutdown error: %v", err)
	}
}
The graceful version gives in-flight uploads up to 30 seconds to complete before the process exits.
Understanding OS Signals in Go
Operating systems communicate with processes through signals. The most common shutdown signals are:
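- SIGTERM: Polite termination request. Kubernetes sends this during pod shutdown, then sends SIGKILL (which cannot be caught) if the process is still running after the grace period, 30 seconds by default.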
- SIGINT: Interrupt signal, typically from Ctrl+C. Used during local development.
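- SIGQUIT: Quit signal, typically from Ctrl+\. Its default action is a core dump; an unhandled SIGQUIT makes a Go program exit with a stack dump of all goroutines. Less common for graceful shutdown.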
Go’s os/signal package provides a channel-based API for receiving signals. Instead of traditional signal handlers, you receive signals on a channel, allowing you to handle them in normal Go control flow.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Create channel to receive signals
	sigChan := make(chan os.Signal, 1)
	// Register to receive SIGINT and SIGTERM
	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
	fmt.Println("Running... Press Ctrl+C to stop")
	// Simulate work
	go func() {
		for {
			fmt.Println("Working...")
			time.Sleep(2 * time.Second)
		}
	}()
	// Block until signal received
	sig := <-sigChan
	fmt.Printf("\nReceived signal: %v\n", sig)
	fmt.Println("Shutting down gracefully...")
	// Cleanup would happen here
	time.Sleep(1 * time.Second)
	fmt.Println("Shutdown complete")
}
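The buffer size of 1 in make(chan os.Signal, 1) is important. signal.Notify never blocks when sending to the channel, so with an unbuffered channel a signal that arrives before you are ready to receive it is simply dropped; a buffer of 1 holds it until you read it.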
Implementing HTTP Server Graceful Shutdown
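The http.Server type provides a Shutdown() method specifically designed for graceful termination. When called, it:
- Closes all open listeners, so no new connections are accepted
- Closes all idle connections
- Waits for active connections to finish their current requests and become idle, then closes them
- Returns once every connection is closed, or returns the context's error if the context expires first
The context parameter lets you set a timeout, preventing indefinite waits if connections don’t close. Two caveats from the net/http documentation: once Shutdown is called, ListenAndServe returns ErrServerClosed immediately, so the program must wait for Shutdown itself to return; and Shutdown neither closes nor waits for hijacked connections such as WebSockets, which need their own cleanup (for example via RegisterOnShutdown).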
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		// Simulate slow request
		time.Sleep(10 * time.Second)
		fmt.Fprintf(w, "Completed at %s\n", time.Now().Format(time.RFC3339))
	})
	mux.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Fast response\n")
	})
	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}
	// Start server in goroutine
	go func() {
		log.Println("Server starting on :8080")
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("ListenAndServe error: %v", err)
		}
	}()
	// Setup signal handling
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit
	log.Println("Shutdown signal received")
	// Create shutdown context with timeout
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	// Attempt graceful shutdown
	if err := srv.Shutdown(ctx); err != nil {
		log.Fatalf("Server forced shutdown: %v", err)
	}
	log.Println("Server stopped gracefully")
}
If a request to /slow is in progress when you send SIGTERM, the server lets it finish (about 10 seconds), since that fits within the 30-second timeout. New connection attempts are refused as soon as Shutdown closes the listener.
Managing Background Workers and Goroutines
HTTP servers aren’t the only components requiring graceful shutdown. Background workers processing queues, periodic tasks, and database connections all need coordinated cleanup.
Use context.Context for cancellation signaling and sync.WaitGroup for tracking goroutine completion:
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

type Worker struct {
	id int
}

func (w *Worker) Process(ctx context.Context, wg *sync.WaitGroup) {
	defer wg.Done()
	log.Printf("Worker %d starting", w.id)
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			log.Printf("Worker %d received shutdown signal", w.id)
			// Cleanup work here
			time.Sleep(500 * time.Millisecond) // Simulate cleanup
			log.Printf("Worker %d stopped", w.id)
			return
		case <-ticker.C:
			log.Printf("Worker %d processing task", w.id)
			// Simulate work
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	// Start worker pool
	numWorkers := 3
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		worker := &Worker{id: i}
		go worker.Process(ctx, &wg)
	}
	// Wait for shutdown signal
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit
	log.Println("Initiating shutdown...")
	// Cancel context to signal all workers
	cancel()
	// Wait for all workers with timeout
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		log.Println("All workers stopped gracefully")
	case <-time.After(10 * time.Second):
		log.Println("Shutdown timeout exceeded, forcing exit")
	}
}
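This pattern scales to any number of workers. cancel() closes ctx.Done() for every worker at once, but each worker only notices at its next select, so long-running tasks should check ctx themselves. The WaitGroup keeps main from exiting until all workers have finished their cleanup, or until the 10-second timeout expires.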
Production-Ready Patterns
Real applications combine multiple components. Here’s a complete skeleton showing coordinated shutdown:
package main

import (
	"context"
	"database/sql"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"

	_ "github.com/lib/pq" // registers the "postgres" driver; any database/sql driver works
)

type Application struct {
	server      *http.Server
	db          *sql.DB
	stopWorkers context.CancelFunc // cancels the background workers' context
	wg          sync.WaitGroup
}

func (app *Application) Shutdown(ctx context.Context) error {
	log.Println("Starting graceful shutdown...")
	// Shutdown order matters: stop accepting traffic first
	if err := app.server.Shutdown(ctx); err != nil {
		log.Printf("HTTP server shutdown error: %v", err)
	}
	// Then tell background workers to stop and wait for them
	app.stopWorkers()
	done := make(chan struct{})
	go func() {
		app.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		log.Println("Background workers stopped")
	case <-ctx.Done():
		log.Println("Worker shutdown timeout exceeded")
	}
	// Close database connections last
	if err := app.db.Close(); err != nil {
		log.Printf("Database close error: %v", err)
		return err
	}
	log.Println("Shutdown complete")
	return nil
}

func (app *Application) backgroundWorker(ctx context.Context) {
	defer app.wg.Done()
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// Process background tasks
			log.Println("Background task executed")
		}
	}
}

func main() {
	// Initialize components. sql.Open may not connect at all, but it does
	// fail for an unregistered driver, so the error must not be ignored.
	db, err := sql.Open("postgres", "connection-string")
	if err != nil {
		log.Fatalf("Database open error: %v", err)
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	ctx, cancel := context.WithCancel(context.Background())
	app := &Application{
		server:      &http.Server{Addr: ":8080", Handler: mux},
		db:          db,
		stopWorkers: cancel,
	}
	// Start background workers
	app.wg.Add(1)
	go app.backgroundWorker(ctx)
	// Start HTTP server
	go func() {
		if err := app.server.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("Server error: %v", err)
		}
	}()
	log.Println("Application started")
	// Wait for shutdown signal
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
	<-quit
	// Graceful shutdown with timeout; Shutdown cancels the workers only
	// after the HTTP server has stopped accepting traffic
	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer shutdownCancel()
	if err := app.Shutdown(shutdownCtx); err != nil {
		log.Fatalf("Shutdown failed: %v", err)
	}
}
Key principles: stop accepting new work first, allow existing work to complete, clean up resources in reverse dependency order (database connections last).
Testing Graceful Shutdown
Testing shutdown behavior ensures your production systems behave correctly under termination:
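package main

import (
	"context"
	"net"
	"net/http"
	"testing"
	"time"
)

func TestGracefulShutdown(t *testing.T) {
	started := make(chan struct{})
	requestDone := make(chan struct{})
	// Use a local mux so the test does not register handlers on
	// http.DefaultServeMux (a second registration, e.g. with -count=2, panics).
	mux := http.NewServeMux()
	mux.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
		close(started)
		time.Sleep(2 * time.Second) // simulate a slow request
		w.WriteHeader(http.StatusOK)
		close(requestDone)
	})
	// Listen on an OS-assigned port instead of a fixed one such as :8081
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)
	// Start a request and record its status code (0 on error)
	status := make(chan int, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/test")
		if err != nil {
			status <- 0
			return
		}
		resp.Body.Close()
		status <- resp.StatusCode
	}()
	// Trigger shutdown only once the request is in flight
	select {
	case <-started:
	case <-time.After(2 * time.Second):
		t.Fatal("Request never reached the handler")
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// Shutdown blocks until the in-flight request finishes or ctx expires
	if err := srv.Shutdown(ctx); err != nil {
		t.Fatalf("Shutdown error: %v", err)
	}
	// The handler must have finished before Shutdown returned
	select {
	case <-requestDone:
		t.Log("Request completed before shutdown finished")
	default:
		t.Fatal("Shutdown returned while the request was still in flight")
	}
	if code := <-status; code != http.StatusOK {
		t.Fatalf("got status %d, want %d", code, http.StatusOK)
	}
}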
This test verifies that in-flight requests complete before shutdown finishes—the core promise of graceful shutdown.
Graceful shutdown isn’t optional for production systems. It’s the difference between deployments that occasionally corrupt data and ones that don’t. Implement it once, correctly, and every deployment becomes safer.