Go Graceful Shutdown: Signal Handling

Key Insights

Graceful shutdown prevents data loss and corruption by allowing in-flight requests to complete and ensuring proper resource cleanup before process termination
Go’s context package combined with sync.WaitGroup provides the foundation for coordinating shutdown across HTTP servers, background workers, and database connections
Production systems should implement shutdown timeouts to prevent indefinite hangs while still attempting graceful cleanup of all resources

Introduction to Graceful Shutdown

When a production application receives a termination signal—whether from a deployment, autoscaling event, or manual intervention—how it shuts down matters significantly. An abrupt termination can leave requests half-processed, database transactions uncommitted, and connections dangling. These aren’t theoretical concerns; they manifest as corrupted data, failed payments, and angry customers.

Consider an HTTP server handling a file upload when it receives SIGTERM. Without graceful shutdown, the upload terminates mid-stream, leaving partial data in your storage system. With graceful shutdown, the server stops accepting new connections, completes the upload, closes the connection cleanly, and only then exits.

Here’s the difference in practice:

// Bad: Abrupt shutdown
func main() {
    http.HandleFunc("/upload", handleUpload)
    // No signal handling: SIGTERM kills the process immediately,
    // cutting off any in-flight uploads.
    log.Fatal(http.ListenAndServe(":8080", nil))
}

// Good: Graceful shutdown
func main() {
    srv := &http.Server{Addr: ":8080"}
    http.HandleFunc("/upload", handleUpload)
    
    go func() {
        if err := srv.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatalf("Server error: %v", err)
        }
    }()
    
    // Wait for interrupt signal
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
    <-quit
    
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    
    if err := srv.Shutdown(ctx); err != nil {
        log.Fatalf("Shutdown error: %v", err)
    }
}

The graceful version stops accepting new connections and gives in-flight uploads up to 30 seconds to complete before the process exits.

Understanding OS Signals in Go

Operating systems communicate with processes through signals. The most common shutdown signals are:

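SIGTERM: Polite termination request. Kubernetes sends this during pod shutdown, then follows up with SIGKILL (which cannot be caught) if the process is still running when the termination grace period ends, 30 seconds by default.
SIGINT: Interrupt signal, typically from Ctrl+C. Used during local development.
SIGQUIT: Quit signal, traditionally requesting a core dump. By default the Go runtime responds by printing goroutine stack traces and exiting, so it is less common for graceful shutdown.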

Go’s os/signal package provides a channel-based API for receiving signals. Instead of traditional signal handlers, you receive signals on a channel, allowing you to handle them in normal Go control flow.

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Create channel to receive signals
    sigChan := make(chan os.Signal, 1)
    
    // Register to receive SIGINT and SIGTERM
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    
    fmt.Println("Running... Press Ctrl+C to stop")
    
    // Simulate work
    go func() {
        for {
            fmt.Println("Working...")
            time.Sleep(2 * time.Second)
        }
    }()
    
    // Block until signal received
    sig := <-sigChan
    fmt.Printf("\nReceived signal: %v\n", sig)
    fmt.Println("Shutting down gracefully...")
    
    // Cleanup would happen here
    time.Sleep(1 * time.Second)
    fmt.Println("Shutdown complete")
}

The buffer size of 1 in make(chan os.Signal, 1) is important: signal.Notify never blocks when sending to the channel, so with an unbuffered channel a signal that arrives before you’re ready to receive it is dropped.
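Since Go 1.16, os/signal also provides signal.NotifyContext, which wraps the same mechanism in a context that is cancelled when one of the listed signals arrives. The following is a minimal sketch rather than a drop-in replacement: the helper names runUntilSignal and selfTerminateDemo are made up for illustration, and the self-signalling step assumes a Unix-like system.

```go
package main

import (
    "context"
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

// runUntilSignal blocks until SIGINT or SIGTERM arrives or maxWait elapses.
// It reports whether a signal was the reason it returned.
func runUntilSignal(maxWait time.Duration) bool {
    // ctx is cancelled as soon as one of the listed signals is delivered.
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop() // unregister and restore the default signal behavior

    select {
    case <-ctx.Done():
        return true
    case <-time.After(maxWait):
        return false
    }
}

// selfTerminateDemo stands in for an orchestrator: it sends SIGTERM to this
// process after a short delay so the example finishes without user input.
// (Illustrative only; assumes a Unix-like system.)
func selfTerminateDemo() bool {
    go func() {
        time.Sleep(100 * time.Millisecond)
        if p, err := os.FindProcess(os.Getpid()); err == nil {
            _ = p.Signal(syscall.SIGTERM)
        }
    }()
    return runUntilSignal(2 * time.Second)
}

func main() {
    if selfTerminateDemo() {
        fmt.Println("signal received, starting shutdown")
    } else {
        fmt.Println("no signal arrived within the wait window")
    }
}
```

The context form is convenient when the rest of the program already takes a context.Context, because the same value can be handed to workers and used to trigger server shutdown.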

Implementing HTTP Server Graceful Shutdown

The http.Server type provides a Shutdown() method specifically designed for graceful termination. When called, it:

Closes all open listeners, so no new connections are accepted
Closes all idle connections
Waits for active connections to become idle, then closes them
Returns when complete or when the context expires

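The context parameter lets you set a timeout, preventing indefinite waits if connections don’t close. Note that Shutdown does not wait for hijacked connections such as WebSockets; those need to be notified and closed separately, for example through a callback registered with RegisterOnShutdown.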

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    mux := http.NewServeMux()
    
    mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
        // Simulate slow request
        time.Sleep(10 * time.Second)
        fmt.Fprintf(w, "Completed at %s\n", time.Now().Format(time.RFC3339))
    })
    
    mux.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, "Fast response\n")
    })
    
    srv := &http.Server{
        Addr:    ":8080",
        Handler: mux,
    }
    
    // Start server in goroutine
    go func() {
        log.Println("Server starting on :8080")
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("ListenAndServe error: %v", err)
        }
    }()
    
    // Setup signal handling
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
    
    <-quit
    log.Println("Shutdown signal received")
    
    // Create shutdown context with timeout
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    
    // Attempt graceful shutdown
    if err := srv.Shutdown(ctx); err != nil {
        log.Fatalf("Server forced shutdown: %v", err)
    }
    
    log.Println("Server stopped gracefully")
}

If a request to /slow is in progress when you send SIGTERM, the server waits up to 30 seconds for it to complete. New connection attempts are refused immediately because the listener is already closed.
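If the shutdown context expires first, Shutdown returns the context's error and leaves the still-active connections open. One possible way to handle that case is to fall back to the server's Close method, which drops the remaining connections immediately. The sketch below assumes that policy is acceptable for your workload; shutdownOrClose and demo are hypothetical helper names, and the demo server listens on a throwaway local port.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "net"
    "net/http"
    "time"
)

// shutdownOrClose (hypothetical helper) attempts a graceful shutdown and
// falls back to Close when the deadline passes. It reports whether the
// shutdown was graceful.
func shutdownOrClose(srv *http.Server, timeout time.Duration) bool {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()

    err := srv.Shutdown(ctx)
    if err == nil {
        return true
    }
    if errors.Is(err, context.DeadlineExceeded) {
        // In-flight requests did not finish in time: drop them.
        _ = srv.Close()
    }
    return false
}

// demo serves one request whose handler sleeps for handlerDelay, then shuts
// the server down with the given timeout while that request is in flight.
func demo(handlerDelay, timeout time.Duration) bool {
    ln, err := net.Listen("tcp", "127.0.0.1:0") // any free local port
    if err != nil {
        panic(err)
    }
    started := make(chan struct{}, 1)
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        started <- struct{}{}
        time.Sleep(handlerDelay)
    })
    srv := &http.Server{Handler: mux}
    go srv.Serve(ln)
    go func() {
        if resp, err := http.Get("http://" + ln.Addr().String() + "/"); err == nil {
            resp.Body.Close()
        }
    }()
    <-started // the request is now in flight
    return shutdownOrClose(srv, timeout)
}

func main() {
    fmt.Println("fast handler, graceful:", demo(50*time.Millisecond, 2*time.Second))
    fmt.Println("slow handler, graceful:", demo(500*time.Millisecond, 50*time.Millisecond))
}
```

Whether to force-close or simply exit after the timeout is a judgment call; the point is that the timeout path should be a deliberate decision rather than an accident.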

Managing Background Workers and Goroutines

HTTP servers aren’t the only components requiring graceful shutdown. Background workers processing queues, periodic tasks, and database connections all need coordinated cleanup.

Use context.Context for cancellation signaling and sync.WaitGroup for tracking goroutine completion:

package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "os/signal"
    "sync"
    "syscall"
    "time"
)

type Worker struct {
    id int
}

func (w *Worker) Process(ctx context.Context, wg *sync.WaitGroup) {
    defer wg.Done()
    
    log.Printf("Worker %d starting", w.id)
    
    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()
    
    for {
        select {
        case <-ctx.Done():
            log.Printf("Worker %d received shutdown signal", w.id)
            // Cleanup work here
            time.Sleep(500 * time.Millisecond) // Simulate cleanup
            log.Printf("Worker %d stopped", w.id)
            return
        case <-ticker.C:
            log.Printf("Worker %d processing task", w.id)
            // Simulate work
        }
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    var wg sync.WaitGroup
    
    // Start worker pool
    numWorkers := 3
    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        worker := &Worker{id: i}
        go worker.Process(ctx, &wg)
    }
    
    // Wait for shutdown signal
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
    <-quit
    
    log.Println("Initiating shutdown...")
    
    // Cancel context to signal all workers
    cancel()
    
    // Wait for all workers with timeout
    done := make(chan struct{})
    go func() {
        wg.Wait()
        close(done)
    }()
    
    select {
    case <-done:
        log.Println("All workers stopped gracefully")
    case <-time.After(10 * time.Second):
        log.Println("Shutdown timeout exceeded, forcing exit")
    }
}

This pattern scales to any number of workers. The context cancellation reaches every worker at once, while the WaitGroup keeps main from exiting until all workers have completed cleanup or the 10-second timeout expires.
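The wait-with-timeout step tends to be repeated wherever a WaitGroup is involved, so it can be factored into a small helper. This is a sketch under the same assumptions as the example above; waitWithTimeout and stopWorkers are illustrative names, not part of the standard library.

```go
package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

// waitWithTimeout (hypothetical helper) waits for wg for at most timeout and
// reports whether every goroutine finished in time.
func waitWithTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
    done := make(chan struct{})
    go func() {
        wg.Wait()
        close(done)
    }()

    timer := time.NewTimer(timeout)
    defer timer.Stop()

    select {
    case <-done:
        return true
    case <-timer.C:
        // The waiting goroutine is left behind, which is acceptable when
        // the process is about to exit anyway.
        return false
    }
}

// stopWorkers starts n workers that each need `cleanup` time after
// cancellation, cancels them, and waits for at most `timeout`.
func stopWorkers(n int, cleanup, timeout time.Duration) bool {
    ctx, cancel := context.WithCancel(context.Background())
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            <-ctx.Done()        // wait for the shutdown signal
            time.Sleep(cleanup) // simulate cleanup work
        }()
    }
    cancel()
    return waitWithTimeout(&wg, timeout)
}

func main() {
    fmt.Println("quick cleanup finished in time:", stopWorkers(3, 10*time.Millisecond, time.Second))
    fmt.Println("slow cleanup finished in time:", stopWorkers(3, 500*time.Millisecond, 20*time.Millisecond))
}
```

Returning a bool lets the caller decide what a timeout means: log and exit, exit with a non-zero status, or escalate.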

Production-Ready Patterns

Real applications combine multiple components. Here’s a complete skeleton showing coordinated shutdown:

package main

import (
    "context"
    "database/sql"
    "log"
    "net/http"
    "os"
    "os/signal"
    "sync"
    "syscall"
    "time"
)

type Application struct {
    server *http.Server
    db     *sql.DB
    wg     sync.WaitGroup
}

func (app *Application) Shutdown(ctx context.Context) error {
    log.Println("Starting graceful shutdown...")
    
    // Shutdown order matters: stop accepting traffic first
    if err := app.server.Shutdown(ctx); err != nil {
        log.Printf("HTTP server shutdown error: %v", err)
    }
    
    // Wait for background workers
    done := make(chan struct{})
    go func() {
        app.wg.Wait()
        close(done)
    }()
    
    select {
    case <-done:
        log.Println("Background workers stopped")
    case <-ctx.Done():
        log.Println("Worker shutdown timeout exceeded")
    }
    
    // Close database connections last
    if err := app.db.Close(); err != nil {
        log.Printf("Database close error: %v", err)
        return err
    }
    
    log.Println("Shutdown complete")
    return nil
}

func (app *Application) backgroundWorker(ctx context.Context) {
    defer app.wg.Done()
    
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()
    
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            // Process background tasks
            log.Println("Background task executed")
        }
    }
}

func main() {
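    // Initialize components.
    // sql.Open needs a registered driver: blank-import a Postgres driver
    // package alongside the imports above, or this call returns an error.
    db, err := sql.Open("postgres", "connection-string")
    if err != nil {
        log.Fatalf("Database open error: %v", err)
    }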
    
    mux := http.NewServeMux()
    mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })
    
    app := &Application{
        server: &http.Server{Addr: ":8080", Handler: mux},
        db:     db,
    }
    
    // Start background workers
    ctx, cancel := context.WithCancel(context.Background())
    app.wg.Add(1)
    go app.backgroundWorker(ctx)
    
    // Start HTTP server
    go func() {
        if err := app.server.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatalf("Server error: %v", err)
        }
    }()
    
    log.Println("Application started")
    
    // Wait for shutdown signal
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
    <-quit
    
    // Cancel background workers
    cancel()
    
    // Graceful shutdown with timeout
    shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer shutdownCancel()
    
    if err := app.Shutdown(shutdownCtx); err != nil {
        // log.Fatalf logs the message and exits with status 1
        log.Fatalf("Shutdown failed: %v", err)
    }
}

Key principles: stop accepting new work first, allow existing work to complete, clean up resources in reverse dependency order (database connections last).
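One way to make that ordering explicit is to register the cleanup actions as a list and run them in sequence under a single deadline. The sketch below is an illustration only: the step type, runShutdown, and demoOrder are hypothetical names, and the three steps are placeholders where a real application would call server.Shutdown, wait on its WaitGroup, and close the database.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "strings"
    "time"
)

// step is one named cleanup action (hypothetical type for this sketch).
type step struct {
    name string
    run  func(ctx context.Context) error
}

// runShutdown runs the steps in order under ctx. A failing step is recorded
// but does not prevent later steps from running, so resources further down
// the list are still released. It returns the names in execution order and
// the first error encountered.
func runShutdown(ctx context.Context, steps []step) (ran []string, firstErr error) {
    for _, s := range steps {
        ran = append(ran, s.name)
        if err := s.run(ctx); err != nil && firstErr == nil {
            firstErr = fmt.Errorf("%s: %w", s.name, err)
        }
    }
    return ran, firstErr
}

// demoOrder wires three placeholder steps: traffic first, workers second,
// database last. The workers step fails on purpose to show that the
// database step still runs.
func demoOrder() (string, bool) {
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()

    steps := []step{
        {"http", func(ctx context.Context) error { return nil }},
        {"workers", func(ctx context.Context) error { return errors.New("still busy") }},
        {"db", func(ctx context.Context) error { return nil }},
    }
    ran, err := runShutdown(ctx, steps)
    return strings.Join(ran, ","), err != nil
}

func main() {
    order, failed := demoOrder()
    fmt.Println("order:", order, "failed:", failed)
}
```

Keeping the order in one slice makes it easy to review and to extend when a new component, such as a message consumer, joins the application.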

Testing Graceful Shutdown

Testing shutdown behavior ensures your production systems behave correctly under termination:

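package main

import (
    "context"
    "fmt"
    "net/http"
    "testing"
    "time"
)

func TestGracefulShutdown(t *testing.T) {
    // Use a private mux so repeated runs don't re-register a handler
    // on http.DefaultServeMux (which would panic).
    mux := http.NewServeMux()
    mux.HandleFunc("/test", func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(2 * time.Second)
        w.WriteHeader(http.StatusOK)
    })
    srv := &http.Server{Addr: ":8081", Handler: mux}

    go srv.ListenAndServe()

    // Give server time to start
    time.Sleep(100 * time.Millisecond)

    // Start a request and report its outcome from the client side.
    // The channel is buffered so the goroutine never blocks on send.
    requestComplete := make(chan error, 1)
    go func() {
        resp, err := http.Get("http://localhost:8081/test")
        if err == nil {
            resp.Body.Close()
            if resp.StatusCode != http.StatusOK {
                err = fmt.Errorf("unexpected status: %s", resp.Status)
            }
        }
        requestComplete <- err
    }()

    // Trigger shutdown while request is in flight
    time.Sleep(500 * time.Millisecond)
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    shutdownComplete := make(chan error, 1)
    go func() {
        shutdownComplete <- srv.Shutdown(ctx)
    }()

    // Verify the in-flight request still gets its response
    select {
    case err := <-requestComplete:
        if err != nil {
            t.Fatalf("Request failed: %v", err)
        }
        t.Log("Request completed successfully")
    case <-time.After(3 * time.Second):
        t.Fatal("Request did not complete")
    }

    // Verify Shutdown returns without hitting its deadline
    select {
    case err := <-shutdownComplete:
        if err != nil {
            t.Fatalf("Shutdown returned error: %v", err)
        }
        t.Log("Shutdown completed successfully")
    case <-time.After(6 * time.Second):
        t.Fatal("Shutdown did not complete")
    }
}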

This test verifies that a request already in flight completes while shutdown is under way and that shutdown itself finishes within its timeout, which is the core promise of graceful shutdown.

Graceful shutdown isn’t optional for production systems. It’s the difference between deployments that occasionally corrupt data and ones that don’t. Implement it once, correctly, and every deployment becomes safer.