Kubernetes StatefulSets: Stateful Application Deployment

Key Insights

  • StatefulSets provide stable network identities and persistent storage guarantees that standard Deployments cannot offer, making them essential for databases, message queues, and distributed systems that require predictable pod names and stable storage.
  • Each pod in a StatefulSet gets its own PersistentVolumeClaim through volumeClaimTemplates, and these claims persist even when pods are deleted, ensuring data survives pod rescheduling and failures.
  • Ordered deployment and scaling operations in StatefulSets enable safe bootstrapping of clustered applications where initialization order matters, though this comes with slower rollout times compared to Deployments.

Introduction to StatefulSets vs Deployments

Kubernetes Deployments work brilliantly for stateless applications where any pod is interchangeable. But the moment you need to run databases, message queues, or distributed systems with leader election, Deployments fall apart. Why? Because stateful applications require three guarantees that Deployments don’t provide: stable network identities, persistent storage that follows the pod, and ordered deployment with predictable naming.

StatefulSets solve these problems by treating pods as unique, persistent entities rather than interchangeable replicas. Each pod gets a stable hostname, its own dedicated storage, and a predictable ordinal index. When a pod fails and gets rescheduled, it comes back with the same identity and reconnects to the same storage volume. This makes StatefulSets the correct primitive for any workload where pod identity matters.

The tradeoff is complexity and slower operations. While a Deployment can create ten new pods simultaneously, a StatefulSet scales one pod at a time by default (podManagementPolicy: Parallel relaxes this). Updates are more cautious, and you need to understand DNS, headless services, and PersistentVolumeClaim lifecycle management.

StatefulSet Fundamentals

StatefulSets introduce several core concepts that differentiate them from Deployments. First, pod naming follows a predictable pattern: <statefulset-name>-<ordinal>. If you create a StatefulSet named postgres with 3 replicas, you get pods named postgres-0, postgres-1, and postgres-2. These names are stable across rescheduling.
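
Startup scripts routinely exploit this stability by deriving the pod's role from its ordinal suffix. A minimal POSIX-sh sketch of that pattern (POD_NAME is hard-coded here for illustration; in a real pod it would come from $(hostname) or the downward API):

```shell
# Sketch: derive this pod's ordinal from its name (pattern: <name>-<ordinal>).
# POD_NAME is an assumed value for this example, not read from a real cluster.
POD_NAME="postgres-2"          # in a real pod: POD_NAME=$(hostname)
ORDINAL="${POD_NAME##*-}"      # strip everything up to the last '-'
echo "ordinal=$ORDINAL"
```

Clustered applications use exactly this trick to decide, for example, that ordinal 0 bootstraps as the leader.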

Second, StatefulSets require a headless service (a service with clusterIP: None) to provide network identity. This service creates DNS records for each pod, allowing direct addressing of individual pods rather than load-balancing across all replicas.

Third, volumeClaimTemplates automatically create PersistentVolumeClaims for each pod. Unlike Deployments where you’d manually create PVCs and somehow map them to pods, StatefulSets handle this automatically.

Here’s a basic StatefulSet manifest:

apiVersion: v1
kind: Service
metadata:
  name: nginx-headless
spec:
  clusterIP: None
  selector:
    app: nginx
  ports:
  - port: 80
    name: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  serviceName: nginx-headless
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

This creates three nginx pods with stable names and dedicated storage. The headless service enables DNS-based discovery of individual pods.

Persistent Storage with StatefulSets

The volumeClaimTemplates section is where StatefulSets shine. When you define a volumeClaimTemplate, Kubernetes automatically creates a PVC for each pod. If you have 3 replicas, you get 3 PVCs named data-nginx-0, data-nginx-1, and data-nginx-2.

Critically, these PVCs persist even when you delete the StatefulSet. This is intentional—your data outlives the application. If you delete the StatefulSet and recreate it, the pods reconnect to their existing PVCs. To actually delete the data, you must manually delete the PVCs.
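
If you want Kubernetes to clean up these PVCs for you, newer releases (beta since v1.27, stable in v1.32) support a persistentVolumeClaimRetentionPolicy on the StatefulSet spec. A sketch of the relevant fragment:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  # Both fields default to Retain, matching the behavior described above.
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete each pod's PVC when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs for pods removed by a scale-down
  # ... serviceName, replicas, selector, template, volumeClaimTemplates as before
```

Retaining on scale-down is usually what you want: if you scale back up, the returning pod reattaches to its old data.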

Here’s a more sophisticated example using different storage classes and multiple volumes:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb-headless
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo:6.0
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: data
          mountPath: /data/db
        - name: config
          mountPath: /data/configdb
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          value: admin
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongodb-secret
              key: password
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
  - metadata:
      name: config
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 1Gi

This creates two PVCs per pod—one for data using high-performance SSD storage, another for configuration using standard storage. The storage class determines the underlying provisioner and performance characteristics.
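
Note that the fast-ssd and standard classes must already exist in the cluster. A minimal sketch of such a class, assuming the AWS EBS CSI driver (the provisioner and parameters are platform-specific assumptions):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com    # assumption: AWS EBS CSI driver; use your platform's
parameters:
  type: gp3                     # gp3 = general-purpose SSD on AWS
allowVolumeExpansion: true      # allows growing PVCs created from this class later
```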

Deploying a Stateful Application (Database Example)

Let’s deploy a PostgreSQL cluster with primary and replica roles. This demonstrates real-world StatefulSet usage including init containers, configuration management, and proper health checks.

First, the headless service and ConfigMap:

apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
    name: postgres
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
data:
  primary.conf: |
    wal_level = replica
    max_wal_senders = 3
    max_replication_slots = 3
    hot_standby = on    
  replica.conf: |
    hot_standby = on
    primary_conninfo = 'host=postgres-0.postgres-headless port=5432 user=replicator'    

Now the StatefulSet with init containers:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      initContainers:
      - name: init-postgres
        image: postgres:15
        command:
        - bash
        - -c
        - |
          # Choose a role from the pod's stable ordinal: postgres-0 is the primary.
          if [ "$POD_NAME" = "postgres-0" ]; then
            echo "Initializing primary"
            cp /config/primary.conf /var/lib/postgresql/data/postgresql.conf
          else
            echo "Initializing replica"
            cp /config/replica.conf /var/lib/postgresql/data/postgresql.conf
          fi
          # Simplified: because PGDATA points at the pgdata subdirectory, a real
          # setup must move this file into $PGDATA after initdb (or pass it with
          # -c config_file=...) for postgres to actually read it.
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        - name: config
          mountPath: /config
      containers:
      - name: postgres
        image: postgres:15
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        livenessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - postgres
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - pg_isready
            - -U
            - postgres
          initialDelaySeconds: 5
          periodSeconds: 5
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi

The init container uses the pod ordinal to assign roles: postgres-0 becomes the primary and the remaining pods become replicas. This is the core pattern, but the example is deliberately simplified: a production replication setup would also bootstrap each replica from the primary with pg_basebackup and ensure the rendered configuration ends up where postgres actually reads it (inside PGDATA, or passed via -c config_file=...).
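
The manifest above also references a postgres-secret that must exist before the pods start. A minimal sketch (the password value is a placeholder):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
type: Opaque
stringData:
  password: change-me   # placeholder; use a real secret manager in production
```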

Scaling and Updates

Scaling StatefulSets is ordered and sequential. When scaling up from 3 to 5 replicas, Kubernetes creates postgres-3, waits for it to be ready, then creates postgres-4. When scaling down, it deletes in reverse order—postgres-4 first, then postgres-3.

# Scale up
kubectl scale statefulset postgres --replicas=5

# Scale down (deletes postgres-4, then postgres-3)
kubectl scale statefulset postgres --replicas=3

For updates, StatefulSets support two strategies: RollingUpdate (default) and OnDelete. RollingUpdate automatically updates pods in reverse ordinal order. OnDelete requires manual pod deletion to trigger updates.

The partition parameter enables canary deployments:

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2

With partition: 2, only pods with ordinal >= 2 get updated. Pods 0 and 1 stay on the old version. This lets you test updates on higher-ordinal pods before rolling out to the entire StatefulSet.

# Update the image
kubectl set image statefulset/postgres postgres=postgres:16

# Watch the rollout (only postgres-2 updates with partition: 2)
kubectl rollout status statefulset/postgres

# If successful, remove partition to update all pods
kubectl patch statefulset postgres -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":0}}}}'

Service Discovery and Networking

The headless service creates DNS A records for each pod following the pattern: <pod-name>.<service-name>.<namespace>.svc.cluster.local. For our postgres example:

  • postgres-0.postgres-headless.default.svc.cluster.local
  • postgres-1.postgres-headless.default.svc.cluster.local
  • postgres-2.postgres-headless.default.svc.cluster.local

This enables applications to connect to specific pods, which is essential for leader election or when clients need to distinguish between primary and replica databases.
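
For example, an application's configuration might pin writes to the primary's stable name while sending reads to the headless service. A hypothetical ConfigMap sketch (the name and keys are illustrative, not a standard convention):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-db-endpoints    # hypothetical consumer of the postgres StatefulSet
data:
  # Writes go to the primary's stable per-pod DNS name.
  write_host: postgres-0.postgres-headless.default.svc.cluster.local
  # Reads can use the headless service, which resolves to all pod IPs.
  read_host: postgres-headless.default.svc.cluster.local
```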

Here’s a demo application showing DNS resolution:

apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  containers:
  - name: busybox
    image: busybox:1.35
    command:
    - sleep
    - "3600"

Run the lookups against it:

# Test DNS resolution for one pod, then for the headless service
kubectl exec -it dns-test -- nslookup postgres-0.postgres-headless
kubectl exec -it dns-test -- nslookup postgres-headless

The individual pod DNS records return the pod’s IP, while the headless service returns all pod IPs. Applications can use this for service discovery without hardcoding IPs.
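
Because the names are purely convention-driven, a client can enumerate its peers without querying DNS at all. A POSIX-sh sketch (the four variables are assumptions; in-cluster they would come from the environment or downward API):

```shell
# Build the stable DNS name of every peer from the StatefulSet naming convention.
STATEFULSET="postgres"
SERVICE="postgres-headless"
NAMESPACE="default"
REPLICAS=3

PEERS=""
i=0
while [ "$i" -lt "$REPLICAS" ]; do
  PEER="${STATEFULSET}-${i}.${SERVICE}.${NAMESPACE}.svc.cluster.local"
  PEERS="${PEERS}${PEERS:+,}${PEER}"   # append, comma-separated
  i=$((i + 1))
done
echo "$PEERS"
```

This is how clustered systems like etcd or MongoDB typically assemble their initial member lists.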

Best Practices and Troubleshooting

Always configure proper health checks. Readiness probes prevent traffic to pods that aren’t ready to serve requests, while liveness probes restart unhealthy pods:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

For troubleshooting, these kubectl commands are invaluable:

# Check StatefulSet status
kubectl get statefulset postgres
kubectl describe statefulset postgres

# Check individual pods
kubectl get pods -l app=postgres
kubectl logs postgres-0
kubectl logs postgres-0 --previous  # Previous container logs if crashed

# Check PVCs
kubectl get pvc
kubectl describe pvc data-postgres-0

# Debug pod networking (the postgres image ships getent but not nslookup/ping)
kubectl exec -it postgres-0 -- getent hosts postgres-headless
kubectl exec -it postgres-0 -- getent hosts postgres-1.postgres-headless

For backups, implement a sidecar container that periodically snapshots data to object storage. Never rely solely on PersistentVolumes—they can fail. Use VolumeSnapshots or application-level backups.
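
As a sketch of the VolumeSnapshot route, assuming the cluster runs a CSI driver with snapshot support and has a VolumeSnapshotClass named csi-snapclass (both are cluster-specific assumptions):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-postgres-0-snap
spec:
  volumeSnapshotClassName: csi-snapclass       # assumption: an existing snapshot class
  source:
    persistentVolumeClaimName: data-postgres-0  # snapshot one pod's data volume
```

Remember that a storage-level snapshot of a running database is only crash-consistent; application-level backups remain the safer baseline.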

Monitor PVC usage to prevent pods from running out of disk space, and alert when volumes reach 80% capacity. Growing storage later is awkward: volumeClaimTemplates cannot be changed in place, so you must expand each pod's PVC individually, and that only works when the StorageClass sets allowVolumeExpansion: true. Plan capacity with headroom up front.

Finally, understand the deletion cascade behavior. Deleting a StatefulSet with kubectl delete statefulset postgres removes the pods but preserves PVCs. To delete everything including data, you must explicitly delete the PVCs afterward. This safety mechanism prevents accidental data loss but can lead to orphaned volumes if you’re not careful about cleanup.
