Kubernetes Deployments: Rolling Updates and Rollbacks

Key Insights

  • Kubernetes rolling updates enable zero-downtime deployments by gradually replacing old pods with new ones, controlled by maxSurge and maxUnavailable parameters that determine update speed and availability guarantees.
  • Every deployment change creates a new ReplicaSet and stores revision history, allowing instant rollbacks to any previous version using kubectl rollout undo when issues arise in production.
  • Proper health checks (readiness and liveness probes) are critical for safe rolling updates: without them, Kubernetes routes traffic to broken pods and reports failed deployments as successful.

Introduction to Kubernetes Deployments

Kubernetes Deployments are the standard way to manage stateless applications in production. They provide declarative updates for Pods and ReplicaSets, handling the complexity of rolling out changes while maintaining application availability. Unlike managing Pods directly, Deployments give you version control, automated rollouts, and the ability to rollback when things go wrong.

The rolling update strategy is what makes Deployments powerful for production systems. Instead of taking down your entire application to deploy new code, Kubernetes incrementally replaces old pods with new ones. This means your users experience zero downtime during deployments.

Here’s a basic Deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80

This creates three replicas of an nginx container. When you update the image tag, Kubernetes automatically performs a rolling update.

Understanding Rolling Update Strategy

Rolling updates work by creating a new ReplicaSet with the updated pod template while gradually scaling down the old ReplicaSet. This process is controlled by two critical parameters:

  • maxSurge: Maximum number of pods that can be created above the desired replica count during updates
  • maxUnavailable: Maximum number of pods that can be unavailable during updates

These parameters can be absolute numbers or percentages. The defaults are 25% for both, which balances update speed against resource usage.
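One subtlety worth knowing: when percentages are used, Kubernetes rounds maxSurge up and maxUnavailable down when converting to pod counts. A small sketch of that conversion (plain Python, for illustration only):

```python
import math

def resolve_pcts(replicas, max_surge_pct, max_unavailable_pct):
    """Convert percentage-based settings to absolute pod counts the way the
    Deployment controller does: maxSurge rounds up, maxUnavailable down."""
    max_surge = math.ceil(replicas * max_surge_pct / 100)
    max_unavailable = math.floor(replicas * max_unavailable_pct / 100)
    return max_surge, max_unavailable

# The 25%/25% defaults on a 10-replica Deployment:
print(resolve_pcts(10, 25, 25))  # (3, 2)
# Rounding guarantees progress even on tiny Deployments:
print(resolve_pcts(3, 25, 25))   # (1, 0)
```

The asymmetric rounding is deliberate: it ensures that even a small Deployment can always surge at least one pod, so the rollout never deadlocks.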

Here’s how to configure rolling update behavior:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: app
        image: myapp:v2

With this configuration and 10 replicas, Kubernetes will:

  1. Create up to 2 new pods (maxSurge)
  2. Allow only 1 pod to be unavailable at a time
  3. Ensure at least 9 pods are always available (10 - maxUnavailable)
  4. Never exceed 12 total pods (10 + maxSurge)
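The step-by-step arithmetic above can be simulated in a few lines of Python. This is an idealized sketch that assumes every new pod passes its readiness check instantly; the real controller waits for readiness before scaling the old set down:

```python
def rolling_update(replicas=10, max_surge=2, max_unavailable=1):
    """Idealized sketch of the pod-count invariants during a rolling update.

    Assumes each new pod becomes ready immediately, so one loop iteration
    is one scale-up/scale-down round of the Deployment controller."""
    old, new = replicas, 0  # pods in the old vs. new ReplicaSet
    steps = []
    while old > 0:
        # Scale the new ReplicaSet up to the surge ceiling (replicas + maxSurge).
        new = min(replicas, new + (replicas + max_surge - (old + new)))
        # Scale the old ReplicaSet down to the availability floor
        # (replicas - maxUnavailable).
        old = max(0, old - ((old + new) - (replicas - max_unavailable)))
        steps.append((old, new))
        assert replicas - max_unavailable <= old + new <= replicas + max_surge
    return steps

# Four rounds take 10 replicas from old to new without dropping below 9 pods:
print(rolling_update())  # [(7, 2), (4, 5), (1, 8), (0, 10)]
```

Note how the total pod count stays pinned between 9 and 12 the whole way through, exactly the guarantee the two parameters encode.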

To trigger an update, change the image tag:

kubectl set image deployment/web-app app=myapp:v3

# Monitor the rollout
kubectl rollout status deployment/web-app

The update process flows like this: Kubernetes creates new pods, waits for them to pass readiness checks, starts routing traffic to them, then terminates old pods. This cycle repeats until all pods run the new version.

Configuring Update Strategies

Kubernetes supports two deployment strategies: RollingUpdate and Recreate. The Recreate strategy terminates all old pods before creating new ones—simple but causes downtime. Use it only for development environments or when your application can’t run multiple versions simultaneously.

For production, stick with RollingUpdate but configure it properly with health checks:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 5
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
      - name: api
        image: api-server:v1.5.0
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10

The readinessProbe is crucial—it tells Kubernetes when a pod is ready to receive traffic. Without it, new pods get traffic immediately, potentially before your application has finished initializing. The livenessProbe detects crashed or deadlocked containers.
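A useful back-of-envelope check: how long can a pod that breaks mid-flight keep receiving traffic before failed readiness probes take it out of rotation? Roughly failureThreshold consecutive probe failures, each of which can hang for up to the probe timeout. This is a hypothetical helper, and real-world timing also depends on endpoint propagation:

```python
def worst_case_unready_seconds(period_s, failure_threshold, timeout_s=1):
    """Rough upper bound on how long a newly broken pod keeps receiving
    traffic: the readiness probe must fail failure_threshold times in a
    row, and each failing probe can hang for up to timeout_s first."""
    return failure_threshold * (period_s + timeout_s)

# With the manifest above (periodSeconds: 5, failureThreshold: 3) and the
# default 1-second probe timeout, a broken pod can serve traffic for ~18s:
print(worst_case_unready_seconds(5, 3))  # 18
```

Tighter periods detect failures faster but add probe load on your application; tune both together rather than cranking one to an extreme.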

Setting maxUnavailable: 0 guarantees zero downtime but requires extra resources since you’ll have both old and new pods running simultaneously. The revisionHistoryLimit controls how many old ReplicaSets to keep for rollback purposes—10 is a reasonable default.

Monitoring Update Progress

Watching your rollout in real-time prevents surprises. The kubectl rollout status command blocks until the deployment completes or fails:

kubectl rollout status deployment/web-app
# Output: Waiting for deployment "web-app" rollout to finish: 2 out of 5 new replicas have been updated...

For detailed information, use kubectl describe:

kubectl describe deployment web-app

This shows deployment conditions, events, and the state of old and new ReplicaSets. Look for conditions like Progressing and Available. A healthy rollout shows:

Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable

If a rollout gets stuck, you’ll see it stop progressing. Common causes include:

  • Pods failing readiness checks
  • Insufficient cluster resources
  • Image pull errors
  • Invalid configuration

Check pod status to diagnose issues:

kubectl get pods -l app=web-app
kubectl logs <pod-name>
kubectl describe pod <pod-name>

Performing Rollbacks

Rollbacks are your safety net when deployments go wrong. Kubernetes stores revision history, so a rollback needs no image rebuild or new manifest: it simply scales up an old ReplicaSet and scales down the current one.

View rollout history:

kubectl rollout history deployment/web-app

# Output:
# REVISION  CHANGE-CAUSE
# 1         <none>
# 2         kubectl set image deployment/web-app app=myapp:v2
# 3         kubectl set image deployment/web-app app=myapp:v3

To record meaningful change descriptions, set the kubernetes.io/change-cause annotation (the --record flag also works, but it is deprecated):

kubectl annotate deployment/web-app kubernetes.io/change-cause="Update to v3 with bug fixes"

Rollback to the previous version:

kubectl rollout undo deployment/web-app

Or rollback to a specific revision:

kubectl rollout undo deployment/web-app --to-revision=2

Rollbacks start immediately and follow the same rolling update strategy, ensuring zero downtime. Monitor a rollback just like a regular deployment:

kubectl rollout status deployment/web-app

Best Practices and Common Pitfalls

Always set resource requests and limits. During rolling updates, you temporarily run more pods than your replica count. Without resource limits, updates can overwhelm your cluster:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Use PodDisruptionBudgets (PDBs) to prevent updates from breaking your application’s availability guarantees:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

This ensures at least 2 pods remain available during voluntary disruptions like updates or node drains.
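The budget arithmetic is simple enough to sketch (illustrative Python, not the eviction API itself):

```python
def allowed_disruptions(healthy_pods, min_available):
    """Voluntary evictions a minAvailable PDB permits right now: the
    eviction API only drains pods above the availability floor."""
    return max(0, healthy_pods - min_available)

# With minAvailable: 2, a node drain can evict one pod while 3 are healthy,
# but is blocked entirely once only 2 remain:
print(allowed_disruptions(3, 2))  # 1
print(allowed_disruptions(2, 2))  # 0
```

Note that PDBs only guard against voluntary disruptions (drains, evictions); they do nothing about node crashes or OOM kills.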

Test updates in staging first. Your staging environment should mirror production. Deploy there, run integration tests, then promote to production.

Never skip health checks. This is the most common mistake. Without readiness probes, Kubernetes can’t distinguish between healthy and broken pods:

# DON'T DO THIS - no health checks
containers:
- name: app
  image: myapp:latest  # Also don't use 'latest'
  
# DO THIS - proper health checks and versioned images
containers:
- name: app
  image: myapp:v1.2.3
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 3
  livenessProbe:
    httpGet:
      path: /health
      port: 8080
    initialDelaySeconds: 15
    periodSeconds: 10

Here’s a production-ready Deployment with all best practices:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-app
  annotations:
    kubernetes.io/change-cause: "Initial deployment v1.0.0"
spec:
  replicas: 5
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: production-app
  template:
    metadata:
      labels:
        app: production-app
        version: v1.0.0
    spec:
      containers:
      - name: app
        image: myregistry/production-app:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        env:
        - name: LOG_LEVEL
          value: "info"

Conclusion

Rolling updates are Kubernetes’ answer to zero-downtime deployments. By understanding maxSurge and maxUnavailable, you control the speed and safety of your deployments. Proper health checks ensure Kubernetes only routes traffic to healthy pods. Revision history gives you instant rollbacks when problems occur.

For most applications, rolling updates are the right choice. However, if you need more control or want to test new versions with real traffic before fully committing, consider blue-green or canary deployment strategies using tools like Flagger or Argo Rollouts.

The key is starting with solid fundamentals: version your images, define health checks, set resource limits, and test in staging. Master these basics, and you’ll deploy confidently to production every time.
