Apache Spark - Spark on Kubernetes Tutorial
Key Insights
- Spark’s native Kubernetes integration eliminates the need for a separate cluster manager such as YARN or Mesos, letting you run Spark workloads on the same infrastructure as your other containerized applications.
- Building custom Spark images and properly configuring RBAC is essential—most production issues stem from permission problems or missing dependencies in container images.
- Dynamic allocation on Kubernetes requires careful configuration of shuffle tracking and pod templates, but gets you elastic scaling that actually works.
Introduction & Prerequisites
Kubernetes has become the dominant deployment platform for Spark workloads, and for good reason. Running Spark on Kubernetes gives you resource efficiency through bin-packing, simplified infrastructure management, and the ability to share compute resources with other applications. No more maintaining separate YARN or Mesos clusters.
Since Spark 2.3, Kubernetes has been a first-class cluster manager. Spark 3.x made it production-ready with features like dynamic allocation, pod templates, and improved shuffle handling. If you’re still running Spark on YARN in 2024, you’re likely paying for infrastructure complexity you don’t need.
Before starting, ensure you have:
- A Kubernetes cluster (1.22+) with at least 8 CPU cores and 16GB RAM available
- `kubectl` configured with cluster access
- Apache Spark 3.4+ distribution downloaded locally
- Docker installed for building images
- A container registry you can push to (Docker Hub, ECR, GCR, etc.)
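A quick way to verify the toolchain before diving in (this assumes `kubectl` and `docker` are already on your PATH):

```shell
# Sanity-check the environment before starting
MIN_K8S_MINOR=22                      # this tutorial assumes Kubernetes 1.22+
kubectl version -o json               # compare serverVersion.minor against ${MIN_K8S_MINOR}
kubectl get nodes                     # confirm total capacity: 8+ cores, 16GB+ RAM
docker version                        # needed for building images
echo "SPARK_HOME=${SPARK_HOME:-<not set>}"
```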
Architecture Overview
Spark on Kubernetes follows a straightforward model. When you submit an application, Spark creates a driver pod that runs your main program. The driver then requests executor pods from the Kubernetes API server, which schedules them across available nodes.
In cluster mode, the driver runs inside the cluster as a pod. This is what you want for production jobs—the driver survives network disconnections and can be managed by Kubernetes like any other workload.
In client mode, the driver runs on the machine where you execute spark-submit. Use this for interactive development and debugging, but never for production.
The lifecycle works like this:
- `spark-submit` creates the driver pod via the Kubernetes API
- Driver pod starts and initializes the SparkContext
- Driver requests executor pods based on your configuration
- Kubernetes schedules executor pods across nodes
- Executors register with the driver and begin processing
- When the job completes, executor pods terminate
- Driver pod enters “Completed” state (or gets cleaned up)
Executor pods are ephemeral. They’re created when needed and destroyed when the job finishes or when dynamic allocation scales down.
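You can watch this lifecycle play out in real time with `kubectl` (assuming the `spark-jobs` namespace created later in this tutorial; the `spark-role` labels are applied by Spark itself):

```shell
NS=spark-jobs
# Watch driver and executor pods appear, run, and terminate
kubectl get pods -n "${NS}" -w
# Or filter by role once a job is running
kubectl get pods -n "${NS}" -l spark-role=driver
kubectl get pods -n "${NS}" -l spark-role=executor
```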
Building Spark Docker Images
Spark ships with a script to build container images, but you’ll almost certainly need customization. The base images lack common dependencies, and you’ll want to include your application JARs.
Start with the built-in tool:
cd $SPARK_HOME
./bin/docker-image-tool.sh \
-r your-registry.com/spark \
-t 3.4.1 \
-p ./kubernetes/dockerfiles/spark/Dockerfile \
build
./bin/docker-image-tool.sh \
-r your-registry.com/spark \
-t 3.4.1 \
push
For production, create a custom Dockerfile that extends the base image:
FROM your-registry.com/spark/spark:3.4.1
USER root
# Install additional Python packages for PySpark
RUN pip3 install --no-cache-dir \
pandas==2.0.3 \
pyarrow==12.0.1 \
numpy==1.24.3
# Add custom JARs (Delta Lake, connectors, etc.)
COPY --chown=spark:spark jars/delta-core_2.12-2.4.0.jar /opt/spark/jars/
COPY --chown=spark:spark jars/aws-java-sdk-bundle-1.12.262.jar /opt/spark/jars/
COPY --chown=spark:spark jars/hadoop-aws-3.3.4.jar /opt/spark/jars/
# Add your application code
COPY --chown=spark:spark app/ /opt/spark/app/
USER spark
Build and push your custom image:
docker build -t your-registry.com/spark/spark-custom:3.4.1 .
docker push your-registry.com/spark/spark-custom:3.4.1
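Before pushing, it can be worth checking that the extra dependencies actually made it into the image (a quick local sanity check; adjust the image name and tag to match yours):

```shell
IMAGE=your-registry.com/spark/spark-custom:3.4.1
# List the custom JARs and confirm the pinned Python packages are present
docker run --rm --entrypoint /bin/bash "${IMAGE}" -c \
  'ls /opt/spark/jars | grep -E "delta|hadoop-aws|aws-java-sdk"; pip3 show pandas pyarrow numpy'
```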
Configuring Kubernetes for Spark
Spark needs permissions to create and manage pods. The most common production issue I see is permission errors because someone skipped RBAC setup.
Create a dedicated namespace and service account:
# spark-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: spark-jobs
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: spark
namespace: spark-jobs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: spark-role
namespace: spark-jobs
rules:
- apiGroups: [""]
resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
verbs: ["create", "get", "list", "watch", "delete", "patch"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: spark-role-binding
namespace: spark-jobs
subjects:
- kind: ServiceAccount
name: spark
namespace: spark-jobs
roleRef:
kind: Role
name: spark-role
apiGroup: rbac.authorization.k8s.io
Apply the configuration:
kubectl apply -f spark-namespace.yaml
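Before submitting anything, you can confirm the binding actually grants what Spark needs, using `kubectl`'s built-in impersonation check:

```shell
NS=spark-jobs
SA="system:serviceaccount:${NS}:spark"
# Each of these should print "yes" if the Role and RoleBinding are correct
kubectl auth can-i create pods -n "${NS}" --as "${SA}"
kubectl auth can-i watch pods -n "${NS}" --as "${SA}"
kubectl auth can-i get pods/log -n "${NS}" --as "${SA}"
```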
For production clusters, add resource quotas to prevent runaway jobs from consuming all resources:
# spark-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
name: spark-quota
namespace: spark-jobs
spec:
hard:
requests.cpu: "32"
requests.memory: 64Gi
limits.cpu: "64"
limits.memory: 128Gi
pods: "50"
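Apply the quota and check consumption against the limits at any time:

```shell
NS=spark-jobs
kubectl apply -f spark-quota.yaml
# Shows current usage vs. the hard limits for the namespace
kubectl describe resourcequota spark-quota -n "${NS}"
```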
Submitting Spark Applications
With infrastructure ready, submit your first application. Here’s a complete example:
$SPARK_HOME/bin/spark-submit \
--master k8s://https://your-k8s-api-server:6443 \
--deploy-mode cluster \
--name my-spark-job \
--class com.example.MySparkApp \
--conf spark.kubernetes.namespace=spark-jobs \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=your-registry.com/spark/spark-custom:3.4.1 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.driver.cores=2 \
--conf spark.driver.memory=4g \
--conf spark.executor.instances=3 \
--conf spark.executor.cores=2 \
--conf spark.executor.memory=4g \
--conf spark.kubernetes.driver.request.cores=2 \
--conf spark.kubernetes.executor.request.cores=2 \
local:///opt/spark/app/my-app.jar
Key points:
- The master URL uses the `k8s://` prefix followed by your API server address
- Use `local://` for JARs baked into the container image
- Set both the Spark memory setting (`spark.executor.memory`) and the Kubernetes resource requests
- The `request.cores` settings control Kubernetes scheduling
For PySpark applications:
$SPARK_HOME/bin/spark-submit \
--master k8s://https://your-k8s-api-server:6443 \
--deploy-mode cluster \
--name pyspark-job \
--conf spark.kubernetes.namespace=spark-jobs \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=your-registry.com/spark/spark-custom:3.4.1 \
--conf spark.driver.memory=4g \
--conf spark.executor.memory=4g \
--conf spark.executor.instances=3 \
local:///opt/spark/app/my_job.py
Advanced Configuration & Tuning
Dynamic allocation lets Spark scale executors based on workload. On Kubernetes, this requires shuffle tracking since there’s no external shuffle service by default:
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.enabled=true \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.maxExecutors=10 \
--conf spark.dynamicAllocation.executorIdleTimeout=60s \
--conf spark.dynamicAllocation.schedulerBacklogTimeout=10s
For jobs with large shuffles or checkpointing, mount persistent volumes:
# spark-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: spark-shuffle-pvc
namespace: spark-jobs
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: fast-ssd
Reference the PVC in your submit command:
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.shuffle-vol.mount.path=/data/shuffle \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.shuffle-vol.options.claimName=spark-shuffle-pvc \
--conf spark.local.dir=/data/shuffle
Use node selectors to target specific node pools:
--conf spark.kubernetes.node.selector.workload-type=spark \
--conf spark.kubernetes.driver.node.selector.workload-type=spark
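For those selectors to match, the target nodes need the corresponding label (the node name below is illustrative; list yours with `kubectl get nodes`):

```shell
NODE=worker-node-1              # hypothetical node name
kubectl label nodes "${NODE}" workload-type=spark
# Verify which nodes Spark pods can now land on
kubectl get nodes -l workload-type=spark
```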
Monitoring & Troubleshooting
Access the Spark UI via port-forwarding:
# Find the driver pod
kubectl get pods -n spark-jobs -l spark-role=driver
# Forward the UI port
kubectl port-forward -n spark-jobs spark-job-driver 4040:4040
Then open http://localhost:4040 in your browser.
Check driver logs when jobs fail:
# Stream logs from running driver
kubectl logs -f -n spark-jobs spark-job-driver
# Get logs if the driver container restarted (a completed driver's logs need only plain kubectl logs)
kubectl logs -n spark-jobs spark-job-driver --previous
Inspect executor issues:
# List all pods for a specific job (spark-app-selector is set to the Spark application ID)
kubectl get pods -n spark-jobs -l spark-app-selector=<app-id>
# Check why an executor failed
kubectl describe pod -n spark-jobs spark-job-exec-1
# Get executor logs
kubectl logs -n spark-jobs spark-job-exec-1
Common failures and fixes:
ImagePullBackOff: Your container registry requires authentication. Create an image pull secret and reference it with spark.kubernetes.container.image.pullSecrets.
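A minimal fix, assuming a standard Docker-registry credential (the secret name `regcred` and the environment variables are illustrative):

```shell
NS=spark-jobs
SECRET=regcred                   # illustrative secret name
kubectl create secret docker-registry "${SECRET}" -n "${NS}" \
  --docker-server=your-registry.com \
  --docker-username="${REGISTRY_USER}" \
  --docker-password="${REGISTRY_PASS}"
# Then reference it at submit time:
# --conf spark.kubernetes.container.image.pullSecrets=regcred
```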
OOMKilled: Executor memory settings don’t account for off-heap usage. Raise the overhead factor from its 0.1 default, e.g. spark.kubernetes.memoryOverheadFactor=0.4 for memory-intensive jobs.
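To see why the factor matters, here is the back-of-the-envelope arithmetic behind the executor pod's memory request (a sketch: pod memory = executor memory + max(factor × memory, 384 MiB)):

```shell
EXEC_MEM_MB=4096                  # spark.executor.memory=4g
FACTOR_PCT=40                     # overhead factor 0.4, expressed as a percentage
OVERHEAD_MB=$(( EXEC_MEM_MB * FACTOR_PCT / 100 ))
if [ "${OVERHEAD_MB}" -lt 384 ]; then OVERHEAD_MB=384; fi   # Spark enforces a 384 MiB floor
POD_MEM_MB=$(( EXEC_MEM_MB + OVERHEAD_MB ))
echo "executor pod memory request: ${POD_MEM_MB} MiB"
```

With 4g executors and a 0.4 factor, each executor pod requests roughly 5.6 GiB, which is what Kubernetes uses for scheduling and OOM decisions, not the bare 4g.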
Pending pods: Insufficient cluster resources or node selector constraints too restrictive. Check kubectl describe pod for scheduling failures.
Permission denied: RBAC misconfiguration. Verify the service account has the role binding and the namespace is correct.
Running Spark on Kubernetes simplifies operations significantly once you get past initial setup. The investment in proper RBAC, custom images, and monitoring pays off quickly when you’re managing dozens of jobs across a shared cluster.