Apache Spark - Cluster Manager Types (Standalone, YARN, Mesos, K8s)

Key Insights

  • Match your cluster manager to your infrastructure: Standalone for dedicated Spark clusters, YARN for Hadoop shops, Kubernetes for cloud-native environments—don’t fight your existing ecosystem.
  • Kubernetes has won the modern deployment battle: Unless you’re deeply invested in Hadoop, K8s offers the best combination of flexibility, resource efficiency, and operational tooling.
  • Mesos is dead for Spark: deprecated since Spark 3.2 and removed in Spark 4.0. Don’t start new projects on it, and plan migrations for existing ones.

Introduction to Spark Cluster Management

Every Spark application needs somewhere to run. The cluster manager is the component that negotiates resources—CPU cores, memory, executors—between your Spark driver and the underlying cluster infrastructure. It decides which machines run your tasks, handles failures, and manages the lifecycle of your distributed computation.

The choice of cluster manager isn’t just a deployment detail. It affects how you package applications, how resources get shared across workloads, what monitoring tools you can use, and how your operations team manages the infrastructure. Choose poorly and you’ll fight your environment for years.

Spark supports four cluster managers: Standalone (built-in), YARN (Hadoop ecosystem), Mesos (deprecated), and Kubernetes (the modern standard). Each serves different organizational contexts and infrastructure realities.

Standalone Mode

Standalone mode is Spark’s native cluster manager. It ships with Spark, requires minimal configuration, and works out of the box. If you’re running a dedicated Spark cluster with no other frameworks competing for resources, Standalone is the fastest path to production.

The architecture is simple: one Master node coordinates resource allocation, multiple Worker nodes register themselves and report available resources. When you submit an application, the Master assigns executors across Workers based on your resource requests.

Start a standalone cluster with the provided scripts:

# On the master node
$SPARK_HOME/sbin/start-master.sh

# On each worker node
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077

Configure workers via spark-env.sh:

# conf/spark-env.sh
SPARK_WORKER_CORES=16
SPARK_WORKER_MEMORY=64g
SPARK_WORKER_INSTANCES=1
SPARK_MASTER_HOST=master-host

Submit applications against the Standalone cluster. Note that --num-executors applies only to YARN and Kubernetes; in Standalone mode you bound an application with --total-executor-cores (40 total cores at 4 per executor yields 10 executors):

spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --executor-memory 8g \
  --executor-cores 4 \
  --total-executor-cores 40 \
  my-spark-app.jar

Standalone scheduling works at two levels. Across applications, the Master grants cores FIFO; cap each application with spark.cores.max so a single job can’t monopolize the cluster. Within one application, Spark schedules concurrent jobs FIFO by default or FAIR via pools; for multi-tenant use of a shared SparkContext, configure fair scheduling in spark-defaults.conf:

spark.scheduler.mode=FAIR
spark.scheduler.allocation.file=/path/to/fairscheduler.xml
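
The allocation file referenced above defines the fair-scheduler pools. A minimal sketch (the pool names here are assumptions, not defaults):

```xml
<?xml version="1.0"?>
<!-- fairscheduler.xml: pool definitions for FAIR job scheduling -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>      <!-- twice the share of a weight-1 pool -->
    <minShare>8</minShare>  <!-- guaranteed minimum cores -->
  </pool>
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

Jobs opt into a pool at runtime with sc.setLocalProperty("spark.scheduler.pool", "production"); unassigned jobs land in the default pool.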

Limitations: Standalone lacks sophisticated resource sharing. If you need to run Spark alongside Flink, Presto, or other frameworks, you’ll need manual resource partitioning. High availability requires ZooKeeper configuration. There’s no container isolation—all executors share the host’s environment.
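
The ZooKeeper-based HA mentioned above is configured through daemon JVM options; a sketch, assuming a three-node ZooKeeper quorum (the zk hostnames are placeholders):

```sh
# conf/spark-env.sh on each Master candidate
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark-ha"
```

Run a Master on each candidate node and submit against all of them (--master spark://master1:7077,master2:7077); a standby takes over if the active Master dies, and workers and applications reconnect automatically.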

Use Standalone when you have dedicated hardware for Spark workloads and want minimal operational overhead.

YARN (Hadoop Integration)

If your organization runs Hadoop, YARN is the natural choice. Spark becomes another YARN application alongside MapReduce, Hive, and other Hadoop ecosystem tools. Resources get shared through YARN’s capacity or fair scheduler, and you leverage existing Hadoop security (Kerberos) and monitoring infrastructure.

YARN supports two deploy modes. In client mode, the driver runs on the machine where you submit the job—useful for interactive work. In cluster mode, the driver runs inside a YARN container—better for production pipelines.

# Client mode - driver runs locally
spark-submit \
  --master yarn \
  --deploy-mode client \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 20 \
  my-spark-app.jar

# Cluster mode - driver runs in YARN
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 20 \
  my-spark-app.jar

Configure YARN-specific settings in spark-defaults.conf:

spark.yarn.queue=production
spark.yarn.submit.waitAppCompletion=true
spark.yarn.am.memory=2g
spark.yarn.am.cores=2

Enable dynamic allocation to scale executors based on workload:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=5
spark.dynamicAllocation.maxExecutors=100
spark.dynamicAllocation.executorIdleTimeout=60s
spark.shuffle.service.enabled=true

Dynamic allocation requires the external shuffle service. Configure it in yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

YARN’s strength is resource governance. Use queues to isolate workloads and prevent runaway jobs from starving production pipelines. The downside: YARN adds operational complexity and ties you to the Hadoop ecosystem.
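
The queue isolation described above is defined on the YARN side, not in Spark. A minimal capacity-scheduler.xml sketch (queue names and percentages are assumptions for illustration):

```xml
<!-- capacity-scheduler.xml -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>production,adhoc</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.production.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
  <value>50</value>
</property>
```

Spark jobs then target a queue via spark.yarn.queue, as shown earlier; the maximum-capacity setting keeps ad-hoc work from bursting past half the cluster even when production is idle.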

Apache Mesos (Legacy Option)

Mesos was once a compelling option for running multiple frameworks on shared infrastructure. Spark supported two modes: coarse-grained (static executor allocation) and fine-grained (dynamic task-level allocation).

However, Mesos support is deprecated as of Spark 3.2 and removed in Spark 4.0. Don’t start new projects on Mesos.

For existing deployments requiring migration planning, here’s the basic configuration:

spark-submit \
  --master mesos://mesos-master:5050 \
  --deploy-mode cluster \
  --executor-memory 8g \
  --conf spark.mesos.coarse=true \
  my-spark-app.jar

If you’re currently running Spark on Mesos, plan your migration to Kubernetes. The architectural concepts translate reasonably well—both use container-based isolation and dynamic resource allocation.

Kubernetes (Modern Standard)

Kubernetes has become the dominant platform for running Spark in cloud and modern on-premises environments. Native K8s support (added in Spark 2.3, marked generally available in Spark 3.1) treats executors as pods, leveraging K8s for scheduling, networking, and container management.

Submit to a Kubernetes cluster:

spark-submit \
  --master k8s://https://k8s-api-server:6443 \
  --deploy-mode cluster \
  --name my-spark-job \
  --conf spark.kubernetes.container.image=my-registry/spark:3.5.0 \
  --conf spark.kubernetes.namespace=spark-jobs \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=10 \
  --conf spark.executor.memory=8g \
  --conf spark.executor.cores=4 \
  local:///opt/spark/jars/my-spark-app.jar

Key Kubernetes-specific configurations:

# Container configuration
spark.kubernetes.container.image=my-registry/spark:3.5.0
spark.kubernetes.container.image.pullPolicy=Always
spark.kubernetes.container.image.pullSecrets=my-registry-secret

# Resource requests and limits
spark.kubernetes.executor.request.cores=4
spark.kubernetes.executor.limit.cores=4
spark.kubernetes.memoryOverheadFactor=0.1

# Namespace and RBAC
spark.kubernetes.namespace=spark-jobs
spark.kubernetes.authenticate.driver.serviceAccountName=spark

# Node selection
spark.kubernetes.node.selector.workload-type=spark
spark.kubernetes.executor.podTemplateFile=/path/to/executor-template.yaml

Use pod templates for advanced configuration:

# executor-template.yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: spark-executor
      resources:
        requests:
          memory: "8Gi"
          cpu: "4"
        limits:
          memory: "10Gi"
          cpu: "4"
      volumeMounts:
        - name: spark-local-dir
          mountPath: /tmp/spark
  volumes:
    - name: spark-local-dir
      emptyDir:
        sizeLimit: 100Gi
  tolerations:
    - key: "spark-workload"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"

Enable dynamic allocation on Kubernetes:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=50

Kubernetes advantages: container isolation, namespace-based multi-tenancy, integration with cloud-native tooling (Prometheus, Grafana, service meshes), and portability across cloud providers.

Comparison Matrix & Selection Guide

| Factor              | Standalone | YARN      | Kubernetes   |
|---------------------|------------|-----------|--------------|
| Setup complexity    | Low        | Medium    | Medium-High  |
| Resource sharing    | Limited    | Excellent | Excellent    |
| Container isolation | None       | Limited   | Full         |
| Dynamic allocation  | Basic      | Mature    | Good         |
| Multi-tenancy       | Weak       | Strong    | Strong       |
| Cloud native        | No         | No        | Yes          |
| Ecosystem fit       | Spark-only | Hadoop    | Cloud/Modern |

Decision tree:

  1. Existing Hadoop cluster with YARN? → Use YARN
  2. Running Kubernetes already? → Use Kubernetes
  3. Dedicated Spark hardware, minimal ops overhead? → Use Standalone
  4. Greenfield cloud deployment? → Use Kubernetes
  5. Currently on Mesos? → Migrate to Kubernetes

Production Considerations

Monitoring: Standalone exposes metrics via the Spark UI and REST API. YARN integrates with Hadoop’s monitoring stack. Kubernetes works naturally with Prometheus—use the Spark metrics servlet or push to Pushgateway.
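
For the Prometheus route, Spark 3.0+ ships a native (though still experimental) servlet sink; a metrics.properties sketch using the documented sink class:

```
# conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
```

Setting spark.ui.prometheus.enabled=true additionally exposes executor metrics in Prometheus format on the driver UI, so a standard scrape config can pick everything up without a JMX exporter sidecar.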

Fault tolerance: All cluster managers restart failed executors. Driver failures differ by platform: YARN cluster mode can retry the whole application through ApplicationMaster attempts, Standalone cluster mode supports the --supervise flag to restart a failed driver, and on Kubernetes automatic driver restarts typically come from a controller such as the Spark Operator rather than from spark-submit itself.
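
On YARN, driver retries are governed by a pair of settings; a sketch for spark-defaults.conf (the values shown are illustrative):

```
spark.yarn.maxAppAttempts=2
spark.yarn.am.attemptFailuresValidityInterval=1h
```

The validity interval resets the attempt counter after an hour of stability, so a long-running job isn't killed by two failures days apart.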

Security: YARN inherits Hadoop’s Kerberos authentication. Kubernetes uses RBAC and service accounts. Standalone requires manual SSL/authentication configuration.
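
The service account used in the Kubernetes examples has to exist and hold permission to create and delete executor pods. A typical setup (namespace and binding name are assumptions carried over from the earlier examples):

```shell
# Create the namespace and service account the driver will run as
kubectl create namespace spark-jobs
kubectl create serviceaccount spark -n spark-jobs

# Grant it pod-management rights via the built-in "edit" cluster role
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=spark-jobs:spark
```

In locked-down clusters you would swap the broad edit role for a namespaced Role limited to pods, services, and configmaps.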

Migration paths: Moving from Standalone to YARN or K8s is straightforward—your Spark code doesn’t change, only deployment configuration. YARN to K8s migrations require more planning around storage (HDFS access from K8s pods) and security model changes.

The trend is clear: Kubernetes adoption continues accelerating. If you’re making infrastructure decisions today, invest in K8s expertise. Your Spark deployments—and everything else—will benefit.
