MLOps - Apache Spark Operator
By Bys on February 7, 2025
Install Spark (Official)
values.yaml
spark:
  jobNamespaces:
  - spark
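# Add the Kubeflow Spark Operator chart repository and refresh the local index
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
# Install the operator into the spark namespace using the values.yaml above
helm install spark-operator spark-operator/spark-operator -n spark -f /Users/bys/workspace/kubernetes/mlops/spark/values.yaml
# Uninstall the operator (only when cleaning up)
helm delete spark-operator -n spark
# Submit the upstream spark-pi example
kubectl apply -f https://raw.githubusercontent.com/kubeflow/spark-operator/refs/heads/master/examples/spark-pi.yaml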
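To confirm that the chart installed cleanly, the release, the CRD, and the operator pods can each be checked. The following is a minimal sketch, assuming the release name `spark-operator` and the namespace `spark` from the commands above. The helper name `verify_operator` is hypothetical, and the CRD name is assumed to be the one registered for the `sparkoperator.k8s.io` API group.

```shell
# Hypothetical helper: checks the Helm release, the CRD, and the operator pods.
# Arguments: release name and namespace (defaults match the install command).
verify_operator() {
  release="${1:-spark-operator}"
  namespace="${2:-spark}"

  # Release status as recorded by Helm.
  helm status "$release" -n "$namespace" || return 1

  # The SparkApplication CRD must exist before a manifest can be applied.
  kubectl get crd sparkapplications.sparkoperator.k8s.io || return 1

  # List the pods in the namespace; the operator pods should be Running.
  kubectl get pods -n "$namespace"
}
```

Calling `verify_operator` with no arguments uses the defaults; `verify_operator my-release my-namespace` checks a different installation.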
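Running Spark on Kubernetes (official documentation)
When the application completes, the executor pods terminate and are removed, but the driver pod stays in the completed state so that its logs remain available. The status can also be checked with the spark-submit CLI.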
Test
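Deploying a SparkApplication creates a driver pod. The driver pod plans the job and then asks the API server to create the executor pods. The executor pods are the units of distributed work; when they finish, they report back to the driver pod and terminate.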
When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.
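The lifecycle described above can be observed directly from the pod list. This is a minimal sketch, assuming the `spark-role` label that Spark on Kubernetes attaches to driver and executor pods, and assuming the driver pod is named `<app-name>-driver`. The helper names `pods_by_role` and `driver_logs` are hypothetical.

```shell
# Hypothetical helpers for inspecting the pods of one Spark application.

# List pods by role ("driver" or "executor") via the spark-role label.
pods_by_role() {
  namespace="$1"
  role="$2"
  kubectl get pods -n "$namespace" -l "spark-role=$role"
}

# Print the driver log; assumes the driver pod is named <app-name>-driver.
driver_logs() {
  namespace="$1"
  app="$2"
  kubectl logs -n "$namespace" "${app}-driver"
}
```

Once a job has finished, `pods_by_role spark executor` should come back empty, while `pods_by_role spark driver` should still list the completed driver. `driver_logs spark spark-pi` then prints the retained driver log.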
Pi calculation sample
SparkApplication
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.3
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  arguments:
  - "5000"
  sparkVersion: 3.5.3
  driver:
    labels:
      version: 3.5.3
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    labels:
      version: 3.5.3
    instances: 1
    cores: 1
    memory: 512m
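After applying a manifest like the one above, the operator records progress in the resource's status. The sketch below polls that status until the job ends. It assumes the state is exposed at `.status.applicationState.state` and treats only `COMPLETED` and `FAILED` as terminal; any other value is treated as still in progress. The helper names `app_state` and `wait_for_app` are hypothetical.

```shell
# Hypothetical helper: print the operator-reported state of a SparkApplication.
# Arguments: namespace, application name.
app_state() {
  kubectl get sparkapplication "$2" -n "$1" \
    -o jsonpath='{.status.applicationState.state}'
}

# Poll until the application reaches COMPLETED (returns 0) or FAILED (returns 1).
# Returns 2 if neither state is seen within the allowed number of polls.
# Arguments: namespace, application name, interval in seconds, maximum polls.
wait_for_app() {
  namespace="$1"
  app="$2"
  interval="${3:-5}"
  tries="${4:-60}"
  while [ "$tries" -gt 0 ]; do
    state="$(app_state "$namespace" "$app")"
    echo "state: ${state:-unknown}"
    case "$state" in
      COMPLETED) return 0 ;;
      FAILED) return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$interval"
  done
  return 2
}
```

With the manifest saved to a file (the name `spark-pi.yaml` is an assumption), `kubectl apply -f spark-pi.yaml && wait_for_app spark spark-pi` submits the job and blocks until it finishes.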
brew install apache-spark
spark-submit --status spark:spark-pi-driver --master k8s://https://364455D087196228AE6E206BF4F48568.gr7.us-east-1.eks.amazonaws.com
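The submission ID passed to `--status` has the form `namespace:driver-pod-name`, and the master URL is the cluster's API server address prefixed with `k8s://`. The sketch below assembles both rather than hard-coding them, assuming the driver pod is named `<app-name>-driver` and that the current kubeconfig context points at the target cluster. The helper names `api_server` and `spark_status`, and the `SPARK_SUBMIT` override variable, are hypothetical.

```shell
# API server URL of the current kubeconfig context.
api_server() {
  kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
}

# Query a driver pod's status through spark-submit.
# Arguments: namespace, application name.
# SPARK_SUBMIT can point at a different spark-submit binary.
spark_status() {
  "${SPARK_SUBMIT:-spark-submit}" \
    --status "${1}:${2}-driver" \
    --master "k8s://$(api_server)"
}
```

For the sample in this post, `spark_status spark spark-pi` queries the driver pod `spark-pi-driver` in the `spark` namespace.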