[Cloud Native Kubernetes in Practice] Deploying the Spark Distributed Computing Platform on Kubernetes

I. Introduction to Spark

1. What is Spark

Spark is a distributed computing platform: a computing framework written in Scala, and a fast, general-purpose, scalable, in-memory engine for big data analytics.

2. What Spark is used for

Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs for Java, Scala, Python, and R, along with an optimized engine that supports general execution graphs. It also ships a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

II. Check the Local Cluster Status

1. Check node status

    [root@master ~]# kubectl get nodes
    NAME     STATUS   ROLES           AGE   VERSION
    master   Ready    control-plane   19d   v1.24.0
    node01   Ready    <none>          19d   v1.24.0
    node02   Ready    <none>          19d   v1.24.0

2. Check the Kubernetes version

    [root@master ~]# kubectl version --short
    Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
    Client Version: v1.24.0
    Kustomize Version: v4.5.4
    Server Version: v1.24.0

III. Install Helm

1. Download the Helm package

    [root@master mysql]# wget https://get.helm.sh/helm-v3.9.0-linux-amd64.tar.gz
    --2022-10-22 19:10:12--  https://get.helm.sh/helm-v3.9.0-linux-amd64.tar.gz
    Resolving get.helm.sh (get.helm.sh)... 152.199.39.108, 2606:2800:247:1cb7:261b:1f9c:2074:3c
    Connecting to get.helm.sh (get.helm.sh)|152.199.39.108|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 13952532 (13M) [application/x-tar]
    Saving to: helm-v3.9.0-linux-amd64.tar.gz
    100%[========================================================================================>] 13,952,532  16.7MB/s  in 0.8s
    2022-10-22 19:10:17 (16.7 MB/s) - helm-v3.9.0-linux-amd64.tar.gz saved [13952532/13952532]
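 
If you want to verify the download before using it, the Helm project also publishes a checksum file next to each release tarball (the .sha256sum URL below follows the release naming pattern and is worth double-checking):

    # Fetch the published checksum and compare it against the local tarball
    [root@master mysql]# wget https://get.helm.sh/helm-v3.9.0-linux-amd64.tar.gz.sha256sum
    [root@master mysql]# sha256sum -c helm-v3.9.0-linux-amd64.tar.gz.sha256sum
    helm-v3.9.0-linux-amd64.tar.gz: OK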

2. Extract the archive

    [root@master mysql]# tar -xzf helm-v3.9.0-linux-amd64.tar.gz
    [root@master mysql]# ls
    helm-v3.9.0-linux-amd64.tar.gz  linux-amd64

3. Copy the binary

    [root@master linux-amd64]# ls
    helm  LICENSE  README.md
    [root@master linux-amd64]# cp -a helm /usr/bin/

4. Check the Helm version

    [root@master linux-amd64]# helm version
    version.BuildInfo{Version:"v3.9.0", GitCommit:"7ceeda6c585217a19a1131663d8cd1f7d641b2a7", GitTreeState:"clean", GoVersion:"go1.17.5"}

5. Enable Helm command completion

    [root@master spark]# helm completion bash > .helmrc && echo "source .helmrc" >> .bashrc
    [root@master mysql]# source .helmrc
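 
If the bash-completion package is installed, an alternative is to install the completion script system-wide so every new shell picks it up automatically:

    # Install Helm completion for all users (requires the bash-completion package)
    helm completion bash > /etc/bash_completion.d/helm
    source /etc/bash_completion.d/helm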

IV. Install the NFS Server

1. Install the NFS packages

    yum install -y nfs-utils

2. Create the shared directory

    mkdir -p /nfs && chmod -R 766 /nfs

3. Configure the exported directory

  1. echo"/nfs/ *(insecure,rw,sync,no_root_squash)"> /etc/exports

4. Apply the NFS export configuration

    exportfs -r

5. Enable the NFS services at boot

    systemctl enable --now rpcbind
    systemctl enable --now nfs-server

6. Verify the NFS export from another node

    [root@node01 ~]# showmount -e 192.168.3.90
    Export list for 192.168.3.90:
    /nfs *
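 
As an optional sanity check, you can also mount the export on a worker node and confirm it is writable (192.168.3.90 is the NFS server address used throughout this post; /mnt is just a temporary mount point):

    # Temporarily mount the NFS export, write a test file, then unmount
    [root@node01 ~]# mount -t nfs 192.168.3.90:/nfs /mnt
    [root@node01 ~]# touch /mnt/test-file && ls /mnt
    test-file
    [root@node01 ~]# umount /mnt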

V. Deploy the StorageClass

1. Create sc.yaml

    [root@master spark]# cat sc.yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: nfs-storage
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
    provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
    parameters:
      archiveOnDelete: "true"   ## whether to archive a PV's contents when the PV is deleted
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nfs-client-provisioner
      labels:
        app: nfs-client-provisioner
      # replace with namespace where provisioner is deployed
      namespace: default
    spec:
      replicas: 1
      strategy:
        type: Recreate
      selector:
        matchLabels:
          app: nfs-client-provisioner
      template:
        metadata:
          labels:
            app: nfs-client-provisioner
        spec:
          serviceAccountName: nfs-client-provisioner
          containers:
            - name: nfs-client-provisioner
              image: registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images/nfs-subdir-external-provisioner:v4.0.2
              # resources:
              #   limits:
              #     cpu: 10m
              #   requests:
              #     cpu: 10m
              volumeMounts:
                - name: nfs-client-root
                  mountPath: /persistentvolumes
              env:
                - name: PROVISIONER_NAME
                  value: k8s-sigs.io/nfs-subdir-external-provisioner
                - name: NFS_SERVER
                  value: 192.168.3.90   ## address of your own NFS server
                - name: NFS_PATH
                  value: /nfs           ## directory exported by the NFS server
          volumes:
            - name: nfs-client-root
              nfs:
                server: 192.168.3.90
                path: /nfs
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: nfs-client-provisioner
      # replace with namespace where provisioner is deployed
      namespace: default
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: nfs-client-provisioner-runner
    rules:
      - apiGroups: [""]
        resources: ["nodes"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      - apiGroups: ["storage.k8s.io"]
        resources: ["storageclasses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "update", "patch"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: run-nfs-client-provisioner
    subjects:
      - kind: ServiceAccount
        name: nfs-client-provisioner
        # replace with namespace where provisioner is deployed
        namespace: default
    roleRef:
      kind: ClusterRole
      name: nfs-client-provisioner-runner
      apiGroup: rbac.authorization.k8s.io
    ---
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: leader-locking-nfs-client-provisioner
      # replace with namespace where provisioner is deployed
      namespace: default
    rules:
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: leader-locking-nfs-client-provisioner
      # replace with namespace where provisioner is deployed
      namespace: default
    subjects:
      - kind: ServiceAccount
        name: nfs-client-provisioner
        # replace with namespace where provisioner is deployed
        namespace: default
    roleRef:
      kind: Role
      name: leader-locking-nfs-client-provisioner
      apiGroup: rbac.authorization.k8s.io

2. Apply sc.yaml

    [root@master spark]# kubectl apply -f sc.yaml
    storageclass.storage.k8s.io/nfs-storage created
    deployment.apps/nfs-client-provisioner created
    serviceaccount/nfs-client-provisioner created
    clusterrole.rbac.authorization.k8s.io/nfs-client-provisioner-runner created
    clusterrolebinding.rbac.authorization.k8s.io/run-nfs-client-provisioner created
    role.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created
    rolebinding.rbac.authorization.k8s.io/leader-locking-nfs-client-provisioner created

3. Check the StorageClass object

    [root@master spark]# kubectl get storageclasses.storage.k8s.io
    NAME                    PROVISIONER                                   RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
    nfs-storage (default)   k8s-sigs.io/nfs-subdir-external-provisioner   Delete          Immediate           false                  81s
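 
Optionally, you can confirm that dynamic provisioning actually works by creating a throwaway PVC against the default StorageClass and checking that it becomes Bound (the name test-pvc is only an example):

    # Create a small test PVC; the provisioner should bind it within seconds
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-pvc
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Mi
    EOF
    # Check the status, then clean up the test claim
    kubectl get pvc test-pvc
    kubectl delete pvc test-pvc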

VI. Add the Helm Repository

1. Add the bitnami repository

    [root@master spark]# helm repo add bitnami https://charts.bitnami.com/bitnami
    "bitnami" has been added to your repositories

2. List the Helm repositories

    [root@master spark]# helm repo list
    NAME        URL
    bitnami     https://charts.bitnami.com/bitnami
    azure       http://mirror.azure.cn/kubernetes/charts/
    incubator   https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/

3. Update the Helm repositories

    [root@master spark]# helm repo update
    Hang tight while we grab the latest from your chart repositories...
    ...Successfully got an update from the "incubator" chart repository
    ...Successfully got an update from the "azure" chart repository
    ...Successfully got an update from the "bitnami" chart repository
    Update Complete. Happy Helming!⎈

4. Search the repositories for Spark charts

    [root@master spark]# helm search repo spark
    NAME                                 CHART VERSION   APP VERSION   DESCRIPTION
    azure/spark                          1.0.5           1.5.1         DEPRECATED - Fast and general-purpose cluster c...
    azure/spark-history-server           1.4.3           2.4.0         DEPRECATED - A Helm chart for Spark History Server
    bitnami/spark                        6.3.6           3.3.0         Apache Spark is a high-performance engine for l...
    incubator/ack-spark-history-server   0.5.0           2.4.5         A Helm chart for Spark History Server
    incubator/ack-spark-operator         0.1.16          2.4.5         A Helm chart for Spark on Kubernetes operator
    bitnami/dataplatform-bp1             12.0.2          1.0.1         DEPRECATED This Helm chart can be used for the ...
    bitnami/dataplatform-bp2             12.0.5          1.0.1         DEPRECATED This Helm chart can be used for the ...
    azure/luigi                          2.7.8           2.7.2         DEPRECATED Luigi is a Python module that helps ...

VII. Install Spark

1. Download the chart

    [root@master spark]# helm pull bitnami/spark
    [root@master spark]# ls
    spark-6.3.6.tgz
    [root@master spark]# tar -xzf spark-6.3.6.tgz
    [root@master spark]# ls
    spark  spark-6.3.6.tgz
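 
Before editing anything, it can help to dump the chart's default configuration for reference; helm show values prints the full default values.yaml (writing it to a file is just one convenient way to browse it):

    # Save the chart's default values for comparison while editing
    [root@master spark]# helm show values ./spark > values-default.yaml
    [root@master spark]# less values-default.yaml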

2. Edit values.yaml

The modified section — service.type is set to NodePort so the master Web UI and cluster port are reachable from outside the cluster (an equivalent --set override is shown after the snippet):

    service:
      ## @param service.type Kubernetes Service type
      ##
      type: NodePort
      ## @param service.ports.http Spark client port for HTTP
      ## @param service.ports.https Spark client port for HTTPS
      ## @param service.ports.cluster Spark cluster port
      ##
      ports:
        http: 80
        https: 443
        cluster: 7077
      ## Specify the nodePort(s) value(s) for the LoadBalancer and NodePort service types.
      ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
      ## @param service.nodePorts.http Kubernetes web node port for HTTP
      ## @param service.nodePorts.https Kubernetes web node port for HTTPS
      ## @param service.nodePorts.cluster Kubernetes cluster node port
      ##
      nodePorts:
3. Install the Spark release with Helm

    [root@master spark]# helm install myspark ./spark
    NAME: myspark
    LAST DEPLOYED: Sun Oct 23 00:05:40 2022
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    CHART NAME: spark
    CHART VERSION: 6.3.6
    APP VERSION: 3.3.0

    ** Please be patient while the chart is being deployed **

    1. Get the Spark master WebUI URL by running these commands:

      export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[?(@.name=='http')].nodePort}" services myspark-master-svc)
      export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
      echo http://$NODE_IP:$NODE_PORT

    2. Submit an application to the cluster:

      To submit an application to the cluster the spark-submit script must be used. That script can be
      obtained at https://github.com/apache/spark/tree/master/bin. Also you can use kubectl run.

      Run the commands below to obtain the master IP and submit your application.

      export EXAMPLE_JAR=$(kubectl exec -ti --namespace default myspark-worker-0 -- find examples/jars/ -name 'spark-example*\.jar' | tr -d '\r')
      export SUBMIT_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[?(@.name=='cluster')].nodePort}" services myspark-master-svc)
      export SUBMIT_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")

      kubectl run --namespace default myspark-client --rm --tty -i --restart='Never' \
        --image docker.io/bitnami/spark:3.3.0-debian-11-r40 \
        -- spark-submit --master spark://$SUBMIT_IP:$SUBMIT_PORT \
        --class org.apache.spark.examples.SparkPi \
        --deploy-mode cluster \
        $EXAMPLE_JAR 1000

4. Check pod status

    [root@master spark]# kubectl get pod
    NAME                                     READY   STATUS    RESTARTS        AGE
    my-tomcat9                               1/1     Running   2 (5h55m ago)   19d
    myspark-master-0                         1/1     Running   0               36m
    myspark-worker-0                         1/1     Running   0               36m
    myspark-worker-1                         1/1     Running   0               33m
    nfs-client-provisioner-8dcd8c766-2bptf   1/1     Running   0               5h16m

5. Check the Services

    [root@master spark]# kubectl get svc
    NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                       AGE
    kubernetes           ClusterIP   10.96.0.1     <none>        443/TCP                       20d
    myspark-headless     ClusterIP   None          <none>        <none>                        36m
    myspark-master-svc   NodePort    10.96.2.220   <none>        7077:32573/TCP,80:31295/TCP   36m
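 
Using the NodePort shown above (80:31295 maps the Web UI to port 31295 on every node), the master Web UI URL can be assembled the same way the chart NOTES suggest; the jsonpath expressions below simply look up the port and a node address:

    [root@master spark]# export NODE_PORT=$(kubectl get svc myspark-master-svc -o jsonpath="{.spec.ports[?(@.name=='http')].nodePort}")
    [root@master spark]# export NODE_IP=$(kubectl get nodes -o jsonpath="{.items[0].status.addresses[0].address}")
    # Prints something like http://<node-ip>:31295
    [root@master spark]# echo http://$NODE_IP:$NODE_PORT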

6. Delete the Spark release (when it is no longer needed)

    helm uninstall myspark
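 
After uninstalling, you can confirm that the release and its pods are gone; the label selector below assumes the standard app.kubernetes.io/instance label that Bitnami charts apply to their resources:

    # The release should no longer be listed and its pods should terminate
    helm list
    kubectl get pods -l app.kubernetes.io/instance=myspark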

VIII. Access the Spark Web UI

The Web UI currently shows two worker instances.

(Screenshot: Spark master Web UI showing 2 workers)

IX. Increase the Number of Worker Instances

1. Edit values.yaml

Change replicaCount under the worker section of values.yaml to 3 (a direct kubectl scale alternative is shown after the snippet below):

    replicaCount: 3
    ## Kubernetes Pods Security Context
    ## https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
    ## @param worker.podSecurityContext.enabled Enable security context
    ## @param worker.podSecurityContext.fsGroup Group ID for the container
    ## @param worker.podSecurityContext.runAsUser User ID for the container
    ## @param worker.podSecurityContext.runAsGroup Group ID for the container
    ## @param worker.podSecurityContext.seLinuxOptions SELinux options for the container
    podSecurityContext:
      enabled: true
      fsGroup: 1001
      runAsUser: 1001
      runAsGroup: 0
      seLinuxOptions: {}
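 
A quicker, but untracked, alternative is to scale the worker StatefulSet directly (the name myspark-worker is inferred from the pod names above). The change is not recorded in the Helm release, so the next helm upgrade will reset the replica count to whatever values.yaml says:

    # Scale the workers without going through Helm (not persisted in the release)
    kubectl scale statefulset myspark-worker --replicas=3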

2. Upgrade the Spark release with Helm

    [root@master spark]# helm upgrade myspark ./spark
    Release "myspark" has been upgraded. Happy Helming!
    NAME: myspark
    LAST DEPLOYED: Sun Oct 23 00:52:36 2022
    NAMESPACE: default
    STATUS: deployed
    REVISION: 3
    TEST SUITE: None
    NOTES:
    CHART NAME: spark
    CHART VERSION: 6.3.6
    APP VERSION: 3.3.0

    ** Please be patient while the chart is being deployed **

    1. Get the Spark master WebUI URL by running these commands:

      export NODE_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[?(@.name=='http')].nodePort}" services myspark-master-svc)
      export NODE_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")
      echo http://$NODE_IP:$NODE_PORT

    2. Submit an application to the cluster:

      To submit an application to the cluster the spark-submit script must be used. That script can be
      obtained at https://github.com/apache/spark/tree/master/bin. Also you can use kubectl run.

      Run the commands below to obtain the master IP and submit your application.

      export EXAMPLE_JAR=$(kubectl exec -ti --namespace default myspark-worker-0 -- find examples/jars/ -name 'spark-example*\.jar' | tr -d '\r')
      export SUBMIT_PORT=$(kubectl get --namespace default -o jsonpath="{.spec.ports[?(@.name=='cluster')].nodePort}" services myspark-master-svc)
      export SUBMIT_IP=$(kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}")

      kubectl run --namespace default myspark-client --rm --tty -i --restart='Never' \
        --image docker.io/bitnami/spark:3.3.0-debian-11-r40 \
        -- spark-submit --master spark://$SUBMIT_IP:$SUBMIT_PORT \
        --class org.apache.spark.examples.SparkPi \
        --deploy-mode cluster \
        $EXAMPLE_JAR 1000

3. Check pod status

    [root@master spark]# kubectl get pods
    NAME                                     READY   STATUS    RESTARTS       AGE
    my-tomcat9                               1/1     Running   2 (6h7m ago)   20d
    my-wordpress-9585b7f4d-5lfzn             1/1     Running   1 (78m ago)    82m
    my-wordpress-mariadb-0                   1/1     Running   0              82m
    myspark-master-0                         1/1     Running   0              48m
    myspark-worker-0                         1/1     Running   0              48m
    myspark-worker-1                         1/1     Running   0              45m
    myspark-worker-2                         1/1     Running   0              82s
    nfs-client-provisioner-8dcd8c766-2bptf   1/1     Running   0              5h28m
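 
To double-check that the new worker really joined the cluster, you can look at its log for the registration message (the exact wording of the log line may vary slightly between Spark versions):

    # The worker should report that it registered with the master
    kubectl logs myspark-worker-2 | grep -i "registered with master"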

4. Check the worker count in the Spark Web UI

(Screenshot: Spark master Web UI now showing 3 workers)


This article is reproduced from: https://blog.csdn.net/jks212454/article/details/127469862
Copyright belongs to the original author, 江湖有缘. In case of infringement, please contact us for removal.
