Submitting a Spark Job on a Container Cloud

To submit a Spark task of Kind=Job on the container cloud, first request an RBAC account with permission to submit Job resources, then write the corresponding YAML and use Spark's built-in spark-submit command to submit the user program (a jar package) to the cluster for execution.

1. Create the RBAC for Job submission permissions

Create the RBAC account and assign it the required resource permissions. For reference when creating the Pod service account, the available Kubernetes API resources and their verbs can be listed with kubectl api-resources.
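
The resource names, API groups, and verbs referenced in the Role below can be cross-checked against the cluster with kubectl api-resources; a minimal sketch (the grep filter is only an example):

kubectl api-resources -o wide
kubectl api-resources -o wide | grep -E 'sparkapplications|podgroups|jobs'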

cat > ecc-recommend-rbac.yaml <<EOF
---
apiVersion: v1
kind: Namespace
metadata:
  name: item-dev-recommend
  labels:
    name: item-dev-recommend
---
# Create the ServiceAccount spark-cdp in the namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-cdp
  namespace: item-dev-recommend
---
# Create the Role with the required resource permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-cdp
  namespace: item-dev-recommend
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - services
  - secrets
  verbs:
  - create
  - get
  - delete
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs:
  - create
  - get
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - resourcequotas
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - update
  - patch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - update
  - delete
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - create
  - get
  - update
  - delete
- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - sparkapplications
  - scheduledsparkapplications
  - sparkapplications/status
  - scheduledsparkapplications/status
  verbs:
  - '*'
- apiGroups:
  - scheduling.volcano.sh
  resources:
  - podgroups
  - queues
  - queues/status
  verbs:
  - get
  - list
  - watch
  - create
  - delete
  - update
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - '*'
---
# Bind the Role to the ServiceAccount spark-cdp
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-cdp
  namespace: item-dev-recommend
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-cdp
subjects:
- kind: ServiceAccount
  name: spark-cdp
  namespace: item-dev-recommend
EOF
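
With the manifest written, it can be applied and checked; a minimal sketch using plain kubectl:

# Create the namespace, ServiceAccount, Role and RoleBinding
kubectl apply -f ecc-recommend-rbac.yaml

# Verify the ServiceAccount exists and may create pods in the namespace
kubectl -n item-dev-recommend get serviceaccount spark-cdp
kubectl -n item-dev-recommend auth can-i create pods --as=system:serviceaccount:item-dev-recommend:spark-cdp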

2. Spark PV and PVC

  • Create a PV backed by an NFS mount, defining its access modes (accessModes) and storage capacity (capacity); the commands to apply and verify both objects are sketched after this list.

cat > ecc-recommend-pv.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dev-cdp-pv01
  namespace: item-dev-recommend
spec:
  capacity:
    storage: 10Gi
  accessModes:
  # Three access modes are available: ReadWriteOnce, ReadOnlyMany, ReadWriteMany
  - ReadWriteOnce
  nfs:
    path: /data/nfs
    server: 192.168.0.135
EOF
  • Create the PVC that will bind to the PV.

cat > ecc-recommend-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev-cdp-pvc01
  namespace: item-dev-recommend
spec:
  accessModes:
  # Access mode must match the PV
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
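
Both manifests can then be applied and the binding verified; a short sketch:

kubectl apply -f ecc-recommend-pv.yaml
kubectl apply -f ecc-recommend-pvc.yaml

# The PVC should change from Pending to Bound once it matches the PV
kubectl get pv dev-cdp-pv01
kubectl -n item-dev-recommend get pvc dev-cdp-pvc01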

3. Submitting the task with spark-submit

Once the Java/Scala application has been developed and packaged, submit the jar to the cluster for execution with the spark-submit command.

cat > ecc-recommend-sparksubmit.yaml <<'EOF'
---
apiVersion: batch/v1
kind: Job
metadata:
  name: item-recommend-job
  namespace: item-dev-recommend
  labels:
    k8s-app: item-recommend-job
spec:
  template:
    metadata:
      labels:
        k8s-app: item-recommend-job
    spec:
      containers:
      - name: item-recommend-job
        args:
        - /opt/spark/bin/spark-submit
        - --class
        - com.www.ecc.com.recommend.ItemRecommender
        - --master
        - k8s://https://$(KUBERNETES_SERVICE_HOST):$(KUBERNETES_SERVICE_PORT)
        - --name
        - item-recommend-job
        - --jars
        - /opt/spark/jars/spark-cassandra-connector_2.11-2.3.4.jar
        - --conf
        - spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        - --conf
        - spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
        - --conf
        - spark.kubernetes.driver.limit.cores=3
        - --conf
        - spark.kubernetes.executor.limit.cores=8
        - --conf
        - spark.kubernetes.driver.limit.memory=5g
        - --conf
        - spark.kubernetes.executor.limit.memory=32g
        - --conf
        - spark.executor.instances=8
        - --conf
        - spark.sql.crossJoin.enabled=true
        - --conf
        - spark.executor.cores=6
        - --conf
        - spark.executor.memory=32g
        - --conf
        - spark.driver.cores=3
        - --conf
        - spark.driver.memory=5g
        - --conf
        - spark.sql.autoBroadcastJoinThreshold=-1
        - --conf
        - spark.kubernetes.namespace=item-dev-recommend
        - --conf
        - spark.driver.port=45970
        - --conf
        - spark.blockManager.port=45980
        - --conf
        - spark.kubernetes.container.image=acpimagehub.ecc.cn/spark:3.11
        - --conf
        - spark.executor.extraJavaOptions="-Duser.timezone=GMT+08:00"
        - --conf
        - spark.driver.extraJavaOptions="-Duser.timezone=GMT+08:00"
        - --conf
        - spark.default.parallelism=500
        - /odsdata/item-recommender-1.0.0-SNAPSHOT.jar
        env:
        - name: SPARK_SHUFFLE_PARTITIONS
          value: "100"
        - name: CASSANDRA_HOST
          value: "192.168.0.1,192.168.0.2,192.168.0.3"
        - name: CASSANDRA_PORT
          value: "9042"
        - name: AUTH_USERNAME
          value: "user"
        - name: AUTH_PASSWORD
          value: "123456"
        image: acpimagehub.ecc.cn/spark:3.11
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9000
          name: 9000tcp2
          protocol: TCP
        resources:
          limits:
            cpu: "3"
            memory: 2Gi
          requests:
            cpu: "3"
            memory: 2Gi
        volumeMounts:
        - mountPath: /odsdata
          name: item-spark-pvc
      volumes:
      - name: item-spark-pvc
        persistentVolumeClaim:
          claimName: dev-cdp-pvc01
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      hostname: item-recommend-job
      securityContext: {}
      serviceAccountName: spark-cdp
---
apiVersion: v1
kind: Service
metadata:
  name: item-recommend-job
  namespace: item-dev-recommend
spec:
  type: NodePort
  ports:
  - name: sparkjob-tcp4040
    port: 4040
    protocol: TCP
    targetPort: 4040
  # Spark driver port
  - name: sparkjob-tcp-45970
    port: 45970
    protocol: TCP
    targetPort: 45970
  # Spark UI
  - name: sparkjob-tcp-48080
    port: 48080
    protocol: TCP
    targetPort: 48080
  # Spark executor / block manager port
  - name: sparkjob-tcp-45980
    port: 45980
    protocol: TCP
    targetPort: 45980
  selector:
    k8s-app: item-recommend-job
EOF
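
Assuming the application jar has already been copied to the NFS export mounted at /odsdata, the Job and Service can be applied and followed; a sketch of the usual commands:

kubectl apply -f ecc-recommend-sparksubmit.yaml

# Watch the submitter pod plus the driver/executor pods that spark-submit creates
kubectl -n item-dev-recommend get pods -w

# Follow the spark-submit output of the Job
kubectl -n item-dev-recommend logs -f job/item-recommend-job

# Look up the NodePort mapped to 4040 to reach the Spark UI while the driver is running
kubectl -n item-dev-recommend get svc item-recommend-job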

4. Notes on the packaging plugins

<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <includes>
        <include>*.properties</include>
      </includes>
      <filtering>false</filtering>
    </resource>
  </resources>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <skipTests>true</skipTests>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.6.1</version>
      <configuration>
        <source>${java.version}</source>
        <target>${java.version}</target>
        <encoding>${project.build.sourceEncoding}</encoding>
      </configuration>
      <executions>
        <execution>
          <phase>compile</phase>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.2.2</version>
      <executions>
        <execution>
          <id>scala-compile-first</id>
          <phase>process-resources</phase>
          <goals>
            <goal>add-source</goal>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>3.2.1</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
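
With this build section in place, packaging and publishing the shaded jar typically comes down to the following; the copy target is an assumption (the NFS export /data/nfs that backs the PV mounted at /odsdata):

# Surefire is configured with skipTests, so tests are skipped during packaging
mvn clean package

# Copy the shaded jar to the NFS export so the Job pod sees it under /odsdata
cp target/item-recommender-1.0.0-SNAPSHOT.jar /data/nfs/
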
Tags: spark scala big data

Reposted from: https://blog.csdn.net/software444/article/details/129337814
Copyright belongs to the original author, 茅台技术人. In case of infringement, please contact us for removal.
