Submitting Applications with Spark spark-submit

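Spark supports three cluster managers:

Standalone: the cluster manager that ships with Spark; it makes a cluster easy to set up.
Apache Mesos: a general-purpose cluster manager that can also run Hadoop MapReduce and service applications.
Hadoop YARN: the resource manager in Hadoop 2.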

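Notes:
1. When the cluster is not especially large and MapReduce and Spark do not need to run side by side, Standalone mode is the most efficient choice.
2. Spark can schedule resources both across applications (through the cluster manager) and within an application (when a single SparkContext runs several computation jobs).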

Running Spark on YARN

cluster mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar \
10

client mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 4g \
--executor-memory 2g \
--executor-cores 1 \
lib/spark-examples*.jar \
10
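
The two commands above differ only in the value passed to --deploy-mode. The sketch below makes that explicit. The function name build_pi_submit is hypothetical and not part of Spark, and the function only prints the command instead of running it, so it can be inspected on a machine without a cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): prints the SparkPi spark-submit
# command for the requested deploy mode instead of executing it.
build_pi_submit() {
  mode="${1:-client}"   # spark-submit itself defaults to client mode
  case "$mode" in
    client|cluster) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac
  # The jar glob stays inside double quotes, so it is printed, not expanded.
  printf '%s\n' "./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode $mode --driver-memory 4g --executor-memory 2g --executor-cores 1 lib/spark-examples*.jar 10"
}

# Show the command that would be used for cluster mode.
build_pi_submit cluster
```

Printing the command first is a deliberate choice here: it lets you review the flags before anything is sent to YARN.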

spark-submit parameter reference

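| Parameter | Description |
| --- | --- |
| --master | Address of the master, i.e. where the job is submitted for execution, for example spark://host:port, yarn, or local. See the Master URL list below for the possible values. |
| --deploy-mode | Whether to launch the driver locally (client) or on the cluster (cluster). Default: client. |
| --class | Main class of the application (Java or Scala applications only). |
| --name | Name of the application. |
| --jars | Comma-separated list of local jars; once set, they are included on the driver and executor classpaths. |
| --packages | Maven coordinates of jars to include on the driver and executor classpaths. |
| --exclude-packages | Packages to exclude, to avoid dependency conflicts. |
| --repositories | Remote repositories. |
| --conf PROP=VALUE | Sets a Spark configuration property, for example --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=256m". |
| --properties-file | Configuration file to load. Default: conf/spark-defaults.conf. |
| --driver-memory | Driver memory. Default: 1G. |
| --driver-java-options | Extra Java options passed to the driver. |
| --driver-library-path | Extra library path entries passed to the driver. |
| --driver-class-path | Extra classpath entries passed to the driver. |
| --driver-cores | Number of cores for the driver. Default: 1. Used on YARN or standalone. |
| --executor-memory | Memory per executor. Default: 1G. |
| --total-executor-cores | Total number of cores across all executors. Used only on Mesos or standalone. |
| --num-executors | Number of executors to launch. Default: 2. Used on YARN. |
| --executor-cores | Number of cores per executor. Used on YARN or standalone. |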
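
To illustrate how --properties-file and --conf fit together, here is a minimal sketch that writes a small properties file and prints a command that uses it. The file path, the property values, and the <main-class> and <app.jar> placeholders are assumptions made for illustration. The property names spark.master, spark.submit.deployMode, spark.executor.memory, and spark.executor.cores are standard Spark configuration keys.

```shell
#!/bin/sh
# Write a small properties file; the path and the values are illustrative only.
props_file="${TMPDIR:-/tmp}/my-spark.conf"
cat > "$props_file" <<'EOF'
spark.master            yarn
spark.submit.deployMode cluster
spark.executor.memory   2g
spark.executor.cores    1
EOF

# The file is passed with --properties-file; a single property can still be
# overridden on the command line with --conf. The command is only printed here.
echo "spark-submit --properties-file $props_file --conf spark.executor.memory=4g --class <main-class> <app.jar>"
```

Values given on the command line take precedence over the properties file, so the file can hold shared defaults while individual jobs adjust what they need.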

Master URL values

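| Master URL | Meaning |
| --- | --- |
| local | Run the Spark application locally with 1 worker thread. |
| local[K] | Run the Spark application locally with K worker threads. |
| local[*] | Run the Spark application locally with all available worker threads (one per logical core). |
| spark://HOST:PORT | Connect to a Spark Standalone cluster and run the application on it. |
| mesos://HOST:PORT | Connect to a Mesos cluster and run the application on it. |
| yarn-client | Connect to a YARN cluster in client mode. The cluster is located through the HADOOP_CONF_DIR environment variable, and the driver runs on the client. |
| yarn-cluster | Connect to a YARN cluster in cluster mode. The cluster is located through the HADOOP_CONF_DIR environment variable, and the driver also runs inside the cluster. |

Note: yarn-client and yarn-cluster have been deprecated since Spark 2.0. Use --master yarn together with --deploy-mode client or --deploy-mode cluster, as in the YARN commands above.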
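
The Master URL forms listed above can be told apart with a plain case statement. The following sketch is only an illustration: describe_master is a hypothetical function name, and the descriptions it prints are paraphrases, not output produced by Spark.

```shell
#!/bin/sh
# Hypothetical helper: describes which cluster manager a --master value selects.
describe_master() {
  case "$1" in
    local)         echo "local, 1 worker thread" ;;
    'local[*]')    echo "local, one worker thread per logical core" ;;
    'local['*']')  echo "local, fixed number of worker threads" ;;
    spark://*)     echo "Spark standalone cluster" ;;
    mesos://*)     echo "Mesos cluster" ;;
    yarn|yarn-client|yarn-cluster)
                   echo "YARN cluster (located via HADOOP_CONF_DIR)" ;;
    *)             echo "unrecognized master URL" ;;
  esac
}

describe_master 'local[8]'
describe_master spark://207.184.161.138:7077
```

The literal local[*] pattern is listed before the general local[K] pattern so that the wildcard form is not mistaken for a fixed thread count.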

Distinguishing client, cluster, and local modes

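The figure below shows typical client mode: the Spark driver runs on the machine from which the job is submitted.
Figure: Spark client run mode

The figure below shows cluster mode: the Spark driver runs on YARN.
Figure: Spark cluster run mode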

Comparison of the three modes

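| | Yarn Cluster | Yarn Client | Spark Standalone |
| --- | --- | --- | --- |
| Where the driver runs | Application Master | Client | Client |
| Who requests resources | Application Master | Application Master | Client |
| Who launches executor processes | Yarn NodeManager | Yarn NodeManager | Spark Slave |
| Resident processes | 1. Yarn ResourceManager 2. NodeManager | 1. Yarn ResourceManager 2. NodeManager | 1. Spark Master 2. Spark Worker |
| Spark Shell supported | No | Yes | Yes |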
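
The first and last rows of the comparison, for the two YARN modes, can be expressed as a small lookup. This is a sketch for illustration only; yarn_driver_location and can_run_spark_shell are hypothetical names, not Spark commands.

```shell
#!/bin/sh
# Hypothetical helper mirroring the "where the driver runs" row for YARN.
yarn_driver_location() {
  case "$1" in
    cluster) echo "Application Master (inside the YARN cluster)" ;;
    client)  echo "client (the submitting machine)" ;;
    *)       echo "unknown deploy mode: $1" >&2; return 1 ;;
  esac
}

# An interactive shell needs its driver on the local machine, so per the
# table it is only available in client mode.
can_run_spark_shell() {
  [ "$1" = "client" ]
}

yarn_driver_location cluster
if can_run_spark_shell client; then echo "spark-shell: ok in client mode"; fi
```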

Examples of submitting applications with spark-submit

# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100
# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
# Run on a YARN cluster in cluster deploy mode
# (use --deploy-mode client for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000
# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000
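
The YARN example above exports HADOOP_CONF_DIR because that is how Spark finds the cluster. A minimal pre-flight check, assuming you want a wrapper script to fail early, might look like the following. The function name require_hadoop_conf is hypothetical. Accepting YARN_CONF_DIR as an alternative goes beyond what this article states; it follows the Spark documentation, which names both variables.

```shell
#!/bin/sh
# Hypothetical pre-flight check: YARN submissions locate the cluster through
# HADOOP_CONF_DIR (or YARN_CONF_DIR), so fail early when neither is set.
require_hadoop_conf() {
  if [ -z "${HADOOP_CONF_DIR:-}" ] && [ -z "${YARN_CONF_DIR:-}" ]; then
    echo "HADOOP_CONF_DIR or YARN_CONF_DIR must be set for --master yarn" >&2
    return 1
  fi
  return 0
}

# Typical use inside a wrapper script, before calling spark-submit:
#   require_hadoop_conf || exit 1
```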

A complete example

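# Note: --supervise is documented for standalone and Mesos cluster deploy
# modes; it is kept as in the original command and likely has no effect on YARN.
spark-submit \
  --master yarn \
  --queue root.sparkstreaming \
  --deploy-mode cluster \
  --supervise \
  --name spark-job \
  --num-executors 20 \
  --executor-cores 2 \
  --executor-memory 4g \
  --conf spark.dynamicAllocation.maxExecutors=9 \
  --files commons.xml \
  --class com.***.realtime.helper.HelperHandle \
  BSS-ONSS-Spark-Realtime-1.0-SNAPSHOT.jar 500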


This article is reposted from: https://blog.csdn.net/king14bhhb/article/details/136990008
Copyright belongs to the original author, 不二人生. In case of infringement, please contact us for removal.