Prerequisites
A Hadoop cluster must already be deployed. If you have not set one up yet, go to >>Hadoop Fully Distributed Setup Notes first.
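Before continuing, it is worth confirming that Hadoop really is running on master. A quick sanity check (the process IDs will differ on your machine) is:
[root@master ~]# jps
At minimum NameNode, SecondaryNameNode, and ResourceManager should appear in the list; if they do not, start Hadoop before going any further.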
Software versions
scala-2.11.8
spark-2.0.0-bin-hadoop2.6
Official download: Index of /dist/spark
Component overview
Spark is a general-purpose in-memory parallel computing framework developed by the AMP Lab (Algorithms, Machines, and People Lab) at the University of California, Berkeley.
Spark is implemented in Scala, an object-oriented, functional programming language, and it lets you work with distributed datasets as easily as with local collections.
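To illustrate that last point, here is a minimal sketch you can try later in spark-shell once the cluster is running: a distributed dataset is transformed with the same map/filter style used on ordinary Scala collections (the numbers here are arbitrary).
scala> val data = sc.parallelize(1 to 10)
scala> data.filter(_ % 2 == 0).map(_ * 10).collect()
// expected result: Array(20, 40, 60, 80, 100)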
Procedure
Upload Scala & Spark
Use an FTP tool (e.g. Xftp) to upload the Scala and Spark packages to master
[root@master ~]# ls
scala-2.11.8.tgz spark-2.0.0-bin-hadoop2.6.tgz
Extract Scala & Spark
[root@master ~]# tar xf scala-2.11.8.tgz -C /usr/local/src/
[root@master ~]# tar xf spark-2.0.0-bin-hadoop2.6.tgz -C /usr/local/src/
After extraction, cd into the extraction directory
[root@master ~]# cd /usr/local/src/
[root@master src]# ls
hadoop jdk scala-2.11.8 spark-2.0.0-bin-hadoop2.6
Rename the directories
[root@master src]# mv scala-2.11.8/ scala
[root@master src]# mv spark-2.0.0-bin-hadoop2.6/ spark
[root@master src]# ls
hadoop jdk scala spark
Configure environment variables
To make the environment variables apply only to the root user, edit root's own profile
[root@master src]# vi /root/.bash_profile
The file after editing:
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
# User specific environment and startup programs
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export SCALA_HOME=/usr/local/src/scala
export SPARK_HOME=/usr/local/src/spark
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin
export PATH
Apply the environment variables
[root@master src]# source /root/.bash_profile
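To confirm the variables took effect, you can check the Scala version and the Spark path; with the versions used in this note the output should look roughly like this:
[root@master src]# scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
[root@master src]# echo $SPARK_HOME
/usr/local/src/spark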
Configure Spark
Enter the Spark configuration directory: cd /usr/local/src/spark/conf/
Copy the Spark configuration template: cp spark-env.sh.template spark-env.sh
Edit the Spark configuration file: vi spark-env.sh
Append the following configuration:
# Locations of the Java, Hadoop, and Scala installations
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export HADOOP_CONF_DIR=/usr/local/src/hadoop/etc/hadoop
export SCALA_HOME=/usr/local/src/scala
# Master node (hostname or IP address both work)
export SPARK_MASTER_HOST=master
# Master node port
export SPARK_MASTER_PORT=7077
# Number of cores each Worker may use (defaults to all available cores)
export SPARK_WORKER_CORES=1
# Amount of memory each Worker may use (defaults to total memory minus 1 GB)
export SPARK_WORKER_MEMORY=1G
Reference: https://www.cnblogs.com/xupccc/p/9800380.html
Configure the slaves file
[root@master conf]# mv slaves.template slaves
[root@master conf]# vi slaves
If the file contains a localhost entry, delete it first
slave1
slave2
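start-all.sh will later log in to every host listed in slaves over SSH, so passwordless SSH from master to the slaves should already work (it is normally set up during the Hadoop deployment). A quick check, assuming the hostnames above resolve correctly:
[root@master conf]# ssh slave1 hostname
slave1
[root@master conf]# ssh slave2 hostname
slave2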
Copy the configured software to the slaves with scp (the network copy command; -r copies a directory recursively)
Copy the Scala installation
[root@master ~]# scp -r /usr/local/src/scala/ slave1:/usr/local/src/
……
[root@master ~]# scp -r /usr/local/src/scala/ slave2:/usr/local/src/
……
Copy the Spark installation
[root@master ~]# scp -r /usr/local/src/spark/ slave1:/usr/local/src/
……
[root@master ~]# scp -r /usr/local/src/spark/ slave2:/usr/local/src/
……
Copy the environment variables
[root@master ~]# scp /root/.bash_profile slave1:/root/
.bash_profile 100% 359 35.9KB/s 00:00
[root@master ~]# scp /root/.bash_profile slave2:/root/
.bash_profile 100% 359 38.1KB/s 00:00
Switch to the slave machines and apply the environment variables
[root@slave1 ~]# source /root/.bash_profile
[root@slave2 ~]# source /root/.bash_profile
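On each slave you can quickly confirm that both the copied files and the variables are in place (this assumes the Hadoop notes used the same /usr/local/src layout):
[root@slave1 ~]# ls /usr/local/src/
hadoop  jdk  scala  spark
[root@slave1 ~]# echo $SPARK_HOME
/usr/local/src/spark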
Start the Spark cluster
Spark's startup script has the same name as Hadoop's, so change into Spark's sbin directory before running it
[root@master ~]# cd /usr/local/src/spark/sbin/
Note the ./ in front of the script name; do not leave it out
[root@master sbin]# ./start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-master-01.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2-01.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1-01.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master-01.out
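If you later need to bring the cluster down, the matching stop script lives in the same directory; as with startup, run it with the leading ./ so Hadoop's script of the same name is not picked up instead:
[root@master sbin]# ./stop-all.sh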
Check cluster status
Check the Spark process status with jps
master
[root@master ~]# jps
3903 SecondaryNameNode
4692 Master
3448 DataNode
4037 ResourceManager
4138 NodeManager
4892 Jps
3328 NameNode
slave1
[root@slave1 ~]# jps
2613 NodeManager
2865 Worker
2918 Jps
2528 DataNode
slave2
[root@slave2 ~]# jps
2253 NodeManager
2440 Jps
2385 Worker
2168 DataNode
Test Spark
[root@master bin]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
22/02/16 16:31:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/02/16 16:31:49 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://172.16.2.11:4040
Spark context available as 'sc' (master = local[*], app id = local-1645047109135).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
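As a quick functional test you can evaluate a small job at the prompt; any simple computation will do, for example:
scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050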
Check the Spark Web UI (master IP address plus port 8080): 172.16.2.11:8080
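Note that the spark-shell banner above shows master = local[*], i.e. that session ran in local mode rather than on the standalone cluster. To have the shell actually use the cluster you just started, pass the master URL explicitly (using the host and port configured in spark-env.sh); the application should then also show up under Running Applications on the 8080 page:
[root@master bin]# spark-shell --master spark://master:7077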
At this point, the Spark cluster setup is complete.