Spark Installation and Environment Configuration on Linux
1. Hadoop Test
Since Spark here runs on top of Hadoop, make sure Hadoop is working properly before using the Spark framework:
1.1 Start Hadoop
cd /usr/local/hadoop
./sbin/start-all.sh
A problem shows up; the messages are as follows:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Solution reference: SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder.
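The usual cause of this message is that no SLF4J binding jar is on the classpath. As a hedged sketch (not from the original post; directory and jar names are assumptions), one common fix is to place a binding jar that matches the bundled slf4j-api version into Hadoop's common lib directory:
# Assumed fix: add an SLF4J binding next to the bundled slf4j-api jar
cd /usr/local/hadoop/share/hadoop/common/lib
ls slf4j-api-*.jar          # check which slf4j-api version Hadoop ships with
# then copy a matching binding jar (e.g. slf4j-log4j12-<version>.jar) into this directory and restart Hadoop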
1.2 Start Hadoop again
cd /usr/local/hadoop/sbin
./start-all.sh
1.3 Check whether it succeeded
jps
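If Hadoop started correctly in pseudo-distributed mode, jps should list roughly the following daemons (the process IDs below are only illustrative; ResourceManager and NodeManager appear because start-all.sh also starts YARN):
2962 NameNode
3121 DataNode
3326 SecondaryNameNode
3517 ResourceManager
3678 NodeManager
3845 Jps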
2. Scala Installation and Configuration
2.1 Download Scala
Official download page: https://www.scala-lang.org/download/2.13.10.html
Download Scala with the wget command:
wget https://downloads.lightbend.com/scala/2.13.10/scala-2.13.10.tgz
2.2 Extract and rename
sudo tar -zxvf ~/下载/scala-2.13.10.tgz -C /usr/local/   # extract
cd /usr/local
sudo mv scala-2.13.10 scala   # rename
2.3 Configure the environment
# 1. Edit the environment variables
sudo vi ~/.bashrc
# 2. Apply the changes
source ~/.bashrc
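The post does not show which lines were added to ~/.bashrc; a minimal sketch, assuming Scala was installed to /usr/local/scala as above, would be:
# Assumed additions to ~/.bashrc for Scala
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin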
2.4 Test
scala -version
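If the PATH is set correctly, this should print the installed version, roughly like the line below (the exact wording can differ between releases):
Scala code runner version 2.13.10 -- Copyright 2002-2022, LAMP/EPFL and Lightbend, Inc.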
3. Spark Installation and Configuration
3.1 Download Spark
Download page: https://archive.apache.org/dist/spark/spark-3.2.2/
Download it with the wget command:
wget https://archive.apache.org/dist/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz
3.2 Extract and rename
# 1. Extract
sudo tar -zxvf ~/下载/spark-3.2.2-bin-hadoop3.2.tgz -C /usr/local
# 2. Rename
cd /usr/local
sudo mv spark-3.2.2-bin-hadoop3.2 spark
3.3 Configure the environment
# 1. Edit the environment variables
sudo vi ~/.bashrc
# 2. Apply the changes
source ~/.bashrc
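Again, the post omits the exact lines; a minimal sketch, assuming Spark was installed to /usr/local/spark as above, would be:
# Assumed additions to ~/.bashrc for Spark
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin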
3.4 Configure spark-env.sh
Go to the conf directory and open the spark-env.sh file:
cd /usr/local/spark/conf
sudo cp spark-env.sh.template spark-env.sh
sudo vi spark-env.sh
Add the following content:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export SPARK_MASTER_IP=192.168.3.134
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=3
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=5G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$HADOOP_HOME/lib/native
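Note that 192.168.3.134 is the author's machine address; SPARK_MASTER_IP (and SPARK_LOCAL_IP later) should use your own IP, which you can look up with, for example:
hostname -I        # or: ip addr show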
3.5 Configure slaves (apparently not needed)
cd /usr/local/spark/conf
sudo vi workers.template
The file contains only localhost, i.e. the local machine's address. Since the current setup is pseudo-distributed, it does not need to be modified, but the following still has to be executed:
sudo cp workers.template slaves
3.6 Start (errors reported)
Run start-master.sh and start-slaves.sh in the sbin directory (Hadoop must already be running):
cd /usr/local/spark
sudo ./sbin/start-master.sh
sudo ./sbin/start-slaves.sh
Errors were reported!!!
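For reference, once the start scripts do succeed, jps should additionally show a Master and a Worker process, and the master web UI should be reachable on the SPARK_MASTER_WEBUI_PORT configured above (8099 here), e.g. http://192.168.3.134:8099. This check is an assumption based on the configuration above, not output from the original post:
jps | grep -E "Master|Worker"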
3.7 Test
Verify that Spark was installed successfully by running one of the bundled examples:
cd /usr/local/spark
./bin/run-example SparkPi
The error message is as follows:
2022-11-01 20:49:24,377 WARN util.Utils: Your hostname, leoatliang-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.3.134 instead (on interface ens33)
Reference blog post: Spark startup: WARN util.Utils: Your hostname, … resolves to a loopback address: …; using … instead
Edit the configuration file and set the SPARK_LOCAL_IP variable:
cd /usr/local/spark
sudo vim conf/spark-env.sh
# Add the following line:
export SPARK_LOCAL_IP=192.168.3.134   # replace with your own machine's IP
Test again:
BUG solved!!!
Running the example prints a large amount of log output, which makes the result hard to spot; it can be filtered with the grep command:
./bin/run-example SparkPi 2>&1 | grep "Pi is"
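If the installation works, the filtered output should be a single line of the form below (the trailing digits vary from run to run, since SparkPi estimates Pi by random sampling):
Pi is roughly 3.14…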
3.8 Check the Spark version
cd /usr/local/spark
./bin/spark-shell
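The version number appears in the spark-shell welcome banner. Alternatively (an extra tip, not from the original post), the version can be printed without starting an interactive shell:
./bin/spark-submit --version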