Hadoop 3.3.6 Cluster Setup
1. Prerequisites
- Servers: 3 machines (1 master, 2 workers) running CentOS 7; the three machines must be able to reach each other by hostname.

| IP | Hostname | Role |
| --- | --- | --- |
| 192.168.108.137 | centos137 | master |
| 192.168.108.138 | centos138 | node |
| 192.168.108.139 | centos139 | node |
# Change the hostname (run the matching command on each machine)
hostnamectl set-hostname centos137
# On all three machines, append the following entries to /etc/hosts
192.168.108.137 centos137
192.168.108.138 centos138
192.168.108.139 centos139
# Reboot
reboot
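As a quick sanity check (assuming the hostnames and IPs above), verify from each node that all three names resolve and respond:

```bash
# Run on each node: every hostname should resolve and answer one ping
for h in centos137 centos138 centos139; do
  ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h UNREACHABLE"
done
```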
- Hadoop release (2.2 or later, including the HDFS service); this guide uses 3.3.6
- JDK 1.8+ (installing the JDK yourself is recommended; the JAVA_HOME environment variable must be set)
2. Role Assignment
Node role layout

| Node | IP | NN | SNN | DN | RM | NM | HS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| centos137 | 192.168.108.137 | √ |  | √ | √ | √ |  |
| centos138 | 192.168.108.138 |  | √ | √ |  | √ | √ |
| centos139 | 192.168.108.139 |  |  | √ |  | √ |  |
Role legend

| HDFS | YARN | MapReduce |
| --- | --- | --- |
| NameNode (NN) | ResourceManager (RM) | HistoryServer (HS) |
| SecondaryNameNode (SNN) | NodeManager (NM) |  |
| DataNode (DN) |  |  |
Default component ports

| Component | Port | Description |
| --- | --- | --- |
| HDFS | 8020 | NameNode RPC |
| HDFS | 50010, 50020, 50075 | DataNode (Hadoop 2.x defaults; in Hadoop 3 these moved to 9866, 9867 and 9864) |
| YARN | 8032 | ResourceManager RPC |
| YARN | 8088 | ResourceManager web UI |
| YARN | 8040 | NodeManager protocol |
| YARN | 8042 | NodeManager web UI |
| MapReduce | 10020 | HistoryServer protocol |
| MapReduce | 19888 | HistoryServer web UI |
| Hadoop Common | 49152–65535 | Inter-process communication |
| ZooKeeper | 2181 | Cluster coordination service |
| Hadoop web UI | 9870 | NameNode web UI |
| Hadoop web UI | 8088 | ResourceManager web UI |
| Hadoop web UI | 19888 | JobHistoryServer web UI |
| Hadoop RPC | 8019 | Remote procedure call |

Installation package (China mirror): https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.3.6/
2.1 Software Installation (all servers)
Passwordless SSH login
useradd hadoop
passwd hadoop
(Ignore the warning that the password is too short.)
# Switch to the hadoop user
su hadoop
Log in to localhost once first (entering the password) so that the ~/.ssh directory gets created:
ssh localhost
Generate a key pair:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Distribute the public key (run on the master node):
# The argument is the hostname of the node you want passwordless access to
ssh-copy-id centos137
ssh-copy-id centos138
ssh-copy-id centos139
Test that centos137 can log in to each node without a password, e.g. log in to centos138:
ssh centos138
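To check all nodes in one go (a sketch; BatchMode makes ssh fail instead of prompting if key-based login is not working):

```bash
# Each line should print the remote hostname without asking for a password
for h in centos137 centos138 centos139; do
  ssh -o BatchMode=yes "$h" hostname
done
```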
On every machine, create an export directory under the filesystem root, with data, servers, and software subdirectories:
mkdir -p /export/data
mkdir -p /export/servers
mkdir -p /export/software
1. Unpack
tar -zxvf hadoop-3.3.6.tar.gz -C /export/servers/
2. Configure environment variables
vi /etc/profile
# Append at the end of the file
export HADOOP_HOME=/export/servers/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Make the environment variables take effect
source /etc/profile
3. Verify
[root@centos137 servers]# hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T08:22Z
Compiled on platform linux-x86_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /export/servers/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6.jar
**Note: /etc/profile is only read by login shells, so after switching users re-run `source /etc/profile` (or log in again) before the variables take effect.**
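A quick check that the hadoop user actually sees the variables (a sketch, using the paths configured above):

```bash
# Should print the install path and the version banner
su - hadoop -c 'echo $HADOOP_HOME && hadoop version | head -1'
```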
2.2 Master Node Configuration
Go to the Hadoop configuration directory:
cd /export/servers/hadoop-3.3.6/etc/hadoop
Edit hadoop-env.sh
vim hadoop-env.sh
# Add JAVA_HOME
export JAVA_HOME=/export/servers/jdk
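If you are not sure where your JDK is installed (/export/servers/jdk is just this guide's layout), one way to locate it:

```bash
# Resolve the real JDK home behind the java binary on the PATH
readlink -f "$(which java)" | sed 's:/bin/java::'
```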
Edit workers
vim workers
Add:
centos137
centos138
centos139
Resulting content:
[hadoop@centos138 sbin]$ cat workers
centos137
centos138
centos139
Edit core-site.xml
vim core-site.xml
- Set the HDFS URI and the temporary directory
- Set the static user used for HDFS web UI login
- Add the following configuration:
<configuration>
  <!-- HDFS URI -->
  <property>
    <name>fs.defaultFS</name>
    <!-- NameNode address -->
    <value>hdfs://centos137:9000</value>
  </property>
  <!-- Temp directory; default: /tmp/hadoop-${user.name} -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/servers/hadoop-3.3.6/tmp</value>
  </property>
  <!-- Static user for HDFS web UI login -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
</configuration>
Edit hdfs-site.xml
vim hdfs-site.xml
- Set the HDFS replication factor
- Configure the Secondary NameNode address
- Add the following configuration:
<configuration>
  <!-- HDFS replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Secondary NameNode address -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>centos138:50090</value>
  </property>
</configuration>
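An optional sanity check: hdfs getconf parses the configuration on disk, so you can read back the values from both files without a running cluster.

```bash
# Print the effective values Hadoop will load
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey hadoop.tmp.dir
hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.namenode.secondary.http-address
```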
Edit mapred-site.xml
vim mapred-site.xml
- Set the framework MapReduce runs on: YARN here (the default is local)
- Set the history server addresses
Add the following configuration:
<configuration>
  <!-- Where MapReduce jobs run: yarn or local -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>centos138:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>centos138:19888</value>
  </property>
</configuration>
Edit yarn-site.xml
- Set the address of the YARN cluster manager (ResourceManager)
- Have MapReduce use the shuffle auxiliary service
- Enable log aggregation
- Set the log aggregation server address
- Set the log retention time to 7 days
vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>centos137</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Log server -->
  <property>
    <name>yarn.log.server.url</name>
    <value>http://centos138:19888/jobhistory/logs</value>
  </property>
  <!-- Log retention in seconds (7 days) -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
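Before moving on, Hadoop 3's conftest subcommand can confirm that all the XML files you just edited are well-formed (an optional check):

```bash
# Parses every config file under $HADOOP_CONF_DIR and reports XML errors
hadoop conftest
```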
Directory ownership
Give the hadoop user ownership of the installation tree:
chown -R hadoop:hadoop /export/
2.3 File Distribution
Distribute the master's installation and configuration to the worker nodes:
scp -r /export/servers centos138:/export
scp -r /export/servers centos139:/export
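A spot-check that the copy landed where expected (a sketch, relying on the passwordless SSH set up earlier):

```bash
# The directory listing and workers file should look identical on every worker
for h in centos138 centos139; do
  ssh "$h" 'ls -ld /export/servers/hadoop-3.3.6 && cat /export/servers/hadoop-3.3.6/etc/hadoop/workers'
done
```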
2.4 Starting the Hadoop Cluster
Format the NameNode
hdfs namenode -format
Formatting the NameNode generates a new cluster ID. If the DataNodes still hold an ID recorded from an earlier format, it will no longer match the NameNode's new cluster ID and the DataNodes will fail to register with the NameNode. So before reformatting, first delete the data directory and the logs on every node, then format; normally this is done only once, during initial setup.
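If a reformat ever becomes necessary, the cleanup might look like this (a sketch based on the paths used in this guide; it destroys all HDFS data):

```bash
# Run on EVERY node before reformatting; hadoop.tmp.dir and logs as configured above
rm -rf /export/servers/hadoop-3.3.6/tmp/*
rm -rf /export/servers/hadoop-3.3.6/logs/*
# Then, on the master only:
hdfs namenode -format
```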
Run on the master (centos137):
# Start the cluster
/export/servers/hadoop-3.3.6/sbin/start-all.sh
# Stop the cluster
/export/servers/hadoop-3.3.6/sbin/stop-all.sh
[hadoop@centos137 hadoop-3.3.6]$ /export/servers/hadoop-3.3.6/sbin/start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [centos137]
Starting datanodes
Starting secondary namenodes [centos138]
Starting resourcemanager
Starting nodemanagers
If errors show up during startup, check that the distributed directories are correct.
Start the history server (on the centos138 node):
mapred --daemon start historyserver
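To confirm it started:

```bash
# The JobHistoryServer process should now appear
jps | grep JobHistoryServer
```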
Alternatively, start HDFS and YARN separately:
# Start
start-dfs.sh
start-yarn.sh
# Stop
stop-dfs.sh
stop-yarn.sh
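With everything up, the examples jar bundled with the release makes a good end-to-end smoke test of HDFS, YARN, and the history server. (If it fails with "Could not find or load main class ... MRAppMaster", Hadoop 3.x setups commonly also need HADOOP_MAPRED_HOME set via yarn.app.mapreduce.am.env, mapreduce.map.env, and mapreduce.reduce.env in mapred-site.xml.)

```bash
# Estimate pi with 2 map tasks x 10 samples each
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 10
```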
2.5 Cluster Deployment Verification
- Run jps on every node to verify that the expected daemons are running.
centos137 roles: NN, RM, NM, DN
[hadoop@centos137 hadoop-3.3.6]$ jps
34082 ResourceManager
34228 NodeManager
33638 DataNode
33497 NameNode
37790 Jps
centos138 roles: SNN, DN, NM, HS
```bash
[hadoop@centos138 sbin]$ jps
32530 SecondaryNameNode
26679 JobHistoryServer
33321 Jps
32733 NodeManager
32383 DataNode
```
centos139 roles: NM, DN
[hadoop@centos139 hadoop]$ jps
51088 NodeManager
51685 Jps
50823 DataNode
Use the default port list above to access the web UIs.
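For this layout the main pages are the NameNode UI on centos137:9870, the ResourceManager UI on centos137:8088, and the JobHistoryServer UI on centos138:19888; a quick reachability probe:

```bash
# Expect an HTTP 200 (or a redirect) from each web UI
curl -s -o /dev/null -w 'NameNode        %{http_code}\n' http://centos137:9870
curl -s -o /dev/null -w 'ResourceManager %{http_code}\n' http://centos137:8088
curl -s -o /dev/null -w 'JobHistory      %{http_code}\n' http://centos138:19888
```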