

Hadoop Cluster

Deployment Preparation

All of the cluster's web UIs are bound to 127.0.0.1, so they cannot be reached from outside. If external access is needed, expose them through an nginx reverse proxy with authentication added. If this restriction is not wanted, replace the bound IP addresses accordingly.
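A minimal sketch of such a proxy, assuming nginx with htpasswd-based basic auth; the server name, htpasswd path, and upstream port (9870, the NameNode UI port used in hdfs-site.xml below) are placeholders to adjust for your environment:

# Hypothetical nginx server block: basic auth in front of the loopback-bound NameNode UI
server {
    listen 80;
    server_name hadoop.example.com;               # assumed external hostname

    location / {
        auth_basic           "Hadoop Web UI";
        auth_basic_user_file /etc/nginx/htpasswd; # create with: htpasswd -c /etc/nginx/htpasswd admin
        proxy_pass           http://127.0.0.1:9870;  # NameNode UI bound to loopback
        proxy_set_header     Host $host;
    }
}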

Host Configuration

Set a hostname on each node, and either edit /etc/hosts or add hostname-to-IP mappings on a DNS server.

# Name the three nodes bigdata1, bigdata2, and bigdata3 (run each command on its own node)
hostnamectl set-hostname bigdata1
hostnamectl set-hostname bigdata2
hostnamectl set-hostname bigdata3
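A matching /etc/hosts on every node might then look like this (the addresses are placeholders; substitute your real IPs):

192.168.1.101 bigdata1
192.168.1.102 bigdata2
192.168.1.103 bigdata3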

User Configuration

groupadd hadoop
useradd hadoop
useradd hdfs
useradd hive
useradd yarn
# Add each service user to the hadoop group
usermod -a -G hadoop hadoop
usermod -a -G hadoop hdfs
usermod -a -G hadoop hive
usermod -a -G hadoop yarn

Passwordless SSH Configuration

root

# Generate a key pair
ssh-keygen
# Copy the public key to the target host to enable passwordless login
ssh-copy-id root@bigdata1
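The sshfence fencing method configured below, as well as the cluster scripts, need passwordless SSH between all nodes, so repeat this for every host. A compact sketch:

# Run on each node so every host can reach all others without a password
for host in bigdata1 bigdata2 bigdata3; do
    ssh-copy-id root@$host
done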

Regular users

Regular users need to be granted root (sudo) privileges.

# Give a regular user (nhk here) root privileges, so sudo can later be used
# to run commands that require root
vim /etc/sudoers

# Add a line below the %wheel line (at around line 100):

## Allow root to run any commands anywhere
root    ALL=(ALL)       ALL

## Allows members of the 'sys' group to run networking, software,
## service management apps and more.
# %sys ALL = NETWORKING, SOFTWARE, SERVICES, STORAGE, DELEGATING, PROCESSES, LOCATE, DRIVERS

## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)       ALL

## Same thing without a password
# %wheel        ALL=(ALL)       NOPASSWD: ALL
nhk ALL=(ALL) NOPASSWD: ALL

Java Configuration

Configure the JAVA_HOME environment variable.
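For example, in /etc/profile (the /usr/jdk path matches the hadoop-env.sh example later in this guide; adjust it to your actual JDK location):

# Path is an assumption, point it at your JDK installation
export JAVA_HOME=/usr/jdk
export PATH=$PATH:$JAVA_HOME/bin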

Installation

ZooKeeper Preparation

This guide assumes ZooKeeper is already installed; see the ZooKeeper installation manual for details.

Hadoop Package Preparation

Download the Hadoop tarball from the official site and extract it into the software installation directory.
Apache Hadoop download page
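A minimal sketch, assuming version 3.3.6 (matching the environment variables below) and /usr/bigdata as the install directory; the mirror URL is an assumption, pick one from the download page:

# Download and unpack Hadoop 3.3.6 into the software directory
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -zxvf hadoop-3.3.6.tar.gz -C /usr/bigdata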

Environment Variable Preparation

export HADOOP_HOME=/usr/bigdata/hadoop-3.3.6
export HIVE_HOME=/usr/bigdata/hive-3.1.3
export ZK_HOME=/usr/bigdata/apache-zookeeper-3.9.2-bin
export KAFKA_HOME=/usr/bigdata/kafka_2.12-3.7.0
export HADOOP_CONF_DIR=/usr/bigdata/hadoop-3.3.6/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZK_HOME/bin:$KAFKA_HOME/bin

Configuration File Changes

NameNode Configuration (core-site.xml)

Path: Hadoop install directory/etc/hadoop

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Logical HDFS nameservice; with HA enabled this must be the nameservice
         name from hdfs-site.xml, with no port -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata</value>
    </property>
    <!-- Base directory for Hadoop data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/bigdata/hadoop/hdfs/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.users</name>
        <value>*</value>
    </property>
    <!-- ZooKeeper quorum used for NameNode HA election -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>bigdata1:2181,bigdata2:2181,bigdata3:2181</value>
    </property>
</configuration>

HDFS Cluster Configuration (hdfs-site.xml)

Path: Hadoop install directory/etc/hadoop

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- NameNode and DataNode working directories (data storage) -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/bigdata/hadoop/hdfs/name_node</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/bigdata/hadoop/hdfs/data_node</value>
    </property>
    <!-- Replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- Enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <!-- HDFS nameservice name; must match the authority in fs.defaultFS in core-site.xml -->
    <property>
        <name>dfs.nameservices</name>
        <value>bigdata</value>
    </property>
    <!-- The bigdata nameservice has two NameNodes, nn1 and nn2 -->
    <property>
        <name>dfs.ha.namenodes.bigdata</name>
        <value>nn1,nn2</value>
    </property>
    <!-- RPC address of nn1 -->
    <property>
        <name>dfs.namenode.rpc-address.bigdata.nn1</name>
        <value>bigdata1:8020</value>
    </property>
    <!-- HTTP address of nn1 -->
    <property>
        <name>dfs.namenode.http-address.bigdata.nn1</name>
        <value>127.0.0.1:9870</value>
    </property>
    <!-- RPC address of nn2 -->
    <property>
        <name>dfs.namenode.rpc-address.bigdata.nn2</name>
        <value>bigdata2:8020</value>
    </property>
    <!-- HTTP address of nn2 -->
    <property>
        <name>dfs.namenode.http-address.bigdata.nn2</name>
        <value>127.0.0.1:9870</value>
    </property>
    <!-- Where the NameNode edits metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://bigdata1:8485;bigdata2:8485;bigdata3:8485/bigdata</value>
    </property>
    <!-- Where each JournalNode stores its data on local disk -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/bigdata/hadoop/hdfs/journaldata</value>
    </property>
    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- Class that performs client-side failover when the active NameNode fails -->
    <property>
        <name>dfs.client.failover.proxy.provider.bigdata</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing method -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- sshfence requires passwordless SSH; private key to use -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
    <!-- sshfence connection timeout (ms) -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
    <property>
        <name>dfs.journalnode.http-bind-host</name>
        <value>127.0.0.1</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>127.0.0.1:9864</value>
    </property>
</configuration>
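The daemons generally create these directories themselves, but creating them up front avoids permission surprises; a sketch, assuming the same layout on each host:

# Run on every node; paths match the hdfs-site.xml and core-site.xml values above
mkdir -p /home/bigdata/hadoop/hdfs/{name_node,data_node,journaldata,tmp}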

YARN Cluster Configuration (yarn-site.xml)

Path: Hadoop install directory/etc/hadoop

<?xml version="1.0"?>
<configuration>
    <!-- Run the MapReduce shuffle as an auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager host -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata2</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log aggregation server address -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://bigdata1:19888/jobhistory/logs</value>
    </property>
    <!-- Retain aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>127.0.0.1:8042</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>127.0.0.1:8088</value>
    </property>
    <property>
        <name>yarn.timeline-service.webapp.address</name>
        <value>127.0.0.1:8188</value>
    </property>
</configuration>

MapReduce Job Configuration (mapred-site.xml)

Path: Hadoop install directory/etc/hadoop

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>bigdata1:10020</value>
    </property>
    <!-- JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>127.0.0.1:19888</value>
    </property>
</configuration>

Hadoop Cluster Environment Variables (hadoop-env.sh)

# If the SSH port is not 22, set the port the cluster scripts use for passwordless operations
export HADOOP_SSH_OPTS="-p 10022"
# Users that each service starts as
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
# Fill in according to your actual environment
export JAVA_HOME="/usr/jdk"

Distribute the Configuration to All Nodes

# Copy the extracted Hadoop directory to every other machine (repeat for each node)
scp -r hadoop-3.3.6 root@bigdata2:$PWD
scp -r hadoop-3.3.6 root@bigdata3:$PWD

Starting the Cluster

Start the JournalNode Cluster

# Run on each JournalNode host (bigdata1-3, per dfs.namenode.shared.edits.dir)
hdfs --daemon start journalnode
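To confirm each JournalNode is up before formatting:

# The JournalNode process should appear on every journal host
jps | grep JournalNode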

Format the NameNode

Run the format operation on the nn1 node; it can only succeed once the JournalNode cluster has started and is serving requests.

On nn1:

# Initialize the NameNode metadata
hdfs namenode -format
# Initialize the HA state in ZooKeeper
hdfs zkfc -formatZK

On nn2 (bootstrapStandby copies the metadata from nn1 over RPC, so the freshly formatted NameNode on nn1 must already be running; start it first with hdfs --daemon start namenode):
# Copy the primary NameNode's initialized metadata to this standby
hdfs namenode -bootstrapStandby

Start the NameNodes and ZKFC

Start the NameNode on the nn1 and nn2 nodes:

# Start the NameNode
hdfs --daemon start namenode
# Start ZKFC; the node whose ZKFC starts first becomes the active NameNode
hdfs --daemon start zkfc

If you open the NameNode web UI and every NameNode reports standby, you can manually force one to become active.
Force a NameNode transition (this bypasses fencing and may cause split-brain):
hdfs haadmin -transitionToActive --forcemanual nn2
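To check the current HA state of each NameNode:

# Prints "active" or "standby" for each NameNode ID
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2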

Start the DataNodes

Start the DataNode on all nodes:

# Start the DataNode
hdfs --daemon start datanode
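Once the DataNodes are up, verify that all of them registered with the NameNode:

# Lists live DataNodes and their capacity
hdfs dfsadmin -report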

Start the ResourceManager

Start the ResourceManager on the nn2 node:

# Start the ResourceManager
yarn --daemon start resourcemanager

Start the NodeManagers

Start the NodeManager on all nodes:

# Start the NodeManager
yarn --daemon start nodemanager
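To verify that every NodeManager registered with the ResourceManager:

# Shows each NodeManager and its state (should be RUNNING)
yarn node -list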

Start the Timeline Server

Start the timeline server on nn1:

# Start the timeline server
yarn --daemon start timelineserver

Start the JobHistory Server

Start the JobHistory server on nn1:

# Start the JobHistory server
mapred --daemon start historyserver
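As a final smoke test, you can run the bundled example job (a sketch; the jar path assumes Hadoop 3.3.6 under $HADOOP_HOME):

# Estimates pi with 2 mappers x 10 samples; exercises HDFS, YARN, and the shuffle
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar pi 2 10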

Reproduced from: https://blog.csdn.net/codeforces/article/details/136875638
Copyright belongs to the original author, codeforces.
