Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suited to applications with very large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the open-source Apache Nutch project and is now part of the Apache Hadoop project (Hadoop itself started out as a Lucene subproject).
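For orientation, once the cluster built below is running, data in HDFS is read and written through the hdfs dfs command-line client (or the equivalent Java API); a minimal illustrative sketch, with arbitrary file and path names:
# Upload a local file into HDFS, then list and stream it back
hdfs dfs -mkdir -p /demo
hdfs dfs -put data.txt /demo/
hdfs dfs -ls /demo
hdfs dfs -cat /demo/data.txt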
Official website
https://hadoop.apache.org/
Software preparation
- hadoop-3.3.4.tar.gz
Download: https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
- jdk-8u361-linux-x64.tar.gz
Download: https://share.weiyun.com/uwm6F1la
Environment
master: 192.168.199.201
slave0: 192.168.199.202
slave1: 192.168.199.203
Disable the firewall and its start on boot
systemctl disable firewalld --now
Disable SELinux
# Permanent: takes effect after a reboot
sed -i 's/=enforcing/=disabled/g' /etc/selinux/config
# Temporary: takes effect immediately
setenforce 0
Set the hostnames
# IP:192.168.199.201
hostnamectl set-hostname master
# IP:192.168.199.202
hostnamectl set-hostname slave0
# IP:192.168.199.203
hostnamectl set-hostname slave1
Edit the hosts file (on all three nodes)
cat >> /etc/hosts <<EOF
192.168.199.201 master
192.168.199.202 slave0
192.168.199.203 slave1
EOF
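Optionally confirm that every hostname now resolves (a quick illustrative check, not in the original steps):
# Each host should answer from its 192.168.199.x address
for h in master slave0 slave1; do ping -c 1 $h; done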
Configure passwordless SSH login
- Generate the SSH key pair
ssh-keygen -t rsa
- Copy the public key to master (enter the password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
- Copy the public key to slave0 (enter the password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave0
- Copy the public key to slave1 (enter the password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave1
- On master, run each of the following commands in turn (type exit to leave each session); if none of them asks for a password, passwordless login is configured correctly
ssh master
ssh slave0
ssh slave1
Install the JDK
- Create the java directory
mkdir /usr/local/java
cd /usr/local/java
- Upload the prepared jdk-8u361-linux-x64.tar.gz to this directory and extract it
tar xzf jdk-8u361-linux-x64.tar.gz
- Configure the environment variables
echo "export JAVA_HOME=/usr/local/java/jdk1.8.0_361" >> /root/.bash_profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /root/.bash_profile
source /root/.bash_profile
- Verify that the variables took effect
[root@master ~]# java -version
java version "1.8.0_361"
Java(TM) SE Runtime Environment (build 1.8.0_361-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.361-b10, mixed mode)
- Copy the JDK and .bash_profile to slave0 and slave1
scp -r /usr/local/java root@slave0:/usr/local
scp -r /root/.bash_profile root@slave0:/root
ssh root@slave0 "source /root/.bash_profile"
scp -r /usr/local/java root@slave1:/usr/local
scp -r /root/.bash_profile root@slave1:/root
ssh root@slave1 "source /root/.bash_profile"
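- Optionally confirm the JDK works on both slaves (an illustrative check; the explicit source is needed because a non-interactive SSH session does not read .bash_profile)
ssh root@slave0 "source /root/.bash_profile && java -version"
ssh root@slave1 "source /root/.bash_profile && java -version"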
Hadoop installation and environment configuration
1. Upload hadoop-3.3.4.tar.gz to /opt, extract it, and change its owner and group
cd /opt/
tar xzf hadoop-3.3.4.tar.gz
mv hadoop-3.3.4 hadoop
chown -R root:root hadoop
2. Create the data directories
mkdir -p /opt/hadoop/{tmp,hdfs/{name,data}}
3. Configure hadoop-env.sh
sed -i 's@# export JAVA_HOME=@export JAVA_HOME=\/usr\/local\/java\/jdk1.8.0_361\/@g' /opt/hadoop/etc/hadoop/hadoop-env.sh
grep JAVA_HOME= /opt/hadoop/etc/hadoop/hadoop-env.sh
4. Configure core-site.xml
vim /opt/hadoop/etc/hadoop/core-site.xml
# Add the following between <configuration> and </configuration>:
<!-- File system URI used by Hadoop, i.e. the address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<!-- Storage directory for files Hadoop generates at runtime; the default is /tmp/hadoop-${user.name} -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/tmp</value>
</property>
5. Configure hdfs-site.xml
vim /opt/hadoop/etc/hadoop/hdfs-site.xml
# Add the following between <configuration> and </configuration>:
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Number of replicas stored for each HDFS block</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/hdfs/name</value>
<description>Directory for NameNode metadata; several separate directories are usually configured to protect the metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/hdfs/data</value>
<description>Data storage directory for the DataNode</description>
</property>
- Address of the node running the SecondaryNameNode; in production it is usually placed on a node other than the NameNode, here it stays on master
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
<description>Node and HTTP port of the SecondaryNameNode (dfs.secondary.http.address is the old, deprecated name for this key)</description>
</property>
6. Configure yarn-site.xml
vim /opt/hadoop/etc/hadoop/yarn-site.xml
# Add the following between <configuration> and </configuration>:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Shuffle service that YARN provides for MapReduce programs</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:18141</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:18088</value>
</property>
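<!-- Disable the NodeManager physical/virtual memory checks (a common setting on small test VMs so containers are not killed for exceeding memory limits) -->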
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
7. Configure mapred-site.xml
vim /opt/hadoop/etc/hadoop/mapred-site.xml
# Add the following between <configuration> and </configuration>:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
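On a Hadoop 3.x tarball install, MapReduce jobs submitted to YARN also need to find the MapReduce jars. If example jobs later fail because the MRAppMaster class cannot be found, adding the following property to the same file is the usual fix (the paths assume the /opt/hadoop layout used here; the official guide expresses them via $HADOOP_MAPRED_HOME instead):
<property>
<name>mapreduce.application.classpath</name>
<value>/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/mapreduce/lib/*</value>
<description>Classpath for MapReduce applications running on YARN</description>
</property>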
8. Configure workers
- Before Hadoop 3.0.0 this file was named slaves; listing master here means a DataNode and NodeManager also run on the master node
cat > /opt/hadoop/etc/hadoop/workers <<EOF
master
slave0
slave1
EOF
9. Configure the Hadoop environment variables (the *_USER variables allow the start/stop scripts to run the daemons as root)
echo "export HADOOP_HOME=/opt/hadoop" >> /root/.bash_profile
echo "export PATH=\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$PATH" >> /root/.bash_profile
echo "export HDFS_NAMENODE_USER=root" >> /root/.bash_profile
echo "export HDFS_DATANODE_USER=root" >> /root/.bash_profile
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> /root/.bash_profile
echo "export YARN_RESOURCEMANAGER_USER=root" >> /root/.bash_profile
echo "export YARN_NODEMANAGER_USER=root" >> /root/.bash_profile
10. Copy the Hadoop directory and .bash_profile to slave0 and slave1
scp -r /opt/hadoop root@slave0:/opt
scp -r /root/.bash_profile root@slave0:/root
ssh root@slave0 "source /root/.bash_profile"
scp -r /opt/hadoop root@slave1:/opt
scp -r /root/.bash_profile root@slave1:/root
ssh root@slave1 "source /root/.bash_profile"
11. Format the file system
- Run this on master only, and only once
source /root/.bash_profile
# hadoop namenode -format also works but is marked deprecated in Hadoop 3
hdfs namenode -format
12. Start Hadoop
[root@master ~]# start-all.sh
Starting namenodes on [master]
Last login: Tue Oct 11 23:18:57 CST 2022 from master on pts/1
Starting datanodes
Last login: Tue Oct 11 23:53:33 CST 2022 on pts/0
slave0: WARNING: /opt/hadoop/logs does not exist. Creating.
slave1: WARNING: /opt/hadoop/logs does not exist. Creating.
Starting secondary namenodes [master]
Last login: Tue Oct 11 23:53:35 CST 2022 on pts/0
Starting resourcemanager
Last login: Tue Oct 11 23:53:44 CST 2022 on pts/0
Starting nodemanagers
Last login: Tue Oct 11 23:54:16 CST 2022 on pts/0
[root@master ~]# jps
2631 SecondaryNameNode
2935 ResourceManager
2280 NameNode
2424 DataNode
3067 NodeManager
3619 Jps
[root@master ~]# ssh slave0 "/usr/local/java/jdk1.8.0_361/bin/jps"
1795 DataNode
1908 NodeManager
2015 Jps
[root@master ~]# ssh slave1 "/usr/local/java/jdk1.8.0_361/bin/jps"
1747 DataNode
1862 NodeManager
1965 Jps
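Before shutting anything down, a quick HDFS smoke test can confirm the cluster works end to end (an illustrative check, not part of the original steps; file and path names are arbitrary):
# Show live DataNodes and overall capacity
hdfs dfsadmin -report
# Write a small file into HDFS and read it back
echo "hello hdfs" > /tmp/test.txt
hdfs dfs -mkdir -p /input
hdfs dfs -put /tmp/test.txt /input/
hdfs dfs -cat /input/test.txt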
13. Stop Hadoop
stop-all.sh
14. Start and stop the MapReduce JobHistory Server
mapred --daemon start historyserver
mapred --daemon stop historyserver
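To give the history server something to record, one of the example jobs bundled with Hadoop 3.3.4 can be run (illustrative; it assumes the optional mapreduce.application.classpath addition from step 7 is in place, otherwise the application master may fail to start):
# Estimate pi with 2 map tasks and 10 samples each; once finished, the job appears in the JobHistory web UI
hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 10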
15. Web Interfaces
Once the Hadoop cluster is up and running, check the web UIs of the components as described below:
- NameNode: http://master:9870/ (default HTTP port is 9870)
- ResourceManager: http://master:18088/ (the default HTTP port is 8088, but this deployment sets yarn.resourcemanager.webapp.address to 18088)
- MapReduce JobHistory Server: http://master:19888/ (default HTTP port is 19888)
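A quick reachability check from the command line (optional; opening the URLs in a browser works just as well):
# Each request should return an HTTP status line such as 200 OK or a redirect
curl -sI http://master:9870/ | head -n 1
curl -sI http://master:18088/ | head -n 1
curl -sI http://master:19888/ | head -n 1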
That completes the configuration. If all of the commands above run as shown and the web interfaces are reachable, the Hadoop deployment has succeeded.