

Setting Up Hadoop on CentOS 7

Introduction

The Hadoop Distributed File System, HDFS for short, is a distributed file system. HDFS is highly fault-tolerant and is designed to run on low-cost hardware. It provides high-throughput access to application data, which makes it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed in a streaming fashion. HDFS was originally built as the storage infrastructure for Nutch, an open-source Apache project; it is now part of the Hadoop project, which itself began as a Lucene subproject.

Official website

https://hadoop.apache.org/

Software preparation

  • hadoop-3.3.4.tar.gz

Download: https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz

  • jdk-8u361-linux-x64.tar.gz

Download: https://share.weiyun.com/uwm6F1la

Environment list

master   192.168.199.201
slave0   192.168.199.202
slave1   192.168.199.203

Stop the firewall and disable it at boot

systemctl disable firewalld --now

Disable SELinux

# permanent: takes effect after a reboot
sed -i 's/=enforcing/=disabled/g' /etc/selinux/config
# temporary: takes effect immediately
setenforce 0
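
To verify, getenforce should now report Permissive (and Disabled after a reboot with the updated config):

# check the current enforcement mode and the persistent setting
getenforce
grep '^SELINUX=' /etc/selinux/config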

Set the hostnames

# IP:192.168.199.201
hostnamectl set-hostname master

# IP:192.168.199.202
hostnamectl set-hostname slave0

# IP:192.168.199.203
hostnamectl set-hostname slave1

Update the hosts file

cat >> /etc/hosts <<EOF
192.168.199.201 master
192.168.199.202 slave0
192.168.199.203 slave1
EOF
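
A quick way to confirm that all three names resolve (this assumes the same entries were appended on every node):

# one ping per node; each should answer from the expected IP
for h in master slave0 slave1; do ping -c 1 $h; done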

Configure passwordless SSH login

  • Generate the key pair
ssh-keygen -t rsa
  • Copy the public key to master (enter the password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
  • Copy the public key to slave0 (enter the password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave0
  • Copy the public key to slave1 (enter the password when prompted)
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave1
  • On master, run the commands below; if none of them asks for a password, the setup succeeded (a loop-based check is sketched after this list)
ssh master
ssh slave0
ssh slave1
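
As an additional check, the loop below asks every node for its hostname; BatchMode makes ssh fail instead of prompting, so any remaining password prompt shows up as an error:

# should print master, slave0 and slave1 without any password prompt
for h in master slave0 slave1; do ssh -o BatchMode=yes root@$h hostname; done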

Install the JDK

  • Create the java directory
mkdir /usr/local/java
cd /usr/local/java
  • Upload the prepared jdk-8u361-linux-x64.tar.gz to this directory and extract it
tar xzf jdk-8u361-linux-x64.tar.gz
  • Configure the environment variables
echo "export JAVA_HOME=/usr/local/java/jdk1.8.0_361" >> /root/.bash_profile
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /root/.bash_profile
source /root/.bash_profile
  • Verify that the variables took effect
[root@master ~]# java -version
java version "1.8.0_361"
Java(TM) SE Runtime Environment (build 1.8.0_361-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.361-b10, mixed mode)
  • Copy the JDK and .bash_profile to slave0 and slave1
scp -r /usr/local/java root@slave0:/usr/local
scp -r /root/.bash_profile root@slave0:/root
ssh root@slave0 "source /root/.bash_profile"

scp -r /usr/local/java root@slave1:/usr/local
scp -r /root/.bash_profile root@slave1:/root
ssh root@slave1 "source /root/.bash_profile"
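
To confirm the JDK works on both slaves, call the copied binary by its full path (the copied .bash_profile is only read on the next login, so $JAVA_HOME may not be set in a one-off ssh command):

# both nodes should report java version "1.8.0_361"
for h in slave0 slave1; do ssh root@$h "/usr/local/java/jdk1.8.0_361/bin/java -version"; done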

Hadoop installation and environment configuration

1. Upload hadoop-3.3.4.tar.gz to /opt, extract it, and change its owner and group

cd /opt/
tar xzf hadoop-3.3.4.tar.gz
mv hadoop-3.3.4 hadoop

chown -R root:root hadoop

2. Create the data directories

mkdir -p /opt/hadoop/{tmp,hdfs/{name,data}}

3. Configure hadoop-env.sh

sed -i 's@# export JAVA_HOME=@export JAVA_HOME=/usr/local/java/jdk1.8.0_361/@g' /opt/hadoop/etc/hadoop/hadoop-env.sh
grep JAVA_HOME= /opt/hadoop/etc/hadoop/hadoop-env.sh

4. Configure core-site.xml

vim /opt/hadoop/etc/hadoop/core-site.xml

# Add the following between the <configuration> tags:

<!-- File system URI used by Hadoop, i.e. the address of the NameNode (the HDFS master) -->
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
</property>
<!-- Base directory for files Hadoop generates at runtime; the default is /tmp/hadoop-${user.name} -->
<property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop/tmp</value>
</property>
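
Once this file is saved, one way to confirm the value is picked up is hdfs getconf; calling the binary by its full path works even before the PATH changes of step 9 are in place:

# should print: hdfs://master:9000
/opt/hadoop/bin/hdfs getconf -confKey fs.defaultFS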

5. Configure hdfs-site.xml

vim /opt/hadoop/etc/hadoop/hdfs-site.xml

# Add the following between the <configuration> tags:

<property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Number of replicas stored for each HDFS block</description>
</property>
<property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/hdfs/name</value>
        <description>NameNode metadata directory; several different directories are usually configured to keep the metadata safe</description>
</property>
<property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/hdfs/data</value>
        <description>DataNode data storage directory</description>
</property>
  • Address of the node that runs the secondary namenode; where possible, choose a node other than the namenode (in this small cluster it stays on master)
<property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
        <description>Node that runs the secondary namenode, ideally a different node from the namenode (dfs.secondary.http.address is the old, deprecated name of this property)</description>
</property>

6. Configure yarn-site.xml

vim /opt/hadoop/etc/hadoop/yarn-site.xml

# Add the following between the <configuration> tags:

<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that the YARN cluster provides to MapReduce programs</description>
</property>
<property>
        <name>yarn.resourcemanager.address</name>
        <value>master:18040</value>
</property>
<property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:18030</value>
</property>
<property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:18025</value>
</property>
<property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:18141</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:18088</value>
</property>
<property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
</property>
<property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
</property>

7. Configure mapred-site.xml

vim /opt/hadoop/etc/hadoop/mapred-site.xml

# Add the following between the <configuration> tags:
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>
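
On Hadoop 3.x, MapReduce jobs submitted to YARN sometimes fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" when the MapReduce classpath is not set. If that happens, the upstream single-node guide adds a property along the following lines; it is sketched here with the /opt/hadoop install path used in this guide (the official docs express the same paths via $HADOOP_MAPRED_HOME):

<property>
        <name>mapreduce.application.classpath</name>
        <value>/opt/hadoop/share/hadoop/mapreduce/*:/opt/hadoop/share/hadoop/mapreduce/lib/*</value>
</property>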

8. Configure workers

  • Before Hadoop 3.0.0 this file was named slaves
cat > /opt/hadoop/etc/hadoop/workers <<EOF 
master
slave0
slave1
EOF

9. Configure the Hadoop environment variables

echo "export HADOOP_HOME=/opt/hadoop" >> /root/.bash_profile
echo "export PATH=\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$PATH" >> /root/.bash_profile
echo "export HDFS_NAMENODE_USER=root" >> /root/.bash_profile
echo "export HDFS_DATANODE_USER=root" >> /root/.bash_profile
echo "export HDFS_SECONDARYNAMENODE_USER=root" >> /root/.bash_profile
echo "export YARN_RESOURCEMANAGER_USER=root" >> /root/.bash_profile
echo "export YARN_NODEMANAGER_USER=root" >> /root/.bash_profile

10. Copy the Hadoop directory and profile to slave0 and slave1

scp -r /opt/hadoop root@slave0:/opt
scp -r /root/.bash_profile root@slave0:/root
ssh root@slave0 "source /root/.bash_profile"
ssh root@slave0 "/bin/bash /opt/hadoop/useradd.sh"

scp -r /opt/hadoop root@slave1:/opt
scp -r /root/.bash_profile root@slave1:/root
ssh root@slave1 "source /root/.bash_profile"
ssh root@slave1 "/bin/bash /opt/hadoop/useradd.sh"

Note: useradd.sh is not created anywhere in this guide; if the script does not exist under /opt/hadoop, the two lines that call it can simply be skipped.

11. Format the file system

  • Run this only on master, and only once
source /root/.bash_profile
hdfs namenode -format
  • The older form hadoop namenode -format still works but prints a deprecation warning

12. Start Hadoop

[root@master ~]# start-all.sh
Starting namenodes on [master]
Last login: Tue Oct 11 23:18:57 CST 2022 from master on pts/1
Starting datanodes
Last login: Tue Oct 11 23:53:33 CST 2022 on pts/0
slave0: WARNING: /opt/hadoop/logs does not exist. Creating.
slave1: WARNING: /opt/hadoop/logs does not exist. Creating.
Starting secondary namenodes [master]
Last login: Tue Oct 11 23:53:35 CST 2022 on pts/0
Starting resourcemanager
Last login: Tue Oct 11 23:53:44 CST 2022 on pts/0
Starting nodemanagers
Last login: Tue Oct 11 23:54:16 CST 2022 on pts/0
[root@master ~]# jps
2631 SecondaryNameNode
2935 ResourceManager
2280 NameNode
2424 DataNode
3067 NodeManager
3619 Jps
[root@master ~]# ssh slave0 "/usr/local/java/jdk1.8.0_361/bin/jps"
1795 DataNode
1908 NodeManager
2015 Jps
[root@master ~]# ssh slave1 "/usr/local/java/jdk1.8.0_361/bin/jps"
1747 DataNode
1862 NodeManager
1965 Jps
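
With all daemons up, a small HDFS smoke test (the directory and file names below are arbitrary examples):

# should report three live datanodes
hdfs dfsadmin -report | grep -i "live datanodes"
# write a file into HDFS and list it back
hdfs dfs -mkdir -p /user/root
hdfs dfs -put /etc/hosts /user/root/
hdfs dfs -ls /user/root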

13. Stop Hadoop

stop-all.sh

14. Starting and stopping the historyserver

mapred --daemon start historyserver
mapred --daemon stop historyserver
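
If the history server started successfully, jps on master additionally shows a JobHistoryServer process, and the JobHistory web UI listed in section 15 becomes reachable:

jps | grep JobHistoryServer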

15. Web Interfaces

Once the Hadoop cluster is up and running, check the web UIs of the components as described below:

Daemon                         Web Interface            Notes
NameNode                       http://nn_host:port/     Default HTTP port is 9870.
ResourceManager                http://rm_host:port/     Default HTTP port is 8088.
MapReduce JobHistory Server    http://jhs_host:port/    Default HTTP port is 19888.

For this cluster (the ResourceManager web port was changed to 18088 in yarn-site.xml):

http://192.168.199.201:9870

http://192.168.199.201:18088

http://192.168.199.201:19888

That completes the configuration. If the commands above all run and the web pages load as shown, Hadoop has been deployed successfully!
