

Hadoop High Availability Installation

Hadoop HA depends on ZooKeeper, which must be installed in advance.
My ZooKeeper ensemble is installed on datanode1, datanode2, and datanode3.
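(My addition, not part of the original steps.) To confirm the ensemble is up before continuing, you can run the following on each ZooKeeper node, assuming zkServer.sh is on the PATH:

# Should report Mode: follower on two nodes and Mode: leader on one
zkServer.sh status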

1. Create the user and group

groupadd hadoop
useradd hadoop -g hadoop
passwd hadoop
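
These commands presumably need to be run on every node. To confirm the user and group were created (my addition):

# Expect output along the lines of: uid=...(hadoop) gid=...(hadoop) groups=...(hadoop)
id hadoop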

2. Configure hosts

vim /etc/hosts
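The original shows the hosts entries in a screenshot. A sketch with one line per host used in this guide; all of the IPs are placeholders (192.168.1.100 is simply reused from the web-UI example at the end of this post), so adjust them to your own network:

192.168.1.100 namenode1
192.168.1.101 namenode2
192.168.1.102 datanode1
192.168.1.103 datanode2
192.168.1.104 datanode3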

3. Configure passwordless SSH login

On the NameNode machine, switch to the hadoop user.
Run ssh-keygen -t rsa
and press Enter three times to accept the defaults.
Copy the key to the other machines with ssh-copy-id <username>@<target server IP>:
ssh-copy-id hadoop@namenode2
Type yes, then enter the hadoop user password you set earlier.
Then copy the key to the remaining nodes in turn (including the local machine):
ssh-copy-id hadoop@datanode1
ssh-copy-id hadoop@datanode2
ssh-copy-id hadoop@datanode3

Running all of these on the NameNode machine is enough; finish by ssh-ing into one of the nodes to verify that no password is prompted.
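A quick loop to test all of them at once (my sketch; each command should print the remote hostname without asking for a password):

for h in namenode2 datanode1 datanode2 datanode3; do ssh hadoop@$h hostname; done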

4. Upload and extract the Hadoop package

Download address for the installation package:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
After uploading, extract it:
tar -zxvf hadoop-2.10.0.tar.gz
If the extracted directory is not owned by the hadoop user, change the ownership:
chown -R hadoop:hadoop hadoop-2.10.0
Switch to the root user and configure the Hadoop environment variables: vim /etc/profile
If you stay with the hadoop user instead, edit the hidden file in the user's home directory: vim .bash_profile
Go to the end of the file and append the environment variables there.
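The exact lines are shown in the original screenshots; a minimal sketch, assuming Hadoop is extracted under /opt/hadoop-2.10.0 (the path used in step 5):

export HADOOP_HOME=/opt/hadoop-2.10.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

After editing, apply the change with source /etc/profile (or source .bash_profile).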

5. Edit the configuration files

Four files need to be modified: hdfs-site.xml, core-site.xml, yarn-site.xml, and mapred-site.xml.
Change into the directory: cd /opt/hadoop-2.10.0/etc/hadoop

5.1 mapred-site.xml

<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value><description>Run the MapReduce framework on YARN</description></property>
  <property><name>mapred.compress.map.output</name><value>true</value><description>Enable map output compression</description></property>
  <property><name>mapred.map.output.compression.codec</name><value>com.hadoop.compression.lzo.LzoCodec</value><description>Use LZO as the map output compression codec</description></property>
  <property><name>mapred.child.env</name><value>LD_LIBRARY_PATH=/usr/local/lib</value><description>Directory of the LZO native libraries</description></property>
  <property><name>mapreduce.jobtracker.hosts.exclude.filename</name><value>/home/hadoop/optional/exclude_node</value><description>List of nodes to exclude</description></property>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>8</value><description>Maximum number of map tasks each TaskTracker can run concurrently</description></property>
  <property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>5</value><description>Maximum number of reduce tasks each TaskTracker can run concurrently</description></property>
</configuration>

5.2 core-site.xml

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://ns1</value><description>Logical name (nameservice) of the NameNode cluster</description></property>
  <property><name>hadoop.tmp.dir</name><value>/home/hadoop/datas/hdfs</value><description>Default base directory for files Hadoop generates at runtime</description></property>
  <property><name>io.file.buffer.size</name><value>131072</value></property>
  <property><name>ha.zookeeper.quorum</name><value>datanode1:2181,datanode2:2181,datanode3:2181</value><description>ZooKeeper ensemble address</description></property>
  <property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value><description>Enable the various compression codecs</description></property>
  <property><name>io.compression.codec.lzo.class</name><value>com.hadoop.compression.lzo.LzoCodec</value><description>Implementation class for the LZO codec</description></property>
  <property><name>hadoop.proxyuser.hadoop.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hadoop.groups</name><value>*</value></property>
</configuration>

5.3 yarn-site.xml

<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value><description>Enable ResourceManager HA</description></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>yrc</value><description>Logical id of the ResourceManager cluster</description></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value><description>Logical ids of the individual ResourceManagers</description></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>namenode1</value><description>Host of rm1</description></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>namenode2</value><description>Host of rm2</description></property>
  <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value><description>How RM state is stored: ZooKeeper-based (ZKRMStateStore); the alternative is memory-based (MemStore)</description></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>datanode1:2181,datanode2:2181,datanode3:2181/hadoop</value><description>ZooKeeper ensemble address</description></property>
  <property><name>yarn.resourcemanager.connect.retry-interval.ms</name><value>10000</value><description>Interval before reconnecting after the RM loses contact</description></property>
  <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value><description>Enable automatic recovery</description></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
  <property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value><description>Whether to start a thread that checks the virtual memory each task is using and kills the task if it exceeds its allocation; default is true</description></property>
  <property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>4</value><description>Maximum amount of virtual memory a task may use per 1 MB of physical memory; default is 2.1</description></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>11264</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>33792</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>33792</value></property>
  <property><name>yarn.resourcemanager.max-completed-applications</name><value>300</value></property>
  <property><name>yarn.app.mapreduce.am.resource.mb</name><value>11264</value></property>
  <property><name>yarn.app.mapreduce.am.command-opts</name><value>-Xmx13107m</value></property>
  <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value></property>
  <property><name>yarn.log-aggregation.retain-seconds</name><value>2592000</value><description>retain 30 days</description></property>
  <property><name>yarn.log-aggregation.retain-check-interval-seconds</name><value>86400</value></property>
</configuration>

5.4 hdfs-site.xml

<configuration>
  <property><name>dfs.replication</name><value>3</value><description>Number of data replicas (not the number of DataNodes); default is 3</description></property>
  <property><name>dfs.support.append</name><value>true</value></property>
  <property><name>dfs.client.block.write.replace-datanode-on-failure.policy</name><value>NEVER</value></property>
  <property><name>dfs.client.block.write.replace-datanode-on-failure.enable</name><value>true</value></property>
  <property><name>dfs.permissions</name><value>false</value><description>Disable permission checking</description></property>
  <property><name>dfs.nameservices</name><value>ns1</value><description>Nameservice id of the NameNode cluster; must match the value in core-site.xml</description></property>
  <property><name>dfs.ha.namenodes.ns1</name><value>nn1,nn2</value><description>Logical ids of the individual NameNodes</description></property>
  <property><name>dfs.namenode.rpc-address.ns1.nn1</name><value>namenode1:9000</value><description>RPC address of nn1</description></property>
  <property><name>dfs.namenode.http-address.ns1.nn1</name><value>namenode1:50070</value><description>HTTP address of nn1</description></property>
  <property><name>dfs.namenode.rpc-address.ns1.nn2</name><value>namenode2:9000</value><description>RPC address of nn2</description></property>
  <property><name>dfs.namenode.http-address.ns1.nn2</name><value>namenode2:50070</value><description>HTTP address of nn2</description></property>
  <property><name>dfs.namenode.name.dir</name><value>file:/home/hadoop/datas/hdfs/name</value><description>Where the NameNode stores its metadata</description></property>
  <property><name>dfs.datanode.data.dir</name><value>file:/home/hadoop/datas/hdfs/data</value><description>Where the DataNode stores its data blocks</description></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/home/hadoop/datas/hdfs/journal</value><description>Where the JournalNode stores its data on local disk</description></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value><description>Enable automatic NameNode failover</description></property>
  <property><name>dfs.client.failover.proxy.provider.ns1</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value><description>Implementation class used for client failover</description></property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
        sshfence
        shell(/bin/true)</value>
    <description>Fencing methods; multiple methods are separated by newlines, one per line</description>
  </property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hadoop/.ssh/id_rsa</value><description>SSH private key required by the sshfence method (passwordless SSH)</description></property>
  <property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value><description>Timeout for the sshfence method</description></property>
  <property><name>dfs.datanode.max.transfer.threads</name><value>20960</value></property>
  <property><name>dfs.datanode.max.xcievers</name><value>20960</value><description>Upper limit on the number of files handled concurrently</description></property>
  <property><name>dfs.hosts.exclude</name><value>/home/hadoop/optional/exclude_node</value><description>List of DataNodes to exclude</description></property>
  <property><name>dfs.datanode.socket.write.timeout</name><value>600000</value></property>
  <property><name>dfs.client.socket-timeout</name><value>600000</value></property>
</configuration>
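
The configuration above references several local paths. Before the first start I would make sure they exist on the relevant nodes (a sketch derived from the values configured above):

# On every node, as the hadoop user
mkdir -p /home/hadoop/datas/hdfs/name /home/hadoop/datas/hdfs/data /home/hadoop/datas/hdfs/journal
# Empty exclude list referenced by dfs.hosts.exclude and mapreduce.jobtracker.hosts.exclude.filename
mkdir -p /home/hadoop/optional
touch /home/hadoop/optional/exclude_node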

5.5 Configure the DataNodes

## In the hadoop-2.10.0/etc/hadoop directory
vim slaves
datanode1
datanode2
datanode3

6. Copy, distribute, and set ownership

Send the configured Hadoop directory to the other machines:
scp -r ./hadoop-2.10.0 namenode2:$PWD
scp -r ./hadoop-2.10.0 datanode1:$PWD
scp -r ./hadoop-2.10.0 datanode2:$PWD
scp -r ./hadoop-2.10.0 datanode3:$PWD

Check the ownership of the hadoop directory on the other servers; if it is owned by root, grant it to the hadoop user:
chown -R hadoop:hadoop ./hadoop-2.10.0/
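To check all of the remote copies from the NameNode in one pass, something like this works (my sketch; it assumes the same /opt/hadoop-2.10.0 path on every node and only lists the ownership, it changes nothing):

# The owner and group columns should both read hadoop on every node
for h in namenode2 datanode1 datanode2 datanode3; do ssh $h "ls -ld /opt/hadoop-2.10.0"; done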

7. Start the cluster

On every node, switch to the hadoop user: su - hadoop

## Start the JournalNode on each of the three datanode machines
./sbin/hadoop-daemon.sh start journalnode

## The following commands are executed on the NameNode only
# Format HDFS
hdfs namenode -format

# Start Hadoop
./sbin/start-all.sh
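
Note: the original screenshots may already cover this, but on a brand-new HA cluster a couple of extra one-time initialization steps are normally needed alongside the format/start above (a sketch, not taken from the original post):

# On namenode1: initialize the HA state znode in ZooKeeper (one time)
hdfs zkfc -formatZK

# On namenode2: sync the formatted metadata from namenode1 (one time, while namenode1's NameNode is running)
hdfs namenode -bootstrapStandby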

Open the HDFS web UI in a browser, replacing the IP with your own NameNode's IP:
http://192.168.1.100:50070/
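To double-check the HA pair, I would also inspect the running processes and the NameNode states (my addition; nn1 and nn2 are the service ids defined in hdfs-site.xml above):

# Run on each node: the namenodes should show NameNode, DFSZKFailoverController and ResourceManager,
# the datanodes should show DataNode, NodeManager, JournalNode and QuorumPeerMain (ZooKeeper)
jps

# One of these should report "active" and the other "standby"
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2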


Reposted from: https://blog.csdn.net/weixin_43877505/article/details/136289105
Copyright belongs to the original author, weixin_43877505.
