

15: ZooKeeper High-Availability Cluster | Distributed Message Queue Kafka | Building a High-Availability Hadoop Cluster


ZooKeeper Cluster

ZooKeeper is an open-source coordination service for distributed applications; it is used to guarantee transactional consistency of data across the cluster.
Typical use cases:

  • Distributed locks across a cluster
  • Cluster-wide unified naming service
  • Distributed coordination

ZooKeeper Roles and Characteristics

  • Leader: accepts proposal requests from all Followers, coordinates and initiates the voting on proposals, and handles internal data exchange with all Followers
  • Follower: serves clients directly and votes on proposals, while also exchanging data with the Leader
  • Observer: serves clients directly but does not vote on proposals; it also exchanges data with the Leader

ZooKeeper Roles and Election

When the service starts, no node has a role; roles are produced by an election. The election produces one Leader, and the remaining nodes become Followers.
Rules for electing the Leader:

  • More than half of the machines in the cluster must vote for the Leader
  • If the cluster has n servers, the Leader must receive the votes of at least (n/2 + 1) servers
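
For example, a 5-server ensemble can elect a Leader only once some server has collected 5/2 + 1 = 3 votes; with only 2 servers reachable, no Leader can be elected.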

ZooKeeper High Availability

If the Leader dies, a new Leader is elected.
If half of the machines in the cluster die, the cluster goes down.
If enough votes cannot be obtained, another round of voting is started; if fewer than n/2 + 1 machines take part in the vote, the cluster stops working.
If too many Followers die and fewer than n/2 + 1 machines remain, the cluster likewise stops working.
Observers are not counted in the number of voting machines.
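
In the ensemble built below (1 Leader, 2 Followers, 1 Observer) only the 3 voting members count, so the quorum is 3/2 + 1 = 2: the cluster keeps working after losing any single voting member, but stops once two of them are down.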

ZooKeeper Scalability: Principles and Design

The Leader handles all write-related operations.
Followers handle read operations and respond to the Leader's proposals.
Before Observers were introduced, ZooKeeper's scalability came from Followers: read performance can be raised by adding Follower nodes, but as the number of Followers grows, write performance suffers, because every Follower is one more voter in every write.
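
As a concrete illustration, the ensemble built below declares its single non-voting member with an :observer suffix in zoo.cfg (these are the same server lines added in step 4 of the installation):

    # voting members (leader/followers)
    server.1=node-0001:2888:3888
    server.2=node-0002:2888:3888
    server.3=node-0003:2888:3888
    # the :observer suffix marks a non-voting observer
    server.4=hadoop1:2888:3888:observer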

Installing ZooKeeper

Build a ZooKeeper ensemble with:
1 leader
2 followers
1 observer
1) Edit /etc/hosts so that all cluster hosts can ping each other (configure it on hadoop1, then sync to node-0001, node-0002, node-0003)

    [root@hadoop1 hadoop]# vim /etc/hosts
    192.168.1.50 hadoop1
    192.168.1.51 node-0001
    192.168.1.52 node-0002
    192.168.1.53 node-0003
    192.168.1.56 newnode
    [root@hadoop1 hadoop]# for i in {51..53}; do scp /etc/hosts 192.168.1.$i:/etc/; done    // sync the configuration
    hosts    100%  253   639.2KB/s   00:00
    hosts    100%  253   497.7KB/s   00:00
    hosts    100%  253   662.2KB/s   00:00

2) Install java-1.8.0-openjdk-devel. It is already installed on the existing Hadoop machines, so the step is skipped here; new machines do need it installed.
3) Unpack ZooKeeper and move it to /usr/local/zookeeper

    [root@hadoop1 ~]# tar -xf zookeeper-3.4.13.tar.gz
    [root@hadoop1 ~]# mv zookeeper-3.4.13 /usr/local/zookeeper

4) Rename the configuration file and append the cluster definition at the end

    [root@hadoop1 ~]# cd /usr/local/zookeeper/conf/
    [root@hadoop1 conf]# ls
    configuration.xsl  log4j.properties  zoo_sample.cfg
    [root@hadoop1 conf]# mv zoo_sample.cfg zoo.cfg
    [root@hadoop1 conf]# chown root.root zoo.cfg
    [root@hadoop1 conf]# vim zoo.cfg
    server.1=node-0001:2888:3888
    server.2=node-0002:2888:3888
    server.3=node-0003:2888:3888
    server.4=hadoop1:2888:3888:observer

5) Copy /usr/local/zookeeper to the other cluster hosts

    [root@hadoop1 ~]# for i in node-{0001..0003}; do rsync -aXSH --delete /usr/local/zookeeper ${i}:/usr/local/; done

6) Create /tmp/zookeeper on every machine

    [root@hadoop1 conf]# mkdir /tmp/zookeeper
    [root@hadoop1 conf]# ssh node-0001 mkdir /tmp/zookeeper
    [root@hadoop1 conf]# ssh node-0002 mkdir /tmp/zookeeper
    [root@hadoop1 conf]# ssh node-0003 mkdir /tmp/zookeeper

7) Create the myid file; the id must match the server.(id) entry for that host in the configuration file

    [root@hadoop1 conf]# echo 4 >/tmp/zookeeper/myid
    [root@hadoop1 conf]# ssh node-0001 'echo 1 >/tmp/zookeeper/myid'
    [root@hadoop1 conf]# ssh node-0002 'echo 2 >/tmp/zookeeper/myid'
    [root@hadoop1 conf]# ssh node-0003 'echo 3 >/tmp/zookeeper/myid'
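
A quick sanity check (a minimal sketch, using the same passwordless ssh as above) confirms every node got the expected id:

    [root@hadoop1 conf]# cat /tmp/zookeeper/myid
    4
    [root@hadoop1 conf]# for i in node-{0001..0003}; do echo -n "$i: "; ssh $i cat /tmp/zookeeper/myid; done
    node-0001: 1
    node-0002: 2
    node-0003: 3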

8) Start the service. The status cannot be checked with only one node running; start the whole cluster before checking status. Start the service by hand on every machine (hadoop1 shown as an example).

    [root@hadoop1 conf]# /usr/local/zookeeper/bin/zkServer.sh start
    ZooKeeper JMX enabled by default
    Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED

Note: checking the status right after starting the first ZooKeeper node reports an error; once more than half of the nodes are running, the status check succeeds.
9) Check the status

    [root@hadoop1 conf]# /usr/local/zookeeper/bin/zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
    Mode: observer
    [root@hadoop1 conf]# /usr/local/zookeeper/bin/zkServer.sh stop
    // after stopping this node, check the roles of the other servers
    ZooKeeper JMX enabled by default
    Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
    Stopping zookeeper ... STOPPED

ZooKeeper Cluster Management

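Besides zkServer.sh status, each server's role and health can be polled remotely with ZooKeeper's four-letter commands (ruok, stat, mntr). A minimal sketch, assuming nc is installed on hadoop1 and the four-letter commands have not been restricted via 4lw.commands.whitelist; each host replies with its Mode line (leader, follower, or observer), though which voting member is the leader depends on the election:

    [root@hadoop1 conf]# for i in hadoop1 node-0001 node-0002 node-0003; do echo -n "$i: "; echo stat | nc $i 2181 | grep Mode; done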

Kafka Overview

Kafka Roles and Cluster Structure
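
In brief:

  • Producer: publishes messages to a topic
  • Consumer: subscribes to a topic and reads messages from it
  • Broker: a Kafka server that stores and serves messages; each of the three node machines below runs one broker
  • Topic: a named message feed that producers write to and consumers read from
  • ZooKeeper: in this Kafka version the brokers register themselves in ZooKeeper, which is why server.properties points zookeeper.connect at the ensemble built above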

Deploying Kafka on the Three Node Machines

Kafka is deployed on the three node machines:
node-0001
node-0002
node-0003
1) Unpack the Kafka tarball

    // do this on node-0001, node-0002, and node-0003
    [root@node-0001 hadoop]# tar -xf kafka_2.12-2.1.0.tgz

2) Move Kafka to /usr/local/kafka

    [root@node-0001 ~]# mv kafka_2.12-2.1.0 /usr/local/kafka

3) Edit the configuration file /usr/local/kafka/config/server.properties

    [root@node-0001 ~]# cd /usr/local/kafka/config
    [root@node-0001 config]# vim server.properties
    broker.id=22
    zookeeper.connect=node-0001:2181,node-0002:2181,node-0003:2181

4) Copy Kafka to the other hosts and change broker.id; the ids must not repeat

    [root@node-0001 config]# for i in 52 53; do rsync -aSH --delete /usr/local/kafka 192.168.1.$i:/usr/local/ & done
    [1] 27072
    [2] 27073
    [root@node-0002 ~]# vim /usr/local/kafka/config/server.properties
    // change on node-0002
    broker.id=23
    [root@node-0003 ~]# vim /usr/local/kafka/config/server.properties
    // change on node-0003
    broker.id=24

5) Start the Kafka cluster (start it on node-0001, node-0002, node-0003)

    [root@node-0001 local]# /usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
    [root@node-0001 local]# jps    // Kafka should now appear
    26483 DataNode
    27859 Jps
    27833 Kafka
    26895 QuorumPeerMain

6) Verify the configuration by creating a topic

    [root@node-0001 local]# /usr/local/kafka/bin/kafka-topics.sh --create --partitions 1 --replication-factor 1 --zookeeper localhost:2181 --topic mymsg
    Created topic "mymsg".

7) Simulate a producer and publish messages

    [root@node-0002 ~]# /usr/local/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mymsg
    // type some data
    ccc
    ddd

8) Simulate a consumer and receive the messages

    [root@node-0003 ~]# /usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mymsg
    // the messages arrive here as they are published
    ccc
    ddd
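
The topic metadata can also be inspected with the same kafka-topics.sh tool; a minimal sketch (the partition and leader details in the --describe output depend on which broker holds the partition):

    [root@node-0001 ~]# /usr/local/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
    [root@node-0001 ~]# /usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic mymsg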

High-Availability Hadoop Cluster

High-Availability Overview

  • NameNode high availability: to make Hadoop highly available, the NameNode must be made highly available. The NameNode is the core of HDFS, and HDFS in turn is a core Hadoop component, so the NameNode is critical to the whole cluster. If the NameNode goes down, the cluster becomes unavailable; if the NameNode's data is lost, the entire cluster's data is lost. Since the NameNode's data is also updated frequently, making the NameNode highly available is essential.

All nodes:
192.168.1.50 hadoop1
192.168.1.56 hadoop2
192.168.1.51 node-0001
192.168.1.52 node-0002
192.168.1.53 node-0003
Install java-1.8.0-openjdk-devel on the new machine
Configure /etc/hosts on the new machine
Configure passwordless ssh login on the new machine
Modify the configuration files


High-Availability Architecture

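In the layout built below, hadoop1 and hadoop2 each run a NameNode (one active, one standby) and a ResourceManager; node-0001, node-0002, and node-0003 run the DataNodes, NodeManagers, JournalNodes, and the ZooKeeper ensemble. The two NameNodes share their edit log through the JournalNodes (the qjournal URI in hdfs-site), a ZKFC process on each NameNode host uses ZooKeeper to detect failures and trigger automatic failover, and ResourceManager HA likewise keeps its state in ZooKeeper (ZKRMStateStore).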

Prepare the Environment


Configure NameNode and ResourceManager High Availability

1) Configure core-site

    [root@hadoop1 .ssh]# vim /usr/local/hadoop/etc/hadoop/core-site.xml
    <configuration>
      <property><name>fs.defaultFS</name><value>hdfs://nsdcluster</value></property>
      // nsdcluster is an arbitrarily chosen name; it acts as a group, and clients access this group
      <property><name>hadoop.tmp.dir</name><value>/var/hadoop</value></property>
      <property><name>ha.zookeeper.quorum</name><value>node-0001:2181,node-0002:2181,node-0003:2181</value></property>
      // the ZooKeeper addresses
      <property><name>hadoop.proxyuser.nfs.groups</name><value>*</value></property>
      <property><name>hadoop.proxyuser.nfs.hosts</name><value>*</value></property>
    </configuration>

2) Configure hdfs-site

    [root@hadoop1 ~]# vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    <configuration>
      <property><name>dfs.replication</name><value>2</value></property>
      <property><name>dfs.nameservices</name><value>nsdcluster</value></property>
      <property><name>dfs.ha.namenodes.nsdcluster</name><value>nn1,nn2</value></property>
      // nn1 and nn2 are fixed, built-in names; nsdcluster contains nn1 and nn2
      <property><name>dfs.namenode.rpc-address.nsdcluster.nn1</name><value>hadoop1:8020</value></property>
      // declares nn1: 8020 is hadoop1's RPC port
      <property><name>dfs.namenode.rpc-address.nsdcluster.nn2</name><value>hadoop2:8020</value></property>
      // declares nn2: hadoop2's RPC port
      <property><name>dfs.namenode.http-address.nsdcluster.nn1</name><value>hadoop1:50070</value></property>
      // hadoop1's HTTP port
      <property><name>dfs.namenode.http-address.nsdcluster.nn2</name><value>hadoop2:50070</value></property>
      // hadoop2's HTTP port
      <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node-0001:8485;node-0002:8485;node-0003:8485/nsdcluster</value></property>
      // where the NameNode metadata (edits) is stored on the journalnodes
      <property><name>dfs.journalnode.edits.dir</name><value>/var/hadoop/journal</value></property>
      // local path where each journalnode stores its log files
      <property><name>dfs.client.failover.proxy.provider.nsdcluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
      // Java class that HDFS clients use to find the active NameNode
      <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
      // fencing method: ssh
      <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property>
      // location of the private key
      <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
      // enable automatic failover
    </configuration>

3) Configure yarn-site

    [root@hadoop1 ~]# vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
    <configuration>
      <!-- Site specific YARN configuration properties -->
      <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
      <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
      // rm1 and rm2 stand for hadoop1 and hadoop2
      <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
      <property><name>yarn.resourcemanager.zk-address</name><value>node-0001:2181,node-0002:2181,node-0003:2181</value></property>
      <property><name>yarn.resourcemanager.cluster-id</name><value>yarn-ha</value></property>
      <property><name>yarn.resourcemanager.hostname.rm1</name><value>hadoop1</value></property>
      <property><name>yarn.resourcemanager.hostname.rm2</name><value>hadoop2</value></property>
    </configuration>

Start the Services and Verify High Availability

1) Sync to hadoop2, node-0001, node-0002, node-0003

    [root@hadoop1 ~]# for i in {51..53} 56; do rsync -aSH --delete /usr/local/hadoop/ 192.168.1.$i:/usr/local/hadoop -e 'ssh' & done
    [1] 25411
    [2] 25412
    [3] 25413
    [4] 25414

2) Delete /usr/local/hadoop/logs on every machine to make troubleshooting easier

    [root@hadoop1 ~]# for i in {50..53} 56; do ssh 192.168.1.$i rm -rf /usr/local/hadoop/logs ; done

3) Sync the configuration

    [root@hadoop1 ~]# for i in {51..53} 56; do rsync -aSH --delete /usr/local/hadoop 192.168.1.$i:/usr/local/hadoop -e 'ssh' & done
    [1] 28235
    [2] 28236
    [3] 28237
    [4] 28238

4) Initialize the ZK cluster

    [root@hadoop1 ~]# /usr/local/hadoop/bin/hdfs zkfc -formatZK
    ...
    18/09/11 15:43:35 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/nsdcluster in ZK
    // the word "Successfully" means it worked
    ...
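
The result can also be checked on the ZooKeeper side; a minimal sketch using the zkCli.sh client from the ZooKeeper installation above (the parent znode /hadoop-ha should now contain nsdcluster):

    [root@hadoop1 ~]# /usr/local/zookeeper/bin/zkCli.sh -server node-0001:2181
    [zk: node-0001:2181(CONNECTED) 0] ls /hadoop-ha
    [nsdcluster]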

5) Start the journalnode service on node-0001, node-0002, node-0003 (node-0001 shown as an example)

    [root@node-0001 ~]# /usr/local/hadoop/sbin/hadoop-daemon.sh start journalnode
    starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node-0001.out
    [root@node-0001 ~]# jps
    29262 JournalNode
    26895 QuorumPeerMain
    29311 Jps

6) Format the NameNode; the journalnodes on node-0001, node-0002, node-0003 must be running before formatting

    [root@hadoop1 ~]# /usr/local/hadoop/bin/hdfs namenode -format
    // the word "Successfully" in the output means it worked
    [root@hadoop1 hadoop]# ls /var/hadoop/
    dfs

7) Sync the data to hadoop2's local /var/hadoop/dfs

    [root@hadoop2 ~]# cd /var/hadoop/
    [root@hadoop2 hadoop]# ls
    [root@hadoop2 hadoop]# rsync -aSH hadoop1:/var/hadoop/ /var/hadoop/
    [root@hadoop2 hadoop]# ls
    dfs

8) Initialize the JNs (shared edits on the journalnodes)

    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs namenode -initializeSharedEdits
    18/09/11 16:26:15 INFO client.QuorumJournalManager: Successfully started new epoch 1
    // "Successfully" here means a new epoch was started

9) Stop the journalnode service (node-0001, node-0002, node-0003)

    [root@node-0001 hadoop]# /usr/local/hadoop/sbin/hadoop-daemon.sh stop journalnode
    stopping journalnode
    [root@node-0001 hadoop]# jps
    29346 Jps
    26895 QuorumPeerMain

Start the Cluster

1) On hadoop1

    [root@hadoop1 hadoop]# /usr/local/hadoop/sbin/start-all.sh    // start the whole cluster
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [hadoop1 hadoop2]
    hadoop1: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop1.out
    hadoop2: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop2.out
    node-0002: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-node-0002.out
    node-0003: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-node-0003.out
    node-0001: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-node-0001.out
    Starting journal nodes [node-0001 node-0002 node-0003]
    node-0001: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node-0001.out
    node-0003: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node-0003.out
    node-0002: starting journalnode, logging to /usr/local/hadoop/logs/hadoop-root-journalnode-node-0002.out
    Starting ZK Failover Controllers on NN hosts [hadoop1 hadoop2]
    hadoop1: starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-hadoop1.out
    hadoop2: starting zkfc, logging to /usr/local/hadoop/logs/hadoop-root-zkfc-hadoop2.out
    starting yarn daemons
    starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-hadoop1.out
    node-0002: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node-0002.out
    node-0001: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node-0001.out
    node-0003: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-node-0003.out
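
As a quick check (a minimal sketch; the process IDs will differ), jps on hadoop1 should now list the HA daemons started above:

    [root@hadoop1 hadoop]# jps    // expect NameNode, DFSZKFailoverController and ResourceManager among the output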

2) On hadoop2

    [root@hadoop2 hadoop]# /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-hadoop2.out

3) Check the cluster status

    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
    active
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2
    standby
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
    active
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm2
    standby

4) Check that the nodes have joined

    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs dfsadmin -report
    ...
    Live datanodes (3):    // there should be three live nodes
    ...
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/yarn node -list
    Total Nodes:3
            Node-Id      Node-State   Node-Http-Address   Number-of-Running-Containers
    node-0002:43307         RUNNING      node-0002:8042                              0
    node-0001:34606         RUNNING      node-0001:8042                              0
    node-0003:36749         RUNNING      node-0003:8042                              0

Access the Cluster

    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hadoop fs -ls /
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hadoop fs -mkdir /aa    // create /aa
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hadoop fs -ls /         // check again
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2018-09-11 16:54 /aa
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hadoop fs -put *.txt /aa
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hadoop fs -ls hdfs://nsdcluster/aa
    // listing through the nameservice URI also works
    Found 3 items
    -rw-r--r--   2 root supergroup      86424 2018-09-11 17:00 hdfs://nsdcluster/aa/LICENSE.txt
    -rw-r--r--   2 root supergroup      14978 2018-09-11 17:00 hdfs://nsdcluster/aa/NOTICE.txt
    -rw-r--r--   2 root supergroup       1366 2018-09-11 17:00 hdfs://nsdcluster/aa/README.txt

Verify high availability: stop the active NameNode

    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
    active
    [root@hadoop1 hadoop]# /usr/local/hadoop/sbin/hadoop-daemon.sh stop namenode
    stopping namenode
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1
    // querying nn1 again now reports an error
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn2
    // hadoop2 has changed from standby to active
    active
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1
    active
    [root@hadoop1 hadoop]# /usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager
    // stop the resourcemanager
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm2
    active

Recover the nodes

    [root@hadoop1 hadoop]# /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode           // start the namenode again
    [root@hadoop1 hadoop]# /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager      // start the resourcemanager again
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/hdfs haadmin -getServiceState nn1          // check
    [root@hadoop1 hadoop]# /usr/local/hadoop/bin/yarn rmadmin -getServiceState rm1          // check

This article is reposted from: https://blog.csdn.net/shengweiit/article/details/136501874
Copyright belongs to the original author 桑_榆; in case of infringement, please contact us for removal.
