

Setting up Hadoop + HBase with Docker

References:

1. Official documentation: https://hub.docker.com/r/apache/hadoop

1. Creating the cluster configuration

According to the documentation, the first step is to create a docker-compose.yaml file.

Mine looks like this:

version: "2"
services:
  namenode:
    image: apache/hadoop:3.3.6
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - 9870:9870
      - 8020:8020
    env_file:
      - ./config
    environment:
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
  datanode:
    image: apache/hadoop:3.3.6
    command: ["hdfs", "datanode"]
    env_file:
      - ./config
  resourcemanager:
    image: apache/hadoop:3.3.6
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - 8088:8088
    env_file:
      - ./config
    volumes:
      - ./test.sh:/opt/test.sh
  nodemanager:
    image: apache/hadoop:3.3.6
    command: ["yarn", "nodemanager"]
    env_file:
      - ./config
  zoo1:
    image: zookeeper:3.9.2
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
    environment:
      ZOO_MY_ID: 1
  hbase:
    image: hbase:1.1
    ports:
      - 60010:60010
      - 16010:16010
      - 16000:16000
      - 16020:16020
      - 16030:16030
    volumes:
      - ./logs/hbase:/opt/hbase/logs
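
Before going further, it can help to sanity-check the file. This is just a quick sketch using the standard Compose CLI, with no assumptions beyond the file above:

# Validate the compose file; prints nothing and exits 0 if the YAML is well formed
docker compose config -q

# List the parsed services (namenode, datanode, resourcemanager, nodemanager, zoo1, hbase)
docker compose config --services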

In the same directory, create a file named config, whose contents are taken from the official page:

CORE-SITE.XML_fs.default.name=hdfs://namenode
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
MAPRED-SITE.XML_yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.map.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.reduce.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
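
Each entry follows the pattern FILENAME_property=value; the entrypoint of the apache/hadoop image turns these environment variables into the corresponding *-site.xml files when a container starts. Once the cluster is up (section 2), you can look at the generated file; the /opt/hadoop path below is my assumption about where the image keeps its Hadoop installation:

# Print the core-site.xml that the entrypoint generated from the config env_file
docker compose exec namenode cat /opt/hadoop/etc/hadoop/core-site.xml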

The hbase image above is one I built myself.

Building the hbase image:

 1. Go to the official Apache HBase downloads page and download the matching tarball. I am building against Hadoop 3, so the version I downloaded is hbase-2.6.0-hadoop3-bin.tar.gz.

 2. Write the Dockerfile. Mine is below. Since I was still debugging at this point, it is based on the full Ubuntu image. If you want to optimize it, two ideas: first, base it on Alpine or directly on a Java image; note that Alpine uses musl libc, so you may need to add glibc, which I have not tested, so try it yourself if interested. Second, extract the tarball first, edit the configuration files directly, and copy the directory rather than the tarball into the image, which speeds up the build a little.

The unoptimized Dockerfile is as follows:

# Base image; Ubuntu is used here as an example
FROM ubuntu:20.04

# Update the apt sources and install the basic packages
RUN apt-get update && apt-get install -y openjdk-8-jdk telnet vim && rm -rf /var/lib/apt/lists/*

# Set up the JDK environment
ENV JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV PATH="$JAVA_HOME/bin:$PATH"

# Copy in the HBase binary tarball, HBase 2.6.0 in this case
ARG HBASE_VERSION=2.6.0
COPY hbase-${HBASE_VERSION}-hadoop3-bin.tar.gz /tmp/

# Unpack HBase into /opt/hbase
RUN tar xzf /tmp/hbase-${HBASE_VERSION}-hadoop3-bin.tar.gz -C /opt/ \
    && mv /opt/hbase-${HBASE_VERSION}-hadoop3 /opt/hbase \
    && rm -f /tmp/hbase-${HBASE_VERSION}-hadoop3-bin.tar.gz

# Set the HBase environment variables
ENV PATH="/opt/hbase/bin:$PATH"
ENV HBASE_HOME="/opt/hbase"
ENV HBASE_MANAGES_ZK=false

# Expose the HBase ports
EXPOSE 60010 16010

# Copy in the configuration file template
COPY hbase-site.xml /opt/hbase/conf/hbase-site.xml

# Keep the container running at startup; HBase itself is started manually later
CMD ["tail", "-f", "/dev/null"]

telnet and vim are also installed here for debugging.

 3. The hbase-site.xml configuration file is as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- HBase Configuration Template -->

<configuration>

  <!-- ZooKeeper quorum -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zoo1:2181</value>
    <!-- Optional ZooKeeper client port -->
    <description>
      A comma-separated list of host:port pairs pointing to the ZooKeeper Quorum.
      The port defaults to 2181 if not specified.
    </description>
  </property>

  <!-- ZooKeeper client port -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>
      The port at which the clients will connect.
    </description>
  </property>

  <!-- ZooKeeper session timeout -->
  <property>
    <name>hbase.zookeeper.session.timeout.upper.limit</name>
    <value>60000</value>
    <description>
      The maximum value for the ZooKeeper session timeout, in milliseconds.
    </description>
  </property>

  <!-- HBase root directory -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
    <description>
      The root directory where HBase stores its data.
    </description>
  </property>

  <!-- HFile block cache size -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
    <description>
      The fraction of heap space to use for the HFile block cache.
    </description>
  </property>

  <!-- RPC timeout -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>30000</value>
    <description>
      The default timeout for RPC calls, in milliseconds.
    </description>
  </property>

  <!-- HBase master address -->

  <!-- HBase region server handler count -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>60</value>
    <description>
      The number of request handlers per region server.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <!-- More properties can be added as needed -->

</configuration>

Then build the image: docker build -t hbase:1.1 . (note the trailing dot for the build context).
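
Before wiring the image into the compose file, it may be worth a quick smoke test; hbase version only prints version information and exits, so it works even without ZooKeeper or HDFS running:

# Run a throwaway container from the image just built; this confirms the tarball
# was unpacked to /opt/hbase and that JAVA_HOME and PATH are set correctly
docker run --rm hbase:1.1 hbase version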

2. Starting the cluster

docker compose up -d
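
Before touching HDFS, it may help to confirm that all six services came up:

# Show the state of every service defined in docker-compose.yaml
docker compose ps

# If something is not running, tail its logs, e.g. the namenode
docker compose logs -f namenode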

Once the cluster is up, open an interactive shell in the HDFS namenode container and change the permissions; otherwise HBase would have to be started as the matching HDFS user.

docker exec -it <namenode container id> /bin/bash

For learning purposes I grant write access to other users here. Do not do this in production; introduce Kerberos or manage users and groups properly instead.

hdfs dfs -chmod o+w /
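
A slightly narrower alternative, still without Kerberos, is to pre-create only the HBase root directory (hbase.rootdir points at hdfs://namenode:8020/hbase) and hand it to the user that starts HBase. In the Ubuntu-based image above that user is root, which is an assumption to adjust if you add a dedicated user:

# Create the HBase root directory up front instead of opening / to everyone
hdfs dfs -mkdir /hbase
# Give it to the OS user that runs start-hbase.sh inside the hbase container (root here)
hdfs dfs -chown root /hbase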

Next, enter the hbase container and start HBase:

docker exec -it <hbase container id> /bin/bash

sh -c /opt/hbase/bin/start-hbase.sh
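
Right after starting, still inside the hbase container, you can check that the daemons actually came up; jps ships with the JDK installed in the image, and the shell's status command talks to the running master:

# HMaster (and, since conf/regionservers defaults to localhost, HRegionServer) should be listed
jps

# Ask the running cluster for a one-line status summary
echo "status" | hbase shell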

3. Checking that everything started

The HBase master web UI is at 127.0.0.1:16010.

The HDFS NameNode web UI is at 127.0.0.1:9870.

The YARN ResourceManager web UI is at 127.0.0.1:8088.
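
The same can be checked from the host without a browser; the ports are the ones published in docker-compose.yaml, and the HDFS listing confirms HBase has created its root directory:

# Web UIs published by the compose file: HBase master, HDFS NameNode, YARN ResourceManager
curl -sf -o /dev/null http://127.0.0.1:16010 && echo "hbase ui ok"
curl -sf -o /dev/null http://127.0.0.1:9870 && echo "hdfs ui ok"
curl -sf -o /dev/null http://127.0.0.1:8088 && echo "yarn ui ok"

# After HBase has started, its data should appear under hbase.rootdir (hdfs://namenode:8020/hbase)
docker compose exec namenode hdfs dfs -ls /hbase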

4. Notes

1. If you test an application locally from IntelliJ IDEA, you need to add "127.0.0.1 <container id>" to your hosts file, because HDFS and HBase hand the container hostnames back to clients and those names must resolve on your machine.

2. If a container outside the cluster needs to reach it, create a Docker network and then use that network in docker compose; a sketch follows below.
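
A minimal sketch of that approach, assuming a user-defined bridge network named hadoop-net (the name is arbitrary): create the network once, declare it as an external network in docker-compose.yaml and attach the services to it, then connect any outside container to the same network so it can resolve service names like namenode, zoo1, and hbase.

# Create the shared bridge network once
docker network create hadoop-net

# Attach an already-running container from outside the compose project to it
docker network connect hadoop-net <external container id>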

Reposted from: https://blog.csdn.net/weixin_42486564/article/details/140694787
Copyright belongs to the original author, Zig zag. Please contact us for removal in case of infringement.
