References:
1. Official documentation: https://hub.docker.com/r/apache/hadoop
1. Creating the cluster configuration
As the documentation describes, first create a docker-compose.yaml file. Mine looks like this:
version: "2"
services:
  namenode:
    image: apache/hadoop:3.3.6
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - 9870:9870
      - 8020:8020
    env_file:
      - ./config
    environment:
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
  datanode:
    image: apache/hadoop:3.3.6
    command: ["hdfs", "datanode"]
    env_file:
      - ./config
  resourcemanager:
    image: apache/hadoop:3.3.6
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - 8088:8088
    env_file:
      - ./config
    volumes:
      - ./test.sh:/opt/test.sh
  nodemanager:
    image: apache/hadoop:3.3.6
    command: ["yarn", "nodemanager"]
    env_file:
      - ./config
  zoo1:
    image: zookeeper:3.9.2
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
      - 2888:2888
      - 3888:3888
    environment:
      ZOO_MY_ID: 1
  hbase:
    image: hbase:1.1
    ports:
      - 60010:60010
      - 16010:16010
      - 16000:16000
      - 16020:16020
      - 16030:16030
    volumes:
      - ./logs/hbase:/opt/hbase/logs
In the same directory, create a file named config; its contents, as provided on the official page, are:
CORE-SITE.XML_fs.default.name=hdfs://namenode
CORE-SITE.XML_fs.defaultFS=hdfs://namenode
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:8020
HDFS-SITE.XML_dfs.replication=1
MAPRED-SITE.XML_mapreduce.framework.name=yarn
MAPRED-SITE.XML_yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.map.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
MAPRED-SITE.XML_mapreduce.reduce.env=HADOOP_MAPRED_HOME=$HADOOP_HOME
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
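Each line in the config file encodes one Hadoop property: the prefix names the target XML file and the rest is a key=value pair. A minimal sketch of the transformation the apache/hadoop image's entrypoint performs (roughly; the actual script lives inside the image and may differ in detail):

```shell
# One line from the config file above.
line='CORE-SITE.XML_fs.defaultFS=hdfs://namenode'

# Strip the file prefix, then split key and value at the first '='.
kv="${line#CORE-SITE.XML_}"
key="${kv%%=*}"
value="${kv#*=}"

# The entrypoint emits an equivalent <property> block into core-site.xml.
printf '<property><name>%s</name><value>%s</value></property>\n' "$key" "$value"
```

So every `CORE-SITE.XML_`, `HDFS-SITE.XML_`, etc. line ends up as a `<property>` entry in the corresponding site file inside each container.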
The hbase image above is one I built myself.
Building the hbase image:
1. Download the release tarball from the official Apache HBase downloads page. I built against Hadoop 3, so the version I downloaded is hbase-2.6.0-hadoop3-bin.tar.gz.
2. Write the Dockerfile. Mine is shown below. Since this was early debugging, I used a full Ubuntu base image. If you want to optimize it, two ideas: (1) build on alpine, or directly on a Java base image; note that alpine uses musl libc, so you may need to add glibc. I have not tested this, so feel free to try it yourself. (2) Extract the tarball first, edit the configuration files in place, and copy the extracted directory into the image instead of the tarball, which speeds up builds slightly.
The unoptimized Dockerfile:
# Base image; Ubuntu here
FROM ubuntu:20.04

# Update apt sources and install basic packages
RUN apt-get update && apt-get install -y openjdk-8-jdk telnet vim && rm -rf /var/lib/apt/lists/*

# Set up the JDK environment
ENV JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
ENV PATH="$JAVA_HOME/bin:$PATH"

# Copy in the HBase binary tarball; HBase 2.6.0 here
ARG HBASE_VERSION=2.6.0
COPY hbase-${HBASE_VERSION}-hadoop3-bin.tar.gz /tmp/

# Extract HBase to /opt/hbase and remove the tarball
RUN tar xzf /tmp/hbase-${HBASE_VERSION}-hadoop3-bin.tar.gz -C /opt/ \
    && mv /opt/hbase-${HBASE_VERSION}-hadoop3 /opt/hbase \
    && rm -f /tmp/hbase-${HBASE_VERSION}-hadoop3-bin.tar.gz

# Set the HBase environment variables (ZooKeeper is managed externally)
ENV PATH="/opt/hbase/bin:$PATH"
ENV HBASE_HOME="/opt/hbase"
ENV HBASE_MANAGES_ZK=false

# Expose the HBase web UI ports
EXPOSE 60010 16010

# Copy in the template configuration file
COPY hbase-site.xml /opt/hbase/conf/hbase-site.xml

# Keep the container running; HBase itself is started manually later
CMD ["tail", "-f", "/dev/null"]

telnet and vim are installed purely for debugging.
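With the Dockerfile and the tarball in the same directory, the image can be built and smoke-tested like this (a sketch; the guards make it a no-op on hosts without docker or the build files):

```shell
IMAGE_TAG="hbase:1.1"   # the tag referenced by the compose file above

# Build from the directory containing the Dockerfile and the tarball,
# then print the HBase version as a quick smoke test.
if command -v docker >/dev/null 2>&1 && [ -f Dockerfile ]; then
  docker build -t "$IMAGE_TAG" . \
    && docker run --rm "$IMAGE_TAG" hbase version
fi
```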
3. The HBase configuration file (hbase-site.xml) is as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- HBase Configuration Template -->
<configuration>
  <!-- ZooKeeper quorum -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zoo1:2181</value>
    <!-- Optional ZooKeeper client port -->
    <description>
      A comma-separated list of host:port pairs pointing to the ZooKeeper Quorum.
      The port defaults to 2181 if not specified.
    </description>
  </property>
  <!-- ZooKeeper client port -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>
      The port at which the clients will connect.
    </description>
  </property>
  <!-- ZooKeeper session timeout -->
  <property>
    <name>hbase.zookeeper.session.timeout.upper.limit</name>
    <value>60000</value>
    <description>
      The maximum value for the ZooKeeper session timeout, in milliseconds.
    </description>
  </property>
  <!-- HBase root directory -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
    <description>
      The root directory where HBase stores its data.
    </description>
  </property>
  <!-- HFile block cache size -->
  <property>
    <name>hfile.block.cache.size</name>
    <value>0.4</value>
    <description>
      The fraction of heap space to use for the HFile block cache.
    </description>
  </property>
  <!-- RPC timeout -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>30000</value>
    <description>
      The default timeout for RPC calls, in milliseconds.
    </description>
  </property>
  <!-- HBase region server handler count -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>60</value>
    <description>
      The number of request handlers per region server.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>./tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <!-- More properties can be added as needed -->
</configuration>
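A malformed hbase-site.xml only surfaces at HBase startup, which is slow to debug, so it is worth validating the XML before baking it into the image. A sketch using Python's stdlib parser (the sample file below is a shortened stand-in for the full config above):

```shell
# Write a minimal sample to demonstrate; in practice point the parser
# at your real hbase-site.xml before running docker build.
cat > hbase-site-sample.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
  </property>
</configuration>
EOF

# minidom.parse raises (and the command fails) on malformed XML.
if python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' hbase-site-sample.xml; then
  echo "hbase-site-sample.xml is well-formed"
fi
```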
Then build the image (note the trailing dot for the build context): docker build -t hbase:1.1 .
2. Starting the cluster
docker compose up -d
After the cluster is up, open an interactive shell in the HDFS NameNode container and adjust permissions; otherwise HBase would have to be started as the matching HDFS user.
docker exec -it <namenode container id> /bin/bash
For learning purposes I granted other users write access to the root directory. Never do this in production; use Kerberos, or manage users and user groups properly, instead.
hdfs dfs -chmod o+w /
Then enter the hbase container and start HBase:
docker exec -it <hbase container id> /bin/bash
/opt/hbase/bin/start-hbase.sh
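The interactive steps above can also be run non-interactively with docker compose v2 exec; a sketch, assuming the service names from the compose file and guarded so it is a no-op on hosts without docker:

```shell
NAMENODE_SVC=namenode   # service names from docker-compose.yaml above
HBASE_SVC=hbase

if command -v docker >/dev/null 2>&1 && [ -f docker-compose.yaml ]; then
  docker compose up -d
  # Learning setup only: a world-writable "/" is not acceptable in production.
  docker compose exec "$NAMENODE_SVC" hdfs dfs -chmod o+w /
  docker compose exec "$HBASE_SVC" /opt/hbase/bin/start-hbase.sh
fi
```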
3. Checking that the cluster started successfully
HBase Master web UI: http://127.0.0.1:16010
HDFS NameNode web UI: http://127.0.0.1:9870
YARN ResourceManager web UI: http://127.0.0.1:8088
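The three web UIs can also be probed from the command line; a quick reachability sketch (ports come from the compose file's port mappings; the `|| true` keeps the loop going when a service is still starting):

```shell
HDFS_UI="http://127.0.0.1:9870"   # NameNode web UI
YARN_UI="http://127.0.0.1:8088"   # ResourceManager web UI
HBASE_UI="http://127.0.0.1:16010" # HBase Master web UI

for url in "$HDFS_UI" "$YARN_UI" "$HBASE_UI"; do
  # -s silent, -o discard body, -m 3 timeout, -w print only the HTTP status
  code=$(curl -s -o /dev/null -m 3 -w '%{http_code}' "$url" || true)
  echo "$url -> HTTP ${code:-unreachable}"
done
```

A healthy cluster answers HTTP 200 on all three.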
4. Notes
1. If you test an application locally from IDEA, you need to add 127.0.0.1 <container id> entries to your hosts file, because the cluster components advertise container hostnames to clients.
2. If a container outside this compose project needs to access the cluster, create a Docker network and then use that network in docker compose.
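A sketch of note 2, assuming a network name of my own choosing ("hadoop-net" is not from the original post); the compose file must also be pointed at the same network (via a top-level `networks:` block with `external: true`) for service-name resolution to work:

```shell
NET=hadoop-net   # example network name, chosen for this sketch

if command -v docker >/dev/null 2>&1; then
  # Create the shared network once (ignore "already exists" errors).
  docker network create "$NET" 2>/dev/null || true
  # An external container joining the same network can then resolve the
  # compose service names (namenode, zoo1, ...) directly. This only
  # succeeds after docker-compose.yaml uses the same external network.
  docker run --rm --network "$NET" alpine ping -c 1 namenode || true
fi
```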
Copyright belongs to the original author, Zig zag. If there is any infringement, please contact us for removal.