Quickly Deploying Hive with docker-compose: A Detailed Tutorial

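I. Overview

Deploying Hive with docker-compose builds on the Hadoop deployment from the previous article. Hive is the most commonly used data warehouse service, so integrating it is worthwhile. The point of deploying with docker-compose is to stand services up quickly with minimal resources and time, which is convenient for learning, testing, and verifying features.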

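For the Hadoop deployment, see my earlier articles:

  • Quickly Deploying a Hadoop Cluster with docker-compose: A Detailed Tutorial
  • Quickly Deploying a Hadoop Cluster with docker-compose: A Minimal Tutorial

It is best to skim the Hadoop deployment articles first. If you don't care about the detailed process, the minimal tutorial alone is enough.

For an introduction to Hive, see my article: Big Data Hadoop: The Hive Data Warehouse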

II. Prerequisites

1) Deploy docker

# Install the yum-config-manager tool
yum -y install yum-utils

# The Aliyun yum mirror is recommended; the official repo is kept as a commented alternative
# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

# Install docker-ce
yum install -y docker-ce
# Start docker now and enable it at boot
systemctl enable --now docker
docker --version

2) Deploy docker-compose

curl -SL https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose

chmod +x /usr/local/bin/docker-compose
docker-compose --version
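
Before moving on, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch, not part of the original tutorial; the function names `have_cmd` and `check_prereqs` are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env sh
# Return 0 if the named command is on PATH, 1 otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool; return non-zero if any tool is missing.
check_prereqs() {
    missing=0
    for tool in "$@"; do
        if have_cmd "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_prereqs docker docker-compose` after the installation steps should print an `ok:` line for each tool.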

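III. Create the network

# Create the network. Do not name it hadoop_network, otherwise the hs2 (HiveServer2) service runs into problems at startup!
docker network create hadoop-network

# List networks
docker network ls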
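
Running `docker network create` a second time fails because the network already exists. A small wrapper can make the step repeatable. This is a sketch under assumptions: `ensure_network` and the `DOCKER` override variable are hypothetical names introduced here, while `docker network inspect` and `docker network create` are the standard subcommands.

```shell
#!/usr/bin/env sh
# Create the network only if it does not exist yet.
# DOCKER can be overridden (for example with a stub) for dry runs.
ensure_network() {
    net="$1"
    if ${DOCKER:-docker} network inspect "$net" >/dev/null 2>&1; then
        echo "network $net already exists"
    else
        ${DOCKER:-docker} network create "$net" >/dev/null && echo "network $net created"
    fi
}
```

With this in place, `ensure_network hadoop-network` can be run any number of times without error.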

IV. MySQL deployment

1) MySQL image

docker pull mysql:5.7
docker tag mysql:5.7 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/mysql:5.7

2) Configuration

mkdir -p conf/ data/db/

cat>conf/my.cnf<<EOF
[mysqld]
character-set-server=utf8
log-bin=mysql-bin
server-id=1
pid-file        = /var/run/mysqld/mysqld.pid
socket          = /var/run/mysqld/mysqld.sock
datadir         = /var/lib/mysql
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
symbolic-links=0
secure_file_priv =
wait_timeout=120
interactive_timeout=120
default-time_zone = '+8:00'
skip-external-locking
skip-name-resolve
open_files_limit = 10240
max_connections = 1000
max_connect_errors = 6000
table_open_cache = 800
max_allowed_packet = 40m
sort_buffer_size = 2M
join_buffer_size = 1M
thread_cache_size = 32
query_cache_size = 64M
transaction_isolation = READ-COMMITTED
tmp_table_size = 128M
max_heap_table_size = 128M
log-bin = mysql-bin
sync-binlog = 1
binlog_format = ROW
binlog_cache_size = 1M
key_buffer_size = 128M
read_buffer_size = 2M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 64M
lower_case_table_names = 1
explicit_defaults_for_timestamp=true
skip_name_resolve = ON
event_scheduler = ON
log_bin_trust_function_creators = 1
innodb_buffer_pool_size = 512M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table = 1
innodb_log_buffer_size = 4M
innodb_log_file_size = 256M
innodb_max_dirty_pages_pct = 90
innodb_read_io_threads = 4
innodb_write_io_threads = 4
EOF

3) Orchestration (mysql-compose.yaml)

version: '3'
services:
  db:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/mysql:5.7 # MySQL version
    container_name: mysql
    hostname: mysql
    volumes:
      - ./data/db:/var/lib/mysql
      - ./conf/my.cnf:/etc/mysql/mysql.conf.d/mysqld.cnf
    restart: always
    ports:
      - 13306:3306
    networks:
      - hadoop-network
    environment:
      MYSQL_ROOT_PASSWORD: "123456" # root password
      secure_file_priv:
    healthcheck:
      # curl cannot speak the MySQL protocol; mysqladmin ships with the image
      test: ["CMD-SHELL", "mysqladmin ping -h 127.0.0.1 -uroot -p123456 || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3
# Attach to the external network
networks:
  hadoop-network:
    external: true

4) Deploy MySQL

docker-compose -f mysql-compose.yaml up -d
docker-compose -f mysql-compose.yaml ps
# Log in to MySQL inside the container
docker exec -it mysql mysql -uroot -p123456
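
MySQL needs a few seconds after `up -d` before it accepts connections, and the Hive metastore later depends on it. A generic retry loop is one way to wait for it. The sketch below is an illustration, not the author's method; `retry` is a hypothetical helper name, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env sh
# Run a command until it succeeds or the attempt budget is used up.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while ! "$@"; do
        # Give up once the last allowed attempt has failed
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}
```

For example, `retry 30 2 docker exec mysql mysqladmin ping -uroot -p123456` polls the `mysql` container every two seconds for up to 30 attempts.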


V. Hive deployment

1) Download Hive

Download page: http://archive.apache.org/dist/hive

# Download
wget http://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz

# Extract
tar -zxvf apache-hive-3.1.3-bin.tar.gz

2) Configuration

images/hive-config/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- HDFS warehouse directory -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive_remote/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>
  <!-- MySQL connection URL; hive_metastore is the database name, it is created automatically and can be customized -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql:3306/hive_metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
  </property>
  <!-- MySQL driver -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <!--<value>com.mysql.cj.jdbc.Driver</value>-->
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <!-- MySQL user -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <!-- MySQL password -->
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <!-- Whether to verify the metastore schema version -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>root</value>
    <description>user name</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-metastore:9083</value>
  </property>
  <!-- host -->
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>0.0.0.0</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>
  <!-- hs2 port, 10000 by default -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.active.passive.ha.enable</name>
    <value>true</value>
  </property>
</configuration>

3) Startup script (bootstrap.sh)

#!/usr/bin/env sh

# Block until host $1 accepts TCP connections on port $2
wait_for() {
        echo "Waiting for $1 to listen on $2..."
        while ! nc -z "$1" "$2"; do echo waiting...; sleep 1s; done
}

start_hdfs_namenode() {
        # Format the namenode on first start only
        if [ ! -f /tmp/namenode-formated ]; then
                ${HADOOP_HOME}/bin/hdfs namenode -format >/tmp/namenode-formated
        fi
        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start namenode

        # Follow the log to keep the container in the foreground
        tail -f ${HADOOP_HOME}/logs/*namenode*.log
}

start_hdfs_datanode() {
        wait_for "$1" "$2"
        ${HADOOP_HOME}/bin/hdfs --loglevel INFO --daemon start datanode

        tail -f ${HADOOP_HOME}/logs/*datanode*.log
}

start_yarn_resourcemanager() {
        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start resourcemanager

        tail -f ${HADOOP_HOME}/logs/*resourcemanager*.log
}

start_yarn_nodemanager() {
        wait_for "$1" "$2"
        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start nodemanager

        tail -f ${HADOOP_HOME}/logs/*nodemanager*.log
}

start_yarn_proxyserver() {
        wait_for "$1" "$2"
        ${HADOOP_HOME}/bin/yarn --loglevel INFO --daemon start proxyserver

        tail -f ${HADOOP_HOME}/logs/*proxyserver*.log
}

start_mr_historyserver() {
        wait_for "$1" "$2"
        ${HADOOP_HOME}/bin/mapred --loglevel INFO --daemon start historyserver

        tail -f ${HADOOP_HOME}/logs/*historyserver*.log
}

start_hive_metastore() {
        # Initialize the metastore schema in MySQL on first start only
        if [ ! -f ${HIVE_HOME}/formated ]; then
                schematool -initSchema -dbType mysql --verbose >${HIVE_HOME}/formated
        fi
        $HIVE_HOME/bin/hive --service metastore
}

start_hive_hiveserver2() {
        $HIVE_HOME/bin/hive --service hiveserver2
}

# $1 selects the service; $2 and $3 are the host and port to wait for
case $1 in
        hadoop-hdfs-nn)
                start_hdfs_namenode
                ;;
        hadoop-hdfs-dn)
                start_hdfs_datanode "$2" "$3"
                ;;
        hadoop-yarn-rm)
                start_yarn_resourcemanager
                ;;
        hadoop-yarn-nm)
                start_yarn_nodemanager "$2" "$3"
                ;;
        hadoop-yarn-proxyserver)
                start_yarn_proxyserver "$2" "$3"
                ;;
        hadoop-mr-historyserver)
                start_mr_historyserver "$2" "$3"
                ;;
        hive-metastore)
                start_hive_metastore "$2" "$3"
                ;;
        hive-hiveserver2)
                start_hive_hiveserver2 "$2" "$3"
                ;;
        *)
                echo "Please pass a valid service name to start~"
                ;;
esac

4) Build the image (Dockerfile)

FROM registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop:v1

COPY hive-config/* ${HIVE_HOME}/conf/

COPY bootstrap.sh /opt/apache/

COPY mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar ${HIVE_HOME}/lib/

RUN sudo mkdir -p /home/hadoop/ && sudo chown -R hadoop:hadoop /home/hadoop/

#RUN yum -y install which

Build the image

# Build the image
docker build -t registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 . --no-cache

# Push the image (optional)
docker push registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1

### Parameter notes
# -t: name (and tag) of the image
# . : use the Dockerfile in the current directory
# -f: path to the Dockerfile
# --no-cache: build without the cache

5) Orchestration (docker-compose.yaml)

version: '3'
services:
  hadoop-hdfs-nn:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-nn
    hostname: hadoop-hdfs-nn
    restart: always
    privileged: true
    env_file:
      - .env
    ports:
      - "30070:${HADOOP_HDFS_NN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-nn"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_NN_PORT} || exit 1"]
      interval: 20s
      timeout: 20s
      retries: 3
  hadoop-hdfs-dn-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-0
    hostname: hadoop-hdfs-dn-0
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30864:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-hdfs-dn-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-1
    hostname: hadoop-hdfs-dn-1
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30865:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-hdfs-dn-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-hdfs-dn-2
    hostname: hadoop-hdfs-dn-2
    restart: always
    depends_on:
      - hadoop-hdfs-nn
    env_file:
      - .env
    ports:
      - "30866:${HADOOP_HDFS_DN_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-hdfs-dn hadoop-hdfs-nn ${HADOOP_HDFS_NN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_HDFS_DN_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-rm:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-rm
    hostname: hadoop-yarn-rm
    restart: always
    env_file:
      - .env
    ports:
      - "30888:${HADOOP_YARN_RM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-rm"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_RM_PORT} || exit 1"]
      interval: 20s
      timeout: 20s
      retries: 3
  hadoop-yarn-nm-0:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-0
    hostname: hadoop-yarn-nm-0
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30042:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-nm-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-1
    hostname: hadoop-yarn-nm-1
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30043:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-nm-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-nm-2
    hostname: hadoop-yarn-nm-2
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30044:${HADOOP_YARN_NM_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-nm hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "curl --fail http://localhost:${HADOOP_YARN_NM_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-yarn-proxyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-yarn-proxyserver
    hostname: hadoop-yarn-proxyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "30911:${HADOOP_YARN_PROXYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-yarn-proxyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_YARN_PROXYSERVER_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hadoop-mr-historyserver:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hadoop-mr-historyserver
    hostname: hadoop-mr-historyserver
    restart: always
    depends_on:
      - hadoop-yarn-rm
    env_file:
      - .env
    ports:
      - "31988:${HADOOP_MR_HISTORYSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hadoop-mr-historyserver hadoop-yarn-rm ${HADOOP_YARN_RM_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HADOOP_MR_HISTORYSERVER_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 3
  hive-metastore:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hive-metastore
    hostname: hive-metastore
    restart: always
    depends_on:
      - hadoop-hdfs-dn-2
    env_file:
      - .env
    ports:
      - "30983:${HIVE_METASTORE_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hive-metastore hadoop-hdfs-dn-2 ${HADOOP_HDFS_DN_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HIVE_METASTORE_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 5
  hive-hiveserver2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1
    user: "hadoop:hadoop"
    container_name: hive-hiveserver2
    hostname: hive-hiveserver2
    restart: always
    depends_on:
      - hive-metastore
    env_file:
      - .env
    ports:
      - "31000:${HIVE_HIVESERVER2_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hive-hiveserver2 hive-metastore ${HIVE_METASTORE_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HIVE_HIVESERVER2_PORT} || exit 1"]
      interval: 30s
      timeout: 30s
      retries: 5
# Attach to the external network
networks:
  hadoop-network:
    external: true
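
The compose file above reads every port from variables such as `${HADOOP_HDFS_NN_PORT}`, which come from the `.env` file carried over from the Hadoop articles and not shown here. If one of them is missing, docker-compose substitutes an empty string and the port mapping breaks. The following sketch lists variables that are referenced but undefined; `missing_env_vars` is a hypothetical helper name, and it assumes a `grep` that supports `-o`.

```shell
#!/usr/bin/env sh
# Print every ${VAR} name used in a compose file that has no VAR= line in the env file.
# Usage: missing_env_vars <compose-file> <env-file>
missing_env_vars() {
    compose_file="$1"
    env_file="$2"
    # Extract ${NAME} references, strip the ${ } wrapper, de-duplicate
    grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$compose_file" | tr -d '${}' | sort -u |
    while read -r name; do
        grep -q "^${name}=" "$env_file" 2>/dev/null || echo "$name"
    done
}
```

Running `missing_env_vars docker-compose.yaml .env` should print nothing when the `.env` file is complete.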

6) Start the deployment

docker-compose -f docker-compose.yaml up -d

# Check status
docker-compose -f docker-compose.yaml ps

Once the containers are up, run a simple test to verify the deployment.
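
One possible smoke test is to run a statement through beeline against HiveServer2. The sketch below is an assumption-laden illustration rather than the author's exact test: the helper names `hs2_jdbc_url` and `hs2_query` are hypothetical, it assumes `beeline` is on the PATH inside the `hive-hiveserver2` container, and it connects as the `hadoop` user that the compose file runs the containers as.

```shell
#!/usr/bin/env sh
# Build the HiveServer2 JDBC URL for a host and port.
hs2_jdbc_url() {
    echo "jdbc:hive2://$1:$2"
}

# Run one HiveQL statement through beeline inside the hiveserver2 container.
# Container name and port 10000 follow the compose file and hive-site.xml above.
hs2_query() {
    docker exec hive-hiveserver2 beeline \
        -u "$(hs2_jdbc_url hive-hiveserver2 10000)" -n hadoop -e "$1"
}
```

For example, `hs2_query "show databases;"` should list at least the `default` database once HiveServer2 has finished starting.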

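[Problem] If you see an error like the one below, the services have been started more than once: the old data is still there, but the datanode IPs have changed (a deployment directly on hosts does not have this problem, because host IPs are fixed). The nodes therefore need to be refreshed. Clearing the old data would also work, but it is not recommended when data is mounted outside the container; refreshing the nodes is the preferred approach. In my case nothing is mounted externally, and the error appears because the old containers still exist. Several solutions follow.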

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(172.30.0.12:9866, datanodeUuid=f8188476-4a88-4cd6-836f-769d510929e4, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-f998d368-222c-4a9a-88a5-85497a82dcac;nsid=1840040096;c=1680661390829)

[Solutions]

  1. Remove the old containers and start again
# Remove exited containers
docker rm $(docker ps -a | grep 'Exited' | awk '{print $1}')
# Start the services again
docker-compose -f docker-compose.yaml up -d

# Check status
docker-compose -f docker-compose.yaml ps
  2. Log in to the namenode and refresh the datanodes
docker exec -it hadoop-hdfs-nn hdfs dfsadmin -refreshNodes
  3. Log in to any node and refresh the datanodes
# Using hadoop-hdfs-dn-0 as an example
docker exec -it hadoop-hdfs-dn-0 hdfs dfsadmin -fs hdfs://hadoop-hdfs-nn:9000 -refreshNodes
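
After applying one of the solutions, it is worth checking that the namenode sees all datanodes again. The sketch below parses the output of `hdfs dfsadmin -report`, assuming it contains a line of the form `Live datanodes (N):`; `live_datanodes` is a hypothetical helper name.

```shell
#!/usr/bin/env sh
# Read `hdfs dfsadmin -report` output on stdin and print the live datanode count.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p'
}
```

For example, `docker exec hadoop-hdfs-nn hdfs dfsadmin -report | live_datanodes` should print the number of registered datanodes, which is 3 for the compose file in this article.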

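That completes the containerized Hive deployment. If you have any questions, feel free to leave me a comment. I will keep publishing related technical articles; you can also follow my WeChat official account 【大数据与云原生技术分享】 for deeper discussion, or message me privately.

Tags: docker hive hadoop

Reposted from: https://blog.csdn.net/qq_35745940/article/details/129908728
Copyright belongs to the original author, 大数据老司机.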
