

Hive 4.0.1 Deployment Guide

1. Prerequisites

  • Operating system: CentOS 7 or Ubuntu 20.04 is recommended (this walkthrough uses CentOS Linux release 7.9.2009 (Core)).
  • Java: Java 8 or later is recommended.
  • Hadoop: Hive depends on Hadoop for distributed storage; Hadoop 3.x is recommended (this walkthrough uses Hadoop 3.3.6).
  • Database: the Hive Metastore requires a backing database; MySQL, PostgreSQL, or Oracle are recommended. This walkthrough uses MySQL.
  • The server's IP address is 192.168.128.130.
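Before proceeding, it can help to confirm the prerequisites are actually in place. A minimal sketch (version strings and install locations will differ on your machine):

```shell
# Hypothetical quick check of the prerequisites listed above.
# Each line is guarded so missing tools only print a notice.
command -v java >/dev/null 2>&1 && java -version || echo "java not found"
command -v hadoop >/dev/null 2>&1 && hadoop version | head -1 || echo "hadoop not found"
command -v mysql >/dev/null 2>&1 && mysql --version || echo "mysql client not found"
prereq_check=done
```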

2. Download and Unpack Hive

  1. Download the Hive 4.0.1 tarball:
     wget https://downloads.apache.org/hive/hive-4.0.1/apache-hive-4.0.1-bin.tar.gz
  2. Unpack it and move it to a suitable install path:
     tar -zxvf apache-hive-4.0.1-bin.tar.gz
     mv apache-hive-4.0.1-bin /opt/hive
  3. Set the environment variables by adding the following lines to ~/.bashrc:
     export HIVE_HOME=/opt/hive
     export PATH=$PATH:$HIVE_HOME/bin
     Then run source ~/.bashrc to apply them.
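After sourcing ~/.bashrc, a quick sanity check confirms the variables took effect (assuming the /opt/hive path used above):

```shell
# Verify the environment set up in the steps above.
# Guarded so a missing binary only prints a notice.
echo "HIVE_HOME=${HIVE_HOME:-not set}"
command -v hive >/dev/null 2>&1 && echo "hive found on PATH" || echo "hive not on PATH yet"
env_check=done
```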

3. Configure the Hive Metastore Database

  1. Create the Hive metastore database. The following is a MySQL example (refer to separate documentation for installing the database itself):
     - Start MySQL and log in:
       mysql -u root -p
     - Create the database:
       CREATE DATABASE hive_metastore;
     - Create a user and grant privileges:
       CREATE USER 'hive'@'%' IDENTIFIED BY 'Hive_123456';
       GRANT ALL PRIVILEGES ON hive_metastore.* TO 'hive'@'%';
       FLUSH PRIVILEGES;
  2. Set the database connection details in the Hive configuration by editing hive-site.xml at $HIVE_HOME/conf/hive-site.xml:
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>Location of default Hive warehouse where managed tables are stored.</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.128.130:3306/hive_metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connection URL to connect to the Hive Metastore database, here with MySQL as the backend database.</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver class name for connecting to the Hive Metastore database.</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username for connecting to the Hive Metastore database.</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Hive_123456</value>
    <description>Password for connecting to the Hive Metastore database.</description>
  </property>

  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
    <description>When set to true, DataNucleus will automatically create tables and columns if they do not already exist in the schema.</description>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>Disables schema verification, allowing automatic updates of the Metastore schema without manual intervention.</description>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
    <description>Enables HiveServer2 to execute queries as the user who submitted the query, rather than the HiveServer2 service user.</description>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
    <description>Specifies the authentication mode for HiveServer2 connections. Options include NONE, KERBEROS, LDAP, PAM, and CUSTOM.</description>
  </property>
</configuration>
  • Make sure the JDBC driver jar has been placed in $HIVE_HOME/lib:
     cp /path/to/mysql-connector-java.jar $HIVE_HOME/lib/
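A quick way to confirm the driver jar landed in the right place (the exact jar name varies by Connector/J version, so this just greps for it):

```shell
# Look for a MySQL connector jar in Hive's lib directory.
# /opt/hive is the install path assumed earlier in this guide.
ls "${HIVE_HOME:-/opt/hive}/lib" 2>/dev/null | grep -i mysql-connector || echo "no mysql connector jar found"
driver_check=done
```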

4. Initialize the Metastore

Initialize the Hive metastore schema with the following command:

schematool -initSchema -dbType mysql
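After initialization, schematool can also report the installed schema version, which is a convenient check that the metastore database is reachable (this requires schematool on the PATH and a running MySQL; guarded here so failures only print a notice):

```shell
# Inspect the metastore schema version after -initSchema.
command -v schematool >/dev/null 2>&1 && schematool -info -dbType mysql || echo "schematool check skipped"
schema_check=done
```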

5. Start HiveServer2

Since Hive 4.0.1 has deprecated the legacy Hive CLI, the only way to connect is through Beeline. The configuration above permits connections from unauthenticated users.

hive --service hiveserver2 &
  • Check whether HiveServer2 is listening on port 10000.
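The port check above can be done with ss or netstat, for example (HiveServer2 may take a minute or two to start listening after launch):

```shell
# Check whether anything is listening on HiveServer2's default port 10000.
# Falls back to netstat if ss is unavailable; never fails outright.
ss -tlnp 2>/dev/null | grep 10000 || netstat -tlnp 2>/dev/null | grep 10000 || echo "port 10000 not listening yet"
port_check=done
```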

6. Configure Anonymous User Login

Edit core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/log/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
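For the new proxy-user settings to take effect, restart HDFS and YARN, or refresh the configuration on a running cluster with the admin commands below (guarded here, since they require a live cluster):

```shell
# Apply the hadoop.proxyuser.* changes without a full restart.
command -v hdfs >/dev/null 2>&1 && hdfs dfsadmin -refreshSuperUserGroupsConfiguration || echo "hdfs refresh skipped"
command -v yarn >/dev/null 2>&1 && yarn rmadmin -refreshSuperUserGroupsConfiguration || echo "yarn refresh skipped"
refresh_check=done
```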

7. Verify the Deployment

beeline -u jdbc:hive2://192.168.128.130:10000 -n root
[root@master opt]# beeline  -u jdbc:hive2://192.168.128.130:10000 -n root
Connecting to jdbc:hive2://192.168.128.130:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:hive2://192.168.128.130:10000> create database test1;
INFO  : Compiling command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a): create database test1
INFO  : Semantic Analysis Completed (retrial =false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a); Time taken: 2.054 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a): create database test1
INFO  : Starting task [Stage-0:DDL]in serial mode
INFO  : Completed executing command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a); Time taken: 0.169 seconds
No rows affected (2.721 seconds)
0: jdbc:hive2://192.168.128.130:10000> show databases;
INFO  : Compiling command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8): show databases
INFO  : Semantic Analysis Completed (retrial =false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8); Time taken: 0.236 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8): show databases
INFO  : Starting task [Stage-0:DDL]in serial mode
INFO  : Completed executing command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8); Time taken: 0.11 seconds
+----------------+
| database_name  |
+----------------+
| default        |
| test1          |
+----------------+
2 rows selected (0.605 seconds)
0: jdbc:hive2://192.168.128.130:10000>

8. Two Ways to Connect

  • Connect via the hive command (which launches Beeline in Hive 4):
[root@master opt]# hive
Beeline version 4.0.1 by Apache Hive
beeline>!connect jdbc:hive2://192.168.128.130:10000 -n root
Connecting to jdbc:hive2://192.168.128.130:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.128.130:10000>!quit
Closing: 0: jdbc:hive2://192.168.128.130:10000
  • Connect directly via the beeline command:
[root@master opt]# beeline  -u jdbc:hive2://192.168.128.130:10000 -n root
Connecting to jdbc:hive2://192.168.128.130:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:hive2://192.168.128.130:10000>
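Besides the interactive sessions shown above, Beeline can also run statements non-interactively with -e (or a script file with -f), which is convenient for automation. A sketch using this guide's host and user (guarded, since it needs a reachable HiveServer2):

```shell
# Run a single statement through Beeline non-interactively.
command -v beeline >/dev/null 2>&1 \
  && beeline -u jdbc:hive2://192.168.128.130:10000 -n root -e "SHOW DATABASES;" \
  || echo "beeline query skipped"
beeline_check=done
```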

Reprinted from: https://blog.csdn.net/xhcx_25/article/details/143328896
Copyright belongs to the original author, xhcx_25. If there is any infringement, please contact us for removal.
