Getting Started with Apache Kyuubi

1 Installing Kyuubi

Latest version at the time of writing: 1.6.1

wget http://mirrors.ustc.edu.cn/apache/kyuubi/kyuubi-1.6.1-incubating/apache-kyuubi-1.6.1-incubating-bin.tgz

Extract it to the target directory:

tar -zxvf apache-kyuubi-1.6.1-incubating-bin.tgz -C ~/softwares

Prepare the environment:

cd $KYUUBI_HOME/conf
cp kyuubi-env.sh.template kyuubi-env.sh
cp kyuubi-defaults.conf.template kyuubi-defaults.conf

Set the Kyuubi bind address to localhost. If this line is left commented out, connections to localhost fail and you have to use the host's IP address instead.

kyuubi.frontend.bind.host   localhost
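
Because a commented-out line in kyuubi-defaults.conf has no effect, it can help to check which value is actually active before starting Kyuubi. The following is a minimal sketch, not part of Kyuubi: kyuubi_conf_get is a hypothetical helper name, and it assumes the file only uses the plain "key value" and "key=value" layouts that appear in this article.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Kyuubi): print the active value of a key
# in a properties-style file, ignoring commented-out lines.
# Usage: kyuubi_conf_get <file> <key>
kyuubi_conf_get() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                       # commented-out lines have no effect
    {
      line = $0
      sub(/^[ \t]+/, "", line)                # drop leading whitespace
      sub(/[ \t]+$/, "", line)                # drop trailing whitespace
      # key and value are separated by "=" or by whitespace
      if (match(line, /[ \t]*[= \t][ \t]*/) == 0) next
      k = substr(line, 1, RSTART - 1)
      v = substr(line, RSTART + RLENGTH)
      if (k == key) val = v                   # keep the last occurrence
    }
    END { if (val != "") print val }
  ' "$1"
}

# Example (assumes KYUUBI_HOME is set):
# kyuubi_conf_get "$KYUUBI_HOME/conf/kyuubi-defaults.conf" kyuubi.frontend.bind.host
```

If the helper prints nothing, the key is either missing or still commented out.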

2 Configuring engines

2.1 Flink engine

Go to the Kyuubi configuration directory:

cd $KYUUBI_HOME/conf

Edit kyuubi-defaults.conf and set Flink as the execution engine:

kyuubi.engine.type FLINK_SQL

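(This step is optional, because the engine can also be selected per connection by passing this setting in the JDBC URL.)

In kyuubi-env.sh, set FLINK_HOME and FLINK_HADOOP_CLASSPATH. Flink 1.14 or later is required. The official documentation does not say that FLINK_HADOOP_CLASSPATH has to be set, but without it the Flink engine could not be connected to or used. Change the version numbers to match your own Hadoop installation.

export FLINK_HOME=/home/xcchen/software/flink/flink-1.15.3
export FLINK_HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/client/hadoop-client-runtime-3.3.0.jar:${HADOOP_HOME}/share/hadoop/client/hadoop-client-api-3.3.0.jar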

2.2 Spark engine

In kyuubi-env.sh, set SPARK_HOME. Spark 3.0.0 or later is required.

# Change this to your own Spark installation path
export SPARK_HOME=/home/xcchen/softwares/spark/spark-3.1.1

3 Starting Kyuubi

cd $KYUUBI_HOME
bin/kyuubi start

4 Testing with Flink

To keep testing simple, the jobs here run on a Flink standalone cluster:

cd $FLINK_HOME && bin/start-cluster.sh

Connect with the beeline client that ships with Kyuubi:

cd $KYUUBI_HOME
bin/beeline -u 'jdbc:hive2://localhost:10009/' -n xcchen

If you did not set the Flink engine in the configuration file, select it in the JDBC URL:

cd $KYUUBI_HOME
bin/beeline -u 'jdbc:hive2://localhost:10009/?kyuubi.engine.type=FLINK_SQL' -n xcchen
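
The two beeline commands above differ only in the kyuubi.engine.type parameter of the JDBC URL. As a small sketch of switching engines per connection, the hypothetical helper below (kyuubi_jdbc_url is an illustrative name, not a Kyuubi tool) builds that URL. The default host and port are the ones used in this article; SPARK_SQL is Kyuubi's engine type value for Spark.

```shell
#!/bin/sh
# Hypothetical helper: build a Kyuubi JDBC URL that selects the engine per connection.
# Usage: kyuubi_jdbc_url <engine-type> [host] [port]
kyuubi_jdbc_url() {
  engine="$1"
  host="${2:-localhost}"
  port="${3:-10009}"
  printf 'jdbc:hive2://%s:%s/?kyuubi.engine.type=%s\n' "$host" "$port" "$engine"
}

kyuubi_jdbc_url FLINK_SQL   # prints jdbc:hive2://localhost:10009/?kyuubi.engine.type=FLINK_SQL
kyuubi_jdbc_url SPARK_SQL   # prints jdbc:hive2://localhost:10009/?kyuubi.engine.type=SPARK_SQL

# Example connection (assumes Kyuubi is running and KYUUBI_HOME is set):
# "$KYUUBI_HOME/bin/beeline" -u "$(kyuubi_jdbc_url FLINK_SQL)" -n xcchen
```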

4.1 Submitting a basic SQL job

Run the following three SQL statements one at a time:

CREATE TABLE orders (
    order_number BIGINT,
    price        DECIMAL(32,2),
    buyer        ROW<first_name STRING, last_name STRING>,
    order_time   TIMESTAMP(3)
) WITH ('connector' = 'datagen', 'rows-per-second' = '1');

CREATE TABLE target (
    order_number BIGINT,
    price        DECIMAL(32,2),
    buyer        ROW<first_name STRING, last_name STRING>,
    order_time   TIMESTAMP(3)
) WITH ('connector' = 'print');

INSERT INTO target SELECT * FROM orders;

Problem: when a streaming job is submitted, the client blocks indefinitely.
Workaround: for now, patch the code and rebuild; see this issue: https://github.com/apache/kyuubi/issues/4446. The problem is expected to be fixed in a later Kyuubi release.

4.2 Interactive queries

select * from orders;

CREATE TABLE orders_b (
    order_number BIGINT,
    price        DECIMAL(32,2),
    buyer        ROW<first_name STRING, last_name STRING>,
    order_time   TIMESTAMP(3)
) WITH ('connector' = 'datagen', 'rows-per-second' = '1', 'number-of-rows' = '20');

select * from orders_b;

If the table is bounded, the result is printed.

If the table is unbounded, the query blocks, and results are printed only once the job ends.

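4.3 Using the Hive metastore

Use the create catalog and use catalog statements. The Hive connector must be placed in the $FLINK_HOME/lib directory.

The Hive metastore cannot be loaded by default, so the catalog has to be created explicitly.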
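
Since the catalog cannot be loaded by default, one option is to keep the catalog statements in a script and replay them when a session starts. The sketch below only writes such a script. The catalog name myhive, the script path, and the hive-conf-dir value are placeholders, not values from the original setup; hive-conf-dir should point to the directory that contains your hive-site.xml.

```shell
#!/bin/sh
# Write the Flink catalog statements to a script file so they can be replayed per session.
# Assumptions: the catalog name "myhive" and the hive-conf-dir path are placeholders.
sql_file="${TMPDIR:-/tmp}/kyuubi-flink-hive-catalog.sql"

cat > "$sql_file" <<'EOF'
CREATE CATALOG myhive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/path/to/hive/conf'
);
USE CATALOG myhive;
SHOW TABLES;
EOF

cat "$sql_file"

# Replay it in a session (assumes Kyuubi is running and KYUUBI_HOME is set):
# "$KYUUBI_HOME/bin/beeline" -u 'jdbc:hive2://localhost:10009/?kyuubi.engine.type=FLINK_SQL' -n xcchen -f "$sql_file"
```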

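4.4 Summary

The current Kyuubi version does not support Flink very well: batch jobs work, but streaming jobs are poorly served.

Drawbacks:

A select on an unbounded table never returns a result, so streaming interactive queries are not possible.
Submitting a streaming job blocks indefinitely (this happens with Flink 1.15; community members said it does not happen with 1.14, but in testing it did there too).
Streaming select queries can return duplicate records; see this issue: https://github.com/apache/kyuubi/issues/4083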

5 Testing with Spark

To keep testing simple, a Spark standalone cluster is used here:

cd $SPARK_HOME
sbin/start-all.sh

HDFS also needs to be started:

start-dfs.sh

Connect with the beeline client that ships with Kyuubi:

cd $KYUUBI_HOME
bin/beeline -u 'jdbc:hive2://localhost:10009/' -n xcchen

5.1 Submitting a basic SQL job

Prepare the data file (printf is used here because echo does not expand \n in every shell):

printf 'id,name\n1,zhangsan\n2,lisi\n' > /tmp/spark-kyuubi-source-test.csv

-- source table
create temporary view s using csv options (path 'file:/tmp/spark-kyuubi-source-test.csv', header "true");

insert overwrite directory using csv options (path 'file:/tmp/spark-kyuubi-target') select * from s;

5.2 Interactive queries

Basic query:

-- basic query
select timestamp '2023-03-03';

Reading a file:

-- select from csv table
select * from s;

0: jdbc:hive2://localhost:10009/> select * from s;
+-----+-----------+
| id  |   name    |
+-----+-----------+
| 1   | zhangsan  |
| 2   | lisi      |
+-----+-----------+
2 rows selected (0.072 seconds)

5.3 Submitting a job with dependencies

Prepare the MySQL data:

docker exec -it mysql bash
mysql -uroot -p'xxx'

Create the database and table in MySQL:

create database test default character set utf8mb4;
create user 'kyuubi'@'%' identified by 'Kyuubi123~';
grant all privileges on test.* to 'kyuubi'@'%' with grant option;
flush privileges;
use test;
create table t1 (id int primary key, name varchar(20));
insert into t1 values (1, 'zhangsan');

In the beeline interactive shell:

add jar '/home/xcchen/softwares/kyuubi/mysql-connector-java-8.0.22.jar';

create temporary view m_test
using org.apache.spark.sql.jdbc
options (url 'jdbc:mysql://localhost:3306/test', dbTable 't1', user 'kyuubi', password 'Kyuubi123~');

select * from m_test;

The added jar applies at the tenant (user) level: after you exit the shell, you do not need to add the dependency jar again when you reopen it and establish a new JDBC connection.

5.4 Using the Hive metastore

Start the Hive metastore:

hive --service metastore

Configuration options:

Put the HMS address in the JDBC URL; this is the simplest way:

bin/beeline -u 'jdbc:hive2://localhost:10009/;#hive.metastore.uris=thrift://localhost:9083' -n xcchen

Alternatively, edit $KYUUBI_HOME/conf/kyuubi-defaults.conf and add the spark.hive.metastore.uris setting.
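
As a sketch of the configuration-file route, the hypothetical helper below (ensure_conf_line is an illustrative name) appends a setting only if the exact line is not already present, so it can be run repeatedly. The metastore address thrift://localhost:9083 is taken from the JDBC URL example above; restarting Kyuubi afterwards is assumed to be needed for the change to be picked up.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a config file unless the exact line is already there.
# Usage: ensure_conf_line <file> <line>
ensure_conf_line() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example (assumes KYUUBI_HOME is set and the metastore listens on localhost:9083):
# ensure_conf_line "$KYUUBI_HOME/conf/kyuubi-defaults.conf" \
#   'spark.hive.metastore.uris thrift://localhost:9083'
```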

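By default, Spark metadata is kept only at session level: if you open session A in one terminal and create a series of tables, then open session B in a new terminal, show tables finds no tables. With a metastore in use, the behavior is different.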

6 Testing Kyuubi over JDBC

Reference Git project: http://192.168.118.128:81/xcchen/kyuubi-jdbc-examples#

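7 Authentication

Kyuubi authentication only covers access to Kyuubi itself; its scope is limited to the Kyuubi component.

Kyuubi supports several authentication methods.

7.1 JDBC authentication

MySQL is used for the test below.

Copy the MySQL driver into $KYUUBI_HOME/jars: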

cp mysql-connector-java-8.0.22.jar $KYUUBI_HOME/jars

Prepare the MySQL table:

create database test default character set utf8mb4;
use test;
create table kyuubi_user (`user` varchar(50) primary key, password varchar(50));
insert into kyuubi_user values ('xcchen', md5(concat('1qazZSE$', 'xcchen')));

In kyuubi-defaults.conf, set the JDBC authentication parameters:

kyuubi.authentication=JDBC
kyuubi.authentication.jdbc.driver.class = com.mysql.jdbc.Driver
kyuubi.authentication.jdbc.url = jdbc:mysql://localhost:3306/test
kyuubi.authentication.jdbc.user = root
kyuubi.authentication.jdbc.password = root
kyuubi.authentication.jdbc.query = SELECT 1 FROM kyuubi_user WHERE user=${user} AND password=MD5(CONCAT('1qazZSE$',${password}))

When connecting with beeline, supply the user name and password (-n is the user name, -p is the password):

bin/beeline -u 'jdbc:hive2://localhost:10009/?kyuubi.engine.type=FLINK_SQL' -n xcchen -p xcchen

If the user name or password is wrong, the following error is reported:

Connecting to jdbc:hive2://localhost:10009/?kyuubi.engine.type=FLINK_SQL
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10009/?kyuubi.engine.type=FLINK_SQL: Peer indicated failure: Error validating the login (state=08S01,code=0)
Beeline version 1.6.1-incubating by Apache Kyuubi (Incubating)

Tip: if the authentication settings do not take effect, check carefully that the query statement matches your MySQL table and that the MD5 salt is correct.
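
One way to check the salt is to reproduce MD5(CONCAT(salt, password)) outside MySQL and compare it with the value stored in kyuubi_user. This is a minimal sketch: salted_md5 is a hypothetical helper name, and it assumes the md5sum utility from GNU coreutils is installed.

```shell
#!/bin/sh
# Hypothetical helper: reproduce MySQL's MD5(CONCAT(salt, password)) locally.
# Assumes the md5sum utility (GNU coreutils) is available.
# Usage: salted_md5 <salt> <password>
salted_md5() {
  printf '%s%s' "$1" "$2" | md5sum | cut -d' ' -f1
}

# Hash that the kyuubi_user table should hold for the password "xcchen":
salted_md5 '1qazZSE$' 'xcchen'
```

If the printed hash differs from the result of select password from kyuubi_user where `user` = 'xcchen', then the salt or the concatenation order in the query does not match the one used when the row was inserted.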

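7.2 Kerberos authentication (recommended)

8 Authorization

Multi-tenancy

Flink jobs have no such concept: even with the same user, their environments are isolated.

For Spark jobs, sessions of the same user are also separate from one another.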


Reposted from: https://blog.csdn.net/qq_41463207/article/details/129408434
Copyright belongs to the original author, 迷失的Flink民工.