Environment:
Hostname: cmcc01 (used as the example)
Operating system: CentOS 7
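Software deployed on CentOS 7:
Software          Version                      Deployment mode
zookeeper         zookeeper-3.4.10             pseudo-distributed
hadoop            hadoop-3.1.3                 pseudo-distributed
hive              hive-3.1.3-bin               pseudo-distributed
clickhouse        21.11.10.1-2                 single node, multiple instances
dolphinscheduler  3.0.0                        single node
kettle            pdi-ce-9.3.0.0               single node
sqoop             sqoop-1.4.7                  single node
seatunnel         seatunnel-incubating-2.1.2   single node
spark             spark-2.4.8                  single node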
Integrating Kettle with MySQL and Hive
1. Download Kettle
Official download page: https://sourceforge.net/projects/pentaho/files/
2. Unzip
unzip /opt/package/pdi-ce-9.3.0.0-428.zip -d /opt/software/
3. Configure the Java environment variables
vim ~/.bash_profile
# Add the following
# JDK
export JAVA_HOME=/opt/software/jdk1.8.0_321
export PATH=$PATH:${JAVA_HOME}/bin
Reload the profile so the settings take effect
source ~/.bash_profile
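A quick sanity check after reloading the profile can catch a mistyped JDK path before Kettle is launched. The sketch below is only an illustration: `check_java_home` is a hypothetical helper name, not something shipped with the JDK or Kettle, and it only verifies that an executable `bin/java` exists under the given directory.

```shell
# check_java_home [DIR]: hypothetical helper, not part of the JDK or Kettle.
# Succeeds only if DIR (default: $JAVA_HOME) contains an executable bin/java.
check_java_home() {
    local home="${1:-${JAVA_HOME:-}}"
    if [ -z "$home" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$home/bin/java" ]; then
        echo "no executable found at $home/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks usable: $home"
}

# Usage, with the JDK path from this guide:
#   check_java_home /opt/software/jdk1.8.0_321 && java -version
```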
4. Grant execute permission to users in the same group
chmod g+x /opt/software/data-integration/kitchen.sh
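The command above only touches kitchen.sh. Assuming the other launchers need the same permission (in PDI distributions kitchen.sh and pan.sh typically delegate to spoon.sh, but verify this against your own copy), a small loop can apply it to every script in the directory. `grant_group_exec` is a hypothetical helper name used for illustration.

```shell
# grant_group_exec DIR: hypothetical helper, not shipped with Kettle.
# Adds group execute permission to every *.sh file directly inside DIR.
grant_group_exec() {
    local dir="$1"
    local script
    for script in "$dir"/*.sh; do
        [ -f "$script" ] || continue   # the glob matched nothing
        chmod g+x "$script"
    done
}

# Usage, with the install path from this guide:
#   grant_group_exec /opt/software/data-integration
```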
5. Run the command
[root@cmcc01 data-integration]#
[root@cmcc01 data-integration]#
[root@cmcc01 data-integration]# ./kitchen.sh
#######################################################################
WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
Consider installing the package with apt-get or yum.
e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################
Options:
-rep = Repository name
-user = Repository username
-trustuser = !Kitchen.ComdLine.RepUsername!
-pass = Repository password
-job = The name of the job to launch
-dir = The directory (dont forget the leading /)
-file = The filename (Job XML) to launch
-level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
-logfile = The logging file to write to
-listdir = List the directories in the repository
-listjobs = List the jobs in the specified directory
-listrep = List the available repositories
-norep = Do not log into the repository
-version = show the version, revision and build date
-param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
-listparam = List information concerning the defined parameters in the specified job.
-export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
-custom = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
-maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
-maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
[root@cmcc01 data-integration]#
[root@cmcc01 data-integration]#
Note the libwebkitgtk warning at the top of the output
6. Resolve the warning
wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/matthewdva:/build:/EPEL:/el7/RHEL_7/x86_64/webkitgtk-2.4.9-1.el7.x86_64.rpm
yum -y install webkitgtk-2.4.9-1.el7.x86_64.rpm
# Run the command again; the warning is gone
[root@cmcc01 package]#
[root@cmcc01 package]# /opt/software/data-integration/kitchen.sh
Options:
-rep = Repository name
-user = Repository username
-trustuser = !Kitchen.ComdLine.RepUsername!
-pass = Repository password
-job = The name of the job to launch
-dir = The directory (dont forget the leading /)
-file = The filename (Job XML) to launch
-level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
-logfile = The logging file to write to
-listdir = List the directories in the repository
-listjobs = List the jobs in the specified directory
-listrep = List the available repositories
-norep = Do not log into the repository
-version = show the version, revision and build date
-param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
-listparam = List information concerning the defined parameters in the specified job.
-export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
-custom = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
-maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
-maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
[root@cmcc01 package]#
[root@cmcc01 package]#
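Instead of comparing the two outputs by eye, the check can be scripted. The sketch below simply greps the launcher output for the warning text shown earlier; `has_webkit_warning` is a hypothetical helper name, and the match string is copied from the output in this guide, so it may need adjusting for other Kettle versions.

```shell
# has_webkit_warning: hypothetical helper. Reads launcher output on stdin and
# succeeds if it contains the libwebkitgtk warning shown earlier in this guide.
has_webkit_warning() {
    grep -q 'no libwebkitgtk-1.0 detected'
}

# Usage, with the install path from this guide:
#   if /opt/software/data-integration/kitchen.sh 2>&1 | has_webkit_warning; then
#       echo "warning still present"
#   else
#       echo "warning gone"
#   fi
```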
7. Test
# Run a transformation
# Write a test transformation, then run it with the following command
/opt/software/data-integration/pan.sh -file=/opt/kettle-spoon/ktr/test/test1.ktr -logfile=test1.log
# Run a job
/opt/software/data-integration/kitchen.sh -file=/opt/kettle-spoon/ktr/test/SechuldUpdate.kjb -logfile=timeLogUpdate.log
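When jobs are launched from scripts or a scheduler, it helps to capture the exit status as well as the log. The wrapper below is a minimal sketch: `run_kettle_job` is a hypothetical name, it uses only the `-file`, `-level` and `-logfile` options listed in the Kitchen help output above, and it assumes that Kitchen exits with 0 on success and a non-zero code on failure.

```shell
# run_kettle_job LAUNCHER JOB LOG: hypothetical wrapper, not part of Kettle.
# Runs LAUNCHER (e.g. kitchen.sh) on the JOB file, writes the log to LOG and
# returns the launcher's exit status (assumed: 0 means success).
run_kettle_job() {
    local launcher="$1"
    local job="$2"
    local log="$3"
    local status=0
    "$launcher" -file="$job" -level=Basic -logfile="$log" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "job failed with exit code $status, see $log" >&2
    fi
    return "$status"
}

# Usage, with the paths from this guide:
#   run_kettle_job /opt/software/data-integration/kitchen.sh \
#       /opt/kettle-spoon/ktr/test/SechuldUpdate.kjb timeLogUpdate.log
```

Passing the launcher as an argument keeps the same wrapper usable for pan.sh and transformations.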
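8. Integrating Kettle with MySQL
By this point a new file should exist under the current user's home directory: ~/.kettle/kettle.properties
If it does not exist, create it yourself
(1) Set the MySQL connection details:
vim ~/.kettle/kettle.properties
Add the following: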
##MYSQL
MYSQL_HOST=localhost
MYSQL_DB_PORT=3306
MYSQL_DB_USER=root
MYSQL_DB_PASSWORD=123qwe
MYSQL_DB_NAME=flinkcdc
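The same properties can be added without opening an editor, which is handy when the step is repeated on several machines. The sketch below appends a key only if it is not already defined, so re-running it does not create duplicates. `set_kettle_property` is a hypothetical helper name; the keys are the ones used in this guide, which can then be referenced in a Kettle connection as variables such as ${MYSQL_HOST}.

```shell
# set_kettle_property FILE KEY VALUE: hypothetical helper, not part of Kettle.
# Creates FILE (and its directory) if needed and appends KEY=VALUE unless a
# line starting with "KEY=" is already present.
set_kettle_property() {
    local file="$1"
    local key="$2"
    local value="$3"
    mkdir -p "$(dirname "$file")"
    touch "$file"
    if grep -q "^${key}=" "$file"; then
        echo "$key already set in $file, leaving it unchanged"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage, with values from this guide:
#   set_kettle_property ~/.kettle/kettle.properties MYSQL_HOST localhost
#   set_kettle_property ~/.kettle/kettle.properties MYSQL_DB_PORT 3306
```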
(2) Copy the driver into data-integration/lib
cp /opt/package/mysql-connector-java-8.0.20.jar /opt/software/data-integration/lib
(3) Create a database connection and test it (done in Kettle installed on Windows)
(4) Create the job kettle_job_test.kjb
(5) Upload the job to the server and run it
# Run the job
/opt/software/data-integration/kitchen.sh -file=/opt/package/kettle_job_test.kjb
9. Integrating Kettle with Hive
# Create symlinks to the Hive JARs
ln -s /opt/software/hive-3.1.3-bin/lib/*.jar /opt/software/data-integration/lib
This may report "File exists" for JARs that are already present in lib; the error can be ignored
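To avoid the "File exists" messages altogether, the links can be created one by one, skipping names that already exist in the target directory. This is a sketch under the assumption that Kettle's own copy of a JAR should be kept whenever both directories contain the same file name; `link_jars` is a hypothetical helper name.

```shell
# link_jars SRC_DIR DEST_DIR: hypothetical helper, not part of Kettle or Hive.
# Symlinks every *.jar from SRC_DIR into DEST_DIR, skipping any file name
# that already exists in DEST_DIR. SRC_DIR should be an absolute path.
link_jars() {
    local src="$1"
    local dest="$2"
    local jar
    local name
    for jar in "$src"/*.jar; do
        [ -e "$jar" ] || continue      # the glob matched nothing
        name=$(basename "$jar")
        if [ -e "$dest/$name" ] || [ -L "$dest/$name" ]; then
            echo "skip (already present): $name"
        else
            ln -s "$jar" "$dest/$name"
            echo "linked: $name"
        fi
    done
}

# Usage, with the paths from this guide:
#   link_jars /opt/software/hive-3.1.3-bin/lib /opt/software/data-integration/lib
```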
Create a test job (done in Kettle installed on Windows)
Run the job
/opt/software/data-integration/kitchen.sh -file=/opt/package/kettle_job_hive_test.kjb
Copyright belongs to the original author, Toroidals. In case of infringement, please contact us for removal.