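Deploying Kettle 9.3.0 on CentOS 7 and setting up a slave server for remote job submission

Environment:

Hostname: cmcc01 (used as the example throughout)

Operating system: CentOS 7

Kettle version: 9.3.0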

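1. Download Kettle

Official download page: https://sourceforge.net/projects/pentaho/files/

Download pdi-ce-9.3.0.0-428.zip and place it in /opt/package/.

2. Unpack the archive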

unzip /opt/package/pdi-ce-9.3.0.0-428.zip -d /opt/software/
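
After unpacking, it can help to confirm that the scripts used in the later steps are actually in place. The following is a minimal sketch; check_pdi_layout is a hypothetical helper name, not something shipped with Kettle, and the script list is simply the set of scripts this article uses.

```shell
#!/bin/sh
# Sketch: confirm that the archive unpacked into the expected layout.
# check_pdi_layout is a hypothetical helper, not part of Kettle.
check_pdi_layout() {
    pdi_home=$1
    missing=0
    # Scripts referenced later in this article
    for script in kitchen.sh pan.sh carte.sh encr.sh; do
        if [ ! -f "$pdi_home/$script" ]; then
            echo "missing: $pdi_home/$script" >&2
            missing=1
        fi
    done
    # Returns 0 when every script is present, 1 otherwise
    return "$missing"
}

# Example (path taken from the unzip command above):
# check_pdi_layout /opt/software/data-integration && echo "layout OK"
```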

3. Configure the Java environment variables

vim ~/.bash_profile
# Add the following lines

# JDK
export JAVA_HOME=/opt/software/jdk1.8.0_321
export PATH=$PATH:${JAVA_HOME}/bin

Apply the configuration:

source ~/.bash_profile
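
Before going further, it is worth checking that JAVA_HOME points at a usable JDK in the current shell. A small sketch, assuming nothing beyond a POSIX shell; check_java_home is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that JAVA_HOME is set and contains an executable java binary.
# check_java_home is a hypothetical helper, not part of Kettle or the JDK.
check_java_home() {
    java_home=$1
    [ -n "$java_home" ] && [ -x "$java_home/bin/java" ]
}

# Example:
# if check_java_home "$JAVA_HOME"; then
#     "$JAVA_HOME/bin/java" -version
# else
#     echo "JAVA_HOME is not set or does not contain bin/java" >&2
# fi
```

If the check fails in a new terminal, the export lines are probably in a profile file that this shell did not load.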

4. Grant execute permission to users in the same group

chmod g+x /opt/software/data-integration/kitchen.sh
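
The command above covers only kitchen.sh, while later steps also call pan.sh directly. One possible extension, shown as a sketch, applies the same group permission to every .sh script in the directory; grant_group_exec is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: add group execute permission to every .sh script in the Kettle directory.
# grant_group_exec is a hypothetical helper, not part of Kettle.
grant_group_exec() {
    pdi_home=$1
    for script in "$pdi_home"/*.sh; do
        # Skip the unexpanded pattern when the directory has no .sh files
        [ -f "$script" ] || continue
        chmod g+x "$script" || return 1
    done
}

# Example:
# grant_group_exec /opt/software/data-integration
```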

5. Run the command

[root@cmcc01 data-integration]# ./kitchen.sh
#######################################################################
WARNING:  no libwebkitgtk-1.0 detected, some features will be unavailable
    Consider installing the package with apt-get or yum.
    e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
#######################################################################

Options:
  -rep            = Repository name
  -user           = Repository username
  -trustuser      = !Kitchen.ComdLine.RepUsername!
  -pass           = Repository password
  -job            = The name of the job to launch
  -dir            = The directory (dont forget the leading /)
  -file           = The filename (Job XML) to launch
  -level          = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
  -logfile        = The logging file to write to
  -listdir        = List the directories in the repository
  -listjobs       = List the jobs in the specified directory
  -listrep        = List the available repositories
  -norep          = Do not log into the repository
  -version        = show the version, revision and build date
  -param          = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
  -listparam      = List information concerning the defined parameters in the specified job.
  -export         = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
  -custom         = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
  -maxloglines    = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
  -maxlogtimeout  = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)

The output includes a warning: libwebkitgtk-1.0 was not detected.

6. Resolve the warning

wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/matthewdva:/build:/EPEL:/el7/RHEL_7/x86_64/webkitgtk-2.4.9-1.el7.x86_64.rpm
yum -y install webkitgtk-2.4.9-1.el7.x86_64.rpm

# Run the command again; the warning is gone
[root@cmcc01 package]# /opt/software/data-integration/kitchen.sh
Options:
  -rep            = Repository name
  -user           = Repository username
  -trustuser      = !Kitchen.ComdLine.RepUsername!
  -pass           = Repository password
  -job            = The name of the job to launch
  -dir            = The directory (dont forget the leading /)
  -file           = The filename (Job XML) to launch
  -level          = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
  -logfile        = The logging file to write to
  -listdir        = List the directories in the repository
  -listjobs       = List the jobs in the specified directory
  -listrep        = List the available repositories
  -norep          = Do not log into the repository
  -version        = show the version, revision and build date
  -param          = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
  -listparam      = List information concerning the defined parameters in the specified job.
  -export         = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
  -custom         = Set a custom plugin specific option as a String value in the job using <NAME>=<Value>, for example: -custom:COLOR=Red
  -maxloglines    = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
  -maxlogtimeout  = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)

7. Test

# Run a transformation
# Write a test transformation, then run:
/opt/software/data-integration/pan.sh -file=/opt/kettle-spoon/ktr/test/test1.ktr -logfile=test1.log

# Run a job
/opt/software/data-integration/kitchen.sh -file=/opt/kettle-spoon/ktr/test/SechuldUpdate.kjb -logfile=timeLogUpdate.log
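
When jobs are run from cron or another script, the exit code of kitchen.sh is the simplest way to tell success from failure, since Kitchen returns a non-zero code when a job fails. The wrapper below is a sketch under that assumption; run_kettle_job is a hypothetical name, and the options it passes (-file, -level, -logfile) are the ones listed in the kitchen.sh help output above.

```shell
#!/bin/sh
# Sketch: run a job with kitchen.sh and report the result based on the exit code.
# run_kettle_job is a hypothetical wrapper, not part of Kettle.
run_kettle_job() {
    kitchen=$1
    job_file=$2
    log_file=$3
    status=0
    "$kitchen" -file="$job_file" -level=Basic -logfile="$log_file" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "job succeeded: $job_file"
    else
        echo "job failed with exit code $status, see $log_file" >&2
    fi
    return "$status"
}

# Example (paths taken from the commands above):
# run_kettle_job /opt/software/data-integration/kitchen.sh \
#     /opt/kettle-spoon/ktr/test/SechuldUpdate.kjb timeLogUpdate.log
```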

8. Integrate Kettle with MySQL and Oracle

Copy the JDBC drivers into data-integration/lib:

cp /opt/package/mysql-connector-java-8.0.20.jar /opt/software/data-integration/lib
cp /opt/package/ojdbc6.jar /opt/software/data-integration/lib
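
A quick check that both drivers actually landed in lib can save a confusing "driver not found" error later. The sketch below looks for the two file name patterns used in this step; find_jdbc_drivers is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether the MySQL and Oracle JDBC drivers are present in lib.
# find_jdbc_drivers is a hypothetical helper, not part of Kettle.
find_jdbc_drivers() {
    lib_dir=$1
    for pattern in 'mysql-connector-java-*.jar' 'ojdbc*.jar'; do
        found=0
        # $pattern is left unquoted on purpose so the shell expands the glob
        for jar in "$lib_dir"/$pattern; do
            if [ -f "$jar" ]; then
                echo "found: $(basename "$jar")"
                found=1
            fi
        done
        [ "$found" -eq 1 ] || echo "missing: $pattern"
    done
}

# Example:
# find_jdbc_drivers /opt/software/data-integration/lib
```

JARs in lib are loaded when the JVM starts, so any running Kettle process needs a restart before it can use the new drivers.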

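9. Integrate Kettle with Hive

# Create symbolic links to the Hive JARs
ln -s /opt/software/hive-3.1.3-bin/lib/*.jar /opt/software/data-integration/lib

ln may report "File exists" for JARs that are already present in lib; these errors can be ignored.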
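
If the "File exists" noise is unwanted, the same result can be reached with a loop that links only the JARs missing from lib. This is a sketch of that alternative, not the original author's method; link_missing_jars is a hypothetical helper name. As with the plain ln command, JARs already present in lib are left untouched.

```shell
#!/bin/sh
# Sketch: symlink JARs from a source directory into lib, skipping names that already exist.
# link_missing_jars is a hypothetical helper, not part of Kettle or Hive.
# Pass an absolute source path so the symlinks resolve from any working directory.
link_missing_jars() {
    src_dir=$1
    dst_dir=$2
    for jar in "$src_dir"/*.jar; do
        # Skip the unexpanded pattern when the source has no JARs
        [ -f "$jar" ] || continue
        target="$dst_dir/$(basename "$jar")"
        if [ -e "$target" ] || [ -L "$target" ]; then
            echo "skip (exists): $(basename "$jar")"
        else
            ln -s "$jar" "$target"
        fi
    done
}

# Example (paths taken from the ln command above):
# link_missing_jars /opt/software/hive-3.1.3-bin/lib /opt/software/data-integration/lib
```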

Create a job to test the Hive connection.

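10. Change the encoding in spoon.bat on the Windows side

If this step is skipped, submitting a job to the slave server fails with the following error: Invalid byte 1 of 1-byte UTF-8 sequence

Open the following file in a text editor: \pdi-ce-9.3.0.0-428\data-integration\spoon.bat

Add "-Dfile.encoding=UTF-8" at the position marked with a red box in the original screenshot (not reproduced here). It is a JVM option, so it belongs with the other -D options passed to Java.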

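11. Change the Carte username and password on the Linux side

Running jobs through the Carte service requires authentication. By default, Carte supports only the most basic scheme: the password is stored in the kettle.pwd file, located in the pwd directory under the Kettle root directory. By default, kettle.pwd contains:

# Please note that the default password (cluster) is obfuscated using the Encr script provided in this release
# Passwords can also be entered in plain text as before
#
cluster: OBF:1v8w1uh21z7k1ym71z7i1ugo1v9q

Only the last line matters: it defines a user named cluster together with the obfuscated password (which is also cluster). As the comments in the file say, the obfuscated password is generated with the Encr.bat or encr.sh script:

sh /data-integration/encr.sh -carte cluster
# Output
OBF:1v8w1uh21z7k1ym71z7i1ugo1v9q

Note: cluster here is the password to obfuscate; replace it with your own.

Put the newly generated obfuscated password into kettle.pwd. If the new password does not take effect, the Carte service has not been restarted yet: find its process, kill it, and start Carte again.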
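
Editing kettle.pwd by hand works, but the update can also be scripted. The sketch below keeps the comment lines, replaces every credential line with a single "user: OBF:..." entry in the format shown above, and leaves a .bak copy of the old file. set_carte_credentials is a hypothetical helper name, and replacing all existing users with one entry is an assumption that may not suit a setup with several accounts.

```shell
#!/bin/sh
# Sketch: rewrite kettle.pwd so that it contains one user with an obfuscated password.
# set_carte_credentials is a hypothetical helper, not part of Kettle.
set_carte_credentials() {
    pwd_file=$1
    user=$2
    obfuscated=$3
    tmp_file="$pwd_file.tmp"
    # Keep a backup of the current file
    cp "$pwd_file" "$pwd_file.bak" || return 1
    # Keep the comment lines; grep exits non-zero when there are none, which is fine
    grep '^#' "$pwd_file" > "$tmp_file" || true
    # Append the single credential line in the "user: password" format
    printf '%s: %s\n' "$user" "$obfuscated" >> "$tmp_file" || return 1
    mv "$tmp_file" "$pwd_file"
}

# Example (the obfuscated string is the encr.sh output shown above):
# set_carte_credentials /opt/software/data-integration/pwd/kettle.pwd \
#     cluster "OBF:1v8w1uh21z7k1ym71z7i1ugo1v9q"
```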

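12. Start the Carte service

On the Linux server, change to the Kettle root directory: cd /kettle/data-integration/
sh carte.sh <local IP> <port>
For example: sh carte.sh 192.168.12.250 8888

The service has started successfully when the startup output is displayed. [Screenshot of the Carte startup output, not reproduced here]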
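
Started as above, Carte runs in the foreground and stops when the terminal session ends. The sketch below shows one way to start it in the background and then probe its status page over HTTP. The function names are hypothetical, carte.log is an arbitrary log file name, and the check assumes curl is installed and that Carte serves its status page at /kettle/status/ with the credentials from kettle.pwd.

```shell
#!/bin/sh
# Sketch: start Carte in the background and check that it answers HTTP requests.
# All function names here are hypothetical helpers, not part of Kettle.

# Build the status page URL for a Carte instance
carte_status_url() {
    printf 'http://%s:%s/kettle/status/\n' "$1" "$2"
}

# Start Carte detached from the terminal; output goes to carte.log in the Kettle directory
start_carte() {
    pdi_home=$1
    host=$2
    port=$3
    ( cd "$pdi_home" && nohup sh carte.sh "$host" "$port" > carte.log 2>&1 & )
}

# Return 0 when the status page answers with a successful HTTP response
check_carte() {
    host=$1
    port=$2
    user=$3
    pass=$4
    curl -s -f -u "$user:$pass" "$(carte_status_url "$host" "$port")" > /dev/null
}

# Example (IP and port taken from the example above, default credentials):
# start_carte /opt/software/data-integration 192.168.12.250 8888
# sleep 10
# check_carte 192.168.12.250 8888 cluster cluster && echo "Carte is up"
```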

13. Configure the slave server in Kettle on the Windows side

14. Create a run configuration for the slave server

This article is reposted from: https://blog.csdn.net/jsnh307/article/details/128498593
Copyright belongs to the original author, 数据治理狗.
