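An open-source distributed workflow scheduling system
Apache DolphinScheduler
Overview
Apache DolphinScheduler is an open-source distributed workflow scheduling system, used mainly for data processing and task scheduling. It supports many data sources and task types and helps users manage complex workflows in big data environments.
Key features:
Visual interface: a user-friendly UI for creating, managing, and monitoring workflows.
Flexible scheduling: supports scheduled tasks, dependent tasks, and dynamic task scheduling.
Multiple task types: supports Shell, Python, SQL, and other task types, and integrates with big data frameworks such as Hadoop, Spark, and Flink.
High availability: cluster deployment provides high availability and reliable task execution.
Extensibility: a plugin mechanism lets users extend functionality as needed.
GitHub:
https://github.com/apache/dolphinscheduler
Official website:
https://dolphinscheduler.apache.org/zh-cn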
Installation
Download:
wget https://archive.apache.org/dist/dolphinscheduler/3.1.5/apache-dolphinscheduler-3.1.5-bin.tar.gz
Extract and install:
tar -zxvf apache-dolphinscheduler-3.1.5-bin.tar.gz
mv apache-dolphinscheduler-3.1.5-bin dolphinscheduler
cd dolphinscheduler
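Because the archive name embeds the version number, deriving both the URL and the file name from a single variable avoids a mismatch between the download and extract steps. The following is a minimal sketch; `DS_VERSION`, `ds_archive_name`, and `ds_download_url` are illustrative names and not part of DolphinScheduler.

```shell
# Hypothetical helpers: derive the archive name and download URL from one version string.
DS_VERSION="${DS_VERSION:-3.1.5}"

ds_archive_name() {
  printf 'apache-dolphinscheduler-%s-bin.tar.gz\n' "$1"
}

ds_download_url() {
  printf 'https://archive.apache.org/dist/dolphinscheduler/%s/%s\n' "$1" "$(ds_archive_name "$1")"
}

# Print the commands instead of running them, so the sketch has no side effects.
echo "wget $(ds_download_url "$DS_VERSION")"
echo "tar -zxvf $(ds_archive_name "$DS_VERSION")"
```

Changing `DS_VERSION` in one place then keeps the `wget`, `tar`, and `mv` steps consistent.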
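Standalone Deployment
Prerequisites
- Install JDK 1.8 and set the JAVA_HOME environment variable
- The DolphinScheduler binary package
- A database, such as MySQL
- The JDBC driver for that database
Start DolphinScheduler
bin/dolphinscheduler-daemon.sh start standalone-server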
Log in to DolphinScheduler
Open:
http://node01:12345/dolphinscheduler/ui
Default username and password:
admin / dolphinscheduler123
Start and stop commands
Start the Standalone Server:
bin/dolphinscheduler-daemon.sh start standalone-server
Stop the Standalone Server:
bin/dolphinscheduler-daemon.sh stop standalone-server
Check the Standalone Server status:
bin/dolphinscheduler-daemon.sh status standalone-server
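Configure the database
By default, the Standalone Server uses an H2 database to store its metadata. To store the metadata in another database such as MySQL or PostgreSQL, some configuration must be changed.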
1. Download the MySQL driver JAR
Copy the JAR into the libs directory of every DolphinScheduler module, namely:
cp mysql-connector-java-8.0.33.jar alert-server/libs/
cp mysql-connector-java-8.0.33.jar api-server/libs/
cp mysql-connector-java-8.0.33.jar master-server/libs/
cp mysql-connector-java-8.0.33.jar worker-server/libs/
cp mysql-connector-java-8.0.33.jar standalone-server/libs/standalone-server/
cp mysql-connector-java-8.0.33.jar tools/libs/
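The six copy commands can be collapsed into a loop. This is only a sketch: `copy_jdbc_driver` is a hypothetical helper, and the directory list is taken from the commands above.

```shell
# Hypothetical helper: copy a JDBC driver JAR into every module's libs directory.
# $1 = path to the driver JAR, $2 = DolphinScheduler install directory.
copy_jdbc_driver() {
  cj_jar=$1
  cj_home=$2
  for cj_dir in alert-server/libs api-server/libs master-server/libs \
                worker-server/libs standalone-server/libs/standalone-server tools/libs; do
    if [ -d "$cj_home/$cj_dir" ]; then
      cp "$cj_jar" "$cj_home/$cj_dir/" || return 1
    else
      # A missing module directory is reported but does not abort the loop.
      echo "skip: $cj_home/$cj_dir does not exist" >&2
    fi
  done
}
```

Run from the install directory, a call would look like `copy_jdbc_driver /path/to/driver.jar .`, where the JAR path is a placeholder.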
2. Create the database
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
3. Configure
Edit the ./bin/env/dolphinscheduler_env.sh file:
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://node01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
export SPRING_DATASOURCE_USERNAME=root
export SPRING_DATASOURCE_PASSWORD=123456
Initialize the database
chmod +x tools/bin/upgrade-schema.sh
tools/bin/upgrade-schema.sh
This raises an exception:
Caused by: java.lang.RuntimeException: Driver org.postgresql.Driver claims to not accept jdbcUrl, jdbc:mysql://node01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false
Solution:
Edit the ./bin/env/dolphinscheduler_env.sh file and comment out the PostgreSQL-related settings:
# Database related configuration, set database type, username and password
#export DATABASE=${DATABASE:-postgresql}
#export SPRING_PROFILES_ACTIVE=${DATABASE}
#export SPRING_DATASOURCE_URL
#export SPRING_DATASOURCE_USERNAME
#export SPRING_DATASOURCE_PASSWORD
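The exception above pairs the PostgreSQL driver with a MySQL JDBC URL, so a quick consistency check before running the schema script can catch that mismatch early. The sketch below is illustrative only; `jdbc_matches_database` is a hypothetical helper, and it assumes the `DATABASE` value equals the JDBC sub-protocol, as it does for `mysql` and `postgresql`.

```shell
# Hypothetical pre-flight check: the JDBC URL scheme should match the DATABASE type.
# $1 = database type (e.g. mysql), $2 = JDBC URL. Returns 0 when they agree, 1 otherwise.
jdbc_matches_database() {
  jm_database=$1
  jm_url=$2
  case "$jm_url" in
    "jdbc:$jm_database:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with the values used in this article.
if jdbc_matches_database mysql "jdbc:mysql://node01:3306/dolphinscheduler"; then
  echo "DATABASE and SPRING_DATASOURCE_URL agree"
fi
```

After sourcing dolphinscheduler_env.sh, the same check could be run as `jdbc_matches_database "$DATABASE" "$SPRING_DATASOURCE_URL"`.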
Initialization then fails with another exception:
Caused by: java.lang.IllegalStateException: Cannot load driver class: com.mysql.cj.jdbc.Driver
    at org.springframework.util.Assert.state(Assert.java:97) ~[spring-core-5.3.19.jar:5.3.19]
    at org.springframework.boot.autoconfigure.jdbc.DataSourceProperties.determineDriverClassName(DataSourceProperties.java:171) ~[spring-boot-autoconfigure-2.7.3.jar:2.7.3]
Cause:
The official site states that versions 8.0.16 and later are supported. The driver used here was mysql-connector-java-8.0.33.jar, but in practice it does not currently work.
Solution:
Use mysql-connector-java-8.0.16.jar instead:
cp mysql-connector-java-8.0.16.jar alert-server/libs/
cp mysql-connector-java-8.0.16.jar api-server/libs/
cp mysql-connector-java-8.0.16.jar master-server/libs/
cp mysql-connector-java-8.0.16.jar worker-server/libs/
cp mysql-connector-java-8.0.16.jar standalone-server/libs/standalone-server/
cp mysql-connector-java-8.0.16.jar tools/libs/
Run the initialization command again. It creates the DolphinScheduler metadata tables.
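DolphinScheduler Cluster Mode
Prerequisites
- Install JDK 1.8 and set the JAVA_HOME environment variable
- The DolphinScheduler binary package
- A database, such as MySQL
- The JDBC driver for that database
- Set up and start ZooKeeper as the registry center
Note:
DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, but if the tasks you run depend on them, the corresponding environments must be available.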
Edit the install_env.sh file
Edit bin/env/install_env.sh. It describes which machines DolphinScheduler will be installed on and which services each machine runs.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
#ips=${ips:-"ds1,ds2,ds3,ds4,ds5"}
ips="node01,node02,node03"

# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
#sshPort=${sshPort:-"22"}
sshPort=22

# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
#masters=${masters:-"ds1,ds2"}
masters="node01"

# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
#workers=${workers:-"ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"}
workers="node01:default,node02:default,node03:default"

# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
#alertServer=${alertServer:-"ds3"}
alertServer="node02"

# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
#apiServers=${apiServers:-"ds1"}
apiServers="node03"

# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
#installPath=${installPath:-"/tmp/dolphinscheduler"}
installPath="/usr/local/program/dolphinscheduler/"

# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
#deployUser=${deployUser:-"dolphinscheduler"}
deployUser="root"

# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
zkRoot=${zkRoot:-"/dolphinscheduler"}
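The comments in install_env.sh require that masters, workers, alertServer, and apiServers each be a subset of `ips`. That constraint can be checked before running the installer. The following is a rough sketch; `hosts_in_ips` is a hypothetical helper and not part of the DolphinScheduler scripts.

```shell
# Hypothetical check: every host in a comma-separated list must appear in `ips`.
# Entries may carry a ":workerGroup" suffix, which is stripped before comparing.
# $1 = list to check, $2 = value of ips.
hosts_in_ips() {
  hi_list=$1
  hi_ips=$2
  hi_old_ifs=$IFS
  IFS=,
  for hi_entry in $hi_list; do
    hi_host=${hi_entry%%:*}
    case ",$hi_ips," in
      *",$hi_host,"*) ;;
      *)
        IFS=$hi_old_ifs
        echo "not in ips: $hi_host" >&2
        return 1
        ;;
    esac
  done
  IFS=$hi_old_ifs
  return 0
}

# Example with the values configured above.
hosts_in_ips "node01:default,node02:default,node03:default" "node01,node02,node03" && echo "workers ok"
```

After sourcing install_env.sh, the same function could be called once for each of `$masters`, `$workers`, `$alertServer`, and `$apiServers` against `$ips`.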
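Edit the dolphinscheduler_env.sh file
Pay attention to the following:
1. Comment out the PostgreSQL settings
2. Set the JDK path
3. Configure the ZooKeeper connection
4. Configure the MySQL database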
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# JAVA_HOME, will use it to start DolphinScheduler server
#export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}
export JAVA_HOME="/usr/local/program/jdk8"

# Database related configuration, set database type, username and password
#export DATABASE=${DATABASE:-postgresql}
#export SPRING_PROFILES_ACTIVE=${DATABASE}
#export SPRING_DATASOURCE_URL
#export SPRING_DATASOURCE_USERNAME
#export SPRING_DATASOURCE_PASSWORD

# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
#export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
export REGISTRY_ZOOKEEPER_CONNECT_STRING="node01:2181,node02:2181,node03:2181"

# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH

export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://node01:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
export SPRING_DATASOURCE_USERNAME=root
export SPRING_DATASOURCE_PASSWORD=123456
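Since the cluster depends on the ZooKeeper ensemble named in REGISTRY_ZOOKEEPER_CONNECT_STRING, it can help to confirm each endpoint is reachable before deploying. The sketch below splits the connect string into host and port pairs. `zk_endpoints` and `zk_check` are hypothetical helpers, 2181 is ZooKeeper's default client port, and `zk_check` assumes an `nc` that supports `-z` and `-w`, which varies between netcat variants.

```shell
# Hypothetical helper: print one "host port" pair per line from a ZooKeeper connect string.
zk_endpoints() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=: read -r zk_host zk_port; do
    if [ -n "$zk_host" ]; then
      # Fall back to the default client port when none is given.
      printf '%s %s\n' "$zk_host" "${zk_port:-2181}"
    fi
  done
}

# Not called here: probes each endpoint with nc (assumed to support -z and -w).
zk_check() {
  zk_endpoints "$1" | while read -r zk_host zk_port; do
    if nc -z -w 2 "$zk_host" "$zk_port"; then
      echo "reachable: $zk_host:$zk_port"
    else
      echo "unreachable: $zk_host:$zk_port"
    fi
  done
}

zk_endpoints "node01:2181,node02:2181,node03:2181"
```

On a node that has sourced dolphinscheduler_env.sh, `zk_check "$REGISTRY_ZOOKEEPER_CONNECT_STRING"` would run the probe against the configured ensemble.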
Initialize the database
Create the database:
CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
Copy the MySQL driver JAR into the libs directory of every DolphinScheduler module, namely:
cp mysql-connector-java-8.0.16.jar alert-server/libs/
cp mysql-connector-java-8.0.16.jar api-server/libs/
cp mysql-connector-java-8.0.16.jar master-server/libs/
cp mysql-connector-java-8.0.16.jar worker-server/libs/
cp mysql-connector-java-8.0.16.jar standalone-server/libs/standalone-server/
cp mysql-connector-java-8.0.16.jar tools/libs/
Run the initialization commands:
chmod +x tools/bin/upgrade-schema.sh
tools/bin/upgrade-schema.sh
Deploy
Run the following command to deploy. It automatically deploys the services to the configured nodes; runtime logs are stored in the logs folder.
./bin/install.sh
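Access
Once deployment finishes, the services start automatically and the web UI becomes available.
Note:
Because apiServers="node03" is configured, the UI is served at http://node03:12345/dolphinscheduler/ui.
Default username and password: admin / dolphinscheduler123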
Start and stop commands
# Start all cluster services
./bin/start-all.sh
# Stop all cluster services
./bin/stop-all.sh
# Stop and start the Master
./bin/dolphinscheduler-daemon.sh stop master-server
./bin/dolphinscheduler-daemon.sh start master-server
# Start and stop the Worker
./bin/dolphinscheduler-daemon.sh start worker-server
./bin/dolphinscheduler-daemon.sh stop worker-server
# Start and stop the API server
./bin/dolphinscheduler-daemon.sh start api-server
./bin/dolphinscheduler-daemon.sh stop api-server
# Start and stop the Alert server
./bin/dolphinscheduler-daemon.sh start alert-server
./bin/dolphinscheduler-daemon.sh stop alert-server
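Restarting a single service is a stop followed by a start, so a small wrapper around the daemon script can save typing. This is a sketch under assumptions: `ds_restart` is a hypothetical function, and `DS_HOME` is an assumed variable pointing at the install directory (it defaults to the current directory).

```shell
# Hypothetical wrapper: restart one service by calling the daemon script shown above.
# $1 = service name, e.g. master-server, worker-server, api-server, alert-server.
ds_restart() {
  dr_service=$1
  dr_daemon="${DS_HOME:-.}/bin/dolphinscheduler-daemon.sh"
  "$dr_daemon" stop "$dr_service"
  "$dr_daemon" start "$dr_service"
}
```

For example, `ds_restart api-server` run from the install directory stops and then starts the API server on the current node.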
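Usage
Create a project
Under Project Management, create a project.
Open the demo project.
Define a workflow
Create a workflow definition.
Add a task of type Shell.
Enter the script to execute:
Create three Shell tasks.
Connect the three Shell tasks; they run in sequence.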
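Start the workflow
After saving the workflow definition, click Online.
Then click Run.
At this point you may see the message: "No suitable tenant; please select an available tenant."
Solution:
Create a user, create a tenant, and assign the tenant to the user.
1. Create a tenant
2. Assign a tenant to the currently logged-in user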
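Workflow instances
Starting a workflow produces a workflow instance.
Click the workflow instance name to view its details.
Tasks
Tasks can be viewed under Task Definition in the Task menu.
Task execution status can be viewed under Task Instance in the Task menu.
Scheduled runs
Edit the workflow definition and add schedule parameters.
After adding the schedule, bring it online on the schedule management page.
Check how the scheduled runs executed.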
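Parameters
Local parameters
Parameters configured on the task definition page are scoped to that task by default. If parameter passing is configured, the parameter also applies to downstream tasks.
Global parameters
Global parameters are valid for every task node in the workflow and are configured on the workflow definition page.
They can be set in two ways: when saving the workflow definition, or when starting it.
Parameter passing
DolphinScheduler allows parameters to be passed between tasks. Currently they can only be passed in one direction, from upstream to downstream.
The node01 task defines an output parameter.
The node02 task receives the parameter.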
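As an illustration of the upstream side, the DolphinScheduler 3.1.x documentation describes a Shell task exporting a value by printing a `${setValue(key=value)}` marker line, with the same key declared as an OUT parameter in the task's custom parameters. The sketch below follows that convention as I understand it; the parameter name `output` and the helper `emit_out_param` are assumptions for illustration, and the marker syntax may differ in other versions.

```shell
# Upstream task (node01 in the text): print the marker line that the worker is
# expected to parse in order to capture an OUT parameter.
# "output" is an assumed parameter name; it must also be declared as OUT in the task.
emit_out_param() {
  # Single quotes keep the shell from trying to expand ${...} itself.
  printf '${setValue(%s=%s)}\n' "$1" "$2"
}

emit_out_param output "hello"

# Downstream task (node02): the script text would reference ${output}.
# DolphinScheduler substitutes it before the script runs, so it is shown only as a comment:
#   echo ${output}
```

The single quotes matter: with double quotes the shell would attempt its own `${...}` expansion before the line is ever printed.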
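Built-in parameters
1. Basic built-in parameters
Variable | Declaration | Meaning
system.biz.date | ${system.biz.date} | The day before the scheduled time of the routine scheduling instance, in yyyyMMdd format
system.biz.curdate | ${system.biz.curdate} | The scheduled time of the routine scheduling instance, in yyyyMMdd format
system.datetime | ${system.datetime} | The scheduled time of the routine scheduling instance, in yyyyMMddHHmmss format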
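To make the three formats concrete, the sketch below reproduces them for a given schedule time outside DolphinScheduler. It is only an emulation for illustration: the function names are hypothetical, it relies on GNU `date` (the `-d` option), and it interprets the input as UTC, which may differ from the time zone DolphinScheduler is configured with.

```shell
# Sketch only: emulate the built-in parameter formats with GNU date.
# $1 = schedule time in "yyyy-MM-dd HH:mm:ss" form, interpreted as UTC here.

# system.datetime: the schedule time as yyyyMMddHHmmss
sys_datetime() { TZ=UTC date -d "$1" +%Y%m%d%H%M%S; }

# system.biz.curdate: the schedule date as yyyyMMdd
sys_biz_curdate() { TZ=UTC date -d "${1%% *}" +%Y%m%d; }

# system.biz.date: the day before the schedule date as yyyyMMdd
sys_biz_date() { TZ=UTC date -d "${1%% *} -1 day" +%Y%m%d; }

sys_biz_date "2024-03-01 08:30:00"      # prints 20240229
sys_biz_curdate "2024-03-01 08:30:00"   # prints 20240301
sys_datetime "2024-03-01 08:30:00"      # prints 20240301083000
```

Inside a real task, the values come from writing `${system.biz.date}` and the other declarations in the task script or parameters, and DolphinScheduler substitutes them before execution.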
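2. Derived built-in parameters
Custom variable names can be used in code, declared as ${variable_name}.
Usage: define a parameter with a variable name, a direction (IN/OUT), and a value such as $[yyyy-MM-dd].
The date format inside $[] can be decomposed and recombined freely.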
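Resource Center
The Resource Center is typically used for uploading files and UDF functions, and for task group management.
It can be backed by the local file system, a distributed file storage system, or a MinIO cluster, as well as remote object storage such as Alibaba Cloud OSS.
Connecting to HDFS
When the Resource Center is used to create or upload files, all files and resources are stored in the distributed file system HDFS.
Configure common.properties
Both api-server/conf/common.properties and worker-server/conf/common.properties must be configured.
Edit:
vim api-server/conf/common.properties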
# resource storage type: HDFS, S3, OSS, NONE
#resource.storage.type=NONE
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
#resource.storage.upload.base.path=/dolphinscheduler
resource.storage.upload.base.path=hdfs://node01:9000/dolphinscheduler
# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
#resource.hdfs.root.user=hdfs
resource.hdfs.root.user=root
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
#resource.hdfs.fs.defaultFS=hdfs://mycluster:8020
resource.hdfs.fs.defaultFS=hdfs://node01:9000
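When the same four properties have to be changed on several nodes, editing by hand is error-prone. A scripted edit is one alternative. The sketch below is a hypothetical `set_prop` helper, not part of DolphinScheduler; it assumes values contain no backslashes, because `awk -v` interprets escape sequences.

```shell
# Hypothetical helper: set key=value in a .properties file.
# Rewrites any uncommented "key=" line; appends the assignment if none exists.
# $1 = file, $2 = key, $3 = value (assumed to contain no backslashes).
set_prop() {
  sp_file=$1
  sp_key=$2
  sp_value=$3
  sp_tmp=$(mktemp) || return 1
  awk -v k="$sp_key" -v v="$sp_value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$sp_file" > "$sp_tmp" || { rm -f "$sp_tmp"; return 1; }
  # Write back in place so the original file keeps its permissions.
  cat "$sp_tmp" > "$sp_file"
  rm -f "$sp_tmp"
}
```

For example, `set_prop api-server/conf/common.properties resource.storage.type HDFS` would switch the storage type while leaving the commented-out default line untouched.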
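After editing, copy the file over the Worker's copy:
cp api-server/conf/common.properties worker-server/conf/
In a cluster environment, both configuration files must also be distributed to the other nodes:
sync.sh api-server/conf/common.properties
sync.sh worker-server/conf/common.properties
Restart DolphinScheduler
# Stop all cluster services
./bin/stop-all.sh
# Start all cluster services
./bin/start-all.sh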
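Create a resource
In the Resource Center, create a directory named sh.
Open the sh folder and create a script named test.sh.
Use the resource
Reference the resource when defining a workflow.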
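Alerts
DingTalk alerts
Create an alert instance in the Security Center.
The DingTalk robot is configured as follows:
Create the DingTalk alert instance.
Parameters:
Webhook: https://oapi.dingtalk.com/robot/send?access_token=XXXXXX
Keyword: the custom keyword from the robot's security settings
Secret: the signing secret from the robot's security settings
Message type: text and markdown are supported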
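Before wiring the webhook into DolphinScheduler, it can be useful to confirm that the robot accepts a message at all. The sketch below builds the JSON body of a DingTalk text message and posts it with curl. `dingtalk_text_payload`, `dingtalk_send`, and `WEBHOOK_URL` are illustrative names. It assumes the robot uses keyword security only, so the content must contain the configured keyword; a robot with signing enabled would also need timestamp and sign parameters, which are not shown here.

```shell
# Hypothetical helper: build the JSON body for a DingTalk "text" message.
# Assumes the content contains no double quotes or backslashes (no JSON escaping is done).
dingtalk_text_payload() {
  printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$1"
}

# Not called here: posts the message with curl.
# WEBHOOK_URL is a placeholder for the Webhook value configured above.
dingtalk_send() {
  curl -s -H 'Content-Type: application/json' \
       -d "$(dingtalk_text_payload "$1")" "$WEBHOOK_URL"
}

dingtalk_text_payload "alert: test message"
```

With `WEBHOOK_URL` exported, `dingtalk_send "alert: test message"` would send one test message, provided "alert" is the robot's keyword.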
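Create an alert group and add the DingTalk alert instance to it.
Start a task and select the notification strategy and alert group.
Check DingTalk: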
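Email alerts
First enable the POP3/SMTP/IMAP service for the mailbox.
Generate an authorization code.
Use the mail server address provided by the email service.
The email alert is configured as follows:
Note: under the authentication option, the user is the sender's email address and the password is the authorization code.
Start a task to run a test. The received email looks like this: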
Copyright belongs to the original author, CodeDevMaster. If there is any infringement, please contact us for removal.