Kafka Connect

This walkthrough uses MySQL → Kafka Connect → Oracle as an example of full synchronization with upsert semantics.

After starting ZooKeeper, Kafka, and the other required components,

edit the kafka/config/connect-distributed.properties file:

  ##
  # Licensed to the Apache Software Foundation (ASF) under one or more
  # contributor license agreements. See the NOTICE file distributed with
  # this work for additional information regarding copyright ownership.
  # The ASF licenses this file to You under the Apache License, Version 2.0
  # (the "License"); you may not use this file except in compliance with
  # the License. You may obtain a copy of the License at
  #
  # http://www.apache.org/licenses/LICENSE-2.0
  #
  # Unless required by applicable law or agreed to in writing, software
  # distributed under the License is distributed on an "AS IS" BASIS,
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
  ##
  # This file contains some of the configurations for the Kafka Connect distributed worker. This file is intended
  # to be used with the examples, and some settings may differ from those used in a production system, especially
  # the `bootstrap.servers` and those specifying replication factors.
  # A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
  bootstrap.servers=localhost:9092
  # unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
  group.id=connect-cluster
  # The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
  # need to configure these based on the format they want their data in when loaded from or stored into Kafka
  key.converter=org.apache.kafka.connect.json.JsonConverter
  value.converter=org.apache.kafka.connect.json.JsonConverter
  # Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
  # it to
  key.converter.schemas.enable=true
  value.converter.schemas.enable=true
  # Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
  # Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
  # the topic before starting Kafka Connect if a specific topic configuration is needed.
  # Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
  # Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
  # to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
  offset.storage.topic=connect-offsets
  offset.storage.replication.factor=1
  #offset.storage.partitions=25
  # Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
  # and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
  # the topic before starting Kafka Connect if a specific topic configuration is needed.
  # Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
  # Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
  # to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
  config.storage.topic=connect-configs
  config.storage.replication.factor=1
  # Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
  # Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
  # the topic before starting Kafka Connect if a specific topic configuration is needed.
  # Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
  # Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
  # to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
  status.storage.topic=connect-status
  status.storage.replication.factor=1
  #status.storage.partitions=5
  # Flush much faster than normal, which is useful for testing/debugging
  offset.flush.interval.ms=10000
  # List of comma-separated URIs the REST API will listen on. The supported protocols are HTTP and HTTPS.
  # Specify hostname as 0.0.0.0 to bind to all interfaces.
  # Leave hostname empty to bind to default interface.
  # Examples of legal listener lists: HTTP://myhost:8083,HTTPS://myhost:8084"
  listeners=HTTP://collector:8083
  # The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
  # If not set, it uses the value for "listeners" if configured.
  rest.advertised.host.name=my-connect-worker-host
  rest.advertised.port=8083
  # rest.advertised.listener=
  # Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
  # (connectors, converters, transformations). The list should consist of top level directories that include
  # any combination of:
  # a) directories immediately containing jars with plugins and their dependencies
  # b) uber-jars with plugins and their dependencies
  # c) directories immediately containing the package directory structure of classes of plugins and their dependencies
  # Examples:
  # plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
  plugin.path=/home/connect/confluentinc-kafka-connect-jdbc-10.7.4/lib/,/home/connect/debezium-connector-oracle

Note: make sure port 8083 is not already in use.
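A quick way to check (either command works on most Linux systems):

  lsof -i :8083
  ss -tlnp | grep 8083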

**Start Connect**

./bin/connect-distributed.sh ./config/connect-distributed.properties

Note: this command holds the terminal in the foreground; if you don't want that, start it with nohup.
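For example (a minimal sketch; the log file path is an arbitrary choice):

  nohup ./bin/connect-distributed.sh ./config/connect-distributed.properties > /tmp/connect.log 2>&1 &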

Create the mysql-source file:

  {
    "name": "mysql-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://collector:3306/test?user=root&password=123456",
      "mode": "bulk",
      "table.whitelist": "student",
      "topic.prefix": "student-"
    }
  }

Note: mode must be set to bulk here to get a full sync of the table on every poll; incrementing does incremental sync instead. A comparison sketch follows below.
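For comparison, an incremental source config might look like this (a sketch; the connector name is made up here, and it assumes the student table has an auto-increment id column):

  {
    "name": "mysql-source-incremental",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://collector:3306/test?user=root&password=123456",
      "mode": "incrementing",
      "incrementing.column.name": "id",
      "table.whitelist": "student",
      "topic.prefix": "student-"
    }
  }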

Create the oracle-sink file:

  {
    "name": "oracle-sink",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
      "connection.url": "jdbc:oracle:thin:@collector:1521:orcl",
      "db.hostname": "collector",
      "tasks.max": "1",
      "connection.user": "dbzuser",
      "connection.password": "dbz2023",
      "db.fetch.size": "1",
      "topics": "student-student",
      "multitenant": "false",
      "table.name.format": "t1",
      "dialect.name": "OracleDatabaseDialect",
      "auto.evolve": "true",
      "pk.mode": "record_value",
      "pk.fields": "id",
      "insert.mode": "upsert"
    }
  }

Note: the topic student-student here was created in advance. You can also skip creating it and let it be generated automatically, but either way the topic name you specify must include the prefix (topic.prefix plus the table name).
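Creating it ahead of time looks like this (a sketch; the partition and replication counts are example values suited to a single-broker setup):

  ./bin/kafka-topics.sh --create --topic student-student --bootstrap-server collector:9092 --partitions 1 --replication-factor 1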

You also need the corresponding MySQL and Oracle JDBC drivers; this setup uses mysql-connector-java 8.0.26 and ojdbc8 23.3.0.23.09.

The connector classes to use are:

  io.confluent.connect.jdbc.JdbcSourceConnector
  io.confluent.connect.jdbc.JdbcSinkConnector
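One common way to make the drivers visible is to drop the jars into the JDBC connector directory already listed on plugin.path above (a sketch; the exact jar file names depend on what you downloaded):

  cp mysql-connector-java-8.0.26.jar /home/connect/confluentinc-kafka-connect-jdbc-10.7.4/lib/
  cp ojdbc8-23.3.0.23.09.jar /home/connect/confluentinc-kafka-connect-jdbc-10.7.4/lib/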

Register the connectors with the REST API on port 8083 (curl requests):

curl -i -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://collector:8083/connectors/ -d @/opt/installs/kafka/connector/mysql-source.json

curl -i -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://collector:8083/connectors/ -d @/opt/installs/kafka/connector/oracle-sink.json

List the currently registered connectors:

curl http://collector:8083/connectors
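You can also check each connector's state individually via the status endpoint (part of the standard Connect REST API, listed in the reference table at the end):

  curl http://collector:8083/connectors/mysql-source/status
  curl http://collector:8083/connectors/oracle-sink/status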

Testing

Switch to the oracle user and start Oracle:

[root@collector connector]# su oracle

[oracle@collector connector]$ lsnrctl start

[oracle@collector connector]$ sqlplus /nolog

SQL> conn /as sysdba

SQL> startup

Insert a record into the MySQL source table, then query the Oracle target table to confirm the row arrived.
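For example (a sketch; the id column comes from the sink's pk.fields setting, but the name column is an assumed part of the student table's schema, which the original setup does not show):

  -- MySQL: insert a test row into the source table
  INSERT INTO student (id, name) VALUES (1, 'tom');

  -- Oracle: after the next poll cycle, the row should appear in the target table
  SELECT * FROM t1;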

curl operations

| REST API | Description |
| --- | --- |
| GET / | View Kafka cluster version information |
| GET /connectors | List the currently active connectors by name |
| POST /connectors | Create a new connector from the given configuration |
| GET /connectors/{name} | View information about the specified connector |
| GET /connectors/{name}/config | View the specified connector's configuration |
| PUT /connectors/{name}/config | Update the specified connector's configuration |
| GET /connectors/{name}/status | View the specified connector's status |
| POST /connectors/{name}/restart | Restart the specified connector |
| PUT /connectors/{name}/pause | Pause the specified connector |
| GET /connectors/{name}/tasks | List the tasks running for the specified connector |
| POST /connectors/{name}/tasks | Update task configuration |
| GET /connectors/{name}/tasks/{taskId}/status | View the status of the specified task |
| POST /connectors/{name}/tasks/{taskId}/restart | Restart the specified task |
| DELETE /connectors/{name}/ | Delete the specified connector |
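For instance, pausing and then deleting the source connector from this example:

  curl -X PUT http://collector:8083/connectors/mysql-source/pause
  curl -X DELETE http://collector:8083/connectors/mysql-source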
Reference: Kafka——Kafka Connect详解 (CSDN blog)


Reposted from: https://blog.csdn.net/Hermes333/article/details/135224850
Copyright belongs to the original author, Hermes333.
