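Real-Time Incremental Sync with MySQL CDC 2.2 on Flink 1.14

Introduction to CDC

CDC stands for **Change Data Capture**. With CDC we can learn what has changed in a source table (inserts, updates, and deletes) and send those changes downstream as a data stream. Each captured operation carries an identifier that marks it as an insert, an update, or a delete.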

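+I: an inserted row.
-U: an update produces two U-tagged records; -U carries the row as it was before the update.
+U: the row as it is after the update.
-D: a deleted row.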
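
To make the four tags concrete, here is a minimal Java sketch of how a downstream consumer could replay such a changelog into a keyed table. The class `ChangelogApplier` and its `Change` type are hypothetical names invented for this illustration; they are not part of Flink or of the CDC connector.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChangelogApplier {
    /** One changelog record: a row-kind tag, the primary key, and the row value. */
    static final class Change {
        final String kind;   // "+I", "-U", "+U" or "-D"
        final int key;
        final String value;

        Change(String kind, int key, String value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }
    }

    /** Replays the changes in order and returns the resulting table contents. */
    static Map<Integer, String> apply(List<Change> changes) {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (Change c : changes) {
            switch (c.kind) {
                case "+I":
                case "+U":
                    table.put(c.key, c.value);   // new row, or new image of an updated row
                    break;
                case "-U":
                case "-D":
                    table.remove(c.key);         // retract the old image, or delete the row
                    break;
                default:
                    throw new IllegalArgumentException("unknown row kind: " + c.kind);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Change> changes = Arrays.asList(
                new Change("+I", 1, "a"),
                new Change("+I", 2, "b"),
                new Change("-U", 2, "b"),
                new Change("+U", 2, "b2"),
                new Change("-D", 1, "a"));
        System.out.println(apply(changes)); // prints {2=b2}
    }
}
```

The -U/+U pair is handled as "remove the old image, then put the new one", which is why an update always arrives as two records.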

Step 1: Configure MySQL to enable the binlog

Flink MySQL CDC works by listening to MySQL's binlog, so the binlog must be enabled on the MySQL server.

Edit the configuration file **my.cnf** and add:

    server_id=1
    log_bin=mysql-bin
    binlog_format=ROW
    expire_logs_days=30

Restart MySQL:

    service mysql restart

Then check that the binlog is enabled:

show variables like '%log_bin%';
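
As a quick offline sanity check before restarting, the sketch below parses my.cnf-style text and verifies the settings this step adds. `BinlogConfigCheck` is a hypothetical helper written for this article; it only inspects text and does not talk to a MySQL server, so the `show variables` query above remains the authoritative check.

```java
import java.util.HashMap;
import java.util.Map;

class BinlogConfigCheck {
    /** Parses "key=value" lines, skipping blanks, comments and [section] headers. */
    static Map<String, String> parse(String cnf) {
        Map<String, String> settings = new HashMap<>();
        for (String raw : cnf.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith(";") || line.startsWith("[")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq > 0) {
                // option files accept log-bin and log_bin alike; normalise to underscores
                String key = line.substring(0, eq).trim().replace('-', '_');
                settings.put(key, line.substring(eq + 1).trim());
            }
        }
        return settings;
    }

    /** True when server_id and log_bin are set and binlog_format is ROW. */
    static boolean readyForCdc(String cnf) {
        Map<String, String> s = parse(cnf);
        return s.containsKey("server_id")
                && s.containsKey("log_bin")
                && "ROW".equalsIgnoreCase(s.get("binlog_format"));
    }

    public static void main(String[] args) {
        String cnf = "[mysqld]\n"
                + "server_id=1\n"
                + "log_bin=mysql-bin\n"
                + "binlog_format=ROW\n"
                + "expire_logs_days=30\n";
        System.out.println(readyForCdc(cnf)); // prints true
    }
}
```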

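Step 2: Test MySQL CDC from Flink

Try syncing MySQL data in real time: the connector first syncs the existing (historical) data and then follows the binlog for real-time incremental changes.

1. Download the Flink CDC 2.2 connector that matches your Flink version into the ${FLINK_HOME}/lib directory.

Download the SQL version, flink-sql-connector-mysql-cdc.jar; do not pick the non-SQL version by mistake.

As for version support, CDC 2.2 appears to be the first release to support Flink 1.14; see the project's GitHub page for details.

2. Create the MySQL table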

create database if not exists test;

drop table if exists test.product_info;

CREATE TABLE test.`product_info` (
  `id_` int NOT NULL,
  `product_id` int DEFAULT NULL,
  `product_name` varchar(60) DEFAULT NULL,
  PRIMARY KEY (`id_`)
);

insert into test.product_info (id_,product_id,product_name) values 
(1,5,'Huawei'),
(2,9,'apple'),
(3,8,'server memory'),
(5,6,'hoodie'),
(7,2,'fan');

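3. Create the temporary table in Flink

CDC 2.0 introduced a lock-free algorithm and supports parallel reads. To preserve the ordering of the full snapshot plus the incremental data, it relies on Flink's checkpoint mechanism, so the job must configure checkpointing; otherwise data is only updated when the job runs with a parallelism of 1. To configure it in a SQL job: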

 Flink SQL> SET 'execution.checkpointing.interval' = '3s';

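To configure it in a DataStream job:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(3000); // checkpoint every 3000 ms

-- In the SQL client, set a checkpoint every 3s; otherwise the data will not update.
SET 'execution.checkpointing.interval' = '3s';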

drop table if EXISTS test.dim_product_info;

create table if not EXISTS test.dim_product_info(
    id_ int ,
    product_id int,
    product_name string,
    PRIMARY KEY(id_) NOT ENFORCED
) WITH (
 'connector' = 'mysql-cdc',
 'hostname' = '192.168.45.1',
 'port' = '3306',
 'username' = 'root',
 'password' = '123456',
 'database-name' = 'test',
 'table-name' = 'product_info' 
 );

select * from test.dim_product_info;

4. Run delete, update, and insert operations against the MySQL table and watch the changes arrive in real time.

delete from test.product_info where id_=1;

update test.product_info set product_name='router' where id_=5;

insert into test.product_info select 8,6,'book';
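
To show which changelog records these three statements should produce, the sketch below derives them from the row's before and after images. It is only a model of the idea, assuming the sample rows from this step with product names written in English; `ChangelogEmitter` is a hypothetical class, not the connector's implementation, and the exact way the SQL client displays the records depends on its result mode.

```java
import java.util.ArrayList;
import java.util.List;

class ChangelogEmitter {
    /**
     * Turns one row-level change into changelog lines.
     * before == null means the row did not exist; after == null means it was removed.
     */
    static List<String> emit(String before, String after) {
        List<String> out = new ArrayList<>();
        if (before == null && after != null) {
            out.add("+I " + after);
        } else if (before != null && after == null) {
            out.add("-D " + before);
        } else if (before != null && !before.equals(after)) {
            out.add("-U " + before);
            out.add("+U " + after);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        log.addAll(emit("(1, 5, Huawei)", null));               // delete ... where id_=1
        log.addAll(emit("(5, 6, hoodie)", "(5, 6, router)"));   // update ... where id_=5
        log.addAll(emit(null, "(8, 6, book)"));                 // insert ... select 8,6,...
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

Running it prints one -D line, a -U/+U pair, and one +I line, in that order, matching the four tags described in the introduction.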

Step 3: A real-time ETL example with Flink CDC

-- Set the checkpoint directory
SET 'state.checkpoints.dir' = 'file:///tmp/flink/checkpoint';
-- Set the checkpoint storage type
SET 'state.checkpoint-storage' = 'filesystem';
-- Set the checkpoint interval
SET 'execution.checkpointing.interval' = '3s';

drop table if EXISTS ods_dataGen_order;

-- Create an orders table, with rows generated by datagen
create table if not EXISTS ods_dataGen_order(
    id_ int ,
    price decimal(8,3) ,
    product_id int,
    user_id int,
    order_date TIMESTAMP ,
    PRIMARY KEY(id_) NOT ENFORCED
)WITH (
 'connector' = 'datagen',
 'rows-per-second'='3',
 'number-of-rows'='1000',
 'fields.id_.kind'='sequence',
 'fields.id_.start'='1',
 'fields.id_.end'='1000',
 
 'fields.price.kind'='random',
 'fields.price.min'='10',
 'fields.price.max'='1000',
 
 'fields.product_id.kind'='random',
 'fields.product_id.min'='1',
 'fields.product_id.max'='10',
 
 'fields.user_id.kind'='random',
 'fields.user_id.min'='1',
 'fields.user_id.max'='50'
);

drop table if EXISTS dim_product_info;

create table if not EXISTS dim_product_info(
    id_ int ,
    product_id int,
    product_name string,
    PRIMARY KEY(id_) NOT ENFORCED
) WITH (
 'connector' = 'mysql-cdc',
 'hostname' = '192.168.45.1',
 'port' = '3306',
 'username' = 'root',
 'password' = '123456',
 'database-name' = 'test',
 'table-name' = 'product_info'
 
 );

-- drop table if EXISTS ads_restul;
-- create table if not EXISTS ads_restul(
--     id_ int ,
--     price decimal(8,3) ,
--     product_id int,
--     user_id int,
--     order_date TIMESTAMP ,
--     id_2 int ,
--     product_id_2 int,
--     product_name string,
--     PRIMARY KEY(id_) NOT ENFORCED
-- ) WITH ('connector' = 'print');

-- INSERT INTO  ads_restul
-- Real-time ETL: join the order stream with the product dimension table
select  a.id_  , a.price  , a.product_id , a.user_id , a.order_date  , b.id_  , b.product_id , b.product_name 
 from ods_dataGen_order a,dim_product_info b where a.product_id=b.product_id;
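
The matching logic of this query can be sketched in plain Java as an inner join on product_id. This is a static illustration only: in Flink the query runs continuously and, as a regular join, keeps both inputs in state so that later changes to the dimension table are reflected in the result. `EtlJoinSketch`, `Order`, and `Product` are hypothetical names, and the rows are a subset of the sample data with English product names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EtlJoinSketch {
    static final class Order {
        final int id;
        final int productId;

        Order(int id, int productId) {
            this.id = id;
            this.productId = productId;
        }
    }

    static final class Product {
        final int id;
        final int productId;
        final String name;

        Product(int id, int productId, String name) {
            this.id = id;
            this.productId = productId;
            this.name = name;
        }
    }

    /** Inner join on product_id; orders without a matching product are dropped. */
    static List<String> join(List<Order> orders, List<Product> products) {
        // product_id is not the primary key, so one key may map to several rows
        Map<Integer, List<Product>> byProductId = new HashMap<>();
        for (Product p : products) {
            byProductId.computeIfAbsent(p.productId, k -> new ArrayList<>()).add(p);
        }
        List<String> out = new ArrayList<>();
        for (Order o : orders) {
            List<Product> matches =
                    byProductId.getOrDefault(o.productId, Collections.<Product>emptyList());
            for (Product p : matches) {
                out.add(o.id + " -> " + p.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(new Order(1, 5), new Order(2, 3), new Order(3, 9));
        List<Product> products = Arrays.asList(
                new Product(1, 5, "Huawei"),
                new Product(2, 9, "apple"),
                new Product(3, 8, "server memory"));
        for (String row : join(orders, products)) {
            System.out.println(row);
        }
    }
}
```

Order 2 has product_id 3, which no product row carries, so it produces no output. The same happens in the SQL query for any generated product_id between 1 and 10 that is absent from product_info.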

end

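For other questions, such as:

1. With a primary/replica setup, can I listen to the replica?

2. I only want to sync incremental changes, not the existing data, and so on.

see the FAQ (ZH) in the ververica/flink-cdc-connectors wiki on GitHub.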

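This article is reprinted from: https://blog.csdn.net/qq_44326412/article/details/127064981
Copyright belongs to the original author, AG南山. In case of infringement, please contact us for removal.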
