Brief comparison
Spark Structured Streaming
Structured Streaming Programming Guide - Spark 3.5.1 Documentation
Flink
Apache Flink Documentation | Apache Flink
Apache Flink CDC | Apache Flink CDC
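Sources

File source
- Spark Structured Streaming: API: readStream.format("csv")...
- Flink: Flink SQL

Kafka source
- Spark Structured Streaming: API: readStream.format("kafka")...
- Flink: Flink SQL

Redis source
- Spark Structured Streaming: API: readStream.format("redis")...
- Flink: no stream-based source; Redis can be used for batch reads or as a dimension table. https://github.com/jeff-zou/flink-connector-redis

JDBC source
- Spark Structured Streaming: N/A
- Flink: Flink SQL CDC

Sinks

File sink
- Spark Structured Streaming: sdf.writeStream.format(...), append mode
- Flink: append

Kafka sink (kafka, kafka upsert)
- Spark Structured Streaming: sdf.writeStream.format(...), with output modes Append, Update, and Complete (at-least-once). The topic behaves like a key-value table without a primary key: in every mode, each write is an insert (insert into).
- Flink: determined automatically from the SQL semantics:
  - For simple source-to-sink ETL (append mode), the query can write to kafka.
  - For aggregations, which have update semantics, the query can write to upsert-kafka.

Redis sink
- Spark Structured Streaming: supports Append, Update, and Complete, but needs a custom implementation through foreach (which indirectly calls the regular df.write).
  - source: based on the Redis v5+ stream API (XADD, XREAD)
  - sink: foreachBatch, based on HSET and HGET
- Flink: no stream-based connector; Redis can be used for batch writes or as a dimension table.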
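The foreachBatch-plus-HSET approach noted for the Spark Redis sink could look roughly like the sketch below. It is only an illustration under stated assumptions: `upsert_rows`, `make_batch_writer`, the `key_field` parameter, and the `row:` key prefix are hypothetical names introduced here, and the client is assumed to expose an `hset(name, mapping=...)` method as redis-py does. Because HSET overwrites fields of an existing hash, repeated writes for the same key act as updates, which is why Append, Update, and Complete can all be routed through the same callback.

```python
from typing import Any, Callable, Dict, Iterable, Mapping


def upsert_rows(client: Any, rows: Iterable[Mapping[str, Any]],
                key_field: str = "id", prefix: str = "row:") -> int:
    """Write each row as a Redis hash via HSET; a repeated key overwrites fields."""
    written = 0
    for row in rows:
        fields: Dict[str, str] = {
            name: str(value) for name, value in row.items() if name != key_field
        }
        if not fields:
            continue  # HSET needs at least one field-value pair
        client.hset(f"{prefix}{row[key_field]}", mapping=fields)
        written += 1
    return written


def make_batch_writer(client_factory: Callable[[], Any],
                      key_field: str = "id") -> Callable[[Any, int], None]:
    """Build a callback with the (batch_df, batch_id) shape foreachBatch expects."""
    def write_batch(batch_df: Any, batch_id: int) -> None:
        # collect() pulls the whole micro-batch to the driver: acceptable for a
        # sketch, not for large batches.
        rows = [row.asDict() for row in batch_df.collect()]
        upsert_rows(client_factory(), rows, key_field=key_field)
    return write_batch


# Hypothetical wiring (needs pyspark and redis-py, which are not imported here):
#   import redis
#   writer = make_batch_writer(lambda: redis.Redis(host="localhost", port=6379))
#   sdf.writeStream.outputMode("update").foreachBatch(writer).start()
```

Keeping the Redis write in a plain function that takes any `hset`-capable client makes the logic testable without a running Spark or Redis instance; the streaming job only supplies the micro-batch.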
Copyright belongs to the original author, weixin_46449024.