20240901 Big Data Stream Processing - Spark 3.5 and Flink 1.19 (Getting Started)

Brief comparison

Spark Structured Streaming

Structured Streaming Programming Guide - Spark 3.5.1 Documentation

Flink

Apache Flink Documentation | Apache Flink
Apache Flink CDC | Apache Flink CDC

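| | | Spark Structured Streaming | Flink |
|---|---|---|---|
| source | file source | API: readStream.format("csv")... | Flink SQL |
| source | kafka source | API: readStream.format("kafka")... | Flink SQL |
| source | redis source | API: readStream.format("redis")... | No stream-based source; can be used for batch/dim tables. https://github.com/jeff-zou/flink-connector-redis |
| source | jdbc source | N/A | Flink SQL CDC |
| sink | file | *sdf.writeStream.format(...); output mode: append | append |
| sink | kafka, kafka upsert | *sdf.writeStream.format(...); output modes: Append, Update, Complete (at-least-once). Equivalent to a K,V table without a primary key; in every mode the write is effectively an INSERT INTO. | Chosen automatically from the SQL semantics: 1. a simple source-to-sink ETL (append mode) can write to kafka; 2. an aggregation, which has update semantics, can write to upsert-kafka. |
| sink | redis sink | Supports Append, Update, and Complete, but requires a custom implementation via foreach (which indirectly calls the ordinary df.write). source: based on the Redis v5+ stream API (XADD, XREAD). sink: foreachBatch, based on HSET/HGET. | No stream-based support; can be used for batch/dim tables. h |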
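
To make the Kafka rows of the comparison concrete, here is a minimal Scala sketch of a Spark Structured Streaming job that reads from Kafka with readStream.format("kafka"), aggregates, and writes back with writeStream.format("kafka") in Update mode. The broker address, topic names, checkpoint path, object name, and the assumption that each record value is a single word are all hypothetical, and the job assumes the spark-sql-kafka-0-10 connector is on the classpath; treat it as an illustration rather than a reference implementation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KafkaCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-count-sketch")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()

    // Kafka source: treat each record value as one word (assumed payload format).
    val words = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
      .option("subscribe", "words-in")                     // hypothetical input topic
      .load()
      .select(col("value").cast("string").as("word"))

    // An aggregation, so the result has update semantics.
    val counts = words.groupBy("word").count()

    // The Kafka sink reads a `value` column and an optional `key` column.
    val query = counts
      .select(col("word").as("key"), col("count").cast("string").as("value"))
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "counts-out")                       // hypothetical output topic
      .option("checkpointLocation", "/tmp/kafka-count-sketch-checkpoint")
      .outputMode("update")
      .start()

    query.awaitTermination()
  }
}
```

Because the plain Kafka sink only appends records, every change to a word's count arrives in the output topic as a new record with the same key, which matches the "no primary key, always INSERT INTO" behavior noted above. A consumer that wants the latest count per word has to keep the last value per key itself, which is the part Flink's upsert-kafka connector handles through its primary-key semantics.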

Tags: big data, spark, scala

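This article is reposted from: https://blog.csdn.net/weixin_46449024/article/details/141761576
Copyright belongs to the original author, weixin_46449024. If there is any infringement, please contact us for removal.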
