Kafka Offsets

First, a quick review of what an offset is

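offset: In Apache Kafka, an offset is a number that uniquely identifies a message's position within a partition. Every message in a partition is assigned its own offset. A consumer tracks its progress by recording the offset it has consumed up to, so that after a restart or a rebalance it can resume from that point instead of skipping messages or re-reading ones it has already processed.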

Before Kafka 0.9, the consumer stored offsets in ZooKeeper by default. From 0.9 onward, offsets are stored in an internal Kafka topic named __consumer_offsets.

__consumer_offsets: stores data as key-value pairs. key: group.id + topic + partition number; value: the current offset.

Because __consumer_offsets is itself a topic in Kafka, it can be read with a consumer like any other topic.
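To make the key/value layout concrete, here is a minimal in-memory sketch in Java. It is only a model of the idea, not Kafka's implementation or its on-disk record format; the class OffsetStoreSketch and its methods are hypothetical names made up for illustration. It shows that a later commit for the same (group.id, topic, partition) key replaces the earlier one, while other groups keep their own entries.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: the class and method names are hypothetical,
// and this is not how Kafka serializes records in __consumer_offsets.
class OffsetStoreSketch {
    // key: group.id + topic + partition number, value: committed offset
    private final Map<String, Long> committed = new HashMap<>();

    private static String key(String groupId, String topic, int partition) {
        return groupId + "|" + topic + "|" + partition;
    }

    // A later commit for the same key replaces the earlier one.
    void commit(String groupId, String topic, int partition, long offset) {
        committed.put(key(groupId, topic, partition), offset);
    }

    // Returns null when the group has never committed for this partition.
    Long fetch(String groupId, String topic, int partition) {
        return committed.get(key(groupId, topic, partition));
    }

    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();
        store.commit("test", "first", 0, 100L);
        store.commit("test", "first", 0, 250L); // replaces the value 100
        store.commit("other", "first", 0, 7L);  // different group, separate entry
        System.out.println(store.fetch("test", "first", 0));
        System.out.println(store.fetch("other", "first", 0));
        System.out.println(store.fetch("test", "first", 1));
    }
}
```

Keying by group as well as topic and partition is what lets several consumer groups read the same partition at different positions.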

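Auto-committing offsets

In the Kafka consumer API, enable.auto.commit controls whether offsets are committed automatically. It defaults to true, in which case the consumer periodically commits offsets to the server. auto.commit.interval.ms sets the interval between automatic commits and defaults to 5 s (5000 ms).

Core implementation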

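import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Properties;

public class CustomConsumerAutoOffset {

    public static void main(String[] args) {

        // 0 Configuration
        Properties properties = new Properties();

        // Broker addresses
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092,hadoop103:9092");

        // Key and value deserializers
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Consumer group id
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "test");

        // Enable auto-commit (this is already the default)
        properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true);

        // Commit interval: 1000 ms here instead of the default 5000 ms
        properties.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 1000);

        // 1 Create the consumer
        KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);

        // 2 Subscribe to the topic "first"
        ArrayList<String> topics = new ArrayList<>();
        topics.add("first");
        kafkaConsumer.subscribe(topics);

        // 3 Consume data
        while (true) {

            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(Duration.ofSeconds(1));

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord);
            }
        }

    }
}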

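Manually committing offsets

Compared with auto-commit, manual commit gives the developer control over exactly when offsets are committed. Manual commit comes in two forms: synchronous (commitSync) and asynchronous (commitAsync).

commitSync (synchronous commit): blocks until the offset commit has completed, and only then goes on to consume the next batch.

commitAsync (asynchronous commit): sends the offset commit request and starts consuming the next batch without waiting for the result.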

First, set enable.auto.commit to false:

// Switch to manual commit
properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

Then add a manual offset commit to the consume loop:

        // 3 Consume data
        while (true) {

            ConsumerRecords<String, String> consumerRecords = kafkaConsumer.poll(Duration.ofSeconds(1));

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord);
            }

            // Manual commit, synchronous variant:
            // kafkaConsumer.commitSync();
            // Manual commit, asynchronous variant (the one mainly used)
            kafkaConsumer.commitAsync();
        }

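Consuming from a specified offset

When a consumer group consumes for the first time, or when its current offset no longer exists on the server, the auto.offset.reset parameter determines how the offset is reset.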

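If set to earliest, the offset is reset to the earliest available offset, so consumption starts from the oldest message (the equivalent of --from-beginning).
If set to latest (the default), the offset is reset to the latest offset, so consumption starts from newly arriving messages.
If set to none, an exception is thrown to the consumer when no previous offset is found for the consumer group.
Separately from auto.offset.reset, a consumer can also start from any offset it specifies itself.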
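The way the auto.offset.reset policies pick a starting position can be sketched as a small pure function. This is a model of the decision described above, not Kafka's internal code: ResetPolicySketch, its Policy enum and the startingOffset method are hypothetical names, and the IllegalStateException merely stands in for whatever exception the real client raises under the none policy.

```java
import java.util.OptionalLong;

// Illustrative model only; names are hypothetical and this is not Kafka's internal code.
class ResetPolicySketch {
    enum Policy { EARLIEST, LATEST, NONE }

    /**
     * @param committed the group's committed offset, or empty if it has none
     * @param logStart  earliest offset still available in the partition
     * @param logEnd    offset that the next produced message will receive
     */
    static long startingOffset(OptionalLong committed, long logStart, long logEnd, Policy policy) {
        // A committed offset that still lies inside the available range is used as-is.
        if (committed.isPresent()
                && committed.getAsLong() >= logStart
                && committed.getAsLong() <= logEnd) {
            return committed.getAsLong();
        }
        // Otherwise the reset policy decides.
        switch (policy) {
            case EARLIEST:
                return logStart;
            case LATEST:
                return logEnd;
            default:
                throw new IllegalStateException("no valid offset and auto.offset.reset=none");
        }
    }

    public static void main(String[] args) {
        // New group, earliest: start at the oldest retained message.
        System.out.println(startingOffset(OptionalLong.empty(), 10, 100, Policy.EARLIEST));
        // New group, latest: only messages produced from now on.
        System.out.println(startingOffset(OptionalLong.empty(), 10, 100, Policy.LATEST));
        // Existing group: the policy is ignored and the committed offset wins.
        System.out.println(startingOffset(OptionalLong.of(42), 10, 100, Policy.LATEST));
    }
}
```

Note that the policy only matters when there is no usable committed offset; a group that has already committed keeps resuming from its own position.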

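The rest of this section covers starting from an arbitrary offset. Building on the consumer code above, wait until partitions have been assigned, then seek to the desired offset on each assigned partition.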

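        // Requires: java.util.Set, org.apache.kafka.common.TopicPartition
        // Get the partitions currently assigned to this consumer
        Set<TopicPartition> assignment = kafkaConsumer.assignment();

        // Keep polling until the partition assignment has completed
        while (assignment.isEmpty()) {
            kafkaConsumer.poll(Duration.ofSeconds(1));

            assignment = kafkaConsumer.assignment();
        }

        // Seek every assigned partition to offset 600
        for (TopicPartition topicPartition : assignment) {
            kafkaConsumer.seek(topicPartition, 600);
        }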

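Consuming from a specified time

Besides consuming from a specified offset, you can also consume from a specified time, for example everything produced since one day ago.

The core idea is to convert the desired time into the corresponding offset, using the Kafka consumer API offsetsForTimes. (The logic is somewhat convoluted, so it is only introduced briefly here.)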
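As a rough picture of the time-to-offset conversion, the lookup can be thought of as finding, per partition, the earliest offset whose message timestamp is greater than or equal to the requested time. The sketch below models that rule on a plain array. It is not Kafka's implementation: TimestampLookupSketch and offsetForTime are hypothetical names, and returning -1 is this sketch's own convention for "no such message" (the real offsetsForTimes reports that case as a null map value instead).

```java
// Illustrative model only; names are hypothetical and this is not Kafka's implementation.
class TimestampLookupSketch {
    /**
     * @param timestamps timestamps[i] is the timestamp (ms) of the message at offset i
     * @param target     the requested time in ms
     * @return the earliest offset whose timestamp is >= target, or -1 if there is none
     */
    static long offsetForTime(long[] timestamps, long target) {
        // Linear scan: the sketch does not assume timestamps are strictly increasing.
        for (int offset = 0; offset < timestamps.length; offset++) {
            if (timestamps[offset] >= target) {
                return offset;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long[] timestamps = {1000L, 2000L, 2000L, 3000L};
        System.out.println(offsetForTime(timestamps, 2000L)); // exact match exists
        System.out.println(offsetForTime(timestamps, 2500L)); // falls between messages
        System.out.println(offsetForTime(timestamps, 9000L)); // later than every message
    }
}
```

The last case is the one to watch for in real code: when the requested time is later than every message in a partition, there is no offset to seek to.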

The core code is as follows:

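        // Requires: java.util.HashMap, java.util.Map, java.util.Set,
        // org.apache.kafka.clients.consumer.OffsetAndTimestamp, org.apache.kafka.common.TopicPartition
        // Get the partitions currently assigned to this consumer
        Set<TopicPartition> assignment = kafkaConsumer.assignment();

        // Keep polling until the partition assignment has completed
        while (assignment.isEmpty()) {
            kafkaConsumer.poll(Duration.ofSeconds(1));

            assignment = kafkaConsumer.assignment();
        }

        // Map each partition to the timestamp we want to start from (one day ago)
        HashMap<TopicPartition, Long> topicPartitionLongHashMap = new HashMap<>();

        for (TopicPartition topicPartition : assignment) {
            topicPartitionLongHashMap.put(topicPartition, System.currentTimeMillis() - 24L * 3600 * 1000);
        }

        // Convert the timestamps into offsets
        Map<TopicPartition, OffsetAndTimestamp> topicPartitionOffsetAndTimestampMap = kafkaConsumer.offsetsForTimes(topicPartitionLongHashMap);

        // Seek each partition to the offset found for it
        for (TopicPartition topicPartition : assignment) {

            OffsetAndTimestamp offsetAndTimestamp = topicPartitionOffsetAndTimestampMap.get(topicPartition);

            // The value is null when the partition has no message at or after the timestamp
            if (offsetAndTimestamp != null) {
                kafkaConsumer.seek(topicPartition, offsetAndTimestamp.offset());
            }

        }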


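This article is reposted from: https://blog.csdn.net/weixin_62926228/article/details/136408251
Copyright belongs to the original author, 之沐（沉淀ing）. If there is any infringement, please contact us for removal.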
