Understanding the responsibilities of Kafka consumers
Along the same lines as the previous chapter on Kafka producers, we will start by understanding the responsibilities of Kafka consumers beyond simply consuming messages from Kafka queues.
Let's look at them one by one:
- Subscribing to a topic: Consumer operations start with subscribing to a topic. If the consumer is part of a consumer group, it is assigned a subset of partitions from that topic, and the consumer process then reads data from those assigned partitions. You can think of topic subscription as a registration process for reading data from topic partitions (see the first sketch after this list).
- Consumer offset position: Unlike many other queuing systems, Kafka does not track the read position of each consumer; every consumer is responsible for maintaining its own offset. Consumer offsets are maintained by the consumer APIs, and you do not have to do any additional coding for this. However, in use cases where you want more control over offsets, you can write custom logic for offset commits. We will cover such scenarios in this chapter.
- Replay / rewind / skip messages: A Kafka consumer has full control over the starting offset from which it reads messages in a topic partition. Using the consumer APIs, any consumer application can specify the starting offset, choosing to read messages from the beginning or from any specific offset value, irrespective of the partition's current position. In this way, consumers can replay or skip messages as specific business scenarios require (see the seek sketch after this list).
- Heartbeats: It is the consumer's responsibility to send regular heartbeat signals to the Kafka broker acting as the group coordinator to confirm its membership and its ownership of the assigned partitions. If the coordinator does not receive heartbeats within the session timeout, it reassigns ownership of those partitions to another consumer in the consumer group.
- Offset commits: Kafka does not track the positions or offsets of the messages read by consumer applications. It is each consumer application's responsibility to track its partition offsets and commit them. This has two advantages: it improves broker performance, since brokers do not have to track every consumer's offset, and it gives consumer applications the flexibility to manage offsets to suit their specific scenarios. They can commit offsets after finishing a processing batch, or commit in the middle of a very large batch to reduce the side effects of rebalancing (see the manual commit sketch after this list).
- Deserialization: Kafka producers serialize objects into byte arrays before sending them to Kafka. Kafka consumers perform the reverse step, deserializing those byte arrays back into objects, so a consumer must use deserializers that match the serializers used in the producer application (the configuration sketch below shows matching string deserializers).
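To make the subscription, heartbeat, and deserialization points concrete, here is a minimal consumer sketch. The broker address, group id, and topic name are placeholders, the string deserializers assume the producer used string serializers, and the heartbeat values are illustrative rather than recommended settings:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BasicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "demo-consumer-group");     // placeholder group id
        // Deserializers must mirror the serializers used by the producer
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The consumer is evicted from the group if the coordinator sees
        // no heartbeat within session.timeout.ms
        props.put("session.timeout.ms", "10000");
        props.put("heartbeat.interval.ms", "3000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribing registers this consumer for a share of the topic's partitions
            consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
```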
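For replay, rewind, and skip, here is a sketch built on the consumer's seek APIs. It assigns a partition manually, since seek only works on partitions the consumer currently owns; the topic name, partition number, and offset 42 are all illustrative:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "replay-group");            // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Take ownership of a partition directly so we can position it ourselves
            TopicPartition partition = new TopicPartition("demo-topic", 0); // placeholder
            consumer.assign(Collections.singletonList(partition));

            consumer.seekToBeginning(Collections.singletonList(partition)); // full replay
            // ...or jump straight to an arbitrary offset:
            consumer.seek(partition, 42L);
            // Subsequent poll() calls read from the chosen position onwards
        }
    }
}
```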
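And for custom offset management, a sketch that switches off auto-commit and checkpoints progress in the middle of large batches with commitSync(). The checkpoint interval of 100 records, like the other names here, is illustrative:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "manual-commit-group");     // placeholder
        props.put("enable.auto.commit", "false");         // take over commit responsibility
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic")); // placeholder
            Map<TopicPartition, OffsetAndMetadata> progress = new HashMap<>();
            int processed = 0;
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // record processing would go here
                    // the committed offset is the next offset to read, hence offset + 1
                    progress.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                    if (++processed % 100 == 0) {
                        consumer.commitSync(progress); // checkpoint mid-batch
                    }
                }
            }
        }
    }
}
```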
Now that you have a fair idea of the responsibilities of a consumer, we can talk about consumer data flows.
The following figure depicts how data is fetched by Kafka consumers:

[Figure: Kafka consumer data flow]
The first step toward consuming any messages from Kafka is topic subscription. Consumer applications first subscribe to one or more topics, and then poll Kafka servers to fetch records. In general terms, this is called the poll loop. This loop takes care of server coordination, record retrieval, and partition rebalancing, and keeps the consumer's heartbeats alive.
On receiving partition metadata, consumers start polling the respective brokers for the partitions assigned to them. If new records are found, they are retrieved and deserialized. They are then processed and, after some basic validations, stored in an external storage system.
In a few cases, they are instead processed at runtime and passed on to external applications. Finally, consumers commit the offsets of messages that have been successfully processed. The poll loop also sends periodic keep-alive heartbeats to the Kafka servers so that the consumer keeps receiving messages without interruption.
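This flow can be condensed into a short poll-loop sketch. It assumes a consumer that is already configured and subscribed as in the earlier sketches, and commitAsync() stands in for whichever commit strategy the application actually needs:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    // Runs the poll loop for a consumer that is already configured and
    // subscribed (see the earlier configuration sketch)
    static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            // poll() fetches records and, as a side effect, drives group
            // coordination, rebalancing, and heartbeats
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                // validate and process, e.g. hand off to external storage
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
            // commit only after the batch has been processed successfully
            consumer.commitAsync();
        }
    }
}
```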