
Understanding the responsibilities of Kafka consumers

Along the same lines as the previous chapter on Kafka producers, we will start by understanding the responsibilities of Kafka consumers beyond simply consuming messages from Kafka topics.

Let's look at them one by one:

  • Subscribing to a topic: Consumer operations start with subscribing to a topic. If the consumer is part of a consumer group, it is assigned a subset of the partitions of that topic and eventually reads data from those assigned partitions. You can think of topic subscription as a registration process for reading data from topic partitions.
  • Consumer offset position: Unlike many other messaging systems, Kafka does not track which messages each consumer has read; every consumer is responsible for maintaining its own offset position. The consumer APIs maintain offsets for you, so no additional coding is required for the common case. However, in use cases where you want more control over offsets, you can write custom logic for offset commits. We will cover such scenarios in this chapter.
  • Replay / rewind / skip messages: A Kafka consumer has full control over the starting offset from which it reads a topic partition. Using the consumer APIs, any consumer application can pass a starting offset for each topic partition and choose to read from the beginning or from a specific offset, irrespective of the partition's current committed offset. In this way, consumers can replay or skip messages as specific business scenarios require.
  • Heartbeats: It is the consumer's responsibility to send regular heartbeat signals to the Kafka broker acting as the group coordinator to confirm its membership and its ownership of the assigned partitions. If the group coordinator does not receive heartbeats within a certain time interval, ownership of those partitions is reassigned to other consumers in the consumer group.
  • Offset commits: Kafka does not track the read positions of consumer applications; it is the responsibility of each consumer application to track its partition offsets and commit them. This has two advantages: it improves broker performance, because brokers do not have to track every consumer's offsets, and it gives consumer applications the flexibility to manage offsets according to their specific scenarios. They can commit offsets after finishing a batch, or commit in the middle of a very large batch to reduce the side effects of rebalancing.
  • Deserialization: Kafka producers serialize objects into byte arrays before sending them to Kafka; Kafka consumers deserialize those byte arrays back into objects. A consumer must therefore use deserializers that match the serializers used in the producer application. A sketch of how these responsibilities map onto the consumer API follows this list.
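The following is a minimal sketch of how these responsibilities map onto the Java consumer API. The broker address, group ID, topic name, and interval values are placeholders chosen for illustration, and string deserializers are assumed:

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ConsumerSetupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "demo-consumer-group");     // placeholder group ID
        // Deserialization: these must mirror the serializers used by the producer
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Offset commits: disable auto-commit so the application decides
        // when offsets are committed
        props.put("enable.auto.commit", "false");
        // Heartbeats: sent automatically at these intervals (illustrative values)
        props.put("session.timeout.ms", "10000");
        props.put("heartbeat.interval.ms", "3000");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Subscription: register with the consumer group; the consumer is
        // then assigned a subset of the topic's partitions
        consumer.subscribe(Arrays.asList("demo-topic")); // placeholder topic

        // Replay/rewind (alternative to subscribe): take one partition
        // explicitly and start reading from a specific offset
        // consumer.assign(Arrays.asList(new TopicPartition("demo-topic", 0)));
        // consumer.seek(new TopicPartition("demo-topic", 0), 42L);
    }
}
```

Note that subscribe() and assign() are mutually exclusive on the same consumer instance, which is why the replay alternative is shown commented out.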

Now that you have a fair idea of the responsibilities of a consumer, we can talk about consumer data flows.

The following image depicts how data is fetched by Kafka consumers:

Consumer flows

The first step toward consuming messages from Kafka is topic subscription. A consumer application first subscribes to one or more topics. After that, it polls the Kafka servers to fetch records. In general terms, this is called the poll loop. This loop takes care of server coordination, record retrieval, and partition rebalances, and it keeps the consumer's heartbeats alive.

For new consumers reading data for the first time, the poll loop first registers the consumer with its consumer group and eventually receives the partition metadata. This metadata mostly contains the partition and leader information of each topic.

On receiving this metadata, consumers start polling the respective brokers for the partitions assigned to them. If new records are found, they are retrieved and deserialized. They are then processed and, after some basic validations, stored in an external storage system.

In a few cases, they are instead processed at runtime and passed to external applications. Finally, consumers commit the offsets of the messages that have been successfully processed. The poll loop also sends periodic keep-alive heartbeats to the Kafka servers to ensure that the consumer keeps receiving messages without interruption.
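The flow just described can be condensed into a short sketch of the poll loop. It assumes a consumer configured and subscribed as in the earlier sketch, reduces record processing to a print statement, and commits offsets synchronously after each batch; poll(Duration) requires Kafka clients 2.0 or later:

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoopSketch {
    // Runs the poll loop for a consumer that has already been
    // configured and subscribed, as in the earlier sketch
    static void runPollLoop(KafkaConsumer<String, String> consumer) {
        try {
            while (true) {
                // Fetch the next batch; records come back already
                // deserialized by the configured deserializers
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Stand-in for validation, processing, and writing
                    // to an external storage system
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Commit only after the whole batch has been processed
                consumer.commitSync();
            }
        } finally {
            consumer.close(); // leave the group cleanly on shutdown
        }
    }
}
```

Committing after each fully processed batch gives at-least-once delivery: if the consumer fails mid-batch, the uncommitted records are redelivered to another consumer after the rebalance.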
