
Understanding the responsibilities of Kafka consumers

Along the same lines as the previous chapter on Kafka producers, we will start by understanding the responsibilities of Kafka consumers beyond simply consuming messages from Kafka topics.

Let's look at them one by one:

  • Subscribing to a topic: Consumer operations start with subscribing to a topic. If the consumer is part of a consumer group, it is assigned a subset of the partitions of that topic and eventually reads data from those assigned partitions. You can think of topic subscription as a registration process for reading data from topic partitions.
  • Consumer offset position: Unlike many other messaging systems, Kafka does not track which messages each consumer has read; every consumer is responsible for maintaining its own offset position. The consumer APIs maintain offsets for you, so no additional coding is required for the common case. However, in use cases where you want more control over offsets, you can write custom logic for offset commits. We will cover such scenarios in this chapter.
  • Replay / rewind / skip messages: A Kafka consumer has full control over the starting offset from which it reads a topic partition. Using the consumer APIs, any consumer application can pass a starting offset for each topic partition and choose to read from the beginning or from a specific offset, irrespective of the partition's current committed offset. In this way, consumers can replay or skip messages as specific business scenarios require.
  • Heartbeats: It is the consumer's responsibility to send regular heartbeat signals to the Kafka broker acting as the group coordinator to confirm its membership and its ownership of the assigned partitions. If the group coordinator does not receive heartbeats within a certain time interval, ownership of those partitions is reassigned to other consumers in the consumer group.
  • Offset commits: Kafka does not track the read positions of consumer applications; it is the responsibility of each consumer application to track its partition offsets and commit them. This has two advantages: it improves broker performance, because brokers do not have to track every consumer's offsets, and it gives consumer applications the flexibility to manage offsets according to their specific scenarios. They can commit offsets after finishing a batch, or commit in the middle of a very large batch to reduce the side effects of rebalancing.
  • Deserialization: Kafka producers serialize objects into byte arrays before sending them to Kafka; Kafka consumers deserialize those byte arrays back into objects. A consumer must therefore use deserializers that match the serializers used in the producer application. A sketch of how these responsibilities map onto the consumer API follows this list.
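The following is a minimal sketch of how these responsibilities map onto the Java consumer API. The broker address, group ID, topic name, and interval values are placeholders chosen for illustration, and string deserializers are assumed:

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ConsumerSetupSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "demo-consumer-group");     // placeholder group ID
        // Deserialization: these must mirror the serializers used by the producer
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Offset commits: disable auto-commit so the application decides
        // when offsets are committed
        props.put("enable.auto.commit", "false");
        // Heartbeats: sent automatically at these intervals (illustrative values)
        props.put("session.timeout.ms", "10000");
        props.put("heartbeat.interval.ms", "3000");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Subscription: register with the consumer group; the consumer is
        // then assigned a subset of the topic's partitions
        consumer.subscribe(Arrays.asList("demo-topic")); // placeholder topic

        // Replay/rewind (alternative to subscribe): take one partition
        // explicitly and start reading from a specific offset
        // consumer.assign(Arrays.asList(new TopicPartition("demo-topic", 0)));
        // consumer.seek(new TopicPartition("demo-topic", 0), 42L);
    }
}
```

Note that subscribe() and assign() are mutually exclusive on the same consumer instance, which is why the replay alternative is shown commented out.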

Now that you have a fair idea of the responsibilities of a consumer, we can talk about consumer data flows.

The following image depicts how data is fetched by Kafka consumers:

Consumer flows

The first step toward consuming messages from Kafka is topic subscription. A consumer application first subscribes to one or more topics. After that, it polls the Kafka servers to fetch records. In general terms, this is called the poll loop. This loop takes care of server coordination, record retrieval, and partition rebalances, and it keeps the consumer's heartbeats alive.

For new consumers reading data for the first time, the poll loop first registers the consumer with its consumer group and eventually receives the partition metadata. This metadata mostly contains the partition and leader information of each topic.

On receiving this metadata, consumers start polling the respective brokers for the partitions assigned to them. If new records are found, they are retrieved and deserialized. They are then processed and, after some basic validations, stored in an external storage system.

In a few cases, they are instead processed at runtime and passed to external applications. Finally, consumers commit the offsets of the messages that have been successfully processed. The poll loop also sends periodic keep-alive heartbeats to the Kafka servers to ensure that the consumer keeps receiving messages without interruption.
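The flow just described can be condensed into a short sketch of the poll loop. It assumes a consumer configured and subscribed as in the earlier sketch, reduces record processing to a print statement, and commits offsets synchronously after each batch; poll(Duration) requires Kafka clients 2.0 or later:

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoopSketch {
    // Runs the poll loop for a consumer that has already been
    // configured and subscribed, as in the earlier sketch
    static void runPollLoop(KafkaConsumer<String, String> consumer) {
        try {
            while (true) {
                // Fetch the next batch; records come back already
                // deserialized by the configured deserializers
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Stand-in for validation, processing, and writing
                    // to an external storage system
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Commit only after the whole batch has been processed
                consumer.commitSync();
            }
        } finally {
            consumer.close(); // leave the group cleanly on shutdown
        }
    }
}
```

Committing after each fully processed batch gives at-least-once delivery: if the consumer fails mid-batch, the uncommitted records are redelivered to another consumer after the rebalance.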
