- Building Data Streaming Applications with Apache Kafka
- Manish Kumar Chanchal Singh
- 2022-07-12 10:38:17
Additional configuration
You learned a few mandatory parameters earlier. The Kafka consumer has many more properties, and in most cases they can be left at their defaults. A few of them, however, can help you improve the performance and availability of consumers:
- enable.auto.commit: If this is set to true, the consumer automatically commits message offsets at a fixed interval, which you define by setting auto.commit.interval.ms. However, it is usually better to set it to false so that you control when offsets are committed. This helps you avoid both processing duplicates and missing data.
- fetch.min.bytes: This is the minimum amount of data, in bytes, that the Kafka server needs to return for a fetch request. If less data than this is available, the server waits for enough data to accumulate and then sends it to the consumer. Setting the value higher than the default of one byte increases server throughput but also increases the latency of the consumer application.
- request.timeout.ms: This is the maximum amount of time that the consumer waits for a response to a request before resending the request, or failing it once the maximum number of retries is reached.
- auto.offset.reset: This property is used when the consumer does not have a valid offset for the partition from which it is reading. It accepts one of the following values:
  - latest: The consumer starts reading from the latest message in the partition, that is, only messages that arrive after the consumer started.
  - earliest: The consumer starts reading from the beginning of the partition, which means that it reads all the data available in the partition.
  - none: An exception is thrown to the consumer.
- session.timeout.ms: The consumer sends heartbeats to the consumer group coordinator to signal that it is alive and to prevent a rebalance from being triggered. A heartbeat must arrive within the configured period. For example, if the timeout is set to 10 seconds, the consumer can go up to 10 seconds without sending a heartbeat to the group coordinator; if it takes longer, the group coordinator treats it as dead and triggers a rebalance.
- max.partition.fetch.bytes: This is the maximum amount of data that the server returns per partition. The memory available to the consumer for ConsumerRecord objects must be at least numberOfPartitions * max.partition.fetch.bytes. For example, with 10 partitions, 1 consumer, and max.partition.fetch.bytes set to 2 MB, the consumer needs 10 * 2 = 20 MB for consumer records.
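The memory arithmetic for max.partition.fetch.bytes can be sketched in a few lines of Java. This is only an illustration of the multiplication described above; the class name FetchMemoryEstimate and the helper maxFetchBytes are hypothetical and are not part of the Kafka client API.

```java
public class FetchMemoryEstimate {

    // Hypothetical helper (not a Kafka API): upper bound, in bytes, on the
    // record data one consumer may hold when every assigned partition
    // returns a full max.partition.fetch.bytes worth of data.
    static long maxFetchBytes(int assignedPartitions, long maxPartitionFetchBytes) {
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact((long) assignedPartitions, maxPartitionFetchBytes);
    }

    public static void main(String[] args) {
        long twoMb = 2L * 1024 * 1024;          // max.partition.fetch.bytes = 2 MB
        long bytes = maxFetchBytes(10, twoMb);  // 10 partitions, 1 consumer
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints "20 MB"
    }
}
```

If the same topic were read by more consumers in the group, each consumer would be assigned fewer partitions, so the first argument (and the estimate) would shrink accordingly.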
Remember that before setting max.partition.fetch.bytes, you must know how long the consumer takes to process the data; otherwise, the consumer may not be able to send heartbeats to the group coordinator in time, and a rebalance will be triggered. Possible solutions are to increase the session timeout or to decrease the partition fetch size so that the consumer can process each batch quickly enough.
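As a rough sketch, the properties discussed in this section could be collected in a java.util.Properties object like the one below. The class name ConsumerTuningSketch and every value shown are illustrative assumptions, not recommendations; the mandatory parameters (such as bootstrap.servers, group.id, and the deserializers) are left out and would have to be added before passing the object to a KafkaConsumer.

```java
import java.util.Properties;

public class ConsumerTuningSketch {

    // Builds only the optional settings covered in this section.
    // All values are assumptions chosen for illustration.
    static Properties tunedConsumerProperties() {
        Properties props = new Properties();
        // Commit offsets manually instead of on a timer
        props.setProperty("enable.auto.commit", "false");
        // Let the server wait until at least 1 KB is available (default is 1 byte)
        props.setProperty("fetch.min.bytes", "1024");
        // Maximum time to wait for a response to a request
        props.setProperty("request.timeout.ms", "40000");
        // Without a valid offset, start from the beginning of the partition
        props.setProperty("auto.offset.reset", "earliest");
        // The coordinator treats the consumer as dead after 10 s without a heartbeat
        props.setProperty("session.timeout.ms", "10000");
        // At most 2 MB returned per partition
        props.setProperty("max.partition.fetch.bytes", String.valueOf(2 * 1024 * 1024));
        return props;
    }

    public static void main(String[] args) {
        Properties props = tunedConsumerProperties();
        // Print the settings in a stable, sorted order
        props.stringPropertyNames().stream()
             .sorted()
             .forEach(name -> System.out.println(name + "=" + props.getProperty(name)));
    }
}
```

Keeping the tuning values in one method makes it easier to adjust session.timeout.ms and max.partition.fetch.bytes together once the processing time per batch has been measured.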