Title: Building Data Streaming Applications with Apache Kafka
Authors: Manish Kumar, Chanchal Singh
Last updated: 2022-07-12 10:38:12

Message topics

If you work in software development, you have almost certainly come across terms such as database, table, and record. A database contains multiple tables, for example Items, Price, Sales, Inventory, and Purchase, and each table holds data of a specific category. An application built on such a database has two parts: one inserts records into the tables and the other reads records from them. In this analogy, tables correspond to topics in Kafka, the applications that insert data are producers, and the applications that read data are consumers.

In a messaging system, messages need to be stored somewhere. In Kafka, messages are stored in topics. Each topic represents a category, so you may have one topic storing item information and another storing sales information. A producer that wants to send a message sends it to the topic for that message's category. A consumer that wants to read those messages subscribes to the topics it is interested in and consumes from them. Here are a few terms that we need to know:

Retention Period: Messages in a topic are kept for a defined period of time, irrespective of throughput, so that the disk space they occupy can eventually be reclaimed. The retention period is seven days by default and can be configured to whatever duration we choose. Kafka keeps messages until they are older than this period, whether or not they have been consumed, and then deletes them.
Space Retention Policy: We can also configure Kafka topics to delete the oldest messages once the stored data reaches a size threshold set in the configuration. If this limit is hit sooner than expected, it usually means that not enough capacity planning was done before deploying Kafka in your organization.
Offset: Each message in Kafka is assigned a number called an offset. Topics consist of many partitions, and each partition stores messages in the sequence in which they arrive, so an offset identifies a message's position within its partition. Consumers acknowledge messages by offset: acknowledging an offset means that all messages before that offset in the partition have been received by the consumer.
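Partition: Each Kafka topic consists of a defined number of partitions, which you configure when you create the topic. The count can be increased later but not decreased. Partitions are distributed across the brokers in the cluster, which allows producers and consumers to work in parallel and helps in achieving high throughput.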
Compaction: Topic compaction was introduced in Kafka 0.8. There is no way to change previous messages in Kafka; messages only get deleted when the retention period is over. Sometimes you receive new messages that have the same key as earlier ones but carry updated values, and on the consumer side you only want to process the latest data. Compaction helps you achieve this by keeping only the most recent message for each key: it builds a map from each key to the offset of its latest message and discards the older messages for that key. This removes superseded duplicates from a large number of messages.
Leader: Partitions are replicated across the Kafka cluster according to the specified replication factor. Each partition has one leader broker and zero or more follower brokers, and all read and write requests for the partition go through the leader only. If the leader fails, one of the followers is elected as the new leader and processing resumes.
Buffering: Kafka buffers messages on both the producer side and the consumer side to increase throughput and reduce input/output (I/O). We will talk about it in detail later.
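
To make the terms above more concrete, here is a minimal in-memory sketch in Python of how partitions, offsets, retention, and compaction relate to each other. It is a toy model for illustration only, not the Kafka client API: the class ToyTopic, its method names, and the sample keys and values are all hypothetical. It also simplifies two things: CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to keyed messages, and records are expired one at a time, whereas Kafka deletes whole log segments.

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic (illustration only)."""

    def __init__(self, num_partitions, retention_seconds):
        self.retention_seconds = retention_seconds
        # One append-only list of (offset, key, value, timestamp) per partition.
        self.partitions = [[] for _ in range(num_partitions)]
        # Next offset to assign in each partition; offsets are never reused.
        self.next_offset = [0] * num_partitions

    def partition_for(self, key):
        # The same key always maps to the same partition.
        # Assumption: CRC32 is a stand-in for Kafka's murmur2-based hashing.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value, timestamp):
        # Producer side: store the record and return where it landed.
        p = self.partition_for(key)
        offset = self.next_offset[p]
        self.partitions[p].append((offset, key, value, timestamp))
        self.next_offset[p] += 1
        return p, offset

    def read(self, partition, from_offset):
        # Consumer side: fetch every record at or after from_offset.
        return [r for r in self.partitions[partition] if r[0] >= from_offset]

    def apply_retention(self, now):
        # Drop records whose age has reached the retention period.
        for p, log in enumerate(self.partitions):
            self.partitions[p] = [
                r for r in log if now - r[3] < self.retention_seconds
            ]

    def compact(self):
        # Keep only the newest record per key; surviving offsets are unchanged.
        for p, log in enumerate(self.partitions):
            latest = {}  # key -> offset of the newest record for that key
            for offset, key, _value, _ts in log:
                latest[key] = offset
            self.partitions[p] = [r for r in log if latest[r[1]] == r[0]]


WEEK = 7 * 24 * 3600  # the default retention period of seven days, in seconds

topic = ToyTopic(num_partitions=3, retention_seconds=WEEK)
p, first = topic.append("item-1", "price=10", timestamp=0)
_, second = topic.append("item-1", "price=12", timestamp=100)

# The committed offset is the next offset to read, so everything before it
# counts as received by the consumer.
committed = 0
batch = topic.read(p, committed)
committed = batch[-1][0] + 1

topic.compact()  # only the latest record for "item-1" survives
after_compaction = topic.read(p, 0)

topic.apply_retention(now=WEEK + 200)  # the surviving record is now too old
after_retention = topic.read(p, 0)
```

Both records share the key "item-1", so they land in the same partition and receive consecutive offsets. After compaction only the newer record remains and it keeps its original offset, and once the retention period has passed that record is removed as well.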
