- Building Data Streaming Applications with Apache Kafka
- Manish Kumar Chanchal Singh
- 2022-07-12 10:38:12
Replication and replicated logs
Replication is one of the most important factors in achieving reliability for Kafka systems. Replicas of message logs for each topic partition are maintained across different servers in a Kafka cluster. The replication factor can be configured for each topic separately: for example, one topic can use a replication factor of 3 and another can use 5. All reads and writes go through the leader of a partition; if the leader fails, one of the followers is elected as the new leader.
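To make the per-topic replication factor concrete, here is a minimal sketch that places partition replicas on brokers in round-robin order. It is a toy model only: the function `assign_replicas`, the broker ids, and the two topics are hypothetical, and this is not Kafka's actual assignment algorithm. The first broker in each list stands in for the leader.

```python
from typing import Dict, List


def assign_replicas(brokers: List[int], num_partitions: int,
                    replication_factor: int) -> Dict[int, List[int]]:
    """Toy round-robin replica placement (an assumption, not Kafka's real logic)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment: Dict[int, List[int]] = {}
    for partition in range(num_partitions):
        # Start each partition on a different broker, then wrap around.
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


brokers = [0, 1, 2, 3, 4]

# Two hypothetical topics on the same cluster with different replication factors.
orders = assign_replicas(brokers, num_partitions=2, replication_factor=3)
payments = assign_replicas(brokers, num_partitions=2, replication_factor=5)

print(orders)    # {0: [0, 1, 2], 1: [1, 2, 3]}
print(payments)  # {0: [0, 1, 2, 3, 4], 1: [1, 2, 3, 4, 0]}
```

Each partition of the first topic has three copies, while each partition of the second has a copy on all five brokers.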
Generally, followers keep a copy of the leader's log, and the leader does not mark a message as committed until it has received acknowledgment from all the followers. The log replication algorithm can be implemented in different ways, but any implementation must ensure that, once the leader tells the producer that a message is committed, the message is available for consumers to read.
To maintain such replica consistency, there are two approaches. In both approaches, there will be a leader through which all the read and write requests will be processed. There is a slight difference in replica management and leader election:
- Quorum-based approach: In this approach, the leader marks a message as committed only when a majority of replicas have acknowledged receiving it. If the leader fails, a new leader is elected through coordination among the followers. Many algorithms exist for electing a leader, and going into their details is beyond the scope of this book. Zookeeper follows a quorum-based approach for leader election.
- Primary backup approach: Kafka follows a different approach to maintaining replicas; the leader in Kafka waits for an acknowledgement from all the followers before marking the message as committed. If the leader fails, any of the followers can take over as leader.
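The difference between the two commit rules can be sketched with two small predicates. This is an illustrative model only: the function names and replica ids are hypothetical, and the sets of acknowledging replicas include the leader itself.

```python
from typing import Set


def committed_by_quorum(acked: Set[int], replicas: Set[int]) -> bool:
    """Quorum rule: a strict majority of all replicas holds the message."""
    return len(acked & replicas) > len(replicas) // 2


def committed_by_all(acked: Set[int], replicas: Set[int]) -> bool:
    """Primary backup rule as described above: every replica holds the message."""
    return replicas <= acked


replicas = {1, 2, 3, 4, 5}   # replica 1 plays the leader in this example
acked = {1, 2, 3}            # the leader plus two followers have the message

print(committed_by_quorum(acked, replicas))  # True: 3 of 5 is a majority
print(committed_by_all(acked, replicas))     # False: replicas 4 and 5 are missing
```

With the same set of acknowledgements, the quorum rule already commits while the primary backup rule keeps waiting, which is where the extra latency of the second approach comes from.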
The primary backup approach can cost more in terms of latency and throughput, but it guarantees better consistency for messages or data. Each leader keeps track of the set of in-sync replicas (ISR). This means that for each partition, a leader and an ISR are stored in Zookeeper. Writes and reads then happen as follows:
- Write: The leader and every follower have their own local log, in which they maintain the log end offset that represents the tail of the log. The offset of the last committed message is called the High Watermark. When a client wants to write a message to a partition, it first looks up the leader of the partition from Zookeeper and creates a write request. The leader writes the message to its log and then waits for the followers in the ISR to return an acknowledgement. Once the acknowledgements are received, the leader advances the High Watermark and sends an acknowledgment to the client. If any follower in the ISR fails, the leader drops it from the ISR and continues operating with the remaining followers. When a failed follower comes back, it catches up with the leader by syncing its log, and the leader then adds it to the ISR again.
- Read: All reads happen through the leader only. A message that the leader has successfully acknowledged is available for clients to read.
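The write and read paths above can be sketched as a small in-memory model. This is a simplified illustration under stated assumptions, not Kafka's implementation: the class `PartitionLeader` and its methods are hypothetical, acknowledgements are synchronous, and `high_watermark` is modeled as the count of committed messages rather than a real offset pointer.

```python
from typing import Dict, List, Set


class PartitionLeader:
    """Toy model of a partition leader with an ISR (hypothetical names)."""

    def __init__(self, follower_ids: List[int]) -> None:
        self.log: List[str] = []
        self.high_watermark = 0  # messages below this offset are committed
        self.follower_logs: Dict[int, List[str]] = {f: [] for f in follower_ids}
        self.isr: Set[int] = set(follower_ids)
        self.alive: Set[int] = set(follower_ids)

    @property
    def log_end_offset(self) -> int:
        return len(self.log)

    def fail(self, follower_id: int) -> None:
        self.alive.discard(follower_id)

    def write(self, message: str) -> int:
        """Append, replicate to the ISR, advance the High Watermark."""
        self.log.append(message)
        for f in sorted(self.isr):  # sorted() copies, so the set can shrink
            if f in self.alive:
                self.follower_logs[f].append(message)  # follower acknowledges
            else:
                self.isr.discard(f)  # drop the failed follower from the ISR
        self.high_watermark = self.log_end_offset
        return self.high_watermark - 1  # offset of the committed message

    def recover(self, follower_id: int) -> None:
        """A returning follower catches up and rejoins the ISR."""
        self.alive.add(follower_id)
        follower_log = self.follower_logs[follower_id]
        follower_log.extend(self.log[len(follower_log):])
        self.isr.add(follower_id)

    def read(self) -> List[str]:
        """Clients only see messages up to the High Watermark."""
        return self.log[:self.high_watermark]


leader = PartitionLeader([2, 3])
leader.write("m0")
leader.fail(3)
leader.write("m1")            # follower 3 is dropped from the ISR
print(sorted(leader.isr))     # [2]
leader.recover(3)             # follower 3 syncs its log and rejoins
print(sorted(leader.isr))     # [2, 3]
print(leader.follower_logs[3])  # ['m0', 'm1']
print(leader.read())          # ['m0', 'm1']
```

Because acknowledgements are synchronous in this sketch, the High Watermark always equals the log end offset after a write; in a real cluster the two can differ while followers are still fetching.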
The following diagram illustrates the Kafka log implementation:
