- Building Data Streaming Applications with Apache Kafka
- Manish Kumar, Chanchal Singh
Kafka's architecture
This section introduces you to Kafka architecture. By the end of this section, you will have a clear understanding of both the logical and physical architecture of Kafka. Let's see how Kafka components are organized logically.
Every message in a Kafka topic is a collection of bytes, represented as an array. Producers are the applications that store information in Kafka; they send messages to Kafka topics, which can store all types of messages. Every topic is further divided into partitions, and each partition stores messages in the sequence in which they arrive. There are two major operations that producers and consumers perform in Kafka: producers append messages to the end of a partition's write-ahead log file, and consumers fetch messages from the log files of a given topic partition. Physically, each topic is spread over different Kafka brokers, each hosting one or more partitions of the topic.
Ideally, partitions should be distributed uniformly across brokers, with every topic represented on each machine. Consumers are the applications or processes that subscribe to topics and read messages from them.
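To make the producer side concrete, here is a minimal sketch of a Java producer appending messages to a topic. The broker address (localhost:9092) and the topic name (demo-topic) are assumptions for the example; Kafka routes each record to one of the topic's partitions and appends it at the end of that partition's log.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address; replace with your cluster's bootstrap servers.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each record is appended to the end of one partition of the
            // (hypothetical) demo-topic.
            producer.send(new ProducerRecord<>("demo-topic", "order-42", "created"));
        }
    }
}
```

Records sent with the same key are routed to the same partition, so their relative order is preserved within that partition.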
The following diagram shows you the conceptual layout of a Kafka cluster:

The preceding paragraphs explain the logical architecture of Kafka and how different logical components coherently work together. While it is important to understand how Kafka architecture is divided logically, you also need to understand what Kafka's physical architecture looks like. This will help you in later chapters as well. A Kafka cluster is basically composed of one or more servers (nodes). The following diagram depicts how a multi-node Kafka cluster looks:

A typical Kafka cluster consists of multiple brokers, which helps load-balance message reads and writes across the cluster. Each of these brokers is stateless; they use Zookeeper to maintain their state. Each topic partition has one broker as its leader and zero or more brokers as followers. The leader handles all read and write requests for its partition, while followers replicate the leader in the background without actively interfering with its work. Think of followers as backups for the leader: one of them is chosen as the new leader if the current leader fails.
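As an illustration of how partitions and replicas map onto brokers, the following sketch uses the AdminClient available in newer Kafka clients (0.11 and later) to create a topic with three partitions, each replicated on two brokers; for each partition, one replica is elected leader and the other acts as a follower. The broker address and topic name are assumptions for the example.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address for the example.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Three partitions spread across the brokers, each replicated on
            // two brokers: one replica is elected leader, the other follows.
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```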
Zookeeper is an important component of a Kafka cluster. It manages and coordinates Kafka brokers and consumers. Zookeeper keeps track of any new broker additions and any existing broker failures in the Kafka cluster, and it notifies producers and consumers of these changes in cluster state so that they can coordinate their work with the active brokers. Zookeeper also records which broker is the leader of which topic partition and passes this information on to producers and consumers so that they can read and write messages against the right broker.
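To see part of what Zookeeper tracks, the sketch below uses the plain ZooKeeper Java client to list the broker IDs that Kafka registers, assuming Zookeeper runs at localhost:2181. Each live broker creates an ephemeral znode under /brokers/ids, so the children of that path are the currently active brokers.

```java
import org.apache.zookeeper.ZooKeeper;

import java.util.List;

public class ListBrokers {
    public static void main(String[] args) throws Exception {
        // Assumed Zookeeper address; 10-second session timeout, no default watcher.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, null);
        try {
            // Each live broker registers an ephemeral znode under /brokers/ids,
            // so the children of that path are the IDs of the active brokers.
            List<String> brokerIds = zk.getChildren("/brokers/ids", false);
            System.out.println("Active broker IDs: " + brokerIds);
        } finally {
            zk.close();
        }
    }
}
```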
By now, you should be familiar with how producer and consumer applications relate to the Kafka cluster, but it is worth touching on them briefly to verify your understanding. Producers push data to brokers: when publishing a message, a producer looks up the elected leader (broker) of the target topic partition and automatically sends the message to that leader. Similarly, consumers read messages from brokers.
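On the consuming side, a minimal sketch might look like the following, again assuming a localhost broker, the hypothetical demo-topic, and a recent client version that supports poll(Duration). The client library locates the leader of each assigned partition for you, so the application only subscribes and polls.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed broker address and consumer group name for the example.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // The consumer fetches batches of records from the leaders of its partitions.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```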
Because Kafka brokers are stateless, the consumer records its own state with the help of Zookeeper; this design helps Kafka scale well. The consumer's offset value is maintained by Zookeeper: the consumer keeps track of how many messages it has consumed using the partition offset, and it ultimately acknowledges that offset to Zookeeper, meaning it has consumed all prior messages.
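This offset bookkeeping is visible in the consumer API. As a sketch extending the consumer example above, disabling auto-commit and calling commitSync() after processing makes the acknowledgement step explicit. In older Kafka clients offsets were committed to Zookeeper, as described above; newer clients commit them to an internal Kafka topic (__consumer_offsets), but the commit plays the same role either way.

```java
// Continuing the consumer sketch above: turn off automatic commits so the
// application decides when an offset counts as "consumed".
props.put("enable.auto.commit", "false");

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        // ... process the record ...
    }
    // Commit the offsets of everything returned so far; on restart the group
    // resumes from the last committed offset, meaning all prior messages are
    // considered consumed.
    consumer.commitSync();
}
```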