官术网_书友最值得收藏!

Taping data from source to the processor - expectations and caveats

In this section, we will discuss the expectations of log streaming tools in terms of performance, reliability, and scalability. The reliability of the system can be identified by message delivery semantics. There are three types of delivery semantics:

  • At most once: Messages are immediately transferred. If the transfer succeeds, the message is never sent out again. However, many failure scenarios can cause lost messages.
  • At least once: Each message is delivered at least once. In failure cases, messages may be delivered twice.
  • Exactly once: Each message is delivered once and only once.

Performance consists of I/O, CPU, and RAM usage and impact. By definition, scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth. So, we will identify whether tools are scalable to handle increased loads or not. Scalability can be achieved horizontally and vertically. Horizontally means adding more computing machines and distributing the work, while vertically means increasing the capacity of a single machine in terms of CPU, RAM, or IOPS.

Let's start with NiFi. It is a guaranteed delivery processing engine (exactly once) by default, which maintains write-ahead logs and a content repository to achieve this. Performance depends on the reliability that we choose. In the case of NiFi guaranteed message delivery, all messages are written to the disk and then read from there. It will be slow, but you have to pay in terms of performance if you don't want to lose even a single message. We can create a cluster of NiFi controlled by the NiFi cluster manager. Internally, it is managed by zookeeper to sync all of the nodes. The model is master and slave, but if the master dies then all nodes continue to operate. A restriction will be that no new nodes can join the cluster and you can't change the NiFi flow. So, NiFi is scalable enough to handle the cluster.

Fluentd provides At most once and At least once delivery semantics. Reliability and performance is achieved by using the Buffer plugin. Memory Buffer structure contains a queue of chunks. When the top chunk exceeds the specified size or time limit, a new empty chunk is pushed to the top of the queue. The bottom chunk is written out immediately when a new chunk is pushed. File Buffer provides a persistent buffer implementation. It uses files to store buffer chunks on a disk. As per its documentation, Fluentd is a well-scalable product where M*N, is resolved by M+N where M is the number of input plugins and N is the number of output plugins. By configuring multiple log forwarders and log aggregators, we can achieve scalability.

Logstash reliability and performance is achieved by using third party message brokers. Logstash fits best with Redis. Other input and output plugins are available to integrate with message brokers, like Kafka, RabbitMQ, and so on. Using Filebeat as leaf nodes, we can get scalability with Logstash to read from multiple sources, from the same source with a different log directory, or from the same source and the same directory with a different file filter.

Flumes get its reliability using channels and syncs between sink and channel. The sink removes an event from the channel only after the event is stored into the channel of the next agent or stored in the terminal repository. This is single hop message delivery semantics. Flume uses a transactional approach to guarantee the reliable delivery of the events. To read data over a network, Flume integrates with Avro and Thrift.

主站蜘蛛池模板: 镇安县| 神农架林区| 盐边县| 桑日县| 礼泉县| 康定县| 孟村| 平安县| 澄江县| 同江市| 夏河县| 乡城县| 浠水县| 五寨县| 台南县| 扶沟县| 嘉定区| 民县| 股票| 桓台县| 武山县| 新昌县| 甘洛县| 正镶白旗| 杭锦旗| 沙洋县| 德令哈市| 武川县| 永嘉县| 逊克县| 邢台市| 铜陵市| 开封县| 资中县| 大竹县| 金阳县| 荔波县| 乌什县| 西盟| 井冈山市| 内江市|