
Flume

Flume is a well-known Apache project for collecting, aggregating, and moving large amounts of log data. To download it, refer to the following link: https://flume.apache.org/download.html. Download the apache-flume-1.7.0-bin.tar.gz setup file and extract it, as follows:

cp apache-flume-1.7.0-bin.tar.gz ~/demo/
tar -xvf ~/demo/apache-flume-1.7.0-bin.tar.gz

The extracted folders and files are shown in the following screenshot:

We will demonstrate the same example that we executed for the previous tools, involving reading from a file and pushing to a Kafka topic. First, let's configure the Flume file:

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.positionFile = /home/ubuntu/demo/flume/tail_dir.json
    a1.sources.r1.filegroups = f1
    a1.sources.r1.filegroups.f1 = /home/ubuntu/demo/files/test
    
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = flume-example
    a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
    
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 6
    
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
  

Flume defines a flow with three components. The first is the source, where the logs or events come from. Flume provides multiple source types to define a flow, such as kafka, TAILDIR, and HTTP, and you can also define your own custom source. The second component is the sink, the destination where events are delivered and consumed. The third is the channel, which defines the medium between the source and the sink; the most commonly used channels are memory, file, and Kafka, but there are many more. Here, we will use TAILDIR as the source, Kafka as the sink, and memory as the channel. In the preceding configuration, a1 is the agent name, r1 is the source, k1 is the sink, and c1 is the channel.
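To see how the same three-part pattern applies with different component types, here is a minimal sketch of an alternative flow using the built-in netcat source and logger sink (the agent name a2 and port 44444 are chosen for illustration):

```
a2.sources = s1
a2.sinks = k1
a2.channels = c1

# netcat source: listens on a TCP port and turns each line into an event
a2.sources.s1.type = netcat
a2.sources.s1.bind = localhost
a2.sources.s1.port = 44444

# logger sink: writes events to the agent's log at INFO level
a2.sinks.k1.type = logger

a2.channels.c1.type = memory

a2.sources.s1.channels = c1
a2.sinks.k1.channel = c1
```

Whatever the types, the wiring is always the same: declare the aliases, configure each component, then bind the source and sink to a channel.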

Let's start with the source configuration. First of all, you have to define the type of the source using <agent-name>.<sources/sinks/channels>.<alias-name>.type. The next parameter is positionFile, which is required to keep track of the position of each tailed file. filegroups indicates a set of files to be tailed, and filegroups.<filegroup-name> is the absolute path of the file to tail. The sink configuration is simple and straightforward: the Kafka sink requires the bootstrap servers and the topic name. Channel configuration offers many options, but here we use only the most important ones: capacity is the maximum number of events stored in the channel, and transactionCapacity is the maximum number of events the channel will take from a source or give to a sink per transaction.
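The file that positionFile points at is plain JSON: Flume records, for each tailed file, its inode and the byte offset read so far, which is how tailing resumes after an agent restart. A quick way to see this is to inspect the file; the sketch below uses sample data with hypothetical values rather than a real agent's output:

```shell
# Write sample positionFile contents (format per Flume's TAILDIR source:
# a JSON array with one entry per tailed file, recording the inode, the
# byte offset already read ("pos"), and the file path).
POS_FILE=/tmp/tail_dir_sample.json
cat > "$POS_FILE" <<'EOF'
[{"inode": 524290, "pos": 42, "file": "/home/ubuntu/demo/files/test"}]
EOF

# Inspect it; on a live agent you would cat the configured positionFile,
# e.g. /home/ubuntu/demo/flume/tail_dir.json.
cat "$POS_FILE"
```

Deleting this file resets the recorded offsets, causing the source to re-read the tailed files from the beginning.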

Now, start the Flume agent using the following command:

    bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console

The agent will start, and the output will be as follows:

Create a Kafka topic and name it flume-example:

bin/kafka-topics.sh --create --topic flume-example --zookeeper localhost:2181 --partitions 1 --replication-factor 1

Next, start the Kafka console consumer:

bin/kafka-console-consumer.sh --topic flume-example --bootstrap-server localhost:9092

Now, push some messages into the file /home/ubuntu/demo/files/test, as in the following screenshot:
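Appending lines to the tailed file can be done with a simple echo; the sketch below assumes $HOME is /home/ubuntu as in this chapter's setup, so the path matches the filegroups entry in the Flume config:

```shell
# Append test lines to the tailed file; the TAILDIR source picks up new
# lines and Flume forwards them to the flume-example Kafka topic.
DEMO_FILE="$HOME/demo/files/test"
mkdir -p "$(dirname "$DEMO_FILE")"
echo "hello from flume" >> "$DEMO_FILE"
echo "pushed to kafka via flume" >> "$DEMO_FILE"

# Show the lines we just appended
tail -n 2 "$DEMO_FILE"
```

Each appended line becomes one Flume event and should appear in the console consumer within a few seconds.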

The output from Kafka will be as seen in the following screenshot:
