
Flume

Flume is one of the best-known Apache projects for log collection and processing. Download the apache-flume-1.7.0-bin.tar.gz binary from https://flume.apache.org/download.html and extract it as follows:

cp apache-flume-1.7.0-bin.tar.gz ~/demo/
tar -xvf ~/demo/apache-flume-1.7.0-bin.tar.gz

The extracted folders and files will be as per the following screenshot:
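If you are reading without the screenshot, a quick listing shows the layout; the directory names below assume an unmodified 1.7.0 binary tarball extracted into the current directory:

ls apache-flume-1.7.0-bin
# bin  conf  docs  lib  tools  ...

The bin directory contains the flume-ng launcher, and conf contains the configuration templates we will edit next.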

We will demonstrate the same example that we executed with the previous tools: reading from a file and pushing the records to a Kafka topic. First, let's write the Flume configuration file:

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.positionFile = /home/ubuntu/demo/flume/tail_dir.json
    a1.sources.r1.filegroups = f1
    a1.sources.r1.filegroups.f1 = /home/ubuntu/demo/files/test
    
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = flume-example
    a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
    
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 6
    
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
  

Flume defines a flow using three components. The first is the source, which is where logs or events enter the flow. Flume provides multiple source types, such as kafka, TAILDIR, and HTTP, and you can also define your own custom source. The second component is the sink, which is the destination where the events will be consumed. The third is the channel, which defines the medium between the source and the sink; the most commonly used channels are memory, file, and Kafka, but there are many more. Here, we use TAILDIR as the source, Kafka as the sink, and memory as the channel. In the preceding configuration, a1 is the agent name, r1 is the source, k1 is the sink, and c1 is the channel.

Let's start with the source configuration. First of all, you have to define the type of the source using <agent-name>.<sources/sinks/channels>.<alias-name>.type. The next parameter is positionFile, which is required to keep track of the position of each tailed file. filegroups indicates a set of files to be tailed, and filegroups.<filegroup-name> is the absolute path of the file to tail. The sink configuration is simple and straightforward: the Kafka sink requires the bootstrap servers and a topic name. The channel configuration supports many parameters, but here we use only the most important ones: capacity is the maximum number of events stored in the channel, and transactionCapacity is the maximum number of events the channel will take from a source or give to a sink per transaction.
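A memory channel is fast, but it loses buffered events if the agent dies. If you need durability across restarts, Flume's file channel is a drop-in replacement; here is a minimal sketch, where the checkpoint and data directory paths are assumptions for this setup:

    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /home/ubuntu/demo/flume/checkpoint
    a1.channels.c1.dataDirs = /home/ubuntu/demo/flume/data

The source and sink sections stay unchanged, since they refer to the channel only by its alias c1.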

Save the preceding configuration as conf/flume-conf.properties inside the Flume directory, then start the Flume agent using the following command:

    bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console

The agent will start, and the output will be as follows:

Create a Kafka topic and name it flume-example:

bin/kafka-topics.sh --create --topic flume-example --zookeeper localhost:2181 --partitions 1 --replication-factor 1
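Before wiring Flume to the topic, you can confirm that it was created by describing it against the same ZooKeeper address:

bin/kafka-topics.sh --describe --topic flume-example --zookeeper localhost:2181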

Next, start the Kafka console consumer:

bin/kafka-console-consumer.sh --topic flume-example --bootstrap-server localhost:9092

Now, push some messages into the file /home/ubuntu/demo/files/test, as in the following screenshot:
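If you are following along without the screenshot, appending lines with echo is enough for the TAILDIR source to pick them up; the message text here is just an example:

echo "hello from flume" >> /home/ubuntu/demo/files/test
echo "another test message" >> /home/ubuntu/demo/files/test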

The output from Kafka will be as seen in the following screenshot:
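Since the console consumer prints each event body on its own line, the two example messages appended above would show up as:

hello from flume
another test message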
