
Flume

Flume is one of the best-known Apache projects for log collection and processing. To download it, refer to the following link: https://flume.apache.org/download.html. Download the apache-flume-1.7.0-bin.tar.gz setup file and unzip it, as follows:

    cp apache-flume-1.7.0-bin.tar.gz ~/demo/
    tar -xvf ~/demo/apache-flume-1.7.0-bin.tar.gz -C ~/demo/

The extracted folders and files will be as per the following screenshot:
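You can also list the extracted directory to see the same layout; the exact top-level entries (bin, conf, docs, lib, and so on) may vary slightly between releases:

    ls ~/demo/apache-flume-1.7.0-bin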

We will demonstrate the same example that we executed for the previous tools: reading from a file and pushing to a Kafka topic. First, let's write the Flume configuration file:

    # Name the components of agent a1
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Source: tail the given files and track read positions across restarts
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.positionFile = /home/ubuntu/demo/flume/tail_dir.json
    a1.sources.r1.filegroups = f1
    a1.sources.r1.filegroups.f1 = /home/ubuntu/demo/files/test

    # Sink: publish events to a Kafka topic
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.kafka.topic = flume-example
    a1.sinks.k1.kafka.bootstrap.servers = localhost:9092

    # Channel: buffer events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 6

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

Flume has three components that define a flow. The first is the source, which is where the logs or events come from. Flume provides multiple source types out of the box, such as Kafka, TAILDIR, and HTTP, and you can also define your own custom source. The second component is the sink, the destination where events are delivered and consumed. The third is the channel, which defines the medium between the source and the sink; the most commonly used channels are Memory, File, and Kafka, but there are many more. Here, we use TAILDIR as the source, Kafka as the sink, and Memory as the channel. In the preceding configuration, a1 is the agent name, r1 is the source, k1 is the sink, and c1 is the channel.
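Before starting the agent, make sure the paths referenced in the configuration exist; the following commands simply create the directories and the empty file used in this example:

    mkdir -p /home/ubuntu/demo/flume /home/ubuntu/demo/files
    touch /home/ubuntu/demo/files/test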

Let's start with the source configuration. First of all, you define the type of a component using <agent-name>.<sources/sinks/channels>.<alias-name>.type. The next parameter is positionFile, which is required to keep track of how far each file has been tailed. filegroups indicates a set of file groups to be tailed, and filegroups.<filegroup-name> is the absolute path of the files in that group. The sink configuration is simple and straightforward: the Kafka sink requires the bootstrap servers and a topic name. The channel configuration offers many parameters, but here we use only the most important ones: capacity is the maximum number of events stored in the channel, and transactionCapacity is the maximum number of events the channel will take from a source or give to a sink per transaction.
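If events must survive an agent restart, the memory channel can be swapped for a file channel. A minimal sketch, assuming the checkpoint and data directories shown here (they are illustrative, not part of the example above):

    # Durable file channel; the two directories below are illustrative
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /home/ubuntu/demo/flume/checkpoint
    a1.channels.c1.dataDirs = /home/ubuntu/demo/flume/data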

Save the configuration as conf/flume-conf.properties inside the Flume directory, then start the Flume agent using the following command:

    bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console

The agent will start, and the output will be as follows:

Create a Kafka topic and name it flume-example:

    bin/kafka-topics.sh --create --topic flume-example --zookeeper localhost:2181 --partitions 1 --replication-factor 1
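You can confirm that the topic was created using the --describe option:

    bin/kafka-topics.sh --describe --topic flume-example --zookeeper localhost:2181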

Next, start the Kafka console consumer:

    bin/kafka-console-consumer.sh --topic flume-example --bootstrap-server localhost:9092

Now, push some messages into the file /home/ubuntu/demo/files/test, as in the following screenshot:
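For example, you can append a couple of test lines to the tailed file (the message text here is arbitrary):

    echo "hello from flume" >> /home/ubuntu/demo/files/test
    echo "another event" >> /home/ubuntu/demo/files/test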

The output from Kafka will be as seen in the following screenshot:
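You can also confirm that the TAILDIR source recorded its progress by inspecting the position file; the inode and pos values in this sample are only illustrative:

    cat /home/ubuntu/demo/flume/tail_dir.json
    [{"inode":272018,"pos":42,"file":"/home/ubuntu/demo/files/test"}]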
