Flume Installation and Configuration (Part 2)


Note: environment: skylin-linux

Downloading Flume:

wget http://www.apache.org/dyn/closer.lua/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz

After the download finishes, unpack it with tar:

tar -zxvf apache-flume-1.6.0-bin.tar.gz

Go into Flume's conf directory, create the config file with touch flume.conf, then cp flume-conf.properties.template flume.conf

Use vim/gedit to edit flume.conf. Note that the Flume conf file uses the key-value format of Java properties files.
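Since a Flume conf file is just Java-properties key = value pairs, a short standalone Python sketch (the sample lines below are made up for illustration, not from a real deployment) shows how such a file reduces to a flat key-value map:

```python
# Minimal illustration: a Flume conf is plain "key = value" lines.
conf_text = """
# an example agent definition
agent_name.sources = source_name
agent_name.channels = channel_name
agent_name.sinks = sink_name
"""

def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

props = parse_properties(conf_text)
print(props["agent_name.sources"])  # -> source_name
```

Everything Flume needs to wire an agent together is expressed through these dotted keys, which is why the naming rules below matter.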

In a Flume configuration file, we need to:

1. Name the Agent being used.

2. Name the source(s) under the Agent.

3. Name the channel(s) under the Agent.

4. Name the sink(s) under the Agent.

5. Bind the sources and sinks together through the channels.

Generally there will be multiple Agents in Flume, so we give each one a name to tell them apart. The names must not collide; keep every name unique!

For example:

# Name the Agent agent_name
# Name the source source_name, and so on
agent_name.sources = source_name
agent_name.channels = channel_name
agent_name.sinks = sink_name

The snippet above is the single-Agent, single-source, single-channel, single-sink case, shown in the figure below.

[Figure 1: a single agent with one source, one channel and one sink]

If we need to configure n sinks and m channels (n > 1, m > 1) on one Agent, we only need to configure it like this:

# Name the Agent agent_name
# Name the source source_name, and so on
agent_name.sources = source_name, source_name1
agent_name.channels = channel_name, channel_name1
agent_name.sinks = sink_name, sink_name1

The configuration above describes an Agent with two sources, two channels and two sinks, as shown in the figure:

[Figure 2: one agent with two sources, two channels and two sinks]

That covers the multi-source, multi-channel, multi-sink case; for multiple Agents, just give each Agent a unique name!
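For instance (a sketch, with invented agent and component names), two independent agents can share one configuration file as long as their names differ:

```properties
# Two independent agents in one file, distinguished only by name
agent_one.sources = src1
agent_one.channels = ch1
agent_one.sinks = sink1

agent_two.sources = src2
agent_two.channels = ch2
agent_two.sinks = sink2
```

When starting Flume you then choose which agent to run by passing its name on the command line.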

Flume supports a wide variety of sources, sinks and channels; the supported types are listed below:

Sources:

  • Avro Source
  • Thrift Source
  • Exec Source
  • JMS Source
  • Spooling Directory Source
  • Twitter 1% firehose Source
  • Kafka Source
  • NetCat Source
  • Sequence Generator Source
  • Syslog Sources
  • Syslog TCP Source
  • Multiport Syslog TCP Source
  • Syslog UDP Source
  • HTTP Source
  • Stress Source
  • Legacy Sources
  • Thrift Legacy Source
  • Custom Source
  • Scribe Source

Channels:

  • Memory Channel
  • JDBC Channel
  • Kafka Channel
  • File Channel
  • Spillable Memory Channel
  • Pseudo Transaction Channel

Sinks:

  • HDFS Sink
  • Hive Sink
  • Logger Sink
  • Avro Sink
  • Thrift Sink
  • IRC Sink
  • File Roll Sink
  • Null Sink
  • HBaseSink
  • AsyncHBaseSink
  • MorphlineSolrSink
  • ElasticSearchSink
  • Kite Dataset Sink
  • Kafka Sink

You can combine these types however your needs dictate. For example, if we use an Avro source, a memory channel and an HDFS sink for storage, we can extend the earlier configuration like this:

# Name the Agent agent_name
# Name the source source_name, and so on
agent_name.sources = Avro
agent_name.channels = MemoryChannel
agent_name.sinks = HDFS

Once you have named the Agent's components, you still need to describe each of its sources, sinks and channels one by one. Let's go through them in turn.

Source Configuration

Note: each of the N (N > 1) sources in an Agent must be configured individually. First set the source's type, then set the properties that go with that type. The general pattern is:

agent_name.sources.source_name.type = value
agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value

As a concrete example, suppose our source uses the Avro type:

# Name the Agent agent_name
# Name the source source_name, and so on
agent_name.sources = Avro
agent_name.channels = MemoryChannel
agent_name.sinks = HDFS
#—————————— source configuration ——————————#
agent_name.sources.Avro.type = avro
agent_name.sources.Avro.bind = localhost
agent_name.sources.Avro.port = 9696
# Bind the source to the MemoryChannel channel
agent_name.sources.Avro.channels = MemoryChannel

Channel Configuration

Flume provides various channels to carry data between sources and sinks, so like sources they need properties configured. For N (N > 0) channels, each must have its properties set individually. Their general template is:

agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value

As a concrete example, if we choose the memory channel type, we first configure the channel's type:

agent_name.channels.MemoryChannel.type = memory

So far we have only set the channel's own properties; we still need to link it to the source and sink, i.e. bind them. The bindings can be written at the source and sink definitions respectively, or collected in one place. Note that a source binds with the plural channels key, while a sink uses the singular channel key:

agent_name.sources.Avro.channels = MemoryChannel
agent_name.sinks.HDFS.channel = MemoryChannel
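The binding rule — every channel a source or sink names must be a declared channel — can be sanity-checked with a small standalone Python helper (this is an illustrative sketch, not part of Flume itself):

```python
# Sanity-check that source/sink channel bindings refer to declared channels.
def check_bindings(props, agent):
    """Return a list of binding errors for one agent in a parsed config dict."""
    declared = set(props.get(agent + ".channels", "").replace(",", " ").split())
    errors = []
    for key, value in props.items():
        # Sources bind via the plural ".channels" key and may list several.
        if key.startswith(agent + ".sources.") and key.endswith(".channels"):
            for ch in value.replace(",", " ").split():
                if ch not in declared:
                    errors.append("%s -> undeclared channel %s" % (key, ch))
        # Sinks bind via the singular ".channel" key and take exactly one.
        if key.startswith(agent + ".sinks.") and key.endswith(".channel"):
            if value.strip() not in declared:
                errors.append("%s -> undeclared channel %s" % (key, value))
    return errors

props = {
    "agent_name.channels": "MemoryChannel",
    "agent_name.sources.Avro.channels": "MemoryChannel",
    "agent_name.sinks.HDFS.channel": "MemoryChannel",
}
print(check_bindings(props, "agent_name"))  # -> []
```

A typo such as binding a sink to an undeclared FileChannel would show up as one error in the returned list.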

Sink Configuration

Sink configuration is similar to source configuration; its general format is:

agent_name.sinks.sink_name.type = value
agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value

As a concrete example, if we set the sink type to HDFS, the configuration is as follows:

agent_name.sinks.HDFS.type = hdfs
agent_name.sinks.HDFS.hdfs.path = <your HDFS path>
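The stock HDFS sink expects its own settings under an hdfs. prefix; a slightly fuller sketch (the path and roll values below are illustrative placeholders, not from the original post) might look like:

```properties
# Fuller HDFS sink sketch; path and roll settings are illustrative
agent_name.sinks.HDFS.type = hdfs
agent_name.sinks.HDFS.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent_name.sinks.HDFS.hdfs.fileType = DataStream
agent_name.sinks.HDFS.hdfs.rollInterval = 30
agent_name.sinks.HDFS.hdfs.rollCount = 10000
agent_name.sinks.HDFS.channel = MemoryChannel
```

The roll settings control how often the sink closes the current HDFS file and opens a new one.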

That completes the detailed tour of the Flume configuration file. To round things off, here is a complete example configuration:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#  http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'agent'
#define agent
agent.sources =seqGenSrc
agent.channels =memoryChannel
agent.sinks =loggerSink kafkaSink
#
# For each one of the sources, the type is defined
# default types: agent.sources.seqGenSrc.type = seq / netcat / avro
agent.sources.seqGenSrc.type =avro
agent.sources.seqGenSrc.bind =localhost
agent.sources.seqGenSrc.port = 9696
##### data source ####
#agent.sources.seqGenSrc.command = tail -F /home/gongxijun/Qunar/data/data.log
# The channel can be defined as follows.
agent.sources.seqGenSrc.channels =memoryChannel
#+++++++++++++++ define sink 1 (HBase) +++++++++++++++#
# Each sink's type must be defined

#agent.sinks.loggerSink.type = logger
agent.sinks.loggerSink.type = hbase
agent.sinks.loggerSink.channel =memoryChannel
# table name
agent.sinks.loggerSink.table =flume
# column family
agent.sinks.loggerSink.columnFamily=gxjun
agent.sinks.loggerSink.serializer =org.apache.flume.sink.hbase.MyHbaseEventSerializer 
#agent.sinks.loggerSink.serializer  =org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.loggerSink.zookeeperQuorum=localhost:2181
agent.sinks.loggerSink.znodeParent= /hbase
#Specify the channel the sink should use
agent.sinks.loggerSink.channel =memoryChannel 
# Each channel's type is defined.
#memory
agent.channels.memoryChannel.type =memory
agent.channels.memoryChannel.keep-alive = 10
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
#agent.channels.memoryChannel.checkpointDir = /home/gongxijun/Qunar/data
#agent.channels.memoryChannel.dataDirs = /home/gongxijun/Qunar/data , /home/gongxijun/Qunar/tmpData
agent.channels.memoryChannel.capacity = 10000000
agent.channels.memoryChannel.transactionCapacity = 10000
# define sink 2: kafka
#+++++++++++++++ define sink 2 (Kafka) +++++++++++++++#
# Each sink's type must be defined

#agent.sinks.kafkaSink.type = logger
agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.channel =memoryChannel
#agent.sinks.kafkaSink.server=localhost:9092
agent.sinks.kafkaSink.topic= kafka-topic
agent.sinks.kafkaSink.batchSize = 20
agent.sinks.kafkaSink.brokerList = localhost:9092
#Specify the channel the sink should use
agent.sinks.kafkaSink.channel = memoryChannel 
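Assuming the file above is saved as conf/flume.conf inside the Flume install directory, the agent (named agent in this file) can be started with the flume-ng launcher, and a test file can then be pushed into the Avro source with the bundled avro-client (the file paths here are illustrative):

```shell
# Start the agent defined above (its name in the config file is 'agent')
bin/flume-ng agent --conf conf --conf-file conf/flume.conf \
    --name agent -Dflume.root.logger=INFO,console

# In another terminal: send a test file to the Avro source on port 9696
bin/flume-ng avro-client --conf conf -H localhost -p 9696 -F /tmp/test.log
```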

This configuration corresponds to the topology shown below:

[Figure 3: topology of the complete example configuration]

References:

http://www.tutorialspoint.com/apache_flume/apache_flume_configuration.htm

Author: 龚细军

Please credit the source when citing: http://www.cnblogs.com/gongxijun/p/5661037.html
