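!!!1.Memory Channel
Events are stored in an in-memory queue with a configurable maximum size.
It is well suited to flows that need high throughput and can tolerate losing the queued events if the agent fails.
Properties:
!type – The component type, must be “memory”
capacity 100 Maximum number of events stored in the channel
transactionCapacity 100 Maximum number of events per transaction
keep-alive 3 Timeout in seconds for adding or removing an event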
byteCapacityBufferPercentage 20 Defines the percent of buffer between byteCapacity and the estimated total size of all events in the channel, to account for data in headers. See below.
byteCapacity see description Maximum total bytes of memory allowed as a sum of all events in this channel. The implementation only counts the Event body, which is the reason for providing the byteCapacityBufferPercentage configuration parameter as well. Defaults to a computed value equal to 80% of the maximum memory available to the JVM (i.e. 80% of the -Xmx value passed on the command line). Note that if you have multiple memory channels on a single JVM, and they happen to hold the same physical events (i.e. if you are using a replicating channel selector from a single source) then those event sizes may be double-counted for channel byteCapacity purposes. Setting this value to 0 will cause this value to fall back to a hard internal limit of about 200 GB.
Example: see the getting-started example.
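The properties above can be sketched as a minimal agent configuration fragment. The agent name a1 and channel name c1 are placeholders, and the numeric values are only illustrative, not recommendations.

```properties
# Hypothetical agent "a1" with one memory channel "c1"
a1.channels = c1
a1.channels.c1.type = memory
# Hold up to 10000 events; transactionCapacity must not exceed capacity
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
# Seconds to wait when adding or removing an event
a1.channels.c1.keep-alive = 3
# Reserve 20% of byteCapacity for header data; cap event bodies at about 800 KB
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000
```

If byteCapacity is left unset, the default described above (80% of the JVM maximum heap) applies.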
2.JDBC Channel
Events are persisted in a reliable database. Currently the embedded Derby database is supported. Use this channel when recoverability is very important.
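A minimal sketch of selecting this channel, assuming a placeholder agent a1 and channel c1 and leaving every other setting at its default:

```properties
# Hypothetical agent "a1" with one JDBC channel "c1" (embedded Derby by default)
a1.channels = c1
a1.channels.c1.type = jdbc
```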
!!!3.File Channel
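Throughput is comparatively low, but events are not lost even if the program fails.
Properties:
!type – The component type, must be “file”
checkpointDir ~/.flume/file-channel/checkpoint The directory where the checkpoint file is stored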
useDualCheckpoints false Backup the checkpoint. If this is set to true, backupCheckpointDir must be set
backupCheckpointDir – The directory where the checkpoint is backed up to. This directory must not be the same as the data directories or the checkpoint directory
dataDirs ~/.flume/file-channel/data Comma-separated list of directories for storing log files. Using multiple directories on separate disks can improve file channel performance.
transactionCapacity 10000 The maximum size of transaction supported by the channel
checkpointInterval 30000 Amount of time (in millis) between checkpoints
maxFileSize 2146435071 Maximum size (in bytes) of a single log file
minimumRequiredSpace 524288000 Minimum Required free space (in bytes). To avoid data corruption, File Channel stops accepting take/put requests when free space drops below this value
capacity 1000000 Maximum capacity of the channel
keep-alive 3 Amount of time (in sec) to wait for a put operation
use-log-replay-v1 false Expert: Use old replay logic
use-fast-replay false Expert: Replay without using queue
checkpointOnClose true Controls if a checkpoint is created when the channel is closed. Creating a checkpoint on close speeds up subsequent startup of the file channel by avoiding replay.
encryption.activeKey – Key name used to encrypt new data
encryption.cipherProvider – Cipher provider type, supported types: AESCTRNOPADDING
encryption.keyProvider – Key provider type, supported types: JCEKSFILE
encryption.keyProvider.keyStoreFile – Path to the keystore file
encryption.keyProvider.keyStorePasswordFile – Path to the keystore password file
encryption.keyProvider.keys – List of all keys (e.g. history of the activeKey setting)
encryption.keyProvider.keys.*.passwordFile – Path to the optional key password file
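As a rough sketch of how the core properties fit together, the fragment below configures a file channel with data directories on two separate disks. The agent name a1, channel name c1, and all paths are placeholders chosen for illustration.

```properties
# Hypothetical agent "a1" with one file channel "c1"
a1.channels = c1
a1.channels.c1.type = file
# Checkpoint directory (placeholder path)
a1.channels.c1.checkpointDir = /mnt/disk1/flume/checkpoint
# Two data directories on separate disks (placeholder paths)
a1.channels.c1.dataDirs = /mnt/disk1/flume/data,/mnt/disk2/flume/data
# Optional: keep a backup checkpoint; the backup directory must differ
# from both the checkpoint directory and the data directories
a1.channels.c1.useDualCheckpoints = true
a1.channels.c1.backupCheckpointDir = /mnt/disk2/flume/checkpoint-backup
```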
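!!!4.Spillable Memory Channel
Events are stored in an in-memory queue and on disk.
The in-memory queue is the primary store, and the disk holds whatever overflows from it.
The disk store is managed by an embedded File Channel.
When the in-memory queue is full, subsequent events are stored in the file channel.
This channel suits flows that want memory channel throughput during normal operation and file channel resilience during peaks. It trades throughput for fault tolerance when it spills to disk.
If the agent crashes, only the events stored on the file system can be recovered.
This channel is experimental and is not recommended for production use.
Properties:
!type – The component type, must be "SPILLABLEMEMORY"
memoryCapacity 10000 Maximum number of events stored in the memory queue. Set this to 0 to disable the memory queue.
overflowCapacity 100000000 Maximum number of events stored on disk. Set this to 0 to disable disk overflow.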
overflowTimeout 3 The number of seconds to wait before enabling disk overflow when memory fills up.
byteCapacityBufferPercentage 20 Defines the percent of buffer between byteCapacity and the estimated total size of all events in the channel, to account for data in headers. See below.
byteCapacity see description Maximum bytes of memory allowed as a sum of all events in the memory queue. The implementation only counts the Event body, which is the reason for providing the byteCapacityBufferPercentage configuration parameter as well. Defaults to a computed value equal to 80% of the maximum memory available to the JVM (i.e. 80% of the -Xmx value passed on the command line). Note that if you have multiple memory channels on a single JVM, and they happen to hold the same physical events (i.e. if you are using a replicating channel selector from a single source) then those event sizes may be double-counted for channel byteCapacity purposes. Setting this value to 0 will cause this value to fall back to a hard internal limit of about 200 GB.
avgEventSize 500 Estimated average size of events, in bytes, going into the channel
<file channel properties> see file channel Any file channel property with the exception of ‘keep-alive’ and ‘capacity’ can be used. The keep-alive of file channel is managed by Spillable Memory Channel. Use ‘overflowCapacity’ to set the File channel’s capacity.
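A minimal sketch of a spillable memory channel that combines its own properties with file channel properties for the overflow store. The agent name a1, channel name c1, paths, and sizes are placeholders for illustration only.

```properties
# Hypothetical agent "a1" with one spillable memory channel "c1"
a1.channels = c1
a1.channels.c1.type = SPILLABLEMEMORY
# Up to 10000 events in memory, then spill up to 1000000 events to disk
a1.channels.c1.memoryCapacity = 10000
a1.channels.c1.overflowCapacity = 1000000
# Cap the total size of event bodies held in the memory queue
a1.channels.c1.byteCapacity = 800000
# File channel properties for the embedded overflow store (placeholder paths)
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data
```

Setting overflowCapacity to 0 would make this behave like a plain in-memory channel, and setting memoryCapacity to 0 would make it behave like a file channel.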
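5.Custom Channel
A custom channel is your own implementation of the Channel interface.
The custom channel class and its dependencies must be placed on the agent's classpath before Flume starts.
Properties:
type - The fully qualified class name of your Channel implementation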
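For illustration, a custom channel would be wired in as below. The class name org.example.MyChannel is hypothetical and stands in for your own implementation. The agent name a1 and channel name c1 are placeholders as well.

```properties
# Hypothetical agent "a1" using a user-supplied channel implementation
a1.channels = c1
# Fully qualified class name of the custom Channel (hypothetical class)
a1.channels.c1.type = org.example.MyChannel
```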
----------------------------------------------
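Exercises
Have Flume collect logs from multiple netcat sources.
Use interceptors to add host information to each event.
Use a selector to implement multiplexing.
Fan out by IP address so that events from different IPs go to different channels and sinks.
Store the events in the local file system, in HDFS, and in the logger.
See the figure for the overall structure.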