cognitree / flume-elasticsearch-sink

License: Apache-2.0
Flume sink plugin for Elasticsearch

Programming Languages

Java

Projects that are alternatives of or similar to flume-elasticsearch-sink

Real-time-log-analysis-system
🐧 A real-time log processing and analysis system based on Spark Streaming + Flume + Kafka + HBase (available as a console version and as a Web UI visualization version built on Spring Boot, ECharts, etc.)
Stars: ✭ 31 (-20.51%)
Mutual labels:  flume
Flume
Mirror of Apache Flume
Stars: ✭ 2,200 (+5541.03%)
Mutual labels:  flume
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+28082.05%)
Mutual labels:  flume
God Of Bigdata
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/HBase/Hive...
Stars: ✭ 6,008 (+15305.13%)
Mutual labels:  flume
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-64.1%)
Mutual labels:  flume
BigData-News
A real-time news big data system based on Spark 2.2
Stars: ✭ 36 (-7.69%)
Mutual labels:  flume
litemall-dw
A big data project based on the open-source Litemall e-commerce project, covering front-end event tracking (OpenResty + Lua) and back-end tracking, a five-layer data warehouse, real-time computation, and user profiling. The big data platform uses CDH 6.3.2 (scripted with Vagrant + Ansible) and also includes Azkaban workflows.
Stars: ✭ 36 (-7.69%)
Mutual labels:  flume
TitanDataOperationSystem
The best big data project: the Titan Data Operation System, a full-stack closed-loop project. It includes a web system for data visualization, log ingestion with flume-kafka-flume, a data warehouse designed in Hive, Spark code for transformations between warehouse tables and for migrating ADS-layer tables to MySQL, and Azkaban for job scheduling. Technologies: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, Spring Boot, Bootstrap, ECharts, etc.
Stars: ✭ 62 (+58.97%)
Mutual labels:  flume
cloud
Environment setup and configuration files for Hadoop, Hive, Hue, Oozie, Sqoop, HBase, and ZooKeeper
Stars: ✭ 48 (+23.08%)
Mutual labels:  flume
BookRecommenderSystem
A big-data-based book recommender system
Stars: ✭ 30 (-23.08%)
Mutual labels:  flume
web-click-flow
Offline analysis of website clickstream logs
Stars: ✭ 14 (-64.1%)
Mutual labels:  flume
aaocp
A big data project for analyzing user behavior logs
Stars: ✭ 53 (+35.9%)
Mutual labels:  flume
logwatch
A log collection tool
Stars: ✭ 22 (-43.59%)
Mutual labels:  flume

Elasticsearch Sink

The sink reads events from a channel, serializes them into JSON documents, and hands them to a bulk processor, which batches the writes to Elasticsearch according to the configuration.
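The bulk processor sends a request when any configured threshold is reached: number of actions, batch size in bytes, or flush interval (the es.bulkActions, es.bulkSize, and es.flush.interval.time properties described below). As a toy sketch of just the count-based trigger, illustrative only and not the Elasticsearch BulkProcessor API:

```java
import java.util.ArrayList;
import java.util.List;

public class BulkBatcherSketch {
    private final int bulkActions;           // flush threshold, cf. es.bulkActions
    private final List<String> pending = new ArrayList<>();
    int flushes = 0;                         // how many bulk requests were "sent"

    BulkBatcherSketch(int bulkActions) { this.bulkActions = bulkActions; }

    // Add one serialized document; flush when the batch is full.
    void add(String doc) {
        pending.add(doc);
        if (pending.size() >= bulkActions) flush();
    }

    void flush() {
        if (pending.isEmpty()) return;
        // A real sink would issue one bulk request to Elasticsearch here.
        flushes++;
        pending.clear();
    }

    public static void main(String[] args) {
        BulkBatcherSketch b = new BulkBatcherSketch(3);
        for (int i = 0; i < 7; i++) b.add("doc" + i);
        // Two full batches flushed; one document stays pending until the
        // next size- or interval-based trigger.
        System.out.println("bulk requests sent: " + b.flushes);
    }
}
```

In the real sink, the size and interval triggers run alongside the count trigger, so a slow trickle of events is still flushed periodically.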

The Elasticsearch index and type for each event can either be defined statically in the configuration file or derived dynamically using a custom IndexBuilder.
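For example, a custom builder might route each event by a header value with a date suffix. The helper below is a hypothetical illustration of such logic; the real extension point is the com.cognitree.flume.sink.elasticsearch.IndexBuilder interface, and this method is not its actual signature:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

public class IndexNameExample {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    // Derive an index name from an event's headers: use the "index" header if
    // present, otherwise fall back to a date-suffixed default such as
    // "flume-2024.01.31". Header name and fallback are assumptions for this sketch.
    static String deriveIndex(Map<String, String> headers, long timestampMillis) {
        String base = headers.getOrDefault("index", "flume");
        return base + "-" + DAY.format(Instant.ofEpochMilli(timestampMillis));
    }

    public static void main(String[] args) {
        System.out.println(deriveIndex(Map.of("index", "applogs"), 0L));
    }
}
```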

By default, events are assumed to be in JSON format. This assumption can be overridden by implementing the Serializer interface.
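For non-JSON bodies, a serializer has to turn the raw event into typed document fields. As a rough, hypothetical sketch of what a CSV serializer does with a field spec like the es.serializer.csv.fields property described below (this is not the plugin's actual code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvSerializerSketch {
    // Parse a delimited event body into typed values using a field spec such as
    // "id:int,name:string,isemployee:boolean,leaves:float".
    static Map<String, Object> parse(String spec, String body, String delimiter) {
        String[] fields = spec.split(",");
        String[] values = body.split(java.util.regex.Pattern.quote(delimiter));
        Map<String, Object> doc = new LinkedHashMap<>();
        for (int i = 0; i < fields.length && i < values.length; i++) {
            String[] nameType = fields[i].split(":");
            String name = nameType[0], type = nameType[1], raw = values[i];
            switch (type) {
                case "int":     doc.put(name, Integer.parseInt(raw)); break;
                case "float":   doc.put(name, Float.parseFloat(raw)); break;
                case "boolean": doc.put(name, Boolean.parseBoolean(raw)); break;
                default:        doc.put(name, raw); // "string"
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(parse("id:int,name:string,isemployee:boolean,leaves:float",
                "1,john,true,2.5", ","));
    }
}
```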

Follow these steps to use this sink in Apache Flume:

  • Build the plugin. The following command creates the zip file inside the target directory:

mvn clean assembly:assembly

  • Extract the zip file into the plugin.d folder of the Flume installation directory.

  • Configure the sink in the Flume configuration file with the properties described below.

Required properties are in bold.

| Property Name | Default | Description |
|---|---|---|
| **channel** | - | The channel the sink reads events from |
| **type** | - | The component type name; has to be com.cognitree.flume.sink.elasticsearch.ElasticSearchSink |
| es.cluster.name | elasticsearch | Name of the Elasticsearch cluster to connect to |
| es.client.hosts | - | Comma-separated hostname:port pairs, e.g. host1:9300,host2:9300. The default port is 9300 |
| es.bulkActions | 1000 | The number of actions to batch into a request |
| es.bulkProcessor.name | flume | Name of the bulk processor |
| es.bulkSize | 5 | Flush the bulk request once it reaches this size |
| es.bulkSize.unit | MB | Bulk request size unit; supported values are KB and MB |
| es.concurrent.request | 1 | The maximum number of concurrent requests to allow while accumulating new bulk requests |
| es.flush.interval.time | 10s | Flush a batch as a bulk request after this interval, irrespective of the number of requests |
| es.backoff.policy.time.interval | 50M | Backoff policy time interval; wait initially for 50 milliseconds |
| es.backoff.policy.retries | 8 | Number of backoff policy retries |
| es.index | default | Index name used to store the documents |
| es.type | default | Type used to store the documents |
| es.index.builder | com.cognitree.flume.sink.elasticsearch.StaticIndexBuilder | Implementation of the com.cognitree.flume.sink.elasticsearch.IndexBuilder interface |
| es.serializer | com.cognitree.flume.sink.elasticsearch.SimpleSerializer | Implementation of the com.cognitree.flume.sink.elasticsearch.Serializer interface |
| es.serializer.csv.fields | - | Comma-separated field names with data types, e.g. column1:type1,column2:type2. Supported data types are string, boolean, int and float |
| es.serializer.csv.delimiter | , (comma) | Delimiter for the data in the Flume event body |
| es.serializer.avro.schema.file | - | Absolute path to the schema configuration file |
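For reference, a minimal configuration setting only the required properties and the cluster connection, with everything else left at its default, might look like this (channel name, cluster name, and host are placeholders):

  agent.channels = es_channel
  agent.channels.es_channel.type = memory
  agent.sinks = es_sink
  agent.sinks.es_sink.type = com.cognitree.flume.sink.elasticsearch.ElasticSearchSink
  agent.sinks.es_sink.channel = es_channel
  agent.sinks.es_sink.es.cluster.name = es-cluster
  agent.sinks.es_sink.es.client.hosts = 127.0.0.1:9300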

Example configuration for an agent named agent:

  agent.channels = es_channel
  agent.sinks = es_sink
  agent.sinks.es_sink.type=com.cognitree.flume.sink.elasticsearch.ElasticSearchSink
  agent.sinks.es_sink.channel=es_channel
  agent.sinks.es_sink.es.bulkActions=5
  agent.sinks.es_sink.es.bulkProcessor.name=bulkprocessor
  agent.sinks.es_sink.es.bulkSize=5
  agent.sinks.es_sink.es.bulkSize.unit=MB
  agent.sinks.es_sink.es.concurrent.request=1
  agent.sinks.es_sink.es.flush.interval.time=5m
  agent.sinks.es_sink.es.backoff.policy.time.interval=50M
  agent.sinks.es_sink.es.backoff.policy.retries=8
  agent.sinks.es_sink.es.cluster.name=es-cluster
  agent.sinks.es_sink.es.client.hosts=127.0.0.1:9300
  agent.sinks.es_sink.es.index=defaultindex
  agent.sinks.es_sink.es.index.builder=com.cognitree.flume.sink.elasticsearch.HeaderBasedIndexBuilder
  agent.sinks.es_sink.es.serializer=com.cognitree.flume.sink.elasticsearch.SimpleSerializer
  agent.sinks.es_sink.es.serializer.csv.fields=id:int,name:string,isemployee:boolean,leaves:float
  agent.sinks.es_sink.es.serializer.csv.delimiter=,
  agent.sinks.es_sink.es.serializer.avro.schema.file=/usr/local/schema.avsc
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].