
Hurence / Logisland

Licence: other
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks are available.

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Logisland

Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+122.68%)
Mutual labels:  kafka, spark, big-data, elasticsearch, cassandra
Flink Learning
Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink fundamentals, concepts, principles, hands-on practice, performance tuning, source-code analysis and more. Includes study cases for Flink Connectors, Metrics, Libraries, the DataStream API and Table API & SQL, as well as large real-world Flink project cases (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). You are welcome to support the author's column "Big Data Real-Time Computing Engine Flink in Practice and Performance Optimization".
Stars: ✭ 11,378 (+11629.9%)
Mutual labels:  kafka, spark, stream-processing, elasticsearch, influxdb
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+930.93%)
Mutual labels:  kafka, solr, elasticsearch, cassandra
Kafka Streams
equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (+531.96%)
Mutual labels:  kafka, big-data, stream-processing, kafka-streams
Stream Reactor
Streaming reference architecture for ETL with Kafka and Kafka-Connect. You can find more on http://lenses.io on how we provide a unified solution to manage your connectors, most advanced SQL engine for Kafka and Kafka Streams, cluster monitoring and alerting, and more.
Stars: ✭ 753 (+676.29%)
Mutual labels:  kafka, elasticsearch, influxdb, cassandra
Samsara
Samsara is a real-time analytics platform
Stars: ✭ 132 (+36.08%)
Mutual labels:  kafka, analytics, stream-processing, elasticsearch
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+154.64%)
Mutual labels:  kafka, spark, big-data, kafka-streams
Kafka Connect Ui
Web tool for Kafka Connect
Stars: ✭ 388 (+300%)
Mutual labels:  kafka, elasticsearch, influxdb, cassandra
Dockerfiles
50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+773.2%)
Mutual labels:  kafka, spark, solr, cassandra
Graylog Plugin Metrics Reporter
Graylog Metrics Reporter Plugins
Stars: ✭ 71 (-26.8%)
Mutual labels:  elasticsearch, influxdb, cassandra
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+534.02%)
Mutual labels:  kafka, stream-processing, kafka-streams
Faust
Python Stream Processing
Stars: ✭ 5,899 (+5981.44%)
Mutual labels:  kafka, stream-processing, kafka-streams
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+428.87%)
Mutual labels:  kafka, spark, analytics
Freestyle
A cohesive & pragmatic framework of FP centric Scala libraries
Stars: ✭ 627 (+546.39%)
Mutual labels:  kafka, spark, cassandra
Pdf
Programming e-books covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, JVM, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories.
Stars: ✭ 12,009 (+12280.41%)
Mutual labels:  spark, solr, elasticsearch
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+325.77%)
Mutual labels:  kafka, spark, analytics
Springbootexamples
Spring Boot learning tutorials
Stars: ✭ 794 (+718.56%)
Mutual labels:  kafka, solr, elasticsearch
Demo Scene
👾Scripts and samples to support Confluent Demos and Talks. ⚠️Might be rough around the edges ;-) 👉For automated tutorials and QA'd code, see https://github.com/confluentinc/examples/
Stars: ✭ 806 (+730.93%)
Mutual labels:  kafka, kafka-streams, elasticsearch
Szt Bigdata
Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟
Stars: ✭ 826 (+751.55%)
Mutual labels:  kafka, spark, elasticsearch
Kspp
A high performance/ real-time C++ Kafka streams framework (C++17)
Stars: ✭ 80 (-17.53%)
Mutual labels:  kafka, stream-processing, kafka-streams

Logisland

.. image:: https://travis-ci.org/Hurence/logisland.svg?branch=master
    :target: https://travis-ci.org/Hurence/logisland

.. image:: https://badges.gitter.im/Join%20Chat.svg
    :target: https://gitter.im/logisland/logisland?utm_source=share-link&utm_medium=link&utm_campaign=share-link
    :alt: Gitter

`Download the latest release build <https://github.com/Hurence/logisland/releases>`_ and chat with us on `gitter <https://gitter.im/logisland/logisland>`_.

LogIsland is a scalable event-mining platform designed to handle a high throughput of events.

It is heavily inspired by dataflow programming tools such as Apache NiFi, but with a highly scalable architecture.

LogIsland is completely open source and free even for commercial use. Hurence provides support if required.

Event mining Workflow

Here is an example of a typical event mining pipeline (a minimal command-line sketch of the Kafka side of this flow follows the list).

  1. Raw events (sensor data, logs, user click streams, ...) are sent to Kafka topics by a NiFi / Logstash / *Beats / Flume / Collectd (or any other) agent
  2. Raw events are structured into Logisland Records, processed, and eventually pushed back to another Kafka topic by a Logisland streaming job
  3. Records are sent to external short-lived storage (Elasticsearch, Solr, Couchbase, ...) for online analytics
  4. Records are sent to external long-lived storage (HBase, HDFS, ...) for offline analytics (aggregated reports or ML models)
  5. Logisland Processors handle Records to produce Alerts and Information from ML models
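
As a minimal command-line sketch of steps 1 to 3, assuming the logisland_raw / logisland_events topic names used in the tutorials and a Kafka broker reachable on localhost:9092, you could push a raw log line into the input topic and watch the structured Records appear on the output topic:

.. code-block:: sh

# from the Kafka installation directory: push one raw Apache log line into the input topic (step 1)
echo '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326' \
  | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic logisland_raw

# once a Logisland streaming job is running (step 2), read the structured Records it produced
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic logisland_events --from-beginning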

Online documentation

You can find the latest Logisland documentation, including a programming guide, on the `project web page <http://logisland.readthedocs.io/en/latest/index.html>`_. This README file only contains basic setup instructions.

Browse the `Java API documentation <http://logisland.readthedocs.io/en/latest/_static/apidocs/>`_ for more information.

You can follow a getting-started guide through the `apache log indexing tutorial <http://logisland.readthedocs.io/en/latest/tutorials/index-apache-logs.html>`_.

Building Logisland

To build from source, just clone the repository and package with Maven (Logisland requires Maven 3.5.2 or later):

.. code-block:: sh

git clone https://github.com/Hurence/logisland.git
cd logisland
mvn clean package

The final package is available at logisland-assembly/target/logisland-1.3.0-bin.tar.gz

You can also `download the latest release build <https://github.com/Hurence/logisland/releases>`_.

If you want to build with OpenCV support, install OpenCV first and then run:

.. code-block:: sh

mvn clean package -Dopencv

Quick start

Local Setup
+++++++++++

Alternatively you can deploy Logisland on any Linux server on which Kafka and Spark are available.

Replace all versions in the code below with the required versions (Spark version, Logisland version for your specific HDP version, Kafka Scala version and Kafka version, etc.).

The Kafka distributions are available at this address: https://kafka.apache.org/downloads

The last tested Scala version for Kafka is 2.11, with the preferred Kafka release being 0.10.2.2.

The last tested Spark version is 2.3.1, on Hadoop 2.7.

However, you should choose the Spark version that is compatible with your environment and Hadoop installation if you have one (for example, Spark 2.1.0 on Hadoop 2.7). Note that Hadoop 2.7 can run Spark 2.4.x, 2.3.x, 2.2.x and 2.1.x. Check what is available at this URL: http://d3kbcqa49mib13.cloudfront.net/

.. code-block:: sh

# install Kafka & start a zookeeper node + a broker
curl -s https://www-us.apache.org/dist/kafka/<kafka_release>/kafka_<scala_version>-<kafka_version>.tgz | tar -xz -C /usr/local/
cd /usr/local/kafka_<scala_version>-<kafka_version>
nohup bin/zookeeper-server-start.sh config/zookeeper.properties > zookeeper.log 2>&1 &
JMX_PORT=10101 nohup bin/kafka-server-start.sh config/server.properties > kafka.log 2>&1 &

# install Spark (choose the spark version compatible with your hadoop distrib if you have one)
curl -s http://d3kbcqa49mib13.cloudfront.net/spark-<spark-version>-bin-hadoop<hadoop-version>.tgz | tar -xz -C /usr/local/
export SPARK_HOME=/usr/local/spark-<spark-version>-bin-hadoop<hadoop-version>

# install Logisland 1.3.0
curl -s https://github.com/Hurence/logisland/releases/download/v1.3.0/logisland-1.3.0-bin.tar.gz | tar -xz -C /usr/local/
cd /usr/local/logisland-1.3.0

# launch a logisland job
bin/logisland.sh --conf conf/index-apache-logs.yml
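
For instance, with the last tested versions mentioned above, the placeholders in the download commands would resolve as follows (a sketch only; the mirror URLs may have moved since this was written):

.. code-block:: sh

# Kafka 0.10.2.2 built for Scala 2.11
curl -s https://www-us.apache.org/dist/kafka/0.10.2.2/kafka_2.11-0.10.2.2.tgz | tar -xz -C /usr/local/
# Spark 2.3.1 pre-built for Hadoop 2.7
curl -s http://d3kbcqa49mib13.cloudfront.net/spark-2.3.1-bin-hadoop2.7.tgz | tar -xz -C /usr/local/
export SPARK_HOME=/usr/local/spark-2.3.1-bin-hadoop2.7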

You can find some Logisland job configuration samples under the $LOGISLAND_HOME/conf folder.

Docker setup
++++++++++++

The easiest way to start is to launch a Docker Compose stack:

.. code-block:: sh

# launch logisland environment
cd /tmp
curl -s https://raw.githubusercontent.com/Hurence/logisland/master/logisland-framework/logisland-resources/src/main/resources/conf/docker-compose.yml > docker-compose.yml
docker-compose up

# sample execution of a logisland job
docker exec -i -t logisland bin/logisland.sh --conf conf/index-apache-logs.yml
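
To check that the stack came up correctly before running a job, a couple of standard Docker commands are enough (a sketch, assuming the container is named logisland as in the docker exec call above):

.. code-block:: sh

# list the services of the compose stack and their state
docker-compose ps

# follow the logs of the logisland container
docker logs -f logisland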

Hadoop distribution setup
+++++++++++++++++++++++++

Launching Logisland streaming apps is as easy as unarchiving the Logisland distribution on an edge node, editing a config with YARN parameters and submitting the job:

.. code-block:: sh

# install Logisland 1.3.0
curl -s https://github.com/Hurence/logisland/releases/download/v1.3.0/logisland-1.3.0-bin-hdp2.5.tar.gz | tar -xz -C /usr/local/
cd /usr/local/logisland-1.3.0
bin/logisland.sh --conf conf/index-apache-logs.yml
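
Once the job has been submitted in yarn-cluster mode, you can track it with the standard YARN CLI (a sketch, assuming a configured YARN client on the edge node):

.. code-block:: sh

# list running applications and find the Logisland job by its Spark application name
yarn application -list

# fetch the aggregated logs of a finished or failed run
yarn logs -applicationId <application_id>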

Start a stream processing job

A Logisland stream processing job is made of a number of components: at least one streaming engine and one or more stream processors. You set them up with a YAML configuration file.

Please note that events are serialized against an Avro schema while transiting through any Kafka topic. Every spark.streaming.batchDuration (time window), each processor handles its batch of Records and eventually generates new Records to the output topic.

The following configuration.yml file contains a sample job that parses raw Apache logs and sends them to Elasticsearch.

The first part is the ProcessingEngine configuration (here a Spark Streaming engine):

.. code-block:: yaml

version: 1.3.0
documentation: LogIsland job config file
engine:
  component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
  type: engine
  documentation: Index some apache logs with logisland
  configuration:
    spark.app.name: IndexApacheLogsDemo
    spark.master: yarn-cluster
    spark.driver.memory: 1G
    spark.driver.cores: 1
    spark.executor.memory: 2G
    spark.executor.instances: 4
    spark.executor.cores: 2
    spark.yarn.queue: default
    spark.yarn.maxAppAttempts: 4
    spark.yarn.am.attemptFailuresValidityInterval: 1h
    spark.yarn.max.executor.failures: 20
    spark.yarn.executor.failuresValidityInterval: 1h
    spark.task.maxFailures: 8
    spark.serializer: org.apache.spark.serializer.KryoSerializer
    spark.streaming.batchDuration: 4000
    spark.streaming.backpressure.enabled: false
    spark.streaming.unpersist: false
    spark.streaming.blockInterval: 500
    spark.streaming.kafka.maxRatePerPartition: 3000
    spark.streaming.timeout: -1
    spark.streaming.kafka.maxRetries: 3
    spark.streaming.ui.retainedBatches: 200
    spark.streaming.receiver.writeAheadLog.enable: false
    spark.ui.port: 4050
  controllerServiceConfigurations:

Then comes a list of ControllerServices, which are the shared components that interact with the outside world (Elasticsearch, HBase, ...):

.. code-block:: yaml

    - controllerService: datastore_service
      component: com.hurence.logisland.service.elasticsearch.Elasticsearch_6_6_2_ClientService
      type: service
      documentation: elasticsearch service
      configuration:
        hosts: sandbox:9200
        batch.size: 5000
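
Before launching the job, it can be worth checking that the configured datastore endpoint is reachable; for the Elasticsearch service above this is a one-liner (a sketch, assuming the sandbox:9200 host used in the configuration):

.. code-block:: sh

# the cluster should report green or yellow status before indexing starts
curl -s http://sandbox:9200/_cluster/health?pretty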

Then comes a list of RecordStreams, each of which routes the input batch of Records through a pipeline of Processors to the output topic:

.. code-block:: yaml

  streamConfigurations:
    - stream: parsing_stream
      component: com.hurence.logisland.stream.spark.KafkaRecordStreamParallelProcessing
      type: stream
      documentation: a processor that converts raw apache logs into structured log records
      configuration:
        kafka.input.topics: logisland_raw
        kafka.output.topics: logisland_events
        kafka.error.topics: logisland_errors
        kafka.input.topics.serializer: none
        kafka.output.topics.serializer: com.hurence.logisland.serializer.KryoSerializer
        kafka.error.topics.serializer: com.hurence.logisland.serializer.JsonSerializer
        kafka.metadata.broker.list: sandbox:9092
        kafka.zookeeper.quorum: sandbox:2181
        kafka.topic.autoCreate: true
        kafka.topic.default.partitions: 4
        kafka.topic.default.replicationFactor: 1
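
One practical consequence of this stream configuration: Records that fail processing are routed to the kafka.error.topics topic serialized as JSON, so they can be inspected directly from the console (a sketch, assuming the sandbox broker configured above):

.. code-block:: sh

# inspect failed Records written to the error topic as JSON
bin/kafka-console-consumer.sh --bootstrap-server sandbox:9092 --topic logisland_errors --from-beginning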

Then comes the configuration of the Processor pipeline. Each Record goes through these components. Here we first parse raw Apache logs and then index those records into Elasticsearch. Please note that the datastore processor makes use of the previously defined ControllerService:

.. code-block:: yaml

      processorConfigurations:

        - processor: apache_parser
          component: com.hurence.logisland.processor.SplitText
          type: parser
          documentation: a parser that produces records from an apache log REGEX
          configuration:
            record.type: apache_log
            value.regex: (\S+)\s+(\S+)\s+(\S+)\s+\[([\w:\/]+\s[+\-]\d{4})\]\s+"(\S+)\s+(\S+)\s*(\S*)"\s+(\S+)\s+(\S+)
            value.fields: src_ip,identd,user,record_time,http_method,http_query,http_version,http_status,bytes_out

        - processor: es_publisher
          component: com.hurence.logisland.processor.datastore.BulkPut
          type: processor
          documentation: a processor that indexes processed events in elasticsearch
          configuration:
            datastore.client.service: datastore_service
            default.collection: logisland
            default.type: event
            timebased.collection: yesterday
            collection.field: search_index
            type.field: record_type
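
To make the apache_parser configuration concrete, here is a quick sanity check of its value.regex against the classic sample access log line from the Apache documentation (illustration only; inside the job, SplitText performs this mapping itself):

.. code-block:: sh

# print the line if the regex matches (requires GNU grep for -P)
echo '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326' \
  | grep -P '(\S+)\s+(\S+)\s+(\S+)\s+\[([\w:\/]+\s[+\-]\d{4})\]\s+"(\S+)\s+(\S+)\s*(\S*)"\s+(\S+)\s+(\S+)'

# the nine capture groups fill value.fields in order:
#   src_ip=127.0.0.1  identd=-  user=frank  record_time=10/Oct/2000:13:55:36 -0700
#   http_method=GET  http_query=/apache_pb.gif  http_version=HTTP/1.0
#   http_status=200  bytes_out=2326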

Once you've edited your configuration file, you can submit it to the execution engine with the following command:

.. code-block:: bash

bin/logisland.sh --conf conf/job-configuration.yml

You should jump to the `tutorials section <http://logisland.readthedocs.io/en/latest/tutorials/index.html>`_ of the documentation, and then continue with the `components documentation <http://logisland.readthedocs.io/en/latest/components.html>`_.

Contributing

Please review the `Contribution to Logisland guide <http://logisland.readthedocs.io/en/latest/developer.html>`_ for information on how to get started contributing to the project.
