bandar-log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (+5.26%)
Bigdata Playground: A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+831.58%)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+1200%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+1036.84%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (+636.84%)
Smooks: An extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (+1442.11%)
Hydrograph: A visual ETL development and debugging tool for big data
Stars: ✭ 144 (+657.89%)
Presto Go Client: A Presto client for the Go programming language.
Stars: ✭ 183 (+863.16%)
Streamx: kafka-connect-s3. Ingest data from Kafka to object stores (S3)
Stars: ✭ 96 (+405.26%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+636.84%)
Sylph: Stream computing platform for big data
Stars: ✭ 362 (+1805.26%)
Maha: A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (+431.58%)
Examples: Demo applications and code examples for Confluent Platform and Apache Kafka
Stars: ✭ 571 (+2905.26%)
Kafka Connect: equivalent to kafka-connect 🔧 for nodejs ✨🐢🚀✨
Stars: ✭ 102 (+436.84%)
Eland: Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+1136.84%)
Remora: Kafka consumer lag-checking application for monitoring, written in Scala and Akka HTTP; a wrapper around the Kafka consumer group command. Integrations with CloudWatch and Datadog. Authentication recently added.
Stars: ✭ 183 (+863.16%)
Registry: Schema Registry
Stars: ✭ 184 (+868.42%)
Storagetapper: StorageTapper is a scalable real-time MySQL change data streaming, logical backup, and logical replication service
Stars: ✭ 232 (+1121.05%)
Kafka Ui: Open-Source Web GUI for Apache Kafka Management
Stars: ✭ 230 (+1110.53%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+105.26%)
Streamline: StreamLine - Streaming Analytics
Stars: ✭ 151 (+694.74%)
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+1800%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+315.79%)
Benthos: Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+19400%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+68094.74%)
Wedatasphere: WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+1857.89%)
Vflow: Enterprise Network Flow Collector (IPFIX, sFlow, Netflow) from Verizon Media
Stars: ✭ 776 (+3984.21%)
Hazelcast Jet: Distributed Stream and Batch Processing
Stars: ✭ 855 (+4400%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+4357.89%)
Burrowui: This is a NodeJS/Angular 2 frontend UI for Kafka cluster monitoring with Burrow
Stars: ✭ 69 (+263.16%)
Aws Etl Orchestrator: A serverless architecture for orchestrating ETL jobs in arbitrarily complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+1189.47%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (+410.53%)
Cmak: CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+55394.74%)
Kafka Streams: equivalent to kafka-streams 🐙 for nodejs ✨🐢🚀✨
Stars: ✭ 613 (+3126.32%)
Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+24010.53%)
Sparta: Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+2600%)
Go Streams: A lightweight stream processing library for Go
Stars: ✭ 615 (+3136.84%)
Prometheus: Kubernetes Setup for Prometheus and Grafana
Stars: ✭ 824 (+4236.84%)
Storm: Mirror of Apache Storm
Stars: ✭ 6,297 (+33042.11%)
Angel: A Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+33889.47%)
Quarkus Microservices Poc: Very simplified shop sales system built with a microservices architecture using Quarkus
Stars: ✭ 16 (-15.79%)
Bigdataguide: Big data learning. Learn big data from scratch; includes study videos and interview materials for each stage of big data learning
Stars: ✭ 817 (+4200%)
Flux: The GitOps Kubernetes operator (successor: https://github.com/fluxcd/flux2)
Stars: ✭ 6,688 (+35100%)
Awesome Sre: A curated list of Site Reliability and Production Engineering resources.
Stars: ✭ 7,687 (+40357.89%)
Sqoop: Mirror of Apache Sqoop
Stars: ✭ 817 (+4200%)
Promgen: Promgen is a configuration file generator for Prometheus
Stars: ✭ 754 (+3868.42%)
Stream Reactor: Streaming reference architecture for ETL with Kafka and Kafka-Connect. You can find more at http://lenses.io on how we provide a unified solution to manage your connectors, the most advanced SQL engine for Kafka and Kafka Streams, cluster monitoring and alerting, and more.
Stars: ✭ 753 (+3863.16%)
Kafkacenter: KafkaCenter is a unified platform for Kafka cluster management and maintenance, producer / consumer monitoring, and use of ecosystem components.
Stars: ✭ 896 (+4615.79%)