flink-crawler: Continuous, scalable web crawler built on top of Flink and crawler-commons
mriya: Real-time ETL built with Flink that moves data from MySQL to Greenplum. It uses canal to parse the MySQL binlog and publish it to Kafka, then uses Flink to consume from Kafka, assemble the data, and write it into Greenplum. More data sources and sinks are planned.
flink-k8s-operator: An example of building a Kubernetes operator (for Flink) using the abstract-operator framework
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
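litemall-dw: A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (OpenResty + Lua), back-end event tracking, a five-layer data warehouse, real-time computing, and user profiling. The big data platform runs on CDH 6.3.2 (provisioned with Vagrant + Ansible scripts) and also includes Azkaban workflows.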
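LarkMidTable: LarkMidTable is a one-stop, open-source data middle platform covering infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization, so that it can efficiently support front-end data applications and provide data services.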
fb scraper: FBLYZE is a Facebook scraping and analysis system.
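FlinkTutorial: FlinkTutorial focuses on Flink stream processing for big data, covering basics, concepts, internals, hands-on practice, performance tuning, and source-code analysis. It is written in Java, with some core code also in Scala. Feel free to follow my blog and GitHub.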
piglet: A compiler from Pig Latin to Spark and Flink.
cassandra.realtime: Different ways to ingest and process data into Cassandra in real time using technologies such as Kafka, Spark, Akka, and Flink
emma: A quotation-based Scala DSL for scalable data analysis.
flink-learn: Learning Flink (Flink CEP, Flink Core, Flink SQL)
hadoopoffice: HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
dockerfiles: Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
logparser: Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
dlink: Dinky is an out-of-the-box, one-stop real-time computing platform dedicated to building and applying unified streaming and batch processing and a unified data lake and data warehouse. Based on Apache Flink, Dinky can connect to many big data frameworks, including OLAP engines and data lakes.
flink-connector-kudu: A Flink Kudu connector based on the Apache Bahir Kudu connector, supporting Flink 1.11.x DynamicTableSource/Sink, range partitioning, and more
SANSA-Stack: Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
TiBigData: TiDB connectors for Flink/Hive/Presto
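dpkb: A collection of big data material, including distributed storage engines, distributed computing engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse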
flink-deployer: A tool that helps automate deployment to an Apache Flink cluster
flink-client: Java library for managing Apache Flink via the Monitoring REST API
fdp-modelserver: An umbrella project for multiple implementations of model serving