WaterdropProduction Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+6774.07%)
StreamlineStreamLine - Streaming Analytics
Stars: ✭ 151 (+459.26%)
csvpluscsvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+148.15%)
fdp-modelserverAn umbrella project for multiple implementations of model serving
Stars: ✭ 47 (+74.07%)
DIRECTDIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-25.93%)
litemall-dw基于开源Litemall电商项目的大数据项目,包含前端埋点(openresty+lua)、后端埋点;数据仓库(五层)、实时计算和用户画像。大数据平台采用CDH6.3.2(已使用vagrant+ansible脚本化),同时也包含了Azkaban的workflow。
Stars: ✭ 36 (+33.33%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-11.11%)
vixtractwww.vixtract.ru
Stars: ✭ 40 (+48.15%)
hamiltonA scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+2166.67%)
etlflowEtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various different tasks, jobs on GCP and AWS.
Stars: ✭ 38 (+40.74%)
SylphStream computing platform for bigdata
Stars: ✭ 362 (+1240.74%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+44.44%)
cassandra.realtimeDifferent ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink
Stars: ✭ 25 (-7.41%)
RegistrySchema Registry
Stars: ✭ 184 (+581.48%)
Bigdata PlaygroundA complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+555.56%)
Movie recommend基于Spark的电影推荐系统,包含爬虫项目、web网站、后台管理系统以及spark推荐系统
Stars: ✭ 2,092 (+7648.15%)
SANSA-StackBig Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena http://sansa-stack.github.io/SANSA-Stack/
Stars: ✭ 130 (+381.48%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+6274.07%)
Kinesis SqlKinesis Connector for Structured Streaming
Stars: ✭ 120 (+344.44%)
flink-deployerA tool that help automate deployment to an Apache Flink cluster
Stars: ✭ 143 (+429.63%)
ScramjetSimple yet powerful live data computation framework
Stars: ✭ 171 (+533.33%)
flink-connector-kudu基于Apache-bahir-kudu-connector的flink-connector-kudu,支持Flink1.11.x DynamicTableSource/Sink,支持Range分区等
Stars: ✭ 40 (+48.15%)
flink-clientJava library for managing Apache Flink via the Monitoring REST API
Stars: ✭ 48 (+77.78%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+418.52%)
T-WatchReal Time Twitter Sentiment Analysis Product
Stars: ✭ 20 (-25.93%)
Pyspark ExamplesCode examples on Apache Spark using python
Stars: ✭ 58 (+114.81%)
dlinkDinky is an out of the box one-stop real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky provides the ability to connect many big data frameworks including OLAP and Data Lake.
Stars: ✭ 1,535 (+5585.19%)
TiBigDataTiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (+611.11%)
Lidea大型分布式系统实时监控平台
Stars: ✭ 28 (+3.7%)
Spark StatesCustom state store providers for Apache Spark
Stars: ✭ 83 (+207.41%)
Utils4sscala、spark使用过程中,各种测试用例以及相关资料整理
Stars: ✭ 1,070 (+3862.96%)
AirflowETLBlog post on ETL pipelines with Airflow
Stars: ✭ 20 (-25.93%)
Project FortisRepository for all parts of the Fortis architecture
Stars: ✭ 27 (+0%)
dpkb大数据相关内容汇总,包括分布式存储引擎、分布式计算引擎、数仓建设等。关键词:Hadoop、HBase、ES、Kudu、Hive、Presto、Spark、Flink、Kylin、ClickHouse
Stars: ✭ 123 (+355.56%)
ExDeMonA general purpose metrics monitor implemented with Apache Spark. Kafka source, Elastic sink, aggregate metrics, different analysis, notifications, actions, live configuration update, missing metrics, ...
Stars: ✭ 19 (-29.63%)
WormholeWormhole is a SPaaS (Stream Processing as a Service) Platform
Stars: ✭ 863 (+3096.3%)
MobiusC# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+3340.74%)
Bandar LogMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-29.63%)
Tweet-Analysis-With-Kafka-and-SparkA real time analytics dashboard to analyze the trending hashtags and @ mentions at any location using kafka and spark streaming.
Stars: ✭ 18 (-33.33%)
Spark ALS基于spark-ml,spark-mllib,spark-streaming的推荐算法实现
Stars: ✭ 89 (+229.63%)
AngelA Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+23818.52%)
SpartaReal Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+1800%)
Data AcceleratorData Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+814.81%)
CdapAn open source framework for building data analytic applications.
Stars: ✭ 509 (+1785.19%)
link-moveA model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (+18.52%)