Data Accelerator - Data Accelerator for Apache Spark simplifies onboarding to streaming big data. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Gimel - Big data processing framework providing a unified Data API or SQL on any storage.
Example Spark - Spark, Spark Streaming, and Spark SQL unit testing strategies.
Bigdata Playground - A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, the Twitter API, MongoDB, Node.js, Angular, and GraphQL.
Scramjet - A simple yet powerful live data computation framework.
Spark - .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Waterdrop - Production-ready data integration product, documentation:
Utils4s - A collection of test cases and related materials gathered while working with Scala and Spark.
Wormhole - Wormhole is a SPaaS (Stream Processing as a Service) platform.
Mobius - C# and F# language bindings and extensions to Apache Spark.
Bandar Log - A monitoring tool to measure the flow throughput of data sources and processing components in data ingestion and ETL pipelines.
Angel - A flexible and powerful parameter server for large-scale machine learning.
Sparta - Real-time analytics and data pipelines based on Spark Streaming.
CDAP - An open-source framework for building data analytics applications.
Sylph - A stream computing platform for big data.
litemall-dw - A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (OpenResty + Lua) and back-end tracking, a five-layer data warehouse, real-time computation, and user profiling. The platform runs on CDH 6.3.2 (provisioned with Vagrant + Ansible scripts) and also includes Azkaban workflows.
spark-utils - Basic framework utilities to quickly start writing production-ready Apache Spark applications.
cassandra.realtime - Different ways to process data into Cassandra in real time with technologies such as Kafka, Spark, Akka, and Flink.
wasp - WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
xxhadoop - Data analysis using Hadoop, Spark, Storm, Elasticsearch, machine learning, and more; the author's daily notes, code, and demos. Don't fork, just star!
interview-refresh-java-bigdata - A one-stop repo for code snippets covering core Java concepts, SQL, data structures, and big data, along with interview questions asked in real interviews.
T-Watch - Real-time Twitter sentiment analysis product.
Spark ALS - Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming.
fdp-modelserver - An umbrella project for multiple implementations of model serving.
ExDeMon - A general-purpose metrics monitor implemented with Apache Spark: Kafka source, Elastic sink, aggregated metrics, different analyses, notifications, actions, live configuration updates, missing-metrics detection, and more.