Wedatasphere: WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Sparkmeasure: This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Sidekick: High Performance HTTP Sidecar Load Balancer
Kyuubi: Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Metorikku: A simplified, lightweight ETL Framework based on Apache Spark
Sparkler: Spark-Crawler, an Apache Nutch-like crawler that runs on Apache Spark.
Sparkstreaming: Real-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts
Oap: Optimized Analytics Package for Spark* Platform
Sparklens: Qubole Sparklens tool for performance tuning of Apache Spark
Scalnet: A Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Iql: An ad hoc query service based on the Spark SQL engine.
Ytk Learn: Ytk-learn is a distributed machine learning library which implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Wirbelsturm: Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Sparklint: A tool for monitoring and tuning Spark jobs for efficiency.
Cook: Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Learningsparkv2: This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Crayon: Simple framework-agnostic UI router for SPAs
Delta: An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Spline: Data Lineage Tracking And Visualization Solution
Zat: Zeek Analysis Tools (ZAT), for processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Awesome Ada: A curated list of awesome resources related to the Ada and SPARK programming languages
Elasticluster: Create clusters of VMs on the cloud and configure them with Ansible.
Spark Notebook: Interactive and Reactive Data Science using Scala and Spark.
Spark Druid Olap: Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Cloudflow: Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Hbase Rdd: Spark RDD to read, write and delete from HBase
Datavec: ETL Library for Machine Learning - data pipelines, data munging and wrangling
Sk Dist: Distributed scikit-learn meta-estimators in PySpark
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Succinct: Enabling queries on compressed data.
Big Data Rosetta Code: Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Ibis: A pandas-like deferred expression system, with first-class SQL support
Book: This project collects some good books I have read or heard about over the years. I came across them while tidying up my files; deleting them seemed a pity, but keeping them took up too much space, so in the spirit of sharing I am making them available, both to offer these books to readers who need them and as a kind of accumulated knowledge base.
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
daf-kylo: Kylo integration with PDND (previously DAF).
dllib: dllib is a distributed deep learning library running on Apache Spark
prosto: Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
bigkube: Minikube for big data with Scala and Spark
Covid19Tracker: A Robinhood-style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
SparkV: 🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.
trembita: Model complex data transformation pipelines easily
spark-extension: A library that provides useful extensions to Apache Spark and PySpark.