Clickhouse: ClickHouse® is a free analytics DBMS for big data
Koalas: pandas API on Apache Spark
Cboard: An easy-to-use, self-service, open BI reporting and BI dashboard platform.
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to big data streaming. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Hyperspace: An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Aws Etl Orchestrator: A serverless architecture for orchestrating ETL jobs in arbitrarily complex workflows using AWS Step Functions and AWS Lambda.
Kafka Ui: Open-source web GUI for Apache Kafka management
Selinon: An advanced distributed task flow management system on top of Celery
Eland: Python client and toolkit for DataFrames, big data, machine learning, and ETL in Elasticsearch
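Books: A curated collection of books covering C&C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.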
Lite Virtual List: A Vue-based virtual list component library that supports waterfall-flow layouts
Nakedtensor: Bare-bones examples of machine learning in TensorFlow
Usql: U-SQL examples and issue tracking
Gimel: Big data processing framework with a unified Data API or SQL on any storage
Awkward 0.x: Manipulate arrays of complex data structures as easily as NumPy.
Sparkrdma: RDMA-accelerated, high-performance, scalable, and efficient ShuffleManager plugin for Apache Spark
Helicalinsight: Helical Insight is billed as the world's first open source Business Intelligence framework; it helps you make sense of your data and make well-informed decisions.
Data Science Live Book: An open source book to learn data science, data analysis, and machine learning, suitable for all ages!
Gun: An open source cybersecurity protocol for syncing decentralized graph data.
Flume: Mirror of Apache Flume
Bigdata Playground: A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, the Twitter API, MongoDB, Node.js, Angular, and GraphQL
Dvid: Distributed, Versioned, Image-oriented Dataservice
Keyvi: Keyvi, a key-value index that powers the Cliqz search engine. It is an in-memory, FST-based data structure highly optimized for size and lookup performance.
Attic Predictionio: PredictionIO, a machine learning server for developers and ML engineers.
Keyvi: Keyvi, the key-value index. It is an in-memory, FST-based data structure highly optimized for size and lookup performance.
Presto: The official home of the Presto distributed SQL query engine for big data
Geni: A Clojure dataframe library that runs on Spark
Datasciencevm: Tools and docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Fili: Easily make RESTful web services for time series reporting with big data analytics engines like Druid and SQL databases.
Parquetviewer: Simple Windows desktop application for viewing and querying Apache Parquet files
100daysofmlcode: My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Hydrograph: A visual ETL development and debugging tool for big data
Eel Sdk: Big data toolkit for the JVM
Sparkling Graph: SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Poseidon: A search engine that can hold 100 trillion lines of log data.
Accelerator: The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Hama: Mirror of Apache Hama
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Tajo: Mirror of Apache Tajo