ClickHouse® is a free analytics DBMS for big data
Koalas: pandas API on Apache Spark
An easy-to-use, self-service open BI reporting and BI dashboard platform.
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
AWS ETL Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Open-Source Web GUI for Apache Kafka Management
Advanced distributed task flow management on top of Celery
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
A collection of books covering C and C++, Git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
Lite Virtual List
Virtual list component library supporting waterfall flow, based on Vue
Bare bone examples of machine learning in TensorFlow
U-SQL Examples and Issue Tracking
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Manipulate arrays of complex data structures as easily as NumPy.
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Helical Insight is the world's first open source business intelligence framework; it helps you make sense of your data and make well-informed decisions.
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
An open source cybersecurity protocol for syncing decentralized graph data.
Mirror of Apache Flume
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL
Distributed, Versioned, Image-oriented Dataservice
Keyvi - a key value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
PredictionIO, a machine learning server for developers and ML engineers.
Keyvi - the key value index. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
The official home of the Presto distributed SQL query engine for big data
A Clojure dataframe library that runs on Spark
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.
Simple Windows desktop application for viewing and querying Apache Parquet files
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
A visual ETL development and debugging tool for big data
Big Data Toolkit for the JVM
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
A search engine which can hold 100 trillion lines of log data.
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Mirror of Apache Hama
A large-scale entity and relation database supporting aggregation of properties
Mirror of Apache Tajo