Datahub: The Metadata Platform for the Modern Data Stack
Stars: ✭ 4,232 (+23411.11%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+8477.78%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+16005.56%)
smart-city-analytics: Analyze large data sets collected from a long-range IoT system that uses LoRaWAN networking
Stars: ✭ 28 (+55.56%)
bigstatsr: R package for statistical tools with big matrices stored on disk.
Stars: ✭ 139 (+672.22%)
mmtf-workshop-2018: Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (+177.78%)
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+16811.11%)
bigdata-fun: A complete (distributed) big data stack, running in containers
Stars: ✭ 14 (-22.22%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (+494.44%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+516.67%)
yildiz: 🦄🌟 Graph Database layer on top of Google Bigtable
Stars: ✭ 24 (+33.33%)
Maha: A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (+461.11%)
ibmpairs: Open source tools for interacting with IBM PAIRS
Stars: ✭ 23 (+27.78%)
Cboard: An easy-to-use, self-service, open BI reporting and dashboard platform.
Stars: ✭ 2,795 (+15427.78%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+405.56%)
Graph sampling: A Python package containing various approaches that sample the original graph according to different sample sizes.
Stars: ✭ 99 (+450%)
gorilla-repl: A fork of Jony Epsilon's rich REPL for Clojure in the notebook style.
Stars: ✭ 22 (+22.22%)
vxquery: Mirror of Apache VXQuery
Stars: ✭ 19 (+5.56%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libraries for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics, and Cognitive Computing, built to run natively on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (+427.78%)
Hyperspace: An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (+1266.67%)
hotmap: WebGL Heatmap Viewer for Big Data and Bioinformatics
Stars: ✭ 13 (-27.78%)
Orc: An ORC file format reader and writer for Go.
Stars: ✭ 97 (+438.89%)
egis: A handy Ruby interface for AWS Athena
Stars: ✭ 38 (+111.11%)
big-sorter: Java library that sorts very large files of records by splitting them into smaller sorted files and merging
Stars: ✭ 49 (+172.22%)
Streamx: kafka-connect-s3, for ingesting data from Kafka into object stores (S3)
Stars: ✭ 96 (+433.33%)
Trafodion: Apache Trafodion
Stars: ✭ 242 (+1244.44%)
Beam: Apache Beam is a unified programming model for batch and streaming data processing
Stars: ✭ 5,149 (+28505.56%)
bftkv: A distributed key-value store that tolerates Byzantine faults.
Stars: ✭ 27 (+50%)
Treeviz: Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (+427.78%)
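v6.dooring.public: A large-screen data visualization solution that provides a visual editing engine, helping individuals and companies easily build their own large-screen visualization applications.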
Stars: ✭ 323 (+1694.44%)
ytpriv: YT metadata exporter
Stars: ✭ 28 (+55.56%)
couchdb-mango: Mirror of Apache CouchDB Mango
Stars: ✭ 34 (+88.89%)
Selinon: An advanced distributed task flow manager on top of Celery
Stars: ✭ 237 (+1216.67%)
clusterdock: A framework for creating Docker-based container clusters
Stars: ✭ 26 (+44.44%)
computer-vision-notebooks: 👁️ An authorial set of fundamental Python recipes on Computer Vision and Digital Image Processing.
Stars: ✭ 89 (+394.44%)
opendc: Collaborative Datacenter Simulation and Exploration for Everybody
Stars: ✭ 40 (+122.22%)
subsemble: R package for ensemble learning on subsets of data
Stars: ✭ 40 (+122.22%)
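Books: A curated collection of books covering C and C++, git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.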
Stars: ✭ 222 (+1133.33%)
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+18538.89%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (+344.44%)
MLBD: Materials for the "Machine Learning on Big Data" course
Stars: ✭ 20 (+11.11%)
Codex: Free note-taking software for programmers and Computer Science students
Stars: ✭ 242 (+1244.44%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+338.89%)
shifting: A privacy-focused list of alternatives to mainstream services to help the competition.
Stars: ✭ 31 (+72.22%)
data-viz-utils: Functions for easily making publication-quality figures with matplotlib.
Stars: ✭ 16 (-11.11%)
text-rnn-tensorflow: Tutorial on multi-layer Recurrent Neural Networks (LSTM, RNN) for text models in Python using TensorFlow.
Stars: ✭ 22 (+22.22%)
jupyterlab plotly: This repository is deprecated. The extension has moved to https://github.com/jupyterlab/jupyter-renderers
Stars: ✭ 16 (-11.11%)
notebooks: A Docker-based starter kit for machine learning via Jupyter notebooks. Designed for those who just want a runtime environment so they can get on with machine learning.
Stars: ✭ 29 (+61.11%)
Poseidon: A search engine that can hold 100 trillion lines of log data.
Stars: ✭ 1,793 (+9861.11%)
Onlinestats.jl: Single-pass algorithms for statistics
Stars: ✭ 507 (+2716.67%)