Dpark: Python clone of Spark, a MapReduce-like framework in Python
Powerjob: Enterprise job scheduling middleware with distributed computing ability.
6.824 2017: ⚡️ 6.824: Distributed Systems (Spring 2017). A course that presents abstractions and implementation techniques for engineering distributed systems.
Redisgears: Dynamic execution framework for your Redis data
Avro Hadoop Starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Dampr: Python data processing library
Src: A light-weight distributed stream computing framework for Golang
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
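Bigdata Interview: 🎯 🌟 [Big data interview questions] Shares big-data interview questions I collected online, along with my own summarized answers. It currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.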
Mobius: C# and F# language binding and extensions to Apache Spark
Corral: 🐎 A serverless MapReduce framework written for AWS Lambda
Cdap: An open source framework for building data analytic applications.
Bigslice: A serverless cluster computing system for the Go programming language
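Bdp Dataplatform: Big data ecosystem solution data platform: a big data solution built on big data, data platforms, microservices, machine learning, an online mall, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computation, development, and application building.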
Data Science Ipython Notebooks: Data science Python notebooks covering deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Cascading: Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Behemoth: Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Redisson: Redis Java client with the features of an In-Memory Data Grid. Over 50 Redis-based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...
Tdigest: t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Guitar: A Simple and Efficient Distributed Multidimensional BI Analysis Engine.
st-hadoop: ST-Hadoop is an open-source MapReduce extension of Hadoop designed specifically to analyze spatio-temporal data efficiently
dtail: DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
big data: A collection of tutorials on Hadoop, MapReduce, Spark, and Docker
mapreduce: An in-process MapReduce library to help optimize service response time or concurrent task processing.
infantry: Run MapReduce in the user's browser.
MLBD: Materials for "Machine Learning on Big Data" course
ooso: Java library for running Serverless MapReduce jobs
rail: Scalable RNA-seq analysis
interview-refresh-java-bigdata: A one-stop repo for looking up code snippets of core Java concepts, SQL, and data structures, as well as big data. It also includes interview questions asked in real life.
etran: Erlang Parse Transforms Including Fold (MapReduce) comprehension, Elixir-like Pipeline, and default function arguments
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
gomrjob: a Go Framework for Hadoop MapReduce Jobs
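Many of the projects above are built around the MapReduce model: a map step emits key/value pairs, a shuffle step groups the pairs by key, and a reduce step combines each group. As a rough illustration of that model only, here is a minimal in-process word count sketch in plain Python. It is not the API of any listed project, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names; this is a generic sketch of the MapReduce model,
# not the API of any project listed above.


def map_phase(documents: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: combine the values for each key (here, by summing them)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence within a single process."""
    return reduce_phase(shuffle(map_phase(documents)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    # Prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
    print(word_count(docs))
```

In a distributed framework, map and reduce tasks typically run on separate workers, and the shuffle moves intermediate pairs across the network. This sketch keeps all three steps in one process, so it only shows the data flow, not the scheduling or fault tolerance that frameworks like those listed here provide.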