Cdap: An open source framework for building data analytic applications.
Stars: ✭ 509 (+960.42%)
dtail: DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Stars: ✭ 112 (+133.33%)
learning-hadoop-and-spark: Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+204.17%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+1835.42%)
Mapreduce: MapReduce by examples
Stars: ✭ 91 (+89.58%)
etran: Erlang Parse Transforms Including Fold (MapReduce) comprehension, Elixir-like Pipeline, and default function arguments
Stars: ✭ 19 (-60.42%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+45833.33%)
Guitar: A Simple and Efficient Distributed Multidimensional BI Analysis Engine.
Stars: ✭ 86 (+79.17%)
Rafty: Implementation of RAFT consensus in .NET core
Stars: ✭ 182 (+279.17%)
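Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.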
Stars: ✭ 857 (+1685.42%)
infantry: Run MapReduce in the user's browser.
Stars: ✭ 14 (-70.83%)
ooso: Java library for running Serverless MapReduce jobs
Stars: ✭ 25 (-47.92%)
Redisgears: Dynamic execution framework for your Redis data
Stars: ✭ 152 (+216.67%)
rail: Scalable RNA-seq analysis
Stars: ✭ 74 (+54.17%)
Bigslice: A serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+877.08%)
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+50%)
Src: A light-weight distributed stream computing framework for Golang
Stars: ✭ 67 (+39.58%)
Braft: An industrial-grade C++ implementation of RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.
Stars: ✭ 2,964 (+6075%)
Behemoth: Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (+495.83%)
Atomix: A reactive Java framework for building fault-tolerant distributed systems
Stars: ✭ 2,182 (+4445.83%)
Mare: MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Stars: ✭ 11 (-77.08%)
st-hadoop: ST-Hadoop is an open-source MapReduce extension of Hadoop designed specifically to analyze your spatio-temporal data efficiently
Stars: ✭ 17 (-64.58%)
Dampr: Python Data Processing library
Stars: ✭ 102 (+112.5%)
connected-component: Map Reduce Implementation of Connected Component on Apache Spark
Stars: ✭ 68 (+41.67%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-29.17%)
6.824 2017: ⚡️ 6.824: Distributed Systems (Spring 2017). A course that presents abstractions and implementation techniques for engineering distributed systems.
Stars: ✭ 219 (+356.25%)
mapreduce: An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Stars: ✭ 93 (+93.75%)
durablefunctions-mapreduce-dotnet: An implementation of MapReduce on top of C# Durable Functions over the NYC 2017 Taxi dataset to compute average ride time per-day
Stars: ✭ 20 (-58.33%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+91.67%)
MLBD: Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-58.33%)
Corral: 🐎 A serverless MapReduce framework written for AWS Lambda
Stars: ✭ 648 (+1250%)
Dpark: Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+5458.33%)
Bigdata: 💎🔥 Big data study notes
Stars: ✭ 488 (+916.67%)
interview-refresh-java-bigdata: A one-stop repo for looking up code snippets on core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.
Stars: ✭ 25 (-47.92%)
HadoopDedup: 🍉 Large-scale deduplication of massive data based on Hadoop and HBase
Stars: ✭ 27 (-43.75%)
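Bdp Dataplatform: A data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, container deployment platforms, and data platform collection, storage, computation, development, and application building.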
Stars: ✭ 456 (+850%)
lectures-hse-spark: Scalable machine learning and big data analysis with Apache Spark
Stars: ✭ 20 (-58.33%)
Asakusafw: Asakusa Framework
Stars: ✭ 114 (+137.5%)
Cascading: Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+562.5%)
Ckite: CKite - A JVM implementation of the Raft distributed consensus algorithm written in Scala
Stars: ✭ 214 (+345.83%)
Elixir Iteraptor: Handy enumerable operations implementation.
Stars: ✭ 55 (+14.58%)
Redisson: Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...
Stars: ✭ 17,972 (+37341.67%)
gomrjob: gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (-18.75%)
Powerjob: Enterprise job scheduling middleware with distributed computing ability.
Stars: ✭ 3,231 (+6631.25%)
Avro Hadoop Starter: Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (+129.17%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+1877.08%)
Tdigest: t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Stars: ✭ 274 (+470.83%)