Top 57 mapreduce open source projects

Dpark
Python clone of Spark, a MapReduce-like framework in Python
Powerjob
Enterprise job scheduling middleware with distributed computing ability.
6.824 2017
⚡️ 6.824: Distributed Systems (Spring 2017). A course that presents abstractions and implementation techniques for engineering distributed systems.
Redisgears
Dynamic execution framework for your Redis data
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Mapreduce
MapReduce by examples
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Src
A light-weight distributed stream computing framework for Golang
Elixir Iteraptor
Handy enumerable operations implementation.
Data Algorithms Book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Mare
MaRe leverages the power of Docker and Spark to run and scale your serial tools in MapReduce fashion.
Bigdata Interview
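🎯 🌟 [Big data interview questions] Big data interview questions collected from around the web, with the author's own answer summaries. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.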
Mobius
C# and F# language binding and extensions to Apache Spark
Distributed Computing
distributed_computing includes MapReduce, a KV store, etc.
Corral
🐎 A serverless MapReduce framework written for AWS Lambda
Cdap
An open source framework for building data analytic applications.
Bigslice
A serverless cluster computing system for the Go programming language
Bdp Dataplatform
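Data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computing, development, and application building.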
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Redisson
Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...
Tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Guitar
A Simple and Efficient Distributed Multidimensional BI Analysis Engine.
dtail
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
connected-component
Map Reduce Implementation of Connected Component on Apache Spark
mapreduce-examples
A collection of mapreduce problems and solutions
GooglePlay-Web-Crawler
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
mapreduce
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
durablefunctions-mapreduce-dotnet
An implementation of MapReduce on top of C# Durable Functions over the NYC 2017 Taxi dataset to compute average ride time per-day
ooso
Java library for running Serverless MapReduce jobs
interview-refresh-java-bigdata
A one-stop repo for looking up code snippets of core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real interviews.
etran
Erlang Parse Transforms Including Fold (MapReduce) comprehension, Elixir-like Pipeline, and default function arguments
HadoopDedup
🍉 Large-scale deduplication of massive data sets based on Hadoop and HBase
lectures-hse-spark
Scalable machine learning and big data analysis with Apache Spark
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
bigdata-doc
Big data study notes, learning roadmap, and a collection of technical case studies.
mit-6.824-distributed-systems
Template repository to work on the labs from MIT 6.824 Distributed Systems course.
gomrjob
gomrjob - a Go Framework for Hadoop Map Reduce Jobs