Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+39987.27%)

Mutual labels: mapreduce

Cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+478.18%)

Mutual labels: mapreduce

Behemoth

Behemoth is an open-source platform for large-scale document analysis based on Apache Hadoop.

Stars: ✭ 286 (+420%)

Mutual labels: mapreduce

Redisson

Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...

Stars: ✭ 17,972 (+32576.36%)

Mutual labels: mapreduce

Tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark.

Stars: ✭ 274 (+398.18%)

Mutual labels: mapreduce

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-38.18%)

Mutual labels: mapreduce

Guitar

A Simple and Efficient Distributed Multidimensional BI Analysis Engine.

Stars: ✭ 86 (+56.36%)

Mutual labels: mapreduce

st-hadoop

ST-Hadoop is an open-source MapReduce extension of Hadoop designed specifically to analyze spatio-temporal data efficiently.

Stars: ✭ 17 (-69.09%)

Mutual labels: mapreduce

dtail

DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.

Stars: ✭ 112 (+103.64%)

Mutual labels: mapreduce

connected-component

MapReduce implementation of connected components on Apache Spark

Stars: ✭ 68 (+23.64%)

Mutual labels: mapreduce

mapreduce-examples

A collection of MapReduce problems and solutions

Stars: ✭ 23 (-58.18%)

Mutual labels: mapreduce

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-38.18%)

Mutual labels: mapreduce

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Stars: ✭ 18 (-67.27%)

Mutual labels: mapreduce

mapreduce

An in-process MapReduce library to help you optimize service response time or concurrent task processing.

Stars: ✭ 93 (+69.09%)

Mutual labels: mapreduce

infantry

Run MapReduce in the user's browser.

Stars: ✭ 14 (-74.55%)

Mutual labels: mapreduce

durablefunctions-mapreduce-dotnet

An implementation of MapReduce on top of C# Durable Functions over the NYC 2017 Taxi dataset to compute the average ride time per day

Stars: ✭ 20 (-63.64%)

Mutual labels: mapreduce

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-74.55%)

Mutual labels: mapreduce

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-63.64%)

Mutual labels: mapreduce

ooso

Java library for running Serverless MapReduce jobs

Stars: ✭ 25 (-54.55%)

Mutual labels: mapreduce

ParallelUtilities.jl

Fast and easy parallel mapreduce on HPC clusters

Stars: ✭ 28 (-49.09%)

Mutual labels: mapreduce

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-67.27%)

Mutual labels: mapreduce

Data-pipeline-project

Data pipeline project

Stars: ✭ 18 (-67.27%)

Mutual labels: mapreduce

rail

Scalable RNA-seq analysis

Stars: ✭ 74 (+34.55%)

Mutual labels: mapreduce

interview-refresh-java-bigdata

A one-stop repo for looking up code snippets on core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.

Stars: ✭ 25 (-54.55%)

Mutual labels: mapreduce

etran

Erlang Parse Transforms Including Fold (MapReduce) comprehension, Elixir-like Pipeline, and default function arguments

Stars: ✭ 19 (-65.45%)

Mutual labels: mapreduce

HadoopDedup

🍉 Large-scale deduplication of massive data sets based on Hadoop and HBase

Stars: ✭ 27 (-50.91%)

Mutual labels: mapreduce

pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2