Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

✭ 22,048

python deep-learning machine-learning tensorflow aws data-science keras spark numpy pandas big-data scikit-learn caffe hadoop matplotlib kaggle theano scipy mapreduce

Cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

✭ 318

java hadoop mapreduce

Behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

✭ 286

java nlp hadoop mapreduce

Redisson

Redisson - Redis Java client with the features of an In-Memory Data Grid. Over 50 Redis-based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...

✭ 17,972

java redis list cache distributed map queue scheduler hibernate redis-client tomcat redis-cluster lock session mapreduce set executor distributed-locks spring-cache

Tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark.

✭ 274

python distributed-computing pyspark mapreduce

data-algorithms-with-spark

O'Reilly book: Data Algorithms with Spark, by Mahmoud Parsian

Guitar

A Simple and Efficient Distributed Multidimensional BI Analysis Engine.

✭ 86

java groovy business-intelligence reports olap data-analysis mapreduce multidimensional olap-cube

st-hadoop

ST-Hadoop is an open-source MapReduce extension of Hadoop designed specifically to analyze spatio-temporal data efficiently.

dtail

DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.

✭ 112

go c Makefile devops log distributed adhoc mapreduce log-management devops-tools troubleshooting mimecast

connected-component

MapReduce Implementation of Connected Components on Apache Spark

✭ 68

scala apache-spark graph-algorithms mapreduce union-find connected-components graphx
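The tags mention union-find. As a rough illustration of that idea, here is a minimal single-machine sketch in Python; it is not the repository's Spark/Scala code, and the `connected_components` helper and its edge-list input are hypothetical. Each component is labelled by its smallest vertex, which is also the label a distributed min-label propagation would converge to.

```python
def connected_components(edges):
    """Return the connected components of an undirected graph.

    `edges` is an iterable of (u, v) pairs (hypothetical input format).
    Each component is a sorted list of vertices; the list of components
    is sorted as well.
    """
    parent = {}

    def find(x):
        # Register unseen vertices as their own root.
        parent.setdefault(x, x)
        # Path halving: re-point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller root, so every tree is rooted at its
            # minimum vertex.
            lo, hi = min(ra, rb), max(ra, rb)
            parent[hi] = lo

    for u, v in edges:
        union(u, v)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(group) for group in groups.values())


print(connected_components([(1, 2), (2, 3), (4, 5), (6, 6)]))
# → [[1, 2, 3], [4, 5], [6]]
```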

mapreduce-examples

A collection of MapReduce problems and solutions

✭ 23

java mapreduce hadoop-mapreduce

big data

A collection of tutorials on Hadoop, MapReduce, Spark, and Docker

✭ 34

Jupyter Notebook docker big-data spark hadoop bigdata jupyter-notebook pyspark mapreduce spark-sql testdfsio mapreduce-bash

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive

✭ 18

java PigLatin emr aws hive hadoop nutch s3 pig mapreduce

mapreduce

An in-process MapReduce library to help optimize service response time or concurrent task processing.

✭ 93

go concurrent-programming concurrent mapreduce mapreduce-go
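To show what "in-process MapReduce" means, here is a minimal Python sketch: the map step runs concurrently on a thread pool and the results are folded in the calling thread. The `map_reduce` helper and its parameters are hypothetical and do not mirror the Go library's API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_reduce(items, mapper, reducer, initial, workers=4):
    """Run `mapper` over `items` on a thread pool, then fold with `reducer`.

    Hypothetical helper for illustration only.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, even if the individual
        # mapper calls finish out of order.
        mapped = pool.map(mapper, items)
        # Fold the mapped values sequentially in the calling thread.
        return reduce(reducer, mapped, initial)


print(map_reduce(range(1, 11), lambda x: x * x, lambda acc, y: acc + y, 0))
# → 385
```

In CPython, threads mainly help when the mapper is I/O-bound (for example, one remote call per item), which matches the "service response time" use case; CPU-bound mappers would need processes instead.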

infantry

Run MapReduce in the user's browser.

✭ 14

javascript HTML python distributed mapreduce parallel-processing

durablefunctions-mapreduce-dotnet

An implementation of MapReduce on top of C# Durable Functions over the NYC 2017 Taxi dataset to compute average ride time per day

✭ 20

C# powershell azure-functions mapreduce durable-functions
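The computation described above fits MapReduce directly. Below is a minimal Python sketch, not the repository's C# code. It assumes each ride is a (pickup, dropoff) pair of ISO-style timestamp strings and that a ride is keyed by its pickup day; both are assumptions, and the helper names are hypothetical. The reduce step carries (sum, count) pairs rather than averages because that combination is associative, while averaging partial averages from different workers would be wrong.

```python
from collections import defaultdict
from datetime import datetime


def map_ride(ride):
    """Map step: emit (pickup_day, (duration_minutes, 1)) for one ride."""
    pickup, dropoff = (datetime.fromisoformat(t) for t in ride)
    minutes = (dropoff - pickup).total_seconds() / 60
    return pickup.date().isoformat(), (minutes, 1)


def combine(a, b):
    """Reduce step: add two partial (sum, count) pairs."""
    return a[0] + b[0], a[1] + b[1]


def average_ride_time_per_day(rides):
    totals = defaultdict(lambda: (0.0, 0))
    # Shuffle + reduce: group the mapped pairs by day and combine them.
    for day, partial in map(map_ride, rides):
        totals[day] = combine(totals[day], partial)
    # Divide only once, at the end.
    return {day: s / n for day, (s, n) in sorted(totals.items())}


rides = [
    ("2017-01-01 08:00:00", "2017-01-01 08:10:00"),
    ("2017-01-01 09:00:00", "2017-01-01 09:20:00"),
    ("2017-01-02 10:00:00", "2017-01-02 10:05:00"),
]
print(average_ride_time_per_day(rides))
# → {'2017-01-01': 15.0, '2017-01-02': 5.0}
```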

web-click-flow

Offline analysis of website clickstream logs

✭ 14

java shell python hive hadoop etl mapreduce flume sqoop

MLBD

Materials for "Machine Learning on Big Data" course

✭ 20

Jupyter Notebook machine-learning big-data spark mapreduce distributed-machine-learning large-scale-machine-learning

ooso

Java library for running Serverless MapReduce jobs

✭ 25

java aws lambda library serverless mapreduce

ParallelUtilities.jl

Fast and easy parallel mapreduce on HPC clusters

✭ 28

julia hpc parallel parallel-computing distributed-computing distributed reduction high-performance-computing mapreduce hpc-clusters hpc-applications hpc-cluster

qs-hadoop

Learning the big data ecosystem

✭ 18

java scala elasticsearch spark hadoop storm bigdata spark-streaming mapreduce

rail

Scalable RNA-seq analysis

✭ 74

python Mathematica java shell r emr ipython alignments mapreduce rna-seq-analysis rail-rna

interview-refresh-java-bigdata

A one-stop repo for looking up code snippets of core Java concepts, SQL, data structures, and big data. It also includes interview questions asked in real life.

✭ 25

java snippets kafka spark interview recursion garbage-collection spark-streaming mapreduce java-collection

etran

Erlang parse transforms, including fold (MapReduce) comprehensions, an Elixir-like pipeline, and default function arguments

✭ 19