397 Open source projects that are alternatives of or similar to learning-hadoop-and-spark

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive

Stars: ✭ 18 (-87.67%)

Mutual labels: emr, hadoop, mapreduce

gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

Stars: ✭ 39 (-73.29%)

Mutual labels: hadoop, mapreduce, dataproc

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+15001.37%)

Mutual labels: hadoop, mapreduce

Data-pipeline-project

Data pipeline project

Stars: ✭ 18 (-87.67%)

Mutual labels: hadoop, mapreduce

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-87.67%)

Mutual labels: hadoop, mapreduce

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+47.26%)

Mutual labels: apache-spark, hadoop

rail

Scalable RNA-seq analysis

Stars: ✭ 74 (-49.32%)

Mutual labels: emr, mapreduce

big data

A collection of tutorials on Hadoop, MapReduce, Spark, Docker

Stars: ✭ 34 (-76.71%)

Mutual labels: hadoop, mapreduce

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-23.97%)

Mutual labels: apache-spark, hadoop

Src

A light-weight distributed stream computing framework for Golang

Stars: ✭ 67 (-54.11%)

Mutual labels: hadoop, mapreduce

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (-73.29%)

Mutual labels: apache-spark, hadoop

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (-78.08%)

Mutual labels: apache-spark, hadoop

Spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Stars: ✭ 1,721 (+1078.77%)

Mutual labels: emr, apache-spark

Behemoth

Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

Stars: ✭ 286 (+95.89%)

Mutual labels: hadoop, mapreduce

leaflet heatmap

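A simple visualization of call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. The data is first processed in parallel with Apache Spark, then Apache Spark is used again to draw the heatmap, and finally leafletjs loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .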

Stars: ✭ 13 (-91.1%)

Mutual labels: apache-spark, hadoop

basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-82.88%)

Mutual labels: emr, hadoop

Avro Hadoop Starter

Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.

Stars: ✭ 110 (-24.66%)

Mutual labels: hadoop, mapreduce

Bigdata Notes

Beginner's guide to big data ⭐

Stars: ✭ 10,991 (+7428.08%)

Mutual labels: hadoop, mapreduce

connected-component

Map Reduce Implementation of Connected Component on Apache Spark

Stars: ✭ 68 (-53.42%)

Mutual labels: apache-spark, mapreduce

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (+21.23%)

Mutual labels: apache-spark, hadoop

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+234.25%)

Mutual labels: hadoop, mapreduce

Bigdata Interview

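🎯 🌟 [Big data interview questions] Big data interview questions I collected online, shared together with my own summarized answers. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.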

Stars: ✭ 857 (+486.99%)

Mutual labels: hadoop, mapreduce

Asakusafw

Asakusa Framework

Stars: ✭ 114 (-21.92%)

Mutual labels: hadoop, mapreduce

Repository

Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (-36.99%)

Mutual labels: hadoop, mapreduce

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+319.86%)

Mutual labels: apache-spark, hadoop

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+536.3%)

Mutual labels: apache-spark, mapreduce

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (-12.33%)

Mutual labels: apache-spark, hadoop

Cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+117.81%)

Mutual labels: hadoop, mapreduce

Data Algorithms Book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

Stars: ✭ 949 (+550%)

Mutual labels: hadoop, mapreduce

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-90.41%)

Mutual labels: hadoop, mapreduce

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-83.56%)

Mutual labels: apache-spark, hadoop

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+2.74%)

Mutual labels: apache-spark, hadoop

bigdata-doc

Big data study notes, learning roadmap, and a collection of technical case studies.

Stars: ✭ 37 (-74.66%)

Mutual labels: hadoop, mapreduce

fink-broker

Astronomy Broker based on Apache Spark

Stars: ✭ 18 (-87.67%)

Mutual labels: apache-spark

docker-hadoop

Docker image for main Apache Hadoop components (Yarn/Hdfs)

Stars: ✭ 59 (-59.59%)

Mutual labels: hadoop

Aws Data Wrangler

Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+1533.56%)

Mutual labels: emr

streamsx.kafka

Repository for integration with Apache Kafka

Stars: ✭ 13 (-91.1%)

Mutual labels: apache-spark

teraslice

Scalable data processing pipelines in JavaScript

Stars: ✭ 48 (-67.12%)

Mutual labels: hadoop

Openemr

The most popular open source electronic health records and medical practice management solution.

Stars: ✭ 1,762 (+1106.85%)

Mutual labels: emr

JavaFramework

Simple Java framework, designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.

Stars: ✭ 16 (-89.04%)

Mutual labels: hadoop

freehealth

Free and open source Electronic Health Record

Stars: ✭ 39 (-73.29%)

Mutual labels: emr

openPDC

Open Source Phasor Data Concentrator

Stars: ✭ 109 (-25.34%)

Mutual labels: hadoop

learn-by-examples

Real-world Spark pipeline examples

Stars: ✭ 84 (-42.47%)

Mutual labels: apache-spark

beanszoo

Distributed Java micro-services using ZooKeeper

Stars: ✭ 12 (-91.78%)

Mutual labels: hadoop

sensu-plugins-aws

This plugin provides native AWS instrumentation for monitoring and metrics collection, including: health and metrics for various AWS services, such as EC2, RDS, ELB, and more, as well as handlers for EC2, SES, and SNS.

Stars: ✭ 79 (-45.89%)

Mutual labels: emr

healthcare

Open Source Healthcare ERP / Management System

Stars: ✭ 68 (-53.42%)

Mutual labels: emr

orion

Management and automation platform for Stateful Distributed Systems

Stars: ✭ 77 (-47.26%)

Mutual labels: hadoop

tscharts

Django REST framework-based Digital Patient Registration and EMR backend