397 open source projects that are alternatives to or similar to learning-hadoop-and-spark

GooglePlay-Web-Crawler
MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Stars: ✭ 18 (-87.67%)
Mutual labels:  emr, hadoop, mapreduce
gomrjob
gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (-73.29%)
Mutual labels:  hadoop, mapreduce, dataproc
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+15001.37%)
Mutual labels:  hadoop, mapreduce
Data-pipeline-project
Data pipeline project
Stars: ✭ 18 (-87.67%)
Mutual labels:  hadoop, mapreduce
qs-hadoop
Learning the big data ecosystem
Stars: ✭ 18 (-87.67%)
Mutual labels:  hadoop, mapreduce
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+47.26%)
Mutual labels:  apache-spark, hadoop
rail
Scalable RNA-seq analysis
Stars: ✭ 74 (-49.32%)
Mutual labels:  emr, mapreduce
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-76.71%)
Mutual labels:  hadoop, mapreduce
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-23.97%)
Mutual labels:  apache-spark, hadoop
Src
A light-weight distributed stream computing framework for Golang
Stars: ✭ 67 (-54.11%)
Mutual labels:  hadoop, mapreduce
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-73.29%)
Mutual labels:  apache-spark, hadoop
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-78.08%)
Mutual labels:  apache-spark, hadoop
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+1078.77%)
Mutual labels:  emr, apache-spark
Behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
Stars: ✭ 286 (+95.89%)
Mutual labels:  hadoop, mapreduce
leaflet heatmap
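A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation. After the data is computed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .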
Stars: ✭ 13 (-91.1%)
Mutual labels:  apache-spark, hadoop
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-82.88%)
Mutual labels:  emr, hadoop
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-24.66%)
Mutual labels:  hadoop, mapreduce
Bigdata Notes
Getting-started guide to big data ⭐
Stars: ✭ 10,991 (+7428.08%)
Mutual labels:  hadoop, mapreduce
connected-component
Map Reduce Implementation of Connected Component on Apache Spark
Stars: ✭ 68 (-53.42%)
Mutual labels:  apache-spark, mapreduce
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+21.23%)
Mutual labels:  apache-spark, hadoop
Bigdata
💎🔥 Big data study notes
Stars: ✭ 488 (+234.25%)
Mutual labels:  hadoop, mapreduce
Bigdata Interview
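🎯 🌟 [Big data interview questions] A collection of big data interview questions I gathered online, together with summaries of my own answers. Currently includes interview question summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks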
Stars: ✭ 857 (+486.99%)
Mutual labels:  hadoop, mapreduce
Asakusafw
Asakusa Framework
Stars: ✭ 114 (-21.92%)
Mutual labels:  hadoop, mapreduce
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-36.99%)
Mutual labels:  hadoop, mapreduce
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+319.86%)
Mutual labels:  apache-spark, hadoop
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+536.3%)
Mutual labels:  apache-spark, mapreduce
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-12.33%)
Mutual labels:  apache-spark, hadoop
Cascading
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+117.81%)
Mutual labels:  hadoop, mapreduce
Data Algorithms Book
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+550%)
Mutual labels:  hadoop, mapreduce
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-90.41%)
Mutual labels:  hadoop, mapreduce
DaFlow
Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-83.56%)
Mutual labels:  apache-spark, hadoop
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+2.74%)
Mutual labels:  apache-spark, hadoop
bigdata-doc
Big data study notes, learning roadmap, and a collection of technical case studies.
Stars: ✭ 37 (-74.66%)
Mutual labels:  hadoop, mapreduce
fink-broker
Astronomy Broker based on Apache Spark
Stars: ✭ 18 (-87.67%)
Mutual labels:  apache-spark
docker-hadoop
Docker image for main Apache Hadoop components (Yarn/Hdfs)
Stars: ✭ 59 (-59.59%)
Mutual labels:  hadoop
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1533.56%)
Mutual labels:  emr
streamsx.kafka
Repository for integration with Apache Kafka
Stars: ✭ 13 (-91.1%)
Mutual labels:  apache-spark
teraslice
Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (-67.12%)
Mutual labels:  hadoop
Openemr
The most popular open source electronic health records and medical practice management solution.
Stars: ✭ 1,762 (+1106.85%)
Mutual labels:  emr
JavaFramework
Simple Java framework, designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.
Stars: ✭ 16 (-89.04%)
Mutual labels:  hadoop
freehealth
Free and open source Electronic Health Record
Stars: ✭ 39 (-73.29%)
Mutual labels:  emr
openPDC
Open Source Phasor Data Concentrator
Stars: ✭ 109 (-25.34%)
Mutual labels:  hadoop
learn-by-examples
Real-world Spark pipelines examples
Stars: ✭ 84 (-42.47%)
Mutual labels:  apache-spark
beanszoo
Distributed Java micro-services using ZooKeeper
Stars: ✭ 12 (-91.78%)
Mutual labels:  hadoop
sensu-plugins-aws
This plugin provides native AWS instrumentation for monitoring and metrics collection, including: health and metrics for various AWS services, such as EC2, RDS, ELB, and more, as well as handlers for EC2, SES, and SNS.
Stars: ✭ 79 (-45.89%)
Mutual labels:  emr
healthcare
Open Source Healthcare ERP / Management System
Stars: ✭ 68 (-53.42%)
Mutual labels:  emr
orion
Management and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (-47.26%)
Mutual labels:  hadoop
tscharts
Django REST framework-based Digital Patient Registration and EMR backend
Stars: ✭ 14 (-90.41%)
Mutual labels:  emr
pdd-graph
PDD Graph : Bridging MIMIC-III and Linked Data Cloud
Stars: ✭ 31 (-78.77%)
Mutual labels:  emr
Location-based-Restaurants-Recommendation-System
Big Data Management and Analysis Final Project
Stars: ✭ 44 (-69.86%)
Mutual labels:  apache-spark
hadoop-ansible
Install hadoop cluster with ansible
Stars: ✭ 35 (-76.03%)
Mutual labels:  hadoop
Hello-AWS-Data-Services
Sample code for AWS data service and ML courses on LinkedIn Learning
Stars: ✭ 144 (-1.37%)
Mutual labels:  emr
sbt-lighter
SBT plugin for Apache Spark on AWS EMR
Stars: ✭ 57 (-60.96%)
Mutual labels:  emr
KeywordAnalysis
Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
Stars: ✭ 49 (-66.44%)
Mutual labels:  wordcount
terraform-emr-spark-example
An example Terraform project that will configure a Secure and Customizable Spark Cluster on Amazon EMR.
Stars: ✭ 43 (-70.55%)
Mutual labels:  emr
awesome-tools
Curated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (-78.77%)
Mutual labels:  apache-spark
webhdfs
Node.js WebHDFS REST API client
Stars: ✭ 88 (-39.73%)
Mutual labels:  hadoop
TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 687 (+370.55%)
Mutual labels:  hadoop
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-80.82%)
Mutual labels:  apache-spark
app
Web application for ANDES
Stars: ✭ 12 (-91.78%)
Mutual labels:  emr
1-60 of 397 similar projects