795 open source projects that are alternatives to or similar to big_data

datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+14.71%)
Mutual labels:  big-data, hadoop, pyspark, spark-sql
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+32226.47%)
Mutual labels:  big-data, hadoop, bigdata, mapreduce
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+532.35%)
Mutual labels:  big-data, hadoop, bigdata
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (+67.65%)
Mutual labels:  big-data, bigdata, spark-sql
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+341.18%)
Mutual labels:  big-data, hadoop, pyspark
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+111.76%)
Mutual labels:  big-data, pyspark, mapreduce
bigdata-doc
Big data study notes, learning roadmap, and a collection of technical case studies.
Stars: ✭ 37 (+8.82%)
Mutual labels:  hadoop, bigdata, mapreduce
bigdatatutorial
bigdatatutorial
Stars: ✭ 34 (+0%)
Mutual labels:  hadoop, bigdata, spark-sql
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-85.29%)
Mutual labels:  big-data, hadoop, bigdata
leaflet heatmap
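A simple visualization of call data from Huzhou. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved offline into the computation and analysis stage. After the data is processed in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads an OpenStreetMap layer and the heatmap layer for good interactivity. The rendering currently runs on Apache Spark, but the parallel computation is slower than a single-machine run, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .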
Stars: ✭ 13 (-61.76%)
Mutual labels:  big-data, hadoop, bigdata
Bigdata Interview
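🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.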
Stars: ✭ 857 (+2420.59%)
Mutual labels:  hadoop, bigdata, mapreduce
qs-hadoop
Learning the big data ecosystem
Stars: ✭ 18 (-47.06%)
Mutual labels:  hadoop, bigdata, mapreduce
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+226.47%)
Mutual labels:  big-data, hadoop, pyspark
Asakusafw
Asakusa Framework
Stars: ✭ 114 (+235.29%)
Mutual labels:  big-data, hadoop, mapreduce
Big Data Engineering Coursera Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Stars: ✭ 71 (+108.82%)
Mutual labels:  big-data, bigdata, mapreduce
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+64747.06%)
Mutual labels:  big-data, hadoop, mapreduce
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+3835.29%)
Mutual labels:  big-data, bigdata, pyspark
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (+38.24%)
Mutual labels:  big-data, hadoop, spark-sql
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+4661.76%)
Mutual labels:  big-data, hadoop
Hdfs Shell
HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (+244.12%)
Mutual labels:  big-data, hadoop
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+4729.41%)
Mutual labels:  big-data, hadoop
Big Data Study
🐳 big data study
Stars: ✭ 141 (+314.71%)
Mutual labels:  big-data, bigdata
Calcite Avatica
Mirror of Apache Calcite - Avatica
Stars: ✭ 130 (+282.35%)
Mutual labels:  big-data, hadoop
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+38008.82%)
Mutual labels:  big-data, hadoop
Calcite
Apache Calcite
Stars: ✭ 2,816 (+8182.35%)
Mutual labels:  big-data, hadoop
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+3873.53%)
Mutual labels:  bigdata, pyspark
twitter-archive-reader
Full featured TypeScript Twitter archive reader and browser
Stars: ✭ 43 (+26.47%)
Mutual labels:  big-data, bigdata
v6.dooring.public
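A large-screen data visualization dashboard solution that provides a visual editing engine, helping individuals and enterprises easily build their own customized dashboard applications.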
Stars: ✭ 323 (+850%)
Mutual labels:  big-data, bigdata
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (+276.47%)
Mutual labels:  big-data, hadoop
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+4441.18%)
Mutual labels:  big-data, bigdata
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+311.76%)
Mutual labels:  big-data, hadoop
clusterdock
clusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (-23.53%)
Mutual labels:  big-data, hadoop
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+420.59%)
Mutual labels:  big-data, hadoop
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+220.59%)
Mutual labels:  big-data, bigdata
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (+214.71%)
Mutual labels:  big-data, bigdata
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+270.59%)
Mutual labels:  big-data, bigdata
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-14.71%)
Mutual labels:  big-data, pyspark
learning-hadoop-and-spark
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+329.41%)
Mutual labels:  hadoop, mapreduce
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+620.59%)
Mutual labels:  big-data, bigdata
gomrjob
gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (+14.71%)
Mutual labels:  hadoop, mapreduce
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+535.29%)
Mutual labels:  big-data, pyspark
awesome-coder-resources
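A refueling station on the programming road! [Continuously updated... stars welcome, come back often] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]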
Stars: ✭ 54 (+58.82%)
Mutual labels:  big-data, bigdata
spark-twitter-sentiment-analysis
Sentiment Analysis of a Twitter Topic with Spark Structured Streaming
Stars: ✭ 55 (+61.76%)
Mutual labels:  pyspark, spark-sql
flokkr
Documentation placeholder and utilities for all the other containers.
Stars: ✭ 30 (-11.76%)
Mutual labels:  hadoop, bigdata
dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+297.06%)
Mutual labels:  bigdata, spark-sql
Springboard-Data-Science-Immersive
No description or website provided.
Stars: ✭ 52 (+52.94%)
Mutual labels:  hadoop, pyspark
GooglePlay-Web-Crawler
Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Stars: ✭ 18 (-47.06%)
Mutual labels:  hadoop, mapreduce
HadoopDedup
🍉 Large-scale deduplication of massive datasets based on Hadoop and HBase
Stars: ✭ 27 (-20.59%)
Mutual labels:  big-data, mapreduce
iis
Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-52.94%)
Mutual labels:  big-data, hadoop
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-58.82%)
Mutual labels:  hadoop, mapreduce
dockerfiles
Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
Stars: ✭ 29 (-14.71%)
Mutual labels:  hadoop, bigdata
Data-pipeline-project
Data pipeline project
Stars: ✭ 18 (-47.06%)
Mutual labels:  hadoop, mapreduce
anovos
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Stars: ✭ 77 (+126.47%)
Mutual labels:  bigdata, pyspark
the-apache-ignite-book
All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (+91.18%)
Mutual labels:  hadoop, bigdata
big-data-lite
Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (+20.59%)
Mutual labels:  big-data, hadoop
rastercube
rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-55.88%)
Mutual labels:  big-data, hadoop
learning-spark
Tidied-up Spark and Hadoop tutorials.
Stars: ✭ 28 (-17.65%)
Mutual labels:  hadoop, bigdata
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (+64.71%)
Mutual labels:  hadoop, bigdata
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-5.88%)
Mutual labels:  big-data, hadoop
pyspark-ML-in-Colab
PySpark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-5.88%)
Mutual labels:  hadoop, pyspark