groda / big_data

Licence: MIT License

A collection of tutorials on Hadoop, MapReduce, Spark, and Docker

Programming Languages

Jupyter Notebook

Projects that are alternatives to or similar to big_data

Bigdata Notes

A Beginner's Guide to Big Data ⭐

Stars: ✭ 10,991 (+32226.47%)

Mutual labels: big-data, hadoop, bigdata, mapreduce

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Stars: ✭ 39 (+14.71%)

Mutual labels: big-data, hadoop, pyspark, spark-sql

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (+38.24%)

Mutual labels: big-data, hadoop, spark-sql

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+3835.29%)

Mutual labels: big-data, bigdata, pyspark

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (+341.18%)

Mutual labels: big-data, hadoop, pyspark

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+64747.06%)

Mutual labels: big-data, hadoop, mapreduce

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-85.29%)

Mutual labels: big-data, hadoop, bigdata

Asakusafw

Asakusa Framework

Stars: ✭ 114 (+235.29%)

Mutual labels: big-data, hadoop, mapreduce

bigdatatutorial

Stars: ✭ 34 (+0%)

Mutual labels: hadoop, bigdata, spark-sql

bigdata-doc

Big data study notes, learning roadmaps, and a collection of technical case studies.

Stars: ✭ 37 (+8.82%)

Mutual labels: hadoop, bigdata, mapreduce

pyspark-algorithms

PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2

Stars: ✭ 72 (+111.76%)

Mutual labels: big-data, pyspark, mapreduce

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (+226.47%)

Mutual labels: big-data, hadoop, pyspark

leaflet heatmap

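A simple visualization of phone-call data from Huzhou. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline. The data is first computed in parallel with Apache Spark, Apache Spark is then used to draw the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. The rendering is currently implemented with Apache Spark, but the parallel computation is slower than the single-machine version, either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .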

Stars: ✭ 13 (-61.76%)

Mutual labels: big-data, hadoop, bigdata

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (+108.82%)

Mutual labels: big-data, bigdata, mapreduce

Bigdata Interview

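🎯 🌟 [Big data interview questions] Big data interview questions I collected from around the web, together with my own summarized answers. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.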

Stars: ✭ 857 (+2420.59%)

Mutual labels: hadoop, bigdata, mapreduce

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+532.35%)

Mutual labels: big-data, hadoop, bigdata

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-47.06%)

Mutual labels: hadoop, bigdata, mapreduce

SparkProgrammingInScala

Apache Spark Course Material

Stars: ✭ 57 (+67.65%)

Mutual labels: big-data, bigdata, spark-sql

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+64.71%)

Mutual labels: hadoop, bigdata

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-55.88%)

Mutual labels: big-data, hadoop

Big Data for beginners

Some tutorials and demos on Hadoop, Spark, etc., mostly in the form of Jupyter notebooks.

mapreduce_with_bash.ipynb: An introduction to MapReduce using Hadoop Streaming and bash to create the mapper and reducer
simplest_mapreduce_bash_wordcount.ipynb: A very basic MapReduce wordcount example
mrjob_wordcount.ipynb: A simple MapReduce job with mrjob
Hadoop_spilling.ipynb: Hadoop spilling explained
TestDFSio.ipynb: Demo of TestDFSIO for benchmarking Hadoop clusters
docker_for_beginners.md: Docker for beginners, an introduction to the world of containers
demoSparkSQLPython.ipynb: Basic PySpark demo
ngrams_with_pyspark.ipynb: Basic example of n-gram generation with PySpark
Encoding+dataframe+columns.ipynb: Encoding Spark DataFrame columns
Unicode.ipynb: Exploring Unicode categories
polynomial_regression.ipynb: Worked-out example of polynomial regression with NumPy
generate_data_with_Faker.ipynb: Generate fake data with the Faker Python library
online_resources.md: Online resources for learning Big Data
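
Several of the notebooks above revolve around the MapReduce wordcount. As a rough illustration of the idea, and not the code from any of the notebooks, the sketch below simulates the map, shuffle-and-sort, and reduce phases in plain Python on a single machine. The function names `mapper`, `reducer`, and `wordcount` and the sample input are assumptions made up for this example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts of each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def wordcount(lines):
    # Sorting by key stands in for the shuffle-and-sort step that a
    # cluster would perform between the map and reduce phases.
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the tutorials.
    sample = ["big data for beginners", "big data with Spark"]
    print(wordcount(sample))
```

On a real cluster the mapper and reducer would run as separate processes over partitions of the data; here everything runs in one process, so the sketch only shows the data flow.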
