Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+32226.47%)
Mutual labels: big-data, hadoop, bigdata, mapreduce
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+14.71%)
Mutual labels: big-data, hadoop, pyspark, spark-sql
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (+38.24%)
Mutual labels: big-data, hadoop, spark-sql
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+3835.29%)
Mutual labels: big-data, bigdata, pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+341.18%)
Mutual labels: big-data, hadoop, pyspark
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+64747.06%)
Mutual labels: big-data, hadoop, mapreduce
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-85.29%)
Mutual labels: big-data, hadoop, bigdata
Asakusafw
Asakusa Framework
Stars: ✭ 114 (+235.29%)
Mutual labels: big-data, hadoop, mapreduce
bigdatatutorial
bigdatatutorial
Stars: ✭ 34 (+0%)
Mutual labels: hadoop, bigdata, spark-sql
bigdata-doc
Big data study notes, learning roadmaps, and a collection of technical case studies.
Stars: ✭ 37 (+8.82%)
Mutual labels: hadoop, bigdata, mapreduce
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+111.76%)
Mutual labels: big-data, pyspark, mapreduce
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+226.47%)
Mutual labels: big-data, hadoop, pyspark
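leaflet heatmap
A simple visualization of phone-call data from Huzhou. The dataset is assumed to be too large to draw the heatmap directly in the browser, so the heatmap rendering step is moved to offline computation. Apache Spark first processes the data in parallel and then renders the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. In the current Apache Spark implementation, the parallel computation is slower than a single-machine run, either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git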
Stars: ✭ 13 (-61.76%)
Mutual labels: big-data, hadoop, bigdata
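Bigdata Interview
🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered online, together with my own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.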
Stars: ✭ 857 (+2420.59%)
Mutual labels: hadoop, bigdata, mapreduce
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+532.35%)
Mutual labels: big-data, hadoop, bigdata
qs-hadoop
Learning the big data ecosystem
Stars: ✭ 18 (-47.06%)
Mutual labels: hadoop, bigdata, mapreduce
SparkProgrammingInScala
Apache Spark Course Material
Stars: ✭ 57 (+67.65%)
Mutual labels: big-data, bigdata, spark-sql
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (+64.71%)
Mutual labels: hadoop, bigdata
rastercube
rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-55.88%)
Mutual labels: big-data, hadoop