Pushkr / Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Programming Languages

python

Projects that are alternatives of or similar to Apache Spark Hands On

Bigdataguide
Big data learning: learn big data from scratch, including study videos and interview materials for each stage of big data learning
Stars: ✭ 817 (+1004.05%)
Mutual labels:  spark, hadoop, bigdata, hive
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+70.27%)
Mutual labels:  spark, hadoop, bigdata, hive
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+14752.7%)
Mutual labels:  spark, hadoop, bigdata, hive
God Of Bigdata
Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+8018.92%)
Mutual labels:  spark, hadoop, bigdata, hive
bigdata-doc
Big data study notes, a learning roadmap, and a collection of technical case studies.
Stars: ✭ 37 (-50%)
Mutual labels:  hive, hadoop, bigdata
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+190.54%)
Mutual labels:  spark, hadoop, bigdata
the-apache-ignite-book
All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (-12.16%)
Mutual labels:  hive, hadoop, bigdata
Bigdata Interview
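🎯 🌟 [Big data interview questions] Big data interview questions I collected online, along with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks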
Stars: ✭ 857 (+1058.11%)
Mutual labels:  spark, hadoop, bigdata
dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-60.81%)
Mutual labels:  hive, hadoop, bigdata
swordfish
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-52.7%)
Mutual labels:  spark, hive, hadoop
BigData-News
A real-time big data system project for a news website, based on Spark 2.2
Stars: ✭ 36 (-51.35%)
Mutual labels:  spark, hive, hadoop
Szt Bigdata
Big data passenger flow analysis system for the Shenzhen Metro 🚇🚄🌟
Stars: ✭ 826 (+1016.22%)
Mutual labels:  spark, hadoop, hive
Javaorbigdata Interview
A collection of interview knowledge points for Java developers and big data developers
Stars: ✭ 203 (+174.32%)
Mutual labels:  spark, hadoop, bigdata
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+117.57%)
Mutual labels:  spark, hadoop, hive
leaflet heatmap
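A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation. After computing the data in parallel with Apache Spark, Apache Spark is also used to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The drawing is currently implemented with Apache Spark; possibly because Apache Spark is not well suited to this kind of computation, or because I did not design the algorithm well, the parallel version is slower than the single-machine version. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .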
Stars: ✭ 13 (-82.43%)
Mutual labels:  spark, hadoop, bigdata
Spline
Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+313.51%)
Mutual labels:  spark, hadoop, bigdata
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-24.32%)
Mutual labels:  hive, hadoop, bigdata
Bigdata Notebook
Stars: ✭ 100 (+35.14%)
Mutual labels:  spark, hadoop, bigdata
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (-47.3%)
Mutual labels:  spark, hadoop, bigdata
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+402.7%)
Mutual labels:  spark, hadoop, hive

For the benefit of the community, please feel free to add or request anything that hasn't been covered. Please remember that this is a beginner's guide and not expert-level documentation.

Hadoop

  • /Flume : notes and examples for Apache Flume
  • /Hive : notes and examples for Apache Hive
  • /MySQL : code samples to create a database, create a table, and load data in MySQL
  • /Sqoop : notes and examples of import/export using Sqoop
  • /spark : notes, documentation, and sample example(s) of Spark APIs

Hands-on :

  • /exam : sample CCA-175 exam questions and solutions (in the solution branch)
  • /problem1 - complex data structure handling using Hive (exposure to: Hive, CREATE TABLE, LOAD, named_struct, struct)
  • /problem2 - stock data analysis (exposure to: JSON file handling, SparkSQL, map, reduce, filter, join, groupByKey, keyBy, UDFs, etc.)
  • /problem3 - MovieLens database analysis
  • /problem4 - Lahman's baseball database analysis
  • /problem5 - Hortonworks certification sample, 10 tasks in total
  • /Tweeter - Tweeter data analysis
  • /problem6 - retail database sample exercises
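
For readers new to the RDD operations named under /problem2, here is a minimal plain-Python sketch of what keyBy, groupByKey and join do to key/value pairs. It is not PySpark and not code from this repository: the helper names key_by, group_by_key and join are hypothetical local functions that only mimic the semantics on small in-memory lists, and the stock records are made up for illustration.

```python
from collections import defaultdict

def key_by(records, key_fn):
    """Pair each record with a key, similar in spirit to RDD.keyBy."""
    return [(key_fn(r), r) for r in records]

def group_by_key(pairs):
    """Collect all values that share a key, similar in spirit to RDD.groupByKey."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

def join(left, right):
    """Inner join of two (key, value) lists, similar in spirit to RDD.join."""
    right_groups = group_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_groups.get(k, [])]

# Made-up records (assumption): (symbol, closing price) and (symbol, company name)
prices = [("AAA", 10.0), ("BBB", 20.0), ("AAA", 12.0)]
names = [("AAA", "Alpha Corp"), ("BBB", "Beta Inc")]

# Key each price record by its symbol, then group the closing prices per symbol
by_symbol = key_by(prices, lambda rec: rec[0])
closes = group_by_key([(k, rec[1]) for k, rec in by_symbol])

# Average close per symbol, then join with the company names on the symbol
avg_close = {k: sum(v) / len(v) for k, v in closes.items()}
joined = join(list(avg_close.items()), names)
print(joined)
```

In Spark itself the same steps would run lazily and in parallel across partitions; the sketch only shows the shape of the data after each step.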

My answers to a few PySpark questions on Stack Overflow: Link
