Multi-container Docker images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, ZooKeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Stars: ✭ 29 (-60.81%)

Mutual labels: hive, hadoop, bigdata

swordfish

Open-source distributed workflow scheduling tool that also supports streaming tasks.

Stars: ✭ 35 (-52.7%)

Mutual labels: spark, hive, hadoop

BigData-News

A real-time big data system project for a news website, based on Spark 2.2

Stars: ✭ 36 (-51.35%)

Mutual labels: spark, hive, hadoop

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+1016.22%)

Mutual labels: spark, hadoop, hive

Javaorbigdata Interview

A collection of interview knowledge points for Java developers and big data developers

Stars: ✭ 203 (+174.32%)

Mutual labels: spark, hadoop, bigdata

Bigdata docker

Big Data Ecosystem Docker

Stars: ✭ 161 (+117.57%)

Mutual labels: spark, hadoop, hive

leaflet heatmap

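A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, the heatmap is also rendered with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give a good interactive experience. Rendering is currently implemented with Apache Spark; perhaps Apache Spark is not well suited to this kind of computation, or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .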

Stars: ✭ 13 (-82.43%)

Mutual labels: spark, hadoop, bigdata

Spline

Data Lineage Tracking And Visualization Solution

Stars: ✭ 306 (+313.51%)

Mutual labels: spark, hadoop, bigdata

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (-24.32%)

Mutual labels: hive, hadoop, bigdata

Bigdata Notebook

Stars: ✭ 100 (+35.14%)

Mutual labels: spark, hadoop, bigdata

yuzhouwan

Code Library for My Blog

Stars: ✭ 39 (-47.3%)

Mutual labels: spark, hadoop, bigdata

Wedatasphere

WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+402.7%)

Mutual labels: spark, hadoop, hive

For the benefit of the community, please feel free to add or request anything that hasn't been covered. Please remember that this is a beginner's guide, not expert-level documentation.

Hadoop

/Flume : contains notes and examples of Apache Flume
/Hive : contains notes and examples of Apache Hive
/MySQL : code samples with pieces to create a database, create a table, and load data in MySQL
/Sqoop : contains notes and examples of import/export using Sqoop
/spark : contains notes, documentation, and sample example(s) of Spark APIs

Hands-on:

/exam : sample CCA-175 exam questions and solutions (in the solution branch)
/problem1 - complex data structure handling using Hive (exposure to: Hive, CREATE TABLE, LOAD, named_struct, struct)
/problem2 - stock data analysis (exposure to: JSON file handling, SparkSQL, map, reduce, filter, join, groupByKey, keyBy, UDFs, etc.)
/problem3 - MovieLens database analysis
/problem4 - Lahman's baseball database analysis
/problem5 - Hortonworks certification sample, 10 tasks in total
/Tweeter - Twitter data analysis
/problem6 - retail database sample exercises
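
For readers new to the operations listed under /problem2, here is a minimal sketch of what keyBy, groupByKey, join, map, reduce, and filter do. It uses plain Python lists rather than Spark, so it only imitates the semantics of the pair-RDD operations and is not the Spark API itself. The helper names (`key_by`, `group_by_key`, `join`) and the sample stock records are hypothetical and are not taken from the repository.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample records (symbol, closing price); not from the repository.
trades = [("AAPL", 10.0), ("MSFT", 20.0), ("AAPL", 14.0), ("MSFT", 22.0), ("IBM", 5.0)]
sectors = [("AAPL", "tech"), ("MSFT", "tech")]


def key_by(records, key_func):
    """Imitates keyBy: pair every record with key_func(record)."""
    return [(key_func(r), r) for r in records]


def group_by_key(pairs):
    """Imitates groupByKey: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def join(left, right):
    """Imitates join: inner join of two (key, value) lists on the key."""
    right_groups = group_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_groups.get(k, [])]


# filter: keep trades priced above 8.0, which drops the IBM record.
filtered = list(filter(lambda t: t[1] > 8.0, trades))

# keyBy: key each record by its symbol, giving (symbol, (symbol, price)).
keyed = key_by(filtered, lambda t: t[0])

# groupByKey, then a map over the groups: gather the prices per symbol.
prices_by_symbol = {k: [price for _, price in v] for k, v in group_by_key(keyed).items()}

# reduce: sum the prices per symbol, then divide to get the average.
averages = {k: reduce(lambda a, b: a + b, v) / len(v) for k, v in prices_by_symbol.items()}

# join: attach the sector to each symbol's average price.
joined = join(list(averages.items()), sectors)
print(joined)
```

In Spark the same steps would run lazily and in parallel across partitions; the sketch only shows how the data is reshaped at each step.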

My answers to a few PySpark questions on Stack Overflow: Link

Stars: ✭ 74