CheckChe0803 / Bigdata Interview
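🎯 🌟[Big Data Interview Questions] Big-data interview questions collected from around the web, shared together with my own summarized answers. It currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.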
Stars: ✭ 857
Projects that are alternatives of or similar to Bigdata Interview
Bigdata Notes
A beginner's guide to big data ⭐
Stars: ✭ 10,991 (+1182.5%)
Mutual labels: kafka, spark, hadoop, bigdata, mapreduce, hbase, hdfs, yarn
God Of Bigdata
Focused on big-data learning and interviews: the road to big-data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+601.05%)
Mutual labels: kafka, spark, hadoop, bigdata, flink, hbase, hdfs
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-89.26%)
Mutual labels: kafka, spark, hadoop, flink, mapreduce, hbase, hdfs
Bigdataguide
Learn big data from scratch: study videos for every stage of learning, plus interview materials.
Stars: ✭ 817 (-4.67%)
Mutual labels: kafka, spark, hadoop, bigdata, flink, hbase
bigdata-doc
Big-data study notes, learning roadmaps, and curated technical case studies.
Stars: ✭ 37 (-95.68%)
Mutual labels: hadoop, bigdata, hdfs, mapreduce, flink
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-97.67%)
Mutual labels: spark, yarn, hadoop, hdfs, flink
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-56.59%)
Mutual labels: kafka, spark, hadoop, hbase
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-98.37%)
Mutual labels: spark, hadoop, hbase, hdfs
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-52.63%)
Mutual labels: spark, hadoop, hbase, hdfs
Javaorbigdata Interview
Interview knowledge points for Java developers and big-data developers
Stars: ✭ 203 (-76.31%)
Mutual labels: spark, hadoop, bigdata, interview
Sparkstreaming
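💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs whenever data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps the Spark Streaming 1.6 Kafka 0.10 integration to support SSL.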
Stars: ✭ 179 (-79.11%)
Mutual labels: kafka, spark, flink, hbase
dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-96.62%)
Mutual labels: hadoop, bigdata, hbase, flink
Flink Learning
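Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink introduction, concepts, internals, hands-on practice, performance tuning, and source-code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus write-ups of large production projects built on Flink (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Please support my column "Flink, the Real-Time Big Data Computing Engine: Practice and Performance Optimization".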
Stars: ✭ 11,378 (+1227.65%)
Mutual labels: kafka, spark, flink, hbase
Bdp Dataplatform
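A data platform for big-data ecosystem solutions: a big-data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data-platform ingestion, storage, computation, development, and application building.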
Stars: ✭ 456 (-46.79%)
Mutual labels: spark, flink, mapreduce, hbase
leaflet heatmap
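A simple visualization of Huzhou telephone call data. Assuming the data volume is too large for a browser to draw the heatmap directly, the heatmap-drawing step is moved to offline computation and analysis: Apache Spark first computes the data in parallel and then draws the heatmap, after which leafletjs loads an OpenStreetMap layer and the heatmap layer to give good interactivity. With the current Apache Spark implementation of the drawing step, parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because the algorithm is not well designed. The Apache Spark heatmap drawing and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git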
Stars: ✭ 13 (-98.48%)
Mutual labels: spark, hadoop, bigdata, hdfs
wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-97.78%)
Mutual labels: yarn, hadoop, hbase, hdfs
Big Data Interview Questions: Collection and Shared Answers
Hadoop | Hive | Spark | Flink | HBase | Kafka | Zookeeper
1. Hadoop
- Explain the concept of the circular buffer
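For the circular-buffer question: in Hadoop MapReduce, map output is commonly described as going into an in-memory ring buffer (sized by `mapreduce.task.io.sort.mb`, 100 MB by default), and once its fill level passes `mapreduce.map.sort.spill.percent` (0.80 by default) the contents are spilled to disk. The toy Java class below is a minimal sketch of just the wrap-around and spill-threshold idea, under simplifying assumptions: single-byte records, one thread, and a spill that copies data into an in-memory list instead of sorting and writing a file. `SpillingRingBuffer` is a hypothetical name, not Hadoop's `MapOutputBuffer`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy single-threaded ring buffer that "spills" once a fill threshold is reached.
 * Illustrative sketch only: Hadoop's real buffer also stores key/value metadata,
 * sorts by partition and key, and spills from a background thread.
 */
class SpillingRingBuffer {
    private final byte[] buf;
    private final int spillThreshold;                      // fill level (bytes) that triggers a spill
    private final List<byte[]> spills = new ArrayList<>(); // stands in for spill files on disk
    private int head = 0; // index of the oldest unspilled byte
    private int size = 0; // number of unspilled bytes

    SpillingRingBuffer(int capacity, double spillPercent) {
        if (capacity <= 0 || spillPercent <= 0 || spillPercent > 1) {
            throw new IllegalArgumentException("need capacity > 0 and 0 < spillPercent <= 1");
        }
        this.buf = new byte[capacity];
        this.spillThreshold = Math.max(1, (int) Math.round(capacity * spillPercent));
    }

    /** Appends one byte at the tail, wrapping around the end of the array. */
    void write(byte b) {
        buf[(head + size) % buf.length] = b;
        size++;
        if (size >= spillThreshold) {
            spill();
        }
    }

    /** Copies the unspilled region (which may wrap around) into a new "spill file". */
    private void spill() {
        byte[] out = new byte[size];
        for (int i = 0; i < size; i++) {
            out[i] = buf[(head + i) % buf.length];
        }
        spills.add(out);
        head = (head + size) % buf.length;
        size = 0;
    }

    /** Spills whatever is left, like the final spill when a map task finishes. */
    void flush() {
        if (size > 0) {
            spill();
        }
    }

    List<byte[]> spills() {
        return spills;
    }

    public static void main(String[] args) {
        SpillingRingBuffer rb = new SpillingRingBuffer(10, 0.8);
        for (int i = 0; i < 20; i++) {
            rb.write((byte) i);
        }
        rb.flush();
        System.out.println(rb.spills().size()); // 8 + 8 + 4 bytes -> 3 spills
    }
}
```

With a capacity of 10 and a threshold of 0.8, every 8 writes trigger a spill, and the second batch physically wraps past the end of the array, which is the point of making the buffer circular.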
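2. Hive
3. Spark
- Explain Spark's runtime architecture
- Describe the execution flow of a Spark program
- Introduce Spark's shuffle
- What partitioners does Spark provide?
- What kinds of joins does Spark support?
- What are the characteristics of an RDD?
- Explain wide and narrow dependencies
- What operators does Spark have?
- What storage (caching) levels do RDDs support?
- What does it mean that RDDs are lazily evaluated?
- Explain Spark's deployment modes
- In Spark on YARN, what is the difference between cluster mode and client mode?
- How Spark works: the whole process from submitting a JAR to returning the final result
- How are Spark stages divided?
- Spark RPC: why did Spark 2.0 drop Akka in favor of Netty?
- High availability in Spark: HA for the master, worker, executor, driver, and task
- Spark's memory management: compare it before and after Spark 1.6, and describe the optimizations made in Spark 2.0
- Explain broadcast variables in Spark
- What is data skew, and how do you deal with it?
- Given a piece of Spark code, analyze which parts run on the Driver and which run on the Workers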
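For the partitioner question: Spark ships `HashPartitioner` (the default for most key-based shuffles) and `RangePartitioner` (used, for example, by `sortByKey`), and you can subclass `Partitioner` for custom schemes. As far as I know, `HashPartitioner` sends a null key to partition 0 and otherwise takes a non-negative modulo of the key's `hashCode()` by the partition count. The Java snippet below is a sketch that re-implements only that rule outside Spark so it runs without a cluster; `HashPartitionSketch` is a hypothetical class name, not part of Spark.

```java
/** Re-implements the partition rule used by Spark's HashPartitioner (sketch, not Spark code). */
final class HashPartitionSketch {
    private final int numPartitions;

    HashPartitionSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0");
        }
        this.numPartitions = numPartitions;
    }

    /** Java's % can return a negative remainder, so shift negative results into [0, mod). */
    static int nonNegativeMod(int x, int mod) {
        int rawMod = x % mod;
        return rawMod + (rawMod < 0 ? mod : 0);
    }

    /** Null keys go to partition 0; every other key is bucketed by its hashCode(). */
    int getPartition(Object key) {
        return key == null ? 0 : nonNegativeMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        HashPartitionSketch p = new HashPartitionSketch(4);
        for (String k : new String[] {"spark", "hive", "flink"}) {
            System.out.println(k + " -> " + p.getPartition(k));
        }
        // Integer.valueOf(-7).hashCode() == -7; -7 % 4 == -3; -3 + 4 == 1
        System.out.println(p.getPartition(-7));
    }
}
```

The same rule also explains a common source of data skew: every record with the same key hashes to the same partition, so one very frequent key lands entirely on one task.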
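4. Flink
5. HBase
- Explain the HBase architecture
- How do you design an HBase rowkey?
- Explain HBase's storage structure. What are the advantages and disadvantages of this structure?
- How is HBase HA implemented, and what role does ZooKeeper play in it?
- When the HMaster goes down, which operations still work normally?
- Explain HBase's write path
- Explain HBase's read path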
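For the rowkey-design question, one widely used technique is salting: prefixing the natural key with a small bucket number derived from its hash, so that monotonically increasing keys (timestamps, sequential IDs) spread across regions instead of hotspotting a single RegionServer. The Java sketch below only builds salted key strings; the bucket count, the two-digit `NN_` prefix format, and the `RowKeySalting` helpers are illustrative assumptions, and it does not call the HBase client API.

```java
import java.util.Locale;

/** Hypothetical helpers for salting HBase rowkeys (sketch; no HBase client calls). */
final class RowKeySalting {
    /** Prefixes the key with a stable bucket id in [0, buckets); assumes 1 <= buckets <= 100. */
    static String saltedRowKey(String naturalKey, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in [1, 100] for a two-digit prefix");
        }
        // String.hashCode() is specified by the Java API, so the bucket is stable across JVMs.
        int bucket = Math.floorMod(naturalKey.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d_%s", bucket, naturalKey);
    }

    /** Every prefix a reader must scan, since one natural-key range is spread over all buckets. */
    static String[] bucketPrefixes(int buckets) {
        String[] prefixes = new String[buckets];
        for (int i = 0; i < buckets; i++) {
            prefixes[i] = String.format(Locale.ROOT, "%02d_", i);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Made-up example keys: a timestamp followed by a user id.
        for (String key : new String[] {"20240101120000_user1", "20240101120001_user2"}) {
            System.out.println(saltedRowKey(key, 16));
        }
        System.out.println(String.join(",", bucketPrefixes(4)));
    }
}
```

The trade-off is on the read side: a range scan over natural keys becomes one scan per bucket prefix, so the bucket count is usually kept small.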
6. Kafka
7. Zookeeper