Bigdata Interview🎯 🌟[大数据面试题]分享自己在网络上收集的大数据相关的面试题以及自己的答案总结.目前包含Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper框架的面试题知识总结
Stars: ✭ 857 (+209.39%)
Python BigdataData science and Big Data with Python
Stars: ✭ 112 (-59.57%)
Flink Learningflink learning blog. http://www.54tianzhisheng.cn/ 含 Flink 入门、概念、原理、实战、性能调优、源码解析等内容。涉及 Flink Connector、Metrics、Library、DataStream API、Table API & SQL 等内容的学习案例,还有 Flink 落地应用的大型项目案例(PVUV、日志存储、百亿数据实时去重、监控告警)分享。欢迎大家支持我的专栏《大数据实时计算引擎 Flink 实战与性能优化》
Stars: ✭ 11,378 (+4007.58%)
Spring Boot Quick🌿 基于springboot的快速学习示例,整合自己遇到的开源框架,如:rabbitmq(延迟队列)、Kafka、jpa、redies、oauth2、swagger、jsp、docker、spring-batch、异常处理、日志输出、多模块开发、多环境打包、缓存cache、爬虫、jwt、GraphQL、dubbo、zookeeper和Async等等📌
Stars: ✭ 1,819 (+556.68%)
Repository个人学习知识库涉及到数据仓库建模、实时计算、大数据、Java、算法等。
Stars: ✭ 92 (-66.79%)
Bigdataguide大数据学习,从零开始学习大数据,包含大数据学习各阶段学习视频、面试资料
Stars: ✭ 817 (+194.95%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-94.95%)
Technology Talk汇总java生态圈常用技术框架、开源中间件,系统架构、数据库、大公司架构案例、常用三方类库、项目管理、线上问题排查、个人成长、思考等知识
Stars: ✭ 12,136 (+4281.23%)
HeraclesHigh performance HBase / Spark SQL engine
Stars: ✭ 27 (-90.25%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-22.02%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+46.57%)
WedatasphereWeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+34.3%)
Hadoop cookbookCookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-70.4%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+492.78%)
God Of Bigdata专注大数据学习面试,大数据成神之路开启。Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+2068.95%)
Bdp Dataplatform大数据生态解决方案数据平台:基于大数据、数据平台、微服务、机器学习、商城、自动化运维、DevOps、容器部署平台、数据平台采集、数据平台存储、数据平台计算、数据平台开发、数据平台应用搭建的大数据解决方案。
Stars: ✭ 456 (+64.62%)
Dockerfiles50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+205.78%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (-85.92%)
Sparkstreaming💥 🚀 封装sparkstreaming动态调节batch time(有数据就执行计算);🚀 支持运行过程中增删topic;🚀 封装sparkstreaming 1.6 - kafka 010 用以支持 SSL。
Stars: ✭ 179 (-35.38%)
swordfishOpen-source distribute workflow schedule tools, also support streaming task.
Stars: ✭ 35 (-87.36%)
daf-kyloKylo integration with PDND (previously DAF).
Stars: ✭ 20 (-92.78%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-90.97%)
CasperA compiler for automatically re-targeting sequential Java code to Apache Spark.
Stars: ✭ 45 (-83.75%)
Big Data Rosetta CodeCode snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (-8.3%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-88.45%)
libPerl Utility Library for my other repos
Stars: ✭ 16 (-94.22%)
smolderHL7 Apache Spark Datasource
Stars: ✭ 33 (-88.09%)
visionsType System for Data Analysis in Python
Stars: ✭ 136 (-50.9%)
spark-demosCollection of different demo applications using Apache Spark
Stars: ✭ 15 (-94.58%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-59.93%)
tpch-sparkTPC-H queries in Apache Spark SQL using native DataFrames API
Stars: ✭ 63 (-77.26%)
spark-data-sourcesDeveloping Spark External Data Sources using the V2 API
Stars: ✭ 36 (-87%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+787.73%)
frovedisFramework of vectorized and distributed data analytics
Stars: ✭ 59 (-78.7%)
hadoop-docker-liteDocker build project to setup a lightweight hadoop cluster containing hadoop, pig, zookeeper, hbase, phoenix, storm, kafka, kafka manager
Stars: ✭ 24 (-91.34%)
Spark-PMoFSpark Shuffle Optimization with RDMA+AEP
Stars: ✭ 28 (-89.89%)
prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-80.51%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-95.31%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-88.45%)
HelkThe Hunting ELK
Stars: ✭ 3,097 (+1018.05%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-6.14%)
bigkubeMinikube for big data with Scala and Spark
Stars: ✭ 16 (-94.22%)
docker-sparkApache Spark docker container image (Standalone mode)
Stars: ✭ 34 (-87.73%)
sentry-sparkApache Spark Sentry Integration
Stars: ✭ 14 (-94.95%)
confluent-spark-avroSpark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-93.5%)
spark-acidACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-67.15%)