Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

Stars: ✭ 29 (+3.57%)

Mutual labels: hadoop, bigdata

leaflet heatmap

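A simple visualization of Huzhou call data. Assuming the data volume is too large for a browser to draw the heatmap directly, the heatmap-drawing step is moved to offline analysis. Apache Spark first processes the data in parallel and is then also used to draw the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The drawing is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap drawing and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .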

Stars: ✭ 13 (-53.57%)

Mutual labels: hadoop, bigdata

Bigdata Notebook

Stars: ✭ 100 (+257.14%)

Mutual labels: hadoop, bigdata

Bigdataguide

Big data learning: learn big data from scratch, with learning videos and interview materials for each stage of study

Stars: ✭ 817 (+2817.86%)

Mutual labels: hadoop, bigdata

Apache Spark Hands On

Educational notes and hands-on problems with solutions for the Hadoop ecosystem

Stars: ✭ 74 (+164.29%)

Mutual labels: hadoop, bigdata

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (+100%)

Mutual labels: hadoop, bigdata

Big data architect skills

Skills a big data architect should master

Stars: ✭ 400 (+1328.57%)

Mutual labels: hadoop, bigdata

Bigdata Interview

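🎯 🌟[Big data interview questions] A collection of big data interview questions gathered from the web, with my own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks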

Stars: ✭ 857 (+2960.71%)

Mutual labels: hadoop, bigdata

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (+667.86%)

Mutual labels: hadoop, bigdata

bigdatatutorial

Stars: ✭ 34 (+21.43%)

Mutual labels: hadoop, bigdata

hive to es

A small tool for syncing Hive data warehouse data to Elasticsearch

Stars: ✭ 21 (-25%)

Mutual labels: hadoop

2019 egu workshop jupyter notebooks

Short course on interactive analysis of Big Earth Data with Jupyter Notebooks

Stars: ✭ 29 (+3.57%)

Mutual labels: bigdata

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+39.29%)

Mutual labels: hadoop

datacatalog-tag-manager

Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources -- currently supports the CSV file format

Stars: ✭ 17 (-39.29%)

Mutual labels: bigdata

HDFS-Netdisc

A distributed cloud storage system based on Hadoop 🌴

Stars: ✭ 56 (+100%)

Mutual labels: hadoop

gan deeplearning4j

Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.

Stars: ✭ 19 (-32.14%)

Mutual labels: bigdata

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (+182.14%)

Mutual labels: hadoop

lectures-hse-spark

Scalable machine learning and big data analysis with Apache Spark

Stars: ✭ 20 (-28.57%)

Mutual labels: bigdata

learning-hadoop-and-spark

Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning

Stars: ✭ 146 (+421.43%)

Mutual labels: hadoop

skein

A tool and library for easily deploying applications on Apache YARN

Stars: ✭ 128 (+357.14%)

Mutual labels: hadoop

corc

An ORC File Scheme for the Cascading data processing platform.

Stars: ✭ 14 (-50%)

Mutual labels: hadoop

anovos

Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark

Stars: ✭ 77 (+175%)

Mutual labels: bigdata

openPDC

Open Source Phasor Data Concentrator

Stars: ✭ 109 (+289.29%)

Mutual labels: hadoop

TiBigData

TiDB connectors for Flink/Hive/Presto

Stars: ✭ 192 (+585.71%)

Mutual labels: bigdata

awesome-coder-resources

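A refueling station on the programming road! ------ [Continuously updated... stars welcome, come back often......] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]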

Stars: ✭ 54 (+92.86%)

Mutual labels: bigdata

webhdfs

Node.js WebHDFS REST API client

Stars: ✭ 88 (+214.29%)

Mutual labels: hadoop

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

Stars: ✭ 21 (-25%)

Mutual labels: hadoop

Data-pipeline-project

Data pipeline project

Stars: ✭ 18 (-35.71%)

Mutual labels: hadoop

dpkb

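A collection of big data material, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse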

Stars: ✭ 123 (+339.29%)

Mutual labels: hadoop

chatnoir-resiliparse

A robust web archive analytics toolkit

Stars: ✭ 26 (-7.14%)

Mutual labels: bigdata

StreamBench

Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark

Stars: ✭ 52 (+85.71%)

Mutual labels: bigdata

TonY

TonY is a framework to natively run deep learning frameworks on Apache Hadoop.

Stars: ✭ 687 (+2353.57%)

Mutual labels: hadoop

oci-cloudera

Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)

Stars: ✭ 20 (-28.57%)

Mutual labels: hadoop

xxhadoop

Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes/code/demos. Don't fork, just star!

Stars: ✭ 37 (+32.14%)

Mutual labels: hadoop

Notes

This is a learning note | Java basics, JVM, source code, big data, interview experience

Stars: ✭ 69 (+146.43%)

Mutual labels: bigdata

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Stars: ✭ 126 (+350%)

Mutual labels: bigdata

PersonNotes

A personal notes collection: technical notes recorded in a quick-and-dirty style .. 📚☕️⌨️🎧

Stars: ✭ 61 (+117.86%)

Mutual labels: bigdata

bigquery-data-lineage

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

Stars: ✭ 112 (+300%)

Mutual labels: bigdata

zdh web

Big data collection and extraction platform

Stars: ✭ 292 (+942.86%)

Mutual labels: bigdata

yarn-prometheus-exporter

Export Hadoop YARN (resource-manager) metrics in prometheus format

Stars: ✭ 44 (+57.14%)

Mutual labels: hadoop

intersect

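Thoughts on an interview question: computing the intersection of 60 million data packets and 3 million data packets within a 50 MB memory limit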

Stars: ✭ 54 (+92.86%)

Mutual labels: bigdata

pyspark-ML-in-Colab

Pyspark in Google Colab: A simple machine learning (Linear Regression) model

Stars: ✭ 32 (+14.29%)

Mutual labels: hadoop

hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

Stars: ✭ 16 (-42.86%)

Mutual labels: hadoop

gomrjob

gomrjob - a Go Framework for Hadoop Map Reduce Jobs

Stars: ✭ 39 (+39.29%)

Mutual labels: hadoop

teraslice

Scalable data processing pipelines in JavaScript

Stars: ✭ 48 (+71.43%)

Mutual labels: hadoop

1-60 of 368 similar projects
