📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL-related work.

Stars: ✭ 21 (-99.65%)

Mutual labels: hive, bigdata, azkaban

Technology Talk

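A collection of knowledge on commonly used technical frameworks in the Java ecosystem, open-source middleware, system architecture, databases, architecture case studies from large companies, commonly used third-party libraries, project management, production troubleshooting, personal growth, and reflections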

Stars: ✭ 12,136 (+102%)

Mutual labels: kafka, spark, hbase

Bigdata practice

Big data analysis and visualization in practice

Stars: ✭ 166 (-97.24%)

Mutual labels: kafka, bigdata, hive

Bigdata Playground

A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, Node.js, Angular, GraphQL

Stars: ✭ 177 (-97.05%)

Mutual labels: kafka, hadoop, hbase

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (-72.87%)

Mutual labels: hadoop, hdfs, spark

Eel Sdk

Big Data Toolkit for the JVM

Stars: ✭ 140 (-97.67%)

Mutual labels: kafka, hadoop, hive

Gimel

Big Data Processing Framework - Unified Data API or SQL on Any Storage

Stars: ✭ 216 (-96.4%)

Mutual labels: kafka, spark, hbase

Every Single Day I Tldr

A daily digest of the articles or videos I've found interesting, that I want to share with you.

Stars: ✭ 249 (-95.86%)

Mutual labels: kafka, spark, bigdata

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (-93.34%)

Mutual labels: kafka, hadoop, hdfs

Devicehive Java Server

DeviceHive Java Server

Stars: ✭ 241 (-95.99%)

Mutual labels: zookeeper, kafka

docker-hadoop

Docker image for the main Apache Hadoop components (YARN/HDFS)

Stars: ✭ 59 (-99.02%)

Mutual labels: hadoop, hdfs

phoenix

Apache Phoenix / HBase Spring Boot Microservices

Stars: ✭ 23 (-99.62%)

Mutual labels: hadoop, hbase

Thunder

⚡️ Nepxion Thunder is a distributed RPC framework based on Netty + Hessian + Kafka + ActiveMQ + Tibco + Zookeeper + Redis + Spring Web MVC + Spring Boot + Docker, supporting multiple protocols, multiple components, and multiple serialization formats

Stars: ✭ 204 (-96.6%)

Mutual labels: zookeeper, kafka

bigdatatutorial

Stars: ✭ 34 (-99.43%)

Mutual labels: hadoop, bigdata

kafka-connect-fs

Kafka Connect FileSystem Connector

Stars: ✭ 107 (-98.22%)

Mutual labels: hadoop, hdfs

Zenko

Zenko is the open source multi-cloud data controller: own and keep control of your data on any cloud.

Stars: ✭ 353 (-94.12%)

Mutual labels: zookeeper, kafka

orion

Management and automation platform for Stateful Distributed Systems

Stars: ✭ 77 (-98.72%)

Mutual labels: hadoop, hbase

teraslice

Scalable data processing pipelines in JavaScript

Stars: ✭ 48 (-99.2%)

Mutual labels: hadoop, hdfs

Firecamp

Serverless platform for stateful services

Stars: ✭ 194 (-96.77%)

Mutual labels: zookeeper, kafka

Lidea

Real-time monitoring platform for large-scale distributed systems

Stars: ✭ 28 (-99.53%)

Mutual labels: hbase, flink

Real-time-log-analysis-system

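🐧 Real-time log processing and analysis system based on Spark Streaming + Flume + Kafka + HBase (available as a console version and as a Web UI visualization version built on Spring Boot, ECharts, and others)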

Stars: ✭ 31 (-99.48%)

Mutual labels: hbase, flume

logparser

Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...

Stars: ✭ 139 (-97.69%)

Mutual labels: hive, flink

smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

Stars: ✭ 79 (-98.69%)

Mutual labels: hive, hadoop

HDFS-Netdisc

Distributed cloud storage system based on Hadoop 🌴

Stars: ✭ 56 (-99.07%)

Mutual labels: hadoop, hdfs

hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

Stars: ✭ 16 (-99.73%)

Mutual labels: hive, hadoop

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (-93.46%)

Mutual labels: spark, hadoop

Kafdrop

Kafka Web UI

Stars: ✭ 3,158 (-47.44%)

Mutual labels: zookeeper, kafka

qs-hadoop

Learning the big data ecosystem

Stars: ✭ 18 (-99.7%)

Mutual labels: hadoop, bigdata

disk

Distributed network disk system built with Hadoop + HBase + Spring Boot

Stars: ✭ 53 (-99.12%)

Mutual labels: hadoop, hbase

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (-93.13%)

Mutual labels: kafka, spark

Ytk Learn

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (-94.39%)

Mutual labels: spark, hadoop

skein

A tool and library for easily deploying applications on Apache YARN

Stars: ✭ 128 (-97.87%)

Mutual labels: hadoop, hdfs

Kafka Connect Ui

Web tool for Kafka Connect

Stars: ✭ 388 (-93.54%)

Mutual labels: kafka, hdfs

learning-spark

Tidied-up Spark and Hadoop tutorials.

Stars: ✭ 28 (-99.53%)

Mutual labels: hadoop, bigdata

mango

Core utility library & data connectors designed for simpler usage in Scala

Stars: ✭ 41 (-99.32%)

Mutual labels: hbase, zookeeper

coolplayflink

Flink: Stateful Computations over Data Streams

Stars: ✭ 14 (-99.77%)

Mutual labels: bigdata, flink

Wirbelsturm

Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.

Stars: ✭ 332 (-94.47%)

Mutual labels: kafka, spark

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

Stars: ✭ 17 (-99.72%)

Mutual labels: hive, hadoop

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (-99.62%)

Mutual labels: hive, hadoop

common-datax

General-purpose data synchronization microservice based on DataX; a single RESTful interface handles all general data synchronization

Stars: ✭ 51 (-99.15%)

Mutual labels: hive, azkaban

flink-learn

Learning Flink: Flink CEP, Flink Core, Flink SQL

Stars: ✭ 70 (-98.83%)

Mutual labels: bigdata, flink

Moonbox

Moonbox is a DVtaaS (Data Virtualization as a Service) Platform

Stars: ✭ 424 (-92.94%)

Mutual labels: spark, hive

Gather Deployment

Gathers scalable TensorFlow and infrastructure deployments

Stars: ✭ 326 (-94.57%)

Mutual labels: kafka, hadoop

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (-99.48%)

Mutual labels: hive, hadoop

darwin

Avro Schema Evolution made easy

Stars: ✭ 26 (-99.57%)

Mutual labels: hadoop, hbase

datasqueeze

Hadoop utility to compact small files

Stars: ✭ 18 (-99.7%)

Mutual labels: hadoop, hdfs

DaFlow

Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-99.6%)

Mutual labels: hive, hadoop

Marmaray

Generic Data Ingestion & Dispersal Library for Hadoop

Stars: ✭ 414 (-93.11%)

Mutual labels: spark, hadoop

fsbrowser

Fast desktop client for Hadoop Distributed File System

Stars: ✭ 27 (-99.55%)

Mutual labels: hadoop, hdfs

flokkr

Documentation placeholder and utilities for all the other containers.