qwery: A SQL-like language for performing ETL transformations.
incubator-linkis: Linkis connects easily to various back-end computation/storage engines (Spark, Python, TiDB, ...) and exposes various interfaces (REST, JDBC, Java, ...), with multi-tenancy, high performance, and resource control.
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Addax: Addax is an open source universal ETL tool that supports most of the RDBMS and NoSQL databases on the planet, helping you transfer data from any one place to another.
Beezig: 🐝 Beezig - The Hive plugin for 5zig.
mutant-swarm: Mutation testing framework and code coverage for Hive SQL
swordfish: Open-source distributed workflow scheduling tool that also supports streaming tasks.
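litemall-dw: A big data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking, a data warehouse (five layers), real-time computing, and user profiling. The big data platform uses CDH6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.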
documentr: A naive solution to document schemas
dlux open token: DLUX distributed deterministic finite state automata. Built for HIVE to take advantage of free transactions using multi-sig and escrow for security.
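TitanDataOperationSystem: The best big data project. The "Titan Data Operation System" is a full-stack, closed-loop system: a web system for data visualization, log ingestion via flume-kafka-flume, a data warehouse designed in Hive, Spark code for transforming between warehouse tables and migrating ADS-layer tables to MySQL, and Azkaban for scheduling timed jobs. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, ECharts, etc.
cloud: Environment setup and configuration files for cloud computing with hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper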
ETL-Starter-Kit: 📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL-related work.
hivemind: Hive API server (offloads most API calls from hived) implemented using Python+SQL
cobra-policytool: Manage Apache Atlas and Ranger configuration for your Hadoop environment.
hiveberg: Demonstration of a Hive Input Format for Iceberg
waggle-dance: Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
DaFlow: Apache Spark based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
apiary: Apiary provides modules which can be combined to create a federated cloud data lake
hive-jdbc-driver: An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
hive-cube: Data self exporting and monitoring platform based on Hive data warehouse. https://hc.smartloli.org
hadoop-etl-udfs: The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
beekeeper: Service for automatically managing and cleaning up unreferenced data
databricks-dbapi: DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
hadoopoffice: HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
simple-ddl-parser: Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc. & table properties, types, domains, etc.
data-profiling: A set of scripts to pull metadata and data profiling metrics from relational database systems
xxhadoop: Data analysis using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. These are my daily notes/code/demos. Don't fork, just star!
hiveql-parser: HiveQL Parser. Parses HiveQL code and prints the AST in JSON format on success; otherwise prints a well-formed syntax error message.
fense: Fense is a database proxy written in Java that can connect to databases of different engines at the same time. The key features are: authority management, query cache, audit security, rate limiting and circuit breaking, onesql and so on
dockerfiles: Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
the-apache-ignite-book: All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
last fm: A simple app to demonstrate a testable, maintainable, and scalable architecture for Flutter. flutter_bloc, get_it, hive, and REST API are some of the technologies used in this project.
regln: Windows Registry Linking Utility
logparser: Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Mzinga: Open-source software to play the board game Hive.
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines
TiBigData: TiDB connectors for Flink/Hive/Presto
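dpkb: A collection of big data topics, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse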
HiveRunner: An Open Source unit test framework for Hive queries based on JUnit 4 and 5
Sub-Track: Flutter Application to keep track of Subscriptions
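DataX-src: DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.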