Top 123 hive open source projects

incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
BigData-News
A real-time big data system project for a news website, based on Spark 2.2
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Addax
Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.
Beezig
🐝 Beezig - The Hive plugin for 5zig.
docker-hive
Docker image for Apache Hive Metastore
swordfish
Open-source distributed workflow scheduling tool that also supports streaming tasks.
litemall-dw
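A big data project based on the open-source Litemall e-commerce project. It includes front-end event tracking (openresty+lua) and back-end event tracking, a five-layer data warehouse, real-time computation, and user profiling. The big data platform uses CDH 6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.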
documentr
A naive solution to document schemas
dlux open token
DLUX distributed deterministic finite state automata. Built for HIVE to take advantage of free transactions using multi-sig and escrow for security.
GooglePlay-Web-Crawler
Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
TitanDataOperationSystem
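The best big data project. The "Titan Data Operation System" is a full-stack, closed-loop system: a web system for data visualization, flume-kafka-flume for reading logs, a data warehouse designed in Hive, Spark code that transforms data between warehouse tables and migrates ADS-layer tables to MySQL, and Azkaban for scheduling timed jobs. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, Echart, and others.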
EngineeringTeam
A place for organizing the materials of the YBIGTA engineering team.
cloud
Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper
spark-waimai
A Spark-based big data platform analysis system for food delivery
ETL-Starter-Kit
📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.
hivemind
Hive API server (offloads most API calls from hived) implemented using Python+SQL
cobra-policytool
Manage Apache Atlas and Ranger configuration for your Hadoop environment.
hiveberg
Demonstration of a Hive Input Format for Iceberg
waggle-dance
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
DaFlow
Apache Spark-based Data Flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
apiary
Apiary provides modules which can be combined to create a federated cloud data lake
aaocp
A big data project that analyzes user behavior logs
HiveJdbcStorageHandler
No description or website provided.
hive-jdbc-driver
An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
hive-cube
Data self exporting and monitoring platform based on Hive data warehouse. https://hc.smartloli.org
liquibase-impala
Liquibase extension to add Impala Database support
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
beekeeper
Service for automatically managing and cleaning up unreferenced data
databricks-dbapi
DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
simple-ddl-parser
Simple DDL Parser to parse SQL (HQL, TSQL, AWS Redshift, BigQuery, Snowflake and other dialects) ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc. & table properties, types, domains, etc.
data-profiling
A set of scripts to pull metadata and data profiling metrics from relational database systems
xxhadoop
Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !
hiveql-parser
HiveQL Parser. Parses HiveQL code and prints the AST in JSON format on success; otherwise prints a well-formed syntax error message.
common-datax
A general-purpose data synchronization microservice based on DataX; a single RESTful interface handles all common data synchronization
fense
Fense is a database proxy written in Java that can connect to databases of different engines at the same time. Key features include authority management, query caching, security auditing, rate limiting and circuit breaking, onesql, and so on.
dockerfiles
Multiple Docker container images for the main Big Data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
last fm
A simple app to demonstrate a testable, maintainable, and scalable architecture for flutter. flutter_bloc, get_it, hive, and REST API are some of the tech stacks used in this project.
hive to es
A small tool for syncing Hive data warehouse data to Elasticsearch
regln
Windows Registry Linking Utility
logparser
Easy parsing of Apache HTTPD and NGINX access logs with Java, Hadoop, Hive, Pig, Flink, Beam, Storm, Drill, ...
Mzinga
Open-source software to play the board game Hive.
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
bigdata-doc
Big data study notes, learning roadmap, and a collection of technical case studies.
TiBigData
TiDB connectors for Flink/Hive/Presto
dpkb
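A summary of big data topics, including distributed storage engines, distributed computation engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse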
HiveRunner
An Open Source unit test framework for Hive queries based on JUnit 4 and 5
DataX-src
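DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.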