Top 123 hive open source projects

A SQL-like language for performing ETL transformations.

✭ 28

scala cli aws tsv json query sdk csv sql kafka hive avro athena etl s3 kafka-consumer kafka-producer delimited-data etl-framework psv delimited

incubator-linkis

Linkis helps you easily connect to various back-end computation/storage engines (Spark, Python, TiDB, ...), exposes various interfaces (REST, JDBC, Java, ...), and provides multi-tenancy, high performance, and resource control.

BigData-News

A real-time big data system for a news website, based on Spark 2.2

✭ 36

scala java shell kafka spark hive hadoop hbase flume cdh5 sturctured-streaming

spark-acid

ACID Data Source for Apache Spark based on Hive ACID

✭ 91

scala ANTLR shell big-data spark hive acid hive-acid

Addax

Addax is an open-source universal ETL tool that supports most RDBMSs and NoSQL databases, helping you transfer data from any one place to another.

✭ 615

java python mysql database influxdb hive hadoop etl clickhouse excel kudu impala oracle db2 sqlserver data-integrity datax trino prestosql addax

Beezig

🐝 Beezig - The Hive plugin for 5zig.

✭ 16

java plugin minecraft hive pvp hivemc 5zig timv

docker-hive

Docker image for Apache Hive Metastore

✭ 42

Dockerfile shell Makefile docker hive

mutant-swarm

Mutation testing framework and code coverage for Hive SQL

✭ 20

java StringTemplate CSS coverage sql hive mutation-testing coverage-report junit code-coverage unit-test sql-coverage

swordfish

An open-source distributed workflow scheduling tool that also supports streaming tasks.

✭ 35

java python spark hive hadoop scheduler hbase

litemall-dw

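A big data project built on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking, a data warehouse (five layers), real-time computing, and user profiling. The big data platform uses CDH 6.3.2 (already scripted with vagrant+ansible) and also includes Azkaban workflows.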

✭ 36

java javascript Vue scala SCSS CSS redis vagrant kafka spring-boot hive solr clickhouse hbase spark-streaming openresty flume oozie flink azkaban spark-sql maxwell cdh6

documentr

A naive solution to document schemas

✭ 24

python HTML Makefile shell documentation sql hive

TIL

Today I Learned

✭ 43

shell kubernetes raspberry-pi elasticsearch mongo query kibana logstash sql kafka spring spring-boot hive hadoop pipeline algorithms

dlux open token

DLUX distributed deterministic finite state automata. Built for HIVE to take advantage of free transactions using multi-sig and escrow for security.

✭ 16

javascript hive ipfs dlux

GooglePlay-Web-Crawler

MapReduce project using Hadoop, Nutch, AWS EMR, Pig, Tez, and Hive

✭ 18

java PigLatin emr aws hive hadoop nutch s3 pig mapreduce

TitanDataOperationSystem

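The best big data project: "Titan Data Operations System". This project is a full-stack, closed-loop system: a web system for data visualization, a Flume-Kafka-Flume pipeline for reading logs, a data warehouse designed in Hive, Spark code that transforms data between warehouse tables and migrates ADS-layer tables to MySQL, and Azkaban for scheduling periodic jobs. Technologies used: Java/Scala, Hadoop, Spark, Hive, Kafka, Flume, Azkaban, SpringBoot, Bootstrap, ECharts, etc.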

✭ 62

javascript HTML CSS SCSS java scala kafka spark hive hadoop flume azkaban

EngineeringTeam

A place for organizing the materials of the YBIGTA engineering team.

✭ 41

engineering sql kafka spark hive hadoop nosql crawling ybigta

cloud

Cloud computing: environment setup and configuration files for Hadoop, Hive, Hue, Oozie, Sqoop, HBase, and ZooKeeper

✭ 48

shell hive hadoop hbase zookeeper pig flume oozie hue sqoop flume-ng

spark-waimai

A Spark-based big data analysis platform for food delivery

✭ 24

scala python shell spark hive

ETL-Starter-Kit

📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage, especially in data warehousing. This repository contains a starter kit for ETL-related work.

✭ 21

scala groovy hive gradle bigdata datascience pig scalding azkaban datamining starter-project etl-framework mapreduce-jobs

hivemind

Hive API server (offloads most API calls from hived) implemented using Python+SQL

✭ 46

python PLpgSQL shell HTML CMake Dockerfile hive blockchain web3 decentralization communities dapps

cobra-policytool

Manage Apache Atlas and Ranger configuration for your Hadoop environment.

✭ 16

python utility database hive hadoop ranger datawarehouse atlas

hiveberg

Demonstration of a Hive Input Format for Iceberg

✭ 22

java hive data-lake iceberg

web-click-flow

Offline log analysis of website clickstreams

✭ 14

java shell python hive hadoop etl mapreduce flume sqoop

BigDataTools

Tools for big data

✭ 36

java elasticsearch kafka hive bigdata hbase hdfs

waggle-dance

Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.

✭ 194

java shell hive federation metastore hive-metastore

DaFlow

An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

✭ 24

scala shell Dockerfile json csv apache-spark hive hadoop avro etl parquet transformation-rules etl-framework etl-pipeline join-data

apiary

Apiary provides modules which can be combined to create a federated cloud data lake

✭ 30

aws hive datalake hive-metastore

aaocp

A big data project for analyzing user behavior logs

✭ 53

PLpgSQL scala nginx spark hive hadoop hbase zookeeper hdfs flume echarts

HiveJdbcStorageHandler

No description or website provided.

✭ 21

java hive jdbc storagehandler

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

✭ 31

java Thrift shell hive hadoop jdbc apache thrift

hive-cube

Data self exporting and monitoring platform based on Hive data warehouse. https://hc.smartloli.org

✭ 34

javascript java shell CSS Batchfile PLpgSQL hive hive-cube

liquibase-impala

Liquibase extension to add Impala Database support

✭ 23

java hive hadoop impala database-migrations liquibase

hadoop-etl-udfs

The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL

✭ 17

java python hive hadoop parquet udf exasol hcatalog user-defined-function exasol-integration

beekeeper

Service for automatically managing and cleaning up unreferenced data

✭ 43

java big-data hive s3 maintenance cleanup metastore hive-metastore oss-portal-featured

databricks-dbapi

DBAPI and SQLAlchemy dialect for Databricks Workspace and SQL Analytics clusters

✭ 21

python Makefile sqlalchemy hive dbapi databricks

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

✭ 56

java scala shell spark hive hadoop excel bigdata office poi flink hadoop-ecosystem hadoopoffice analyze-office-documents

simple-ddl-parser

Simple DDL parser that parses SQL DDL files (HQL, T-SQL, AWS Redshift, BigQuery, Snowflake, and other dialects) into JSON or Python dicts with full information about columns (types, defaults, primary keys, etc.) and table properties, types, domains, etc.

✭ 76

python mysql parser sql hive postgresql snowflake sql-parser ddl schemas mssql oracle-db columns redshift hacktoberfest oracle-database ddl-parser hql ddls

data-profiling

A set of scripts to pull metadata and data profiling metrics from relational database systems

✭ 57

python metadata sql database hive inventory oracle sqlserver data-profiling

xxhadoop

Data analysis using Hadoop, Spark, Storm, Elasticsearch, machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!

✭ 37

java scala shell elasticsearch kafka spark hive hadoop storm hbase zookeeper spark-streaming mr hadoop-rpc

hiveql-parser

HiveQL parser. Parses HiveQL code and prints the AST in JSON format on success; otherwise prints a well-formed syntax error message.

✭ 25

java parser sql hive syntax-checker

common-datax

A general-purpose data synchronization microservice based on DataX: a single RESTful interface handles all general data synchronization.

✭ 51

java FreeMarker redis hive freemarker mybatis azkaban mybatis-plus datax springboot2 dynamic-datasource

BigInsights-on-Apache-Hadoop

Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix

✭ 21

spark hive hadoop hbase spark-streaming ibm-bluemix oozie ambari zeppelin webhdfs knox biginsights bigsql

awesome-hive

A curated list of awesome Hive resources.

✭ 20

awesome hive decentralized blockchain blog-engine collections developer-tools awesome-list communities blockchain-platform dapps censorship-resistance

fense

Fense is a database proxy written in Java that can connect to databases of different engines at the same time. Key features include authority management, query caching, audit security, rate limiting and circuit breaking, OneSQL, and more.

✭ 22

javascript java CSS Less HTML mysql hive clickhouse cache avatica calcite tidb doris dbproxy oneservice onesql

dockerfiles

Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, ZooKeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

✭ 29

shell Dockerfile python Makefile Batchfile XSLT javascript dockerfile kafka spark cassandra hive hadoop docker-image bigdata hbase zookeeper mesos hue flink zeppelin drill

the-apache-ignite-book

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above.

✭ 65

java streaming memoization sql spark hive hadoop spring-data bigdata hibernate distributed-database ignite nosql-database in-memory-database streaming-data gridgain hibernate-ogm in-memory-computations in-memory-caching

hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

✭ 16

java Dockerfile bigquery google hive hadoop gcp apache

last fm

A simple app demonstrating a testable, maintainable, and scalable architecture for Flutter. flutter_bloc, get_it, hive, and a REST API are some of the technologies used in this project.

✭ 134