timveil / hive-jdbc-driver

License: Apache-2.0
An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Programming Languages: Java, Thrift, Shell

Projects that are alternatives of or similar to hive-jdbc-driver

Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+506.45%)
Mutual labels:  hive, hadoop, jdbc, apache
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+1070.97%)
Mutual labels:  hive, jdbc, thrift
Drill
Apache Drill is a distributed MPP query layer for self describing data
Stars: ✭ 1,619 (+5122.58%)
Mutual labels:  hive, hadoop, jdbc
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-48.39%)
Mutual labels:  hive, hadoop, apache
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+14677.42%)
Mutual labels:  hive, hadoop, jdbc
implyr
SQL backend to dplyr for Impala
Stars: ✭ 74 (+138.71%)
Mutual labels:  hadoop, jdbc, apache
Hive
Apache Hive
Stars: ✭ 4,031 (+12903.23%)
Mutual labels:  hive, hadoop, apache
dpkb
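A collection of big data material, covering distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse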
Stars: ✭ 123 (+296.77%)
Mutual labels:  hive, hadoop
bigdata-doc
Big data study notes, a learning roadmap, and a collection of technical case studies.
Stars: ✭ 37 (+19.35%)
Mutual labels:  hive, hadoop
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-45.16%)
Mutual labels:  hive, hadoop
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+154.84%)
Mutual labels:  hive, hadoop
hive to es
A small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (-32.26%)
Mutual labels:  hive, hadoop
apache-flink-jdbc-streaming
Sample project for Apache Flink with Streaming Engine and JDBC Sink
Stars: ✭ 22 (-29.03%)
Mutual labels:  jdbc, apache
yarn-prometheus-exporter
Export Hadoop YARN (resource-manager) metrics in prometheus format
Stars: ✭ 44 (+41.94%)
Mutual labels:  hadoop, apache
Facebook Hive Udfs
Facebook's Hive UDFs
Stars: ✭ 213 (+587.1%)
Mutual labels:  hive, hadoop
liquibase-impala
Liquibase extension to add Impala Database support
Stars: ✭ 23 (-25.81%)
Mutual labels:  hive, hadoop
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+419.35%)
Mutual labels:  hive, hadoop
BigInsights-on-Apache-Hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
Stars: ✭ 21 (-32.26%)
Mutual labels:  hive, hadoop
dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-6.45%)
Mutual labels:  hive, hadoop
xxhadoop
Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !
Stars: ✭ 37 (+19.35%)
Mutual labels:  hive, hadoop

Hive JDBC Driver

This project is an alternative to the JDBC driver that is bundled with the Apache Hive project. The desire to build it grew out of my experience maintaining the Hive JDBC "uber jar" project, which attempted to produce a smaller, more complete standalone driver jar by crafting an alternative Maven pom file. While that effort mostly succeeded in creating a slightly smaller jar, I felt that more could be done to improve the Hive JDBC experience.

As I started building out this project I realized that I wanted to deviate significantly from the existing Apache implementation. As a result, this project does not attempt to be URL compatible or even feature compatible with the existing Apache driver. One obvious consequence is that existing JDBC connection strings/URLs that work with the Apache driver WILL NOT WORK with this driver without modification. I've provided a mapping for existing URL properties, as well as plenty of examples.

Another significant deviation from the Apache implementation is the absence of Hadoop or Hive dependencies and their transitive dependency graphs. The only bridge to Hive in this driver is the Thrift Interface Description Language (IDL) file and the Java bindings generated from it. All necessary code was rewritten from the ground up with an emphasis on eliminating external dependencies. This has the clear benefit of significantly reducing jar size and reducing opportunities for class conflicts. See size comparison below:

Note: the standalone jar for Hive 1.2.x does not contain all necessary dependencies, so its listed size is not an accurate representation of the real size.

Areas of Focus

The following are broad areas where I have attempted to expand or improve on the existing Hive driver:

  • Jar Size - focused on creating a smaller, more portable jar
  • Dependency Graph - because JDBC drivers are often embedded in other applications it is important to limit the number of external dependencies that are shaded into the final jar. Shaded dependencies are often the source of size bloat and classloader conflicts. Every effort has been made to limit the number of external dependencies.
  • Logging - logging inside Hadoop dependencies and Hive is often a confusing mix of logging frameworks. This driver works to provide clearer logging through the Log4j 2 API.
  • JDBC Compatibility - it is doubtful that Hive will ever allow true JDBC specification compatibility; the underlying datastore simply doesn't yet (and may never) provide many of the required concepts. Having said that, there are plenty of methods and interfaces within the JDBC spec that the Apache driver has not implemented but could have. I've attempted to rectify that.
  • Documentation - the existing Hive documentation can be difficult to follow. For example, there doesn't seem to be a good single point of reference for all supported URL parameters. Instead, the complete picture of options must be gleaned from a handful of examples and sources. This makes setting up connections difficult.
  • Simplification - the existing driver supports concepts like "embedded mode", which add complexity to connection logic and require server-side dependencies. If you need "embedded mode", this driver is not for you.
  • External Configuration - in my experience you often need to add Java VM options (-Dsome_config) to get Hive JDBC working or to enable debugging. This is especially prevalent when dealing with Kerberos. This driver moves some of those common configuration flags to URL properties.
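The "External Configuration" idea above can be illustrated with a minimal sketch. It is not this driver's implementation. The URL scheme `jdbc:example://`, the `?key=value;key=value` property syntax, and the URL property names `krb5Debug` and `krb5Conf` are all hypothetical and chosen only for illustration. The system properties they map to (`sun.security.krb5.debug`, `java.security.krb5.conf`) are standard JDK Kerberos settings that would otherwise be passed as `-D` options.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UrlPropertySketch {

    // Hypothetical URL property names (left) mapped to real JDK
    // Kerberos system properties (right). The left-hand names are
    // assumptions for illustration, not this driver's actual properties.
    private static final Map<String, String> URL_TO_SYSTEM_PROPERTY = Map.of(
            "krb5Debug", "sun.security.krb5.debug",
            "krb5Conf", "java.security.krb5.conf");

    /** Parses "key=value" pairs that follow the first '?' and are separated by ';'. */
    public static Map<String, String> parseProperties(String url) {
        Map<String, String> result = new LinkedHashMap<>();
        int start = url.indexOf('?');
        if (start < 0) {
            return result; // no property section in the URL
        }
        for (String pair : url.substring(start + 1).split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip empty or malformed segments
                result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return result;
    }

    /** Copies recognized URL properties into JVM system properties and returns what was set. */
    public static Map<String, String> applySystemProperties(Map<String, String> urlProperties) {
        Map<String, String> applied = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : urlProperties.entrySet()) {
            String systemKey = URL_TO_SYSTEM_PROPERTY.get(entry.getKey());
            if (systemKey != null) {
                System.setProperty(systemKey, entry.getValue());
                applied.put(systemKey, entry.getValue());
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        // Hypothetical URL; replaces "-Dsun.security.krb5.debug=true" on the command line.
        String url = "jdbc:example://host:10000/default?krb5Debug=true;krb5Conf=/etc/krb5.conf";
        System.out.println(applySystemProperties(parseProperties(url)));
    }
}
```

The point of the design is that the connection URL becomes the single place where configuration lives, which matters when the driver is embedded in a tool whose JVM options you cannot easily change.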

Current State

This project is pre-alpha and should be considered experimental at this point. Currently it is built against the Hortonworks repos, but it will soon be switched to follow the Apache released versions more closely.