
timveil / Hive JDBC Uber Jar

Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version

Programming Languages

Java
68,154 projects; #9 most used programming language

Projects that are alternatives to or similar to Hive JDBC Uber Jar

hive-jdbc-driver
An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC
Stars: ✭ 31 (-83.51%)
Mutual labels:  hive, hadoop, jdbc, apache
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-91.49%)
Mutual labels:  hive, hadoop, apache
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+761.17%)
Mutual labels:  hive, hadoop, jdbc
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+2336.7%)
Mutual labels:  hadoop, jdbc, hive
implyr
SQL backend to dplyr for Impala
Stars: ✭ 74 (-60.64%)
Mutual labels:  hadoop, jdbc, apache
Hive
Apache Hive
Stars: ✭ 4,031 (+2044.15%)
Mutual labels:  hadoop, hive, apache
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-56.38%)
Mutual labels:  hadoop, hive
Wifi
A big data query and analysis system based on information captured over WiFi
Stars: ✭ 93 (-50.53%)
Mutual labels:  hadoop, hive
Bigdata Notes
A getting-started guide to big data ⭐
Stars: ✭ 10,991 (+5746.28%)
Mutual labels:  hadoop, hive
Haproxy Configs
80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (-43.62%)
Mutual labels:  hadoop, hive
Avro Hadoop Starter
Example MapReduce jobs in Java, Hive, Pig, and Hadoop Streaming that work on Avro data.
Stars: ✭ 110 (-41.49%)
Mutual labels:  hadoop, hive
Datax
DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto(Trino), PostgreSQL, SQL Server
Stars: ✭ 116 (-38.3%)
Mutual labels:  hadoop, hive
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+535.64%)
Mutual labels:  hadoop, hive
Apache Spark Hands On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (-60.64%)
Mutual labels:  hadoop, hive
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-51.06%)
Mutual labels:  hadoop, hive
Hive Funnel Udf
Hive UDFs for funnel analysis
Stars: ✭ 72 (-61.7%)
Mutual labels:  hadoop, hive
Docs4dev
Documentation for common back-end development frameworks with Chinese translations, covering the Spring family (Spring, Spring Boot, Spring Cloud, Spring Security, Spring Session), big data (Apache Hive, HBase, Apache Flume), logging (Log4j2, Logback), HTTP servers (NGINX, Apache), Python, and databases (OpenTSDB, MySQL, PostgreSQL), including the latest official docs and their Chinese translations.
Stars: ✭ 974 (+418.09%)
Mutual labels:  hive, apache
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-20.21%)
Mutual labels:  hadoop, apache
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-25.53%)
Mutual labels:  hadoop, hive
Movie recommend
A Spark-based movie recommendation system, including a crawler project, a web site, an admin backend, and the Spark recommendation engine
Stars: ✭ 2,092 (+1012.77%)
Mutual labels:  hadoop, hive

If you are feeling adventurous, I've been building my own Hive JDBC driver. Check out that project here: https://github.com/timveil/hive-jdbc.

Update

When I first created this project in 2014, the Hive project did not produce a "standalone" jar that reliably contained all required dependencies to successfully create a JDBC connection. Since that time the community has resolved many, if not all, of those early issues. As of today, the "standalone" jar published by recent versions of Hive makes this project mostly obsolete. You can grab the official Hive standalone jar using Maven (see below).

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>YOUR VERSION OF HIVE</version>
    <classifier>standalone</classifier>
</dependency>

Overview

Sometimes you want to connect your favorite database query or visualization tool to Hive. I've found that this can be quite cumbersome, typically requiring you to copy jars from the Hadoop cluster to some place locally that can be read by your tool. Recent versions of Hive include a "standalone" jar, but even this does not include all required dependencies. The goal of this simple Maven project is to easily pull the required jars into a single place locally and create an "uber" or "standalone" jar that can be referenced by any JDBC-compliant tool.
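
Once the uber jar is on a tool's classpath, connecting is plain JDBC. Below is a minimal standalone sketch; the host, port, database, credentials, and query are placeholders for your own cluster, not part of this project:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
        public static void main(String[] args) throws Exception {
            // Recent driver versions register org.apache.hive.jdbc.HiveDriver via the
            // JDBC service loader; older versions may need this explicit registration.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // placeholder host/port/database - adjust for your cluster
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }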

Please note this jar works with both kerberized and non-kerberized clusters. Configuring tools to work against kerberized clusters is typically more involved and requires an understanding of Kerberos. A full explanation of configuring Kerberos is outside the scope of this project.

You can download the latest binaries from the releases page: https://github.com/timveil/hive-jdbc-uber-jar/releases

  • Updated for Apache Hive 3.1.2 - 07/05/20 LATEST
  • Updated for HDP 2.6.5.0 - 07/04/18
  • Updated for HDP 2.6.3.0 - 11/01/17
  • Updated for HDP 2.6.1.0 - 06/08/17
  • Resolved issue with Zookeeper discovery - 01/27/17
  • Added support for kerberized clusters - 12/19/16
  • Updated for HDP 2.5.3.0 - 12/01/16
  • Updated for HDP 2.5.0.0 - 09/12/16

Note about Kerberos and the workaround

When connecting to a kerberized cluster, ultimately the class org.apache.hadoop.util.VersionInfo is loaded. This class attempts to load a file called *-version-info.properties in an effort to determine the current Hadoop version. To do this, the following snippet of code is called:

  protected VersionInfo(String component) {
    info = new Properties();
    String versionInfoFile = component + "-version-info.properties";
    InputStream is = null;
    try {
      is = Thread.currentThread().getContextClassLoader().getResourceAsStream(versionInfoFile);
      if (is == null) {
        throw new IOException("Resource not found");
      }
      info.load(is);
    } catch (IOException ex) {
      LogFactory.getLog(getClass()).warn("Could not read '" +
          versionInfoFile + "', " + ex.toString(), ex);
    } finally {
      IOUtils.closeStream(is);
    }
  }

When using DataGrip, this code executes successfully, but with DbVisualizer and other tools like SQuirreL SQL, the properties file is not found and errors are generated downstream. For example, the following error is often encountered when the properties file fails to load.

java.lang.RuntimeException: Illegal Hadoop Version: Unknown (expected A.B.* format)
   at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:168)
   at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:143)
   at org.apache.hadoop.hive.shims.ShimLoader.getHadoopThriftAuthBridge(ShimLoader.java:129)
   at org.apache.hive.service.auth.KerberosSaslHelper.getKerberosTransport(KerberosSaslHelper.java:54)
   at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:414)
   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:191)
   at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:155)
   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)

The trouble seems to be caused by the way org.apache.hadoop.util.VersionInfo attempts to load the properties file using Thread.currentThread().getContextClassLoader(). I suspect the difference in behavior between tools boils down to how each chooses to load the "uber" jar. In any event, I have overridden org.apache.hadoop.util.VersionInfo in this project to use a more robust approach for loading the properties file.

// Original code uses Thread.currentThread().getContextClassLoader() which does not
// contain the properties file in DbVisualizer or SQuirreLSQL

is = Thread.currentThread().getContextClassLoader().getResourceAsStream(versionInfoFile);

// My updated code first checks the Thread.currentThread().getContextClassLoader(), but if no InputStream is found
// then checks this.getClass().getClassLoader(). In my testing, this eliminated the above error in
// both DbVisualizer and SQuirreL SQL. The behavior of DataGrip was unchanged.

is = Thread.currentThread().getContextClassLoader().getResourceAsStream(versionInfoFile);
if (is == null) {
    LogFactory.getLog(getClass()).warn("Could not find '" +
            versionInfoFile + "' in Thread.currentThread().getContextClassLoader()."
            + "  Trying this.getClass().getClassLoader()");
    is = this.getClass().getClassLoader().getResourceAsStream(versionInfoFile);
}

This updated code is inserted into the final "uber" jar and replaces the copy of org.apache.hadoop.util.VersionInfo originally found in hadoop-common.jar.
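
For the curious, one way to achieve this replacement at build time is to exclude the original class when shading. Below is a minimal, hypothetical maven-shade-plugin filter illustrating the idea; the project's actual build configuration may differ:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <filters>
            <filter>
                <!-- hypothetical filter: drop the original VersionInfo from
                     hadoop-common so the overriding copy compiled in this
                     project is the one packaged into the uber jar -->
                <artifact>org.apache.hadoop:hadoop-common</artifact>
                <excludes>
                    <exclude>org/apache/hadoop/util/VersionInfo.class</exclude>
                </excludes>
            </filter>
        </filters>
    </configuration>
</plugin>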

Non-kerberized Setup

DbVisualizer (as of version 9.5.5)

Below is an example configuration using DbVisualizer:

  1. Open the Driver Manager dialog ("Tools" > "Driver Manager...") and hit the "Create a new driver" icon.

  2. Fill in the information as seen below. For the "Driver File Paths" you are pointing to hive-jdbc-uber-x.jar.

    jdbc:hive2://<server>:<port>/<database>
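
    For example, with HiveServer2 listening on its default port of 10000, the filled-in URL might look like (hostname is a placeholder):

    jdbc:hive2://hiveserver.example.com:10000/default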
    

  3. Create a new connection ("Database" > "Create Database Connection") and fill out the details based on your cluster as seen below.

  4. Hit the "Connect" button to test the connection. You should see something like the following in the "Connection Message" text area if the connection is successful.

    Apache Hive
    1.2.1000.2.5.3.0-37
    null
    null
    

JetBrains DataGrip (as of version 2016.3)

Below is an example configuration using JetBrains DataGrip:

  1. Under "File" > "Data Sources...", create a new Driver.

    jdbc:hive2://{host}:{port}/{database}[;<;,{:identifier}={:param}>]
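
    For example, assuming HiveServer2 is listening on its default port of 10000:

    jdbc:hive2://localhost:10000/default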
    

  2. Then create a new Project Data Source using the new Driver.

  3. After creating the Project Data Source, test the connection. A successful test reports connection details similar to the DbVisualizer example above.

Kerberized Setup

Connecting a JDBC tool to a kerberized cluster is a bit more complicated than connecting to a non-kerberized cluster. Before getting started, ensure the following:

  • The krb5.conf file on your workstation matches the one on your cluster

  • You have a valid Kerberos principal that can access the appropriate services on your cluster

  • You can successfully kinit from your workstation against the realm specified in your krb5.conf file

    # for example, from my Mac, I execute the following command
    kinit -t /[path to my keytab]/tveil.keytab
    
  • You have installed the JCE jars in the correct place (a quick way to verify this is shown after this list). The latest JCE jars have a modified date of 12/20/2013.

    • In DbVisualizer go to "Help" > "About" > "System Properties" and look for the java.home entry. This should point to the JRE used by the tool. Make sure the updated JCE jars are in the lib/security directory under this folder.
    • In DataGrip - todo
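
To verify that the unlimited-strength JCE policy files are actually in effect, a quick diagnostic is to ask the JVM for the maximum allowed AES key length. This small sketch is just a check and is not part of the driver; run it with the same JRE your tool uses (the java.home entry noted above):

    import javax.crypto.Cipher;

    public class JceCheck {
        public static void main(String[] args) throws Exception {
            // Prints 2147483647 (Integer.MAX_VALUE) when the unlimited-strength
            // policy files are installed; 128 indicates the default limited policy.
            System.out.println("Max AES key length: " + Cipher.getMaxAllowedKeyLength("AES"));
        }
    }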

DbVisualizer (as of version 9.5.5)

Below is an example configuration using DbVisualizer against a kerberized cluster:

  1. kinit with an appropriate principal and launch DbVisualizer

  2. Open DbVisualizer preferences ("DbVisualizer" > "Preferences") and add the following properties. DbVisualizer will need to be restarted after applying these changes.

    # optional flag to help debug kerberos issues
    -Dsun.security.krb5.debug=true
    
    -Djavax.security.auth.useSubjectCredsOnly=false
    
    # Depending on your OS or Kerberos install, you may need to pass in the location of your krb5 configuration file.
    # On mac/linux this is often a krb5.conf file.  On windows, this is often a krb5.ini file.
    -Djava.security.krb5.conf="[path-to-kerberos-file]/krb5.ini"
    

  3. Open the Driver Manager dialog ("Tools" > "Driver Manager...") and hit the "Create a new driver" icon.

  4. Fill in the information as seen below. For the "Driver File Paths" you are pointing to hive-jdbc-uber-x.jar.

    jdbc:hive2://<server>:<port>/<database>
    

  5. Create a new connection ("Database" > "Create Database Connection") and fill out the details based on your cluster as seen below. Please note that for kerberized connections you must append the "principal" parameter to the "database" portion of the URL, as shown below.
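
    For example, assuming the HiveServer2 service principal is hive/_HOST@EXAMPLE.COM (realm and principal are placeholders for your environment):

    jdbc:hive2://<server>:10000/default;principal=hive/_HOST@EXAMPLE.COM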

  6. Hit the "Connect" button to test the connection. You should see something like the following in the "Connection Message" text area if the connection is successful.

    Apache Hive
    1.2.1000.2.5.3.0-37
    null
    null
    

JetBrains DataGrip (as of version 2016.3)

Below is an example configuration using JetBrains DataGrip against a kerberized cluster:

  1. kinit with an appropriate principal and launch DataGrip

  2. Under "File" > "Data Sources...", create a new Driver.

    jdbc:hive2://{host}:{port}/{database}[;<;,{:identifier}={:param}>]
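
    For example, assuming the HiveServer2 service principal is hive/_HOST@EXAMPLE.COM (realm and principal are placeholders for your environment):

    jdbc:hive2://localhost:10000/default;principal=hive/_HOST@EXAMPLE.COM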
    

  3. Then create a new Project Data Source using the new Driver.

    General Tab

    # optional flag to help debug kerberos issues
    -Dsun.security.krb5.debug=true
    
    -Djavax.security.auth.useSubjectCredsOnly=false
    
    # Depending on your OS or Kerberos install, you may need to pass in the location of your krb5 configuration file.
    # On mac/linux this is often a krb5.conf file.  On windows, this is often a krb5.ini file.
    -Djava.security.krb5.conf="[path-to-kerberos-file]/krb5.ini"
    

    Advanced Tab

  4. After creating the Project Data Source, test the connection. A successful test reports connection details similar to the DbVisualizer example above.

How to Build

To build locally, you must have Maven installed and properly configured. After that, it's as simple as running mvn package. A file called hive-jdbc-uber-x.jar will be created in your target directory. The newly created jar will contain the Hive JDBC driver as well as all required dependencies.
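
Assuming git and Maven are on your path, the whole sequence looks something like this (the exact jar name depends on the project version):

    git clone https://github.com/timveil/hive-jdbc-uber-jar.git
    cd hive-jdbc-uber-jar
    mvn clean package
    ls target/hive-jdbc-uber-*.jar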

Logging

All logging dependencies in this jar have been filtered and bridged with SLF4J, and Log4J has been included as the logging implementation. While no log4j.properties file is bundled in the jar, it's fairly easy to configure Log4J and DbVisualizer to debug what's happening inside JDBC. To set up Log4J in DbVisualizer, do the following.

  1. Create a log4j.properties file and put it somewhere easy to find on your workstation. Below is a very simple example.

    log4j.rootLogger=WARN, console

    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss:SSS} %-5p [%c]: %m%n

    log4j.logger.org.apache.hive=DEBUG
    log4j.logger.org.apache.hadoop=DEBUG
    log4j.logger.org.apache.thrift=DEBUG

  2. Add the following JVM flag to DbVisualizer preferences (see "DbVisualizer" > "Preferences").

    -Dlog4j.configuration=file:[path to your log4j.properties file]/log4j.properties

You should now see driver logging in the DbVisualizer logs.

If the configuration doesn't work or you elect not to provide a log4j.properties file reference, you will see the following messages in your DbVisualizer log. You can ignore these warnings if you are not interested in the driver logs.

2017-08-09 17:24:39.750 INFO   786 [ExecutorRunner-pool-2-thread-1 - R.flush] log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
2017-08-09 17:24:39.751 INFO   786 [ExecutorRunner-pool-2-thread-1 - R.flush] log4j:WARN Please initialize the log4j system properly.
2017-08-09 17:24:39.751 INFO   786 [ExecutorRunner-pool-2-thread-1 - R.flush] log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.