
exasol / hadoop-etl-udfs

License: MIT
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL.

Programming Languages

java
68154 projects - #9 most used programming language
python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to hadoop-etl-udfs

DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple types of read and write destinations and multiple categories of transformation rules.
Stars: ✭ 24 (+41.18%)
Mutual labels:  hive, hadoop, parquet
Drill
Apache Drill is a distributed MPP query layer for self-describing data
Stars: ✭ 1,619 (+9423.53%)
Mutual labels:  hive, hadoop, parquet
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (+723.53%)
Mutual labels:  hive, hadoop, parquet
Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (+1005.88%)
Mutual labels:  hive, hadoop
r-exasol
The EXASOL package for R provides an interface to the EXASOL database.
Stars: ✭ 22 (+29.41%)
Mutual labels:  exasol, exasol-integration
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB, ...), exposes various interfaces (REST, JDBC, Java, ...), and provides multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+13564.71%)
Mutual labels:  hive, udf
Movie recommend
A Spark-based movie recommendation system, including a web crawler, a web site, a back-office administration system, and the Spark recommendation engine itself
Stars: ✭ 2,092 (+12205.88%)
Mutual labels:  hive, hadoop
sqlalchemy exasol
SQLAlchemy dialect for EXASOL
Stars: ✭ 34 (+100%)
Mutual labels:  exasol, exasol-integration
Facebook Hive Udfs
Facebook's Hive UDFs
Stars: ✭ 213 (+1152.94%)
Mutual labels:  hive, hadoop
dpkb
A collection of big data material covering distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse
Stars: ✭ 123 (+623.53%)
Mutual labels:  hive, hadoop
hive to es
A small tool for synchronizing data from a Hive data warehouse to Elasticsearch
Stars: ✭ 21 (+23.53%)
Mutual labels:  hive, hadoop
xxhadoop
Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, Just star !
Stars: ✭ 37 (+117.65%)
Mutual labels:  hive, hadoop
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (+229.41%)
Mutual labels:  hive, hadoop
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+847.06%)
Mutual labels:  hive, hadoop
Presto
The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+76117.65%)
Mutual labels:  hive, hadoop
spark-connector
A connector for Apache Spark to access Exasol
Stars: ✭ 13 (-23.53%)
Mutual labels:  exasol, exasol-integration
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+364.71%)
Mutual labels:  hive, hadoop
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-5.88%)
Mutual labels:  hive, hadoop
Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+641.18%)
Mutual labels:  hive, hadoop
dockerfiles
Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (+70.59%)
Mutual labels:  hive, hadoop

Hadoop ETL UDFs


Overview

Hadoop ETL UDFs are the main way to transfer data between Exasol and Hadoop (HCatalog tables on HDFS). The SQL syntax for calling the UDFs is similar to that of Exasol's native IMPORT and EXPORT commands, but with additional UDF parameters for specifying the necessary and optional Hadoop properties.

A brief overview of features includes support for:

  • HCatalog Metadata (e.g., table location, columns, partitions).
  • Multiple file formats (e.g., Parquet, ORC, RCFile)
  • HDFS HA
  • Partitions
  • Parallelization

For a more detailed description of the features, please refer to the IMPORT and EXPORT sections below.

Getting Started

Before you can start using the Hadoop ETL UDFs, you have to deploy the UDFs in your Exasol database. Please follow the step-by-step deployment guide.

Using the UDFs

After deploying the UDFs, you can begin using them to easily transfer data to and from Hadoop.

IMPORT

The IMPORT UDFs load data into Exasol from Hadoop (HCatalog tables on HDFS). To import data, you just need to execute the SQL statement IMPORT INTO ... FROM SCRIPT ETL.IMPORT_HCAT_TABLE WITH ... with the appropriate parameters. This calls the ETL.IMPORT_HCAT_TABLE UDF, which was previously created during deployment.

For example, run the following statement to import data into an existing table.

CREATE TABLE sample_07 (code VARCHAR(1000), description VARCHAR (1000), total_emp INT, salary INT);

IMPORT INTO sample_07
FROM SCRIPT ETL.IMPORT_HCAT_TABLE WITH
 HCAT_DB         = 'default'
 HCAT_TABLE      = 'sample_07'
 HCAT_ADDRESS    = 'thrift://hive-metastore-host:9083'
 HCAT_USER       = 'hive'
 HDFS_USER       = 'hdfs';
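
The UDFs also accept optional properties, for example for restricting the import to certain partitions or tuning parallelism. The parameter names used below (PARTITIONS, PARALLELISM) are assumptions for illustration only; consult the IMPORT details for the authoritative parameter list and value syntax.

IMPORT INTO sample_07
FROM SCRIPT ETL.IMPORT_HCAT_TABLE WITH
 HCAT_DB         = 'default'
 HCAT_TABLE      = 'sample_07'
 HCAT_ADDRESS    = 'thrift://hive-metastore-host:9083'
 HCAT_USER       = 'hive'
 HDFS_USER       = 'hdfs'
 PARTITIONS      = 'year=2015'   -- hypothetical: import only the year=2015 partition
 PARALLELISM     = 'nproc()';    -- hypothetical: one importer process per node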

Please see the IMPORT details for a full description.

EXPORT

Note: This functionality is available in Exasol starting with version 6.0.3.

The EXPORT UDFs load data from Exasol into Hadoop (HCatalog tables on HDFS). To export data, you just need to execute the SQL statement EXPORT ... INTO SCRIPT ETL.EXPORT_HCAT_TABLE WITH ... with the appropriate parameters. This calls the ETL.EXPORT_HCAT_TABLE UDF, which was previously created during deployment.

For example, run the following statement to export data from an existing table.

CREATE TABLE TABLE1 (COL1 SMALLINT, COL2 INT, COL3 VARCHAR(50));

EXPORT TABLE1
INTO SCRIPT ETL.EXPORT_HCAT_TABLE WITH
 HCAT_DB         = 'default'
 HCAT_TABLE      = 'test_table'
 HCAT_ADDRESS    = 'thrift://hive-metastore-host:9083'
 HCAT_USER       = 'hive'
 HDFS_USER       = 'hdfs';
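
Because the UDF is invoked through Exasol's regular EXPORT command, you are not limited to exporting a whole table. Assuming the query form of EXPORT also accepts a SCRIPT destination (a sketch, not confirmed in this README), exporting the result of a SELECT would look like this:

EXPORT (
  SELECT COL1, COL2, COL3
    FROM TABLE1
   WHERE COL2 >= 100
)
INTO SCRIPT ETL.EXPORT_HCAT_TABLE WITH
 HCAT_DB         = 'default'
 HCAT_TABLE      = 'test_table'
 HCAT_ADDRESS    = 'thrift://hive-metastore-host:9083'
 HCAT_USER       = 'hive'
 HDFS_USER       = 'hdfs';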

Please see the EXPORT details for a full description.

Frequent Issues

  • If you cannot connect to certain parts of Hadoop, it is a good idea to test DNS hostname resolution and TCP/IP connectivity to all Hadoop hosts and ports (HCatalog, HDFS, and the Kerberos servers, if used). You can use the Python script in solution 325 for this. Note that the script is designed for testing HTTP connections, so you can ignore the HTTP check failures.

  • Google DataProc Integration issues.

  • Hive NULL values are imported as the literal string \N. For now, you can post-process the data after the import to convert them into proper NULLs, as in the sketch below.
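
For example, a minimal post-processing sketch for the sample_07 table imported above, assuming the Hive NULLs appear as the literal string '\N' in the imported VARCHAR columns:

-- Turn the literal '\N' markers left by the import into real NULLs.
-- Repeat for every VARCHAR column that may contain Hive NULL values.
UPDATE sample_07 SET code        = NULL WHERE code        = '\N';
UPDATE sample_07 SET description = NULL WHERE description = '\N';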
