bytedance / clickhouse_hadoop

Licence: other

Import data from clickhouse to hadoop with pure SQL


Projects that are alternatives to or similar to clickhouse hadoop

Addax is an open source universal ETL tool that supports most RDBMS and NoSQL databases, helping you transfer data from any one place to another.

Stars: ✭ 615 (+2265.38%)

Mutual labels: hadoop, clickhouse

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (+3076.92%)

Mutual labels: hadoop, clickhouse

Datax

DataX is an open source universal ETL tool that supports Cassandra, ClickHouse, DBF, Hive, InfluxDB, Kudu, MySQL, Oracle, Presto (Trino), PostgreSQL, and SQL Server

Stars: ✭ 116 (+346.15%)

Mutual labels: hadoop, clickhouse

hadoop-ecosystem

Visualizations of the Hadoop Ecosystem

Stars: ✭ 20 (-23.08%)

Mutual labels: hadoop

clickhouse-ast-parser

AST parser and visitor for ClickHouse SQL

Stars: ✭ 60 (+130.77%)

Mutual labels: clickhouse

darwin

Avro Schema Evolution made easy

Stars: ✭ 26 (+0%)

Mutual labels: hadoop

dbal-clickhouse

Doctrine DBAL driver for ClickHouse database

Stars: ✭ 77 (+196.15%)

Mutual labels: clickhouse

liquibase-impala

Liquibase extension to add Impala Database support

Stars: ✭ 23 (-11.54%)

Mutual labels: hadoop

awesome-clickhouse

A curated list of awesome ClickHouse software.

Stars: ✭ 71 (+173.08%)

Mutual labels: clickhouse

hive-jdbc-driver

An alternative to the "hive standalone" jar for connecting Java applications to Apache Hive via JDBC

Stars: ✭ 31 (+19.23%)

Mutual labels: hadoop

implyr

SQL backend to dplyr for Impala

Stars: ✭ 74 (+184.62%)

Mutual labels: hadoop

wasp

WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-26.92%)

Mutual labels: hadoop

appmetrica-logsapi-loader

A tool for automatic data loading from AppMetrica LogsAPI into (local) ClickHouse

Stars: ✭ 18 (-30.77%)

Mutual labels: clickhouse

presto

Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data

Stars: ✭ 91 (+250%)

Mutual labels: hadoop

cds

Data syncing in golang for ClickHouse.

Stars: ✭ 839 (+3126.92%)

Mutual labels: clickhouse

trickster

Open Source HTTP Reverse Proxy Cache and Time Series Dashboard Accelerator

Stars: ✭ 1,753 (+6642.31%)

Mutual labels: clickhouse

aaocp

A big data project for analyzing user behavior logs

Stars: ✭ 53 (+103.85%)

Mutual labels: hadoop

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (+46.15%)

Mutual labels: hadoop

Proton

High performance Pinba server

Stars: ✭ 27 (+3.85%)

Mutual labels: clickhouse

UBA

UEBA Solution for Insider Security. This repo is archived. Thanks!

Stars: ✭ 36 (+38.46%)

Mutual labels: hadoop


ClickHouse Hadoop

Integrates ClickHouse natively with Hive; currently only writing is supported. It connects Hadoop's massive data storage and deep processing power with the high performance of ClickHouse.

Build the Project

mvn package -Phadoop26 -DskipTests

Run the test cases

The test cases require a clickhouse-server running on localhost.
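Before running the tests, it can help to confirm that the local server is reachable. The following is a minimal sketch, not part of the project: the helper names `ck_ping_url` and `ck_is_up` are hypothetical, and it assumes the server exposes ClickHouse's HTTP interface on the default port 8123 (the port the tests actually use is not stated here).

```shell
#!/bin/sh
# Hypothetical helper: build the ping URL for a ClickHouse HTTP endpoint.
# Defaults (localhost, 8123) are assumptions; adjust to your setup.
ck_ping_url() {
    host="${1:-localhost}"
    port="${2:-8123}"
    printf 'http://%s:%s/ping\n' "$host" "$port"
}

# Hypothetical helper: succeeds (exit status 0) when the server answers the ping.
ck_is_up() {
    curl -fsS "$(ck_ping_url "$@")" >/dev/null 2>&1
}

# Usage: run the build with tests only when the server responds.
#   ck_is_up && mvn package -Phadoop26
```

Omitting `-DskipTests` from the build command lets Maven run the test cases as part of `package`.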

Usage

Create ClickHouse table

CREATE TABLE hive_test
(
    c1 String,
    c2 Float64,
    c3 String
)
ENGINE = MergeTree()
PARTITION BY c3
ORDER BY c1

Create Hive External Table

Before starting the hive-cli, set the environment variable HIVE_AUX_JARS_PATH:

export HIVE_AUX_JARS_PATH=<path-to-your-project>/target/clickhouse-hadoop-<version>.jar

Then start the hive-cli and create the Hive external table:

CREATE EXTERNAL TABLE default.ck_test(
   c1 string,
   c2 double,
   c3 string
)
STORED BY 'data.bytedance.net.ck.hive.ClickHouseStorageHandler'
TBLPROPERTIES('clickhouse.conn.urls'='jdbc:clickhouse://<host1>:<port1>,jdbc:clickhouse://<host2>:<port2>',
'clickhouse.table.name'='hive_test');
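The 'clickhouse.conn.urls' property above takes a comma-separated list of JDBC URLs, one per ClickHouse host. As an illustration only, a small shell helper can assemble that value from a list of host:port pairs; the function name `ck_conn_urls` is hypothetical and not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: join host:port pairs into the comma-separated
# JDBC URL list expected by the 'clickhouse.conn.urls' table property.
ck_conn_urls() {
    urls=""
    for endpoint in "$@"; do
        # ${urls:+$urls,} adds a comma only when urls is already non-empty
        urls="${urls:+$urls,}jdbc:clickhouse://$endpoint"
    done
    printf '%s\n' "$urls"
}

# Usage:
#   ck_conn_urls host1:8123 host2:8123
```

The printed string can be pasted into the TBLPROPERTIES clause as the value of 'clickhouse.conn.urls'.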

Data Ingestion

In the hive-cli:

INSERT INTO default.ck_test
SELECT c1, c2, c3 FROM default.source_table WHERE part = 'part_val';
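After the insert finishes, one way to sanity-check the result is to count the rows that arrived in the ClickHouse table. This is a sketch under assumptions: `ck_count_query` is a hypothetical helper, and it presumes `clickhouse-client` is installed and can reach the server holding hive_test.

```shell
#!/bin/sh
# Hypothetical helper: print a row-count query for the given ClickHouse table.
ck_count_query() {
    printf 'SELECT count() FROM %s\n' "$1"
}

# Usage (requires clickhouse-client on PATH and a reachable server):
#   clickhouse-client --query "$(ck_count_query hive_test)"
```

Comparing this count with the number of rows selected from default.source_table on the Hive side gives a quick indication that the write completed.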

