
silentsokolov / dbt-clickhouse

Licence: Apache-2.0 license
The Clickhouse plugin for dbt (data build tool)

Programming Languages

python
139335 projects - #7 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to dbt-clickhouse

trickster
Open Source HTTP Reverse Proxy Cache and Time Series Dashboard Accelerator
Stars: ✭ 1,753 (+2176.62%)
Mutual labels:  clickhouse
onelinerhub
2.5k code solutions with clear explanation @ onelinerhub.com
Stars: ✭ 645 (+737.66%)
Mutual labels:  clickhouse
dbt-superset-lineage
Make dbt docs and Apache Superset talk to one another
Stars: ✭ 60 (-22.08%)
Mutual labels:  dbt
fal
do more with dbt. fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.
Stars: ✭ 567 (+636.36%)
Mutual labels:  dbt
dataops-platform-airflow-dbt
Build DataOps platform with Apache Airflow and dbt on AWS
Stars: ✭ 33 (-57.14%)
Mutual labels:  dbt
awesome-clickhouse
A curated list of awesome ClickHouse software.
Stars: ✭ 71 (-7.79%)
Mutual labels:  clickhouse
ClickHouseMigrator
Helps migrate data to ClickHouse, creating databases and tables automatically.
Stars: ✭ 58 (-24.68%)
Mutual labels:  clickhouse
vulkn
Love your Data. Love the Environment. Love VULKИ.
Stars: ✭ 43 (-44.16%)
Mutual labels:  clickhouse
appmetrica-logsapi-loader
A tool for automatic data loading from AppMetrica LogsAPI into (local) ClickHouse
Stars: ✭ 18 (-76.62%)
Mutual labels:  clickhouse
dbt-invoke
A CLI for creating, updating, and deleting dbt property files
Stars: ✭ 42 (-45.45%)
Mutual labels:  dbt
spark-utils
Utility functions for dbt projects running on Spark
Stars: ✭ 19 (-75.32%)
Mutual labels:  dbt
Proton
High performance Pinba server
Stars: ✭ 27 (-64.94%)
Mutual labels:  clickhouse
cds
Data syncing in golang for ClickHouse.
Stars: ✭ 839 (+989.61%)
Mutual labels:  clickhouse
clickhouse-ast-parser
AST parser and visitor for ClickHouse SQL
Stars: ✭ 60 (-22.08%)
Mutual labels:  clickhouse
dbt-ml-preprocessing
A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.
Stars: ✭ 128 (+66.23%)
Mutual labels:  dbt
dbt-formatter
Formatting for dbt jinja-flavored sql
Stars: ✭ 37 (-51.95%)
Mutual labels:  dbt
dbt artifacts
A dbt package for modelling dbt metadata. https://brooklyn-data.github.io/dbt_artifacts
Stars: ✭ 119 (+54.55%)
Mutual labels:  dbt
radondb-clickhouse-kubernetes
Open source, high-availability cluster based on ClickHouse
Stars: ✭ 54 (-29.87%)
Mutual labels:  clickhouse
clickhouse hadoop
Import data from ClickHouse to Hadoop with pure SQL
Stars: ✭ 26 (-66.23%)
Mutual labels:  clickhouse
dbal-clickhouse
Doctrine DBAL driver for ClickHouse database
Stars: ✭ 77 (+0%)
Mutual labels:  clickhouse

dbt-clickhouse

This plugin ports dbt functionality to ClickHouse.

Older versions of ClickHouse are not supported; the plugin uses syntax that requires version 21 or newer.

Installation

Use your favorite Python package manager to install the plugin from PyPI, e.g.

pip install dbt-clickhouse

Supported features

  • Table materialization
  • View materialization
  • Incremental materialization
  • Seeds
  • Sources
  • Docs generate
  • Tests
  • Snapshots
  • Ephemeral materialization

Usage Notes

Database

The dbt model convention database.schema.table is not compatible with ClickHouse because ClickHouse does not support schemas. Instead, this plugin uses the simpler schema.table convention, where schema is the ClickHouse database. Please don't use the default database!
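
For example, with schema set to analytics in the profile, references compile to plain database.table names. A minimal sketch (the model and column names are hypothetical):

-- models/daily_events.sql
-- With schema: analytics, {{ ref('raw_events') }} compiles to analytics.raw_events,
-- and this model is created as analytics.daily_events in ClickHouse.
select
    toDate(event_time) as event_date,
    count() as events
from {{ ref('raw_events') }}
group by event_date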

Model Configuration

Option | Description | Required?
engine | The table engine (type of table) to use when creating tables | Optional (default: MergeTree())
order_by | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: tuple())
partition_by | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional
inserts_only | Relevant only for incremental materialization. If set to True, incremental updates are inserted directly into the target table without creating an intermediate table. This can significantly improve performance and avoid memory limitations on large updates. | Optional
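
A minimal sketch of a model using these options (the model, columns, and values are hypothetical, and passing order_by and partition_by as string expressions is an assumption):

-- models/events_by_day.sql
{{
    config(
        materialized='table',
        engine='MergeTree()',
        order_by='(event_date, event_type)',
        partition_by='toYYYYMM(event_date)'
    )
}}
select
    toDate(event_time) as event_date,
    event_type,
    count() as events
from {{ ref('raw_events') }}
group by event_date, event_type

Used with the table materialization, this would create a MergeTree table ordered by (event_date, event_type) and partitioned by month.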

Example Profile

your_profile_name:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: [database name]

      # optional
      port: [port]  # default 8123
      user: [user] # default 'default'
      host: [host] # default localhost
      password: [password] # default ''
      verify: [verify] # default True
      secure: [secure] # default False
      connect_timeout: [10] # default 10 seconds.

Running Tests

This adapter passes all of dbt's basic tests as described in dbt's official docs: https://docs.getdbt.com/docs/contributing/testing-a-new-adapter#testing-your-adapter.

Note: The only feature that is not supported and not tested is Ephemeral materialization.

To run the tests: pytest tests/integration

You can customize a few test parameters through environment variables. To provide custom parameters, create a test.env file under the project root (remember not to commit this file!) and define the following environment variables inside (a sample file is sketched after this list):

  1. HOST_ENV_VAR_NAME - Default=localhost
  2. USER_ENV_VAR_NAME - your ClickHouse username. Default=default
  3. PASSWORD_ENV_VAR_NAME - your ClickHouse password. Default=''
  4. PORT_ENV_VAR_NAME - ClickHouse client port. Default=8123
  5. RUN_DOCKER_ENV_VAR_NAME - Whether to run the clickhouse-server Docker image (see tests/docker-compose.yml). Default=False. Set it to True if you'd like to launch a clickhouse-server container during the tests (assuming docker-compose is installed on your machine). Note: If you decide to run the Docker image, you should also set PORT_ENV_VAR_NAME to 10900.
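
A sketch of such a test.env file, using the variable names from the list above (the values assume the docker-compose setup and are only an illustration):

# test.env (do not commit this file)
HOST_ENV_VAR_NAME=localhost
USER_ENV_VAR_NAME=default
PASSWORD_ENV_VAR_NAME=
PORT_ENV_VAR_NAME=10900
RUN_DOCKER_ENV_VAR_NAME=True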

Original Author

ClickHouse wants to thank @silentsokolov for creating this connector and for their valuable contributions.

Update 05/31/2022

  • Incremental changes of an incremental model are loaded into a MergeTree table instead of an in-memory temporary table. This removes memory limitations; ClickHouse recommends that in-memory table engines not exceed 100 million rows.
  • The incremental model supports an 'inserts_only' mode in which incremental changes are loaded directly into the target table instead of creating a temporary table for the changes and running another insert-into command. This mode is relevant only for immutable data and can dramatically accelerate incremental materializations (see the sketch after this list).
  • Fixed update and delete in snapshots.
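
A minimal sketch of an incremental model using this mode (the model, columns, and the is_incremental() filter are generic dbt patterns shown for illustration only; materialized and inserts_only are the settings described above):

-- models/page_views.sql
{{
    config(
        materialized='incremental',
        inserts_only=True,
        engine='MergeTree()',
        order_by='(view_time)'
    )
}}
select
    view_time,
    user_id,
    url
from {{ ref('raw_page_views') }}
{% if is_incremental() %}
  -- only insert rows newer than what is already in the target table
  where view_time > (select max(view_time) from {{ this }})
{% endif %}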