
databricks / dbt-databricks

License: Apache-2.0
A dbt adapter for Databricks.

Programming Languages

  • Python
  • Shell

Projects that are alternatives to or similar to dbt-databricks

architect big data solutions with spark
Code, labs and lectures for the course
Stars: ✭ 40 (-65.22%)
Mutual labels:  etl, databricks
NBi
NBi is a testing framework (an add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. By means of NBi, you don't need to develop C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (-11.3%)
Mutual labels:  etl
Storagetapper
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Stars: ✭ 232 (+101.74%)
Mutual labels:  etl
databricks-notebooks
Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)
Stars: ✭ 57 (-50.43%)
Mutual labels:  databricks
Example Airflow Dags
Example DAGs using hooks and operators from Airflow Plugins
Stars: ✭ 243 (+111.3%)
Mutual labels:  etl
blackbricks
Black for Databricks notebooks
Stars: ✭ 40 (-65.22%)
Mutual labels:  databricks
Elastic
R client for the Elasticsearch HTTP API
Stars: ✭ 227 (+97.39%)
Mutual labels:  etl
StoreItemDemand
(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.
Stars: ✭ 24 (-79.13%)
Mutual labels:  databricks
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-82.61%)
Mutual labels:  etl
vixtract
www.vixtract.ru
Stars: ✭ 40 (-65.22%)
Mutual labels:  etl
thain
Thain is a distributed flow scheduling platform.
Stars: ✭ 81 (-29.57%)
Mutual labels:  etl
Aws Etl Orchestrator
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+113.04%)
Mutual labels:  etl
id3c
Data logistics system enabling real-time pathogen surveillance. Built for the Seattle Flu Study.
Stars: ✭ 21 (-81.74%)
Mutual labels:  etl
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+104.35%)
Mutual labels:  etl
hive-metastore-client
A client for connecting and running DDLs on hive metastore.
Stars: ✭ 37 (-67.83%)
Mutual labels:  etl
Etl2pcapng
Utility that converts an .etl file containing a Windows network packet capture into .pcapng format.
Stars: ✭ 228 (+98.26%)
Mutual labels:  etl
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-3.48%)
Mutual labels:  dbt
awesome-dbt
A curated list of awesome dbt resources
Stars: ✭ 520 (+352.17%)
Mutual labels:  dbt
DIRECT
DIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-82.61%)
Mutual labels:  etl
krawler
A minimalist (geospatial) ETL
Stars: ✭ 51 (-55.65%)
Mutual labels:  etl


dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

The Databricks Lakehouse provides one simple platform to unify all your data, analytics and AI workloads.

dbt-databricks

The dbt-databricks adapter contains all of the code enabling dbt to work with Databricks. This adapter is based on the amazing work done in dbt-spark. Some key features include:

  • Easy setup. No need to install an ODBC driver as the adapter uses pure Python APIs.
  • Open by default. For example, it uses the open and performant Delta table format by default. This has many benefits, including letting you use MERGE as the default incremental materialization strategy (a configuration sketch follows this list).
  • Support for Unity Catalog. dbt-databricks>=1.1.1 supports the 3-level namespace of Unity Catalog (catalog / schema / relations) so you can organize and secure your data the way you like.
  • Performance. The adapter generates SQL expressions that are automatically accelerated by the native, vectorized Photon execution engine.
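
Because Delta is the default table format, enabling the MERGE-based incremental strategy is just a configuration change. As a minimal sketch (the project name my_project, the folder events, and the key event_id are hypothetical), the strategy can be set in dbt_project.yml:

models:
  my_project:
    events:
      +materialized: incremental
      +incremental_strategy: merge
      +unique_key: event_id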

Choosing between dbt-databricks and dbt-spark

If you are developing a dbt project on Databricks, we recommend using dbt-databricks for the reasons noted above.

dbt-spark is an actively developed adapter which works with Databricks, as well as with Apache Spark anywhere it is hosted, e.g. on AWS EMR.

Getting started

Installation

Install using pip:

pip install dbt-databricks
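
If you need Unity Catalog support, you can pin a minimum version at install time (per the feature list above, 1.1.1 is the first release supporting the 3-level namespace):

pip install "dbt-databricks>=1.1.1"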

Upgrade to the latest version

pip install --upgrade dbt-databricks
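
To confirm which adapter version is installed, dbt lists its plugins in its version output:

dbt --version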

Profile Setup

your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: [optional catalog name, if you are using Unity Catalog, only available in dbt-databricks>=1.1.1]
      schema: [database/schema name]
      host: [your.databrickshost.com]
      http_path: [/sql/your/http/path]
      token: [dapiXXXXXXXXXXXXXXXXXXXXXXX]
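
Rather than hardcoding a personal access token in the profile, you can read it from the environment with dbt's built-in env_var function. In this sketch, DATABRICKS_TOKEN is an arbitrary environment variable name you would export yourself:

your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: [database/schema name]
      host: [your.databrickshost.com]
      http_path: [/sql/your/http/path]
      token: "{{ env_var('DATABRICKS_TOKEN') }}"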

Quick Starts

The following quick starts will get you up and running with the dbt-databricks adapter:

Compatibility

The dbt-databricks adapter has been tested:

  • with Python 3.7 or above.
  • against Databricks SQL and Databricks runtime releases 9.1 LTS and later.

Tips and Tricks

Choosing compute for a Python model

You can override the compute used for a specific Python model by setting the http_path property in model configuration. This can be useful if, for example, you want to run a Python model on an All Purpose cluster, while running SQL models on a SQL Warehouse. Note that this capability is only available for Python models.

def model(dbt, session):
    dbt.config(
        http_path="sql/protocolv1/..."  # route this model to specific compute
    )
    # A Python model must return a DataFrame; "upstream_model" is a placeholder name.
    return dbt.ref("upstream_model")
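
With this override in place, dbt submits only this model to the compute at the given http_path; the rest of the project continues to run on the compute defined in your profile.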