omnata-labs / dbt-ml-preprocessing

Licence: MIT license
A SQL port of Python's scikit-learn preprocessing module, provided as cross-database dbt macros.

dbt-ml-preprocessing

A package for dbt which enables standardization of data sets. You can use it to build a feature store in your data warehouse without relying on external libraries like Spark's MLlib or Python's scikit-learn.

The package contains a set of macros that mirror the functionality of the scikit-learn preprocessing module. They were originally developed as part of the 2019 Medium article "Feature Engineering in Snowflake".

They have currently been tested on Snowflake, Redshift, BigQuery, SQL Server and PostgreSQL 13.2. The test case expectations were built using scikit-learn (see *.py in integration_tests/data/sql), so you can expect behavioural parity with it.

The macros are:

| scikit-learn function | macro name | Snowflake | BigQuery | Redshift | MSSQL | PostgreSQL | Example |
|---|---|---|---|---|---|---|---|
| KBinsDiscretizer | k_bins_discretizer | Y | Y | Y | Y | Y | example |
| LabelEncoder | label_encoder | Y | Y | Y | Y | Y | example |
| MaxAbsScaler | max_abs_scaler | Y | Y | Y | Y | Y | example |
| MinMaxScaler | min_max_scaler | Y | Y | Y | Y | Y | example |
| Normalizer | normalizer | Y | Y | Y | Y | Y | example |
| OneHotEncoder | one_hot_encoder | Y | Y | Y | Y | Y | example |
| QuantileTransformer | quantile_transformer | Y | Y | N | N | Y | example |
| RobustScaler | robust_scaler | Y | Y | Y | N | Y | example |
| StandardScaler | standard_scaler | Y | Y | Y | N | Y | example |

* 2D charts taken from scikit-learn.org, GIFs are my own
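
For readers unfamiliar with the scikit-learn side, here is a rough plain-Python sketch of what three of the scalers listed above compute for a single column. It illustrates the scikit-learn formulas only and is not the package's SQL. The function names are hypothetical, and returning 0.0 for a constant column is an assumption made here to avoid dividing by zero.

```python
from statistics import fmean, pstdev


def min_max_scale(values):
    """Rescale to [0, 1], like MinMaxScaler with its default feature_range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Assumption: a constant column (span == 0) maps to 0.0.
    return [(v - lo) / span if span else 0.0 for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value, like MaxAbsScaler."""
    peak = max(abs(v) for v in values)
    return [v / peak if peak else 0.0 for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation,
    like StandardScaler (which uses ddof=0)."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
print(min_max_scale(column))
print(max_abs_scale([-4.0, 2.0, 1.0]))
print(standard_scale(column))
```

In a warehouse the same arithmetic is typically expressed with aggregates computed over the whole column, which is why it translates naturally to SQL.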

Installation

To use this in your dbt project, create or modify packages.yml to include:

packages:
  - package: "omnata-labs/dbt_ml_preprocessing"
    version: [">=1.0.2"]

(replace the version number with the latest)

Then run dbt deps to import the package.

dbt 1.0.0 compatibility

dbt-ml-preprocessing version 1.2.0 is the first version to support (and require) dbt 1.0.0.

If you are not ready to upgrade to dbt 1.0.0, please use dbt-ml-preprocessing version 1.0.2.

Usage

To read the macro documentation and see examples, generate your docs. The macros then appear in the Projects tree under dbt_ml_preprocessing:

docs screenshot
