fal-ai / fal

License: Apache-2.0
do more with dbt. fal helps you run Python alongside dbt, so you can send Slack alerts, detect anomalies and build machine learning models.


fal: do more with dbt

fal allows you to run Python scripts directly from your dbt project.

With fal, you can:

  • Send Slack notifications upon dbt model success or failure.
  • Download dbt models into a Python context with a familiar syntax: ref('my_dbt_model')
  • Use Python libraries such as sklearn or prophet to build more complex pipelines downstream of and in between (new!) dbt models.

and more...

Check out our Getting Started guide for a quick start, head to our documentation site for a deeper dive, or play with the in-depth examples to see how fal can help you get more done with dbt.

Getting Started

1. Install fal

$ pip install fal

2. Go to your dbt directory

$ cd ~/src/my_dbt_project

3. Create a Python script: send_slack_message.py

import os
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

CHANNEL_ID = os.getenv("SLACK_BOT_CHANNEL")
SLACK_TOKEN = os.getenv("SLACK_BOT_TOKEN")

client = WebClient(token=SLACK_TOKEN)
# `context` is provided by fal at runtime, so it needs no import.
message_text = f"Model: {context.current_model.name}. Status: {context.current_model.status}."

try:
    response = client.chat_postMessage(
        channel=CHANNEL_ID,
        text=message_text
    )
except SlackApiError as e:
    assert e.response["error"]

4. Add a meta section in your schema.yml

models:
  - name: historical_ozone_levels
    description: Ozone levels
    config:
      materialized: table
    columns:
      - name: ozone_level
        description: Ozone level
      - name: ds
        description: Date
    meta:
      fal:
        scripts:
          - send_slack_message.py

5. (New!) Run fal flow run

$ fal flow run
# both your dbt models and fal scripts are run

6. Alternatively run dbt and fal consecutively

$ dbt run
# Your dbt models are run

$ fal run
# Your python scripts are run

Examples

To explore what is possible with fal, check out the in-depth examples in the examples directory. We will be adding more over time.

How it works

fal is a command line tool that can read the state of your dbt project and help you run Python scripts after your dbt runs by leveraging the meta config.

models:
  - name: historical_ozone_levels
    ...
    meta:
      fal:
        post-hook:
          # scripts will run concurrently
          - send_slack_message.py
          - another_python_script.py

fal also provides useful helpers within the Python context to seamlessly interact with dbt models: ref("my_dbt_model_name") will pull a dbt model into your Python script as a pandas.DataFrame.

Running scripts before dbt runs

Run scripts before the model runs by using the pre-hook: configuration option.

Given the following schema.yml:

models:
  - name: boston
    description: Ozone levels
    config:
      materialized: table
    meta:
      owner: "@meder"
      fal:
        scripts:
          pre-hook:
            - fal_scripts/trigger_fivetran.py
          post-hook:
            - fal_scripts/slack.py

fal flow run will run fal_scripts/trigger_fivetran.py, then the boston dbt model, and finally fal_scripts/slack.py. If a model is selected with a selection flag (e.g. --select boston), the hooks associated with the model will always run with it.

$ fal flow run --select boston
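
The ordering described above can be pictured with a small, self-contained sketch. It is not fal's implementation: `plan_steps` is a hypothetical helper, and the `model:` prefix is just a label chosen for this illustration. It reads a parsed `meta` dictionary shaped like the schema.yml above and lists the steps in the order the text describes.

```python
from typing import Any, Dict, List


def plan_steps(model_name: str, meta: Dict[str, Any]) -> List[str]:
    """Return pre-hook scripts, then the model, then post-hook scripts.

    Hypothetical illustration only; fal's own planner is not shown here.
    """
    scripts = meta.get("fal", {}).get("scripts", {})
    pre = list(scripts.get("pre-hook", []))
    post = list(scripts.get("post-hook", []))
    return pre + [f"model:{model_name}"] + post


# The `meta` block of the `boston` model, written as a parsed dictionary.
boston_meta = {
    "owner": "@meder",
    "fal": {
        "scripts": {
            "pre-hook": ["fal_scripts/trigger_fivetran.py"],
            "post-hook": ["fal_scripts/slack.py"],
        }
    },
}

steps = plan_steps("boston", boston_meta)
print(steps)
```

A model without any fal configuration yields a single step, the model itself, which matches the idea that hooks only run alongside the model they are attached to.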

Concepts

profiles.yml and Credentials

fal integrates with dbt's profiles.yml file to access and read data from the data warehouse. Once you have set up credentials in profiles.yml for your existing dbt workflows, fal authenticates with those credentials whenever you use ref or source to create a dataframe.

meta Syntax

models:
  - name: historical_ozone_levels
    ...
    meta:
      owner: "@me"
      fal:
        post-hook:
          - send_slack_message.py
          - another_python_script.py

Use the fal and post-hook keys underneath the meta config to tell the fal CLI where to look for the Python scripts. You can pass a list of one or more scripts, as shown above, to run as a post-hook operation after a dbt run.

Variables and functions

Inside a Python script, you get access to some useful variables and functions.

Variables

A context object with information relevant to the model through which the script was run. For the meta Syntax example, we would get the following:

context.current_model.name
#= historical_ozone_levels

context.current_model.meta
#= {'owner': '@me'}

context.current_model.meta.get("owner")
#= '@me'

context.current_model.status
# Could be one of
#= 'success'
#= 'error'
#= 'skipped'
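
As a rough sketch of how these fields might be combined, the snippet below builds a notification string from the model name, status, and `owner` meta value. `build_message` is a hypothetical helper, and a `SimpleNamespace` stands in for `context.current_model` so that the example runs outside fal.

```python
from types import SimpleNamespace


def build_message(model) -> str:
    """Format a short status line from a model-like object (hypothetical helper)."""
    owner = model.meta.get("owner", "unknown owner")
    if model.status == "success":
        return f"Model {model.name} succeeded."
    # 'error' and 'skipped' both deserve the owner's attention.
    return f"Model {model.name} finished with status '{model.status}'. Owner: {owner}"


# Stand-in for `context.current_model`; inside a fal script, pass the real object.
fake_model = SimpleNamespace(
    name="historical_ozone_levels",
    meta={"owner": "@me"},
    status="error",
)

message = build_message(fake_model)
print(message)
```

Inside a fal script the call would presumably be `build_message(context.current_model)`, with the result passed to whatever notifier you use.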

The context object also has access to test information related to the current model. If the previous dbt command was either test or build, the context.current_model.tests property is populated with a list of tests:

context.current_model.tests
#= [CurrentTest(name='not_null', modelname='historical_ozone_levels', column='ds', status='Pass')]
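
One plausible use of this list is to pick out tests that did not pass. The sketch below is an illustration under assumptions: the `CurrentTest` namedtuple is a stand-in that mirrors only the fields shown above, and the `unique` test and the `'Fail'` status string are invented for the example (the only status value shown in this README is `'Pass'`).

```python
from collections import namedtuple

# Stand-in with the fields shown above; inside fal, use context.current_model.tests.
CurrentTest = namedtuple("CurrentTest", ["name", "modelname", "column", "status"])


def failed_tests(tests):
    """Return every test whose status is anything other than 'Pass'."""
    return [t for t in tests if t.status.lower() != "pass"]


tests = [
    CurrentTest("not_null", "historical_ozone_levels", "ds", "Pass"),
    # Assumed example entry: the 'Fail' status string is not confirmed by the README.
    CurrentTest("unique", "historical_ozone_levels", "ds", "Fail"),
]

failing = failed_tests(tests)
print([t.name for t in failing])
```

Treating everything that is not `'Pass'` as a failure avoids depending on the exact spelling of other status values.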

ref and source functions

Some familiar functions from dbt are also available:

# Refer to dbt models or sources by name and get them back as a `pandas.DataFrame`
ref('model_name')
source('source_name', 'table_name')

# You can use it to get the running model data
ref(context.current_model.name)

write_to_model function

It is also possible to send data back to your data warehouse. This makes it easy to get the data, process it and upload it back into dbt territory.

This function is available only in Python Data models, that is, Python scripts inside your models directory. Once added, such a script is run automatically by fal without any extra configuration in the schema.yml.

source_df = source('source_name', 'table_name')
ref_df = ref('a_model')

# Your code here
df = ...

# Upload a `pandas.DataFrame` back to the data warehouse
write_to_model(df)

write_to_model also accepts an optional dtype argument, which lets you specify the data types of columns. It works the same way as the dtype argument of the DataFrame.to_sql function.

from sqlalchemy.types import Integer
# Upload but specifically create the `value` column with type `integer`
# Can be useful if data has `None` values
write_to_model(df, dtype={'value': Integer()})

Importing fal as a Python package

You may be interested in accessing dbt models and sources easily from a Jupyter Notebook or another Python script. For that, just import the fal package and instantiate a FalDbt project:

from fal import FalDbt
faldbt = FalDbt(profiles_dir="~/.dbt", project_dir="../my_project")

faldbt.list_sources()
# [
#    DbtSource(name='results' ...),
#    DbtSource(name='ticket_data_sentiment_analysis' ...)
#    ...
# ]

faldbt.list_models()
# [
#    DbtModel(name='zendesk_ticket_data' ...),
#    DbtModel(name='agent_wait_time' ...)
#    ...
# ]


sentiments = faldbt.source('results', 'ticket_data_sentiment_analysis')
# pandas.DataFrame
tickets = faldbt.ref('stg_zendesk_ticket_data')
# pandas.DataFrame

Supported dbt versions

No extra configuration is needed to work with different dbt versions. The latest fal version currently supports:

  • 1.0.*
  • 1.1.*

If you need another version, open an issue and we will take a look!

Contributing / Development

We use Poetry for dependency management and easy development testing.

Use Poetry shell to try your changes right away:

~ $ cd fal

~/fal $ poetry install

~/fal $ poetry shell
Spawning shell within [...]/fal-eFX98vrn-py3.8

~/fal fal-eFX98vrn-py3.8 $ cd ../dbt_project

~/dbt_project fal-eFX98vrn-py3.8 $ fal flow run
19:27:30  Found 5 models, 0 tests, 0 snapshots, 0 analyses, 165 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
19:27:30 | Starting fal run for following models and scripts:
[...]

Running tests

Tests rely on a Postgres database being present. You can start one with docker-compose:

~/fal $ docker-compose -f tests/docker-compose.yml up -d
Creating network "tests_default" with the default driver
Creating fal_db ... done

# Necessary for the import test
~/fal $ dbt run --profiles-dir tests/mock/mockProfile --project-dir tests/mock
Running with dbt=1.0.1
[...]
Completed successfully
Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5

~/fal $ pytest -s

Why are we building this?

We think dbt is great because it empowers data people to get more done with the tools that they are already familiar with.

dbt's SQL-only design is powerful, but if you ever want to get out of SQL-land and connect to external services or get into Python-land for any reason, you will have a hard time. We built fal to enable Python workloads (sending alerts to Slack, building predictive models, pushing data to non-data-warehouse destinations and more) right within dbt.

This library will form the basis of our attempt to more comprehensively enable data science workloads downstream of dbt. And because having reliable data pipelines is the most important ingredient in building predictive analytics, we are building a library that integrates well with dbt.

Have feedback or need help?

Join us in #fal on Discord
