
astro-projects / astro

License: Apache-2.0
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

Programming Languages

  • python: 139,335 projects (#7 most used programming language)
  • shell: 77,523 projects
  • Dockerfile: 14,818 projects

Projects that are alternatives to or similar to astro

dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-62.03%)
Mutual labels:  bigquery, etl, sqlite, snowflake, elt
Locopy
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-7.59%)
Mutual labels:  etl, s3, snowflake, pandas
Udacity Data Engineering
Udacity Data Engineering Nano Degree (DEND)
Stars: ✭ 89 (+12.66%)
Mutual labels:  airflow, etl, s3
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+82.28%)
Mutual labels:  etl, snowflake, elt
Sqlpad
Web-based SQL editor run in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more with ODBC
Stars: ✭ 4,113 (+5106.33%)
Mutual labels:  bigquery, postgres, snowflake
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-32.91%)
Mutual labels:  bigquery, airflow, etl
Tbls
tbls is a CI-friendly tool for documenting a database, written in Go.
Stars: ✭ 940 (+1089.87%)
Mutual labels:  bigquery, sqlite, snowflake
starlake
Starlake is a Spark-based, on-premise and cloud ELT/ETL framework for batch & stream processing
Stars: ✭ 16 (-79.75%)
Mutual labels:  bigquery, etl, snowflake
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-78.48%)
Mutual labels:  bigquery, etl
etlflow
EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (-51.9%)
Mutual labels:  bigquery, etl
nim-gatabase
Connection-Pooling Compile-Time ORM for Nim
Stars: ✭ 103 (+30.38%)
Mutual labels:  postgres, sqlite
incubator-liminal
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+48.1%)
Mutual labels:  airflow, workflows
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+674.68%)
Mutual labels:  etl, pandas
go-bqloader
bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-79.75%)
Mutual labels:  bigquery, etl
go-storage
A vendor-neutral storage library for Golang: Write once, run on every storage service.
Stars: ✭ 387 (+389.87%)
Mutual labels:  s3, gcs
pgsink
Logically replicate data out of Postgres into sinks (files, Google BigQuery, etc)
Stars: ✭ 53 (-32.91%)
Mutual labels:  bigquery, postgres
dbt-ml-preprocessing
A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.
Stars: ✭ 128 (+62.03%)
Mutual labels:  bigquery, snowflake
erdiagram
Entity-Relationship diagram code generator library
Stars: ✭ 28 (-64.56%)
Mutual labels:  postgres, sqlite
sftp-gateway
This repository contains a docker image configured to use the SSH File Transfer Protocol (SFTP) to transfer all its files to Cloud Blob Storage Services. This image can be deployed on a Kubernetes cluster with Helm.
Stars: ✭ 18 (-77.22%)
Mutual labels:  s3, gcs
Data-Engineering-Projects
Personal Data Engineering Projects
Stars: ✭ 167 (+111.39%)
Mutual labels:  postgres, airflow

astro

workflows made easy


astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python. It helps DAG authors to achieve more with less code. It is powered by Apache Airflow and maintained by Astronomer.

⚠️ Disclaimer This project's development status is alpha. In other words, it is not production-ready yet. The interfaces may change. We welcome alpha users and brave souls to test it - any feedback is welcome.

Install

Astro is available on PyPI and can be installed with the standard Python installation tools.

To install a cloud-agnostic version of Astro, run:

pip install astro-projects

If using cloud providers, install using the optional dependencies of interest:

pip install astro-projects[amazon,google,snowflake,postgres]

Quick-start

After installing Astro, copy the following example DAG, calculate_popular_movies.py, into a local directory named dags:

from datetime import datetime
from airflow import DAG
from astro import sql as aql
from astro.sql.table import Table


@aql.transform()
def top_five_animations(input_table: Table):
    return """
        SELECT Title, Rating
        FROM {{input_table}}
        WHERE Genre1=='Animation'
        ORDER BY Rating desc
        LIMIT 5;
    """


with DAG(
    "calculate_popular_movies",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    imdb_movies = aql.load_file(
        path="https://raw.githubusercontent.com/astro-projects/astro/main/tests/data/imdb.csv",
        task_id="load_csv",
        output_table=Table(
            table_name="imdb_movies", database="sqlite", conn_id="sqlite_default"
        ),
    )

    top_five_animations(
        input_table=imdb_movies,
        output_table=Table(
            table_name="top_animation", database="sqlite", conn_id="sqlite_default"
        ),
    )

Set up a local instance of Airflow by running:

export AIRFLOW_HOME=`pwd`
export AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True

airflow db init

Create a SQLite database for the example to use, then run the DAG:

# The sqlite_default connection uses a different host path on macOS vs. Linux
export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`

sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
airflow dags test calculate_popular_movies `date -Iseconds`

Check the top five animations calculated by your first Astro DAG by running:

sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"

You should see the following output:

$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
Toy Story 3 (2010)|8.3
Inside Out (2015)|8.2
How to Train Your Dragon (2010)|8.1
Zootopia (2016)|8.1
How to Train Your Dragon 2 (2014)|7.9

Requirements

Because astro relies on the TaskFlow API, it requires Apache Airflow >= 2.1.0.
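
For context, the TaskFlow API (introduced in Airflow 2.0) turns plain Python functions into Airflow tasks via decorators. The standalone sketch below is purely illustrative and independent of astro; the DAG and task names are made up:

from datetime import datetime
from airflow.decorators import dag, task


# Illustrative DAG and task names; not part of astro.
@dag(schedule_interval=None, start_date=datetime(2000, 1, 1), catchup=False)
def taskflow_example():
    @task
    def extract():
        # Return values are passed between tasks via XCom.
        return {"movie_count": 5}

    @task
    def report(payload: dict):
        print(f"Loaded {payload['movie_count']} movies")

    # Calling tasks like functions builds the dependency graph.
    report(extract())


example_dag = taskflow_example()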

Supported technologies

Databases         File types   File locations
Google BigQuery   CSV          Amazon S3
Postgres          JSON         Filesystem
Snowflake         NDJSON       Google GCS
SQLite            Parquet
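
For example, loading a CSV from Amazon S3 into a Postgres table follows the same load_file pattern as the quick-start. The bucket path, table name, and connection IDs below are placeholders, and the alpha API may additionally require an argument pointing at the Airflow connection that holds the S3 credentials; check the reference guide for the exact signature.

from datetime import datetime
from airflow import DAG
from astro import sql as aql
from astro.sql.table import Table

with DAG(
    "load_orders_from_s3",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    # Hypothetical bucket, table, and connection names; adjust to your environment.
    raw_orders = aql.load_file(
        path="s3://my-bucket/raw/orders.csv",
        task_id="load_orders",
        output_table=Table(
            table_name="orders", database="postgres", conn_id="postgres_default"
        ),
    )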

Available operations

A summary of the currently available operations in astro; a sketch combining a few of them follows this list. More details are available in the reference guide.

  • load_file: load a given file into a SQL table
  • transform: apply a SQL SELECT statement to a source table and save the result to a destination table
  • truncate: remove all records from a SQL table
  • run_raw_sql: run any SQL statement without handling its output
  • append: insert rows from the source SQL table into the destination SQL table, if there are no conflicts
  • merge: insert rows from the source SQL table into the destination SQL table, depending on conflicts:
    • ignore: do not add rows that already exist
    • update: replace existing rows with new ones
  • save_file: export SQL table rows into a destination file
  • dataframe: export a given SQL table into an in-memory pandas DataFrame
  • render: given a directory containing SQL statements, dynamically create transform tasks within a DAG
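
As a rough sketch of how several of these operations chain together, the DAG below combines load_file, transform, and append. Only load_file and transform appear verbatim earlier in this README; the append call, its keyword names (main_table, append_table), and the table names are assumptions about the alpha API and may need adjusting against the reference guide.

from datetime import datetime
from airflow import DAG
from astro import sql as aql
from astro.sql.table import Table


@aql.transform()
def highly_rated(input_table: Table):
    # transform: a SELECT whose result is written to output_table.
    return "SELECT Title, Rating FROM {{input_table}} WHERE Rating > 8.0"


with DAG(
    "ratings_history",
    schedule_interval=None,
    start_date=datetime(2000, 1, 1),
    catchup=False,
) as dag:
    movies = aql.load_file(
        path="https://raw.githubusercontent.com/astro-projects/astro/main/tests/data/imdb.csv",
        task_id="load_csv",
        output_table=Table(
            table_name="imdb_movies", database="sqlite", conn_id="sqlite_default"
        ),
    )

    top_rated = highly_rated(
        input_table=movies,
        output_table=Table(
            table_name="top_rated_movies", database="sqlite", conn_id="sqlite_default"
        ),
    )

    # append: keyword names are assumed, and the ratings_history table is
    # assumed to already exist; see the reference guide for the exact API.
    aql.append(
        main_table=Table(
            table_name="ratings_history", database="sqlite", conn_id="sqlite_default"
        ),
        append_table=top_rated,
    )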

Documentation

The documentation is a work in progress, and we aim to follow the Diátaxis system:

  • Tutorial: a hands-on introduction to astro
  • How-to guides: simple step-by-step user guides to accomplish specific tasks
  • Reference guide: commands, modules, classes and methods
  • Explanation: clarification and discussion of key decisions made when designing the project

Changelog

We follow Semantic Versioning for releases. Check the changelog for the latest changes.

Release Management

To learn more about our release philosophy and steps, check here.

Contribution Guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the Contribution Guideline for a detailed overview of how to contribute.

As contributors and maintainers to this project, you should abide by the Contributor Code of Conduct.

License

Apache License 2.0

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].