Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+2918.99%)

Mutual labels: data-science, etl, data-engineering

Ether sql

A Python library to push Ethereum blockchain data into an SQL database.

Stars: ✭ 41 (-48.1%)

Mutual labels: sql, analytics, etl

Superset

Apache Superset is a Data Visualization and Data Exploration Platform

Stars: ✭ 42,634 (+53867.09%)

Mutual labels: data-science, analytics, data-engineering

Dagster

An orchestration platform for the development, production, and observation of data assets.

Stars: ✭ 4,099 (+5088.61%)

Mutual labels: data-science, analytics, etl

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+5698.73%)

Mutual labels: data-science, sql, analytics

Aws Serverless Data Lake Framework

Enterprise-grade, production-hardened, serverless data lake on AWS

Stars: ✭ 179 (+126.58%)

Mutual labels: analytics, etl, data-engineering

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+701.27%)

Mutual labels: data-science, etl, data-engineering

Web Database Analytics

Web scraping and related analytics using Python tools

Stars: ✭ 175 (+121.52%)

Mutual labels: data-science, sql, analytics

beneath

Beneath is a serverless real-time data platform ⚡️

Stars: ✭ 65 (-17.72%)

Mutual labels: etl, analytics, data-engineering

Dataform

Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift

Stars: ✭ 342 (+332.91%)

Mutual labels: analytics, etl, data-engineering

Prefect

The easiest way to automate your data

Stars: ✭ 7,956 (+9970.89%)

Mutual labels: automation, data-science, data-engineering

Data Science On Gcp

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Stars: ✭ 864 (+993.67%)

Mutual labels: data-science, data-engineering

Walkoff

A flexible, easy to use, automation framework allowing users to integrate their capabilities and devices to cut through the repetitive, tedious tasks slowing them down. #nsacyber

Stars: ✭ 855 (+982.28%)

Mutual labels: automation, analytics

Aws Auto Terminate Idle Emr

AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS based solution using AWS CloudWatch and AWS Lambda using a Python script that is using Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.

Stars: ✭ 21 (-73.42%)

Mutual labels: automation, etl

Ethereum Etl

Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ

Stars: ✭ 956 (+1110.13%)

Mutual labels: sql, etl

SAYN is a modern data processing and modelling framework. Users define tasks (including Python tasks, automated SQL transformations and more) and their relationships; SAYN takes care of the rest. It is designed for simplicity, flexibility and centralisation in order to bring significant efficiency gains to the data engineering workflow.

Use Cases

SAYN can be used for multiple purposes across the data engineering and analytics workflows:

Data extraction: complement tools such as Fivetran or Stitch with customised extraction processes.
Data modelling: transform raw data in your data warehouse (e.g. aggregate activity or sessions, calculate marketing campaign ROI, etc.).
Data science: integrate and execute data science models.

Key Features

SAYN has the following key features:

YAML-based DAG (Directed Acyclic Graph) creation. This means all analysts, including those not proficient in Python, can easily add tasks to ETL processes with SAYN.
Automated SQL transformations: write your SELECT statement. SAYN turns it into a table/view and manages everything for you.
Jinja parameters: switch easily between development and production environments, and apply other tricks, with Jinja templating.
Python tasks: use Python scripts to complement your extraction and loading layer and build data science models.
Multiple databases supported.
And much more: see the Documentation.
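To make the first two features more concrete, here is a minimal sketch of the underlying idea: tasks declare their parents, a topological sort gives a valid execution order, and each SELECT statement is materialised as a table. This is not SAYN's implementation or its configuration format. The task names, the `TASKS` dictionary layout and the `raw_events` table are all hypothetical, a plain Python dictionary stands in for the YAML definition, and SQLite stands in for a data warehouse. It assumes Python 3.9 or later for `graphlib`.

```python
import sqlite3
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task definitions (illustrative only, not SAYN's schema):
# each task lists its parents and the SELECT statement that produces it.
TASKS = {
    "sessions": {
        "parents": [],
        "select": "SELECT user_id, COUNT(*) AS n_events "
                  "FROM raw_events GROUP BY user_id",
    },
    "active_users": {
        "parents": ["sessions"],
        "select": "SELECT user_id FROM sessions WHERE n_events >= 2",
    },
}


def execution_order(tasks):
    """Return task names ordered so every task comes after its parents."""
    graph = {name: set(spec["parents"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def run(conn, tasks):
    """Materialise each task's SELECT as a table, parents first."""
    order = execution_order(tasks)
    for name in order:
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} AS {tasks[name]['select']}")
    return order


# Small in-memory dataset so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id INTEGER, event TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(1, "open"), (1, "click"), (2, "open")],
)

order = run(conn, TASKS)
active = conn.execute("SELECT user_id FROM active_users").fetchall()
print(order, active)
```

The analyst only writes the SELECT statements and the parent links; the ordering and the table creation are derived from them, which is the division of labour the feature list describes.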

Design Principles

SAYN aims to empower data engineers and analysts through its three core design principles:

Simplicity: data processes should be easy to create, scale and maintain, so your team can focus on data transformation instead of writing processes. SAYN orchestrates all your tasks systematically and provides many automation features.
Flexibility: the power of data is unlimited, and so should your tooling be. SAYN supports both SQL and Python so your analysts can choose the most suitable solution for each process.
Centralisation: all analytics code should live in one place, making your life easier and allowing dependencies throughout the whole analytics process.

Quick Start

$ pip install sayn
$ sayn init test_sayn
$ cd test_sayn
$ sayn run

That's it! You have completed your first SAYN run on the example project. Continue with Tutorial: Part 1, which gives a good overview of what SAYN can do.

Release Updates

If you want to receive update emails about SAYN releases, you can sign up here.

Support

If you need any help with SAYN, or simply want to know more, please contact the team at [email protected].

License

SAYN is open source under the Apache 2.0 license.

Made with ❤️ by 173tech.

Stars: ✭ 79