Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Stars: ✭ 2,385 (+525.98%)

Mutual labels: data-engineering

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (-67.98%)

Mutual labels: data-engineering

D6t Python

Accelerate data science

Stars: ✭ 118 (-69.03%)

Mutual labels: data-engineering

Just Dashboard

📊 📋 Dashboards using YAML or JSON files

Stars: ✭ 1,511 (+296.59%)

Mutual labels: data-engineering

Superset

Apache Superset is a Data Visualization and Data Exploration Platform

Stars: ✭ 42,634 (+11090.03%)

Mutual labels: data-engineering

Applied ML

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

Stars: ✭ 17,824 (+4578.22%)

Mutual labels: data-engineering

Dataengineeringproject

Example end to end data engineering project.

Stars: ✭ 82 (-78.48%)

Mutual labels: data-engineering

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-79.27%)

Mutual labels: data-engineering

Sayn

Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

Stars: ✭ 79 (-79.27%)

Mutual labels: data-engineering

Ansible Playbook

Ansible playbook to deploy distributed technologies

Stars: ✭ 61 (-83.99%)

Mutual labels: data-engineering

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-84.25%)

Mutual labels: data-engineering

Quilt

Quilt is a self-organizing data hub for S3

Stars: ✭ 1,007 (+164.3%)

Mutual labels: data-engineering

Dbt Sqlserver

dbt adapter for SQL Server and Azure SQL

Stars: ✭ 41 (-89.24%)

Mutual labels: data-engineering

Data Science on GCP

Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017

Stars: ✭ 864 (+126.77%)

Mutual labels: data-engineering

Lakefs

Git-like capabilities for your object storage

Stars: ✭ 847 (+122.31%)

Mutual labels: data-engineering

Goodreads ETL Pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

Stars: ✭ 793 (+108.14%)

Mutual labels: data-engineering

Prefect

The easiest way to automate your data

Stars: ✭ 7,956 (+1988.19%)

Mutual labels: data-engineering

Pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

Stars: ✭ 647 (+69.82%)

Mutual labels: data-engineering

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+66.14%)

Mutual labels: data-engineering

Pointblank

Data validation and organization of metadata for data frames and database tables

Stars: ✭ 480 (+25.98%)

Mutual labels: data-engineering

Data Engineering Book

Accumulated knowledge and experience in the field of Data Engineering

Stars: ✭ 471 (+23.62%)

Mutual labels: data-engineering

Udacity Data Engineering Projects

A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.

Stars: ✭ 458 (+20.21%)

Mutual labels: data-engineering

Great expectations

Always know what to expect from your data.

Stars: ✭ 5,808 (+1424.41%)

Mutual labels: data-engineering

Active workflow

Turn complex requirements into workflows without leaving the comfort of your technology stack.

Stars: ✭ 413 (+8.4%)

Mutual labels: data-engineering

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Stars: ✭ 4,919 (+1191.08%)

Mutual labels: data-engineering

Feast

Feature Store for Machine Learning

Stars: ✭ 2,576 (+576.12%)

Mutual labels: data-engineering

Cookbook

The Data Engineering Cookbook

Stars: ✭ 9,829 (+2479.79%)

Mutual labels: data-engineering

