ml-in-production: The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
jobAnalytics and search: JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
gallia-core: A schema-aware Scala library for data transformation
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
viewflow: Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
neon-workshop: A Pachyderm deep learning tutorial for conference workshops
hamilton: A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
pyjanitor: Clean APIs for data cleaning. Python implementation of R package Janitor
dbt-sugar: dbt-sugar is a CLI tool that allows users of dbt to have fun and ease performing actions around dbt models
uptasticsearch: An Elasticsearch client tailored to data science workflows.
funsies: funsies is a lightweight workflow engine 🔧
preprocessy: Python package for Customizable Data Preprocessing Pipelines
polygon-etl: ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
datart: Datart is a next generation Data Visualization Open Platform
morph-kgc: Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
deordie-meetups: DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.
get smarties: Dummy variable generation with fit/transform capabilities
contessa: Easy way to define, execute and store quality rules for your data.
Everything-Tech: A collection of online resources to help you on your Tech journey.
papilo: DEPRECATED: Stream data processing micro-framework
lrmr: Less-Resilient MapReduce framework for Go
soda-spark: Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
etl: [READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
qsv: CSVs sliced, diced & analyzed.
airflow-dbt-python: A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.