versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+657.89%)
beneath: Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+242.11%)
ml-in-production: The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.
Stars: ✭ 29 (+52.63%)
get smarties: Dummy variable generation with fit/transform capabilities
Stars: ✭ 23 (+21.05%)
Ploomber: A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Stars: ✭ 221 (+1063.16%)
Auptimizer: An automatic ML model optimization tool.
Stars: ✭ 166 (+773.68%)
Gcp Data Engineer Exam: Study materials for the Google Cloud Professional Data Engineering Exam
Stars: ✭ 144 (+657.89%)
papilo: DEPRECATED: Stream data processing micro-framework
Stars: ✭ 24 (+26.32%)
Pipelinex: PipelineX is a Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (+568.42%)
Every Single Day I Tldr: A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (+1210.53%)
blockchain-etl-streaming: Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+200%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+700%)
smart-data-lake: Smart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+315.79%)
Accelerator: The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (+621.05%)
dbt-sugar: dbt-sugar is a CLI tool that makes performing actions around dbt models easier and more fun for dbt users
Stars: ✭ 139 (+631.58%)
lrmr: Less-Resilient MapReduce framework for Go
Stars: ✭ 32 (+68.42%)
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+12452.63%)
D6t Python: Accelerate data science
Stars: ✭ 118 (+521.05%)
Superset: Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+224289.47%)
datart: Datart is a next generation Data Visualization Open Platform
Stars: ✭ 1,042 (+5384.21%)
etl: [READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+1368.42%)
airflow-dbt-python: A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (+484.21%)
prefect-saturn: Python client for using Prefect Cloud with Saturn Cloud
Stars: ✭ 15 (-21.05%)
preprocessy: Python package for Customizable Data Preprocessing Pipelines
Stars: ✭ 34 (+78.95%)
Gspread Pandas: A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (+1089.47%)
deordie-meetups: DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.
Stars: ✭ 48 (+152.63%)
Soda Sql: Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (+810.53%)
datajoint-python: Relational data pipelines for the science lab
Stars: ✭ 140 (+636.84%)
Yuniql: Free and open source schema versioning and database migration made natively with .NET Core.
Stars: ✭ 156 (+721.05%)
contessa: Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-10.53%)
CogStack-NiFi: Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Stars: ✭ 22 (+15.79%)
Data Engineering Howto: A list of useful resources to learn Data Engineering from scratch
Stars: ✭ 2,056 (+10721.05%)
Everything-Tech: A collection of online resources to help you on your Tech journey.
Stars: ✭ 396 (+1984.21%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (+563.16%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (+542.11%)
polygon-etl: ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+178.95%)
Just Dashboard: 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+7852.63%)
soda-spark: Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+205.26%)
Applied Ml: 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+93710.53%)
uptasticsearch: An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+147.37%)
qsv: CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+2205.26%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+315.79%)
Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+315.79%)
Ansible Playbook: Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (+221.05%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (+215.79%)
Quilt: Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+5200%)
AirflowETL: Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (+5.26%)
hamilton: A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+3121.05%)