dbt-sugar
dbt-sugar is a CLI tool that helps dbt users perform actions around dbt models more easily, and have fun doing it.
Stars: ✭ 139 (-73.27%)
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-78.65%)
Udacity Data Engineering Projects
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
Stars: ✭ 458 (-11.92%)
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+3327.69%)
Pyjanitor
Clean APIs for data cleaning. Python implementation of the R package Janitor.
Stars: ✭ 647 (+24.42%)
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-76.54%)
Learn Something Every Day
📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (-30.38%)
etl manager
A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-97.31%)
Ansible Playbook
Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (-88.27%)
growthbook
Open Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+350.38%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+52.5%)
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-75.77%)
Pointblank
Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (-7.69%)
Yuniql
Free and open source schema versioning and database migration made natively with .NET Core.
Stars: ✭ 156 (-70%)
Active workflow
Turn complex requirements to workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (-20.58%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+190.58%)
Egeria
Open Metadata and Governance
Stars: ✭ 328 (-36.92%)
Gspread Pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (-56.54%)
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+395.38%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-84.81%)
Data Engineering Howto
A list of useful resources to learn Data Engineering from scratch
Stars: ✭ 2,056 (+295.38%)
Quilt
Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+93.65%)
Lakefs
Git-like capabilities for your object storage
Stars: ✭ 847 (+62.88%)
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-75.58%)
Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+1430%)
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (-68.08%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+21.73%)
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+358.65%)
Data Engineering Book
Accumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (-9.42%)
Ploomber
A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Stars: ✭ 221 (-57.5%)
D6t Python
Accelerate data science
Stars: ✭ 118 (-77.31%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-70.77%)
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (-34.23%)
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+8098.85%)
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+612.5%)
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-52.12%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+845.96%)
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+1790.19%)
Gcp Data Engineer Exam
Study materials for the Google Cloud Professional Data Engineering Exam
Stars: ✭ 144 (-72.31%)
ClassifyBot
Automate building ML classification pipelines in .NET
Stars: ✭ 16 (-96.92%)
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-84.81%)
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-87.5%)
Waimak
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (-88.46%)
yt-channels-DS-AI-ML-CS
A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Stars: ✭ 1,038 (+99.62%)
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-73.65%)
Dbt Sqlserver
dbt adapter for SQL Server and Azure SQL
Stars: ✭ 41 (-92.12%)
mpc-DL-controller
Deep Neural Network architecture as a predictive optimal controller for {HVAC+Solar cell + battery} disturbance afflicted system vs classic Model Predictive Control
Stars: ✭ 37 (-92.88%)
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (-66.73%)