Every Single Day I Tldr: A daily digest of the articles or videos I've found interesting, that I want to share with you.
Ploomber: A convention over configuration workflow orchestrator. Develop locally (Jupyter or your favorite editor), deploy to Airflow or Kubernetes.
Gspread Pandas: A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Soda Sql: Metric collection, data testing and monitoring for SQL accessible data
Auptimizer: An automatic ML model optimization tool.
Yuniql: Free and open source schema versioning and database migration made natively with .NET Core.
Geni: A Clojure dataframe library that runs on Spark
Accelerator: The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Pipelinex: PipelineX is a Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Superset: Apache Superset is a Data Visualization and Data Exploration Platform
Applied Ml: 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Setl: A simple Spark-powered ETL framework that just works 🍺
Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Quilt: Quilt is a self-organizing data hub for S3
Data Science On Gcp: Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Lakefs: Git-like capabilities for your object storage
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Prefect: The easiest way to automate your data
Pyjanitor: Clean APIs for data cleaning. Python implementation of R package Janitor
Pointblank: Data validation and organization of metadata for data frames and database tables
Udacity Data Engineering Projects: A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Active workflow: Turn complex requirements into workflows without leaving the comfort of your technology stack.
Learn Something Every Day: 📝 A compilation of everything that I learn: Computer Science, Software Development, Engineering, Math, and Coding in General.
Dataform: Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Egeria: Open Metadata and Governance
Benthos: Fancy stream processing made operationally mundane
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Feast: Feature Store for Machine Learning
etl manager: A Python package to create a database on the platform using our moj data warehousing framework
ClassifyBot: Automate building ML classification pipelines in .NET
beneath: Beneath is a serverless real-time data platform ⚡️
growthbook: Open Source Feature Flagging and A/B Testing Platform
yt-channels-DS-AI-ML-CS: A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
mpc-DL-controller: Deep Neural Network architecture as a predictive optimal controller for a disturbance-afflicted {HVAC + solar cell + battery} system vs. classic Model Predictive Control
DataEngineering: This repo contains commands that data engineers use in day-to-day work.