Pipelinex: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (+337.93%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (+334.48%)
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+8124.14%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (+320.69%)
D6t Python: Accelerate data science
Stars: ✭ 118 (+306.9%)
Just Dashboard: 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+5110.34%)
Superset: Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+146913.79%)
Applied Ml: 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+61362.07%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+172.41%)
Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+172.41%)
Ansible Playbook: Ansible playbook to deploy distributed technologies
Stars: ✭ 61 (+110.34%)
Waimak: Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Stars: ✭ 60 (+106.9%)
Quilt: Quilt is a self-organizing data hub for S3
Stars: ✭ 1,007 (+3372.41%)
Dbt Sqlserver: dbt adapter for SQL Server and Azure SQL
Stars: ✭ 41 (+41.38%)
Data Science On Gcp: Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+2879.31%)
Lakefs: Git-like capabilities for your object storage
Stars: ✭ 847 (+2820.69%)
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+2634.48%)
Prefect: The easiest way to automate your data
Stars: ✭ 7,956 (+27334.48%)
Pyjanitor: Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (+2131.03%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+2082.76%)
Pointblank: Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+1555.17%)
Data Engineering Book: Accumulated knowledge and experience in the field of Data Engineering
Stars: ✭ 471 (+1524.14%)
Udacity Data Engineering Projects: A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Stars: ✭ 458 (+1479.31%)
Active workflow: Turn complex requirements into workflows without leaving the comfort of your technology stack.
Stars: ✭ 413 (+1324.14%)
Learn Something Every Day: 📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (+1148.28%)
Dataform: Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+1079.31%)
Egeria: Open Metadata and Governance
Stars: ✭ 328 (+1031.03%)
Benthos: Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+12675.86%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+16862.07%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+8782.76%)
Cookbook: The Data Engineering Cookbook
Stars: ✭ 9,829 (+33793.1%)
etl manager: A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-51.72%)
ClassifyBot: Automate building ML classification pipelines in .NET
Stars: ✭ 16 (-44.83%)
growthbook: Open Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+7975.86%)
yt-channels-DS-AI-ML-CS: A comprehensive list of 180+ YouTube Channels for Data Science, Data Engineering, Machine Learning, Deep learning, Computer Science, programming, software engineering, etc.
Stars: ✭ 1,038 (+3479.31%)
mpc-DL-controller: Deep Neural Network architecture as a predictive optimal controller for a disturbance-afflicted {HVAC + solar cell + battery} system vs classic Model Predictive Control
Stars: ✭ 37 (+27.59%)
DataEngineering: This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (+62.07%)
Dagster: An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+14034.48%)
Hub: Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+13703.45%)
arakat: ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Stars: ✭ 23 (-20.69%)
spark-transformers: Library for exporting Apache Spark MLlib models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (+34.48%)
Airflow: Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Stars: ✭ 24,101 (+83006.9%)
airflow-code-editor: A plugin for Apache Airflow that allows you to edit DAGs in the browser
Stars: ✭ 195 (+572.41%)
openverse-catalog: Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (-6.9%)