DanilBaibak / ml-in-production

Licence: MIT license
The practical use-cases of how to make your Machine Learning Pipelines robust and reliable using Apache Airflow.

Programming Languages

python
Makefile
Dockerfile
TSQL
shell

Projects that are alternatives of or similar to ml-in-production

versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+396.55%)
Mutual labels:  data-engineering, data-pipelines
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+124.14%)
Mutual labels:  data-engineering, data-pipelines
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (-17.24%)
Mutual labels:  data-engineering, data-pipelines
viewflow
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+279.31%)
Mutual labels:  data-engineering, apache-airflow
neon-workshop
A Pachyderm deep learning tutorial for conference workshops
Stars: ✭ 19 (-34.48%)
Mutual labels:  data-engineering, data-pipelines
redshift plugin
No description or website provided.
Stars: ✭ 22 (-24.14%)
Mutual labels:  apache-airflow
h4sci-course
ETH PhD Program course
Stars: ✭ 19 (-34.48%)
Mutual labels:  data-engineering
airflow-client-python
Apache Airflow - OpenApi Client for Python
Stars: ✭ 172 (+493.1%)
Mutual labels:  apache-airflow
pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (+3244.83%)
Mutual labels:  data-engineering
practical-data-engineering
Real estate dagster pipeline
Stars: ✭ 110 (+279.31%)
Mutual labels:  data-engineering
airflow-boilerplate
A complete development environment setup for working with Airflow
Stars: ✭ 94 (+224.14%)
Mutual labels:  apache-airflow
preprocessy
Python package for Customizable Data Preprocessing Pipelines
Stars: ✭ 34 (+17.24%)
Mutual labels:  data-engineering
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+2010.34%)
Mutual labels:  data-engineering
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+96.55%)
Mutual labels:  data-engineering
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (+51.72%)
Mutual labels:  data-engineering
CogStack-NiFi
Building data processing pipelines for documents processing with NLP using Apache NiFi and related services
Stars: ✭ 22 (-24.14%)
Mutual labels:  data-pipelines
datajoint-python
Relational data pipelines for the science lab
Stars: ✭ 140 (+382.76%)
Mutual labels:  data-pipelines
dbt-sugar
dbt-sugar is a CLI tool that allows users of dbt to have fun and ease performing actions around dbt models
Stars: ✭ 139 (+379.31%)
Mutual labels:  data-engineering
Data-Engineering-Projects
Personal Data Engineering Projects
Stars: ✭ 167 (+475.86%)
Mutual labels:  data-engineering
rivery cli
Rivery CLI
Stars: ✭ 16 (-44.83%)
Mutual labels:  data-pipelines

Machine Learning in production using Apache Airflow

Building a solution with Machine Learning is a complex task in itself. While academic Machine Learning has its roots in research from the 1980s, the practical implementation of Machine Learning systems in production is still relatively new.

This project is an example of how you can improve two parts of any Machine Learning project: Data Validation and Model Evaluation. The goal is to share practical ideas that you can introduce into your project with relatively little effort while still getting significant benefits.

  • Data Validation is the process of ensuring that data is present, correct, and meaningful. Ensuring the quality of your data through automated validation checks is a critical step in building data pipelines at any organization.
  • Model Validation takes place after the model has been successfully trained on the new data. The model is evaluated and validated before it is promoted to production. Ideally, the offline model validation step should include.
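
The two steps described above can be illustrated with a minimal sketch. It is not the project's implementation: the column names, the value range, and the helper names validate_rows and should_promote are hypothetical and chosen only for illustration, and the sketch assumes a metric where higher is better. It uses only the Python standard library.

```python
from typing import Any, Dict, List, Tuple


def validate_rows(rows: List[Dict[str, Any]],
                  required: List[str],
                  ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return a list of readable validation errors (empty if the batch is OK)."""
    # "Present": an empty batch is itself an error.
    if not rows:
        return ["no rows received"]
    errors = []
    for i, row in enumerate(rows):
        # "Correct": every required field exists and is not None.
        for col in required:
            if row.get(col) is None:
                errors.append(f"row {i}: missing value for '{col}'")
        # "Meaningful": numeric values fall inside the expected range.
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is not None and not (low <= value <= high):
                errors.append(f"row {i}: '{col}'={value} outside [{low}, {high}]")
    return errors


def should_promote(candidate_score: float,
                   production_score: float,
                   min_improvement: float = 0.0) -> bool:
    """Promote the candidate only if it is at least as good as production.

    Assumes a "higher is better" metric such as accuracy.
    """
    return candidate_score >= production_score + min_improvement


# Hypothetical batch: the second row has an out-of-range price,
# the third row has a missing value.
rows = [
    {"price": 120.0, "rooms": 3},
    {"price": -5.0, "rooms": 2},
    {"price": 300.0, "rooms": None},
]
errors = validate_rows(rows, required=["price", "rooms"],
                       ranges={"price": (0, 10_000)})
for error in errors:
    print(error)
```

In a pipeline, a non-empty error list would typically stop the run before training, and should_promote would be evaluated after training to decide whether the new model replaces the one currently in production.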

You can read more details in the article on Medium.

Installation

The project is dockerized, and you have two options for getting the Docker image:

  • make pull - the prebuilt image will be pulled from Docker Hub;
  • make build - you can also build the Docker image yourself.

After that, initialize the configs and start the application:

  • make init_config will initialize all necessary configs;
  • make up_d will start the application in detached mode. Once the application has started, you can access the project at http://localhost:8080/
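
Because the application starts in detached mode, it can take a moment before the web interface responds. The following is a minimal sketch of a reachability check using only the Python standard library. The function name is_ui_reachable is hypothetical, and the only assumption taken from the text above is the address http://localhost:8080/.

```python
import http.client
import urllib.error
import urllib.request


def is_ui_reachable(url: str = "http://localhost:8080/",
                    timeout: float = 3.0) -> bool:
    """Return True if any HTTP response comes back from `url`, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status (e.g. 401 or 500).
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a broken response:
        # nothing usable is listening yet.
        return False
```

Calling is_ui_reachable() after make up_d returns False until the web server accepts connections, so it can be polled in a loop with a short sleep between attempts.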

Usage

  • make bash will create a new Bash session in the container.
  • make stop stops running containers without removing them.
  • make down stops and removes containers.