A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.

Stars: ✭ 612 (+2010.34%)

Mutual labels: data-engineering

blockchain-etl-streaming

Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes

Stars: ✭ 57 (+96.55%)

Mutual labels: data-engineering

gallia-core

A schema-aware Scala library for data transformation

Stars: ✭ 44 (+51.72%)

Mutual labels: data-engineering

CogStack-NiFi

Building data processing pipelines for document processing with NLP using Apache NiFi and related services

Stars: ✭ 22 (-24.14%)

Mutual labels: data-pipelines

datajoint-python

Relational data pipelines for the science lab

Stars: ✭ 140 (+382.76%)

Mutual labels: data-pipelines

dbt-sugar

dbt-sugar is a CLI tool that makes performing actions around dbt models easier and more fun for dbt users

Stars: ✭ 139 (+379.31%)

Mutual labels: data-engineering

Data-Engineering-Projects

Personal Data Engineering Projects

Stars: ✭ 167 (+475.86%)

Mutual labels: data-engineering

rivery cli

Rivery CLI

Stars: ✭ 16 (-44.83%)

Mutual labels: data-pipelines


Machine Learning in production using Apache Airflow

Building a solution with Machine Learning is a complex task in itself. Whilst academic Machine Learning has its roots in research from the 1980s, the practical implementation of Machine Learning systems in production is still relatively new.

This project is an example of how you can improve two important parts of any Machine Learning project: Data Validation and Model Evaluation. The goal is to share practical ideas that you can introduce into your project relatively easily while still achieving significant benefits.

Data Validation is the process of ensuring that data is present, correct, and meaningful. Ensuring the quality of your data through automated validation checks is a critical step in building data pipelines at any organization.
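
The following is a minimal, framework-free sketch of what automated "present, correct, and meaningful" checks could look like. It assumes each batch arrives as a list of dicts; ColumnRule and validate_records are hypothetical names for illustration only, not part of this project, which runs its pipeline on Apache Airflow.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ColumnRule:
    """Hypothetical per-column rule: expected type and optional value range."""
    name: str
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate_records(records: list[dict[str, Any]], rules: list[ColumnRule]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    # "Present": the batch itself must not be empty.
    if not records:
        return ["batch is empty"]
    errors: list[str] = []
    for i, row in enumerate(records):
        for rule in rules:
            value = row.get(rule.name)
            # "Present": every column covered by a rule has a non-null value.
            if value is None:
                errors.append(f"row {i}: missing '{rule.name}'")
                continue
            # "Correct": the value has the expected Python type.
            if not isinstance(value, rule.dtype):
                errors.append(f"row {i}: '{rule.name}' should be {rule.dtype.__name__}")
                continue
            # "Meaningful": numeric values fall inside a plausible range.
            if rule.min_value is not None and value < rule.min_value:
                errors.append(f"row {i}: '{rule.name}'={value} below {rule.min_value}")
            if rule.max_value is not None and value > rule.max_value:
                errors.append(f"row {i}: '{rule.name}'={value} above {rule.max_value}")
    return errors


# Example batch with two problems in the second row (hypothetical columns).
rules = [ColumnRule("age", int, 0, 120), ColumnRule("income", float, min_value=0.0)]
batch = [{"age": 34, "income": 52000.0}, {"age": -3, "income": None}]
problems = validate_records(batch, rules)
print(problems)  # ["row 1: 'age'=-3 below 0", "row 1: missing 'income'"]
```

Returning a list of problems instead of stopping at the first one makes the report more useful; in a pipeline, the calling step could raise an exception whenever the list is non-empty so that training never starts on a bad batch.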

Model validation occurs after you have successfully trained the model on the new data. We evaluate and validate the model before it is promoted to production. Ideally, the offline model validation step should include.
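
As an illustration of a promotion gate, here is a minimal sketch that combines two common checks: an absolute quality floor per metric and a "no significant regression versus the production model" rule. It assumes every metric is higher-is-better and that metrics are already computed on a held-out set; should_promote, min_thresholds, and max_regression are hypothetical names, not this project's API.

```python
from typing import Mapping, Optional


def should_promote(
    candidate: Mapping[str, float],
    production: Optional[Mapping[str, float]],
    min_thresholds: Mapping[str, float],
    max_regression: float = 0.01,
) -> tuple[bool, list[str]]:
    """Decide whether a newly trained model may replace the production model.

    Assumes every metric is "higher is better" (e.g. accuracy, AUC).
    """
    reasons: list[str] = []
    # 1. Absolute bar: each required metric must be reported and clear its floor.
    for metric, floor in min_thresholds.items():
        value = candidate.get(metric)
        if value is None:
            reasons.append(f"{metric} not reported")
        elif value < floor:
            reasons.append(f"{metric}={value:.3f} below floor {floor:.3f}")
    # 2. Relative bar: no metric may drop by more than `max_regression`
    #    compared with the current production model, if one exists.
    if production is not None:
        for metric, prod_value in production.items():
            value = candidate.get(metric)
            if value is not None and value < prod_value - max_regression:
                reasons.append(f"{metric} regressed from {prod_value:.3f} to {value:.3f}")
    return (not reasons, reasons)


# Accuracy improves, but AUC drops by more than the allowed 0.01.
ok, why = should_promote(
    candidate={"accuracy": 0.91, "auc": 0.95},
    production={"accuracy": 0.90, "auc": 0.97},
    min_thresholds={"accuracy": 0.85},
)
print(ok, why)  # False ['auc regressed from 0.970 to 0.950']
```

Passing None for production covers the very first deployment, where only the absolute floors apply.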

You can read more details in the article on Medium.

Installation

The project is dockerized, and there are two ways to get the Docker image:

make pull - pull the prebuilt image from Docker Hub;
make build - build the Docker image yourself.

Then run:

make init_config - initialize all necessary configs;
make up_d - start the application in detached mode. Once the application is up, you can access the project at http://localhost:8080/
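
Because the containers start in detached mode, the web UI may need some time before it responds. A minimal stdlib-only sketch for scripting that wait could poll the URL until it answers; wait_for_webserver is a hypothetical helper, not part of the project, and any HTTP response (even an error status) is treated as "the server is up".

```python
import time
import urllib.error
import urllib.request


def wait_for_webserver(url: str = "http://localhost:8080/",
                       timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # successful response (redirects already followed)
        except urllib.error.HTTPError:
            return True  # the server answered, just with an error status
        except OSError:
            time.sleep(interval)  # not listening yet: connection refused, timeout, ...
    return False
```

For example, calling wait_for_webserver() right after make up_d returns True once the UI is reachable, or False after 60 seconds.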

Usage

make bash will create a new Bash session in the container.
make stop stops running containers without removing them.
make down stops and removes containers.


DanilBaibak / ml-in-production
