ml-tooling / ml-project-template

Licence: other

ML project template facilitating both research and production phases.

Labels

docker machine-learning reproducibility

Projects that are alternatives of or similar to ml-project-template

open-solution-googleai-object-detection

Open solution to the Google AI Object Detection Challenge 🍁

Stars: ✭ 46 (-33.33%)

Mutual labels: reproducibility

Ten Years Reproducibility Challenge

Stars: ✭ 59 (-14.49%)

Mutual labels: reproducibility

Reproducible Data Science in Python (SciPy 2019 Tutorial)

Stars: ✭ 12 (-82.61%)

Mutual labels: reproducibility

ReproducibleScience

Short course on reproducible science: what, why, how

Stars: ✭ 23 (-66.67%)

Mutual labels: reproducibility

researchcompendium

NOTE: This repo is archived. Please see https://github.com/benmarwick/rrtools for my current approach

Stars: ✭ 26 (-62.32%)

Mutual labels: reproducibility

Synchronize your working directory efficiently to a remote place without committing the changes.

Stars: ✭ 61 (-11.59%)

Mutual labels: reproducibility

targets-minimal

A minimal example data analysis project with the targets R package

Stars: ✭ 50 (-27.54%)

Mutual labels: reproducibility

an initiative to provide infrastructure for reproducible workflows around open data

Stars: ✭ 26 (-62.32%)

Mutual labels: reproducibility

mlr3-learndrake

Template for using mlr3 with drake

Stars: ✭ 18 (-73.91%)

Mutual labels: reproducibility

A set of tools for R that enhance reproducibility beyond package management

Stars: ✭ 33 (-52.17%)

Mutual labels: reproducibility

Reproducible Bayesian data analysis pipelines with targets and cmdstanr

Stars: ✭ 31 (-55.07%)

Mutual labels: reproducibility

restlessdata.com.au/ggtrack

Stars: ✭ 39 (-43.48%)

Mutual labels: reproducibility

Reproducibilty-Challenge-ECANET

Unofficial Implementation of ECANets (CVPR 2020) for the Reproducibility Challenge 2020.

Stars: ✭ 27 (-60.87%)

Mutual labels: reproducibility

Purely functional build system and package manager

Stars: ✭ 173 (+150.72%)

Mutual labels: reproducibility

reprozip-examples

Examples and demos for ReproZip

Stars: ✭ 13 (-81.16%)

Mutual labels: reproducibility

Open Science, Open Data, Open Source

Stars: ✭ 23 (-66.67%)

Mutual labels: reproducibility

rr-organization1

The Organization lesson for the Reproducible Science Curriculum

Stars: ✭ 36 (-47.83%)

Mutual labels: reproducibility

papers-as-modules

Software Papers as Software Modules: Towards a Culture of Reusable Results

Stars: ✭ 18 (-73.91%)

Mutual labels: reproducibility

Execute and document benchmarks reproducibly.

Stars: ✭ 48 (-30.43%)

Mutual labels: reproducibility

🐶 🕵️ Great Dane turned Python environment detective

Stars: ✭ 36 (-47.83%)

Mutual labels: reproducibility

ML Project Template

This repository contains a template project that can be easily adapted for all kinds of machine learning tasks. Solving such a task typically involves two main phases, research and production, which have very different focuses. The template intends to facilitate work on ML projects by guiding practitioners to adopt some best practices.

research: exploratory data analyses, model prototypes, and experiments are collected here in a structured way

production: the distilled utility library, the training job, and the inference service are implemented here

It is recommended to simply clone this repository and customize it for the specific use case at hand.

Repository Structure

research: Scripts and Notebooks for experimentation.
- develop (Python): Experimental code to try out new ideas and experiments. Use Jupyter notebooks wherever you can. Naming convention: YYYY-MM-DD_userid_short-description. If you cannot use a notebook and have multiple scripts/files for an experiment, create a folder with the same naming convention. Each file should be handled by one person only.
- deliver (Python): Refactored notebooks that contain valuable insights or results (e.g. visualizations, training runs). Notebooks should be refactored, documented, contain outputs, and use the following naming schema: YYYY-MM-DD_short-description. Notebooks in deliver should not be changed or rerun. If you want to rerun a deliver notebook, please duplicate it into the develop folder.
- templates (Python): Refactored notebooks that are reusable for a specific task (e.g. model training, data exploration). Notebooks should be refactored, documented, contain no output, and use the following naming schema: short-description. If you would like to use a template notebook, duplicate it into the develop folder.
production: The production-ready solution(s) composed of libraries, services, and jobs.
- python-utils-lib (Python): Utility functions that are distilled from the research phase and used across multiple scripts. Should only contain refactored and tested Python scripts/modules. Installable via pip.
- training-job (Python/Docker): Combines the required data exports, preprocessing, and training scripts into a Docker container. This makes results reproducible and the production model retrainable in any environment.
- inference-service (Python/Docker): Docker container that provides the final model prediction capabilities via a REST API.
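The rule "duplicate a deliver or template notebook into develop before rerunning it" can be scripted. The helper below is a minimal sketch, not part of the template: the function name `duplicate_to_develop`, its parameters, and the `.ipynb` example paths are hypothetical. It strips a leading `YYYY-MM-DD_` from deliver names, applies the develop convention, and refuses to overwrite an existing file.

```python
import re
import shutil
from datetime import date
from pathlib import Path
from typing import Optional

# Deliver notebooks are named YYYY-MM-DD_short-description; templates are
# just short-description. The date prefix (if any) is stripped below.
_DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}_")


def duplicate_to_develop(src: Path, develop_dir: Path, user_id: str,
                         day: Optional[date] = None) -> Path:
    """Copy a deliver/template notebook into develop as YYYY-MM-DD_userid_short-description.

    Hypothetical helper illustrating the workflow described above.
    """
    day = day or date.today()
    short_description = _DATE_PREFIX.sub("", src.stem)
    target = develop_dir / f"{day.isoformat()}_{user_id}_{short_description}{src.suffix}"
    # Never overwrite: each develop file should be handled by one person only.
    if target.exists():
        raise FileExistsError(target)
    develop_dir.mkdir(parents=True, exist_ok=True)
    # copy2 leaves the source untouched, so deliver notebooks stay unchanged.
    shutil.copy2(src, target)
    return target
```

For example, `duplicate_to_develop(Path("research/deliver/2020-01-15_tfidf-baseline.ipynb"), Path("research/develop"), "jdoe")` would create `research/develop/<today>_jdoe_tfidf-baseline.ipynb` (paths and user ID are illustrative).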

Naming Conventions

Code Artifacts

develop notebooks/scripts: YYYY-MM-DD_userid_short-description
deliver notebooks/scripts: YYYY-MM-DD_short-description
template notebooks/scripts: short-description
services: -service suffix
jobs: -job suffix
libraries: -lib suffix
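These conventions are regular enough to check automatically, e.g. in a pre-commit hook or CI step. The sketch below is one possible encoding; the `follows_convention` function, the `kind` keys, and the assumption that user IDs and short descriptions consist of lowercase letters, digits, and hyphens are all hypothetical rather than rules stated by the template.

```python
import re

# Assumption: userid and short-description are lowercase "slugs"
# (letters, digits, single hyphens between words).
_SLUG = r"[a-z0-9]+(?:-[a-z0-9]+)*"
_DATE = r"\d{4}-\d{2}-\d{2}"

# One pattern per artifact kind from the list above (names without extension).
NAME_PATTERNS = {
    "develop": re.compile(rf"{_DATE}_{_SLUG}_{_SLUG}"),
    "deliver": re.compile(rf"{_DATE}_{_SLUG}"),
    "template": re.compile(_SLUG),
    "service": re.compile(rf"{_SLUG}-service"),
    "job": re.compile(rf"{_SLUG}-job"),
    "library": re.compile(rf"{_SLUG}-lib"),
}


def follows_convention(kind: str, name: str) -> bool:
    """Return True if `name` (without file extension) matches the convention for `kind`."""
    return NAME_PATTERNS[kind].fullmatch(name) is not None
```

Using `fullmatch` means the whole name has to match, so a develop name such as `2020-03-02_jdoe_tfidf-baseline` is not accidentally accepted as a deliver name.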

Files

<dataset-desc>_<preprocessing-desc>_<training-desc>.<filetype>

Examples:

blogs-metadata.csv
blogs-metadata_cl-rs_ft-vec.vectors
categories2blogs_cl-rs-lm_tfidf-lsvm.model.zip
categories2blogs-questions_cl-rs-lm_tfidf-lsvm.model.zip
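A small helper that composes such file names can keep them consistent across notebooks and jobs. This is a hedged sketch: `build_artifact_filename` is a hypothetical name, and rejecting a training segment without a preprocessing segment is an assumption made here to keep names unambiguous, not a rule from the template.

```python
from typing import Sequence


def build_artifact_filename(dataset: str, preprocessing: Sequence[str] = (),
                            training: Sequence[str] = (), filetype: str = "") -> str:
    """Compose <dataset-desc>_<preprocessing-desc>_<training-desc>.<filetype>."""
    # "_" separates segments and "." starts the file type, so neither may
    # appear inside a single identifier.
    for ident in (dataset, *preprocessing, *training):
        if not ident or "_" in ident or "." in ident:
            raise ValueError(f"invalid identifier: {ident!r}")
    # Assumption: a training segment without a preprocessing segment would be
    # misread as preprocessing, so it is rejected.
    if training and not preprocessing:
        raise ValueError("training identifiers require a preprocessing segment")
    segments = [dataset]
    if preprocessing:
        segments.append("-".join(preprocessing))
    if training:
        segments.append("-".join(training))
    stem = "_".join(segments)
    filetype = filetype.lstrip(".")  # accept both "model" and ".model"
    return f"{stem}.{filetype}" if filetype else stem
```

For instance, `build_artifact_filename("blogs-metadata", ["cl", "rs"], ["ft-vec"], "vectors")` reproduces the second example name above.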

Name Identifier Descriptions:

Name	Description
Dataset Identifiers:
categories2blogs	Dataset containing blogs with the text content, blogs item URI, and connected primary tags.
blogs-metadata	Dataset containing all blogs and related metadata (properties).
Preprocessing Identifiers:
cl	Default text cleaning (lowercasing, regex cleaning).
rs	Remove stopwords.
lm	Text lemmatization.
Training Identifiers:
ft-vec	Text vectorizer using fastText.
tfidf	Text vectorizer using TF-IDF.
lsvm	Classifier using linear SVM.
Filetype Identifiers:
.model	Model file.
.vectors	Binary vectors file.
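Going the other way, a file name can be expanded back into the descriptions from the table. The sketch below is illustrative only: the dictionaries copy the table, and the names `split_identifiers` and `describe_artifact` are hypothetical. Because some identifiers (such as `ft-vec`) contain a hyphen themselves, a plain split on `-` is not enough; the sketch uses a longest-match-first lookup against the known identifiers instead.

```python
# Copy of the identifier table above; dictionary names are illustrative.
DATASETS = {
    "categories2blogs": "Dataset containing blogs with the text content, blogs item URI, and connected primary tags.",
    "blogs-metadata": "Dataset containing all blogs and related metadata (properties).",
}
PREPROCESSING = {
    "cl": "Default text cleaning (lowercasing, regex cleaning)",
    "rs": "Remove stopwords",
    "lm": "Text lemmatization",
}
TRAINING = {
    "ft-vec": "Text vectorizer using fastText",
    "tfidf": "Text vectorizer using TF-IDF",
    "lsvm": "Classifier using linear SVM",
}
FILETYPES = {"model": "Model file", "vectors": "Binary vectors file"}


def split_identifiers(segment: str, table: dict) -> list:
    """Split a hyphen-joined segment into known identifiers, longest match first."""
    tokens = segment.split("-")
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest run of tokens first so "ft-vec" wins over "ft".
        for j in range(len(tokens), i, -1):
            candidate = "-".join(tokens[i:j])
            if candidate in table:
                found.append(candidate)
                i = j
                break
        else:
            raise KeyError(f"unknown identifier at {tokens[i]!r} in {segment!r}")
    return found


def describe_artifact(filename: str) -> dict:
    """Expand <dataset>_<preprocessing>_<training>.<filetype> into descriptions."""
    stem, _, filetype = filename.partition(".")
    segments = stem.split("_")
    if len(segments) > 3:
        raise ValueError(f"too many '_' segments in {filename!r}")
    dataset, *rest = segments
    preprocessing = split_identifiers(rest[0], PREPROCESSING) if rest else []
    training = split_identifiers(rest[1], TRAINING) if len(rest) > 1 else []
    return {
        # Datasets missing from the table (e.g. variants) fall back to the raw name.
        "dataset": DATASETS.get(dataset, dataset),
        "preprocessing": [PREPROCESSING[p] for p in preprocessing],
        "training": [TRAINING[t] for t in training],
        # Extra extensions such as "zip" are passed through unchanged.
        "filetype": [FILETYPES.get(ext, ext) for ext in filetype.split(".") if ext],
    }
```

An unknown preprocessing or training identifier raises `KeyError`, which can serve as a cheap check that the identifier table is kept up to date.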

Stars: ✭ 69