capitalone / rubicon-ml

License: Apache-2.0
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Programming Languages

Jupyter Notebook, Python, CSS

Projects that are alternatives of or similar to rubicon-ml

narps
Code related to Neuroimaging Analysis Replication and Prediction Study
Stars: ✭ 31 (-61.73%)
Mutual labels:  reproducibility
EasyGitianBuilder
πŸ”¨ Gitian Building made simpler on any Windows Debian/Ubuntu MacOS with Vagrant, lxc, and virtualbox
Stars: ✭ 18 (-77.78%)
Mutual labels:  reproducibility
ck
Portable automation meta-framework to manage, describe, connect and reuse any artifacts, scripts, tools and workflows on any platform with any software and hardware in a non-intrusive way and with minimal effort. Try it using this tutorial to modularize and automate ML Systems benchmarking from the Student Cluster Competition at SC'22:
Stars: ✭ 501 (+518.52%)
Mutual labels:  reproducibility
hydra-zen
Pythonic functions for creating and enhancing Hydra applications
Stars: ✭ 165 (+103.7%)
Mutual labels:  reproducibility
binderhub-deploy
Deploy a BinderHub from scratch on Microsoft Azure
Stars: ✭ 27 (-66.67%)
Mutual labels:  reproducibility
rna-seq-kallisto-sleuth
A Snakemake workflow for differential expression analysis of RNA-seq data with Kallisto and Sleuth.
Stars: ✭ 56 (-30.86%)
Mutual labels:  reproducibility
targets-tutorial
Short course on the targets R package
Stars: ✭ 87 (+7.41%)
Mutual labels:  reproducibility
alchemy
Experiments logging & visualization
Stars: ✭ 49 (-39.51%)
Mutual labels:  reproducibility
caltech samaritan
πŸšγ€°οΈ Drone SLAM project for Caltech's ME 134 Autonomy class.
Stars: ✭ 35 (-56.79%)
Mutual labels:  exploration
postr
Prepare reproducible R Markdown posters
Stars: ✭ 68 (-16.05%)
Mutual labels:  reproducibility
benchmark VAE
Unifying Variational Autoencoder (VAE) implementations in Pytorch (NeurIPS 2022)
Stars: ✭ 1,211 (+1395.06%)
Mutual labels:  reproducibility
software-dev
Coding Standards for the USC Biostats group
Stars: ✭ 33 (-59.26%)
Mutual labels:  reproducibility
analysis-flow
Data Analysis Workflows & Reproducibility Learning Resources
Stars: ✭ 108 (+33.33%)
Mutual labels:  reproducibility
synthesizing-robust-adversarial-examples
My entry for ICLR 2018 Reproducibility Challenge for paper Synthesizing robust adversarial examples https://openreview.net/pdf?id=BJDH5M-AW
Stars: ✭ 60 (-25.93%)
Mutual labels:  reproducibility
mlreef
The collaboration workspace for Machine Learning
Stars: ✭ 1,409 (+1639.51%)
Mutual labels:  reproducibility
ase exploration
Planning for robotic exploration based on forward simulation
Stars: ✭ 82 (+1.23%)
Mutual labels:  exploration
contextual
Contextual Bandits in R - simulation and evaluation of Multi-Armed Bandit Policies
Stars: ✭ 72 (-11.11%)
Mutual labels:  exploration
lightning-hydra-template
PyTorch Lightning + Hydra. A very user-friendly template for rapid and reproducible ML experimentation with best practices. ⚑πŸ”₯⚑
Stars: ✭ 1,905 (+2251.85%)
Mutual labels:  reproducibility
ukbrest
ukbREST: efficient and streamlined data access for reproducible research of large biobanks
Stars: ✭ 32 (-60.49%)
Mutual labels:  reproducibility
daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiment.
Stars: ✭ 25 (-69.14%)
Mutual labels:  reproducibility

rubicon-ml


Purpose

rubicon-ml is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way. Its git integration associates these inputs and outputs directly with the model code that produced them, ensuring full auditability and reproducibility for developers and stakeholders alike. While you experiment, the dashboard makes it easy to explore, filter, visualize, and share recorded work.

P.S. If you're looking for Rubicon, the Java/ObjC Python bridge, see that project instead.


Components

rubicon-ml is composed of three parts:

  • A Python library for storing and retrieving model inputs, outputs, and analyses on local or remote filesystems, powered by fsspec
  • A dashboard for exploring, comparing, and visualizing logged data, built with dash
  • A process for sharing a selected subset of logged data with collaborators or reviewers, which leverages intake

Workflow

Use rubicon_ml to capture model inputs and outputs over time. It can be easily integrated into existing Python models or pipelines and supports both concurrent logging (so multiple experiments can be logged in parallel) and asynchronous communication with S3 (so network reads and writes won’t block).
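For example, because storage is handled by fsspec, the same logging code can write to S3 by pointing root_dir at an S3 path (the bucket below is a placeholder, and s3fs must be installed):

from rubicon_ml import Rubicon

# "my-bucket/rubicon-root" is illustrative; any fsspec-compatible S3 path works the same way
rubicon = Rubicon(persistence="filesystem", root_dir="s3://my-bucket/rubicon-root")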

Meanwhile, periodically review the logged data in the Rubicon dashboard to steer model tuning in the right direction. The dashboard lets you quickly spot trends by exploring and filtering your logged results, and it visualizes how the model inputs impacted the model outputs.

When the model is ready for review, Rubicon makes it easy to share specific subsets of the data with model reviewers and stakeholders, giving them the context necessary for a complete model review and approval.

Use

Check out the interactive notebooks in this Binder to try rubicon_ml for yourself.

Here's a simple example:

from collections import namedtuple

from rubicon_ml import Rubicon
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# a lightweight, user-defined record describing where the training data came from
SklearnTrainingMetadata = namedtuple("SklearnTrainingMetadata", "module_name method_name")

# illustrative model setup so the snippet runs end to end
# (any scikit-learn estimator and dataset would do)
n_estimators, n_features, random_state = 100, 5, 42
X, y = make_classification(n_samples=1_000, n_features=n_features, random_state=random_state)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state)
rfc = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
rfc.fit(X_train, y_train)

rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)

project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["my_model_name"],
)

# log the model's inputs (parameters) and outputs (metrics) against this experiment
experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)

Then explore the project by running the dashboard:

rubicon_ml ui --root-dir /rubicon-root

Documentation

For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.

Install

The Python library is available on conda-forge via conda and on PyPI via pip.

conda config --add channels conda-forge
conda install rubicon-ml

or

pip install rubicon-ml

Develop

The project uses conda to manage environments. First, install conda. Then use conda to set up a development environment:

conda env create -f environment.yml
conda activate rubicon-ml-dev

Finally, install rubicon_ml locally into the newly created environment.

pip install -e ".[all]"

Testing

The tests are separated into unit and integration tests. They can be run directly in the activated dev environment via pytest tests/unit or pytest tests/integration, or all at once by simply running pytest.
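For example, from the repository root with the dev environment active:

pytest tests/unit          # unit tests only
pytest tests/integration   # integration tests only
pytest                     # run everything (subject to the default markers noted below)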

Note: some integration tests are intentionally marked to control when they are run (i.e., not during CI/CD). These tests include:

  • Integration tests that write to physical filesystems - local and S3. Local files will be written to ./test-rubicon relative to where the tests are run. An S3 path must also be provided to run these tests. By default, these tests are disabled. To enable them, run:

    pytest -m "write_files" --s3-path "s3://my-bucket/my-key"
    
  • Integration tests that run Jupyter notebooks. These tests are a bit slower than the rest of the tests in the suite as they need to launch Jupyter servers. By default, they are enabled. To disable them, run:

    pytest -m "not run_notebooks and not write_files"
    

    Note: when simply running pytest, -m "not write_files" is applied by default, so it must also be included explicitly when disabling the notebook tests.

Code Formatting

Install and configure pre-commit to automatically run black, flake8, and isort during commits:
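# standard pre-commit setup, run inside the activated dev environment
pip install pre-commit
pre-commit install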

Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via pre-commit run or skip these checks with git commit --no-verify.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].