sinhrks / daskperiment

License: BSD-3-Clause
Reproducibility for Humans: a lightweight tool for running reproducible machine learning experiments.

daskperiment

Overview

daskperiment is a tool for running reproducible machine learning experiments. It lets users define trials and manage their history (the given parameters, the results and the execution environment).

The package is built on Dask, a package for parallel computing with task scheduling. Each experiment trial is internally expressed as a Dask computation graph and can be executed in parallel.
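
To make the idea of a trial as a computation graph concrete, here is a minimal sketch that uses only the Python standard library. It is not Dask and not daskperiment's API: the step functions, the `graph` layout and `run_trial` are all hypothetical names chosen for illustration. Steps whose inputs are already available are submitted to a thread pool together, which is the property that allows independent steps of a trial to run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical steps: two independent preparation steps feed one scoring step.
def load_features(a):
    return [a * i for i in range(5)]

def load_weights(b):
    return [b] * 5

def score(features, weights):
    return sum(f * w for f, w in zip(features, weights))

# Graph: step name -> (function, names of its inputs).
# An input is either a trial parameter or the name of another step.
graph = {
    "features": (load_features, ["a"]),
    "weights": (load_weights, ["b"]),
    "score": (score, ["features", "weights"]),
}

def run_trial(graph, params, target):
    """Evaluate `target`, running every step as soon as its inputs are known."""
    results = dict(params)   # parameters count as already-computed values
    pending = dict(graph)
    with ThreadPoolExecutor() as pool:
        while target not in results:
            # Steps whose inputs are all available can run concurrently.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("graph has a cycle or a missing input")
            futures = {
                name: pool.submit(pending[name][0],
                                  *[results[d] for d in pending[name][1]])
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results[target]

print(run_trial(graph, {"a": 2, "b": 3}, "score"))
```

In this sketch "features" and "weights" are submitted in the same round because neither depends on the other, while "score" waits for both.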

Benefits

  • Compatibility with standard Python/Jupyter environments (and optionally with a standard key-value store).
    • No need to set up server applications
    • No need to register with any cloud service
    • Runs on standard / customized Python shells
  • Intuitive user interface
    • Few modifications to existing code are needed
    • Trial histories are logged automatically (no need to write additional logging code)
    • Dask-compatible API
    • Easily accessible experiment history (with basic pandas operations)
    • Less management work in Git (no need to create a branch per trial)
    • (Experimental) Web dashboard to manage trial history
  • Traceability of experiment-related information
    • Trial results and their (hyper)parameters
    • Code contexts
    • Environment information
      • Device information
      • OS information
      • Python version
      • Installed Python packages and their versions
      • Git information
  • Reproducibility
    • Check function purity (each step should return the same output for the same inputs)
    • Automatic random seeding
  • Auto saving and loading of previous experiment history
  • Parallel execution of experiment steps
  • Experiment sharing
    • Redis backend
    • MongoDB backend
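
The traceability and reproducibility items above can be illustrated with a small standard-library sketch. It is not daskperiment's implementation: `history`, `run_trial`, `environment_info` and the example functions are hypothetical, and defaulting the seed to the trial number is an assumption made for this sketch only. Each trial is seeded, logged with its parameters, result and a subset of the environment information, and run a second time under the same seed to check that the step returns the same output for the same inputs.

```python
import platform
import random
import sys

history = []  # hypothetical in-memory trial log

def environment_info():
    """Collect a small subset of the environment details listed above."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }

def run_trial(func, params, seed=None):
    """Seed the RNG, run one trial, check purity, and log the trial."""
    trial_id = len(history) + 1
    # Assumption for this sketch: the seed defaults to the trial number.
    seed = trial_id if seed is None else seed
    random.seed(seed)
    result = func(**params)
    # Second run under the same seed, used only for the purity check.
    random.seed(seed)
    repeat = func(**params)
    history.append({
        "trial_id": trial_id,
        "params": dict(params),
        "seed": seed,
        "result": result,
        "pure": result == repeat,
        "environment": environment_info(),
    })
    return result

def noisy_score(a, b):
    # Uses randomness, but is reproducible once the RNG is seeded.
    return a * b + random.random()

calls = {"n": 0}

def leaky_score(a, b):
    # Depends on hidden global state, so repeated calls differ.
    calls["n"] += 1
    return a * b + calls["n"]

run_trial(noisy_score, {"a": 2, "b": 3})
run_trial(leaky_score, {"a": 2, "b": 3})
for trial in history:
    print(trial["trial_id"], trial["seed"], trial["pure"])
```

The first trial passes the purity check because seeding makes its random component repeatable; the second fails because its output depends on a counter outside its inputs.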

Future Scope

  • More efficient execution.
    • Skip execution when the parameters a step depends on are unchanged
    • Distributed execution
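
One way the "skip execution" idea could work is to cache each step's result under a hash of its parameters. The sketch below is an illustration only, since the source lists this as future scope and does not describe a design; `run_step`, `params_key` and the cache layout are hypothetical, and the parameters are assumed to be JSON-serialisable.

```python
import hashlib
import json

_cache = {}       # hypothetical cache: (step name, parameter hash) -> result
executions = []   # names of steps that were actually executed

def params_key(params):
    """Stable hash of the parameters (assumes they are JSON-serialisable)."""
    payload = json.dumps(params, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def run_step(name, func, params):
    """Run `func` only if this step has not seen these parameters before."""
    key = (name, params_key(params))
    if key not in _cache:
        executions.append(name)
        _cache[key] = func(**params)
    return _cache[key]

def train(alpha, depth):
    return alpha * depth

run_step("train", train, {"alpha": 0.5, "depth": 4})   # executed
run_step("train", train, {"depth": 4, "alpha": 0.5})   # same parameters: skipped
run_step("train", train, {"alpha": 0.5, "depth": 8})   # changed parameter: executed
print(executions)
```

Reusing a cached result is only safe for steps that pass a purity check, because a step with hidden state or unseeded randomness could legitimately return a different value on re-execution.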