daskperiment
Overview
daskperiment is a tool for running reproducible machine learning experiments. It lets users define trials and manage their history (the parameters given, the results, and the execution environment).
The package is built on Dask, a library for parallel computing with task scheduling. Each experiment trial is expressed internally as a Dask computation graph and can be executed in parallel.
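To make the idea of a computation graph concrete, here is a minimal sketch in plain Python. It mimics the dictionary form that Dask graphs take (a key maps either to a value or to a tuple of a callable and its arguments), but it is not daskperiment's internal code: the step functions `prepare`, `train` and `evaluate`, the parameter names, and the tiny `get` evaluator are all hypothetical, and the evaluator assumes every task argument is a key in the graph.

```python
# Hypothetical experiment steps, used only for illustration.
def prepare(a, b):
    return a + b


def train(data, c):
    return data * c


def evaluate(model, data):
    return model - data


# A trial as a task graph: parameters are plain values,
# steps are (callable, *argument_keys) tuples.
graph = {
    "a": 1,
    "b": 2,
    "c": 3,
    "data": (prepare, "a", "b"),
    "model": (train, "data", "c"),
    "score": (evaluate, "model", "data"),
}


def get(graph, key, cache=None):
    """Resolve `key` by evaluating the tasks it depends on first.

    Simplification: every task argument is treated as a graph key.
    """
    cache = {} if cache is None else cache
    if key not in cache:
        task = graph[key]
        if isinstance(task, tuple) and task and callable(task[0]):
            func, *deps = task
            cache[key] = func(*(get(graph, dep, cache) for dep in deps))
        else:
            cache[key] = task
    return cache[key]


score = get(graph, "score")
print(score)
```

Because the graph records which step depends on which, a scheduler such as Dask's can run steps that do not depend on each other at the same time; this sketch evaluates them one after another.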
Benefits
- Compatibility with standard Python/Jupyter environments (and optionally with standard key-value stores).
- No need to set up server applications
- No need to register for any cloud services
- Runs on standard / customized Python shells
- Intuitive user interface
- Few modifications to existing code are needed
- Trial histories are logged automatically (no need to write additional logging code)
- Dask compatible API
- Easily accessible experiment history (with basic pandas operations)
- Less management work in Git (no need to create a branch per trial)
- (Experimental) Web dashboard to manage trial history
- Traceability of experiment related information
- Trial results and their (hyper)parameters
- Code contexts
- Environment information
- Device information
- OS information
- Python version
- Installed Python packages and their versions
- Git information
- Reproducibility
- Check function purity (each step should return the same output for the same inputs)
- Automatic random seeding
- Auto saving and loading of previous experiment history
- Parallel execution of experiment steps
- Experiment sharing
- Redis backend
- MongoDB backend
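The traceability and reproducibility items above can be illustrated with the standard library alone. The sketch below is not daskperiment's implementation: `collect_environment` and `is_pure` are hypothetical helpers, Git information is left out, and the purity check only compares two runs of a function under the same random seed.

```python
import platform
import random
from importlib import metadata


def collect_environment():
    """Take a snapshot of the environment a trial runs in."""
    return {
        "machine": platform.machine(),
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": sorted(
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
        ),
    }


def is_pure(func, *args, seed=0):
    """Run `func` twice with the same seed and inputs, then compare outputs."""
    random.seed(seed)
    first = func(*args)
    random.seed(seed)
    second = func(*args)
    return first == second


def noisy_score(x):
    # Uses randomness, but is repeatable once the seed is fixed.
    return x + random.random()


_calls = []


def leaky_score(x):
    # Depends on hidden state, so repeated calls give different results.
    _calls.append(x)
    return x + len(_calls)


env = collect_environment()
print(env["python"], env["os"])
print(is_pure(noisy_score, 1.0))
print(is_pure(leaky_score, 1))
```

Seeding before each run is what lets a step that uses random numbers pass the check, while a step that reads or mutates state outside its arguments fails it.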
Future Scope
- More efficient execution
- Skip execution of a step when the parameters it depends on are unchanged
- Distributed execution
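One way that skipping unchanged steps could work is to cache each step's result under a key derived from its name and parameters. The sketch below only illustrates that idea and does not describe a planned implementation: `run_step`, the in-memory `_cache`, and the `prepare` step are hypothetical, and the parameters are assumed to be JSON-serializable.

```python
import hashlib
import json

_cache = {}  # hypothetical in-memory store: key -> step result


def run_step(func, **params):
    """Run `func` only if this step has not been run with these parameters."""
    key_source = json.dumps(
        {"step": func.__name__, "params": params}, sort_keys=True
    )
    key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = func(**params)
    return _cache[key]


executions = []  # records every real execution of the step


def prepare(a, b):
    executions.append((a, b))
    return a + b


first = run_step(prepare, a=1, b=2)   # executed
second = run_step(prepare, a=1, b=2)  # same parameters: served from the cache
third = run_step(prepare, a=1, b=3)   # parameter changed: executed again
print(first, second, third, len(executions))
```

A scheme like this is only safe for steps that are pure, which is why it pairs naturally with the function purity check listed under Reproducibility.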