
datmo / Datmo

Licence: MIT
Open source production model management tool for data scientists

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Datmo

Deep Learning Book
Repository for "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python"
Stars: ✭ 2,705 (+709.88%)
Mutual labels:  artificial-intelligence, data-science
Shogun
Shōgun
Stars: ✭ 2,859 (+755.99%)
Mutual labels:  artificial-intelligence, data-science
Reproducibilidad
Reproducible Science: what, why, how
Stars: ✭ 39 (-88.32%)
Mutual labels:  version-control, reproducibility
Igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
Stars: ✭ 2,956 (+785.03%)
Mutual labels:  artificial-intelligence, data-science
Datascience course
Data Science course in Portuguese
Stars: ✭ 294 (-11.98%)
Mutual labels:  artificial-intelligence, data-science
Voice Gender
Gender recognition by voice and speech analysis
Stars: ✭ 248 (-25.75%)
Mutual labels:  artificial-intelligence, data-science
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (+788.02%)
Mutual labels:  artificial-intelligence, data-science
Ml Auto Baseball Pitching Overlay
⚾🤖⚾ Automatic baseball pitching overlay in realtime
Stars: ✭ 200 (-40.12%)
Mutual labels:  artificial-intelligence, data-science
Machine Learning For Trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Stars: ✭ 4,979 (+1390.72%)
Mutual labels:  artificial-intelligence, data-science
Cryptocurrency Price Prediction
Cryptocurrency Price Prediction Using LSTM neural network
Stars: ✭ 271 (-18.86%)
Mutual labels:  artificial-intelligence, data-science
Datascience
Curated list of Python resources for data science.
Stars: ✭ 3,051 (+813.47%)
Mutual labels:  artificial-intelligence, data-science
Ai Learn
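AI learning roadmap: a collection of nearly 200 hands-on cases and projects, with free accompanying course materials, taking beginners from zero to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning,deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv, and more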
Stars: ✭ 4,387 (+1213.47%)
Mutual labels:  artificial-intelligence, data-science
Prodigy Recipes
🍳 Recipes for the Prodigy, our fully scriptable annotation tool
Stars: ✭ 229 (-31.44%)
Mutual labels:  artificial-intelligence, data-science
Darwinexlabs
Datasets, tools and more from Darwinex Labs - Prop Investing Arm & Quant Team @ Darwinex
Stars: ✭ 248 (-25.75%)
Mutual labels:  artificial-intelligence, data-science
Lale
Library for Semi-Automated Data Science
Stars: ✭ 198 (-40.72%)
Mutual labels:  artificial-intelligence, data-science
Atlas
An Open Source, Self-Hosted Platform For Applied Deep Learning Development
Stars: ✭ 259 (-22.46%)
Mutual labels:  artificial-intelligence, data-science
Imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Stars: ✭ 194 (-41.92%)
Mutual labels:  artificial-intelligence, data-science
Pytorch Geometric Yoochoose
This is a tutorial for PyTorch Geometric on the YooChoose dataset
Stars: ✭ 198 (-40.72%)
Mutual labels:  artificial-intelligence, data-science
Gophernotes
The Go kernel for Jupyter notebooks and nteract.
Stars: ✭ 3,100 (+828.14%)
Mutual labels:  artificial-intelligence, data-science
Targets
Function-oriented Make-like declarative workflows for R
Stars: ✭ 293 (-12.28%)
Mutual labels:  data-science, reproducibility


Datmo Alpha Release

Datmo is an open source production model management tool for data scientists. Use datmo init to turn any repository into a trackable experiment record. Sync using your own cloud.

Note: The current version of Datmo is an alpha release. This means commands are subject to change and more features will be added. If you find any bugs, please feel free to contribute by opening issues so the contributors can address them.

Features

  • One command environment setup (languages, frameworks, packages, etc)
  • Tracking and logging for model config and results
  • Project versioning (model state tracking)
  • Experiment reproducibility (re-run tasks)
  • Visualize + export experiment history
  • (coming soon) Dashboards to visualize experiments

Feature                                   Commands
Initialize a project                      $ datmo init
Set up a new environment                  $ datmo environment setup
Run an experiment                         $ datmo run "python filename.py"
Reproduce a previous experiment           $ datmo ls (Find the desired ID)
                                          $ datmo rerun EXPERIMENT_ID
Open a workspace                          $ datmo notebook (Jupyter Notebook)
                                          $ datmo jupyterlab (JupyterLab)
                                          $ datmo rstudio (RStudio)
                                          $ datmo terminal (Terminal)
Record your project state                 $ datmo snapshot create -m "My first snapshot!"
(Files, code, env, config, stats)
Switch to a previous project state        $ datmo snapshot ls (Find the desired ID)
                                          $ datmo snapshot checkout SNAPSHOT_ID
Visualize project entities                $ datmo ls (Experiments)
                                          $ datmo snapshot ls (Snapshots)
                                          $ datmo environment ls (Environments)

Installation

Requirements:

docker (installed and running before starting) : Instructions for Ubuntu, MacOS, Windows

$ pip install datmo

Hello-World

Our hello world guide includes showing environment setup and changes, as well as experiment reproducibility. It's available in our docs here.

Examples

In the /examples folder we have a few scripts you can run to get a feel for datmo. You can navigate to Examples to learn more about how you can run the examples and get started with your own projects.

For more advanced tutorials, check out our dedicated tutorial repository here.

Environment Setup

Setting up an environment is extremely easy in datmo. Simply respond with y when asked about environment setup during initialization, or use datmo environment setup at any point. Then follow the resulting prompts.

One example is shown below, for setting up a Python 2.7 TensorFlow environment with CPU requirements and drivers.

For the full guide on setting up your environment with datmo, see this page in our documentation here.

Opening a workspace

After getting your environment set up, most data scientists want to open what we call a workspace (an IDE or notebook programming environment).

One example is shown below, for quickly opening a Jupyter Notebook and showing the import of TensorFlow working as intended.

Experiment Running and Tracking

Here's a comparison of a typical logistic regression model with one leveraging Datmo.

Normal Script

# train.py
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn import datasets
from sklearn import linear_model as lm
from sklearn import model_selection as ms

# Load the iris dataset and split it into train and test sets
iris_dataset = datasets.load_iris()
X = iris_dataset.data
y = iris_dataset.target
data = ms.train_test_split(X, y)
X_train, X_test, y_train, y_test = data

# Fit the model and save it to disk
model = lm.LogisticRegression(solver="newton-cg")
model.fit(X_train, y_train)
joblib.dump(model, "model.pkl")

# Score the model on both splits
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(train_acc)
print(test_acc)

With Datmo

# train.py
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn import datasets
from sklearn import linear_model as lm
from sklearn import model_selection as ms
import datmo # extra line

# Keep hyperparameters in one dict so they can be recorded in the snapshot
config = {
    "solver": "newton-cg"
} # extra line

# Load the iris dataset and split it into train and test sets
iris_dataset = datasets.load_iris()
X = iris_dataset.data
y = iris_dataset.target
data = ms.train_test_split(X, y)
X_train, X_test, y_train, y_test = data

# Fit the model from the config and save it to disk
model = lm.LogisticRegression(**config)
model.fit(X_train, y_train)
joblib.dump(model, "model.pkl")

# Score the model on both splits
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# Collect the metrics to store alongside the snapshot
stats = {
    "train_accuracy": train_acc,
    "test_accuracy": test_acc
} # extra line

# Record the model file, config, and stats as a snapshot
datmo.snapshot.create(
    message="my first snapshot",
    filepaths=["model.pkl"],
    config=config,
    stats=stats
) # extra line

In order to run the above code you can do the following.

  1. Navigate to a directory with a project

     $ mkdir MY_PROJECT
     $ cd MY_PROJECT
    
  2. Initialize a datmo project

     $ datmo init
    
  3. Copy the datmo code above into a train.py file in your MY_PROJECT directory

  4. Run the script like you normally would in python

     $ python train.py
    
  5. Congrats! You just created your first snapshot :) Now run an ls command for snapshots to see your first snapshot.

     $ datmo snapshot ls
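
The "With Datmo" script keeps hyperparameters in a single config dictionary, unpacks it into the model constructor, and then hands the same dictionary to the snapshot together with a stats dictionary. The sketch below illustrates only that pattern, using the standard library. The toy data, the accuracy function, and build_snapshot_kwargs are hypothetical stand-ins and are not part of datmo; nothing here calls datmo itself.

```python
# Toy data and a toy threshold "model"; illustrative stand-ins, not part of datmo.
TRAIN = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
TEST = [(0.2, 0), (0.7, 1)]


def accuracy(threshold, rows):
    """Fraction of rows where the prediction (x >= threshold) matches the label."""
    hits = sum(int(x >= threshold) == label for x, label in rows)
    return hits / len(rows)


def build_snapshot_kwargs(message, config, rows_train, rows_test):
    """Gather message, config, and stats under the keyword names used in the script."""
    stats = {
        # The same config dict drives the computation and is recorded as-is.
        "train_accuracy": accuracy(rows=rows_train, **config),
        "test_accuracy": accuracy(rows=rows_test, **config),
    }
    return {"message": message, "config": config, "stats": stats}


config = {"threshold": 0.5}
snapshot_kwargs = build_snapshot_kwargs("toy snapshot", config, TRAIN, TEST)
print(snapshot_kwargs)
```

Because the recorded config is the very object used for the computation, the stored hyperparameters cannot drift from the ones that produced the stats.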
    

How it works

Project Structure

When you run datmo init, Datmo adds a hidden .datmo directory that keeps track of all of the various entities at play. This directory is necessary to make a repository datmo-enabled.
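
Since a datmo-enabled repository is marked by its hidden .datmo directory, a script can test for that directory before invoking any datmo command. This is a minimal sketch: is_datmo_enabled is a hypothetical helper, not part of the datmo API, and it checks only that the directory exists, not that its contents are valid.

```python
from pathlib import Path


def is_datmo_enabled(repo_dir):
    """Return True if repo_dir contains a .datmo directory (presence check only)."""
    return (Path(repo_dir) / ".datmo").is_dir()


# Check the current working directory.
print(is_datmo_enabled("."))
```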

Environments, Snapshots, and Runs

See our concepts page in the documentation to see how the moving parts work together in datmo.

Documentation

The full docs are hosted here. If you wish to contribute to the docs (source code located here in /docs), follow the procedure outlined in CONTRIBUTING.md.

Transform a Current Project

You can transform your existing repository into a datmo-enabled repository with the following command:

$ datmo init

If at any point you would like to remove datmo, you can either delete the .datmo directory from your repository or run the following command:

$ datmo cleanup
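
For the manual option, deleting the .datmo directory can be scripted with a dry-run guard. This sketch covers only the directory removal described above. remove_datmo_dir is a hypothetical helper, and datmo cleanup may do more than this, so it should not be read as an equivalent of that command.

```python
import shutil
from pathlib import Path


def remove_datmo_dir(repo_dir, dry_run=True):
    """Remove repo_dir/.datmo if it exists.

    Returns the path that was (or, in a dry run, would be) removed,
    or None when there is no .datmo directory.
    """
    target = Path(repo_dir) / ".datmo"
    if not target.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(target)
    return target


# Dry run by default: report what would be deleted without touching it.
print(remove_datmo_dir("."))
```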

Sharing (Workaround)

DISCLAIMER: This is not currently an officially supported option and only works for file-based storage layers (as set in the configuration) as a workaround to share datmo projects.

Although datmo is made to track changes locally, you can share a project by pushing it to a remote server as described below. The steps are shown for git only; if you are using another SCM tool, you can likely do something similar. If your files are too big or cannot be added to SCM, this may not work for you.

The commands below have been tested in Bash only. If you are using another shell, you may run into errors.

Push to remote

$ git add -f .datmo/*  # add in .datmo to your scm
$ git commit -m "adding .datmo to tracking"  # commit it to your scm
$ git push  # push to remote
$ git push origin +refs/datmo/*:refs/datmo/*  # push datmo refs to remote

The above allows you to share datmo results and entities with yourself or others on other machines. NOTE: you will have to remove .datmo/ from tracking before you can start using datmo on the other machine or in another location. The instructions below show how to replicate the project at another location.

Pull from remote

$ git clone YOUR_REMOTE_URL
$ cd YOUR_REPO 
$ echo '.datmo/*' >> .git/info/exclude  # append .datmo to your .git exclude, keeping existing entries
$ git rm -r --cached .datmo  # remove cached versions of .datmo from scm
$ git commit -m "removed .datmo from tracking"  # clean up your scm so datmo can work 
$ git pull origin +refs/datmo/*:refs/datmo/*  # pull datmo refs from remote
$ datmo init  # This enables datmo in the new location. If you enter blanks, no project information will be updated
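
The exclude step in the sequence above can also be done from Python so that it is safe to repeat. This is a minimal sketch under the assumption that the target is an ordinary git working copy. ensure_git_exclude is a hypothetical helper, not part of datmo or git, and it only edits .git/info/exclude; the remaining git commands still need to be run as shown.

```python
from pathlib import Path


def ensure_git_exclude(repo_dir, pattern=".datmo/*"):
    """Append pattern to .git/info/exclude unless it is already listed.

    Returns True if the file was modified, False if the pattern was present.
    """
    git_dir = Path(repo_dir) / ".git"
    if not git_dir.is_dir():
        raise FileNotFoundError(f"{repo_dir} does not look like a git repository")
    exclude = git_dir / "info" / "exclude"
    exclude.parent.mkdir(exist_ok=True)
    text = exclude.read_text() if exclude.exists() else ""
    if pattern in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with exclude.open("a") as handle:
        handle.write(prefix + pattern + "\n")
    return True
```

Calling it a second time leaves the file unchanged, which avoids duplicate entries when the setup is re-run.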

If you are interested in sharing using the datmo protocol, you can visit Datmo's website

FAQs

Q: What do I do if the datmo stop --all doesn't work and I cannot start a new container due to port reallocation?
A: This could be caused by a ghost container left running by another datmo project, or by an unrelated container. You can create a docker image with a specific port allocation (other than 8888); or find the ghost container with docker ps --all, then stop and remove it with docker container stop <ID> and docker container rm <ID>; or stop and remove all containers on the machine with docker container stop $(docker ps -a -q) and docker container rm $(docker ps -a -q). [NOTE: The last option may affect other docker processes on your machine, so PROCEED WITH CAUTION.]
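
Before hunting for a ghost container, it can help to confirm that something is actually listening on the port. The sketch below uses only the Python standard library and assumes the container's port is published on the local machine. port_in_use is a hypothetical helper, and a positive result says only that the port is taken, not which process or container holds it.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# 8888 is the port mentioned in the answer above.
print(port_in_use(8888))
```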
