
replicate / Keepsake

Licence: apache-2.0
Version control for machine learning

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Keepsake

Gojot
A command-line journal that is distributed and encrypted, making it easy to jot notes 📓
Stars: ✭ 340 (-74.94%)
Mutual labels:  version-control
Gitcommands
Here is a list of some basic Git commands to get you going with Git
Stars: ✭ 11 (-99.19%)
Mutual labels:  version-control
Git2rdata
An R package for storing and retrieving data.frames in git repositories.
Stars: ✭ 84 (-93.81%)
Mutual labels:  version-control
Centraldogma
Highly-available version-controlled service configuration repository based on Git, ZooKeeper and HTTP/2
Stars: ✭ 378 (-72.14%)
Mutual labels:  version-control
Snowfs
SnowFS - a fast, scalable version control file storage for graphic files 🎨
Stars: ✭ 590 (-56.52%)
Mutual labels:  version-control
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-96.98%)
Mutual labels:  version-control
Code Forensics
A toolset for code analysis and report visualisation
Stars: ✭ 277 (-79.59%)
Mutual labels:  version-control
Vbasync
Cross-platform tool to synchronize macros from an Office VBA-enabled file with a version-controlled folder
Stars: ✭ 98 (-92.78%)
Mutual labels:  version-control
Pailab
A package for versioning, automation, and analysis of machine learning development
Stars: ✭ 25 (-98.16%)
Mutual labels:  version-control
Dotfile
Simple version control made for tracking single files
Stars: ✭ 71 (-94.77%)
Mutual labels:  version-control
Jupytext
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
Stars: ✭ 4,969 (+266.18%)
Mutual labels:  version-control
Diff Hl
Emacs package for highlighting uncommitted changes
Stars: ✭ 553 (-59.25%)
Mutual labels:  version-control
Python Aos Lesson
Python for Atmosphere and Ocean Scientists
Stars: ✭ 49 (-96.39%)
Mutual labels:  version-control
Dbngin
DB Engine
Stars: ✭ 344 (-74.65%)
Mutual labels:  version-control
Nodist
Natural Node.js and npm version manager for Windows.
Stars: ✭ 1,276 (-5.97%)
Mutual labels:  version-control
Datmo
Open source production model management tool for data scientists
Stars: ✭ 334 (-75.39%)
Mutual labels:  version-control
Nhversion
NHVersion for versioning your API
Stars: ✭ 13 (-99.04%)
Mutual labels:  version-control
Sno
Distributed version-control for geospatial and tabular data
Stars: ✭ 100 (-92.63%)
Mutual labels:  version-control
S3git
s3git: git for Cloud Storage. Distributed Version Control for Data. Create decentralized and versioned repos that scale infinitely to 100s of millions of files. Clone huge PB-scale repos on your local SSD to make changes, commit and push back. Oh yeah, it dedupes too and offers directory versioning.
Stars: ✭ 1,287 (-5.16%)
Mutual labels:  version-control
Libgit2
A cross-platform, linkable library implementation of Git that you can use in your application.
Stars: ✭ 8,208 (+504.86%)
Mutual labels:  version-control

Keepsake

Version control for machine learning.

Keepsake is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Google Cloud Storage. You can get the data back out using the command-line interface or a notebook.

  • Track experiments: Automatically track code, hyperparameters, training data, weights, metrics, Python dependencies — everything.
  • Go back in time: Get back the code and weights from any checkpoint if you need to replicate your results or commit to Git after the fact.
  • Version your models: Model weights are stored on your own Amazon S3 or Google Cloud bucket, so it's really easy to feed them into production systems.

How it works

Just add two lines to your training code:

import torch
import keepsake

def train():
    # Save training code and hyperparameters
    experiment = keepsake.init(path=".", params={...})
    model = Model()

    for epoch in range(num_epochs):
        # ...

        torch.save(model, "model.pth")
        # Save model weights and metrics
        experiment.checkpoint(path="model.pth", metrics={...})

Then Keepsake will start tracking everything: code, hyperparameters, training data, weights, metrics, Python dependencies, and so on.

  • Open source & community-built: We’re trying to pull together the ML community so we can build this foundational piece of technology together.
  • You're in control of your data: All the data is stored on your own Amazon S3 or Google Cloud Storage as plain old files. There's no server to run.
  • It works with everything: TensorFlow, PyTorch, scikit-learn, XGBoost, you name it. It's just saving files and dictionaries – export however you want (see the sketch below).
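
For example, here's what that looks like with scikit-learn instead of PyTorch. This is an illustrative sketch, not part of Keepsake's documentation: the dataset, hyperparameters, and file names are placeholders, and it assumes scikit-learn and joblib are installed alongside keepsake.

import joblib
import keepsake
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

def train():
    # Hyperparameters are just a dictionary; nothing here is framework-specific.
    experiment = keepsake.init(path=".", params={"C": 0.5, "max_iter": 200})

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(C=0.5, max_iter=200).fit(X, y)

    # Export however you like; Keepsake only sees the file you write to disk.
    joblib.dump(model, "model.joblib")
    experiment.checkpoint(path="model.joblib",
                          metrics={"train_accuracy": model.score(X, y)})

if __name__ == "__main__":
    train()

The only Keepsake-specific calls are keepsake.init() and experiment.checkpoint(), the same two lines as in the PyTorch example above.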

Features

Throw away your spreadsheet

Your experiments are all in one place, with filter and sort. Because the data's stored on S3, you can even see experiments that were run on other machines.

$ keepsake ls --filter "val_loss<0.2"
EXPERIMENT   HOST         STATUS    BEST CHECKPOINT
e510303      10.52.2.23   stopped   49668cb (val_loss=0.1484)
9e97e07      10.52.7.11   running   41f0c60 (val_loss=0.1989)

Analyze in a notebook

Don't like the CLI? No problem. You can retrieve, analyze, and plot your results from within a notebook. Think of it like a programmable TensorBoard.
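
Here's a minimal sketch of that workflow. It assumes the notebook API mirrors the CLI: that keepsake.experiments.list() returns the tracked experiments, and that each checkpoint exposes an id and a metrics dictionary (only experiments.get() and best() are shown elsewhere in this README).

import keepsake

# Assumption: experiments.list(), experiment.best(), checkpoint.id, and
# checkpoint.metrics behave as sketched here.
experiments = keepsake.experiments.list()
for experiment in experiments:
    best = experiment.best()
    if best is not None:
        print(experiment.id, best.id, best.metrics.get("val_loss"))

Because the results come back as plain Python objects and dictionaries, they can be passed straight to pandas or matplotlib for analysis and plotting.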

Compare experiments

It diffs everything, all the way down to versions of dependencies, just in case that latest TensorFlow version did something weird.

$ keepsake diff 49668cb 41f0c60
Checkpoint:       49668cb     41f0c60
Experiment:       e510303     9e97e07

Params
learning_rate:    0.001       0.002

Python Packages
tensorflow:       2.3.0       2.3.1

Metrics
train_loss:       0.4626      0.8155
train_accuracy:   0.7909      0.7254
val_loss:         0.1484      0.1989
val_accuracy:     0.9607      0.9411

Commit to Git, after the fact

If you eventually want to store your code on Git, there's no need to commit everything as you go. Keepsake lets you get back to any point where you called experiment.checkpoint(), so you can commit to Git once you've found something that works.

$ keepsake checkout f81069d
Copying code and weights to working directory...

# save the code to git
$ git commit -am "Use hinge loss"

Load models in production

You can use Keepsake to feed your models into production systems. Connect them back to how they were trained, who trained them, and what their metrics were.

import keepsake
import torch

model = torch.load(keepsake.experiments.get("e45a203").best().open("model.pth"))

Install

pip install -U keepsake

Get started

If you prefer training scripts and the CLI, follow our tutorial to learn how Keepsake works.

If you prefer working in notebooks, follow our notebook tutorial on Colab.

If you like to learn concepts first, read our guide about how Keepsake works.

Get involved

Everyone uses version control for software, but it is much less common in machine learning.

Why is this? We spent a year talking to people in the ML community and this is what we found out:

  • Git doesn’t work well with machine learning. It can’t handle large files, it can’t handle key/value metadata like metrics, and it can’t commit automatically in your training script. There are some solutions for this, but they feel like band-aids.
  • It should be open source. There are a number of proprietary solutions, but something so foundational needs to be built by and for the ML community.
  • It needs to be small, easy to use, and extensible. We found people struggling to integrate with “AI Platforms”. We want to make a tool that does one thing well and can be combined with other tools to produce the system you need.

We think the ML community needs a good version control system. But, version control systems are complex, and to make this a reality we need your help.

Have you strung together some shell scripts to build this for yourself? Are you interested in the problem of making machine learning reproducible?

If so, we'd love your help. The contributing instructions below are a good place to start.

Contributing & development environment

Take a look at our contributing instructions.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].