
ucla-labx / Distbelief

License: GPL-3.0
Implementing Google's DistBelief paper

Programming Languages

python (139,335 projects; the #7 most used programming language)

Projects that are alternatives of or similar to Distbelief

Objstore
A Multi-Master Distributed Caching Layer for Amazon S3.
Stars: ✭ 69 (-21.59%)
Mutual labels:  distributed-systems
Harvey
A distributed operating system
Stars: ✭ 1,204 (+1268.18%)
Mutual labels:  distributed-systems
Dolphin
Distributed API Gateway
Stars: ✭ 84 (-4.55%)
Mutual labels:  distributed-systems
Cause
An EDN-like CRDT (Causal Tree) for Clojure & ClojureScript that automatically tracks history and resolves conflicts.
Stars: ✭ 68 (-22.73%)
Mutual labels:  distributed-systems
Go Craq
CRAQ (Chain Replication with Apportioned Queries) in Go
Stars: ✭ 75 (-14.77%)
Mutual labels:  distributed-systems
Sfs
The distributed object storage server used by PitchPoint Solutions to securely store billions of large and small files using minimal resources. Object data is stored in replicated volumes, implemented like Facebook's Haystack object store. Object metadata, which essentially maps an object name to a volume position, is stored in an Elasticsearch index.
Stars: ✭ 78 (-11.36%)
Mutual labels:  distributed-systems
Distributedsystems
My Distributed Systems references
Stars: ✭ 67 (-23.86%)
Mutual labels:  distributed-systems
Uuid Random
Fastest UUID with cryptographic PRNG for JS
Stars: ✭ 87 (-1.14%)
Mutual labels:  distributed-systems
Trustgraph
Decentralized trust ratings using signed claims
Stars: ✭ 75 (-14.77%)
Mutual labels:  distributed-systems
Go2p
A simple-to-use but fully configurable p2p framework
Stars: ✭ 80 (-9.09%)
Mutual labels:  distributed-systems
Gnes
GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep neural networks.
Stars: ✭ 1,178 (+1238.64%)
Mutual labels:  distributed-systems
Testing Distributed Systems
Curated list of resources on testing distributed systems
Stars: ✭ 1,187 (+1248.86%)
Mutual labels:  distributed-systems
Kbfs
Keybase Filesystem (KBFS)
Stars: ✭ 1,218 (+1284.09%)
Mutual labels:  distributed-systems
Crail
[Archived] A Fast Multi-tiered Distributed Storage System based on User-Level I/O
Stars: ✭ 69 (-21.59%)
Mutual labels:  distributed-systems
Disel
Distributed Separation Logic: a framework for compositional verification of distributed protocols and their implementations in Coq
Stars: ✭ 85 (-3.41%)
Mutual labels:  distributed-systems
Distkv
A lightweight distributed key-value database system with a table concept.
Stars: ✭ 69 (-21.59%)
Mutual labels:  distributed-systems
Rsf
Now a sub-project of Hasor; migrated to: http://git.oschina.net/zycgit/hasor
Stars: ✭ 77 (-12.5%)
Mutual labels:  distributed-systems
Storj
Ongoing Storj v3 development. Decentralized cloud object storage that is affordable, easy to use, private, and secure.
Stars: ✭ 1,278 (+1352.27%)
Mutual labels:  distributed-systems
Docker Superset
Repository for Docker Image of Apache-Superset. [Docker Image: https://hub.docker.com/r/abhioncbr/docker-superset]
Stars: ✭ 86 (-2.27%)
Mutual labels:  distributed-systems
Microdot
Microdot: An open source .NET microservices framework
Stars: ✭ 1,222 (+1288.64%)
Mutual labels:  distributed-systems

distbelief

Implementing Google's DistBelief paper.

Check out the blog post!

Installation/Development instructions

To install the latest stable version (pytorch-distbelief 0.1.0), run pip install pytorch-distbelief

Otherwise, you can build and run the latest master with the instructions below.

First create a python3 virtualenv by running make setup, then run make install.

You'll then be able to use distbelief by importing it:

from distbelief.optim import DownpourSGD

optimizer = DownpourSGD(net.parameters(), lr=0.1, n_push=5, n_pull=5, model=net)

As an example, you can see our implementation in action by running the script provided in example/main.py.
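DownpourSGD is used like any other PyTorch optimizer inside a standard training loop. The stdlib-only sketch below mimics that loop shape with invented stand-ins (ToyModel, train, and the toy data are not distbelief code), with the manual parameter update playing the role of optimizer.step():

```python
# Stdlib-only illustration of the training-loop shape DownpourSGD drops
# into. ToyModel and the data below are invented stand-ins; with the real
# library you would pass net.parameters() to DownpourSGD and call
# loss.backward() followed by optimizer.step().

class ToyModel:
    def __init__(self):
        self.w = 0.0  # a single scalar "parameter"

    def forward(self, x):
        return self.w * x

def train(data, lr=0.1, epochs=1):
    model = ToyModel()
    for _ in range(epochs):
        for x, y in data:
            pred = model.forward(x)
            grad = 2 * (pred - y) * x  # gradient of squared error w.r.t. w
            model.w -= lr * grad       # the optimizer.step() analogue
    return model.w

# Fitting y = 2x pulls w toward 2.
final_w = train([(1.0, 2.0)] * 50)
```

In the real script the same loop runs on each training node, and DownpourSGD handles the parameter-server communication behind optimizer.step().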

To run a two-training-node setup locally, open three terminal windows, source the venv in each, and then run make first, make second, and make server (one per terminal). This will begin training AlexNet on CIFAR-10 locally with all default params.

Benchmarking

NOTE: we graph the train/test accuracy of each node, hence node1, node2, and node3. A better comparison would be to evaluate the parameter server's parameters and report that value. However, the accuracy across the three nodes is fairly consistent, and adding an evaluator might put too much stress on our server.

We scale the learning rate of each node to learning_rate/freq (0.03).
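As a quick arithmetic check (the base rate below is back-solved from the 0.03 figure and the push/pull frequency of 5 used in the example above; it is our assumption, not a value stated explicitly):

```python
def scaled_lr(base_lr, freq):
    """Per-node learning rate when gradients are pushed/pulled every `freq` steps."""
    return base_lr / freq

# A base rate of 0.15 with freq = 5 yields the 0.03 quoted above
# (0.15 and 5 are assumed inputs, not values stated by the authors).
node_lr = scaled_lr(0.15, 5)
```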

Train and test accuracy plots (images omitted).

We used AWS c4.xlarge instances to compare the CPU runs, and a GTX 1060 for the GPU run.

DownpourSGD for PyTorch

Diagram (image omitted)

Here steps 2 and 3 happen concurrently.
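Schematically, each training node repeats the cycle the diagram describes: pull fresh parameters from the server, compute a gradient locally, and push accumulated gradients back, every n_pull / n_push steps. The stdlib-only sketch below runs those steps synchronously to keep the control flow visible; in the real implementation the push and pull are asynchronous and the server is a separate process, not a dict, and all names here are illustrative:

```python
# Synchronous, stdlib-only sketch of the Downpour SGD cycle. In distbelief
# the pushes and pulls happen asynchronously against a separate parameter-
# server process; here the "server" is just a dict so the three steps are
# easy to follow.

def downpour_node(server, data, lr, n_pull, n_push):
    params = dict(server)          # initial pull of the parameters
    accumulated = 0.0              # gradient accumulated since the last push
    for step, (x, y) in enumerate(data, start=1):
        if step % n_pull == 0:
            params = dict(server)  # step 1: pull fresh parameters
        grad = 2 * (params["w"] * x - y) * x  # step 2: local gradient (squared error)
        params["w"] -= lr * grad   # local update between pulls
        accumulated += grad
        if step % n_push == 0:     # step 3: push the accumulated gradient
            server["w"] -= lr * accumulated
            accumulated = 0.0

# One node fitting y = 2x nudges the server's parameter toward 2.
server = {"w": 0.0}
downpour_node(server, [(1.0, 2.0)] * 100, lr=0.05, n_pull=5, n_push=5)
```

Because nodes work from slightly stale parameters between pulls, the server's trajectory oscillates a little around the optimum before settling, which is the usual trade-off of asynchronous SGD.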

You can read more about our implementation here.

References
