Crossbow: A Multi-GPU Deep Learning System for Training with Small Batch Sizes

Crossbow is a multi-GPU system for training deep learning models that allows users to freely choose their preferred batch size, however small, while still scaling training to multiple GPUs.

Crossbow utilises modern GPUs better than other systems by training multiple model replicas on the same GPU. When the batch size is sufficiently small to leave GPU resources unused, Crossbow trains a second model replica, a third, etc., as long as training throughput increases.

To synchronise the many model replicas, Crossbow uses synchronous model averaging: it adjusts the trajectory of each individual replica based on the average of all replicas. With model averaging, the batch size does not increase linearly with the number of model replicas, as it would with synchronous SGD. This yields better statistical efficiency without cumbersome hyper-parameter tuning when scaling training to a larger number of GPUs.
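In outline, every replica takes its own gradient step and is then pulled towards the central average model. The following is a minimal sketch of this idea in Java, not Crossbow's actual implementation; the names lr and alpha are illustrative, not real configuration parameters:

/* Sketch of one synchronous model averaging step over k replicas.
 * w[i] holds replica i's parameters and g[i] its gradient. */
static void smaStep (float[][] w, float[][] g, float lr, float alpha) {
    int k = w.length, d = w[0].length;
    /* z is the central average of all replicas */
    float[] z = new float[d];
    for (float[] wi : w)
        for (int j = 0; j < d; j++)
            z[j] += wi[j] / k;
    /* Each replica applies its own gradient plus a correction towards z */
    for (int i = 0; i < k; i++)
        for (int j = 0; j < d; j++)
            w[i][j] -= lr * g[i][j] + alpha * (w[i][j] - z[j]);
}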

See our VLDB 2019 paper for more details.

The system supports a variety of training algorithms, including synchronous SGD. We are also working on seamlessly porting existing TensorFlow models to Crossbow.

Installing Crossbow

Prerequisites

Crossbow has been primarily tested on Ubuntu Linux 16.04. It requires the following Linux packages:

$ sudo apt-get install build-essential git openjdk-8-jdk maven libboost-all-dev graphviz wget

Crossbow requires NVIDIA's CUDA toolkit, the cuDNN library and the NCCL library (currently using versions 8.0, 6.0 and 2.1.15, respectively). After successful installation, make sure that:

  • CUDA_HOME is set (the default location is /usr/local/cuda)
  • NCCL_HOME is set

and that:

  • PATH includes $CUDA_HOME/bin and
  • LD_LIBRARY_PATH includes $CUDA_HOME/lib64 and $NCCL_HOME/lib
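For example, assuming CUDA is installed in its default location and NCCL under /opt/nccl (both paths are examples; adjust them to your installation):

$ export CUDA_HOME=/usr/local/cuda
$ export NCCL_HOME=/opt/nccl
$ export PATH=$CUDA_HOME/bin:$PATH
$ export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$NCCL_HOME/lib:$LD_LIBRARY_PATH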

Crossbow also requires the OpenBLAS and libjpeg-turbo libraries. After successful installation, make sure that:

  • BLAS_HOME is set (the default location is /opt/OpenBLAS)
  • JPEG_HOME is set

and that:

  • LD_LIBRARY_PATH includes $BLAS_HOME/lib and $JPEG_HOME/lib
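For example, assuming OpenBLAS is installed in its default location and libjpeg-turbo under /opt/libjpeg-turbo (again, adjust the paths to your installation):

$ export BLAS_HOME=/opt/OpenBLAS
$ export JPEG_HOME=/opt/libjpeg-turbo
$ export LD_LIBRARY_PATH=$BLAS_HOME/lib:$JPEG_HOME/lib:$LD_LIBRARY_PATH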

Configure OS

Crossbow uses page-locked memory regions to speed up data transfers from CPU to GPU and vice versa. The amount of memory locked by the system usually exceeds the default OS limit. Edit /etc/security/limits.conf and append the following lines to the end of the file:

*	hard	memlock	unlimited
*	soft	memlock	unlimited

Save changes and reboot the machine.
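After rebooting, you can verify the new limit with ulimit; it should print unlimited:

$ ulimit -l
unlimited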

Building Crossbow

Assuming all environment variables above have been set, build Crossbow's Java and C/C++ libraries:

$ git clone https://github.com/lsds/Crossbow.git
$ cd Crossbow
$ export CROSSBOW_HOME=`pwd`
$ ./scripts/build.sh

Note: We will shortly add an installation script as well as a Docker image to simplify the installation process and avoid library conflicts.

Training one of our benchmark models

ResNet-50

Crossbow serialises ImageNet images and their labels into a binary format similar to TensorFlow's TFRecord. Follow TensorFlow's instructions to download and convert the dataset to TFRecord format. You will end up with 1,024 training and 128 validation record files in a directory of your choice (say, /data/imagenet/tfrecords). Then, run:

$ cd $CROSSBOW_HOME
$ ./scripts/datasets/imagenet/prepare-imagenet.sh /data/imagenet/tfrecords /data/imagenet/crossbow

The script will convert TensorFlow's record files to Crossbow's own binary format and store them in /data/imagenet/crossbow. You are now ready to train ResNet-50 with the ImageNet data set:

$ ./scripts/benchmarks/resnet-50.sh

LeNet

The first script below downloads the MNIST data set and converts it to Crossbow's binary record format. Output files are written to $CROSSBOW_HOME/data/mnist/b-001 and are tailored to a specific batch size (in this case, 1). The second script trains LeNet with the MNIST data set.

$ cd $CROSSBOW_HOME
$ ./scripts/datasets/mnist/prepare-mnist.sh
$ ./scripts/benchmarks/lenet.sh

Others

Crossbow supports the entire ResNet family of neural networks. It also supports VGG-16, based on the implementation here, as well as the convnet-benchmarks suite of micro-benchmarks.

Note: We will shortly add a page describing how to configure Crossbow's system parameters.

Trying your first Crossbow program

Crossbow represents a deep learning application as a data flow graph: nodes represent operations and edges represent the data (multi-dimensional arrays, also known as tensors) that flow between them. The most notable operators are inner-product, pooling and convolution layers, and activation functions. Some of these operators have learnable parameters (also multi-dimensional arrays) that form part of the model being trained. An inner-product operator, for example, has two learnable parameters, its weights and bias:

InnerProductConf conf = new InnerProductConf ();

/* Let's assume that there are 10 possible output labels, as in MNIST */
conf.setNumberOfOutputs (10);

/* Initialise the weights with values drawn from a Gaussian distribution,
 * and all bias elements with the same constant value */
conf.setWeightInitialiser (new InitialiserConf ().setType (InitialiserType.GAUSSIAN).setStd(0.1F));
conf.setBiasInitialiser   (new InitialiserConf ().setType (InitialiserType.CONSTANT).setValue(1F));

/* Create inner-product operator and wrap it in a graph node */
Operator op = new Operator ("InnerProduct", new InnerProduct (conf));
DataflowNode innerproduct = new DataflowNode (op);

Connect data flow nodes together to form a neural network. For example, we can connect the forward layers of a logistic regression model:

innerproduct.connectTo(softmax).connectTo(loss);
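Here, softmax and loss are data flow nodes created in the same way as innerproduct. The sketch below follows the pattern of the inner-product example; the kernel and configuration class names are assumptions for illustration and may differ from the actual API:

/* Assumed class names, mirroring the inner-product example above */
DataflowNode softmax = new DataflowNode (new Operator ("SoftMax", new SoftMax (new SoftMaxConf ())));
DataflowNode loss    = new DataflowNode (new Operator ("SoftMaxLoss", new SoftMaxLoss (new LossConf ())));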

At the end, we can construct our model and train it for 1 epoch:

SubGraph subgraph = new SubGraph (innerproduct);
Dataflow dataflow = new Dataflow (subgraph).setPhase(Phase.TRAIN);
ExecutionContext context = new ExecutionContext (new Dataflow [] { dataflow, null }); /* second data flow slot (e.g. for testing) unused here */
context.init();
context.train(1, TrainingUnit.EPOCHS);

The full source code is available here.

For more information

Licence

Apache License 2.0
