eladhoffer / Imagenet Training

Licence: MIT

ImageNet training using Torch

Programming language: Lua

Deep Learning on ImageNet using Torch

This is a complete training example for Deep Convolutional Networks on the ILSVRC classification task.

Data is preprocessed and cached as an LMDB database for fast reading. A separate thread buffers images from the LMDB records in the background.

Multiple GPUs are also supported by using nn.DataParallelTable (https://github.com/torch/cunn/blob/master/docs/cunnmodules.md).

With the AlexNet model, this code trains at 4ms per sample and tests at 2ms per sample on a single GPU (a Titan Z with only one of its two GPUs active).
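To put these speeds in perspective, here is a back-of-the-envelope estimate of one training epoch and one validation pass. It assumes the ILSVRC2012 set sizes (1,281,167 training and 50,000 validation images), which are not stated above, and ignores any time not covered by the quoted per-sample cost.

```lua
-- Back-of-the-envelope epoch timing from the quoted AlexNet speeds.
-- Assumption: ILSVRC2012 set sizes (not stated in the README).
local TRAIN_IMAGES = 1281167
local VAL_IMAGES   = 50000
local TRAIN_MS_PER_SAMPLE = 4 -- quoted training speed
local TEST_MS_PER_SAMPLE  = 2 -- quoted testing speed

-- Convert a sample count and a per-sample cost (ms) into minutes.
local function minutes(numSamples, msPerSample)
  return numSamples * msPerSample / 1000 / 60
end

local trainMinutes = minutes(TRAIN_IMAGES, TRAIN_MS_PER_SAMPLE)
local valMinutes   = minutes(VAL_IMAGES, TEST_MS_PER_SAMPLE)
print(string.format('training epoch: ~%.0f min, validation pass: ~%.1f min',
  trainMinutes, valMinutes))
```

Under these assumptions a training epoch takes roughly 85 minutes and a validation pass a little under 2 minutes.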

Dependencies

Torch (http://torch.ch)
"eladtools" (https://github.com/eladhoffer/eladtools) for the optimizer.
"lmdb.torch" (http://github.com/eladhoffer/lmdb.torch) for LMDB access.
"DataProvider.torch" (https://github.com/eladhoffer/DataProvider.torch) for the DataProvider class.
"cudnn.torch" (https://github.com/soumith/cudnn.torch) for faster training. It can be avoided by changing "cudnn" to "nn" in the model files.

To install the eladtools, lmdb.torch and DataProvider.torch dependencies (assuming Torch is already installed), use:

luarocks install https://raw.githubusercontent.com/eladhoffer/eladtools/master/eladtools-scm-1.rockspec
luarocks install https://raw.githubusercontent.com/eladhoffer/lmdb.torch/master/lmdb.torch-scm-1.rockspec
luarocks install https://raw.githubusercontent.com/eladhoffer/DataProvider.torch/master/dataprovider-scm-1.rockspec

Data

To get the ILSVRC data, you need to register for access on the ImageNet site: http://www.image-net.org/
Extract all archives and set the data location and save directory in Config.lua. You can also change the stored image size by editing the default value ImageMinSide=256.
LMDB records for fast read access are created by running CreateLMDBs.lua. By default it saves the compressed JPEGs (about 24GB for the training data and 1GB for the validation data when the smallest image dimension is 256).
To validate the LMDB configuration and test its loading speed, you can run TestLMDBs.lua.
All data-related functions used for training are in Data.lua.
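A minimal sketch of the resize rule implied by ImageMinSide, assuming the shorter side is scaled to ImageMinSide with the aspect ratio preserved. The function name scaledSize is hypothetical, and CreateLMDBs.lua may round, interpolate, or handle small images differently (this sketch also upscales images whose shorter side is below 256).

```lua
-- Hypothetical sketch: scale an image so its shorter side equals ImageMinSide,
-- keeping the aspect ratio. Not taken from CreateLMDBs.lua.
local ImageMinSide = 256

local function scaledSize(width, height, minSide)
  local scale = minSide / math.min(width, height)
  -- round each dimension to the nearest pixel
  return math.floor(width * scale + 0.5), math.floor(height * scale + 0.5)
end

local w, h = scaledSize(500, 375, ImageMinSide) -- landscape example
print(w, h)
```

For a 500x375 landscape image this gives 341x256; a 333x500 portrait image becomes 256x384.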

Model configuration

A network model is defined by writing a .lua file in the Models folder and selecting it with the network flag. The model file must return a trainable network. It can also specify additional training options, such as an optimization regime or input size modifications.

For example, a model file:

require 'nn'

local model = nn.Sequential()
-- model:add(...)  -- add the network's layers here
return  -- optional: you can also simply return model
{
  model = model,
  regime = {
    epoch        = {1,    19,   30,   44,   53  },
    learningRate = {1e-2, 5e-3, 1e-3, 5e-4, 1e-4},
    weightDecay  = {5e-4, 5e-4, 0,    0,    0   }
  }
}
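The regime table can be read column by column, one column per training phase. The sketch below assumes each column takes effect at its listed epoch and stays in force until the next one; the helper regimeFor is hypothetical and not taken from Main.lua.

```lua
-- Hypothetical helper: return the settings of the regime column that applies
-- to `epoch`, assuming each column holds from its start epoch onward.
local function regimeFor(regime, epoch)
  local idx = nil
  for i, start in ipairs(regime.epoch) do
    if epoch >= start then idx = i else break end
  end
  if not idx then return nil end -- epoch precedes the first phase
  local settings = {}
  for key, values in pairs(regime) do
    if key ~= 'epoch' then settings[key] = values[idx] end
  end
  return settings
end

local regime = {
  epoch        = {1,    19,   30,   44,   53  },
  learningRate = {1e-2, 5e-3, 1e-3, 5e-4, 1e-4},
  weightDecay  = {5e-4, 5e-4, 0,    0,    0   }
}

for _, e in ipairs({1, 18, 19, 35, 60}) do
  local s = regimeFor(regime, e)
  print(string.format('epoch %2d: learningRate=%g weightDecay=%g',
    e, s.learningRate, s.weightDecay))
end
```

Read this way, epochs 1 to 18 use a learning rate of 1e-2, epoch 19 switches to 5e-3, and weight decay is turned off from epoch 30 onward.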

The Models folder currently contains AlexNet, MattNet, OverFeat, GoogLeNet, CaffeRef and NiN. Some also have a batch-normalized version (denoted by the _BN suffix).

Training

You can start training using Main.lua by typing:

th Main.lua -network AlexNet -LR 0.01

or, if you have 2 GPUs available:

th Main.lua -network AlexNet -LR 0.01 -nGPU 2 -batchSize 256
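nn.DataParallelTable runs a replica of the model on each GPU and splits every mini-batch between the replicas, so with -nGPU 2 -batchSize 256 each GPU processes about 128 samples per iteration. The simplified illustration below is not cunn code (splitBatch is a hypothetical name); it only shows one way to divide a batch into contiguous, near-equal slices.

```lua
-- Simplified illustration (not cunn code): divide a mini-batch of
-- `batchSize` samples into contiguous, near-equal slices, one per GPU.
local function splitBatch(batchSize, nGPU)
  local slices = {}
  local base = math.floor(batchSize / nGPU)
  local extra = batchSize % nGPU -- first `extra` GPUs take one more sample
  local first = 1
  for gpu = 1, nGPU do
    local size = base + (gpu <= extra and 1 or 0)
    slices[gpu] = { first = first, last = first + size - 1 }
    first = first + size
  end
  return slices
end

for gpu, s in ipairs(splitBatch(256, 2)) do
  print(string.format('GPU %d: samples %d-%d', gpu, s.first, s.last))
end
```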

A more elaborate example, continuing from a pretrained network and saving intermediate results:

th Main.lua -network GoogLeNet_BN -batchSize 64 -nGPU 2 -save GoogLeNet_BN -bufferSize 9600 -LR 0.01 -checkpoint 320000 -weightDecay 1e-4 -load ./pretrainedNet.t7
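For the command above, the sample-based flags translate into iteration counts as follows. The trigger rule in crossesCheckpoint is an assumption made for illustration; Main.lua may test the checkpoint condition differently.

```lua
-- Arithmetic for the command above; flag values are copied from it.
local batchSize  = 64
local bufferSize = 9600
local checkpoint = 320000 -- samples between weight checkpoints

print('batches per buffer:', math.floor(bufferSize / batchSize))
print('iterations per checkpoint:', math.floor(checkpoint / batchSize))

-- Assumed trigger rule (not taken from Main.lua): save when the running
-- sample count crosses a multiple of `every`; 0 disables checkpoints.
local function crossesCheckpoint(samplesBefore, samplesAfter, every)
  if every <= 0 then return false end
  return math.floor(samplesAfter / every) > math.floor(samplesBefore / every)
end

-- Simulate 12000 iterations and count the checkpoints that would be written.
local saves, seen = 0, 0
for _ = 1, 12000 do
  local before = seen
  seen = seen + batchSize
  if crossesCheckpoint(before, seen, checkpoint) then saves = saves + 1 end
end
print('checkpoints after 12000 iterations:', saves)
```

The buffer holds 150 batches of 64, and a checkpoint falls every 5000 iterations, so 12000 iterations produce 2 checkpoints.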

The buffer size should be adjusted to suit your hardware and configuration. The default value is 5120 (40 batches of 128), which works well with a non-SSD drive and 16GB of RAM. A bigger buffer allows better sample shuffling.
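To see why a bigger buffer shuffles better, consider a loader that reads records in storage order, fills a buffer of bufferSize samples, shuffles it, and cuts it into batches. This is a hypothetical sketch, not the DataProvider.torch API; integer indices stand in for decoded images.

```lua
-- Hypothetical sketch (not the DataProvider.torch API): shuffle only within
-- a fixed-size buffer of records read in storage order.
math.randomseed(123)

-- In-place Fisher-Yates shuffle.
local function shuffleInPlace(t)
  for i = #t, 2, -1 do
    local j = math.random(i)
    t[i], t[j] = t[j], t[i]
  end
end

-- Read `numRecords` records `bufferSize` at a time, shuffle each buffer,
-- and cut it into batches of `batchSize`.
local function bufferedBatches(numRecords, bufferSize, batchSize)
  local batches = {}
  for first = 1, numRecords, bufferSize do
    local buffer = {}
    for idx = first, math.min(first + bufferSize - 1, numRecords) do
      buffer[#buffer + 1] = idx -- stands in for a decoded image + label
    end
    shuffleInPlace(buffer)
    for b = 1, #buffer, batchSize do
      local batch = {}
      for k = b, math.min(b + batchSize - 1, #buffer) do
        batch[#batch + 1] = buffer[k]
      end
      batches[#batches + 1] = batch
    end
  end
  return batches
end

local batches = bufferedBatches(10240, 5120, 128) -- 2 buffers of 40 batches
print(#batches)
```

Every sample in a batch comes from the same buffer, so a 5120-sample buffer can only mix records that were read together; a larger buffer widens that window at the cost of more RAM.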

Output

Training output is saved to the folder set with the save flag.

The complete network is saved at each epoch as Net_<#epoch>.t7, along with:

A complete log, Log.txt
An error rate summary, ErrorRate.log, and the accompanying ErrorRate.log.eps graph

Additional flags

Flag	Default Value	Description
modelsFolder	./Models/	models folder
network	AlexNet	model file (must return a valid network)
LR	0.01	learning rate
LRDecay	0	learning rate decay (in # samples)
weightDecay	5e-4	L2 penalty on the weights
momentum	0.9	momentum
batchSize	128	batch size
optimization	'sgd'	optimization method
seed	123	seed for torch's manual random number generator
epoch	-1	number of epochs to train (-1 for unbounded)
testonly	false	only test the loaded network on the validation set
threads	8	number of threads
type	'cuda'	tensor type: float or cuda
bufferSize	5120	buffer size (in samples)
devid	1	device ID (if using CUDA)
nGPU	1	number of GPU devices used
constBatchSize	false	do not allow varying batch sizes (e.g. for the ccn2 kernel)
load	''	load existing network weights
save	time-identifier	save directory
optState	false	save the optimization state every epoch
checkpoint	0	save a weight checkpoint every n samples (0 to disable)
augment	1	data augmentation level: {1: simple mirror and crops, 2: + scales, 3: + rotations}
estMeanStd	preDef	mean and std estimation. Options: {preDef, simple, channel, image}
shuffle	true	shuffle training samples
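A sketch of what augment level 1 (simple mirror and crops) typically involves: pick a random square window and flip it horizontally half of the time. The 224-pixel crop size and the function name randomCropAndMirror are assumptions for illustration; the crop size actually used depends on the model and is not listed above.

```lua
-- Hypothetical sketch of augment level 1: a random cropSize x cropSize
-- window plus a coin-flip horizontal mirror. Crop size 224 is assumed.
math.randomseed(123)

local function randomCropAndMirror(width, height, cropSize)
  assert(width >= cropSize and height >= cropSize, 'image smaller than crop')
  return {
    x = math.random(0, width - cropSize),  -- 0-based left offset
    y = math.random(0, height - cropSize), -- 0-based top offset
    size = cropSize,
    mirror = math.random() < 0.5,          -- flip left-right half the time
  }
end

-- A 341x256 stored image (shorter side 256) cropped to 224x224:
local crop = randomCropAndMirror(341, 256, 224)
print(crop.x, crop.y, crop.mirror)
```

Levels 2 and 3 would add random scaling and rotation on top of this, per the augment flag description.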
