License: MIT
A PyTorch Implementation of "Densely Connected Convolutional Networks"


Densely Connected Convolutional Networks

This is a PyTorch implementation of the DenseNet architecture as described in Densely Connected Convolutional Networks by G. Huang, Z. Liu, K. Weinberger, and L. van der Maaten.


To-do

  • Multi-GPU support
  • Unique model checkpointing (clashes can currently occur)

Requirements

  • Python 3
  • PyTorch (newest version)
  • tqdm
  • tensorboard_logger
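
These can be installed with pip, for example (the last two are plain PyPI packages; for PyTorch itself, the install command from pytorch.org for your platform is the safer choice):

pip install torch tqdm tensorboard_logger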

Usage

This implementation currently supports training on the CIFAR-10 and CIFAR-100 datasets (support for ImageNet coming soon).

When training a model, specify whether to use the bottleneck variant of the dense block with --bottleneck and, if so, which compression factor to use with --compression. You should also specify the total number of layers in the model with --num_layers_total. By default, data augmentation is performed, which means no dropout is used; if you turn augmentation off, you should specify the desired dropout rate with --dropout_rate.
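
For example, a hypothetical command for training a plain DenseNet-40 without augmentation and with dropout could look like this (the flag values here are purely illustrative; see the full option list below):

python main.py \
--num_layers_total=40 \
--bottleneck=False \
--augment=False \
--dropout_rate=0.2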

Furthermore, a checkpoint of the model is saved at the end of every epoch. This means you can resume training from your latest epoch with the --resume=True argument; note that this only works after you've run at least one epoch of training. When testing a model, reuse whatever command you used to train it and add the --is_train=False argument. This will load the model with the best validation accuracy and evaluate it on the test set.
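
For instance, assuming the DenseNet-BC-100 command shown at the end of this section was used for training, resuming and then evaluating that model might look like this (illustrative):

python main.py --num_layers_total=100 --bottleneck=True --compression=0.5 --resume=True
python main.py --num_layers_total=100 --bottleneck=True --compression=0.5 --is_train=False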

Note that you can use TensorBoard to view losses and accuracies by setting the --use_tensorboard argument, in which case you need to run tensorboard --logdir=./logs/ in a separate shell.

Finally, to see all possible options, run:

python main.py --help

which will print:

usage: main.py [-h] [--num_blocks NUM_BLOCKS]
               [--num_layers_total NUM_LAYERS_TOTAL]
               [--growth_rate GROWTH_RATE] [--bottleneck BOTTLENECK]
               [--compression COMPRESSION] [--dataset DATASET]
               [--valid_size VALID_SIZE] [--batch_size BATCH_SIZE]
               [--num_worker NUM_WORKER] [--augment AUGMENT]
               [--shuffle SHUFFLE] [--show_sample SHOW_SAMPLE]
               [--is_train IS_TRAIN] [--epochs EPOCHS] [--init_lr INIT_LR]
               [--momentum MOMENTUM] [--weight_decay WEIGHT_DECAY]
               [--lr_decay LR_DECAY] [--dropout_rate DROPOUT_RATE]
               [--random_seed RANDOM_SEED] [--data_dir DATA_DIR]
               [--ckpt_dir CKPT_DIR] [--logs_dir LOGS_DIR] [--num_gpu NUM_GPU]
               [--use_tensorboard USE_TENSORBOARD] [--resume RESUME]
               [--print_freq PRINT_FREQ]

DenseNet

optional arguments:
  -h, --help            show this help message and exit

Network:
  --num_blocks NUM_BLOCKS
                        # of Dense blocks to use in the network
  --num_layers_total NUM_LAYERS_TOTAL
                        Total # of layers in the network
  --growth_rate GROWTH_RATE
                        Growth rate (k) of the network
  --bottleneck BOTTLENECK
                        Whether to use bottleneck layers
  --compression COMPRESSION
                        Compression factor theta in the range [0, 1]

Data:
  --dataset DATASET     Which dataset to work with. Can be CIFAR10, CIFAR100
                        or Imagenet
  --valid_size VALID_SIZE
                        Proportion of training set used for validation
  --batch_size BATCH_SIZE
                        # of images in each batch of data
  --num_worker NUM_WORKER
                        # of subprocesses to use for data loading
  --augment AUGMENT     Whether to apply data augmentation or not
  --shuffle SHUFFLE     Whether to shuffle the dataset after every epoch
  --show_sample SHOW_SAMPLE
                        Whether to visualize a sample grid of the data

Training:
  --is_train IS_TRAIN   Whether to train or test the model
  --epochs EPOCHS       # of epochs to train for
  --init_lr INIT_LR     Initial learning rate value
  --momentum MOMENTUM   Nesterov momentum value
  --weight_decay WEIGHT_DECAY
                        weight decay penalty
  --lr_decay LR_DECAY, --list LR_DECAY
                        List containing fractions of the total number of
                        epochs in which the learning rate is decayed. Enter
                        empty string if you want a constant lr.
  --dropout_rate DROPOUT_RATE
                        Dropout rate used with non-augmented datasets

Misc:
  --random_seed RANDOM_SEED
                        Seed to ensure reproducibility
  --data_dir DATA_DIR   Directory in which data is stored
  --ckpt_dir CKPT_DIR   Directory in which to save model checkpoints
  --logs_dir LOGS_DIR   Directory in which Tensorboard logs wil be stored
  --num_gpu NUM_GPU     # of GPU's to use. A value of 0 will run on the CPU
  --use_tensorboard USE_TENSORBOARD
                        Whether to use tensorboard for visualization
  --resume RESUME       Whether to resume training from most recent checkpoint
  --print_freq PRINT_FREQ
                        How frequently to display training details on screen

You can edit the default values of these arguments in the config.py file.

Here's an example command for training a DenseNet-BC-100 architecture with a growth rate of 12 (the default), data augmentation, TensorBoard visualization, and a single GPU:

python main.py \
--num_layers_total=100 \
--bottleneck=True \
--compression=0.5 \
--num_gpu=1 \
--use_tensorboard=True

Performance

I trained the DenseNet-40 and DenseNet-BC-100 variants on the CIFAR-10 dataset but was not able to reproduce the authors' results. I don't know whether this stems from an error in the implementation or from an unlucky seed. Training the two models in parallel took 2 days on a p2.xlarge AWS instance with 1 GPU. If I get the time, I'll write up clean, minimal instructions for setting up a similar instance for free.

Model              Test Error
DenseNet-40        9%
DenseNet-BC-100    ~7%

Here are some tensorboard visualizations comparing the two models:


From the losses and accuracies, it appears that decreasing the learning rate earlier than the paper's schedule suggests can shorten training time considerably. In fact, I noticed during training that the train accuracy and loss would stagnate for a dozen epochs and then jump significantly when the learning rate was decreased halfway through. I'll test this intuition and report my findings at a later date.

References

  • Thanks to Taehoon Kim for inspiring the general file hierarchy and layout of this project.
  • Thanks to the PyTorch ImageNet training example for helping me code the Trainer class.