
cvjena / Alpha_pooling

Licence: bsd-2-clause
Code for our paper "Generalized Orderless Pooling Performs Implicit Salient Matching" published at ICCV 2017.


Alpha pooling for fine-grained recognition

This repository contains code for our International Conference on Computer Vision (ICCV 2017) publication "Generalized Orderless Pooling Performs Implicit Salient Matching". It contains scripts for fine-tuning a pre-trained VGG16 model with our alpha-pooling approach.

Abstract of the paper

Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to "alpha-pooling", allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions made by these approaches. We identify parts of training images having the highest influence on the prediction of a given test image. This allows for justifying decisions to users and also for analyzing the influence of semantic parts. For example, we can show that the higher capacity VGG16 model focuses much more on the bird's head than, e.g., the lower-capacity VGG-M model when recognizing fine-grained bird categories. Both contributions allow us to analyze the difference when moving between average and bilinear pooling. In addition, experiments show that our generalized approach can outperform both across a variety of standard datasets.
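
To make the generalization between average and bilinear pooling concrete, here is a minimal NumPy sketch. It is an illustration only and not the caffe_pp2 implementation: the function name `alpha_pooling`, the `(n, d)` array layout, and the formulation itself (an element-wise signed power `sign(x) * |x|^(alpha - 1)` followed by an averaged outer product with the original features) are assumptions that should be checked against the paper. Any normalization applied after pooling is left out.

```python
import numpy as np


def alpha_pooling(features, alpha):
    """Pool n local descriptors of dimension d into one vector of length d * d.

    Assumed formulation (verify against the paper):
        mean over locations of (sign(x) * |x|^(alpha - 1)) x^T
    The sketch assumes alpha >= 1; smaller values would divide by zero
    wherever a feature is exactly zero.
    """
    features = np.asarray(features, dtype=np.float64)
    n = features.shape[0]
    # Signed power with exponent alpha - 1, applied element-wise.
    powered = np.sign(features) * np.abs(features) ** (alpha - 1.0)
    # Sum of outer products over all locations, averaged over n.
    pooled = powered.T @ features / n
    return pooled.reshape(-1)
```

Under this formulation, `alpha = 2` gives the averaged outer product of each descriptor with itself (bilinear pooling), while `alpha = 1` with strictly positive features yields a matrix in which every row equals the average-pooled descriptor.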

Getting started

You need our custom version of Caffe, caffe_pp2, located at https://github.com/cvjena/caffe_pp2. It provides our own SignedPowerLayer with a learnable power, a spatial transformer layer used for on-the-fly image resizing, and a compact bilinear layer that computes the outer product efficiently. Please clone and compile caffe_pp2 together with its Python interface. We use Python 3 in all our experiments.

Preparation of the dataset

We use an ImageData layer in our experiments, and the scripts provided here require it. Hence you will need a list of training images and a list of test images. Each line of these files should contain the path to an image relative to --image_root, followed by a space and the label as an integer. That is, the files should look like

/path/to/dataset/class1/image1.jpg 1
/path/to/dataset/class1/image2.jpg 1
/path/to/dataset/class2/image1.jpg 2
/path/to/dataset/class2/image2.jpg 2

The paths to these two files are passed to the scripts below as train_imagelist and val_imagelist.

How to learn an alpha-pooling model

We provide a batch script and a Jupyter notebook to prepare the fine-tuning. The usage of the batch script is described in its --help message:

usage: prepare-finetuning-batchscript.py [-h] [--init_weights INIT_WEIGHTS]
                                        [--label LABEL] [--gpu_id GPU_ID]
                                        [--num_classes NUM_CLASSES]
                                        [--image_root IMAGE_ROOT]
                                        train_imagelist val_imagelist

Prepare fine-tuning of multiscale alpha pooling. The working directory should
contain train_val.prototxt of vgg16. The models will be created in the
subfolders.

positional arguments:
train_imagelist       Path to imagelist containing the training images. Each
                        line should contain the path to an image followed by a
                        space and the class ID.
val_imagelist         Path to imagelist containing the validation images.
                        Each line should contain the path to an image followed
                        by a space and the class ID.

optional arguments:
-h, --help            show this help message and exit
--init_weights INIT_WEIGHTS
                        Path to the pre-trained vgg16 model
--label LABEL         Label of the created output folder
--gpu_id GPU_ID       ID of the GPU to use
--num_classes NUM_CLASSES
                        Number of object categories
--image_root IMAGE_ROOT
                        Image root folder, used to set the root_folder
                        parameter of the ImageData layer of caffe.

The usage of the notebook is explained in its comments. Please note that gamma in the scripts refers to alpha in the paper, due to a last-minute renaming of the approach before submission.
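
As an illustration of how the options from the help message fit together, the sketch below assembles a call to the batch script from Python. Only the script name and the flags are taken from the help text; the helper names `build_prepare_command` and `run_prepare`, the default values, and all file names used when calling them are hypothetical.

```python
import subprocess
import sys


def build_prepare_command(train_imagelist, val_imagelist, init_weights,
                          num_classes, image_root, label="alpha_pooling",
                          gpu_id=0):
    """Assemble the argument list for prepare-finetuning-batchscript.py."""
    return [
        sys.executable, "prepare-finetuning-batchscript.py",
        "--init_weights", init_weights,
        "--label", label,
        "--gpu_id", str(gpu_id),
        "--num_classes", str(num_classes),
        "--image_root", image_root,
        # The two positional arguments come last.
        train_imagelist, val_imagelist,
    ]


def run_prepare(command, working_dir):
    """Run the script in working_dir, which according to the help text
    should contain the train_val.prototxt of VGG16."""
    subprocess.run(command, cwd=working_dir, check=True)
```

A call could then look like `run_prepare(build_prepare_command("train.txt", "val.txt", "vgg16.caffemodel", 200, "/path/to/dataset/"), "/path/to/workdir")`, where every path is a placeholder.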

The scripts prepare the prototxt and the solver for learning the model. In addition, they already train the last classification layer. After the preparation, you can fine-tune the network using the created ft.solver file in the finetuning subfolder. Please note that our implementation only supports GPU computation, as the SignedPowerLayer in caffe_pp2 currently has only a GPU implementation.
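
The fine-tuning step can be started from Python through the standard pycaffe solver interface. The following is a sketch under the assumption that the Python interface of caffe_pp2 exposes the usual `caffe.set_device`, `caffe.set_mode_gpu` and `caffe.get_solver` calls of upstream Caffe. The function name `finetune` and its parameters are hypothetical, and the solver path depends on where the preparation script created its output.

```python
def finetune(solver_path, gpu_id=0, num_iterations=1000):
    """Run num_iterations solver steps with the solver at solver_path.

    caffe is imported lazily so that this module can be loaded on
    machines without a compiled caffe_pp2.
    """
    import caffe  # Python interface of caffe_pp2 (assumed to match pycaffe)

    # Only GPU computation is supported by the SignedPowerLayer.
    caffe.set_device(gpu_id)
    caffe.set_mode_gpu()

    solver = caffe.get_solver(solver_path)
    # step() runs the requested number of iterations from the current
    # state, so it can also be used to train beyond a predefined budget.
    solver.step(num_iterations)
    return solver
```

For example, `finetune("path/to/output/finetuning/ft.solver", gpu_id=0, num_iterations=10000)`, where the path and the iteration count are placeholders rather than recommended values.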

How to learn another architecture

The code shows the fine-tuning preparation for VGG16. If you want to train another model, you will need a train_val.prototxt that has two ImageData layers. It is probably best to take your existing train_val.prototxt and replace its data layers with the ImageData layers of our VGG16 train_val.prototxt. Our script does not support LMDB or any other type of data layer, but could probably be adapted for them. After these adjustments, you might also need to adjust the notebook or prepare-finetuning-batchscript.py, depending on which one you are using.
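
For orientation, the sketch below renders a generic Caffe ImageData layer as prototxt text. It uses only standard fields of Caffe's `image_data_param` (`source`, `root_folder`, `batch_size`, `shuffle`). The helper name `imagedata_layer`, the layer and blob names, and the batch size are assumptions, and the layers in our VGG16 train_val.prototxt may contain further settings, so copying them from there remains the safer route.

```python
def imagedata_layer(source, root_folder, phase, batch_size=8, shuffle=True):
    """Return prototxt text for one ImageData layer.

    phase must be "TRAIN" or "TEST"; one layer per phase is needed.
    """
    if phase not in ("TRAIN", "TEST"):
        raise ValueError("phase must be 'TRAIN' or 'TEST'")
    return "\n".join([
        "layer {",
        '  name: "data"',          # hypothetical layer name
        '  type: "ImageData"',
        '  top: "data"',           # hypothetical blob names
        '  top: "label"',
        f"  include {{ phase: {phase} }}",
        "  image_data_param {",
        f'    source: "{source}"',
        f'    root_folder: "{root_folder}"',
        f"    batch_size: {batch_size}",
        f"    shuffle: {'true' if shuffle else 'false'}",
        "  }",
        "}",
    ])
```

Calling it once with `phase="TRAIN"` and the training list, and once with `phase="TEST"` and the validation list, yields the two data layers that would replace the existing ones.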

Feel free to try any other model, for example our caffe implementation of ResNet50 from https://github.com/cvjena/cnn-models/tree/master/ResNet_preact/ResNet50_cvgj

Accuracy

With VGG16 and a resolution of 224 and 560 pixels on the smaller side of the image, you should achieve the 85.3% top-1 accuracy on CUB200-2011 reported in the paper. Complete list of results:

| Dataset | CUB200-2011 | Aircraft | 40 actions |
| --- | --- | --- | --- |
| classes / images | 200 / 12k | 89 / 10k | 40 / 9.5k |
| Previous | 81.0% [24] | 72.5% [6] | 72.0% [36] |
| | 82.0% [17] | 78.0% [22] | 80.9% [4] |
| | 84.5% [34] | 80.7% [13] | 81.7% [22] |
| Special case: bilinear [19] | 84.1% | 84.1% | - |
| Learned strategy (Ours) | 85.3% | 85.5% | 86.0% |

Note: running the training longer than the predefined number of iterations leads to a higher accuracy and is necessary to reproduce the paper results.

Citation

Please cite the corresponding ICCV 2017 publication if our models helped your research:

@inproceedings{Simon17_GOP,
title = {Generalized orderless pooling performs implicit salient matching},
booktitle = {International Conference on Computer Vision (ICCV)},
author = {Marcel Simon and Yang Gao and Trevor Darrell and Joachim Denzler and Erik Rodner},
year = {2017},
}

License and support

The code is released under the BSD 2-clause license, which allows both academic and commercial use. I would appreciate it if you gave credit to this work by citing our paper in academic works and referencing this GitHub repository in commercial works. If you need any support, please open an issue or contact Marcel Simon.
