peteanderson80 / Bottom Up Attention

License: MIT

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Labels

jupyter-notebook caffe faster-rcnn image-captioning vqa

bottom-up-attention

This code implements a bottom-up attention model, based on multi-GPU training of Faster R-CNN with ResNet-101, using object and attribute annotations from Visual Genome.

The pretrained model generates output features corresponding to salient image regions. These bottom-up attention features can typically be used as a drop-in replacement for CNN features in attention-based image captioning and visual question answering (VQA) models. This approach was used to achieve state-of-the-art image captioning performance on MSCOCO (CIDEr 117.9, BLEU_4 36.9) and to win the 2017 VQA Challenge (70.3% overall accuracy), as described in the paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" (cited under Reference).

Some example object and attribute predictions for salient image regions are illustrated below.

Note: This repo only includes code for training the bottom-up attention / Faster R-CNN model (section 3.1 of the paper). The actual captioning model (section 3.2) is available in a separate repo.

Reference

If you use our code or features, please cite our paper:

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle={CVPR},
  year = {2018}
}

Disclaimer

This code is modified from py-R-FCN-multiGPU, which is in turn modified from py-faster-rcnn code. Please refer to these links for further README information (for example, relating to other models and datasets included in the repo) and appropriate citations for these works. This README only relates to Faster R-CNN trained on Visual Genome.

License

bottom-up-attention is released under the MIT License (refer to the LICENSE file for details).

Pretrained features

For ease of use, we make pretrained features available for the entire MSCOCO dataset. It is not necessary to clone or build this repo to use features downloaded from the links below. Features are stored in tsv (tab-separated values) format, which can be read with tools/read_tsv.py.

LINKS HAVE BEEN UPDATED TO GOOGLE CLOUD STORAGE (14 Feb 2021)

10 to 100 features per image (adaptive):

36 features per image (fixed):

Both sets of features can be recreated by using tools/generate_tsv.py with the appropriate pretrained model and with MIN_BOXES/MAX_BOXES set to either 10/100 or 36/36 respectively (refer to the Demo section).
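As a rough illustration of how such a tsv file might be parsed, the sketch below reads rows in Python 3 using only the standard library. The column names, and the idea that the boxes and features columns hold base64-encoded 32-bit floats, are assumptions made for this example; tools/read_tsv.py remains the authoritative reader. The helper names decode_floats and read_features are hypothetical.

```python
import base64
import csv
from array import array

# Assumed column order; confirm against tools/read_tsv.py before relying on it.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

# Feature fields are long, so raise the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def decode_floats(b64_text):
    """Decode base64 text into a flat list of native-endian 32-bit floats (assumed encoding)."""
    values = array("f")
    values.frombytes(base64.b64decode(b64_text))
    return values.tolist()


def read_features(handle):
    """Yield one dict per image from an open, text-mode tsv file handle."""
    reader = csv.DictReader(handle, delimiter="\t", fieldnames=FIELDNAMES)
    for row in reader:
        num_boxes = int(row["num_boxes"])
        boxes = decode_floats(row["boxes"])
        features = decode_floats(row["features"])
        dim = len(features) // num_boxes  # feature length per box
        yield {
            "image_id": int(row["image_id"]),
            "image_w": int(row["image_w"]),
            "image_h": int(row["image_h"]),
            "num_boxes": num_boxes,
            # Split the flat arrays into one row per box.
            "boxes": [boxes[i * 4:(i + 1) * 4] for i in range(num_boxes)],
            "features": [features[i * dim:(i + 1) * dim] for i in range(num_boxes)],
        }
```

A caller would open the downloaded file in text mode with newline="" and iterate over read_features(handle). Yielding one image at a time avoids loading the whole file into memory.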

Contents

Requirements: software
Requirements: hardware
Installation
Demo
Training
Testing

Requirements: software

Important: Please use the version of Caffe contained within this repository.
Requirements for Caffe and pycaffe (see: Caffe installation instructions)

Note: Caffe must be built with support for Python layers and NCCL!

# In your Makefile.config, make sure to have these lines uncommented
WITH_PYTHON_LAYER := 1
USE_NCCL := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1

Python packages you might not have: cython, python-opencv, easydict
NVIDIA's NCCL library, which is used for multi-GPU training: https://github.com/NVIDIA/nccl
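Before building, it can help to confirm that the required lines in Makefile.config are really uncommented. The following is a minimal sketch of such a check, not part of this repository; the function names parse_flags and missing_flags are hypothetical, and it only understands simple `NAME := value` assignments.

```python
import re

# Flags the note above says must be enabled in Makefile.config.
REQUIRED_FLAGS = ("WITH_PYTHON_LAYER", "USE_NCCL")

# Matches an uncommented assignment such as "USE_NCCL := 1".
_ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:=\s*(\S+)")


def parse_flags(config_text):
    """Return {name: value} for every uncommented ':=' assignment."""
    flags = {}
    for line in config_text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            flags[match.group(1)] = match.group(2)
    return flags


def missing_flags(config_text, required=REQUIRED_FLAGS):
    """Return the required flags that are absent, commented out, or not set to 1."""
    flags = parse_flags(config_text)
    return [name for name in required if flags.get(name) != "1"]
```

Calling missing_flags on the contents of Makefile.config returns an empty list when both flags are enabled; a line that still starts with `#` is treated as disabled.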

Requirements: hardware

Any NVIDIA GPU with 12GB or larger memory is OK for training Faster R-CNN ResNet-101.

Installation

Clone the repository

git clone https://github.com/peteanderson80/bottom-up-attention/

Build the Cython modules
```
cd $REPO_ROOT/lib
make
```

Build Caffe and pycaffe

cd $REPO_ROOT/caffe
# Now follow the Caffe installation instructions here:
#   http://caffe.berkeleyvision.org/installation.html

# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make -j8 && make pycaffe

Demo

Download the pretrained model, and put it under data/faster_rcnn_models.
Run tools/demo.ipynb to show object and attribute detections on demo images.
Run tools/generate_tsv.py to extract bounding box features to a tab-separated-values (tsv) file. This will require modifying the load_image_ids function to suit your data locations. To recreate the pretrained feature files with 10 to 100 features per image, set MIN_BOXES=10 and MAX_BOXES=100. To recreate the pretrained feature files with 36 features per image, set MIN_BOXES=36 and MAX_BOXES=36, and use the alternative pretrained model instead. The alternative pretrained model was trained for fewer iterations, but performance is similar.
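To make the effect of MIN_BOXES and MAX_BOXES concrete, here is a minimal sketch of one way a per-image box count could be bounded. It is an illustration under stated assumptions, not the logic of tools/generate_tsv.py: the confidence threshold (conf_thresh, with an arbitrary default of 0.2), the ranking by a single score per box, and the function name select_boxes are all hypothetical.

```python
def select_boxes(scores, min_boxes=10, max_boxes=100, conf_thresh=0.2):
    """Return indices of the boxes to keep, highest score first.

    Boxes scoring at least conf_thresh are kept, but the result is padded
    up to min_boxes or trimmed down to max_boxes using the score ranking.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = [i for i in ranked if scores[i] >= conf_thresh]
    if len(keep) < min_boxes:
        keep = ranked[:min_boxes]   # too few confident boxes: take the best ones anyway
    elif len(keep) > max_boxes:
        keep = ranked[:max_boxes]   # too many: keep only the most confident
    return keep
```

With min_boxes and max_boxes both set to 36, every image with at least 36 candidate boxes yields exactly 36 regions, which corresponds to the fixed-size setting; 10 and 100 give the adaptive setting.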

Training

Download the Visual Genome dataset. Extract all the json files, as well as the image directories VG_100K and VG_100K_2 into one folder $VGdata.
Create symlinks for the Visual Genome dataset
```
cd $REPO_ROOT/data
ln -s $VGdata vg
```
Generate xml files for each image in the PASCAL VOC format (this will take some time). This script extracts the top 2500/1000/500 objects/attributes/relations and also does basic cleanup of the Visual Genome data. Note, however, that our training code actually uses only a subset of the annotations in the xml files, i.e., only 1600 object classes and 400 attribute classes, based on the hand-filtered vocabs found in data/genome/1600-400-20. The relevant part of the codebase is lib/datasets/vg.py. Relation labels can be included in the data layers but are currently not used.
```
cd $REPO_ROOT
./data/genome/setup_vg.py
```
Please download the ImageNet-pre-trained ResNet-101 model manually, and put it into $REPO_ROOT/data/imagenet_models
You can train your own model using ./experiments/scripts/faster_rcnn_end2end_multi_gpu_resnet_final.sh (see instructions in file). The train (95k) / val (5k) / test (5k) splits are in data/genome/{split}.txt and have been determined using data/genome/create_splits.py. To avoid val / test set contamination when pre-training for MSCOCO tasks, for images in both datasets these splits match the 'Karpathy' COCO splits.
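The vocabulary step mentioned above (keeping the top 2500/1000/500 objects/attributes/relations) amounts to a frequency count followed by a cut-off. The sketch below shows that idea in isolation; it is not the code in data/genome/setup_vg.py, the lowercasing and whitespace stripping are assumed cleanup steps, and top_vocab is a hypothetical name.

```python
from collections import Counter


def top_vocab(labels_per_image, size):
    """Return the `size` most frequent labels across all images, most frequent first."""
    counts = Counter(
        label.strip().lower()        # assumed normalization, for illustration only
        for labels in labels_per_image
        for label in labels
        if label.strip()             # skip empty labels
    )
    return [label for label, _ in counts.most_common(size)]
```

For example, top_vocab(object_labels, 2500) would give an object vocabulary of at most 2500 entries, where object_labels is a placeholder for one list of label strings per image.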

Trained Faster-RCNN snapshots are saved under:
```
output/faster_rcnn_resnet/vg/
```
Logging outputs are saved under:
```
experiments/logs/
```
Run tools/review_training.ipynb to visualize the training data and predictions.

Testing

The model will be tested on the validation set at the end of training, or models can be tested directly using tools/test_net.py, e.g.:

./tools/test_net.py --gpu 0 --imdb vg_1600-400-20_val --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel > experiments/logs/eval.log 2>&1

Mean AP is reported separately for object prediction and attribute prediction (given ground-truth object detections). Test outputs are saved under:

output/faster_rcnn_resnet/vg_1600-400-20_val/<network snapshot name>/
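When evaluating several snapshots, the test_net.py invocation shown above can be assembled and launched from Python. This is a minimal sketch that reuses only the flags from that command; the helper names build_test_command and run_test are hypothetical, and it assumes the script is run from the repository root.

```python
import subprocess


def build_test_command(gpu, imdb, prototxt, cfg, net):
    """Assemble the argument list for tools/test_net.py."""
    return [
        "./tools/test_net.py",
        "--gpu", str(gpu),
        "--imdb", imdb,
        "--def", prototxt,
        "--cfg", cfg,
        "--net", net,
    ]


def run_test(command, log_path):
    """Run the command, writing stdout and stderr to one log file (like '> log 2>&1')."""
    with open(log_path, "w") as log:
        completed = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

Passing the argument list directly to subprocess.run avoids shell quoting issues, and the return code indicates whether the evaluation exited cleanly.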

Expected detection results for the pretrained model

| | objects mAP@0.5 | objects weighted mAP@0.5 | attributes mAP@0.5 | attributes weighted mAP@0.5 |
|---|---|---|---|---|
| Faster R-CNN, ResNet-101 | 10.2% | 15.1% | 7.8% | 27.8% |

Note that mAP is relatively low because many classes overlap (e.g. person / man / guy), some classes can't be precisely located (e.g. street, field) and separate classes exist for singular and plural objects (e.g. person / people). We focus on performance in downstream tasks (e.g. image captioning, VQA) rather than detection performance.
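The difference between the plain and weighted columns can be illustrated with a small calculation. The sketch below assumes that "weighted" means each class's AP is weighted by its number of ground-truth instances; this README does not state the weighting, so treat that as an assumption. The function names are hypothetical.

```python
def mean_ap(ap_by_class):
    """Unweighted mean of per-class average precision."""
    return sum(ap_by_class.values()) / len(ap_by_class)


def weighted_mean_ap(ap_by_class, count_by_class):
    """Mean of per-class AP weighted by instance count (assumed weighting)."""
    total = sum(count_by_class[name] for name in ap_by_class)
    weighted_sum = sum(ap * count_by_class[name] for name, ap in ap_by_class.items())
    return weighted_sum / total
```

Under this assumption, frequent classes with good AP pull the weighted figure above the plain mean, which would be consistent with the weighted columns being higher in the table.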
