bottom-up-attention.pytorch

This repository contains a PyTorch reimplementation of the bottom-up-attention project based on Caffe.

We use Detectron2 as the backend to provide complete functionality, including training, testing, and feature extraction. Furthermore, we migrate the pre-trained Caffe-based model from the original repository, which extracts the same visual features as the original model (with a deviation of less than 0.01).
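
To sanity-check the migration claim, one can compare features extracted by the two models for the same image. The snippet below is only an illustrative sketch; the file names and the .npy layout are assumptions, not outputs defined by this repository.

    # compare_features.py -- illustrative sketch; file names are assumptions
    import numpy as np

    caffe_feats = np.load('feats_caffe.npy')    # features from the original Caffe model
    torch_feats = np.load('feats_pytorch.npy')  # features from this reimplementation

    # maximum absolute element-wise deviation between the two feature matrices
    max_dev = np.abs(caffe_feats - torch_feats).max()
    print(f'max deviation: {max_dev:.4f}')      # expected to stay below 0.01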

Some example object and attribute predictions for salient image regions are illustrated below. The script to obtain the following visualizations can be found here.

example-image

Table of Contents

  1. Prerequisites
  2. Training
  3. Testing
  4. Feature Extraction
  5. Pre-trained models

Prerequisites

Requirements

Note that most of the requirements above are needed for Detectron2.

Installation

  1. Clone the project including the required version (v0.2.1) of Detectron2

    # clone the repository including Detectron2 (@be792b9)
    $ git clone --recursive https://github.com/MILVLG/bottom-up-attention.pytorch
    
  2. Install Detectron2

    $ cd detectron2
    $ pip install -e .
    $ cd ..
    

We recommend using Detectron2 v0.2.1 (@be792b9) as the backend for this project, which is cloned in step 1. We believe newer Detectron2 versions are also compatible with this project unless their interfaces have changed (we have tested v0.3 with PyTorch 1.5).

  3. Compile the rest of the tools using the following script:

    # install apex
    $ git clone https://github.com/NVIDIA/apex.git
    $ cd apex
    $ python setup.py install
    $ cd ..
    # install the rest modules
    $ python setup.py build develop
    $ pip install ray
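
After the build finishes, a quick import check (shown below as an illustrative snippet, not part of the repository) can confirm that PyTorch and Detectron2 are installed correctly:

    # check_env.py -- quick installation check (illustrative)
    import torch
    import detectron2

    print('torch:', torch.__version__)
    print('cuda available:', torch.cuda.is_available())
    print('detectron2:', detectron2.__version__)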
    

Setup

If you want to train or test the model, you need to download the images and annotation files of the Visual Genome (VG) dataset. If you only need to extract visual features using the pre-trained model, you can skip this part.

The original VG images (part1 and part2) should be downloaded and unzipped to the datasets folder.

The annotation files generated in the original repository need to be transformed into the COCO format required by Detectron2. The preprocessed annotation files can be downloaded here and unzipped to the datasets folder.
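
For reference, a COCO-style annotation file is a single JSON object with three top-level lists. The sketch below (written as a Python dict) shows the general layout only; the field values are illustrative, and the preprocessed VG files may contain additional fields.

    # coco_layout.py -- general shape of a COCO-style annotation file (illustrative)
    coco_style = {
        "images": [
            {"id": 2, "file_name": "2.jpg", "height": 600, "width": 800},
        ],
        "annotations": [
            {"id": 1, "image_id": 2, "category_id": 1,
             "bbox": [10, 20, 100, 150],   # [x, y, width, height]
             "area": 15000, "iscrowd": 0},
        ],
        "categories": [
            {"id": 1, "name": "person"},
        ],
    }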

Finally, the datasets folder will have the following structure:

|-- datasets
   |-- vg
   |  |-- images
   |  |  |-- VG_100K
   |  |  |  |-- 2.jpg
   |  |  |  |-- ...
   |  |  |-- VG_100K_2
   |  |  |  |-- 1.jpg
   |  |  |  |-- ...
   |  |-- annotations
   |  |  |-- train.json
   |  |  |-- val.json
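
Before training or testing, a small check like the one below (illustrative, not part of the repository) can verify that the expected layout is in place:

    # check_datasets.py -- verify the expected VG layout (illustrative)
    import os

    root = 'datasets/vg'
    expected = [
        'images/VG_100K',
        'images/VG_100K_2',
        'annotations/train.json',
        'annotations/val.json',
    ]
    for rel in expected:
        path = os.path.join(root, rel)
        print(('OK      ' if os.path.exists(path) else 'MISSING ') + path)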

Training

The following script will train a bottom-up-attention model on the train split of VG. We are still working on this part to reproduce the same results as the Caffe version.

$ python3 train_net.py --mode detectron2 \
         --config-file configs/bua-caffe/train-bua-caffe-r101.yaml \
         --resume
  1. mode = {'caffe', 'detectron2'} refers to the mode to use. For training, we only support the detectron2 mode, since we think it is unnecessary to train a new model using the caffe mode.

  2. config-file refers to all the configurations of the model.

  3. resume is a flag for resuming training from a specific checkpoint.

Testing

Given the trained model, the following script will test the performance on the val split of VG:

$ python3 train_net.py --mode caffe \
         --config-file configs/bua-caffe/test-bua-caffe-r101.yaml \
         --eval-only
  1. mode = {'caffe', 'detectron2'} refers to the mode to use. For the model converted from Caffe, you need to use the caffe mode. For other models trained with Detectron2, you need to use the detectron2 mode.

  2. config-file refers to all the configurations of the model, which also include the path of the model weights.

  3. eval-only is a flag that declares the testing phase.

Feature Extraction

With highly optimized multi-process parallelism, the following script extracts the bottom-up-attention visual features quickly (about 7 imgs/s on a workstation with 4 Titan-V GPUs and 32 CPU cores).

We also provide a faster version of the extraction script, which extracts the visual features at an even higher rate (about 16 imgs/s on the same workstation). However, it has a drawback: it may cause memory leaks when the computing capabilities of the GPUs and CPUs are mismatched (more details and some matched examples can be found here).

To use this faster version, just replace 'extract_features.py' with 'extract_features_faster.py' in the following script. MAKE SURE YOU HAVE ENOUGH CPUS.

$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
         --extract-mode roi_feats \
         --min-max-boxes '10,100' \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --bbox-dir <out_dir> --out-dir <out_dir>
  1. mode = {'caffe', 'detectron2'} refers to the mode to use. For the model converted from Caffe, you need to use the caffe mode. For other models trained with Detectron2, you need to use the detectron2 mode. 'caffe' is the default value.

  2. num-cpus refers to the number of CPU cores used to accelerate the CPU computation. 0 stands for using all available CPUs, and 1 is the default value.

  3. gpus refers to the ids of gpus to use. '0' is the default value.

  4. config-file refers to all the configurations of the model, which also include the path of the model weights.

  5. extract-mode refers to the mode of feature extraction, one of {roi_feats, bboxes, bbox_feats}.

  6. min-max-boxes refers to the minimum and maximum number of features (boxes) to be extracted.

  7. image-dir refers to the input image directory.

  8. bbox-dir refers to the pre-proposed bbox directory. It is only used when extract-mode is set to 'bbox_feats'.

  9. out-dir refers to the output feature directory.
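
After extraction, the visual features are written to out-dir (typically one file per image). The snippet below is a minimal sketch for inspecting such a file; it assumes the features are stored as compressed .npz archives and simply lists whatever arrays are present, without relying on specific key names.

    # inspect_features.py -- peek at one extracted feature file (illustrative)
    import sys
    import numpy as np

    path = sys.argv[1]                      # e.g. <out_dir>/<image_id>.npz
    data = np.load(path, allow_pickle=True)
    for key in data.files:
        arr = np.asarray(data[key])
        print(key, arr.shape, arr.dtype)    # e.g. per-box features and bounding boxes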

Using the same pre-trained model, we provide an alternative two-stage strategy for extracting visual features, which results in (slightly) more accurate bboxes and visual features:

# extract bboxes only:
$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
         --extract-mode bboxes \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --out-dir <out_dir>  --resume 

# extract visual features with the pre-extracted bboxes:
$ python3 extract_features.py --mode caffe \
         --num-cpus 32 --gpus '0,1,2,3' \
         --extract-mode bbox_feats \
         --config-file configs/bua-caffe/extract-bua-caffe-r101.yaml \
         --image-dir <image_dir> --bbox-dir <bbox_dir> --out-dir <out_dir>  --resume 

Pre-trained models

We provide the following pre-trained models, including the models converted from the original Caffe repo (the standard dynamic 10-100 model and the alternative fix36 model). The evaluation metrics are exactly the same as those in the original Caffe project.

Model | Mode | Backbone | Objects mAP@0.5 | Objects weighted mAP@0.5 | Download
Faster R-CNN-k36 | Caffe | ResNet-101 | 9.3% | 14.0% | model
Faster R-CNN-k10-100 | Caffe | ResNet-101 | 10.2% | 15.1% | model
Faster R-CNN | Caffe | ResNet-152 | 11.1% | 15.7% | model

License

This project is released under the Apache 2.0 license.

Contact

This repo is currently maintained by Zhou Yu (@yuzcccc), Tongan Luo (@Zoroaster97), and Jing Li (@J1mL3e_).

Citation

If this repository is helpful for your research, or if you use the provided pre-trained models, you can cite the work using the following BibTeX entry:

@misc{yu2020buapt,
  author = {Yu, Zhou and Li, Jing and Luo, Tongan and Yu, Jun},
  title = {A PyTorch Implementation of Bottom-Up-Attention},
  howpublished = {\url{https://github.com/MILVLG/bottom-up-attention.pytorch}},
  year = {2020}
}
