henghuiding / Vision-Language-Transformer

Licence: MIT license
Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Programming Languages

python

Vision-Language Transformer and Query Generation for Referring Segmentation

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Introduction

Vision-Language Transformer (VLT) is a framework for the referring segmentation task. Our method produces multiple query vectors for one input language expression and uses each of them to “query” the input image, generating a set of responses. The network then selectively aggregates these responses, giving more weight to the queries that provide better comprehension.
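
To make the three steps above concrete (generate several queries, query the image with each, aggregate the responses by confidence), here is a toy NumPy sketch. It is not the implementation in this repository: the projection matrices `w_query` and `w_conf`, the softmax-based confidence and the feature shapes are all assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_query_response(words, pixels, w_query, w_conf):
    """Toy multi-query attention with confidence-weighted aggregation.

    words:   (T, C) word features of one expression
    pixels:  (N, C) flattened image features
    w_query: (Nq, C) hypothetical per-query word-scoring vectors
    w_conf:  (C,)   hypothetical vector scoring each query
    """
    channels = words.shape[1]

    # 1. Each query weights the words differently, giving Nq query vectors.
    word_attn = softmax(w_query @ words.T, axis=-1)      # (Nq, T)
    queries = word_attn @ words                          # (Nq, C)

    # 2. Each query attends over the image and collects a response.
    pixel_attn = softmax(queries @ pixels.T / np.sqrt(channels), axis=-1)  # (Nq, N)
    responses = pixel_attn @ pixels                      # (Nq, C)

    # 3. Score the queries and aggregate the responses by confidence.
    conf = softmax(queries @ w_conf)                     # (Nq,)
    aggregated = conf @ responses                        # (C,)
    return aggregated, conf


rng = np.random.default_rng(0)
words = rng.normal(size=(5, 8))     # 5 words, 8 channels
pixels = rng.normal(size=(12, 8))   # 12 image positions
w_query = rng.normal(size=(3, 8))   # 3 queries
w_conf = rng.normal(size=8)

aggregated, conf = multi_query_response(words, pixels, w_query, w_conf)
print(aggregated.shape, conf.shape)
```

The confidence vector sums to one here, so a query with a higher score contributes proportionally more to the aggregated response.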

Installation

  1. Environment:

    • Python 3.6

    • tensorflow 1.15

    • Other dependencies in requirements.txt

    • SpaCy model for embedding:

      python -m spacy download en_vectors_web_lg

  2. Dataset preparation

    • Put the folder of COCO training set ("train2014") under data/images/.

    • Download the RefCOCO dataset from here and extract it to data/. Then run the data preparation script under data/:

      cd data
      python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
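
The bracketed options in the command above are alternatives, and not every dataset goes with every split. The helper below is a small sketch that builds the command line for one dataset/split pair. The pairing table (`unc` for refcoco and refcoco+, `umd` or `google` for refcocog) follows the usual RefCOCO convention and is an assumption here, as is the helper name `data_prep_command`; check the script itself for the combinations it accepts.

```python
import shlex

# Assumed dataset -> split pairing, based on the common RefCOCO convention.
VALID_SPLITS = {
    "refcoco": ["unc"],
    "refcoco+": ["unc"],
    "refcocog": ["umd", "google"],
}


def data_prep_command(dataset, split, data_root=".", output_dir="data_v2"):
    """Return the data_process_v2.py command line for one dataset/split pair."""
    if dataset not in VALID_SPLITS:
        raise ValueError("unknown dataset: %s" % dataset)
    if split not in VALID_SPLITS[dataset]:
        raise ValueError("split %s is not listed for %s" % (split, dataset))
    args = [
        "python", "data_process_v2.py",
        "--data_root", data_root,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--split", split,
        "--generate_mask",
    ]
    # Quote each argument so the string can be pasted into a shell.
    return " ".join(shlex.quote(a) for a in args)


# Print one command per assumed dataset/split pair; run them from data/.
for name, splits in VALID_SPLITS.items():
    for split_name in splits:
        print(data_prep_command(name, split_name))
```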
      

Evaluating

  1. Download pretrained models & config files from here.

  2. In the config file, set:

    • evaluate_model: path to the pretrained weights
    • evaluate_set: path to the dataset for evaluation.
  3. Run

    python vlt.py test [PATH_TO_CONFIG_FILE]
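
Before launching an evaluation it can help to confirm that the two settings from step 2 are filled in. The sketch below assumes the config file has already been parsed into a Python dict by whatever loader matches its format; the function name `check_eval_config` is hypothetical and not part of this repository.

```python
import os

# The two keys named in step 2 above.
REQUIRED_EVAL_KEYS = ("evaluate_model", "evaluate_set")


def check_eval_config(config):
    """Return a list of human-readable problems; empty if the keys look fine."""
    problems = []
    for key in REQUIRED_EVAL_KEYS:
        value = config.get(key)
        if not value:
            problems.append("%s is not set" % key)
        elif not os.path.exists(value):
            problems.append("%s points to a missing path: %s" % (key, value))
    return problems


# Example with an empty config: both keys are reported as unset.
for problem in check_eval_config({}):
    print(problem)
```

The existence check may be too strict if the weights are referenced by a checkpoint prefix rather than a concrete file; relax it in that case.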
    

Training

  1. Pretrained Backbones: We use the backbone weights provided by MCN.

    Note: we use the backbone that was pretrained without any images that appear in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.

  2. Specify the hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config, or to the config files of our pretrained models.

  3. Run

    python vlt.py train [PATH_TO_CONFIG_FILE]
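
Training and evaluation share the same entry point and differ only in the mode argument. If you script several runs, a thin wrapper like the following sketch keeps the two invocations consistent. The function name `vlt_command` is hypothetical, and the config path in the usage comment is a placeholder.

```python
import sys


def vlt_command(mode, config_path, python=sys.executable):
    """Build the argument list for `python vlt.py <mode> <config>`."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test', got %r" % mode)
    return [python, "vlt.py", mode, config_path]


# Usage, from the repository root (placeholder config path):
#   import subprocess
#   subprocess.run(vlt_command("train", "PATH_TO_CONFIG_FILE"), check=True)
print(vlt_command("test", "PATH_TO_CONFIG_FILE", python="python"))
```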
    

Acknowledgement

We borrowed a lot of code from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent work!
