henghuiding / Vision-Language-Transformer

Licence: MIT license

Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Programming Languages

python

Projects that are alternatives of or similar to Vision-Language-Transformer

TRAR-VQA

[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation

Stars: ✭ 49 (-61.42%)

Mutual labels: transformer, iccv2021

laravel5-hal-json

Laravel 5 HAL+JSON API Transformer Package

Stars: ✭ 15 (-88.19%)

Mutual labels: transformer

towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Stars: ✭ 821 (+546.46%)

Mutual labels: transformer

speech-transformer

Transformer implementation specialized in speech recognition tasks using PyTorch.

Stars: ✭ 40 (-68.5%)

Mutual labels: transformer

Transformer-ocr

Handwritten text recognition using transformers.

Stars: ✭ 92 (-27.56%)

Mutual labels: transformer

FNet-pytorch

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

Stars: ✭ 204 (+60.63%)

Mutual labels: transformer

YOLOv5-Lite

🍅🍅🍅YOLOv5-Lite: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 930+kb (int8) and 1.7M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size is 320×320~

Stars: ✭ 1,230 (+868.5%)

Mutual labels: transformer

gnerf

[ ICCV 2021 Oral ] Our method can estimate camera poses and neural radiance fields jointly when the cameras are initialized at random poses in complex scenarios (outside-in scenes, even with less texture or intense noise )

Stars: ✭ 152 (+19.69%)

Mutual labels: iccv2021

Relation-Extraction-Transformer

NLP: Relation extraction with position-aware self-attention transformer

Stars: ✭ 63 (-50.39%)

Mutual labels: transformer

Xpersona

XPersona: Evaluating Multilingual Personalized Chatbot

Stars: ✭ 54 (-57.48%)

Mutual labels: transformer

laravel-scene

Laravel Transformer

Stars: ✭ 27 (-78.74%)

Mutual labels: transformer

renet

[ICCV'21] Official PyTorch implementation of Relational Embedding for Few-Shot Classification

Stars: ✭ 72 (-43.31%)

Mutual labels: iccv2021

ilvr adm

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)

Stars: ✭ 133 (+4.72%)

Mutual labels: iccv2021

image-classification

A collection of SOTA Image Classification Models in PyTorch

Stars: ✭ 70 (-44.88%)

Mutual labels: transformer

visualization

A collection of visualization functions

Stars: ✭ 189 (+48.82%)

Mutual labels: transformer

transform-graphql

⚙️ Transformer function to transform GraphQL Directives. Create model CRUD directive for example

Stars: ✭ 23 (-81.89%)

Mutual labels: transformer

PDN

The official PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf '21)

Stars: ✭ 44 (-65.35%)

Mutual labels: transformer

graphtrans

Representing Long-Range Context for Graph Neural Networks with Global Attention

Stars: ✭ 45 (-64.57%)

Mutual labels: transformer

transformer-slt

Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)

Stars: ✭ 92 (-27.56%)

Mutual labels: transformer

OverlapPredator

[CVPR 2021, Oral] PREDATOR: Registration of 3D Point Clouds with Low Overlap.

Stars: ✭ 293 (+130.71%)

Mutual labels: transformer

Vision-Language Transformer and Query Generation for Referring Segmentation

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Introduction

Vision-Language Transformer (VLT) is a framework for the referring segmentation task. Our method produces multiple query vectors for one input language expression and uses each of them to “query” the input image, generating a set of responses. The network then selectively aggregates these responses, giving more weight to the queries that provide a better comprehension of the expression.
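
The idea described above can be illustrated with a small NumPy sketch. This is not the implementation in this repository: the function name `query_and_aggregate`, the parameters `w_query` and `w_score`, and the use of plain dot-product attention are all assumptions made for illustration only. It shows the three steps in order: several queries are built from one expression, each query attends over the image, and the responses are combined with learned-style weights.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def query_and_aggregate(image_feats, word_feats, w_query, w_score):
    """Toy version of "many queries, then selective aggregation".

    image_feats: (P, D) flattened image features for P positions.
    word_feats:  (T, D) embeddings of the T words in the expression.
    w_query:     (N, T) hypothetical logits; row n says how much query n
                 emphasises each word.
    w_score:     (D,)   hypothetical vector used to score each response.
    """
    d = image_feats.shape[1]

    # 1. Several query vectors for one expression: each query is a
    #    different weighted combination of the word embeddings.
    queries = softmax(w_query, axis=-1) @ word_feats            # (N, D)

    # 2. Each query "queries" the image via dot-product attention
    #    over image positions, producing one response per query.
    attn = softmax(queries @ image_feats.T / np.sqrt(d), axis=-1)  # (N, P)
    responses = attn @ image_feats                              # (N, D)

    # 3. Selective aggregation: score every response and take a
    #    weighted sum, so better-scoring queries contribute more.
    weights = softmax(responses @ w_score, axis=-1)             # (N,)
    fused = weights @ responses                                 # (D,)
    return queries, responses, weights, fused


# Example with random data: 16 image positions, 5 words, 3 queries.
rng = np.random.default_rng(0)
queries, responses, weights, fused = query_and_aggregate(
    image_feats=rng.normal(size=(16, 8)),
    word_feats=rng.normal(size=(5, 8)),
    w_query=rng.normal(size=(3, 5)),
    w_score=rng.normal(size=8),
)
print(queries.shape, responses.shape, weights.shape, fused.shape)
```

In the sketch the aggregation weights form a probability distribution over the queries, which is one simple way to let some queries dominate the fused result.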

Installation

Environment:
- Python 3.6
- tensorflow 1.15
- Other dependencies in requirements.txt
- SpaCy model for embedding:
  
  python -m spacy download en_vectors_web_lg
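
The version requirements listed above can be checked before training or evaluation. The following is a minimal sketch, not part of the repository: the helper names `version_mismatches` and `installed_tf_version` are hypothetical, and the check only compares version numbers against the Python 3.6 and tensorflow 1.15 requirement stated in this section.

```python
import sys

EXPECTED_PYTHON = (3, 6)      # from the environment list above
EXPECTED_TF = "1.15"          # from the environment list above


def version_mismatches(python_version, tf_version):
    """Return a list of human-readable problems (empty if all is fine).

    python_version: a tuple such as sys.version_info.
    tf_version:     a version string such as "1.15.0", or None if
                    tensorflow could not be imported.
    """
    problems = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append(
            "expected Python %d.%d, found %d.%d"
            % (EXPECTED_PYTHON + tuple(python_version[:2]))
        )
    if tf_version is None:
        problems.append("tensorflow is not installed")
    elif not (tf_version == EXPECTED_TF
              or tf_version.startswith(EXPECTED_TF + ".")):
        problems.append(
            "expected tensorflow %s.x, found %s" % (EXPECTED_TF, tf_version)
        )
    return problems


def installed_tf_version():
    """Return the installed tensorflow version string, or None."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.__version__


if __name__ == "__main__":
    for problem in version_mismatches(sys.version_info, installed_tf_version()):
        print("WARNING:", problem)
```

The script prints nothing when the environment matches the requirement.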

Dataset preparation

- Put the folder of COCO training set ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract it to data/. Then run the data preparation script under data/:
```
cd data
python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
```
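
To prepare more than one dataset, the command above can be generated once per dataset/split pair. The sketch below only prints the commands and uses the flags shown above. The pairings in `DATASET_SPLITS` are an assumption about which `--dataset` and `--split` values go together and should be checked against the downloaded data; `build_command` is a hypothetical helper name.

```python
import shlex

# Assumed dataset/split pairings; adjust to match the data you downloaded.
DATASET_SPLITS = [
    ("refcoco", "unc"),
    ("refcoco+", "unc"),
    ("refcocog", "umd"),
    ("refcocog", "google"),
]


def build_command(dataset, split, data_root=".", output_dir="data_v2"):
    """Build the data_process_v2.py invocation as an argument list."""
    return [
        "python", "data_process_v2.py",
        "--data_root", data_root,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--split", split,
        "--generate_mask",
    ]


# Print one shell-ready command per pairing; run them from inside data/.
for dataset, split in DATASET_SPLITS:
    print(" ".join(shlex.quote(arg) for arg in build_command(dataset, split)))
```

Each argument list can also be passed directly to `subprocess.run(..., check=True)` instead of being printed.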

Evaluating

Download pretrained models & config files from here.
In the config file, set:
- evaluate_model: path to the pretrained weights
- evaluate_set: path to the dataset for evaluation.

Run

python vlt.py test [PATH_TO_CONFIG_FILE]

Training

Pretrained Backbones: We use the backbone weights provided by MCN.

Note: we use the backbone trained without any of the images that appear in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.

Specify the hyperparameters, dataset path and pretrained weight path in the configuration file. Please refer to the examples under /config, or to the config files of our pretrained models.

Run

python vlt.py train [PATH_TO_CONFIG_FILE]

Acknowledgement

We borrowed a lot of code from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent work!
