
wenz116 / lang2seg

License: MIT
Referring Expression Object Segmentation with Caption-Aware Consistency, BMVC 2019

Programming Languages

Python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects
Shell
77523 projects
C
50402 projects - #5 most used programming language
CUDA
1817 projects
MATLAB
3953 projects

Projects that are alternatives of or similar to lang2seg

etos-deepcut
Deep Extreme Cut (http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr): a tool for automatic object segmentation from extreme points.
Stars: ✭ 24 (-20%)
Mutual labels:  object-segmentation
Deeplab-pytorch
Deeplab for semantic segmentation tasks
Stars: ✭ 61 (+103.33%)
Mutual labels:  object-segmentation
rt-mrcnn
Real-time instance segmentation with Mask R-CNN, live from a webcam feed.
Stars: ✭ 47 (+56.67%)
Mutual labels:  object-segmentation
DetectionMetrics
Tool to evaluate deep-learning detection and segmentation models, and to create datasets
Stars: ✭ 66 (+120%)
Mutual labels:  object-segmentation
TransferSeg
Unseen Object Segmentation in Videos via Transferable Representations, ACCV 2018 (oral)
Stars: ✭ 25 (-16.67%)
Mutual labels:  object-segmentation
rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Stars: ✭ 36 (+20%)
Mutual labels:  vision-and-language
just-ask
[TPAMI Special Issue on ICCV 2021 Best Papers, Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Stars: ✭ 57 (+90%)
Mutual labels:  vision-and-language
robo-vln
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (+13.33%)
Mutual labels:  vision-and-language
MIA
Code for "Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations" (NeurIPS 2019)
Stars: ✭ 57 (+90%)
Mutual labels:  vision-and-language
synse-zsl
Official PyTorch code for the ICIP 2021 paper 'Syntactically Guided Generative Embeddings For Zero Shot Skeleton Action Recognition'
Stars: ✭ 14 (-53.33%)
Mutual labels:  vision-and-language
clip playground
An ever-growing playground of notebooks showcasing CLIP's impressive zero-shot capabilities
Stars: ✭ 80 (+166.67%)
Mutual labels:  vision-and-language
stanford-cs231n-assignments-2020
This repository contains my solutions to the assignments for Stanford's CS231n "Convolutional Neural Networks for Visual Recognition" (Spring 2020).
Stars: ✭ 84 (+180%)
Mutual labels:  vision-and-language
iMIX
A framework for Multimodal Intelligence research from Inspur HSSLAB.
Stars: ✭ 21 (-30%)
Mutual labels:  vision-and-language
VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Stars: ✭ 41 (+36.67%)
Mutual labels:  vision-and-language
CBP
Official Tensorflow Implementation of the AAAI-2020 paper "Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction"
Stars: ✭ 52 (+73.33%)
Mutual labels:  vision-and-language
wikiHow paper list
A paper list of research conducted based on wikiHow
Stars: ✭ 25 (-16.67%)
Mutual labels:  vision-and-language
TRAR-VQA
[ICCV 2021] TRAR: Routing the Attention Spans in Transformers for Visual Question Answering -- Official Implementation
Stars: ✭ 49 (+63.33%)
Mutual labels:  vision-and-language
calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Stars: ✭ 105 (+250%)
Mutual labels:  vision-and-language
X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Stars: ✭ 283 (+843.33%)
Mutual labels:  vision-and-language
pytorch violet
A PyTorch implementation of VIOLET
Stars: ✭ 119 (+296.67%)
Mutual labels:  vision-and-language

Referring Expression Object Segmentation with Caption-Aware Consistency

PyTorch implementation of our method for segmenting the object in an image specified by a natural language description.

Contact: Yi-Wen Chen (chenyiwena at gmail dot com)

Paper

Referring Expression Object Segmentation with Caption-Aware Consistency
Yi-Wen Chen, Yi-Hsuan Tsai, Tiantian Wang, Yen-Yu Lin and Ming-Hsuan Yang
British Machine Vision Conference (BMVC), 2019

Please cite our paper if you find it useful for your research.

@inproceedings{Chen_lang2seg_2019,
  author = {Yi-Wen Chen and Yi-Hsuan Tsai and Tiantian Wang and Yen-Yu Lin and Ming-Hsuan Yang},
  booktitle = {British Machine Vision Conference (BMVC)},
  title = {Referring Expression Object Segmentation with Caption-Aware Consistency},
  year = {2019}
}

Prerequisites

  • Python 2.7
  • PyTorch 0.2 or 0.3
  • CUDA 8.0
  • Mask R-CNN: Follow the instructions in the mask-faster-rcnn repo to prepare everything needed under pyutils/mask-faster-rcnn.
  • REFER API and data: Use the download links of REFER, go to the folder, and run make. Follow data/README.md to prepare the images and the refcoco/refcoco+/refcocog annotations (a setup sketch is given after this list).
  • The COCO training set should be downloaded to pyutils/mask-faster-rcnn/data/coco/images/train2014.
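A minimal sketch of this data setup, run from the repository root. The pyutils/refer location mirrors the MAttNet layout and is an assumption, so adjust the paths to your own checkout:

# Hedged sketch: build the REFER API (compiles its Cython modules).
cd pyutils/refer && make && cd ../..
# Follow data/README.md in the REFER repo to place the images and the
# refcoco / refcoco+ / refcocog annotations under its data/ folder.

# Put the COCO train2014 images where Mask R-CNN expects them.
mkdir -p pyutils/mask-faster-rcnn/data/coco/images
unzip train2014.zip -d pyutils/mask-faster-rcnn/data/coco/images/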

Preprocessing

The preprocessed data is provided in cache/prepro/.

Training

  • Valid <DATASET> <SPLITBY> pairs are: refcoco unc, refcoco+ unc, refcocog umd, and refcocog google.

  • The output model will be saved at <DATASET>_<SPLITBY>/output_<OUTPUT_POSTFIX>. If there are trained models in this directory, the model from the latest iteration will be loaded.

  • The iteration at which the learning rate decays is specified by STEPSIZE in train_*.sh.

  1. Train the baseline segmentation model with only 1 dynamic filter:
./experiments/scripts/train_baseline.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX>
  2. Train the model with spatial dynamic filters:
./experiments/scripts/train_spatial.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX>
  3. Train the model with spatial dynamic filters and caption loss:
./experiments/scripts/train_cycle.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX> att2in2 <CAPTION_LOSS_WEIGHT>
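For example, a possible invocation of step 3 on the refcoco unc split; the GPU id, output postfix, and caption loss weight below are illustrative choices, not values prescribed by the paper:

# Hedged example: train the caption-aware (cycle) model on refcoco/unc.
# Checkpoints would be written to refcoco_unc/output_cycle/.
./experiments/scripts/train_cycle.sh 0 refcoco unc cycle att2in2 0.1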

The pretrained Mask R-CNN model should be placed in <DATASET>_<SPLITBY>/output_<OUTPUT_POSTFIX>. If there are multiple models in the directory, the model from the latest iteration will be loaded.

The pretrained caption model should be placed in <DATASET>_<SPLITBY>/caption_log_res5_2/ and named model-best.pth and infos-best.pkl.

  4. Train the model with spatial dynamic filters and response loss:
./experiments/scripts/train_response.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX>
  5. Train the model with spatial dynamic filters, response loss and caption loss:
./experiments/scripts/train_cycle_response.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX> att2in2 <CAPTION_LOSS_WEIGHT>

The pretrained Mask R-CNN model should be placed in <DATASET>_<SPLITBY>/output_<OUTPUT_POSTFIX>. If there are multiple models in the directory, the model from the latest iteration will be loaded.

The pretrained caption model should be placed in <DATASET>_<SPLITBY>/caption_log_response/ and named model-best.pth and infos-best.pkl.
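An illustrative placement of the pretrained caption model, using refcoco unc as an example split and /path/to/caption_model/ as a placeholder for wherever your trained caption checkpoints live:

# Hedged sketch: copy the caption checkpoints to the expected directory
# (caption_log_res5_2 for train_cycle.sh, caption_log_response for
# train_cycle_response.sh; the two scripts may use different caption models).
mkdir -p refcoco_unc/caption_log_response
cp /path/to/caption_model/model-best.pth refcoco_unc/caption_log_response/
cp /path/to/caption_model/infos-best.pkl refcoco_unc/caption_log_response/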

  6. Train the model with spatial dynamic filters and response loss for VGG16 and Faster R-CNN:

Download the pretrained Faster R-CNN model here (coco_900k-1190k.tar), and put the .pth and .pkl files in pyutils/mask-faster-rcnn/output/vgg16/coco_2014_train+coco_2014_valminusminival/ (a placement sketch follows the command below).

./experiments/scripts/train_vgg.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX>
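A minimal sketch of placing the downloaded checkpoint; it assumes the tarball unpacks directly into the .pth and .pkl files, which may not match the actual archive layout:

# Hedged sketch: extract the downloaded model where train_vgg.sh expects it.
mkdir -p pyutils/mask-faster-rcnn/output/vgg16/coco_2014_train+coco_2014_valminusminival
tar -xf coco_900k-1190k.tar \
    -C pyutils/mask-faster-rcnn/output/vgg16/coco_2014_train+coco_2014_valminusminival/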

Evaluation

  1. Evaluate the baseline segmentation model:
./experiments/scripts/eval_baseline.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX> <MODEL_ITER>

This evaluates the model saved at <DATASET>_<SPLITBY>/output_<OUTPUT_POSTFIX> at training iteration <MODEL_ITER>.

Detection and segmentation results will be saved to experiments/det_results.txt and experiments/mask_results.txt, respectively.

  2. Evaluate the model with spatial dynamic filters (and caption loss):
./experiments/scripts/eval_spatial.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX> <MODEL_ITER>
  3. Evaluate the model with spatial dynamic filters and response loss (and caption loss):
./experiments/scripts/eval_response.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX> <MODEL_ITER>
  4. Evaluate the model with spatial dynamic filters and response loss for VGG16 and Faster R-CNN:
./experiments/scripts/eval_vgg.sh <GPUID> <DATASET> <SPLITBY> <OUTPUT_POSTFIX> <MODEL_ITER>
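For example, a possible evaluation of the cycle model trained in the earlier example on refcoco unc; the GPU id, output postfix, and iteration number are illustrative:

# Hedged example: evaluate the checkpoint from an example iteration, then
# inspect the detection and segmentation results written by the script.
./experiments/scripts/eval_spatial.sh 0 refcoco unc cycle 240000
cat experiments/det_results.txt experiments/mask_results.txt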

Acknowledgement

Thanks to Licheng Yu for his work. Our code is heavily borrowed from the implementation of MAttNet.

Note

The model and code are available for non-commercial research purposes only.

  • 9/2019: code released