MAttNet: Modular Attention Network for Referring Expression Comprehension


PyTorch Implementation of MAttNet

Introduction

This repository is the PyTorch implementation of MAttNet: Modular Attention Network for Referring Expression Comprehension (CVPR 2018). Referring expressions are natural language utterances that indicate particular objects within a scene, e.g., "the woman in the red sweater" or "the man on the right". For robots or other intelligent agents communicating with people in the world, the ability to accurately comprehend such expressions is a necessary component of natural interaction. In this project, we address referring expression comprehension: localizing an image region described by a natural language expression. Check our paper and online demo for more details and examples.
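MAttNet decomposes each expression into subject, location, and relationship components, scores every candidate region with a corresponding visual module, and combines the three module scores using weights predicted from the language. The snippet below is only a simplified sketch of that weighted fusion, not the code in this repository; the function name, tensor shapes, and the way the weights are produced are assumptions for illustration.

import torch

def combine_module_scores(weights, subj_score, loc_score, rel_score):
    # weights:  (batch, 3) softmax weights for the subject, location,
    #           and relationship modules, predicted from the expression.
    # *_score:  (batch, num_regions) matching scores from each module.
    w_subj, w_loc, w_rel = weights[:, 0:1], weights[:, 1:2], weights[:, 2:3]
    return w_subj * subj_score + w_loc * loc_score + w_rel * rel_score

# Toy usage: one expression, five candidate regions, random scores.
weights = torch.softmax(torch.randn(1, 3), dim=1)
overall = combine_module_scores(weights,
                                torch.randn(1, 5),
                                torch.randn(1, 5),
                                torch.randn(1, 5))
predicted_region = overall.argmax(dim=1)  # region with the highest overall score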

Prerequisites

  • Python 2.7
  • PyTorch 0.2 (may not work with 1.0 or higher)
  • CUDA 8.0
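If you are unsure whether your environment matches these requirements, a quick check along these lines may help (a generic snippet, not part of this repository):

import sys
import torch

print("Python:", sys.version.split()[0])      # expect 2.7.x
print("PyTorch:", torch.__version__)          # expect 0.2.x
print("CUDA available:", torch.cuda.is_available())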

Installation

  1. Clone the MAttNet repository
git clone --recursive https://github.com/lichengunc/MAttNet
  2. Prepare the submodules and associated data
  • Mask R-CNN: Follow the instructions in my mask-faster-rcnn repo to prepare everything needed for pyutils/mask-faster-rcnn. You can use cv/mrcn_detection.ipynb to check that Mask R-CNN is set up correctly.

  • REFER API and data: Download the data via the links in REFER, go to its folder, and run make (see the shell sketch after this list). Follow data/README.md to prepare the images and the refcoco/refcoco+/refcocog annotations.

  • refer-parser2: Follow the instructions in refer-parser2 to extract the parsed expressions using Vicente's R1-R7 attributes. Note this sub-module is only needed if you want to train the models yourself.
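As a rough sketch of the REFER setup step, assuming the submodule is checked out under pyutils/refer (the exact path and build target come from the REFER repo, not this README):

cd pyutils/refer
make        # builds the mask API that REFER depends on
cd ../..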

Training

  1. Prepare the training and evaluation data by running tools/prepro.py:
python tools/prepro.py --dataset refcoco --splitBy unc
  2. Extract features using Mask R-CNN, where the head_feats are used in subject module training and the ann_feats are used in relationship module training.
CUDA_VISIBLE_DEVICES=gpu_id python tools/extract_mrcn_head_feats.py --dataset refcoco --splitBy unc
CUDA_VISIBLE_DEVICES=gpu_id python tools/extract_mrcn_ann_feats.py --dataset refcoco --splitBy unc
  3. Detect objects/masks and extract their features (only needed if you want to evaluate automatic comprehension). We empirically set the confidence threshold of Mask R-CNN to 0.65; a small filtering sketch follows this list.
CUDA_VISIBLE_DEVICES=gpu_id python tools/run_detect.py --dataset refcoco --splitBy unc --conf_thresh 0.65
CUDA_VISIBLE_DEVICES=gpu_id python tools/run_detect_to_mask.py --dataset refcoco --splitBy unc
CUDA_VISIBLE_DEVICES=gpu_id python tools/extract_mrcn_det_feats.py --dataset refcoco --splitBy unc
  4. Train MAttNet with ground-truth annotation:
./experiments/scripts/train_mattnet.sh GPU_ID refcoco unc
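For intuition, the confidence threshold in step 3 simply keeps detections whose score is at least 0.65. A generic sketch, not the repository's detection code (the array layout is an assumption):

import numpy as np

def filter_detections(boxes, scores, conf_thresh=0.65):
    # boxes:  (N, 4) array of [x1, y1, x2, y2] detections.
    # scores: (N,) array of detector confidences.
    keep = scores >= conf_thresh
    return boxes[keep], scores[keep]

# Toy usage: only the first two detections clear the threshold.
boxes = np.array([[10, 10, 50, 80], [30, 40, 90, 120], [5, 5, 20, 20]], dtype=np.float32)
scores = np.array([0.95, 0.70, 0.40], dtype=np.float32)
kept_boxes, kept_scores = filter_detections(boxes, scores)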

During training, you may want to use cv/inpect_cv.ipynb to check the training/validation curves and do cross validation.

Evaluation

Evaluate MAttNet with ground-truth annotation:

./experiments/scripts/eval_easy.sh GPU_ID refcoco unc

If you have already detected/extracted the Mask R-CNN results (Training Step 3 above), you can now evaluate the automatic comprehension accuracy using Mask R-CNN detections and segmentations (a sketch of the IoU-based metric follows the commands):

./experiments/scripts/eval_dets.sh GPU_ID refcoco unc
./experiments/scripts/eval_masks.sh GPU_ID refcoco unc
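Comprehension accuracy for detected boxes is conventionally measured as the fraction of expressions whose predicted region overlaps the ground-truth box with IoU of at least 0.5. A minimal sketch of that metric, not the evaluation script itself:

def box_iou(box_a, box_b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def comprehension_accuracy(pred_boxes, gt_boxes, iou_thresh=0.5):
    # Fraction of expressions whose predicted box matches the ground truth.
    hits = [box_iou(p, g) >= iou_thresh for p, g in zip(pred_boxes, gt_boxes)]
    return sum(hits) / float(len(hits))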

Pre-trained Models

In order to reproduce the results in our paper, follow Training Steps 1-3 for data and feature preparation, then run the evaluation commands above. We provide pre-trained models for RefCOCO, RefCOCO+, and RefCOCOg. Download them and put them under the ./output folder (an unpacking example follows the tables below).

  1. RefCOCO: Pre-trained model (56M)

|                           | val    | test A | test B |
| ------------------------- | ------ | ------ | ------ |
| Localization (gt-box)     | 85.57% | 85.95% | 84.36% |
| Localization (Mask R-CNN) | 76.65% | 81.14% | 69.99% |
| Segmentation (Mask R-CNN) | 75.16% | 79.55% | 68.87% |
  2. RefCOCO+: Pre-trained model (56M)

|                           | val    | test A | test B |
| ------------------------- | ------ | ------ | ------ |
| Localization (gt-box)     | 71.71% | 74.28% | 66.27% |
| Localization (Mask R-CNN) | 65.33% | 71.62% | 56.02% |
| Segmentation (Mask R-CNN) | 64.11% | 70.12% | 54.82% |
  3. RefCOCOg: Pre-trained model (58M)

|                           | val    | test   |
| ------------------------- | ------ | ------ |
| Localization (gt-box)     | 78.96% | 78.51% |
| Localization (Mask R-CNN) | 66.58% | 67.27% |
| Segmentation (Mask R-CNN) | 64.48% | 65.60% |
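As an example of placing a downloaded model under ./output (the archive name and format below are placeholders; use whatever the download link actually provides):

mkdir -p output
tar -xzvf <downloaded_model_archive>.tar.gz -C output/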

Pre-computed detections/masks

We provide the detected boxes/masks for those who are interested in automatic comprehension; they were produced by Training Step 3 above. Note that our Mask R-CNN is trained on COCO's training images, excluding those in the RefCOCO, RefCOCO+, and RefCOCOg validation and test sets. For that reason, it would be unfair to use other off-the-shelf detectors trained on the whole COCO set for this task.

Demo

Run cv/example_demo.ipynb for a demo example. You can also check out our Online Demo.

Citation

@inproceedings{yu2018mattnet,
  title={MAttNet: Modular Attention Network for Referring Expression Comprehension},
  author={Yu, Licheng and Lin, Zhe and Shen, Xiaohui and Yang, Jimei and Lu, Xin and Bansal, Mohit and Berg, Tamara L},
  booktitle={CVPR},
  year={2018}
}

License

MAttNet is released under the MIT License (refer to the LICENSE file for details).

A few notes

I'd like to share several thoughts after working on Referring Expressions for 3 years (since 2015):

  • Model Improvement: I'm satisfied with this model architecture, but I still feel the context information is not fully exploited. We tried the context of visual comparison in our ECCV 2016 work; it worked well but relied too much on the detector, which is why I removed the appearance difference in this paper. (Location comparison remains, as it is too important to drop.) I'm looking forward to seeing more robust and interesting forms of context proposed in the future. Another direction is end-to-end multi-task training. The current model loses some concepts after going through Mask R-CNN; for example, Mask R-CNN can perfectly detect a (big) sports ball in an image, but MAttNet can no longer recognize it. The reason is that we train the two models separately and our RefCOCO dataset does not have ball-related expressions.

  • Borrowing External Concepts: Current datasets (RefCOCO, RefCOCO+, RefCOCOg) are biased toward the person category; around half of the expressions refer to people. However, in real life people may also want to refer to other common objects (cup, bottle, book) or even stuff (sky, tree, building). As RefCOCO already provides common referring expression structure, the (only) missing piece is universal object/stuff concepts, which could be borrowed from external datasets/tasks.

  • Referring Expression Generation (REG): Surprisingly few papers work on the referring expression generation task so far! Dialogue is important, and referring to things is always the first step in computer-to-human interaction. (I don't think people would love to use a passive computer or robot that cannot talk.) In our CVPR 2017 work, we collected more testing expressions for better REG evaluation. (Check REFER2 for the data; the only difference from REFER is that it contains more testing expressions for RefCOCO and RefCOCO+.) While we achieved state-of-the-art results in that paper, there is still plenty of room for improvement. Our speaker model can only utter "boring" and "safe" expressions, and thus cannot specify every object in an image well. A GAN or a modular speaker might be an effective weapon for future work.

  • Data Collection: A larger referring expressions dataset is apparently the most straightforward way to improve the performance of any model. You might have two questions: 1) What data should we collect? 2) How do we collect it? A larger referring expression dataset covering the whole of MS COCO is expected (of course); this would also make end-to-end learning possible in the future. Task-specific datasets are also interesting: since ReferIt Game, there have been several datasets in different domains, e.g., video, dialogue, and spoken language. Note that you should be careful about the problem setting; randomly fitting referring expressions into a task (just for paper publication) is boring. As for the collection method, I prefer the approach used in our early work ReferIt Game. The collected expressions may be slightly short (compared with image captioning datasets), but that is how we refer to things naturally in daily life.

Authorship

This project is maintained by Licheng Yu.
