
lorenmt / Mtan

Licence: MIT
The implementation of "End-to-End Multi-Task Learning with Attention" [CVPR 2019].

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Mtan

Generative Inpainting Pytorch
A PyTorch reimplementation of the paper Generative Image Inpainting with Contextual Attention (https://arxiv.org/abs/1801.07892)
Stars: ✭ 242 (-33.52%)
Mutual labels:  attention-model
learningspoons
NLP lecture notes and source code
Stars: ✭ 29 (-92.03%)
Mutual labels:  attention-model
attention-mechanism-keras
attention mechanism in keras, like Dense and RNN...
Stars: ✭ 19 (-94.78%)
Mutual labels:  attention-model
Sinet
Camouflaged Object Detection, CVPR 2020 (Oral & Reported by the New Scientist Magazine)
Stars: ✭ 246 (-32.42%)
Mutual labels:  attention-model
SANET
"Arbitrary Style Transfer with Style-Attentional Networks" (CVPR 2019)
Stars: ✭ 21 (-94.23%)
Mutual labels:  attention-model
HHH-An-Online-Question-Answering-System-for-Medical-Questions
HBAM: Hierarchical Bi-directional Word Attention Model
Stars: ✭ 44 (-87.91%)
Mutual labels:  attention-model
Generative inpainting
DeepFill v1/v2 with Contextual Attention and Gated Convolution, CVPR 2018, and ICCV 2019 Oral
Stars: ✭ 2,659 (+630.49%)
Mutual labels:  attention-model
Attention ocr.pytorch
This repository implements an encoder-decoder model with attention for OCR
Stars: ✭ 278 (-23.63%)
Mutual labels:  attention-model
GATE
The implementation of "Gated Attentive-Autoencoder for Content-Aware Recommendation"
Stars: ✭ 65 (-82.14%)
Mutual labels:  attention-model
PBAN-PyTorch
A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis, PyTorch implementation.
Stars: ✭ 33 (-90.93%)
Mutual labels:  attention-model
swin-transformer-pytorch
Implementation of the Swin Transformer in PyTorch.
Stars: ✭ 610 (+67.58%)
Mutual labels:  attention-model
reasoning attention
Unofficial implementation algorithms of attention models on SNLI dataset
Stars: ✭ 34 (-90.66%)
Mutual labels:  attention-model
Recognizing-Textual-Entailment
A PyTorch implementation of models used for Recognizing Textual Entailment using the SNLI corpus
Stars: ✭ 31 (-91.48%)
Mutual labels:  attention-model
Attentionalpoolingaction
Code/Model release for NIPS 2017 paper "Attentional Pooling for Action Recognition"
Stars: ✭ 248 (-31.87%)
Mutual labels:  attention-model
Caver
Caver: a toolkit for multilabel text classification.
Stars: ✭ 38 (-89.56%)
Mutual labels:  attention-model
Pytorch Batch Attention Seq2seq
PyTorch implementation of batched bi-RNN encoder and attention-decoder.
Stars: ✭ 245 (-32.69%)
Mutual labels:  attention-model
Compact-Global-Descriptor
Pytorch implementation of "Compact Global Descriptor for Neural Networks" (CGD).
Stars: ✭ 22 (-93.96%)
Mutual labels:  attention-model
Attentiongan
AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation
Stars: ✭ 341 (-6.32%)
Mutual labels:  attention-model
SAE-NAD
The implementation of "Point-of-Interest Recommendation: Exploiting Self-Attentive Autoencoders with Neighbor-Aware Influence"
Stars: ✭ 48 (-86.81%)
Mutual labels:  attention-model
Attention
Repository for Attention Algorithm
Stars: ✭ 39 (-89.29%)
Mutual labels:  attention-model

Multi-Task Attention Network (MTAN)

This repository contains the source code of Multi-Task Attention Network (MTAN) and baselines from the paper, End-to-End Multi-Task Learning with Attention, introduced by Shikun Liu, Edward Johns, and Andrew Davison.

Experiments

Image-to-Image Predictions (One-to-Many)

Under the folder im2im_pred, we have provided our proposed network along with all the baselines on the NYUv2 dataset presented in the paper. All models are written in PyTorch, and the implementation has been updated to PyTorch version 1.5 in the latest commit.
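
The core idea of MTAN is a single shared backbone plus one small attention module per task: each module predicts a soft (sigmoid) mask that gates the shared features so the task attends to the features it finds useful. The following is only a minimal, illustrative sketch of that idea, not the module defined in model_segnet_mtan.py; the class name TaskAttention, the layer sizes, and the exact two-convolution layout are assumptions made for the example.

    import torch
    import torch.nn as nn

    class TaskAttention(nn.Module):
        """Illustrative task-specific attention block (simplified).

        Produces a per-pixel, per-channel soft mask in [0, 1] from the shared
        features and gates those features element-wise for one task.
        """
        def __init__(self, channels):
            super().__init__()
            self.mask = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.BatchNorm2d(channels),
                nn.Sigmoid(),                              # soft attention mask
            )

        def forward(self, shared_feat):
            return self.mask(shared_feat) * shared_feat    # element-wise gating

    # example: a task-specific view of a batch of shared feature maps
    attention = TaskAttention(channels=64)
    task_features = attention(torch.randn(2, 64, 72, 96))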

Download our pre-processed NYUv2 dataset here, which we evaluated on in the paper. We use the pre-computed ground-truth normals from here. The raw 13-class NYUv2 dataset can be downloaded directly from this repo, with segmentation labels defined in this repo.

I am sorry that I am not able to provide the raw pre-processing code due to an unexpected computer crash.

Update - Jun 2019: I have now released the pre-processed CityScapes dataset with 2-, 7-, and 19-class semantic labels (see the paper for more details) and (inverse) depth labels. Download the [256x512, 2.42GB] version here and the [128x256, 651MB] version here.

Update - Oct 2019: For PyTorch 1.2 users: the mIoU evaluation method has been updated to avoid the "zero issue" when computing binary masks. Also, to run the code correctly, please move scheduler.step() to after the call to optimizer.step(), e.g. one line before the final performance-printing step, to fit the updated PyTorch requirements. See the official PyTorch documentation here for more details. [We have fixed this in the latest commit.]
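
As a minimal, generic PyTorch sketch of that ordering (not the repository's actual training loop; the model, data, and scheduler settings below are placeholders), the scheduler is stepped once per epoch, after the optimiser:

    import torch

    model = torch.nn.Linear(10, 1)                 # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

    for epoch in range(200):
        for x, y in [(torch.randn(4, 10), torch.randn(4, 1))]:   # dummy one-batch "loader"
            optimizer.zero_grad()
            loss = torch.nn.functional.l1_loss(model(x), y)
            loss.backward()
            optimizer.step()       # update the parameters first ...
        scheduler.step()           # ... then step the LR schedule (PyTorch >= 1.1 ordering)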

Update - May 2020: We have now provided our official MTAN-DeepLabv3 (ResNet-like) design to support more complicated and modern multi-task network backbones. Please check out im2im_pred/model_resnet_mtan for more details. This model can easily be swapped into any training template defined in im2im_pred.

Update - July 2020: We have further improved readability and updated all implementations in im2im_pred to comply with the latest PyTorch version, 1.5. We fixed a bug to exclude undefined pixel predictions, giving a more accurate mean IoU computation for the semantic segmentation task. We have also provided an option to apply data augmentation on NYUv2 to avoid over-fitting and achieve better performance.

Update - Nov 2020 [IMPORTANT!]: We have updated the mIoU and Pixel Accuracy formulas to be consistent with the standard benchmark from the official COCO segmentation scripts. The mIoU for all methods is now expected to improve by approximately 8%. The new formulas compute mIoU and Pixel Accuracy from pixel predictions accumulated across all images, whereas the original formulas averaged the per-image results.
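
To make the difference concrete, here is a rough sketch, not the repository's evaluation code, comparing the two formulas on random toy label maps; the helper names and the ignore index of -1 are assumptions made for the example:

    import numpy as np

    def confusion(pred, gt, num_classes, ignore_index=-1):
        # per-image confusion matrix, skipping undefined (ignored) pixels
        valid = gt != ignore_index
        idx = num_classes * gt[valid] + pred[valid]
        return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    def miou(conf):
        inter = np.diag(conf).astype(float)
        union = conf.sum(0) + conf.sum(1) - inter
        with np.errstate(invalid="ignore"):
            iou = inter / union                    # NaN for classes absent from both pred and gt
        return np.nanmean(iou)

    # toy data: 3 images, 13 classes, -1 marks undefined pixels in the ground truth
    rng = np.random.default_rng(0)
    images = [(rng.integers(0, 13, (288, 384)), rng.integers(-1, 13, (288, 384)))
              for _ in range(3)]

    per_image, total = [], np.zeros((13, 13), dtype=np.int64)
    for pred, gt in images:
        c = confusion(pred, gt, num_classes=13)
        per_image.append(miou(c))                  # original formula: mIoU per image, then averaged
        total += c                                 # new formula: accumulate over all images ...
    print("per-image average:", np.mean(per_image))
    print("accumulated      :", miou(total))       # ... then compute mIoU once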

All models (files) built with SegNet (proposed in the original paper) are described in the following table:

File Name | Type | Flags | Comments
model_segnet_single.py | Single | task, dataroot | standard single-task learning
model_segnet_stan.py | Single | task, dataroot | our approach applied to one task only
model_segnet_split.py | Multi | weight, dataroot, temp, type | multi-task learning baseline in which the shared network splits at the last layer (also known as hard parameter sharing)
model_segnet_dense.py | Multi | weight, dataroot, temp | multi-task learning baseline in which each task has its own parameter space (also known as soft parameter sharing)
model_segnet_cross.py | Multi | weight, dataroot, temp | our implementation of the Cross-Stitch Network
model_segnet_mtan.py | Multi | weight, dataroot, temp | our approach

Each flag is described below:

Flag Name | Usage | Comments
task | pick one task to train: semantic (semantic segmentation, depth-wise cross-entropy loss), depth (depth estimation, L1 norm loss), or normal (surface normal prediction, cosine-similarity loss) | only available in single-task learning
dataroot | directory root for the NYUv2 dataset | just put it under the folder im2im_pred to avoid any concerns :D
weight | weighting options for multi-task learning: equal (direct summation of all task losses), DWA (our proposal), uncert (our implementation of the Weight Uncertainty Method) | only available in multi-task learning
temp | hyper-parameter temperature for the DWA weighting option, determining the softness of task weighting (see the sketch after this table) |
type | different versions of the multi-task baseline split: standard, deep, wide | only available in the baseline split
apply_augmentation | toggle on to apply data augmentation in NYUv2 to avoid over-fitting | available in all training models
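
As a rough illustration of how the temp flag enters DWA (Dynamic Weight Average), the sketch below follows the weighting rule described in the paper: each task weight is a softmax over the ratio of its two most recent epoch losses, scaled by the temperature T, so a larger T gives more even weights. The function and variable names are chosen for this example and do not match the repository code.

    import numpy as np

    def dwa_weights(avg_losses, T=2.0):
        # avg_losses: per-epoch average loss of each task, shape [num_epochs][num_tasks]
        K = len(avg_losses[-1])                     # number of tasks
        if len(avg_losses) < 2:                     # first epochs: equal weighting
            return np.ones(K)
        ratio = np.array(avg_losses[-1]) / np.array(avg_losses[-2])   # w_k = L_k(t-1) / L_k(t-2)
        exp = np.exp(ratio / T)                     # temperature controls the softness
        return K * exp / exp.sum()                  # weights sum to the number of tasks

    # hypothetical loss history for three tasks over three epochs
    history = [[1.00, 2.00, 0.50],
               [0.90, 1.90, 0.51],
               [0.85, 1.85, 0.52]]
    print(dwa_weights(history, T=2.0))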

To run any model, cd im2im_pred/ and simply run python MODEL_NAME.py --FLAG_NAME 'FLAG_OPTION' (the default option trains without augmentation). Toggle on the apply_augmentation flag to train with data augmentation: python MODEL_NAME.py --FLAG_NAME 'FLAG_OPTION' --apply_augmentation.

Please note that we did not apply any data augmentation in the original paper.

Benchmarking Multi-task Learning

Benchmarking multi-task learning is always tricky, since the performance and evaluation method for each task are different. In the original paper, I simply averaged the performance of each task over the last 10 epochs, assuming we do not have access to the validation data.

For a more standardized and fair comparison, I would suggest that researchers adopt the evaluation method defined in Section 5, Equation 4 of this paper, which computes the average relative task improvement over single-task learning.
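
For reference, a common form of that metric, assuming it matches Equation 4 of the cited paper, averages the signed relative change of each task metric over its single-task counterpart, with the sign flipped for metrics where lower is better. The snippet below uses hypothetical numbers purely for illustration:

    # Average relative multi-task improvement over single-task learning (illustrative).
    def delta_mtl(mtl, stl, lower_better):
        terms = [(-1.0 if low else 1.0) * (m - s) / s
                 for m, s, low in zip(mtl, stl, lower_better)]
        return 100.0 * sum(terms) / len(terms)      # in percent

    # hypothetical metrics: [mIoU, depth abs. error, normal mean angle error]
    print(delta_mtl(mtl=[0.40, 0.55, 22.0],
                    stl=[0.38, 0.60, 23.0],
                    lower_better=[False, True, True]))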

NYUv2 can easily be over-fitted due to its small sample size. In the July update, we provided an option to apply data augmentation to alleviate this over-fitting issue (thanks to Jialong's help). We highly recommend benchmarking the NYUv2 dataset with this data augmentation, to be consistent with other SOTA multi-task learning methods that use the same augmentation technique, such as PAD-Net and MTI-Net.
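
As a sketch of what such augmentation typically involves (not necessarily the exact transform used in the repository), the same random horizontal flip must be applied to the image and to every label map so they stay aligned, and the x component of the surface normals must be negated when flipping:

    import torch

    def random_flip(image, semantic, depth, normal, p=0.5):
        # Flip the image and all label maps together along the width axis;
        # mirroring also reverses the x direction, so the x component of the
        # surface normals is negated to keep them geometrically consistent.
        if torch.rand(1).item() < p:
            image = torch.flip(image, dims=[-1])
            semantic = torch.flip(semantic, dims=[-1])
            depth = torch.flip(depth, dims=[-1])
            normal = torch.flip(normal, dims=[-1])
            normal[0] = -normal[0]
        return image, semantic, depth, normal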

Visual Decathlon Challenge (Many-to-Many)

We also provide source code for the Visual Decathlon Challenge, for which we build MTAN on top of a Wide Residual Network, based on the implementation here.

To run the code, please follow the steps below.

  1. Download the dataset and devkit at the official Visual Decathlon Challenge website here. Move the dataset folder decathlon-1.0-data under the folder visual_decathlon. Then move decathlon_mean_std.pickle into the dataset folder decathlon-1.0-data.

  2. Create a directory under the test folder for each dataset, and move all test files into that directory. (This is to comply with the PyTorch dataloader format.)

  3. Install setup.py from the Decathlon devkit under the code/coco/PythonAPI folder, and then move pycocotools and annotations from the devkit into the visual_decathlon folder.

  4. cd visual_decathlon and run python model_wrn_mtan.py --gpu [GPU_ID] --mode [eval or all] for training. eval evaluates on the validation dataset (normally for debugging or hyper-parameter tuning), and all trains on all datasets (normally for final evaluation or benchmarking).

  5. Run python model_wrn_eval.py --dataset 'imagenet' and --dataset 'notimagenet' (sequentially) to evaluate on ImageNet and the other datasets. Finally, run python coco_results.py to convert the results into COCO format for online evaluation.

Other Notices

  1. The provided code is highly optimised for readability. If you find any unusual behaviour, please post an issue or contact me directly via the email below.
  2. Training the provided code will give different performance (depending on the type of task) from the numbers reported in the paper for image-to-image prediction tasks, but the rankings stay the same. If you want to compare with any models from the paper on image-to-image prediction tasks, please re-run the models yourself with your preferred training strategies (learning rate, optimiser, etc.) and keep all training strategies consistent to ensure fairness. To compare results on the Visual Decathlon Challenge, you may directly use the results presented in the paper. For a fair comparison in your research, please build your multi-task network with the same backbone architecture.
  3. From my personal experience, designing a better architecture is usually more helpful (and easier) than finding a better task weighting in multi-task learning.

Citation

If you found this code/work useful in your own research, please consider citing the following:

@inproceedings{liu2019end,
  title={End-to-End Multi-task Learning with Attention},
  author={Liu, Shikun and Johns, Edward and Davison, Andrew J},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={1871--1880},
  year={2019}
}

Acknowledgement

We would like to thank Simon Vandenhende for his help with the MTAN-DeepLabv3 design, and Jialong Wu for his generous contributions to benchmarking MTAN-DeepLabv3 and implementing data augmentation for the NYUv2 dataset.

Contact

If you have any questions, please contact [email protected].
