
rstrudel / segmenter

License: MIT
[ICCV2021] Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Programming Languages

Python: 139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to segmenter

Visual-Transformer-Paper-Summary
Summary of Transformer applications for computer vision tasks.
Stars: ✭ 51 (-88.98%)
Mutual labels:  transformer, segmentation
Setr Pytorch
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Stars: ✭ 96 (-79.27%)
Mutual labels:  transformer, segmentation
Hrnet Semantic Segmentation
The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919
Stars: ✭ 2,369 (+411.66%)
Mutual labels:  transformer, segmentation
Medical Transformer
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"
Stars: ✭ 153 (-66.95%)
Mutual labels:  transformer, segmentation
Pvt
Stars: ✭ 379 (-18.14%)
Mutual labels:  transformer, segmentation
HRFormer
This is an official implementation of our NeurIPS 2021 paper "HRFormer: High-Resolution Transformer for Dense Prediction".
Stars: ✭ 357 (-22.89%)
Mutual labels:  transformer, segmentation
segmentation-paper-reading-notes
segmentation paper reading notes
Stars: ✭ 39 (-91.58%)
Mutual labels:  segmentation
DeepFashion MRCNN
Fashion Item segmentation with Mask_RCNN
Stars: ✭ 29 (-93.74%)
Mutual labels:  segmentation
FCN-Segmentation-TensorFlow
FCN for Semantic Image Segmentation achieving 68.5 mIoU on PASCAL VOC
Stars: ✭ 34 (-92.66%)
Mutual labels:  segmentation
FragmentVC
Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention
Stars: ✭ 134 (-71.06%)
Mutual labels:  transformer
transformer
Neutron: a PyTorch-based implementation of the Transformer and its variants.
Stars: ✭ 60 (-87.04%)
Mutual labels:  transformer
php-hal
HAL+JSON & HAL+XML API transformer outputting valid (PSR-7) API Responses.
Stars: ✭ 30 (-93.52%)
Mutual labels:  transformer
Transformer tf2.0
A TensorFlow 2.0 implementation of the Transformer.
Stars: ✭ 23 (-95.03%)
Mutual labels:  transformer
DigiPathAI
Digital Pathology AI
Stars: ✭ 43 (-90.71%)
Mutual labels:  segmentation
unsupervised llamas
Code for https://unsupervised-llamas.com
Stars: ✭ 70 (-84.88%)
Mutual labels:  segmentation
Walk-Transformer
From Random Walks to Transformer for Learning Node Embeddings (ECML-PKDD 2020) (In Pytorch and Tensorflow)
Stars: ✭ 26 (-94.38%)
Mutual labels:  transformer
Image-Caption
Using an LSTM or a Transformer for image captioning in PyTorch
Stars: ✭ 36 (-92.22%)
Mutual labels:  transformer
mobilenet segmentation
Binary semantic segmentation with UNet based on MobileNetV2 encoder
Stars: ✭ 18 (-96.11%)
Mutual labels:  segmentation
Embedding
A summary of embedding model code and study notes.
Stars: ✭ 25 (-94.6%)
Mutual labels:  transformer
M3DETR
Code base for M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Stars: ✭ 47 (-89.85%)
Mutual labels:  transformer

Segmenter: Transformer for Semantic Segmentation

[Figure 1 from the paper]

Segmenter: Transformer for Semantic Segmentation by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and Cordelia Schmid, ICCV 2021.

*Equal Contribution

🔥 Segmenter is now available on MMSegmentation.

Installation

Define OS environment variables pointing to your checkpoint and dataset directories and add them to your .bashrc:

export DATASET=/path/to/dataset/dir

Install PyTorch 1.9, then run pip install . at the root of this repository.
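
To check that the environment is usable before downloading any data, here is a quick sanity check in Python (a minimal sketch; the exact CUDA setup depends on your machine):

import torch

# The repository targets PyTorch 1.9; also confirm GPU visibility.
print(torch.__version__)          # expect 1.9.x
print(torch.cuda.is_available())  # True if a CUDA GPU is visible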

To download ADE20K, use the following command:

python -m segm.scripts.prepare_ade20k $DATASET
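
Once the script finishes, a quick layout check can save a failed run later. The path below assumes the standard ADEChallengeData2016 layout under $DATASET/ade20k (an assumption; verify against the script's actual output):

import os
from pathlib import Path

# Assumed location of the extracted dataset (check the prepare script's output).
root = Path(os.environ["DATASET"]) / "ade20k" / "ADEChallengeData2016"
for split in ("training", "validation"):
    n_images = len(list((root / "images" / split).glob("*.jpg")))
    print(f"{split}: {n_images} images")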

Model Zoo

We release models with a Vision Transformer backbone initialized from the improved ViT models.

ADE20K

Segmenter models with ViT backbone:

Name          | mIoU (SS/MS) | # params | Resolution | FPS  | Download
Seg-T-Mask/16 | 38.1 / 38.8  | 7M       | 512x512    | 52.4 | model, config, log
Seg-S-Mask/16 | 45.3 / 46.9  | 27M      | 512x512    | 34.8 | model, config, log
Seg-B-Mask/16 | 48.5 / 50.0  | 106M     | 512x512    | 24.1 | model, config, log
Seg-B/8       | 49.5 / 50.5  | 89M      | 512x512    | 4.2  | model, config, log
Seg-L-Mask/16 | 51.8 / 53.6  | 334M     | 640x640    | -    | model, config, log

Segmenter models with DeiT backbone:

Name          | mIoU (SS/MS) | # params | Resolution | FPS  | Download
Seg-B/16      | 47.1 / 48.1  | 87M      | 512x512    | 27.3 | model, config, log
Seg-B-Mask/16 | 48.7 / 50.1  | 106M     | 512x512    | 24.1 | model, config, log
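
The released checkpoints are regular PyTorch files, so they can be inspected before being wired into a pipeline. The key names below (a "model" entry holding the state dict) are an assumption about the checkpoint format, not a documented interface:

import torch

# Load on CPU so no GPU is needed just to inspect the file.
ckpt = torch.load("seg_tiny_mask/checkpoint.pth", map_location="cpu")
print(list(ckpt.keys()))  # assumed to contain "model", optimizer state, etc.

# If a "model" entry holds the state dict, count parameters to cross-check
# the "# params" column of the tables above.
state_dict = ckpt.get("model", ckpt)
n_params = sum(t.numel() for t in state_dict.values())
print(f"{n_params / 1e6:.1f}M parameters")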

Pascal Context

Name          | mIoU (SS/MS) | # params | Resolution | FPS | Download
Seg-L-Mask/16 | 58.1 / 59.0  | 334M     | 480x480    | -   | model, config, log

Cityscapes

Name          | mIoU (SS/MS) | # params | Resolution | FPS | Download
Seg-L-Mask/16 | 79.1 / 81.3  | 322M     | 768x768    | -   | model, config, log

Inference

Download a checkpoint together with its configuration into a common folder, for example seg_tiny_mask.

You can generate segmentation maps from your own data with:

python -m segm.inference --model-path seg_tiny_mask/checkpoint.pth -i images/ -o segmaps/ 
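
To eyeball the results, the generated maps can be blended over the input images. Here is a minimal sketch assuming segmaps/ holds one color-coded PNG per input with matching file stems (an assumption about the inference script's output, not a documented format):

from pathlib import Path
from PIL import Image

# Overlay each predicted segmentation map on its source image.
for im_path in sorted(Path("images").glob("*.jpg")):
    seg_path = Path("segmaps") / f"{im_path.stem}.png"
    if not seg_path.exists():
        continue
    im = Image.open(im_path).convert("RGB")
    seg = Image.open(seg_path).convert("RGB").resize(im.size)
    Image.blend(im, seg, alpha=0.5).save(f"overlay_{im_path.stem}.png")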

To evaluate on ADE20K, run the command:

# single-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --singlescale
# multi-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --multiscale

Train

Train Seg-T-Mask/16 on ADE20K on a single GPU:

python -m segm.train --log-dir seg_tiny_mask --dataset ade20k \
  --backbone vit_tiny_patch16_384 --decoder mask_transformer

To train Seg-B-Mask/16, simply set vit_base_patch16_384 as the backbone and launch the above command using a minimum of 4 V100 GPUs (~12 minutes per epoch) and up to 8 V100 GPUs (~7 minutes per epoch). The code uses SLURM environment variables for distributed training.

Logs

To plot the logs of your experiments, you can use

python -m segm.utils.logs logs.yml

with logs.yml located in utils/ and listing the paths to your experiment logs:

root: /path/to/checkpoints/
logs:
  seg-t: seg_tiny_mask/log.txt
  seg-b: seg_base_mask/log.txt
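
If you track many runs, logs.yml can also be generated programmatically. Here is a small sketch using PyYAML (an extra dependency, not required by the repository):

import yaml  # pip install pyyaml

# One entry per experiment directory under the checkpoint root.
config = {
    "root": "/path/to/checkpoints/",
    "logs": {
        "seg-t": "seg_tiny_mask/log.txt",
        "seg-b": "seg_base_mask/log.txt",
    },
}
with open("logs.yml", "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False)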

Attention Maps

To visualize the attention maps for Seg-T-Mask/16 encoder layer 0 and patch (0, 21), you can use:

python -m segm.scripts.show_attn_map seg_tiny_mask/checkpoint.pth \
  images/im0.jpg output_dir/ --layer-id 0 --x-patch 0 --y-patch 21 --enc

Different options are provided to select the generated attention maps:

  • --enc or --dec: Select encoder or decoder attention maps respectively.
  • --patch or --cls: --patch generates attention maps for the patch with coordinates (x_patch, y_patch). --cls combined with --enc generates attention maps for the CLS token of the encoder. --cls combined with --dec generates maps for each class embedding of the decoder.
  • --x-patch and --y-patch: Coordinates of the patch to draw attention maps from. This flag is ignored when --cls is used.
  • --layer-id: Select the layer for which the attention maps are generated.

For example, to generate attention maps for the decoder class embeddings, you can use:

python -m segm.scripts.show_attn_map seg_tiny_mask/checkpoint.pth \
images/im0.jpg output_dir/ --layer-id 0 --dec --cls
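
The script writes its maps as image files in output_dir/, so several of them can be compared side by side. Here is a sketch assuming PNG outputs (matplotlib is an extra dependency):

from pathlib import Path

import matplotlib.pyplot as plt
from PIL import Image

# Tile the generated attention maps (assumed to be PNGs in output_dir/).
paths = sorted(Path("output_dir").glob("*.png"))
assert paths, "no attention maps found in output_dir/"
fig, axes = plt.subplots(1, len(paths), figsize=(3 * len(paths), 3), squeeze=False)
for ax, p in zip(axes[0], paths):
    ax.imshow(Image.open(p))
    ax.set_title(p.stem, fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.savefig("attn_grid.png", dpi=150)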

Attention maps for patch (0, 21) in Seg-L-Mask/16 encoder layers 1, 4, 8, 12 and 16:

[Figure: attention maps of patch x=8 and y=21 across encoder layers 1, 4, 8, 12 and 16]

Attention maps for the class embeddings in Seg-L-Mask/16 decoder layer 0:

[Figure: attention maps of cls tokens 7, 15, 18, 22, 36 and 57, Mask decoder layer 0]

Video Segmentation

Zero-shot video segmentation on the DAVIS video dataset with a Seg-B-Mask/16 model trained on ADE20K.

BibTex

@article{strudel2021,
  title={Segmenter: Transformer for Semantic Segmentation},
  author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
  journal={arXiv preprint arXiv:2105.05633},
  year={2021}
}

Acknowledgements

The Vision Transformer code is based on the timm library, and the semantic segmentation training and evaluation pipeline uses mmsegmentation.
