
HRNet / HRFormer

Licence: MIT license
This is an official implementation of our NeurIPS 2021 paper "HRFormer: High-Resolution Transformer for Dense Prediction".

Programming Languages: Python, Shell, CUDA, C, C++, Cython

Projects that are alternatives of or similar to HRFormer

SemanticSegmentation-Libtorch
Libtorch Examples
Stars: ✭ 38 (-89.36%)
Mutual labels:  vision, segmentation
Skin-Cancer-Segmentation
Classification and Segmentation with Mask-RCNN of Skin Cancer using ISIC dataset
Stars: ✭ 61 (-82.91%)
Mutual labels:  classification, segmentation
Awesome-Vision-Transformer-Collection
Variants of Vision Transformer and its downstream tasks
Stars: ✭ 124 (-65.27%)
Mutual labels:  segmentation, pose-estimation
android tflite
GPU Accelerated TensorFlow Lite applications on Android NDK. Higher accuracy face detection, Age and gender estimation, Human pose estimation, Artistic style transfer
Stars: ✭ 105 (-70.59%)
Mutual labels:  segmentation, pose-estimation
volkscv
A Python toolbox for computer vision research and project
Stars: ✭ 58 (-83.75%)
Mutual labels:  classification, segmentation
GaitGraph
Official repository for "GaitGraph: Graph Convolutional Network for Skeleton-Based Gait Recognition" (ICIP'21)
Stars: ✭ 68 (-80.95%)
Mutual labels:  pose-estimation, hrnet
verseagility
Ramp up your custom natural language processing (NLP) task, allowing you to bring your own data, use your preferred frameworks and bring models into production.
Stars: ✭ 23 (-93.56%)
Mutual labels:  transformer, classification
Medical Transformer
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"
Stars: ✭ 153 (-57.14%)
Mutual labels:  transformer, segmentation
Point2Sequence
Point2Sequence: Learning the Shape Representation of 3D Point Clouds with an Attention-based Sequence to Sequence Network
Stars: ✭ 34 (-90.48%)
Mutual labels:  classification, segmentation
TransPose
PyTorch Implementation for "TransPose: Keypoint localization via Transformer", ICCV 2021.
Stars: ✭ 250 (-29.97%)
Mutual labels:  transformer, pose-estimation
dd-ml-segmentation-benchmark
DroneDeploy Machine Learning Segmentation Benchmark
Stars: ✭ 179 (-49.86%)
Mutual labels:  vision, segmentation
FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
Stars: ✭ 204 (-42.86%)
Mutual labels:  transformer, vision
TokenLabeling
Pytorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
Stars: ✭ 385 (+7.84%)
Mutual labels:  transformer, vision
Awesome-Tensorflow2
Excellent extension packages and projects built on TensorFlow 2
Stars: ✭ 45 (-87.39%)
Mutual labels:  classification, segmentation
nested-transformer
Nested Hierarchical Transformer https://arxiv.org/pdf/2105.12723.pdf
Stars: ✭ 174 (-51.26%)
Mutual labels:  transformer, vision
Conformer
Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition
Stars: ✭ 345 (-3.36%)
Mutual labels:  transformer, classification
Setr Pytorch
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Stars: ✭ 96 (-73.11%)
Mutual labels:  transformer, segmentation
Nlp research
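NLP research: a TensorFlow-based NLP deep-learning project supporting four major tasks: text classification, sentence matching, sequence labeling, and text generation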
Stars: ✭ 141 (-60.5%)
Mutual labels:  transformer, classification
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-93.28%)
Mutual labels:  transformer, classification
mmrazor
OpenMMLab Model Compression Toolbox and Benchmark.
Stars: ✭ 644 (+80.39%)
Mutual labels:  classification, segmentation

HRFormer: High-Resolution Transformer for Dense Prediction, NeurIPS 2021

Introduction

This is the official implementation of the High-Resolution Transformer (HRFormer). We present HRFormer, which learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer, which produces low-resolution representations and has high memory and computational cost. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet), along with local-window self-attention, which performs self-attention over small non-overlapping image windows, to improve memory and computation efficiency. In addition, we introduce a convolution into the FFN to exchange information across the disconnected image windows. We demonstrate the effectiveness of HRFormer on human pose estimation and semantic segmentation tasks.
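
A minimal, stdlib-only sketch of the local-window idea, for illustration only: it partitions an H x W token map into non-overlapping window x window blocks and counts how many query-key scores global versus local-window self-attention would compute. Padding the map up to a multiple of the window size is an assumption of this sketch, and the function names are hypothetical, not part of this repository.

```python
import math


def window_partition(height, width, window):
    """Split an H x W grid of token positions into non-overlapping
    window x window blocks. The grid is padded up to a multiple of
    `window` (an assumption of this sketch); padded slots are None."""
    padded_h = math.ceil(height / window) * window
    padded_w = math.ceil(width / window) * window
    windows = []
    for top in range(0, padded_h, window):
        for left in range(0, padded_w, window):
            block = []
            for r in range(top, top + window):
                for c in range(left, left + window):
                    inside = r < height and c < width
                    block.append((r, c) if inside else None)
            windows.append(block)
    return windows


def attention_pairs(height, width, window=None):
    """Number of query-key scores computed by one self-attention layer.

    Global attention (window=None): every token attends to every token,
    i.e. (H*W)^2 scores. Local-window attention: each token attends only
    to the window*window tokens of its own block, padding included."""
    if window is None:
        n = height * width
        return n * n
    n_windows = math.ceil(height / window) * math.ceil(width / window)
    return n_windows * (window * window) ** 2


# Example: a 64x48 token map with 7x7 windows (sizes chosen for illustration).
print(attention_pairs(64, 48))     # 9437184
print(attention_pairs(64, 48, 7))  # 168070
```

Global cost grows quadratically with the number of tokens, while for a fixed window size the windowed cost grows only linearly with it, which is why local windows make high-resolution maps affordable.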

  • The HRFormer architecture:

  • The HRFormer Unit (trans. unit):

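
Local-window attention never lets a token see outside its own window, so the convolution in the FFN is what moves information between windows. The stdlib-only sketch below isolates just that step, assuming a 3x3 depth-wise kernel with stride 1 and zero padding; the attention, the FFN's other layers, and normalization are omitted, and all names are hypothetical.

```python
def depthwise_conv3x3(channels, kernels):
    """Depth-wise 3x3 convolution, stride 1, zero padding.

    channels: list of C feature maps, each an H x W list of lists.
    kernels:  list of C 3x3 kernels; channel c is filtered only by
              kernels[c], so channels never mix (the depth-wise property).
    Like deep-learning frameworks, this computes cross-correlation."""
    out = []
    for fmap, k in zip(channels, kernels):
        h, w = len(fmap), len(fmap[0])
        res = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if 0 <= y < h and 0 <= x < w:  # outside = zero padding
                            acc += k[di + 1][dj + 1] * fmap[y][x]
                res[i][j] = acc
        out.append(res)
    return out


def windows_with_signal(fmap, window):
    """Return the (row, col) ids of windows containing any non-zero value."""
    return {
        (i // window, j // window)
        for i, row in enumerate(fmap)
        for j, v in enumerate(row)
        if v != 0
    }


# A 4x4 map split into 2x2 windows; the only non-zero token sits at the
# bottom-right corner of window (0, 0).
fmap = [[0.0] * 4 for _ in range(4)]
fmap[1][1] = 1.0
ones = [[1.0] * 3 for _ in range(3)]
out = depthwise_conv3x3([fmap], [ones])[0]
print(sorted(windows_with_signal(fmap, 2)))  # [(0, 0)]
print(sorted(windows_with_signal(out, 2)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Window attention on its own would keep the signal confined to window (0, 0); a single 3x3 convolution already spreads it into the three neighbouring windows.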

Pose estimation

2d Human Pose Estimation

Results on COCO val2017, using a person detector with a human AP of 56.4 on COCO val2017

| Backbone | Input Size | AP | AP50 | AP75 | AP_M | AP_L | AR | ckpt | log | script |
|---|---|---|---|---|---|---|---|---|---|---|
| HRFormer-S | 256x192 | 74.0% | 90.2% | 81.2% | 70.4% | 80.7% | 79.4% | ckpt | log | script |
| HRFormer-S | 384x288 | 75.6% | 90.3% | 82.2% | 71.6% | 82.5% | 80.7% | ckpt | log | script |
| HRFormer-B | 256x192 | 75.6% | 90.8% | 82.8% | 71.7% | 82.6% | 80.8% | ckpt | log | script |
| HRFormer-B | 384x288 | 77.2% | 91.0% | 83.6% | 73.2% | 84.2% | 82.0% | ckpt | log | script |

Results on COCO test-dev, using a person detector with a human AP of 56.4 on COCO val2017

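| Backbone | Input Size | AP | AP50 | AP75 | AP_M | AP_L | AR | ckpt | log | script |
|---|---|---|---|---|---|---|---|---|---|---|
| HRFormer-S | 384x288 | 74.5% | 92.3% | 82.1% | 70.7% | 80.6% | 79.8% | ckpt | log | script |
| HRFormer-B | 384x288 | 76.2% | 92.7% | 83.8% | 72.5% | 82.3% | 81.2% | ckpt | log | script |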

The models are first pre-trained on the ImageNet-1K dataset and then fine-tuned on the COCO train2017 dataset.

Semantic segmentation

Cityscapes

Performance on the Cityscapes dataset. The models are trained with an input size of 512x1024 and tested with an input size of 1024x2048.

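| Method | Backbone | Window Size | Train Set | Test Set | Iterations | Batch Size | OHEM | mIoU | mIoU (Multi-Scale) | Log | ckpt | script |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OCRNet | HRFormer-S | 7x7 | Train | Val | 80000 | 8 | Yes | 80.0 | 81.0 | log | ckpt | script |
| OCRNet | HRFormer-B | 7x7 | Train | Val | 80000 | 8 | Yes | 81.4 | 82.0 | log | ckpt | script |
| OCRNet | HRFormer-B | 15x15 | Train | Val | 80000 | 8 | Yes | 81.9 | 82.6 | log | ckpt | script |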
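
To relate the Window Size column to the 512x1024 Cityscapes training crop, the arithmetic below counts windows per resolution branch. It assumes four parallel branches at strides 4, 8, 16 and 32 (an HRNet-style layout) and that each feature map is padded up to a multiple of the window size; both are assumptions of this sketch rather than details stated in this README, and the function name is hypothetical.

```python
import math


def windows_per_branch(height, width, window, strides=(4, 8, 16, 32)):
    """For each branch stride, return (feature_h, feature_w, num_windows),
    assuming (hypothetically) that feature maps are padded up to a
    multiple of `window` before partitioning."""
    rows = []
    for s in strides:
        fh, fw = math.ceil(height / s), math.ceil(width / s)
        n = math.ceil(fh / window) * math.ceil(fw / window)
        rows.append((fh, fw, n))
    return rows


# Cityscapes training crop of 512x1024 with the two window sizes in the table.
for win in (7, 15):
    print(win, windows_per_branch(512, 1024, win))
# 7 [(128, 256, 703), (64, 128, 190), (32, 64, 50), (16, 32, 15)]
# 15 [(128, 256, 162), (64, 128, 45), (32, 64, 15), (16, 32, 6)]
```

With 15x15 windows each query attends to 225 tokens instead of 49, so every window covers more context while the number of windows per branch drops.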

PASCAL-Context

The models are trained with an input size of 520x520 and tested at the original image size.

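| Method | Backbone | Window Size | Train Set | Test Set | Iterations | Batch Size | OHEM | mIoU | mIoU (Multi-Scale) | Log | ckpt | script |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OCRNet | HRFormer-S | 7x7 | Train | Val | 60000 | 16 | Yes | 53.8 | 54.6 | log | ckpt | script |
| OCRNet | HRFormer-B | 7x7 | Train | Val | 60000 | 16 | Yes | 56.3 | 57.1 | log | ckpt | script |
| OCRNet | HRFormer-B | 15x15 | Train | Val | 60000 | 16 | Yes | 57.6 | 58.5 | log | ckpt | script |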

COCO-Stuff

The models are trained with an input size of 520x520 and tested at the original image size.

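| Method | Backbone | Window Size | Train Set | Test Set | Iterations | Batch Size | OHEM | mIoU | mIoU (Multi-Scale) | Log | ckpt | script |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OCRNet | HRFormer-S | 7x7 | Train | Val | 60000 | 16 | Yes | 37.9 | 38.9 | log | ckpt | script |
| OCRNet | HRFormer-B | 7x7 | Train | Val | 60000 | 16 | Yes | 41.6 | 42.5 | log | ckpt | script |
| OCRNet | HRFormer-B | 15x15 | Train | Val | 60000 | 16 | Yes | 42.4 | 43.3 | log | ckpt | script |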

ADE20K

The models are trained with an input size of 520x520 and tested at the original image size. Results with window size 15x15 will be updated later.

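| Method | Backbone | Window Size | Train Set | Test Set | Iterations | Batch Size | OHEM | mIoU | mIoU (Multi-Scale) | Log | ckpt | script |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OCRNet | HRFormer-S | 7x7 | Train | Val | 150000 | 8 | Yes | 44.0 | 45.1 | log | ckpt | script |
| OCRNet | HRFormer-B | 7x7 | Train | Val | 150000 | 8 | Yes | 46.3 | 47.6 | log | ckpt | script |
| OCRNet | HRFormer-B | 13x13 | Train | Val | 150000 | 8 | Yes | 48.7 | 50.0 | log | ckpt | script |
| OCRNet | HRFormer-B | 15x15 | Train | Val | 150000 | 8 | Yes | - | - | - | - | - |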

Classification

Results on ImageNet-1K

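| Backbone | acc@1 | acc@5 | #params | FLOPs | ckpt | log | script |
|---|---|---|---|---|---|---|---|
| HRFormer-T | 78.6% | 94.2% | 8.0M | 1.83G | ckpt | log | script |
| HRFormer-S | 81.2% | 95.6% | 13.5M | 3.56G | ckpt | log | script |
| HRFormer-B | 82.8% | 96.3% | 50.3M | 13.71G | ckpt | log | script |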

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{YuanFHLZCW21,
  title={HRFormer: High-Resolution Transformer for Dense Prediction},
  author={Yuhui Yuan and Rao Fu and Lang Huang and Weihong Lin and Chao Zhang and Xilin Chen and Jingdong Wang},
  booktitle={NeurIPS},
  year={2021}
}

Acknowledgment

This project is built on Swin-Transformer, openseg.pytorch, and mmpose.

git diff-index HEAD
git subtree add -P pose <url to sub-repo> <sub-repo branch>