
hustvl / YOLOS

License: MIT
You Only Look at One Sequence (NeurIPS 2021)

You Only 👀 One Sequence

TL;DR: We study the transferability of the vanilla ViT pre-trained on mid-sized ImageNet-1k to the more challenging COCO object detection benchmark.

👨‍💻 This project is under active development 👩‍💻 :

  • May 4, 2022: 👀YOLOS is now available in 🤗HuggingFace Transformers!

  • Apr 8, 2022: If you like YOLOS, you might also like MIMDet (paper / code & models)! MIMDet can efficiently and effectively adapt a masked image modeling (MIM) pre-trained vanilla Vision Transformer (ViT) for high-performance object detection (51.5 box AP and 46.0 mask AP on COCO using ViT-Base & Mask R-CNN).

  • Oct 28, 2021: YOLOS receives an update for the NeurIPS 2021 camera-ready version. We add MoCo-v3 self-supervised pre-training results, study the impact of detaching [Det] tokens, and add a new Discussion section.

  • Sep 29, 2021: YOLOS is accepted to NeurIPS 2021!

  • Jun 22, 2021: We update our manuscript on arXiv including discussion about position embeddings and more visualizations, check it out!

  • Jun 9, 2021: We add a notebook to visualize the self-attention maps of [Det] tokens on different heads of the last layer, check it out!

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection

by Yuxin Fang¹ *, Bencheng Liao¹ *, Xinggang Wang¹ 📧, Jiemin Fang²,¹, Jiyang Qi¹, Rui Wu³, Jianwei Niu³, Wenyu Liu¹.

¹ School of EIC, HUST; ² Institute of AI, HUST; ³ Horizon Robotics.

(*) equal contribution, (📧) corresponding author.

arXiv technical report (arXiv 2106.00666)


You Only Look at One Sequence (YOLOS)

[Figure: The illustration of YOLOS]

Highlights

Directly inherited from ViT (DeiT), YOLOS is not designed to be yet another high-performance object detector, but to unveil the versatility and transferability of Transformer from image recognition to object detection. Concretely, our main contributions are summarized as follows:

  • We use the mid-sized ImageNet-1k as the sole pre-training dataset, and show that a vanilla ViT (DeiT) can be successfully transferred to perform the challenging object detection task and produce competitive COCO results with the fewest possible modifications, i.e., by only looking at one sequence (YOLOS).

  • We demonstrate that 2D object detection can be accomplished in a pure sequence-to-sequence manner by taking a sequence of fixed-sized, non-overlapping image patches as input. Among existing object detectors, YOLOS uses minimal 2D inductive biases. Moreover, YOLOS could feasibly perform object detection in a space of any dimensionality without knowing its exact spatial structure or geometry.

  • For ViT (DeiT), we find that the object detection results are quite sensitive to the pre-training scheme and that the detection performance is far from saturated. Therefore, YOLOS can also serve as a challenging benchmark task for evaluating different pre-training strategies for ViT (DeiT).

  • We also discuss the impacts as well as the limitations of prevalent pre-training schemes and model scaling strategies for Transformers in vision, as revealed by transferring them to object detection.
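To make the "one sequence" idea concrete, here is a minimal NumPy sketch of how an image can be turned into the token sequence a YOLOS-style encoder consumes. It is illustrative only, not the repository's implementation, and rests on stated assumptions: a 16×16 patch size, 100 [Det] tokens (the count used in the Visualization section), a DeiT-Ti-like width of 192, and random matrices standing in for the trained patch projection and [Det] embeddings. Position embeddings are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split a (C, H, W) image into a row-major sequence of flattened,
    non-overlapping patches with shape (num_patches, C * P * P)."""
    c, h, w = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (C, gh, P, gw, P) -> (gh, gw, C, P, P) -> (gh * gw, C * P * P)
    return (image.reshape(c, gh, patch_size, gw, patch_size)
                 .transpose(1, 3, 0, 2, 4)
                 .reshape(gh * gw, c * patch_size * patch_size))


def image_to_sequence(image: np.ndarray, patch_size: int = 16,
                      num_det_tokens: int = 100, embed_dim: int = 192,
                      seed: int = 0) -> np.ndarray:
    """Project patches to embed_dim and append [Det] tokens.

    Assumption: random matrices stand in for trained weights, and
    position embeddings are left out to keep the sketch short."""
    patches = patchify(image, patch_size)
    rng = np.random.default_rng(seed)
    proj = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
    patch_tokens = patches @ proj                                  # (N, D)
    det_tokens = rng.normal(scale=0.02, size=(num_det_tokens, embed_dim))
    return np.concatenate([patch_tokens, det_tokens], axis=0)      # (N + num_det, D)
```

For example, a 3×512×864 input yields 32 × 54 = 1728 patch tokens plus 100 [Det] tokens, i.e. a sequence of 1828 tokens; the detector's outputs are then read off the [Det] positions only.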

Results

| Model | Pre-train Epochs | ViT (DeiT) Weight / Log | Fine-tune Epochs | Eval Size | YOLOS Checkpoint / Log | AP @ COCO val |
|---|---|---|---|---|---|---|
| YOLOS-Ti | 300 | FB | 300 | 512 | Baidu Drive, Google Drive / Log | 28.7 |
| YOLOS-S | 200 | Baidu Drive, Google Drive / Log | 150 | 800 | Baidu Drive, Google Drive / Log | 36.1 |
| YOLOS-S | 300 | FB | 150 | 800 | Baidu Drive, Google Drive / Log | 36.1 |
| YOLOS-S (dWr) | 300 | Baidu Drive, Google Drive / Log | 150 | 800 | Baidu Drive, Google Drive / Log | 37.6 |
| YOLOS-B | 1000 | FB | 150 | 800 | Baidu Drive, Google Drive / Log | 42.0 |

Notes:

  • The access code for Baidu Drive is yolo.
  • FB denotes model weights provided by DeiT (paper, code). Thanks for their wonderful work.
  • We will release other models in the future, so please stay tuned :)

Requirements

This codebase has been developed with Python 3.6, PyTorch 1.5+, and torchvision 0.6+:

conda install -c pytorch pytorch torchvision

Install pycocotools (for evaluation on COCO) and scipy (for training):

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
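After installation, it can help to confirm that the installed PyTorch and torchvision meet the minimums above. The following is a small sketch, not part of the repository. It compares versions numerically, so that e.g. 1.10 counts as newer than 1.5 (a plain string comparison would get this wrong), and it uses importlib.import_module rather than importlib.metadata so it also runs on Python 3.6.

```python
import importlib
import re

# Minimum versions stated in the Requirements section.
MINIMUMS = {"torch": "1.5", "torchvision": "0.6"}


def version_tuple(version: str) -> tuple:
    """'1.10.0+cu113' -> (1, 10, 0); stops at the first non-numeric component."""
    parts = []
    for piece in str(version).split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() < len(piece):  # e.g. '0a0': keep 0, ignore the pre-release tag
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements() -> dict:
    """Map package name -> True/False (version check) or None (not importable)."""
    results = {}
    for name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        results[name] = meets_minimum(module.__version__, minimum)
    return results
```

Calling check_requirements() returns something like {"torch": True, "torchvision": True}; a None entry means the package could not be imported at all.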

Data preparation

Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:

path/to/coco/
  annotations/  # annotation json files
  train2017/    # train images
  val2017/      # val images
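A quick way to catch a misplaced download before launching a long job is to check the layout programmatically. This is a hypothetical helper, not part of the repository. The two annotation file names are an assumption: they are the standard COCO 2017 instance-annotation files that DETR-style loaders typically read.

```python
from pathlib import Path

# Assumed layout: the directories listed above plus the standard
# COCO 2017 instance-annotation file names.
EXPECTED = (
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
)


def missing_coco_entries(coco_path) -> list:
    """Return the expected entries that do not exist under coco_path."""
    root = Path(coco_path)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from missing_coco_entries("/path/to/coco") means the expected entries are all present; it does not validate the file contents.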

Training

Before fine-tuning on COCO, you need to download the ImageNet-pretrained model to the /path/to/YOLOS/ directory.

To train the YOLOS-Ti model in the paper, run this command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 2 \
    --lr 5e-5 \
    --epochs 300 \
    --backbone_name tiny \
    --pre_trained /path/to/deit-tiny.pth \
    --eval_size 512 \
    --init_pe_size 800 1333 \
    --output_dir /output/path/box_model

To train the YOLOS-S model with the 200-epoch pretrained DeiT-S in the paper, run this command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small \
    --pre_trained /path/to/deit-small-200epoch.pth \
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir /output/path/box_model

To train the YOLOS-S model with the 300-epoch pretrained DeiT-S in the paper, run this command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small \
    --pre_trained /path/to/deit-small-300epoch.pth \
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir /output/path/box_model

To train the YOLOS-S (dWr) model in the paper, run this command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name small_dWr \
    --pre_trained /path/to/deit-small-dWr-scale.pth \
    --eval_size 800 \
    --init_pe_size 512 864 \
    --mid_pe_size 512 864 \
    --output_dir /output/path/box_model

To train the YOLOS-B model in the paper, run this command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --coco_path /path/to/coco \
    --batch_size 1 \
    --lr 2.5e-5 \
    --epochs 150 \
    --backbone_name base \
    --pre_trained /path/to/deit-base.pth \
    --eval_size 800 \
    --init_pe_size 800 1344 \
    --mid_pe_size 800 1344 \
    --output_dir /output/path/box_model
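All of the commands above assume 8 GPUs, and the README does not say how to change --lr on a different setup. The listed values happen to keep the learning rate per image constant: YOLOS-Ti uses 5e-5 for 8 × 2 = 16 images per step, and the other models use 2.5e-5 for 8 × 1 = 8 images per step, so both give 3.125e-6. The sketch below applies the common linear-scaling heuristic to other GPU counts; treat it as an assumption rather than the authors' prescription, since the reported AP was obtained with the original settings.

```python
def total_batch_size(num_gpus: int, batch_per_gpu: int) -> int:
    """Images per optimizer step across all processes
    (--nproc_per_node x --batch_size)."""
    return num_gpus * batch_per_gpu


def linearly_scaled_lr(ref_lr: float, ref_total_batch: int,
                       new_total_batch: int) -> float:
    """Linear-scaling heuristic: keep lr / total batch size constant.
    Assumption only; not a rule documented by this repository."""
    return ref_lr * new_total_batch / ref_total_batch


# Reference settings taken from the commands above (8 GPUs each).
TINY_REF = (5e-5, total_batch_size(8, 2))     # YOLOS-Ti: 16 images/step
SMALL_REF = (2.5e-5, total_batch_size(8, 1))  # YOLOS-S / YOLOS-B: 8 images/step
```

For example, training YOLOS-S on 4 GPUs with --batch_size 1 gives linearly_scaled_lr(*SMALL_REF, total_batch_size(4, 1)) = 1.25e-5.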

Evaluation

To evaluate the YOLOS-Ti model on COCO, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 2 --backbone_name tiny --eval --eval_size 512 --init_pe_size 800 1333 --resume /path/to/YOLOS-Ti

To evaluate the YOLOS-S model on COCO, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/YOLOS-S

To evaluate the YOLOS-S (dWr) model on COCO, run the following (the checkpoint path is quoted so the shell does not interpret the parentheses):

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small_dWr --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume '/path/to/YOLOS-S(dWr)'

To evaluate the YOLOS-B model on COCO, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume /path/to/YOLOS-B
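The four evaluation commands differ only in a few per-variant flags, and it is easy to drop one when retyping them. The sketch below is a hypothetical helper, not part of the repository: it collects those flags from the commands above and builds the argument list. Passing the list to subprocess.run (without shell=True) sidesteps quoting problems such as the parentheses in /path/to/YOLOS-S(dWr).

```python
import shlex

# Per-variant flags copied from the evaluation commands above.
VARIANTS = {
    "YOLOS-Ti": {"backbone": "tiny", "batch": 2, "eval_size": 512,
                 "init_pe": (800, 1333), "mid_pe": None},
    "YOLOS-S": {"backbone": "small", "batch": 1, "eval_size": 800,
                "init_pe": (512, 864), "mid_pe": (512, 864)},
    "YOLOS-S (dWr)": {"backbone": "small_dWr", "batch": 1, "eval_size": 800,
                      "init_pe": (512, 864), "mid_pe": (512, 864)},
    "YOLOS-B": {"backbone": "base", "batch": 1, "eval_size": 800,
                "init_pe": (800, 1344), "mid_pe": (800, 1344)},
}


def eval_argv(variant: str, coco_path: str, checkpoint: str, nproc: int = 8) -> list:
    """Build the argv list for evaluating one YOLOS variant on COCO."""
    cfg = VARIANTS[variant]
    argv = ["python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={nproc}", "--use_env", "main.py",
            "--coco_path", coco_path,
            "--batch_size", str(cfg["batch"]),
            "--backbone_name", cfg["backbone"],
            "--eval", "--eval_size", str(cfg["eval_size"]),
            "--init_pe_size", *map(str, cfg["init_pe"])]
    if cfg["mid_pe"] is not None:  # YOLOS-Ti's command has no --mid_pe_size
        argv += ["--mid_pe_size", *map(str, cfg["mid_pe"])]
    argv += ["--resume", checkpoint]
    return argv


if __name__ == "__main__":
    # Print a shell-safe command line; shlex.quote wraps the path in quotes.
    argv = eval_argv("YOLOS-S (dWr)", "/path/to/coco", "/path/to/YOLOS-S(dWr)")
    print(" ".join(shlex.quote(a) for a in argv))
```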

Visualization

  • Visualize box predictions and the distribution of object categories:
  1. To get the visualizations in the paper, you need the YOLOS models fine-tuned on COCO. Run the following command to get the predictions of the 100 Det-Toks on the COCO val split; it will generate /path/to/YOLOS/visualization/modelname-eval-800-eval-pred.json
python cocoval_predjson_generation.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/yolos-s-model.pth --output_dir ./visualization
  2. To get the ground-truth object categories of all images in the COCO val split, run the following command to generate /path/to/YOLOS/visualization/coco-valsplit-cls-dist.json
python cocoval_gtclsjson_generation.py --coco_path /path/to/coco --batch_size 1 --output_dir ./visualization
  3. To visualize the distribution of the Det-Toks' bboxes and categories, run the following command to generate .png files in /path/to/YOLOS/visualization/
python visualize_dettoken_dist.py --visjson /path/to/YOLOS/visualization/modelname-eval-800-eval-pred.json --cococlsjson /path/to/YOLOS/visualization/coco-valsplit-cls-dist.json
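Step 2 summarizes the ground-truth categories of the val split. As a rough cross-check, here is a minimal sketch that counts ground-truth boxes per category from a COCO-format annotation dict. It is not the repository's cocoval_gtclsjson_generation.py, whose output format is not documented here; it relies only on the standard COCO fields categories (id, name) and annotations (category_id), and it counts every annotation, including crowd regions.

```python
from collections import Counter


def category_distribution(coco: dict) -> dict:
    """Count ground-truth boxes per category name, most frequent first."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    return {id_to_name[cid]: n for cid, n in counts.most_common()}
```

For the real data, load /path/to/coco/annotations/instances_val2017.json (standard COCO 2017 file name, assumed here) with json.load and pass the resulting dict in.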

[Figure: example visualizations for Det-Tok-41 and Det-Tok-96]

Acknowledgement ❤️

This project is based on DETR (paper, code), DeiT (paper, code), DINO (paper, code) and timm. Thanks for their wonderful works.

Citation

If you find our paper and code useful in your research, please consider giving a star and citation 📝 :

@article{YOLOS,
  title={You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection},
  author={Fang, Yuxin and Liao, Bencheng and Wang, Xinggang and Fang, Jiemin and Qi, Jiyang and Wu, Rui and Niu, Jianwei and Liu, Wenyu},
  journal={arXiv preprint arXiv:2106.00666},
  year={2021}
}