1. Vit: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
2. Setr Pytorch: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
4. pytorch-vit: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale