microsoft / SimMIM

Licence: MIT license
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

SimMIM

By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*.

This repo is the official implementation of "SimMIM: A Simple Framework for Masked Image Modeling".

Updates

09/29/2022

SimMIM was merged into the Swin Transformer repo on GitHub.

03/02/2022

SimMIM was accepted to CVPR 2022. SimMIM was used in "Swin Transformer V2" to alleviate the data-hungry problem in large-scale vision model training.

12/09/2021

Initial commits:

  1. Pre-trained and fine-tuned models on ImageNet-1K (Swin Base, Swin Large, and ViT Base) are provided.
  2. Code for ImageNet-1K pre-training and fine-tuning is provided.

Introduction

SimMIM was initially described on arXiv as a simple framework for masked image modeling. Through systematic study, we find that a simple design for each component yields very strong representation learning performance: 1) random masking of the input image with a moderately large masked patch size (e.g., 32) makes a strong pretext task; 2) predicting raw RGB pixel values by direct regression performs no worse than patch classification approaches with complex designs; 3) the prediction head can be as light as a linear layer, with no worse performance than heavier ones.
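
The three design choices above can be illustrated with a small, dependency-free sketch. It is not the repository's code: the function names, the 60% mask ratio, the single-channel image, and the choice of an L1 loss are assumptions made for illustration. Only the masked patch size of 32 and the idea of regressing raw pixel values come from the text.

```python
import math
import random


def random_patch_mask(input_size=192, mask_patch_size=32, mask_ratio=0.6, seed=None):
    """Return a 2D 0/1 grid with one entry per mask patch (1 = masked)."""
    if input_size % mask_patch_size != 0:
        raise ValueError("input_size must be divisible by mask_patch_size")
    side = input_size // mask_patch_size        # e.g. 192 / 32 = 6 patches per side
    total = side * side
    num_masked = math.ceil(total * mask_ratio)  # assumption: 60% of patches hidden
    rng = random.Random(seed)
    masked = set(rng.sample(range(total), num_masked))
    return [[1 if r * side + c in masked else 0 for c in range(side)]
            for r in range(side)]


def upsample_mask(mask, factor):
    """Repeat every mask entry `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)]
            for row in mask for _ in range(factor)]


def masked_l1_loss(target, prediction, pixel_mask):
    """Mean absolute error over masked pixels only (single channel for brevity)."""
    total, count = 0.0, 0
    for t_row, p_row, m_row in zip(target, prediction, pixel_mask):
        for t, p, m in zip(t_row, p_row, m_row):
            if m:
                total += abs(t - p)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    patch_mask = random_patch_mask(seed=0)
    pixel_mask = upsample_mask(patch_mask, 32)  # one entry per pixel: 192 x 192
    print(sum(map(sum, patch_mask)), "of", len(patch_mask) ** 2, "patches masked")
```

Computing the loss only where the mask is 1 means the model is scored on reconstructing hidden content, not on copying visible pixels.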

Main Results on ImageNet

Swin Transformer

ImageNet-1K Pre-trained and Fine-tuned Models

| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 (%) | pre-trained model | fine-tuned model |
| --- | --- | --- | --- | --- | --- | --- |
| Swin-Base | 100 | 192x192 | 192x192 | 82.8 | google/config | google/config |
| Swin-Base | 100 | 192x192 | 224x224 | 83.5 | google/config | google/config |
| Swin-Base | 800 | 192x192 | 224x224 | 84.0 | google/config | google/config |
| Swin-Large | 800 | 192x192 | 224x224 | 85.4 | google/config | google/config |
| SwinV2-Huge | 800 | 192x192 | 224x224 | 85.7 | / | / |
| SwinV2-Huge | 800 | 192x192 | 512x512 | 87.1 | / | / |

Vision Transformer

ImageNet-1K Pre-trained and Fine-tuned Models

| name | pre-train epochs | pre-train resolution | fine-tune resolution | acc@1 (%) | pre-trained model | fine-tuned model |
| --- | --- | --- | --- | --- | --- | --- |
| ViT-Base | 800 | 224x224 | 224x224 | 83.8 | google/config | google/config |
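
The acc@1 column in the tables above is top-1 accuracy: the percentage of validation images whose highest-scoring class equals the ground-truth label. A minimal sketch of that metric, using hypothetical names (`top1_accuracy`, `scores`, `labels`) and not the repository's evaluation code:

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose arg-max score matches the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return 0.0
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy example: 4 samples, 3 classes; the third sample is misclassified.
    scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
    labels = [1, 0, 1, 1]
    print(top1_accuracy(scores, labels))
```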

Citing SimMIM

@inproceedings{xie2021simmim,
  title={SimMIM: A Simple Framework for Masked Image Modeling},
  author={Xie, Zhenda and Zhang, Zheng and Cao, Yue and Lin, Yutong and Bao, Jianmin and Yao, Zhuliang and Dai, Qi and Hu, Han},
  booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Getting Started

Installation

  • Install CUDA 11.3 with cuDNN 8 following the official installation guide of CUDA and cuDNN.

  • Set up a conda environment:

# Create environment
conda create -n SimMIM python=3.8 -y
conda activate SimMIM

# Install requirements
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

# Install apex
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..

# Clone SimMIM
git clone https://github.com/microsoft/SimMIM
cd SimMIM

# Install other requirements
pip install -r requirements.txt
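
After installation, a quick sanity check can confirm that the key packages are importable before launching a job. The helper below is a hypothetical convenience script, not part of the repository; it assumes the package names `torch`, `torchvision`, and `apex` as installed by the commands above.

```python
import importlib.util


def check_environment(packages=("torch", "torchvision", "apex")):
    """Map each package name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    status = check_environment()
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    if status["torch"]:
        import torch
        # Reports the installed PyTorch version and whether a CUDA device is usable.
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```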

Evaluating provided models

To evaluate a provided model on the ImageNet validation set, run:

python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \
--eval --cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>

For example, to evaluate the Swin Base model on a single GPU, run:

python -m torch.distributed.launch --nproc_per_node 1 main_finetune.py \
--eval --cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml --resume simmim_finetune__swin_base__img224_window7__800ep.pth --data-path <imagenet-path>

Pre-training with SimMIM

To pre-train models with SimMIM, run:

python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_simmim.py \
--cfg <config-file> --data-path <imagenet-path>/train [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

For example, to pre-train Swin Base for 800 epochs on one DGX-2 server, run:

python -m torch.distributed.launch --nproc_per_node 16 main_simmim.py \
--cfg configs/swin_base__800ep/simmim_pretrain__swin_base__img192_window6__800ep.yaml --batch-size 128 --data-path <imagenet-path>/train [--output <output-directory> --tag <job-tag>]
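
As a worked check of the example above: 16 processes with a per-GPU batch size of 128 give a global batch of 2048 images per step. The sketch below also estimates steps per epoch. It assumes the standard ImageNet-1K training split of 1,281,167 images and that the last incomplete batch is dropped; neither is stated in this README, and the function names are hypothetical.

```python
IMAGENET_1K_TRAIN_IMAGES = 1_281_167  # assumption: standard ILSVRC-2012 train split


def global_batch_size(num_gpus, batch_size_per_gpu):
    """Images processed per optimizer step across all GPUs."""
    return num_gpus * batch_size_per_gpu


def steps_per_epoch(num_images, num_gpus, batch_size_per_gpu):
    """Full batches per epoch (assumption: the incomplete final batch is dropped)."""
    return num_images // global_batch_size(num_gpus, batch_size_per_gpu)


if __name__ == "__main__":
    print(global_batch_size(16, 128))                            # 2048
    print(steps_per_epoch(IMAGENET_1K_TRAIN_IMAGES, 16, 128))    # 625
```

When running on fewer GPUs, the same arithmetic shows how far the global batch size moves from the DGX-2 example, which may matter for learning-rate settings.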

Fine-tuning pre-trained models

To fine-tune models pre-trained by SimMIM, run:

python -m torch.distributed.launch --nproc_per_node <num-of-gpus-to-use> main_finetune.py \
--cfg <config-file> --data-path <imagenet-path> --pretrained <pretrained-ckpt> [--batch-size <batch-size-per-gpu> --output <output-directory> --tag <job-tag>]

For example, to fine-tune Swin Base pre-trained by SimMIM on one DGX-2 server, run:

python -m torch.distributed.launch --nproc_per_node 16 main_finetune.py \
--cfg configs/swin_base__800ep/simmim_finetune__swin_base__img224_window7__800ep.yaml --batch-size 128 --data-path <imagenet-path> --pretrained <pretrained-ckpt> [--output <output-directory> --tag <job-tag>]
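
The config names pair img192 with window6 (pre-training) and img224 with window7 (fine-tuning). One plausible reading is that the window size equals the side length of the final-stage feature map, so every stage tiles evenly into windows. This assumes the standard Swin layout of patch size 4 followed by three 2x patch-merging steps, which this README does not state. The helper names below are hypothetical.

```python
def swin_stage_sizes(img_size, patch_size=4, num_stages=4):
    """Feature-map side length at each stage; each later stage halves the previous one."""
    side = img_size // patch_size
    return [side // (2 ** i) for i in range(num_stages)]


def window_fits(img_size, window_size):
    """True if every stage's feature map divides evenly into windows."""
    return all(s % window_size == 0 for s in swin_stage_sizes(img_size))


if __name__ == "__main__":
    for img_size, window in [(192, 6), (224, 7), (192, 7)]:
        print(img_size, window, swin_stage_sizes(img_size), window_fits(img_size, window))
```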

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.