Alibaba-MIIL / ImageNet21K

License: MIT
Official PyTorch implementation of the "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021) paper

Programming Languages

python
shell

Projects that are alternatives to or similar to ImageNet21K

Eartrumpet
EarTrumpet - Volume Control for Windows
Stars: ✭ 4,761 (+742.65%)
Mutual labels:  mixer
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (-85.66%)
Mutual labels:  pretraining
multi-label-text-classification
Multi-label text classification using ConvNet and graph embedding (TensorFlow implementation)
Stars: ✭ 44 (-92.21%)
Mutual labels:  multi-label-classification
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (-80.71%)
Mutual labels:  pretraining
keras-vision-transformer
The Tensorflow, Keras implementation of Swin-Transformer and Swin-UNET
Stars: ✭ 91 (-83.89%)
Mutual labels:  vision-transformer
GalaXC
GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification
Stars: ✭ 28 (-95.04%)
Mutual labels:  multi-label-classification
jack mixer
A multi-channel audio mixer desktop application for the JACK Audio Connection Kit.
Stars: ✭ 66 (-88.32%)
Mutual labels:  mixer
DECAF
DECAF: Deep Extreme Classification with Label Features
Stars: ✭ 46 (-91.86%)
Mutual labels:  multi-label-classification
mybabe
MyBB CAPTCHA Solver using Convolutional Neural Network in Keras
Stars: ✭ 18 (-96.81%)
Mutual labels:  multi-label-classification
transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
Stars: ✭ 201 (-64.42%)
Mutual labels:  vision-transformer
VT-UNet
[MICCAI2022] This is an official PyTorch implementation for A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation
Stars: ✭ 151 (-73.27%)
Mutual labels:  vision-transformer
janus-ftl-plugin
A plugin for the Janus WebRTC gateway to enable relaying of audio/video streams utilizing Mixer's FTL (Faster-Than-Light) protocol.
Stars: ✭ 39 (-93.1%)
Mutual labels:  mixer
MCAR
Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition
Stars: ✭ 32 (-94.34%)
Mutual labels:  multi-label-classification
Ffmpegandroid
Android audio/video processing based on FFmpeg: audio cutting, concatenation, transcoding, encoding/decoding; video cutting, watermarking, screenshots, transcoding, encoding/decoding, and GIF conversion; audio/video muxing and separation, dubbing; audio/video decoding, synchronization, and playback; FFmpeg local streaming, H264 and RTMP real-time live streaming; FFmpeg filters: sketch, color balance, hue, LUT, blur, 3x3 grid, etc.; lyrics parsing and display
Stars: ✭ 2,858 (+405.84%)
Mutual labels:  mixer
Generative MLZSL
[TPAMI Under Submission] Generative Multi-Label Zero-Shot Learning
Stars: ✭ 37 (-93.45%)
Mutual labels:  multi-label-classification
FLNI KK
Creating a FL Midi Script for the Komplete Kontrol M-Series and A-Series
Stars: ✭ 44 (-92.21%)
Mutual labels:  mixer
libai
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Stars: ✭ 284 (-49.73%)
Mutual labels:  vision-transformer
mobilevit-pytorch
A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".
Stars: ✭ 349 (-38.23%)
Mutual labels:  vision-transformer
strumpract
Various tools for musicians.
Stars: ✭ 20 (-96.46%)
Mutual labels:  mixer
adonis-ally-extended
Additional auth providers for Adonis ally package
Stars: ✭ 15 (-97.35%)
Mutual labels:  mixer

ImageNet-21K Pretraining for the Masses

Paper | Pretrained models

Official PyTorch Implementation

Tal Ridnik, Emanuel Ben-Baruch, Asaf Noy, Lihi Zelnik-Manor
DAMO Academy, Alibaba Group

Abstract

ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. The ImageNet-21K dataset, which contains more pictures and classes, is used less frequently for pretraining, mainly due to its complexity and an underestimation of its added value compared to standard ImageNet-1K pretraining. This paper aims to close this gap and make high-quality, efficient pretraining on ImageNet-21K available for everyone. Via a dedicated preprocessing stage, utilizing WordNet hierarchies, and a novel training scheme called semantic softmax, we show that different models, including small mobile-oriented models, significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models like ViT. Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA reproducible results from a publicly available dataset.

29/11/2021 Update - New article released, offering a new classification head with state-of-the-art results

Check out our new project, ML-Decoder, which presents a unified classification head for multi-label, single-label, and zero-shot tasks. Backbones with ML-Decoder reach SoTA results, while also improving the speed-accuracy tradeoff.

01/08/2021 Update - "ImageNet-21K Pretraining for the Masses" was accepted to NeurIPS 2021!

We are very happy to announce that the "ImageNet-21K Pretraining for the Masses" article was accepted to NeurIPS 2021 (Datasets and Benchmarks Track). The OpenReview page is available here.

Getting Started

(0) Visualization and Inference Script

First, you can play around and run inference on dedicated images using the following script.
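The dedicated script and its example image are not reproduced on this page. As a stand-in, here is a minimal inference sketch using one of the 21K-pretrained timm checkpoints listed further below; it is not the repository's own script, and the image path is a placeholder:

# Minimal sketch (not the repository's script): inference with a 21K-pretrained timm checkpoint
import timm
import torch
from PIL import Image
from timm.data import resolve_data_config, create_transform

model = timm.create_model('vit_base_patch16_224_miil_in21k', pretrained=True)
model.eval()

# Build the preprocessing pipeline matching the checkpoint's training configuration
config = resolve_data_config({}, model=model)
transform = create_transform(**config)

img = Image.open('/path/to/image.jpg').convert('RGB')  # placeholder image path
with torch.no_grad():
    probs = model(transform(img).unsqueeze(0)).softmax(dim=-1)
top5 = probs.topk(5)
print(top5.indices.tolist(), top5.values.tolist())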

(1) Pretrained Models on ImageNet-21K-P Dataset

| Backbone | ImageNet-21K-P semantic top-1 accuracy [%] | ImageNet-1K top-1 accuracy [%] | Maximal batch size | Maximal training speed (img/sec) | Maximal inference speed (img/sec) |
|---|---|---|---|---|---|
| MobilenetV3_large_100 | 73.1 | 78.0 | 488 | 1210 | 5980 |
| OFA_flops_595m_s | 75.0 | 81.0 | 288 | 500 | 3240 |
| ResNet50 | 75.6 | 82.0 | 320 | 720 | 2760 |
| TResNet-M | 76.4 | 83.1 | 520 | 670 | 2970 |
| TResNet-L (V2) | 76.7 | 83.9 | 240 | 300 | 1460 |
| ViT-B-16 | 77.6 | 84.4 | 160 | 340 | 1140 |

See here for more details.
We highly recommend starting to work with ImageNet-21K by testing these weights against standard ImageNet-1K pretraining and comparing results on your relevant downstream tasks (an illustrative comparison sketch appears after the timm snippets below). Once you see a significant improvement, proceed to pretraining new models.

Note that some of our models, with both 21K and 1K pretraining, are also available via the excellent timm package:

21K:
import timm

model = timm.create_model('mobilenetv3_large_100_miil_in21k', pretrained=True)
model = timm.create_model('tresnet_m_miil_in21k', pretrained=True)
model = timm.create_model('vit_base_patch16_224_miil_in21k', pretrained=True)
model = timm.create_model('mixer_b16_224_miil_in21k', pretrained=True)


1K:
model = timm.create_model('mobilenetv3_large_100_miil', pretrained=True)
model = timm.create_model('tresnet_m', pretrained=True)
model = timm.create_model('vit_base_patch16_224_miil', pretrained=True)
model = timm.create_model('mixer_b16_224_miil', pretrained=True)

Using this link, you can verify that we indeed reach the reported accuracies from the article.
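To make the recommended 21K-vs-1K comparison concrete, the sketch below fine-tunes a 21K-pretrained and a 1K-pretrained checkpoint on the same downstream dataset and reports validation accuracy. This is only an illustrative sketch, not the repository's reproduction code; the dataset (CIFAR-100 via torchvision), hyper-parameters, and paths are placeholder assumptions.

# Illustrative sketch: compare 21K- vs 1K-pretrained timm checkpoints on a downstream task
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

DATA_DIR = '/path/to/data'  # placeholder

def finetune_and_eval(model_name, num_classes=100, epochs=5, lr=1e-4):
    # num_classes replaces the classifier head with a freshly initialized one
    model = timm.create_model(model_name, pretrained=True, num_classes=num_classes).cuda()
    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_loader = DataLoader(datasets.CIFAR100(DATA_DIR, train=True, download=True, transform=tfm),
                              batch_size=64, shuffle=True, num_workers=4)
    val_loader = DataLoader(datasets.CIFAR100(DATA_DIR, train=False, download=True, transform=tfm),
                            batch_size=64, num_workers=4)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            criterion(model(images.cuda()), labels.cuda()).backward()
            optimizer.step()
    # Evaluate top-1 accuracy on the validation split
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images.cuda()).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

acc_21k = finetune_and_eval('tresnet_m_miil_in21k')  # 21K-pretrained start
acc_1k = finetune_and_eval('tresnet_m')              # 1K-pretrained start
print(f'21K start: {acc_21k:.3f} | 1K start: {acc_1k:.3f}')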

(2) Obtaining and Processing the Dataset

See the instructions for obtaining and processing the dataset here.

(3) Training Code

To use the training code, first download the relevant ImageNet-21K-P semantic tree file to your local ./resources/ folder:

Example of semantic softmax training:

python train_semantic_softmax.py \
--batch_size=4 \
--data_path=/mnt/datasets/21k \
--model_name=mobilenetv3_large_100 \
--model_path=/mnt/models/mobilenetv3_large_100.pth \
--epochs=80

To shorten the training, we initialize the weights from standard ImageNet-1K. We recommend using ImageNet-1K weights from the timm repo.
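The actual semantic softmax implementation lives in train_semantic_softmax.py and relies on the semantic tree file above. As a rough illustration of the idea only, assuming the WordNet-derived tree partitions the output classes into hierarchy levels and each sample carries a (possibly partial) label per level, a loss of this flavor could look like the sketch below; the function name and tensor layout are assumptions, not the repository's API.

# Rough illustration of a semantic-softmax-style loss (not the repository's exact implementation)
import torch
import torch.nn.functional as F

def semantic_softmax_loss(logits, level_labels, level_slices):
    """
    logits:       (batch, num_classes) raw outputs over all classes
    level_labels: (batch, num_levels) class index within each hierarchy level,
                  or -1 if the sample has no label at that level
    level_slices: list of (start, end) column ranges, one per hierarchy level
    """
    total, valid_levels = 0.0, 0
    for level, (start, end) in enumerate(level_slices):
        labels = level_labels[:, level]
        mask = labels >= 0
        if mask.any():
            # Softmax cross-entropy restricted to the classes of this hierarchy level
            level_logits = logits[mask, start:end]
            total = total + F.cross_entropy(level_logits, labels[mask])
            valid_levels += 1
    # Average over the levels that actually contributed a label
    return total / max(valid_levels, 1)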

(4) Transfer Learning Code

See here for reproduction code, which shows how MIIL pretraining not only improves transfer learning results, but also makes MLP models more stable and robust to hyper-parameter selection.

Additional SoTA results

The results in the article are comparative results, obtained with fixed hyper-parameters. In addition, using our pretrained models and a dedicated training scheme with hyper-parameters adjusted per dataset (resolution, optimizer, learning rate), we were able to achieve SoTA results on several computer vision datasets - MS-COCO, Pascal-VOC, Stanford Cars, and CIFAR-100.

We will share our models' checkpoints to validate our scores.

Citation

@misc{ridnik2021imagenet21k,
      title={ImageNet-21K Pretraining for the Masses}, 
      author={Tal Ridnik and Emanuel Ben-Baruch and Asaf Noy and Lihi Zelnik-Manor},
      year={2021},
      eprint={2104.10972},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}