
YyzHarry / Imbalanced Semi Self

License: MIT
[NeurIPS 2020] Semi-Supervision (Unlabeled Data) & Self-Supervision Improve Class-Imbalanced / Long-Tailed Learning

Programming Languages

Python

Projects that are alternatives of or similar to Imbalanced Semi Self

AdversarialAudioSeparation
Code accompanying the paper "Semi-supervised adversarial audio source separation applied to singing voice extraction"
Stars: ✭ 70 (-81.53%)
Mutual labels:  semi-supervised-learning
CsiGAN
An implementation of our paper "CsiGAN: Robust Channel State Information-based Activity Recognition with GANs" (IEEE Internet of Things Journal, 2019): a semi-supervised Generative Adversarial Network (GAN) for Channel State Information (CSI)-based activity recognition.
Stars: ✭ 23 (-93.93%)
Mutual labels:  semi-supervised-learning
L2c
Learning to Cluster. A deep clustering strategy.
Stars: ✭ 262 (-30.87%)
Mutual labels:  semi-supervised-learning
improving segmentation with selfsupervised depth
[CVPR21] Implementation of our work "Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation"
Stars: ✭ 189 (-50.13%)
Mutual labels:  semi-supervised-learning
catgan pytorch
Unsupervised and Semi-supervised Learning with Categorical Generative Adversarial Networks
Stars: ✭ 50 (-86.81%)
Mutual labels:  semi-supervised-learning
DST-CBC
Implementation of our paper "DMT: Dynamic Mutual Training for Semi-Supervised Learning"
Stars: ✭ 98 (-74.14%)
Mutual labels:  semi-supervised-learning
Context-Aware-Consistency
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CVPR 2021)
Stars: ✭ 121 (-68.07%)
Mutual labels:  semi-supervised-learning
Tape
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
Stars: ✭ 295 (-22.16%)
Mutual labels:  semi-supervised-learning
Temporal-Ensembling-for-Semi-Supervised-Learning
Implementation of Temporal Ensembling for Semi-Supervised Learning by Laine et al. with tensorflow eager execution
Stars: ✭ 49 (-87.07%)
Mutual labels:  semi-supervised-learning
HyperGBM
A full pipeline AutoML tool for tabular data
Stars: ✭ 172 (-54.62%)
Mutual labels:  semi-supervised-learning
Pseudo-Label-Keras
Pseudo-Label: Semi-Supervised Learning on CIFAR-10 in Keras
Stars: ✭ 36 (-90.5%)
Mutual labels:  semi-supervised-learning
fixmatch-pytorch
90%+ accuracy with 40 labels; please see the README for details.
Stars: ✭ 27 (-92.88%)
Mutual labels:  semi-supervised-learning
DiGCN
Implementation of DiGCN (NeurIPS 2020)
Stars: ✭ 25 (-93.4%)
Mutual labels:  semi-supervised-learning
spear
SPEAR: Programmatically label and build training data quickly.
Stars: ✭ 81 (-78.63%)
Mutual labels:  semi-supervised-learning
Fixmatch Pytorch
Unofficial PyTorch implementation of "FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence"
Stars: ✭ 259 (-31.66%)
Mutual labels:  semi-supervised-learning
semi-supervised-NFs
Code for the paper Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Stars: ✭ 23 (-93.93%)
Mutual labels:  semi-supervised-learning
SSL CR Histo
Official code for "Self-Supervised driven Consistency Training for Annotation Efficient Histopathology Image Analysis", published in Medical Image Analysis (MedIA), October 2021.
Stars: ✭ 32 (-91.56%)
Mutual labels:  semi-supervised-learning
Ssl4mis
Semi Supervised Learning for Medical Image Segmentation, a collection of literature reviews and code implementations.
Stars: ✭ 336 (-11.35%)
Mutual labels:  semi-supervised-learning
Fewshot gan Unet3d
Tensorflow implementation of our paper: Few-shot 3D Multi-modal Medical Image Segmentation using Generative Adversarial Learning
Stars: ✭ 272 (-28.23%)
Mutual labels:  semi-supervised-learning
SHOT-plus
Code for our TPAMI 2021 paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"
Stars: ✭ 46 (-87.86%)
Mutual labels:  semi-supervised-learning

Rethinking the Value of Labels for Improving Class-Imbalanced Learning

This repository contains the implementation code for the paper:
Rethinking the Value of Labels for Improving Class-Imbalanced Learning
Yuzhe Yang, and Zhi Xu
34th Conference on Neural Information Processing Systems (NeurIPS), 2020
[Website] [arXiv] [Paper] [Slides] [Video]

If you find this code or idea useful, please consider citing our work:

@inproceedings{yang2020rethinking,
  title={Rethinking the Value of Labels for Improving Class-Imbalanced Learning},
  author={Yang, Yuzhe and Xu, Zhi},
  booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
  year={2020}
}

Overview

In this work, we show both theoretically and empirically that semi-supervised learning (using unlabeled data) and self-supervised pre-training (pre-training the model with self-supervision before standard training) can substantially improve performance on imbalanced (long-tailed) datasets, regardless of the degree of imbalance in the labeled/unlabeled data and of the base training technique.

Semi-Supervised Imbalanced Learning: Using unlabeled data helps shape clearer class boundaries and yields better class separation, especially for the tail classes.

Self-Supervised Imbalanced Learning: Self-supervised pre-training (SSP) helps mitigate tail-class leakage during testing, which results in better learned boundaries and representations.

Installation

Prerequisites

Dependencies

  • PyTorch (>= 1.2, tested on 1.4)
  • yaml
  • scikit-learn
  • TensorboardX
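
For a quick start, the dependencies above can typically be installed with pip. This is a sketch: the exact package versions and the torchvision requirement are assumptions, not stated by the repository.

pip install "torch>=1.2" torchvision pyyaml scikit-learn tensorboardX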

Code Overview

Main Files

  • train.py: standard network training; supports initialization from SSP models via --pretrained_model
  • train_semi.py: semi-supervised training with pseudo-labeled unlabeled data
  • gen_pseudolabels.py: generate pseudo-labels for unlabeled data with a trained base classifier
  • pretrain_rot.py: Rotation self-supervised pre-training
  • pretrain_moco.py: MoCo self-supervised pre-training
  • imagenet_inat/main.py: training and evaluation on ImageNet-LT and iNaturalist 2018

Main Arguments

  • --dataset: name of chosen long-tailed dataset
  • --imb_factor: imbalance factor, the inverse of the imbalance ratio \rho in the paper (e.g., --imb_factor 0.01 corresponds to \rho=100); see the example after this list
  • --imb_factor_unlabel: imbalance factor for unlabeled data (the inverse of the unlabeled imbalance ratio \rho_U)
  • --pretrained_model: path to self-supervised pre-trained models
  • --resume: path to resume checkpoint (also for evaluation)
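
For example, a hypothetical semi-supervised run with \rho=50 labeled data and \rho_U=10 unlabeled data would combine these flags as:

python train_semi.py --dataset cifar10 --imb_factor 0.02 --imb_factor_unlabel 0.1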

Getting Started

Semi-Supervised Imbalanced Learning

Unlabeled data sourcing

CIFAR-10-LT: CIFAR-10 unlabeled data is prepared following this repo, using the 80M TinyImages dataset. In short, a data-sourcing model is trained to distinguish the CIFAR-10 classes plus a "non-CIFAR" class. For each class, candidate images are then ranked by prediction confidence, and the (imbalanced) unlabeled datasets are constructed accordingly. Download the prepared unlabeled data and place it in your data_path.
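
The selection step can be pictured with a short sketch. This is illustrative only, not the repository's sourcing code; select_unlabeled and its inputs are hypothetical:

import numpy as np

# Illustrative only: rank candidate images by the sourcing model's confidence
# and keep the top-k per class to form the (imbalanced) unlabeled set.
def select_unlabeled(probs, per_class_counts):
    # probs: (N, C) softmax outputs on candidate (e.g. TinyImages) samples
    preds = probs.argmax(axis=1)                          # predicted CIFAR class
    conf = probs.max(axis=1)                              # confidence of that prediction
    selected = []
    for c, k in enumerate(per_class_counts):
        idx = np.where(preds == c)[0]                     # candidates predicted as class c
        selected.extend(idx[np.argsort(-conf[idx])][:k])  # keep the k most confident
    return np.array(selected)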

SVHN-LT: Since the official SVHN dataset ships with an extra split of 531.1K additional (labeled) samples, these are used directly to simulate the unlabeled dataset.

Note that class imbalance in the unlabeled data is also considered, controlled by --imb_factor_unlabel (\rho_U in the paper). See imbalance_cifar.py and imbalance_svhn.py for details.
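
For reference, long-tailed splits in code of this style typically follow an exponential profile over classes. A sketch of that profile (class_sizes is a hypothetical helper, not necessarily the repo's exact function):

def class_sizes(n_max, num_classes, imb_factor):
    # imb_factor = 1/\rho; class i keeps n_max * imb_factor**(i / (num_classes - 1)) samples
    return [int(n_max * imb_factor ** (i / (num_classes - 1))) for i in range(num_classes)]

# e.g. class_sizes(5000, 10, 0.02) gives 5000 samples for the head class
# and 100 for the tail class, i.e. \rho=50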

Semi-supervised learning with pseudo-labeling

To perform pseudo-labeling (self-training), a base classifier is first trained on the original imbalanced dataset. With the trained base classifier, pseudo-labels can be generated using:

python gen_pseudolabels.py --resume <ckpt-path> --data_dir <data_path> --output_dir <output_path> --output_filename <save_name>
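
Conceptually, the script assigns each unlabeled image the base classifier's most confident class. A minimal PyTorch sketch of that idea (generate_pseudo_labels is hypothetical; the actual logic lives in gen_pseudolabels.py):

import torch

@torch.no_grad()
def generate_pseudo_labels(model, unlabeled_loader, device="cuda"):
    model.eval()
    all_labels = []
    for images in unlabeled_loader:                    # unlabeled batches, no targets
        logits = model(images.to(device))
        all_labels.append(logits.argmax(dim=1).cpu())  # predicted class = pseudo-label
    return torch.cat(all_labels)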

We provide generated pseudo-label files for CIFAR-10-LT & SVHN-LT with \rho=50, using base models trained with the standard cross-entropy (CE) loss.

To train with unlabeled data, for example on CIFAR-10-LT with \rho=50 and \rho_U=50:

python train_semi.py --dataset cifar10 --imb_factor 0.02 --imb_factor_unlabel 0.02

Self-Supervised Imbalanced Learning

Self-supervised pre-training (SSP)

To perform Rotation SSP on CIFAR-10-LT with \rho=100:

python pretrain_rot.py --dataset cifar10 --imb_factor 0.01
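
The rotation pretext task trains the network to predict which of four rotations was applied to each image. A minimal sketch of the batch construction, a common recipe that may differ in details from pretrain_rot.py:

import torch

def rotation_batch(images):
    # images: (B, C, H, W) -> 4B images rotated by 0/90/180/270 degrees
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))  # rotation class per image
    return rotated, labels  # train with cross-entropy on the 4-way rotation labels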

To perform MoCo SSP on ImageNet-LT:

python pretrain_moco.py --dataset imagenet --data <data_path>

Network training with SSP models

Train on CIFAR-10-LT with \rho=100:

python train.py --dataset cifar10 --imb_factor 0.01 --pretrained_model <path_to_ssp_model>

Train on ImageNet-LT / iNaturalist 2018:

python -m imagenet_inat.main --cfg <path_to_ssp_config> --model_dir <path_to_ssp_model>
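
Initializing from an SSP checkpoint amounts to loading the matching backbone weights and discarding the pretext-task head. A minimal sketch, assuming a state_dict-style checkpoint (load_ssp_weights is hypothetical; see train.py for the actual loading code):

import torch

def load_ssp_weights(model, ckpt_path):
    state = torch.load(ckpt_path, map_location="cpu")
    state = state.get("state_dict", state)           # unwrap if wrapped in a dict
    own = model.state_dict()
    # keep only keys whose names and shapes match the target network
    matched = {k: v for k, v in state.items() if k in own and v.shape == own[k].shape}
    model.load_state_dict(matched, strict=False)     # classifier head stays randomly initialized
    return model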

Results and Models

All related data and checkpoints can be found via this link. Individual results and checkpoints are detailed as follows.

Semi-Supervised Imbalanced Learning

CIFAR-10-LT

Model | Top-1 Error | Download
CE + [email protected] (\rho=50 and \rho_U=1) | 16.79 | ResNet-32
CE + [email protected] (\rho=50 and \rho_U=25) | 16.88 | ResNet-32
CE + [email protected] (\rho=50 and \rho_U=50) | 18.36 | ResNet-32
CE + [email protected] (\rho=50 and \rho_U=100) | 19.94 | ResNet-32

SVHN-LT

Model | Top-1 Error | Download
CE + [email protected] (\rho=50 and \rho_U=1) | 13.07 | ResNet-32
CE + [email protected] (\rho=50 and \rho_U=25) | 13.36 | ResNet-32
CE + [email protected] (\rho=50 and \rho_U=50) | 13.16 | ResNet-32
CE + [email protected] (\rho=50 and \rho_U=100) | 14.54 | ResNet-32

Test a pretrained checkpoint

python train_semi.py --dataset cifar10 --resume <ckpt-path> -e

Self-Supervised Imbalanced Learning

CIFAR-10-LT

  • Self-supervised pre-trained models (Rotation)

    Dataset Setting | \rho=100 | \rho=50 | \rho=10
    Download | ResNet-32 | ResNet-32 | ResNet-32
  • Final models (200 epochs)

    Model | \rho | Top-1 Error | Download
    CE(Uniform) + SSP | 10 | 12.28 | ResNet-32
    CE(Uniform) + SSP | 50 | 21.80 | ResNet-32
    CE(Uniform) + SSP | 100 | 26.50 | ResNet-32
    CE(Balanced) + SSP | 10 | 11.57 | ResNet-32
    CE(Balanced) + SSP | 50 | 19.60 | ResNet-32
    CE(Balanced) + SSP | 100 | 23.47 | ResNet-32

CIFAR-100-LT

  • Self-supervised pre-trained models (Rotation)

    Dataset Setting | \rho=100 | \rho=50 | \rho=10
    Download | ResNet-32 | ResNet-32 | ResNet-32
  • Final models (200 epochs)

    Model | \rho | Top-1 Error | Download
    CE(Uniform) + SSP | 10 | 42.93 | ResNet-32
    CE(Uniform) + SSP | 50 | 54.96 | ResNet-32
    CE(Uniform) + SSP | 100 | 59.60 | ResNet-32
    CE(Balanced) + SSP | 10 | 41.94 | ResNet-32
    CE(Balanced) + SSP | 50 | 52.91 | ResNet-32
    CE(Balanced) + SSP | 100 | 56.94 | ResNet-32

ImageNet-LT

  • Self-supervised pre-trained models (MoCo)
    [ResNet-50]

  • Final models (90 epochs)

    Model | Top-1 Error | Download
    CE(Uniform) + SSP | 54.4 | ResNet-50
    CE(Balanced) + SSP | 52.4 | ResNet-50
    cRT + SSP | 48.7 | ResNet-50

iNaturalist 2018

  • Self-supervised pre-trained models (MoCo)
    [ResNet-50]

  • Final models (90 epochs)

    Model | Top-1 Error | Download
    CE(Uniform) + SSP | 35.6 | ResNet-50
    CE(Balanced) + SSP | 34.1 | ResNet-50
    cRT + SSP | 31.9 | ResNet-50

Test a pretrained checkpoint

# test on CIFAR-10 / CIFAR-100
python train.py --dataset cifar10 --resume <ckpt-path> -e

# test on ImageNet-LT / iNaturalist 2018
python -m imagenet_inat.main --cfg <path_to_ssp_config> --model_dir <path_to_model> --test

Acknowledgements

This code is partly based on the open-source implementations from the following sources: OpenLongTailRecognition, classifier-balancing, LDAM-DRW, MoCo, and semisup-adv.

Contact

If you have any questions, feel free to contact us via email ([email protected] & [email protected]) or GitHub issues. Enjoy!
