
stanford-futuredata / sinkhorn-label-allocation

License: MIT
Sinkhorn Label Allocation (SLA) is a label assignment method for semi-supervised self-training algorithms. SLA is described in full in our ICML 2021 paper: https://arxiv.org/abs/2102.08622.

Programming Languages

python

Projects that are alternatives to or similar to sinkhorn-label-allocation

ganbert
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks
Stars: ✭ 205 (+318.37%)
Mutual labels:  semi-supervised-learning, few-shot-learning
Awesome Domain Adaptation
A collection of AWESOME things about domain adaptation
Stars: ✭ 3,357 (+6751.02%)
Mutual labels:  optimal-transport, few-shot-learning
SCL
📄 Spatial Contrastive Learning for Few-Shot Classification (ECML/PKDD 2021).
Stars: ✭ 42 (-14.29%)
Mutual labels:  image-classification, few-shot-learning
Billion-scale-semi-supervised-learning
Implementing Billion-scale semi-supervised learning for image classification using Pytorch
Stars: ✭ 81 (+65.31%)
Mutual labels:  semi-supervised-learning, image-classification
metric-transfer.pytorch
Deep Metric Transfer for Label Propagation with Limited Annotated Data
Stars: ✭ 49 (+0%)
Mutual labels:  semi-supervised-learning, image-classification
deviation-network
Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection
Stars: ✭ 94 (+91.84%)
Mutual labels:  semi-supervised-learning, few-shot-learning
LibFewShot
LibFewShot: A Comprehensive Library for Few-shot Learning.
Stars: ✭ 629 (+1183.67%)
Mutual labels:  image-classification, few-shot-learning
awesome-few-shot-meta-learning
awesome few shot / meta learning papers
Stars: ✭ 44 (-10.2%)
Mutual labels:  few-shot-learning
temporal-ensembling-semi-supervised
Keras implementation of temporal ensembling (semi-supervised learning)
Stars: ✭ 22 (-55.1%)
Mutual labels:  semi-supervised-learning
subwAI
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification
Stars: ✭ 86 (+75.51%)
Mutual labels:  image-classification
multilingual kws
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
Stars: ✭ 122 (+148.98%)
Mutual labels:  few-shot-learning
KerasUI
UI for Keras to implement image classification, written in Python and Django
Stars: ✭ 40 (-18.37%)
Mutual labels:  image-classification
MixNet-PyTorch
Concise, Modular, Human-friendly PyTorch implementation of MixNet with Pre-trained Weights.
Stars: ✭ 16 (-67.35%)
Mutual labels:  image-classification
TensorFlow-Multiclass-Image-Classification-using-CNN-s
Balanced Multiclass Image Classification with TensorFlow on Python.
Stars: ✭ 57 (+16.33%)
Mutual labels:  image-classification
Skin Lesions Classification DCNNs
Transfer Learning with DCNNs (DenseNet, Inception V3, Inception-ResNet V2, VGG16) for skin lesions classification
Stars: ✭ 47 (-4.08%)
Mutual labels:  image-classification
pywsl
Python codes for weakly-supervised learning
Stars: ✭ 118 (+140.82%)
Mutual labels:  semi-supervised-learning
jpetstore-kubernetes
Modernize and Extend: JPetStore on IBM Cloud Kubernetes Service
Stars: ✭ 21 (-57.14%)
Mutual labels:  image-classification
rankpruning
🧹 Formerly for binary classification with noisy labels. Replaced by cleanlab.
Stars: ✭ 81 (+65.31%)
Mutual labels:  semi-supervised-learning
semi-supervised-paper-implementation
Reproduce some methods in semi-supervised papers.
Stars: ✭ 35 (-28.57%)
Mutual labels:  semi-supervised-learning
UniFormer
[ICLR2022] official implementation of UniFormer
Stars: ✭ 574 (+1071.43%)
Mutual labels:  image-classification

Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

ICML 2021

[paper]

[Figure: schematic illustration of Sinkhorn Label Allocation]

Overview

Semi-supervised learning (SSL) is the setting where the learner has access to a collection of unlabeled data in addition to a labeled dataset, with the goal of learning a more accurate predictor than would be possible using the labeled data alone. Self-training is a standard approach to SSL in which the learner's own predictions on unlabeled data are used as supervision during training. As one might expect, the success of self-training depends crucially on the label assignment step: if too many unlabeled examples are incorrectly labeled, prediction errors can compound over the course of training, ultimately resulting in a poor predictor. Consequently, practitioners have developed a wide range of label assignment heuristics that mitigate the label noise introduced by the self-training process. For example, a common heuristic is to assign a label only if the current predictor's confidence exceeds a certain threshold.
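As a concrete illustration, here is a minimal PyTorch sketch of the confidence-thresholding heuristic described above. The function name and the 0.95 threshold are illustrative assumptions, not code from this repository:

import torch

def confidence_threshold_labels(logits, threshold=0.95):
    # Hypothetical helper: keep a pseudo-label only when the model's
    # predicted class probability exceeds the threshold.
    probs = torch.softmax(logits, dim=-1)
    confidence, labels = probs.max(dim=-1)
    mask = confidence >= threshold  # examples that receive a pseudo-label
    return labels, mask

Examples left unmasked are typically excluded from the self-training loss until the model becomes more confident about them.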

In our paper, we reframe the label assignment process in self-training as an optimization problem that seeks a minimum-cost matching between unlabeled examples and classes, subject to a set of constraints. As it turns out, this formulation is flexible enough to subsume a variety of popular label assignment heuristics, including confidence thresholding, label annealing, and class balancing. At the same time, the particular form of the optimization problem admits an efficient approximation algorithm -- the Sinkhorn-Knopp algorithm -- making it possible to run the assignment procedure within the inner loop of standard stochastic optimization algorithms. We call the resulting label assignment process Sinkhorn Label Allocation, or SLA for short. When combined with consistency regularization, SLA yields a self-training algorithm that achieves strong performance on semi-supervised versions of CIFAR-10, CIFAR-100, and SVHN.
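To make the approximation concrete, the NumPy sketch below implements the generic entropy-regularized Sinkhorn-Knopp iteration, which alternately rescales the rows and columns of a Gibbs kernel until both sets of marginals are matched. This is a simplified sketch under assumed inputs (a cost matrix and row/column marginals that sum to the same total); the actual SLA allocation problem in the paper includes additional constraints, such as an annealed total label mass and the option to leave examples unallocated, that are not reproduced here:

import numpy as np

def sinkhorn_knopp(cost, row_marginals, col_marginals, eps=0.1, n_iters=100):
    # Approximately solve  min_P <P, cost> - eps * H(P)
    # subject to  P @ 1 = row_marginals  and  P.T @ 1 = col_marginals.
    # Assumes row_marginals.sum() == col_marginals.sum().
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(row_marginals)
    for _ in range(n_iters):
        v = col_marginals / (K.T @ u)  # rescale columns
        u = row_marginals / (K @ v)    # rescale rows
    return u[:, None] * K * v[None, :]  # transport plan P

In the self-training setting, the rows of the cost matrix correspond to unlabeled examples, the columns to classes, and the costs are derived from the current model's predictions (e.g., negative log-probabilities); each solve is cheap enough to run inside the training loop.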

Citation

If you've found this repository useful in your own work, please consider citing our ICML paper:

@inproceedings{tai2021sinkhorn,
  title = {{Sinkhorn Label Allocation: Semi-supervised classification via annealed self-training}},
  author = {Tai, Kai Sheng and Bailis, Peter and Valiant, Gregory},
  booktitle = {International Conference on Machine Learning},
  year = {2021},
}

Environment

We recommend using a conda environment to manage dependencies:

$ conda env create -f environment.yml
$ conda activate sinkhorn-label-allocation

Usage

SLA can be run with a basic set of options using the following command:

$ python run_sla.py --dataset cifar10 --data_path /tmp/data --output_dir /tmp/sla --run_id my_sla_run --num_labeled 40 --seed 1 --num_epochs 1024 

Similarly, the FixMatch baseline can be run using run_fixmatch.py:

$ python run_fixmatch.py --dataset cifar10 --data_path /tmp/data --output_dir /tmp/sla --run_id my_fixmatch_run --num_labeled 40 --seed 1 --num_epochs 1024 

The following datasets are currently supported: cifar10, cifar100, and svhn.
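For example, swapping the --dataset flag while keeping the other flags from the command above unchanged trains on SVHN (the run ID below is an arbitrary choice):

$ python run_sla.py --dataset svhn --data_path /tmp/data --output_dir /tmp/sla --run_id my_svhn_run --num_labeled 40 --seed 1 --num_epochs 1024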

A complete mixed-precision SLA training run with the default parameters on CIFAR-10 takes about 35 hours on a single NVIDIA Titan V.

For additional algorithm-specific options, use the --help flag:

$ python run_supervised.py -- --help
$ python run_fixmatch.py -- --help
$ python run_sla.py -- --help

License

MIT
