
stanford-futuredata / sinkhorn-label-allocation

License: MIT
Sinkhorn Label Allocation (SLA) is a label assignment method for semi-supervised self-training algorithms. SLA is described in full in our ICML 2021 paper: https://arxiv.org/abs/2102.08622.

Programming Languages

python

Projects that are alternatives to or similar to sinkhorn-label-allocation

ganbert
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks
Stars: ✭ 205 (+318.37%)
Mutual labels:  semi-supervised-learning, few-shot-learning
Awesome Domain Adaptation
A collection of AWESOME things about domain adaptation
Stars: ✭ 3,357 (+6751.02%)
Mutual labels:  optimal-transport, few-shot-learning
SCL
📄 Spatial Contrastive Learning for Few-Shot Classification (ECML/PKDD 2021).
Stars: ✭ 42 (-14.29%)
Mutual labels:  image-classification, few-shot-learning
Billion-scale-semi-supervised-learning
Implementing Billion-scale semi-supervised learning for image classification using Pytorch
Stars: ✭ 81 (+65.31%)
Mutual labels:  semi-supervised-learning, image-classification
metric-transfer.pytorch
Deep Metric Transfer for Label Propagation with Limited Annotated Data
Stars: ✭ 49 (+0%)
Mutual labels:  semi-supervised-learning, image-classification
deviation-network
Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection
Stars: ✭ 94 (+91.84%)
Mutual labels:  semi-supervised-learning, few-shot-learning
LibFewShot
LibFewShot: A Comprehensive Library for Few-shot Learning.
Stars: ✭ 629 (+1183.67%)
Mutual labels:  image-classification, few-shot-learning
awesome-few-shot-meta-learning
awesome few shot / meta learning papers
Stars: ✭ 44 (-10.2%)
Mutual labels:  few-shot-learning
temporal-ensembling-semi-supervised
Keras implementation of temporal ensembling (semi-supervised learning)
Stars: ✭ 22 (-55.1%)
Mutual labels:  semi-supervised-learning
subwAI
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification
Stars: ✭ 86 (+75.51%)
Mutual labels:  image-classification
multilingual kws
Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus
Stars: ✭ 122 (+148.98%)
Mutual labels:  few-shot-learning
KerasUI
UI for Keras to implement image classification, written in Python and Django
Stars: ✭ 40 (-18.37%)
Mutual labels:  image-classification
MixNet-PyTorch
Concise, Modular, Human-friendly PyTorch implementation of MixNet with Pre-trained Weights.
Stars: ✭ 16 (-67.35%)
Mutual labels:  image-classification
TensorFlow-Multiclass-Image-Classification-using-CNN-s
Balanced Multiclass Image Classification with TensorFlow on Python.
Stars: ✭ 57 (+16.33%)
Mutual labels:  image-classification
Skin Lesions Classification DCNNs
Transfer Learning with DCNNs (DenseNet, Inception V3, Inception-ResNet V2, VGG16) for skin lesions classification
Stars: ✭ 47 (-4.08%)
Mutual labels:  image-classification
pywsl
Python codes for weakly-supervised learning
Stars: ✭ 118 (+140.82%)
Mutual labels:  semi-supervised-learning
jpetstore-kubernetes
Modernize and Extend: JPetStore on IBM Cloud Kubernetes Service
Stars: ✭ 21 (-57.14%)
Mutual labels:  image-classification
rankpruning
🧹 Formerly for binary classification with noisy labels. Replaced by cleanlab.
Stars: ✭ 81 (+65.31%)
Mutual labels:  semi-supervised-learning
semi-supervised-paper-implementation
Reproduce some methods in semi-supervised papers.
Stars: ✭ 35 (-28.57%)
Mutual labels:  semi-supervised-learning
UniFormer
[ICLR2022] official implementation of UniFormer
Stars: ✭ 574 (+1071.43%)
Mutual labels:  image-classification

Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

ICML 2021

[paper]

[Figure: schematic illustration of Sinkhorn Label Allocation]

Overview

Semi-supervised learning (SSL) is the setting where the learner has access to a collection of unlabeled data in addition to a labeled dataset, with the goal of learning a more accurate predictor than would be possible using the labeled data alone. Self-training is a standard approach to SSL in which the learner's own predictions on unlabeled data are used as supervision during training. As one might expect, the success of self-training depends crucially on the label assignment step: if too many unlabeled examples are incorrectly labeled, prediction errors can compound over the course of training, ultimately resulting in a poor predictor. Consequently, practitioners have developed a wide range of label assignment heuristics that mitigate the label noise introduced by the self-training process. For example, a common heuristic is to assign a label only if the current predictor's confidence exceeds a certain threshold.
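As a concrete illustration, here is a minimal PyTorch sketch of the confidence-thresholding heuristic described above. The function name and the 0.95 threshold are illustrative assumptions, not code from this repository:

import torch

def confidence_threshold_labels(logits, threshold=0.95):
    # Hypothetical helper: keep a pseudo-label only when the model's
    # predicted class probability exceeds the threshold.
    probs = torch.softmax(logits, dim=-1)
    confidence, labels = probs.max(dim=-1)
    mask = confidence >= threshold  # examples that receive a pseudo-label
    return labels, mask

Examples left unmasked are typically excluded from the self-training loss until the model becomes more confident about them.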

In our paper, we reframe the label assignment process in self-training as an optimization problem that seeks a minimum-cost matching between unlabeled examples and classes, subject to a set of constraints. As it turns out, this formulation is flexible enough to subsume a variety of popular label assignment heuristics, including confidence thresholding, label annealing, and class balancing. At the same time, the particular form of the optimization problem admits an efficient approximation algorithm -- the Sinkhorn-Knopp algorithm -- making it possible to run the assignment procedure within the inner loop of standard stochastic optimization algorithms. We call the resulting label assignment process Sinkhorn Label Allocation, or SLA for short. When combined with consistency regularization, SLA yields a self-training algorithm that achieves strong performance on semi-supervised versions of CIFAR-10, CIFAR-100, and SVHN.
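To make the approximation concrete, the NumPy sketch below implements the generic entropy-regularized Sinkhorn-Knopp iteration, which alternately rescales the rows and columns of a Gibbs kernel until both sets of marginals are matched. This is a simplified sketch under assumed inputs (a cost matrix and row/column marginals that sum to the same total); the actual SLA allocation problem in the paper includes additional constraints, such as an annealed total label mass and the option to leave examples unallocated, that are not reproduced here:

import numpy as np

def sinkhorn_knopp(cost, row_marginals, col_marginals, eps=0.1, n_iters=100):
    # Approximately solve  min_P <P, cost> - eps * H(P)
    # subject to  P @ 1 = row_marginals  and  P.T @ 1 = col_marginals.
    # Assumes row_marginals.sum() == col_marginals.sum().
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(row_marginals)
    for _ in range(n_iters):
        v = col_marginals / (K.T @ u)  # rescale columns
        u = row_marginals / (K @ v)    # rescale rows
    return u[:, None] * K * v[None, :]  # transport plan P

In the self-training setting, the rows of the cost matrix correspond to unlabeled examples, the columns to classes, and the costs are derived from the current model's predictions (e.g., negative log-probabilities); each solve is cheap enough to run inside the training loop.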

Citation

If you've found this repository useful in your own work, please consider citing our ICML paper:

@inproceedings{tai2021sinkhorn,
  title = {{Sinkhorn Label Allocation: Semi-supervised classification via annealed self-training}},
  author = {Tai, Kai Sheng and Bailis, Peter and Valiant, Gregory},
  booktitle = {International Conference on Machine Learning},
  year = {2021},
}

Environment

We recommend using a conda environment to manage dependencies:

$ conda env create -f environment.yml
$ conda activate sinkhorn-label-allocation

Usage

SLA can be run with a basic set of options using the following command:

$ python run_sla.py --dataset cifar10 --data_path /tmp/data --output_dir /tmp/sla --run_id my_sla_run --num_labeled 40 --seed 1 --num_epochs 1024 

Similarly, the FixMatch baseline can be run using run_fixmatch.py:

$ python run_fixmatch.py --dataset cifar10 --data_path /tmp/data --output_dir /tmp/sla --run_id my_fixmatch_run --num_labeled 40 --seed 1 --num_epochs 1024 

The following datasets are currently supported: cifar10, cifar100, and svhn.
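For example, swapping the --dataset flag while keeping the other flags from the command above unchanged trains on SVHN (the run ID below is an arbitrary choice):

$ python run_sla.py --dataset svhn --data_path /tmp/data --output_dir /tmp/sla --run_id my_svhn_run --num_labeled 40 --seed 1 --num_epochs 1024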

A complete mixed-precision SLA training run with the default parameters on CIFAR-10 takes about 35 hours on a single NVIDIA Titan V.

For additional algorithm-specific options, use the --help flag:

$ python run_supervised.py -- --help
$ python run_fixmatch.py -- --help
$ python run_sla.py -- --help

License

MIT
