
yukimasano / Self Label

Self-labelling via simultaneous clustering and representation learning. (ICLR 2020)


Projects that are alternatives of or similar to Self Label

Compress
Compressing Representations for Self-Supervised Learning
Stars: ✭ 43 (-86.73%)
Mutual labels:  representation-learning, clustering
Unsupervised Classification
SCAN: Learning to Classify Images without Labels (ECCV 2020), incl. SimCLR.
Stars: ✭ 605 (+86.73%)
Mutual labels:  representation-learning, clustering
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (-75%)
Mutual labels:  clustering, representation-learning
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-94.44%)
Mutual labels:  representation-learning, clustering
Self Supervised Learning Overview
📜 Self-Supervised Learning from Images: Up-to-date reading list.
Stars: ✭ 73 (-77.47%)
Mutual labels:  representation-learning, clustering
DESOM
🌐 Deep Embedded Self-Organizing Map: Joint Representation Learning and Self-Organization
Stars: ✭ 76 (-76.54%)
Mutual labels:  clustering, representation-learning
M-NMF
An implementation of "Community Preserving Network Embedding" (AAAI 2017)
Stars: ✭ 119 (-63.27%)
Mutual labels:  clustering, representation-learning
Gcn clustering
Code for CVPR'19 paper Linkage-based Face Clustering via GCN
Stars: ✭ 283 (-12.65%)
Mutual labels:  clustering
Elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-8.02%)
Mutual labels:  clustering
Dedupe
🆔 A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.
Stars: ✭ 3,241 (+900.31%)
Mutual labels:  clustering
Pykg2vec
Python library for knowledge graph embedding and representation learning.
Stars: ✭ 280 (-13.58%)
Mutual labels:  representation-learning
React Native Maps Super Cluster
A Clustering-enabled map for React Native
Stars: ✭ 284 (-12.35%)
Mutual labels:  clustering
Leaflet.markercluster
Marker Clustering plugin for Leaflet
Stars: ✭ 3,305 (+920.06%)
Mutual labels:  clustering
Rabbitmq Peer Discovery K8s
Kubernetes-based peer discovery mechanism for RabbitMQ
Stars: ✭ 283 (-12.65%)
Mutual labels:  clustering
Malheur
A Tool for Automatic Analysis of Malware Behavior
Stars: ✭ 313 (-3.4%)
Mutual labels:  clustering
Awesome Computer Vision Models
A list of popular deep learning models related to classification, segmentation and detection problems
Stars: ✭ 278 (-14.2%)
Mutual labels:  resnet
Rezero
Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth"
Stars: ✭ 317 (-2.16%)
Mutual labels:  resnet
Tianchi Medical Lungtumordetect
Tianchi Medical AI Competition [Season 1]: intelligent diagnosis of pulmonary nodules. UNet/VGG/Inception/ResNet/DenseNet
Stars: ✭ 314 (-3.09%)
Mutual labels:  resnet
R
All Algorithms implemented in R
Stars: ✭ 294 (-9.26%)
Mutual labels:  clustering
Simclr
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations by T. Chen et al.
Stars: ✭ 293 (-9.57%)
Mutual labels:  representation-learning

Self-labelling via simultaneous clustering and representation learning

🆗🆗🎉 NEW models (20th August 2020): added standard SeLa-pretrained torchvision ResNet models to make loading much easier, added baselines using the better MoCo-v2 augmentation (~69% LP performance), and added evaluation with K=1000 for ImageNet "unsupervised clustering".

🆕✅🎉 Updated code (23rd April 2020): bug fixes, CIFAR code, and evaluation for ResNet & AlexNet.

Check out our blog post for a quick non-technical overview and an interactive visualization of our clusters.

Self-Label

This code is the official implementation of the ICLR 2020 paper Self-labelling via simultaneous clustering and representation learning.

Abstract

Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Our method achieves state-of-the-art representation learning performance for AlexNet and ResNet-50 on SVHN, CIFAR-10, CIFAR-100 and ImageNet.
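To make the optimal-transport step concrete, here is a toy sketch of the Sinkhorn-Knopp iteration that turns network predictions into an equipartitioned soft label assignment. This is illustrative only: the repository's implementation runs in f64 on the GPU over millions of images (cf. the --dtype and --lamb options further below), and the names and iteration count here are placeholders.

import torch

def sinkhorn_knopp(logits, lamb=25.0, n_iters=100):
    # logits: (N images) x (K clusters) scores from the network.
    # Note: the real implementation works in double precision (f64)
    # to keep exp(lamb * logits) numerically stable.
    P = torch.exp(lamb * logits).t()                # K x N
    P = P / P.sum()                                 # joint distribution over (cluster, image)
    K, N = P.shape
    r = torch.full((K,), 1.0 / K)                   # target marginal: equally sized clusters
    c = torch.full((N,), 1.0 / N)                   # target marginal: one unit of mass per image
    for _ in range(n_iters):                        # alternately rescale rows and columns
        P = P * (r / P.sum(dim=1)).unsqueeze(1)
        P = P * (c / P.sum(dim=0)).unsqueeze(0)
    return (P / P.sum(dim=0, keepdim=True)).t()     # N x K soft assignments

# Hard pseudo-labels are then the argmax over clusters:
# pseudolabels = sinkhorn_knopp(logits).argmax(dim=1)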

Results at a glance

Model         NMI (%)  aNMI (%)  ARI (%)  LP Acc (%)
AlexNet 1k      50.5     12.2      2.7      42.1
AlexNet 10k     66.4      4.7      4.7      43.8
R50 10x3k       54.2     34.4      7.2      61.5

With better augmentations (all single crop)

Model                      Label-Acc (%)  NMI (%)  aNMI (%)  ARI (%)  LP Acc (%)  model_weights
Aug++ R18 1k (new)              26.9        62.7     36.4     12.5      53.3      here
Aug++ R50 1k (new)              30.5        65.7     42.0     16.2      63.5      here
Aug++ R50 10x3k (new)           38.1        75.7     52.8     27.6      68.8      here
MoCo-v2 + k-means**, K=3k                   71.4     39.6     15.8      71.1
  • "Aug++" refers to the better augmentations used in SimCLR, taken from the MoCo-v2 repo, but I still only trained for 280 epochs, with three lr-drops as in CMC.
  • There are still further improvements to be made with a MLP or training 800 epochs (I train 280), as done in SimCLR, MoCov2 and SwAV.
  • **MoCo-v2 uses 800 epochs, MLP and cos-lr-schedule. On MoCo-v2 I run k-means (K=3000) on the avg-pooled features (after the MLP-head it's pretty much the same performance) to obtain NMI, aNMI and ARI numbers.
  • Models above use standard torchvision ResNet backbones, so loading is now super easy:
import torch
import torchvision

# plain torchvision ResNet-50 with a 3000-way classification head
model = torchvision.models.resnet50(pretrained=False, num_classes=3000)

# the checkpoint stores both the weights and the extracted pseudo-labels
ckpt = torch.load('resnet50-10x3k_pp.pth', map_location='cpu')
model.load_state_dict(ckpt['state_dict'])
pseudolabels = ckpt['L']   # self-labels for the training images
  • Note on improvement potential: with just "aug+", I get an LP accuracy of 67.2% after 200 epochs; MoCo-v2 with "aug+" reaches only 63.4% after 200 epochs.
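For reference, the NMI, aNMI and ARI numbers in these tables can be computed with scikit-learn once pseudo-labels and ground-truth labels are at hand. A sketch, where true_labels is assumed to be an array of ImageNet labels in the same order as the pseudo-labels:

from sklearn.metrics import (
    normalized_mutual_info_score,   # NMI
    adjusted_mutual_info_score,     # aNMI
    adjusted_rand_score,            # ARI
)

nmi  = 100 * normalized_mutual_info_score(true_labels, pseudolabels)
anmi = 100 * adjusted_mutual_info_score(true_labels, pseudolabels)
ari  = 100 * adjusted_rand_score(true_labels, pseudolabels)
print(f"NMI {nmi:.1f} | aNMI {anmi:.1f} | ARI {ari:.1f}")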

Clusters that were discovered by our method

Sorted

[Figure: ImageNet validation images with clusters sorted by ImageNet purity]

Random

[Figure: ImageNet validation images with random clusters]

The edge colors encode the true ImageNet classes (which are not used for training). You can view all clusters here.

Requirements

  • Python > 3.6
  • PyTorch > 1.0
  • CUDA
  • NumPy, SciPy
  • see also requirements.txt
  • (optional) tensorboardX

Running our code

Run the self-supervised training of an AlexNet with the command

$ ./scripts/alexnet.sh

or train a ResNet-50 with

$ ./scripts/resnet.sh

Note: you need to specify your dataset directory (it expects a format just like ImageNet, with "train" and "val" folders). You also need to give the code enough GPUs to allow storing the activations on the GPU; otherwise you need to use the CPU variant, which is significantly slower.
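For reference, the expected directory layout is the standard torchvision ImageFolder structure (the synset folder names below are only examples):

/path/to/imagenet/
    train/
        n01440764/
            *.JPEG
        ...
    val/
        n01440764/
            *.JPEG
        ...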

Full documentation of the unsupervised training code main.py:

usage: main.py [-h] [--epochs EPOCHS] [--batch-size BATCH_SIZE] [--lr LR]
               [--lrdrop LRDROP] [--wd WD] [--dtype {f64,f32}] [--nopts NOPTS]
               [--augs AUGS] [--paugs PAUGS] [--lamb LAMB] [--cpu]
               [--arch ARCH] [--archspec {big,small}] [--ncl NCL] [--hc HC]
               [--device DEVICE] [--modeldevice MODELDEVICE] [--exp EXP]
               [--workers WORKERS] [--imagenet-path IMAGENET_PATH]
               [--comment COMMENT] [--log-intv LOG_INTV] [--log-iter LOG_ITER]

PyTorch Implementation of Self-Label

optional arguments:
  -h, --help            show this help message and exit
  --epochs EPOCHS       number of epochs
  --batch-size BATCH_SIZE
                        batch size (default: 256)
  --lr LR               initial learning rate (default: 0.05)
  --lrdrop LRDROP       multiply LR by 0.1 every (default: 150 epochs)
  --wd WD               weight decay pow (default: -5)
  --dtype {f64,f32}     SK-algo dtype (default: f64)
  --nopts NOPTS         number of pseudo-opts (default: 100)
  --augs AUGS           augmentation level (default: 3)
  --paugs PAUGS         for pseudoopt: augmentation level (default: 3)
  --lamb LAMB           for pseudoopt: lambda (default: 25)
  --cpu                 use CPU variant (slow) (default: off)
  --arch ARCH           alexnet or resnet (default: alexnet)
  --archspec {big,small}
                        alexnet variant (default: big)
  --ncl NCL             number of clusters per head (default: 3000)
  --hc HC               number of heads (default: 1)
  --device DEVICE       GPU devices to use for storage and model
  --modeldevice MODELDEVICE
                        GPU numbers on which the CNN runs
  --exp EXP             path to experiment directory
  --workers WORKERS     number of workers (default: 6)
  --imagenet-path IMAGENET_PATH
                        path to folder that contains `train` and `val`
  --comment COMMENT     name for tensorboardX
  --log-intv LOG_INTV   save stuff every x epochs (default: 1)
  --log-iter LOG_ITER   log every x-th batch (default: 200)
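For example, a ResNet-50 run with 10 heads of 3,000 clusters each could be launched as follows (paths and device lists are placeholders; see scripts/resnet.sh for the exact settings used for the paper):

$ python main.py --arch resnet --ncl 3000 --hc 10 \
                 --imagenet-path /path/to/imagenet \
                 --device 0,1,2,3 --modeldevice 0 \
                 --exp ./experiments/resnet-10x3k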

Evaluation

Linear Evaluation

We provide the linear evaluation methods in this repo. Simply download the models via . ./scripts/download_models.sh and then run either scripts/eval-alexnet.sh or scripts/eval-resnet.sh.
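Conceptually, the linear evaluation freezes the self-supervised backbone and fits a single linear classifier on its features. Below is a minimal sketch under that assumption (the eval scripts themselves handle per-layer probes, augmentation and lr schedules):

import torch
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnet50(num_classes=3000)
# ... load the self-supervised weights as shown above ...
backbone.fc = nn.Identity()            # expose the 2048-d avg-pooled features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False            # freeze the backbone

probe = nn.Linear(2048, 1000)          # linear classifier for the 1000 ImageNet classes
opt = torch.optim.SGD(probe.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)   # dummy batch standing in for a real DataLoader
targets = torch.randint(0, 1000, (8,))
with torch.no_grad():
    feats = backbone(images)           # frozen features, no gradients through the backbone
loss = loss_fn(probe(feats), targets)
loss.backward()
opt.step()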

Pascal VOC

We follow the standard evaluation protocols for self-supervised visual representation learning.

Our extracted pseudolabels

As we show in the paper, the pseudo-labels generated by our training can be used to quickly train a neural network with regular cross-entropy. Moreover, they seem to correctly group similar images together. Hence we provide the labels for everyone to use; a training sketch follows the download links below.

AlexNet

You can download the pseudolabels from our best (raw) AlexNet model with 10x3000 clusters here.

ResNet

You can download the pseudolabels from our best ResNet model with 10x3000 clusters here.
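To illustrate, a hedged sketch of such a training run in plain PyTorch. The path, transform, single 3000-way head, and the assumption that pseudolabels (loaded from a download above) is a 1-D sequence aligned with ImageFolder's sorted file order are all illustrative; verify them against the format of the downloaded label file.

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

# Standard ImageNet-style training augmentation
tf = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(), T.ToTensor()])
train_set = torchvision.datasets.ImageFolder('/path/to/imagenet/train', transform=tf)

# Swap the ground-truth targets for our pseudo-labels (assumed 1-D, aligned
# with ImageFolder's sorted file order)
train_set.samples = [(p, int(pseudolabels[i]))
                     for i, (p, _) in enumerate(train_set.samples)]

net = torchvision.models.resnet50(num_classes=3000)   # one 3000-way head
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                     shuffle=True, num_workers=6)

for images, labels in loader:                         # one epoch
    opt.zero_grad()
    loss = loss_fn(net(images), labels)
    loss.backward()
    opt.step()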

Trained models

You can also download our trained models by running

$ ./scripts/download_models.sh

Use them like this:

import torch
import models

# ResNet-50 with 10 heads of 3000 clusters each
d = torch.load('self-label_models/resnet-10x3k.pth', map_location='cpu')
m = models.resnet(num_classes=[3000] * 10)
m.load_state_dict(d)

# AlexNet variant ('wRot') with 10 heads of 3000 clusters each
d = torch.load('self-label_models/alexnet-10x3k-wRot.pth', map_location='cpu')
m = models.alexnet(num_classes=[3000] * 10)
m.load_state_dict(d)
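A quick, hypothetical smoke test (with num_classes given as a list, the multi-head models may return one output per head; check models.py for the exact return type):

m.eval()
with torch.no_grad():
    x = torch.randn(2, 3, 224, 224)   # dummy batch of ImageNet-sized inputs
    out = m(x)                        # cluster logits; with 10 heads, expect a 3000-way output per head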

Reference

If you use this code or these models, please cite the following paper:

Yuki M. Asano, Christian Rupprecht and Andrea Vedaldi. "Self-labelling via simultaneous clustering and representation learning." Proc. ICLR (2020)

@inproceedings{asano2020self,
  title={Self-labelling via simultaneous clustering and representation learning},
  author={Asano, Yuki M. and Rupprecht, Christian and Vedaldi, Andrea},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2020},
}