
fidler-lab / efficient-annotation-cookbook

License: MIT
Official implementation of "Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets" (CVPR 2021)

Programming Languages

Python

Projects that are alternatives of or similar to efficient-annotation-cookbook

com2ann
Tool for translating type comments to type annotations in Python
Stars: ✭ 115 (+112.96%)
Mutual labels:  annotations
ScoutAR
An augmented reality app that displays nearby restaurant information in a live camera and map view.
Stars: ✭ 28 (-48.15%)
Mutual labels:  annotations
rubocop-linter-action
Rubocop Linter Action: A GitHub Action to run Rubocop against your code!
Stars: ✭ 86 (+59.26%)
Mutual labels:  annotations
Symfony-4-by-Samples
Symfony 4 by Samples is a personal project of small demos with tutorials for learning the Symfony 4 framework. Each sample contains a README.md file that states its purpose plus a step-by-step guide to reproduce it. Basic topics, login and register form, authentication, Webpack Encore, Sass…
Stars: ✭ 40 (-25.93%)
Mutual labels:  annotations
etiketai
Etiketai is an online tool designed to label images, useful for training AI models
Stars: ✭ 63 (+16.67%)
Mutual labels:  annotations
dochooks
WordPress hooks in method comments. Annotated hooks.
Stars: ✭ 33 (-38.89%)
Mutual labels:  annotations
MetaBIN
[CVPR2021] Meta Batch-Instance Normalization for Generalizable Person Re-Identification
Stars: ✭ 58 (+7.41%)
Mutual labels:  cvpr2021
illuminsight
💡👀 Read EPUB books with built-in insights from wikis, definitions, translations, and Google.
Stars: ✭ 55 (+1.85%)
Mutual labels:  annotations
CVPR2021 PLOP
Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation
Stars: ✭ 102 (+88.89%)
Mutual labels:  cvpr2021
annotation-cache-bundle
Annotation based caching for services inside a symfony container
Stars: ✭ 16 (-70.37%)
Mutual labels:  annotations
RfDNet
Implementation of CVPR'21: RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction
Stars: ✭ 150 (+177.78%)
Mutual labels:  cvpr2021
Involution
PyTorch reimplementation of the paper "Involution: Inverting the Inherence of Convolution for Visual Recognition" (2D and 3D Involution) [CVPR 2021].
Stars: ✭ 98 (+81.48%)
Mutual labels:  cvpr2021
linkedresearch.org
🌐 linkedresearch.org
Stars: ✭ 32 (-40.74%)
Mutual labels:  annotations
SimpleVideoAnnotation
A simple video annotation tool made with Python + OpenCV for detection in YOLOv2 format
Stars: ✭ 13 (-75.93%)
Mutual labels:  annotations
elucidate-server
A W3C and OA compliant Web Annotation server
Stars: ✭ 48 (-11.11%)
Mutual labels:  annotations
BLIP
Official Implementation of CVPR2021 paper: Continual Learning via Bit-Level Information Preserving
Stars: ✭ 33 (-38.89%)
Mutual labels:  cvpr2021
adversaria
Typeclass interfaces to access user-defined Scala annotations
Stars: ✭ 22 (-59.26%)
Mutual labels:  annotations
summerfish-swagger
Automatic Swagger docs for Go
Stars: ✭ 16 (-70.37%)
Mutual labels:  annotations
semantic-guidance
Code for our CVPR-2021 paper on Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings.
Stars: ✭ 19 (-64.81%)
Mutual labels:  cvpr2021
windigo-android
Windigo is an easy-to-use, type-safe REST/HTTP client for Android
Stars: ✭ 23 (-57.41%)
Mutual labels:  annotations

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

This is the official implementation of "Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets" (CVPR 2021). For more details, please refer to:


Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Yuan-Hong Liao, Amlan Kar, Sanja Fidler

University of Toronto

[Paper] [Video] [Project]

CVPR 2021 Oral

Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. This is expensive, and guaranteeing the quality of the labels is a major challenge. In this paper, we investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images. While methods that exploit learnt models for labeling exist, a surprisingly prevalent approach is to query humans for a fixed number of labels per datum and aggregate them, which is expensive. Building on prior work on online joint probabilistic modeling of human annotations and machine-generated beliefs, we propose modifications and best practices aimed at minimizing human labeling effort. Specifically, we make use of advances in self-supervised learning, view annotation as a semi-supervised learning problem, identify and mitigate pitfalls and ablate several key design choices to propose effective guidelines for labeling. Our analysis is done in a more realistic simulation that involves querying human labelers, which uncovers issues with evaluation using existing worker simulation methods. Simulated experiments on a 125k image subset of the ImageNet dataset with 100 classes show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average, a 2.7x and 6.7x improvement over prior work and manual annotation, respectively.


Code usage

  • Download the extracted BYOL features and change the root directory accordingly:
wget -P data/features/ http://www.cs.toronto.edu/~andrew/research/cvpr2021-good_practices/data/byol_r50-e3b0c442.pth_feat1.npy 

Replace REPO_DIR (here) with the absolute path to the repository.
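
After downloading, a quick sanity check (a sketch using numpy; the exact array shape is an assumption, as it is not documented here) confirms the features load correctly:

import numpy as np

feats = np.load("data/features/byol_r50-e3b0c442.pth_feat1.npy")
print(feats.shape, feats.dtype)  # expected: roughly (num_images, feature_dim)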

  • Run online labeling with simulated workers
    • <EXPERIMENT> can be imagenet_split_0~5, imagenet_animal, imagenet_100_classes
    • <METHOD> can be ds_model, lean, improved_lean, efficient_annotation
    • <SIMULATION> can be amt_structured_noise, amt_uniform_noise
python main.py experiment=<EXPERIMENT> learner_method=<METHOD> simulation=<SIMULATION>
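
For example, picking one value from each list:

python main.py experiment=imagenet_animal learner_method=improved_lean simulation=amt_structured_noise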

To change other configurations, see the config.yaml here.

Code Structure

There are several components in our system: Sampler, AnnotationHolder, Learner, Optimizer and Aggregator.

  • Sampler: We implement RandomSampler and GreedyTaskAssignmentSampler. For GreedyTaskAssignmentSampler, you need to specify an additional flag, max_annotation_per_worker.

For example,

python main.py experiment=imagenet_animal learner_method=efficient_annotation simulation=amt_structured_noise sampler.algo=greedy_task_assignment sampler.max_annotation_per_worker=2000
  • AnnotationHolder: It holds all information about each example, including worker annotations, ground truth, and the current risk estimate. For simulated workers, you can call annotation_holder.collect_annotation to query annotations. You can also sample annotations externally and add them by calling annotation_holder.add_annotation.
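
A usage sketch (the method names come from this README; the argument names and signatures are assumptions):

# Hypothetical signatures, for illustration only.
new_annotations = annotation_holder.collect_annotation(example_ids)  # query simulated workers
annotation_holder.add_annotation(example_id, worker_id, label)       # add an externally sampled label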

  • Learner: We implement DummyLearner and LinearNNLearner. You can use your favorite architecture by overriding NNLearner.init_learner, as sketched below.
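
For example, a custom learner in PyTorch (only NNLearner.init_learner is named in this README; the attribute names feat_dim and num_classes are assumptions):

import torch.nn as nn

class MLPLearner(NNLearner):  # NNLearner is provided by the repository
    def init_learner(self):
        # Replace the linear head with a small MLP over the BYOL features.
        self.model = nn.Sequential(
            nn.Linear(self.feat_dim, 512),     # feat_dim: assumed attribute
            nn.ReLU(),
            nn.Linear(512, self.num_classes),  # num_classes: assumed attribute
        )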

  • Optimizer: We implement EMOptimizer. When optimizer.step is called, the optimizer performs EM for a fixed number of iterations unless it has already converged. If DummyLearner is not used, the optimizer is expected to call optimizer.fit_machine_learner to train the machine learner and produce predictions over all data examples.
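
To make the EM step concrete, here is a self-contained sketch of one-coin Dawid-Skene-style EM over worker annotations (an illustration of the idea only, not the repository's EMOptimizer; labels are assumed to be integer class ids):

import numpy as np

def em_aggregate(votes, num_classes, iters=50, tol=1e-6):
    # votes[i] is a list of (worker_id, label) pairs for example i; num_classes >= 2.
    workers = {w for v in votes for w, _ in v}
    skill = {w: 0.8 for w in workers}  # initial guess of per-worker accuracy
    for _ in range(iters):
        # E-step: posterior over true labels given current worker skills.
        post = []
        for v in votes:
            p = np.full(num_classes, 1.0 / num_classes)
            for w, y in v:
                lik = np.full(num_classes, (1.0 - skill[w]) / (num_classes - 1))
                lik[y] = skill[w]
                p *= lik
            post.append(p / p.sum())
        # M-step: re-estimate each worker's accuracy from the posteriors.
        new_skill = {}
        for w in workers:
            num = sum(p[y] for p, v in zip(post, votes) for ww, y in v if ww == w)
            den = sum(1.0 for v in votes for ww, _ in v if ww == w)
            new_skill[w] = num / max(den, 1.0)
        converged = max(abs(new_skill[w] - skill[w]) for w in workers) < tol
        skill = new_skill
        if converged:
            break
    return post, skill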

  • Aggregator: We implement MjAggregator and BayesAggregator. MjAggregator performs a majority vote to infer the final label. BayesAggregator treats the ground truth and worker skills as hidden variables and infers them from the observations (worker annotations).
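
A toy contrast between the two aggregation styles (illustrative only, not the repository's classes): majority vote counts raw votes, while the Bayesian aggregator weights each vote by the worker's inferred skill, as in the E-step of the EM sketch above.

from collections import Counter

def majority_vote(votes):
    # votes: list of (worker_id, label) pairs for one example.
    return Counter(label for _, label in votes).most_common(1)[0][0]

print(majority_vote([(0, "cat"), (1, "dog"), (2, "cat")]))  # cat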

Citation

If you use this code, please cite:

@misc{liao2021good,
      title={Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets}, 
      author={Yuan-Hong Liao and Amlan Kar and Sanja Fidler},
      year={2021},
      eprint={2104.12690},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}