
ufoym / Imbalanced Dataset Sampler

License: MIT
A PyTorch imbalanced dataset sampler for oversampling low-frequency classes and undersampling high-frequency ones.

Programming Languages

python

Projects that are alternatives to or similar to Imbalanced Dataset Sampler

Duckgoose
Utils for fast.ai course
Stars: ✭ 38 (-96.71%)
Mutual labels:  image-classification
Images Web Crawler
This package is a complete tool for creating a large dataset of images (specially designed -but not only- for machine learning enthusiasts). It can crawl the web, download images, rename / resize / convert the images and merge folders.
Stars: ✭ 51 (-95.58%)
Mutual labels:  image-classification
Meme Generator
MemeGen is a web application where the user gives an image as input and our tool generates a meme at one click for the user.
Stars: ✭ 57 (-95.06%)
Mutual labels:  image-classification
Hardhat Detector
A convolutional neural network implementation of a script that detects whether an individual is wearing a hardhat or not.
Stars: ✭ 41 (-96.45%)
Mutual labels:  image-classification
Bss distillation
Knowledge Distillation with Adversarial Samples Supporting Decision Boundary (AAAI 2019)
Stars: ✭ 51 (-95.58%)
Mutual labels:  image-classification
Divide And Co Training
[Paper 2020] Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training. Plus, an image classification toolbox includes ResNet, Wide-ResNet, ResNeXt, ResNeSt, ResNeXSt, SENet, Shake-Shake, DenseNet, PyramidNet, and EfficientNet.
Stars: ✭ 54 (-95.32%)
Mutual labels:  image-classification
Albumentations
Fast image augmentation library and an easy-to-use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
Stars: ✭ 9,353 (+709.78%)
Mutual labels:  image-classification
Global Self Attention Network
A Pytorch implementation of Global Self-Attention Network, a fully-attention backbone for vision tasks
Stars: ✭ 64 (-94.46%)
Mutual labels:  image-classification
Alpha pooling
Code for our paper "Generalized Orderless Pooling Performs Implicit Salient Matching" published at ICCV 2017.
Stars: ✭ 51 (-95.58%)
Mutual labels:  image-classification
Imagenet
Trial on kaggle imagenet object localization by yolo v3 in google cloud
Stars: ✭ 56 (-95.15%)
Mutual labels:  image-classification
Computervision Recipes
Best Practices, code samples, and documentation for Computer Vision.
Stars: ✭ 8,214 (+611.17%)
Mutual labels:  image-classification
Multidigitmnist
Combine multiple MNIST digits to create datasets with 100/1000 classes for few-shot learning/meta-learning
Stars: ✭ 48 (-95.84%)
Mutual labels:  image-classification
Mish
Official Repository for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]
Stars: ✭ 1,072 (-7.19%)
Mutual labels:  image-classification
Cv Pretrained Model
A collection of computer vision pre-trained models.
Stars: ✭ 995 (-13.85%)
Mutual labels:  image-classification
Rostensorflow
TensorFlow ImageNet demo using ROS sensor_msgs/Image
Stars: ✭ 59 (-94.89%)
Mutual labels:  image-classification
Channel Pruning
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)
Stars: ✭ 979 (-15.24%)
Mutual labels:  image-classification
Codar
✅ CODAR is a Framework built using PyTorch to analyze post (Text+Media) and predict Cyber Bullying and offensive content. 💬📷
Stars: ✭ 52 (-95.5%)
Mutual labels:  image-classification
Deep Ranking
Learning Fine-grained Image Similarity with Deep Ranking is a novel application of neural networks, where the authors use a new multi scale architecture combined with a triplet loss to create a neural network that is able to perform image search. This repository is a simplified implementation of the same
Stars: ✭ 64 (-94.46%)
Mutual labels:  image-classification
The Third Eye
An AI-based application that identifies currency and gives audio feedback.
Stars: ✭ 63 (-94.55%)
Mutual labels:  image-classification
Tensorflow Kubernetes Art Classification
Train a TensorFlow model on Kubernetes to recognize art culture based on the collection from the Metropolitan Museum of Art
Stars: ✭ 55 (-95.24%)
Mutual labels:  image-classification

Imbalanced Dataset Sampler



Introduction

In many machine learning applications, we come across datasets in which some classes appear much more often than others. In rare-disease identification, for example, there are typically far more normal samples than disease samples. In these cases, we need to make sure that the trained model is not biased toward the class that has more data. As an example, consider a dataset with 5 disease images and 20 normal images. A model that predicts every image to be normal reaches an accuracy of 80% (20 of 25), and its F1-score for the 'normal' class is about 0.89, even though it never detects a single disease case. Because such metrics look good while hiding this failure, a model trained on the data has a strong tendency to be biased toward the 'normal' class.
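The arithmetic behind these numbers can be checked with a few lines of plain Python. The labels are a hypothetical encoding of the 5 disease / 20 normal example (1 = disease, 0 = normal), `f1_for_class` is a helper written only for this sketch, and an undefined precision (no positive predictions) is treated as 0.

```python
# Hypothetical encoding of the example: 1 = disease, 0 = normal.
y_true = [1] * 5 + [0] * 20
# A degenerate model that predicts "normal" for every image.
y_pred = [0] * len(y_true)

def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Undefined precision/recall (zero denominator) is treated as 0.0 here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(f"accuracy     = {accuracy:.2f}")                        # 0.80
print(f"F1 (normal)  = {f1_for_class(y_true, y_pred, 0):.2f}")  # 0.89
print(f"F1 (disease) = {f1_for_class(y_true, y_pred, 1):.2f}")  # 0.00
```

The F1-score for 'normal' is 2 · 0.8 · 1.0 / 1.8 = 16/18, while the F1-score for 'disease' is 0: the per-class view exposes what the aggregate numbers hide.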

A widely adopted technique for this problem is resampling: removing samples from the majority class (under-sampling) and/or adding more samples from the minority class (over-sampling). Although they balance the classes, these techniques have their own weaknesses (there is no free lunch). The simplest form of over-sampling duplicates random records from the minority class, which can cause overfitting. The simplest form of under-sampling removes random records from the majority class, which can cause loss of information.
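A minimal plain-Python sketch of the two naive strategies, to make their trade-offs concrete. The function names `random_oversample` and `random_undersample` are hypothetical and not part of this repo; the toy labels reuse the 5 disease / 20 normal example.

```python
import random
from collections import Counter

def _group_by_class(samples, labels):
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return groups

def random_oversample(samples, labels, seed=0):
    """Duplicate random records until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(xs) for xs in groups.values())
    out = []
    for y, xs in groups.items():
        # Exact copies of existing records: the source of the overfitting risk.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    return out

def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class so every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(xs) for xs in groups.values())
    out = []
    for y, xs in groups.items():
        # Records that are not kept are discarded: the source of information loss.
        out.extend((x, y) for x in rng.sample(xs, target))
    return out

# Hypothetical dataset: 5 disease (1) and 20 normal (0) records, identified by index.
labels = [1] * 5 + [0] * 20
samples = list(range(len(labels)))

over = random_oversample(samples, labels)
under = random_undersample(samples, labels)
print(dict(sorted(Counter(y for _, y in over).items())))   # {0: 20, 1: 20}
print(dict(sorted(Counter(y for _, y in under).items())))  # {0: 5, 1: 5}
```

Both variants physically build a new, balanced list of records, which is the step a sampler-based approach avoids.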


In this repo, we implement an easy-to-use PyTorch sampler ImbalancedDatasetSampler that is able to

  • rebalance the class distributions when sampling from the imbalanced dataset
  • estimate the sampling weights automatically
  • avoid creating a new balanced dataset
  • mitigate overfitting when it is used in conjunction with data augmentation techniques
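The rebalancing idea can be sketched in plain Python, assuming each sample is weighted by the inverse of its class frequency (consistent with weighting samples inversely to their class probability). `inverse_frequency_weights` is a hypothetical helper, not the torchsampler implementation. Only one float per sample is computed, so no new dataset is created.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Assign each sample the weight 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

# Hypothetical labels: 900 samples of class 0 and 100 samples of class 1.
labels = [0] * 900 + [1] * 100
weights = inverse_frequency_weights(labels)  # one float per sample; the data is untouched

# Probability that a single weighted draw lands in each class.
total = sum(weights)
class_prob = Counter()
for y, w in zip(labels, weights):
    class_prob[y] += w / total
print({y: round(p, 6) for y, p in class_prob.items()})  # {0: 0.5, 1: 0.5}
```

Each class contributes a total weight of 1, so every class is equally likely to be drawn regardless of its size.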

Usage

To get started, install the package in one of the following ways from the root of a copy of the repository:

```shell
python setup.py install
pip install .
```

Pass an ImbalancedDatasetSampler as the sampler argument when creating a DataLoader. For example:

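```python
import torch
from torchsampler import ImbalancedDatasetSampler

# train_dataset, args and kwargs come from your own training script.
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    # The sampler replaces shuffle=True; DataLoader rejects both together.
    sampler=ImbalancedDatasetSampler(train_dataset),
    batch_size=args.batch_size,
    **kwargs
)
```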

In each epoch, the loader then samples across the entire dataset, weighting each sample inversely to the frequency of its class.
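For comparison, a similar effect can be reproduced with PyTorch's built-in `torch.utils.data.WeightedRandomSampler` and inverse-class-frequency weights. This is an assumed equivalent for illustration, not the torchsampler source; the toy dataset and its 900/100 split are made up.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced toy dataset: 900 samples of class 0, 100 of class 1.
labels = torch.cat([torch.zeros(900, dtype=torch.long), torch.ones(100, dtype=torch.long)])
features = torch.randn(len(labels), 8)
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse frequency of its class.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].double()

# One epoch draws len(dataset) indices with replacement according to the weights.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(dataset),
    replacement=True,
    generator=torch.Generator().manual_seed(0),
)
loader = DataLoader(dataset, sampler=sampler, batch_size=100)

seen = Counter()
for _, batch_labels in loader:
    seen.update(batch_labels.tolist())
print(seen)  # roughly balanced: about half of the draws come from each class
```

Because sampling is with replacement, minority samples are repeated within an epoch, which is why combining the sampler with data augmentation helps against overfitting.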

Example: Imbalanced MNIST Dataset

Distribution of classes in the imbalanced dataset:

With Imbalanced Dataset Sampler:

(left: test accuracy per epoch; right: confusion matrix)

Without Imbalanced Dataset Sampler:

(left: test accuracy per epoch; right: confusion matrix)

Note the significant improvements for minority classes such as 2, 6, and 9, while the accuracy of the other classes is preserved.
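Per-class accuracy is read from a confusion matrix as the diagonal entry divided by its row sum. The sketch below assumes rows are true classes and columns are predictions, and uses a made-up 3-class matrix, not the MNIST results above, to show how a high overall accuracy can coexist with a weak minority class.

```python
def per_class_accuracy(confusion):
    """Per-class accuracy (recall): diagonal entry divided by the row sum.

    Assumes rows are true classes and columns are predicted classes.
    """
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(confusion)]

# Made-up confusion matrix for illustration only.
confusion = [
    [95, 3, 2],   # class 0: frequent
    [4, 90, 6],   # class 1: frequent
    [6, 4, 10],   # class 2: rare, often misclassified
]
overall = sum(confusion[i][i] for i in range(len(confusion))) / sum(map(sum, confusion))
print(f"overall accuracy: {overall:.3f}")                       # overall accuracy: 0.886
print([round(a, 2) for a in per_class_accuracy(confusion)])     # [0.95, 0.9, 0.5]
```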

Contributing

We appreciate all contributions. If you plan to contribute bug fixes, please do so without further discussion. If you plan to contribute new features, utility functions, or extensions, please first open an issue and discuss the feature with us.

Licensing

MIT licensed.
