YyzHarry / imbalanced-regression

Licence: MIT license
[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression

Programming Languages: Python, Jupyter Notebook

Delving into Deep Imbalanced Regression

This repository contains the implementation code for the paper:
Delving into Deep Imbalanced Regression
Yuzhe Yang, Kaiwen Zha, Ying-Cong Chen, Hao Wang, Dina Katabi
38th International Conference on Machine Learning (ICML 2021), Long Oral
[Project Page] [Paper] [Video] [Blog Post]



Deep Imbalanced Regression (DIR) aims to learn from imbalanced data with continuous targets,
tackle potential missing data for certain regions, and generalize to the entire target range.

Beyond Imbalanced Classification: Brief Introduction for DIR

Existing techniques for learning from imbalanced data focus on targets with categorical indices, i.e., the targets are different classes. However, many real-world tasks involve continuous and even infinite target values. We systematically investigate Deep Imbalanced Regression (DIR), which aims to learn continuous targets from naturally imbalanced data, deal with potential missing data for certain target values, and generalize to the entire target range.

We curate and benchmark large-scale DIR datasets for common real-world tasks in computer vision, natural language processing, and healthcare domains, ranging from single-value prediction such as age, text similarity score, health condition score, to dense-value prediction such as depth.

Usage

We separate the codebase for different datasets into different subfolders. Please go into the subfolders for more information (e.g., installation, dataset preparation, training, evaluation & models).

IMDB-WIKI-DIR  |  AgeDB-DIR  |  NYUD2-DIR  |  STS-B-DIR

Highlights

(1) ✔️ New Task: Deep Imbalanced Regression (DIR)

(2) ✔️ New Techniques:

  • Label distribution smoothing (LDS)
  • Feature distribution smoothing (FDS)

(3) ✔️ New Benchmarks:

  • Computer Vision: 💡 IMDB-WIKI-DIR (age) / AgeDB-DIR (age) / NYUD2-DIR (depth)
  • Natural Language Processing: 📋 STS-B-DIR (text similarity score)
  • Healthcare: 🏥 SHHS-DIR (health condition score)

Apply LDS and FDS on Other Datasets / Models

We provide examples of how to apply LDS and FDS on other customized datasets and/or models.

LDS

To apply LDS on your customized dataset, you will first need to estimate the effective label distribution:

from collections import Counter

import numpy as np
from scipy.ndimage import convolve1d
from utils import get_lds_kernel_window

# preds, labels: [Ns,], "Ns" is the number of total samples
preds, labels = ..., ...
# assign each label to its corresponding bin (start from 0)
# with your defined get_bin_idx(), return bin_index_per_label: [Ns,] 
bin_index_per_label = [get_bin_idx(label) for label in labels]

# calculate empirical (original) label distribution: [Nb,]
# "Nb" is the number of bins
Nb = max(bin_index_per_label) + 1
num_samples_of_bins = dict(Counter(bin_index_per_label))
emp_label_dist = [num_samples_of_bins.get(i, 0) for i in range(Nb)]

# lds_kernel_window: [ks,], here for example, we use gaussian, ks=5, sigma=2
lds_kernel_window = get_lds_kernel_window(kernel='gaussian', ks=5, sigma=2)
# calculate effective label distribution: [Nb,]
# (cast the counts to float so the smoothed values are not truncated to integers)
eff_label_dist = convolve1d(np.array(emp_label_dist, dtype=float), weights=lds_kernel_window, mode='constant')
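The snippet above leaves get_bin_idx() to the user and relies on the repository's kernel helper. As a minimal, dependency-free sketch of the same idea, the code below assumes uniform-width bins over a known label range and a plain Gaussian window whose centre weight is 1. The names make_bin_indexer, gaussian_window and smooth_label_dist are hypothetical and not part of the repository, and the window values may differ from what get_lds_kernel_window returns.

```python
import math
from collections import Counter


def make_bin_indexer(label_min, label_max, bin_width):
    """Hypothetical helper: map a continuous label to a 0-based, uniform-width bin."""
    if bin_width <= 0 or label_max <= label_min:
        raise ValueError("need bin_width > 0 and label_max > label_min")
    num_bins = int(-(-(label_max - label_min) // bin_width))  # ceiling division

    def get_bin_idx(label):
        idx = int((label - label_min) // bin_width)
        return min(max(idx, 0), num_bins - 1)  # clamp out-of-range labels

    return get_bin_idx, num_bins


def gaussian_window(ks=5, sigma=2.0):
    """Symmetric Gaussian window of odd size ks, scaled so the centre weight is 1."""
    half = (ks - 1) // 2
    return [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in range(-half, half + 1)]


def smooth_label_dist(emp_label_dist, window):
    """Zero-padded 1-D convolution of per-bin counts with a symmetric window."""
    half = (len(window) - 1) // 2
    n = len(emp_label_dist)
    eff = []
    for i in range(n):
        total = 0.0
        for j, w in enumerate(window):
            k = i + j - half
            if 0 <= k < n:  # bins outside the range count as empty
                total += w * emp_label_dist[k]
        eff.append(total)
    return eff


# toy example: ages in [0, 100) with 10-year bins
get_bin_idx, Nb = make_bin_indexer(label_min=0, label_max=100, bin_width=10)
labels = [3.0, 25.5, 27.0, 29.9, 64.0, 99.0, 120.0]
bin_index_per_label = [get_bin_idx(label) for label in labels]
counts = Counter(bin_index_per_label)
emp_label_dist = [counts.get(i, 0) for i in range(Nb)]
eff_label_dist = smooth_label_dist(emp_label_dist, gaussian_window(ks=5, sigma=2.0))
```

In this toy run, bin 1 contains no samples, yet its effective count is positive because it borrows density from the neighbouring bins 0 and 2. That borrowing is what the smoothing step is meant to capture.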

With the estimated effective label distribution, one straightforward option is to use the loss re-weighting scheme:

from loss import weighted_mse_loss

# Use re-weighting based on effective label distribution, sample-wise weights: [Ns,]
eff_num_per_label = [eff_label_dist[bin_idx] for bin_idx in bin_index_per_label]
weights = [np.float32(1 / x) for x in eff_num_per_label]

# calculate loss
loss = weighted_mse_loss(preds, labels, weights=weights)
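Re-weighting amounts to scaling each sample's squared error by its weight before averaging. The sketch below illustrates this in plain Python; it is not the repository's weighted_mse_loss. The function names are hypothetical, and rescaling the weights so that they average to 1 is an assumption made here to keep the loss magnitude comparable to an unweighted MSE.

```python
def inverse_frequency_weights(eff_num_per_label):
    """Per-sample weights proportional to 1 / effective count, rescaled to mean 1."""
    if any(x <= 0 for x in eff_num_per_label):
        raise ValueError("effective counts must be positive")
    raw = [1.0 / x for x in eff_num_per_label]
    scale = len(raw) / sum(raw)  # assumption: normalise so the weights average to 1
    return [scale * w for w in raw]


def weighted_mse(preds, labels, weights):
    """Mean of weight * squared error over all samples."""
    if not (len(preds) == len(labels) == len(weights)):
        raise ValueError("preds, labels and weights must have the same length")
    return sum(w * (p - y) ** 2 for p, y, w in zip(preds, labels, weights)) / len(preds)


# toy example: the third sample lies in a sparse target region (effective count 1)
eff_num_per_label = [4.0, 4.0, 1.0]
weights = inverse_frequency_weights(eff_num_per_label)
loss = weighted_mse(
    preds=[10.0, 20.0, 30.0],
    labels=[11.0, 21.0, 33.0],
    weights=weights,
)
```

The sample from the sparse region receives four times the weight of the others, so its error dominates the loss.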

FDS

To apply FDS on your customized data/model, you will first need to define the FDS module in your network:

import torch.nn as nn
from fds import FDS

config = dict(feature_dim=..., start_update=0, start_smooth=1, kernel='gaussian', ks=5, sigma=2)

class Network(nn.Module):
    def __init__(self, **config):
        super().__init__()
        self.feature_extractor = ...
        self.regressor = nn.Linear(config['feature_dim'], 1)  # FDS operates before the final regressor
        self.FDS = FDS(**config)
        self.start_smooth = config['start_smooth']  # first epoch at which smoothing is applied

    def forward(self, inputs, labels, epoch):
        features = self.feature_extractor(inputs)  # features: [batch_size, feature_dim]
        # smooth the feature distributions over the target space (training only)
        smoothed_features = features
        if self.training and epoch >= self.start_smooth:
            smoothed_features = self.FDS.smooth(smoothed_features, labels, epoch)
        preds = self.regressor(smoothed_features)

        return {'preds': preds, 'features': features}
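To give an intuition for what a smoothing step of this kind can do, the sketch below re-calibrates one feature vector: it standardizes each dimension with the running mean and variance of the sample's target bin, then rescales with kernel-smoothed statistics. This is an illustration under stated assumptions (per-dimension statistics, a small eps for numerical stability), not the repository's FDS.smooth, which may differ in details. The name calibrate and all toy values are hypothetical.

```python
import math


def calibrate(feature, mean, var, smoothed_mean, smoothed_var, eps=1e-8):
    """Standardize with (mean, var), then rescale with the smoothed statistics."""
    out = []
    for z, m, v, sm, sv in zip(feature, mean, var, smoothed_mean, smoothed_var):
        scale = math.sqrt((sv + eps) / (v + eps))  # eps guards against zero variance
        out.append((z - m) * scale + sm)
    return out


# toy example: one sample with a 2-dimensional feature
feature = [1.0, 4.0]
calibrated = calibrate(
    feature,
    mean=[0.0, 2.0],
    var=[1.0, 4.0],
    smoothed_mean=[0.5, 2.0],
    smoothed_var=[4.0, 1.0],
)
```

When the smoothed statistics equal the running ones, the feature passes through essentially unchanged, so the step only has an effect where a bin's statistics deviate from those of its neighbours.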

During training, you will need to update the FDS statistics after each training epoch:

model = Network(**config)

for epoch in range(num_epochs):
    for (inputs, labels) in train_loader:
        # standard training pipeline
        ...

    # update FDS statistics after each training epoch
    if epoch >= config['start_update']:
        # collect features and labels for all training samples
        ...
        # training_features: [num_samples, feature_dim], training_labels: [num_samples,]
        training_features, training_labels = ..., ...
        model.FDS.update_last_epoch_stats(epoch)
        model.FDS.update_running_stats(training_features, training_labels, epoch)
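The update step consumes features and labels for the whole training set. As a rough sketch of the kind of bookkeeping involved, the code below groups features by target bin, computes a per-bin mean and variance, and blends them into running estimates with an exponential moving average. The function names and the momentum value are assumptions for illustration; in the repository this logic lives inside the FDS class and may be implemented differently.

```python
from collections import defaultdict


def per_bin_mean_var(features, bin_indices):
    """Group feature vectors by bin and return {bin: (mean, var)} per dimension."""
    groups = defaultdict(list)
    for feat, b in zip(features, bin_indices):
        groups[b].append(feat)
    stats = {}
    for b, feats in groups.items():
        n = len(feats)
        dims = range(len(feats[0]))
        mean = [sum(f[d] for f in feats) / n for d in dims]
        # population variance; a bin with a single sample gets variance 0
        var = [sum((f[d] - mean[d]) ** 2 for f in feats) / n for d in dims]
        stats[b] = (mean, var)
    return stats


def ema_update(running, current, momentum=0.9):
    """running <- momentum * running + (1 - momentum) * current, element-wise."""
    return [momentum * r + (1.0 - momentum) * c for r, c in zip(running, current)]


# toy example: three samples, 2-dimensional features, two target bins
features = [[1.0, 2.0], [3.0, 6.0], [10.0, 0.0]]
bin_indices = [0, 0, 1]
stats = per_bin_mean_var(features, bin_indices)
running_mean = ema_update([0.0, 0.0], stats[0][0], momentum=0.9)
```

A high momentum keeps the running statistics stable across epochs, at the cost of reacting more slowly to changes in the learned features.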

Updates

  • [06/2021] We provide a hands-on tutorial of DIR. Check it out!
  • [05/2021] We published a blog post on this work (a Chinese version is also available). Check it out for more details!
  • [05/2021] Paper accepted to ICML 2021 as a Long Talk. We have released the code and models. You can find all reproduced checkpoints via this link, or go into each subfolder for models for each dataset.
  • [02/2021] arXiv version posted. Please stay tuned for updates.

Citation

If you find this code or idea useful, please cite our work:

@inproceedings{yang2021delving,
  title={Delving into Deep Imbalanced Regression},
  author={Yang, Yuzhe and Zha, Kaiwen and Chen, Ying-Cong and Wang, Hao and Katabi, Dina},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2021}
}

Contact

If you have any questions, feel free to contact us through email ([email protected] & [email protected]) or GitHub issues. Enjoy!
