
ZhiningLiu1998 / Awesome Imbalanced Learning

Licence: cc0-1.0

A curated list of awesome imbalanced learning papers, code, frameworks, and libraries.

Labels

machine-learning awesome awesome-list ensemble-learning



Class imbalance (also known as the long-tail problem) means that the classes in a classification problem are not represented equally, which is quite common in practice, for instance in fraud detection, prediction of rare adverse drug reactions, and prediction of gene families. Failing to account for class imbalance often degrades the predictive performance of many classification algorithms. Imbalanced learning aims to tackle the class imbalance problem and learn an unbiased model from imbalanced data.
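To make the problem concrete, here is a minimal sketch in plain Python. The toy labels and the helper names (`imbalance_ratio`, `accuracy`, `recall`) are assumptions made for illustration and come from no library listed here. It shows how a model that always predicts the majority class can reach high accuracy while finding no minority examples at all.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions recover."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

# Hypothetical toy data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

ratio = imbalance_ratio(y_true)  # 19.0, i.e. a 19:1 imbalance
acc = accuracy(y_true, y_pred)   # 0.95, which looks good
rec = recall(y_true, y_pred)     # 0.0, every positive is missed
```

This is why work on imbalanced data usually reports minority-class metrics rather than accuracy alone.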

Inspired by awesome-machine-learning. Contributions are welcome!

Items marked with 🉑 are personally recommended (important/high-quality papers or libraries).

Table of Contents

Libraries
- Python
- R
- Java
- Scala
- Julia
Papers
Others
- Imbalanced Datasets
- Other Resources

Libraries

Python

imbalanced-learn [Github][Documentation][Paper] - imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

🉑 written in Python, easy to use.
smote_variants [Documentation][Github] - A collection of 85 minority over-sampling techniques for imbalanced learning with multi-class oversampling and model selection features (all written in Python, also supports R and Julia).

R

smote_variants [Documentation][Github] - A collection of 85 minority over-sampling techniques for imbalanced learning with multi-class oversampling and model selection features (all written in Python, also supports R and Julia).
caret [Documentation][Github] - Contains the implementation of Random under/over-sampling.
ROSE [Documentation] - Contains the implementation of ROSE (Random Over-Sampling Examples).
DMwR [Documentation] - Contains the implementation of SMOTE (Synthetic Minority Over-sampling TEchnique).

Java

KEEL [Github][Paper] - KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms (paying special attention to evolutionary algorithms) in order to assess the behavior of the algorithms. This tool includes many widely used imbalanced learning techniques such as (evolutionary) over/under-resampling, cost-sensitive learning, algorithm modification, and ensemble learning methods.

🉑 wide variety of classical classification, regression, preprocessing algorithms included.

Scala

undersampling [Documentation][Github] - A Scala library for under-sampling methods and their ensemble variants in imbalanced classification.

Julia

smote_variants [Documentation][Github] - A collection of 85 minority over-sampling techniques for imbalanced learning with multi-class oversampling and model selection features (all written in Python, also supports R and Julia).

Papers

Surveys

Learning from imbalanced data (2009, 4700+ citations) - Highly cited, classic survey paper. It systematically reviewed the popular solutions, evaluation metrics, and challenging problems in future research in this area (as of 2009).

🉑 classic work.
Learning from imbalanced data: open challenges and future directions (2016, 400+ citations) - This paper concentrates on the open issues and challenges in imbalanced learning, such as extreme class imbalance, dealing with imbalance in online/stream learning, multi-class imbalanced learning, and semi/un-supervised imbalanced learning.
Learning from class-imbalanced data: Review of methods and applications (2017, 400+ citations) - An exhaustive survey of imbalanced learning methods and applications, covering a total of 527 papers. It provides several detailed taxonomies of existing methods and describes the recent trends in this research area.

🉑 a systematic survey with detailed taxonomies of existing methods.

Deep Learning

Surveys
- A systematic study of the class imbalance problem in convolutional neural networks (2018, 330+ citations)
- Survey on deep learning with class imbalance (2019, 50+ citations)
🉑 a recent comprehensive survey of the class imbalance problem in deep learning.
Hard example mining
- Training region-based object detectors with online hard example mining (CVPR 2016, 840+ citations) - In the later phase of NN training, only do gradient back-propagation for "hard examples" (i.e., with large loss value).
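The selection step in the hard example mining entry above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation, which selects regions of interest inside an object detector. The function names and the `keep_ratio` parameter are assumptions made here.

```python
def select_hard_examples(losses, keep_ratio=0.25):
    """Return the indices of the highest-loss examples in a mini-batch."""
    n_keep = max(1, int(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:n_keep])

def hard_example_mean_loss(losses, keep_ratio=0.25):
    """Average the loss over the selected hard examples only.

    In a real training loop, only these examples would contribute gradients.
    """
    hard = select_hard_examples(losses, keep_ratio)
    return sum(losses[i] for i in hard) / len(hard)

# Hypothetical per-example losses for a batch of eight.
batch_losses = [0.05, 2.30, 0.10, 0.02, 1.70, 0.08, 0.01, 0.04]
hard_idx = select_hard_examples(batch_losses)         # the two largest losses: indices 1 and 4
mean_hard_loss = hard_example_mean_loss(batch_losses)
```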
Loss function engineering
- Training deep neural networks on imbalanced data sets (IJCNN 2016, 110+ citations) - Mean (square) false error that can equally capture classification errors from both the majority class and the minority class.
- Focal loss for dense object detection [Code (Unofficial)] (ICCV 2017, 2600+ citations) - A uniform loss function that focuses training on a sparse set of hard examples to prevent the vast number of easy negatives from overwhelming the detector during training.
  
  🉑 elegant solution, high influence.
- Deep imbalanced attribute classification using visual attention aggregation [Code] (ECCV 2018, 30+ citations)
- Imbalanced deep learning by minority class incremental rectification (TPAMI 2018, 60+ citations) - Class Rectification Loss for minimizing the dominant effect of majority classes by discovering sparsely sampled boundaries of minority classes in an iterative batch-wise learning process.
- Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss [Code] (NeurIPS 2019, 10+ citations) - A theoretically-principled label-distribution-aware margin (LDAM) loss motivated by minimizing a margin-based generalization bound.
- Gradient harmonized single-stage detector [Code] (AAAI 2019, 40+ citations) - Compared to Focal Loss, which only down-weights "easy" negative examples, GHM also down-weights "very hard" examples as they are likely to be outliers.
🉑 interesting idea: harmonizing the contribution of examples on the basis of their gradient distribution.
- Class-Balanced Loss Based on Effective Number of Samples (CVPR 2019, 70+ citations) - a simple and generic class-reweighting mechanism based on Effective Number of Samples.
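Two of the loss-engineering ideas listed above are compact enough to write out. The sketch below uses the binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), and per-class weights built from the effective number of samples, E_n = (1 - beta^n) / (1 - beta). It works on scalars in plain Python for readability. The function names are assumptions, the weights are left unnormalized, and the linked repositories should be consulted for the actual implementations.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single example.

    p is the predicted probability of the positive class (0 < p < 1),
    y is the label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing factor
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def class_balanced_weights(class_counts, beta=0.999):
    """Unnormalized per-class weights 1 / E_n, with E_n = (1 - beta^n) / (1 - beta)."""
    return [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]

# A well-classified negative contributes far less than a badly classified positive.
easy_negative = focal_loss(p=0.01, y=0)
hard_positive = focal_loss(p=0.10, y=1)

# Hypothetical class sizes: the rare class receives the larger weight.
weights = class_balanced_weights([1000, 10])
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the ordinary cross-entropy, and with `beta=0` every class gets weight 1, so both functions fall back to the unweighted baseline.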
Meta-learning
- Learning to model the tail (NIPS 2017, 70+ citations) - Transfer meta-knowledge from the data-rich classes in the head of the distribution to the data-poor classes in the tail.
- Learning to reweight examples for robust deep learning [Code] (ICML 2018, 150+ citations) - Implicitly learn a weight function to reweight the samples in gradient updates of DNN.
  
  🉑 representative work to solve the class imbalance problem through meta-learning.
- Meta-weight-net: Learning an explicit mapping for sample weighting [Code] (NeurIPS 2019) - Explicitly learn a weight function (with an MLP as the function approximator) to reweight the samples in gradient updates of DNN.
- Learning Data Manipulation for Augmentation and Weighting [Code] (NeurIPS 2019)
- Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks [Code] (ICLR 2020)
- MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler [Code] [Video] (NeurIPS 2020)
  
  🉑 meta-learning-powered ensemble learning
Representation Learning
- Learning deep representation for imbalanced classification (CVPR 2016, 220+ citations)
- Supervised Class Distribution Learning for GANs-Based Imbalanced Classification (ICDM 2019)
- Decoupling Representation and Classifier for Long-tailed Recognition (ICLR 2020)
  
  🉑 interesting findings
- Generative Modeling of Factorized Representations in Class-Imbalanced Data (NeurIPS 2020, paper not released yet)
Posterior Recalibration
- Posterior Re-calibration for Imbalanced Datasets (NeurIPS 2020)
Semi/Self-supervised Learning
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning [Code] [Video] (NeurIPS 2020)
  
  🉑 semi-supervised training / self-supervised pre-training helps imbalance learning
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning (NeurIPS 2020)
Curriculum Learning
- Dynamic Curriculum Learning for Imbalanced Data Classification (ICCV 2019)
Two-phase Training
- Brain tumor segmentation with deep neural networks (2017, 1200+ citations) - Pre-train on a balanced dataset, then fine-tune the last output layer before softmax on the original, imbalanced data.
Network Architecture
- BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition (CVPR 2020)

Ensemble Learning

General ensemble
- Self-paced Ensemble [Code] (ICDE 2020) - Self-paced Ensemble for Highly Imbalanced Massive Data Classification
  
  🉑 high performance & computational efficiency & widely applicable to different classifiers.
- MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler [Code] [Video] (NeurIPS 2020)
  
  🉑 meta-learning-powered ensemble learning
- EasyEnsemble & BalanceCascade [Code (EasyEnsemble)] [Code (BalanceCascade)] (2008, 1300+ citations) - Parallel ensemble training with RUS (EasyEnsemble) / Cascade ensemble training with RUS while iteratively dropping well-classified examples (BalanceCascade)

🉑 simple but effective solution.
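The data-splitting step behind EasyEnsemble-style training can be sketched as follows: draw several independent random under-samples of the majority class and pair each with the full minority class. Only the subset construction is shown. Training one classifier per subset and combining their outputs is left out, and `balanced_subsets` is a hypothetical helper name.

```python
import random

def balanced_subsets(labels, n_subsets=3, minority=1, seed=0):
    """Build index subsets that pair all minority examples with an
    equally sized random under-sample (RUS) of the majority class.

    Assumes the majority class is at least as large as the minority class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority_idx, len(minority_idx))  # without replacement
        subsets.append(sorted(minority_idx + sampled))
    return subsets

# Hypothetical toy labels: 20 majority (0) and 4 minority (1) examples.
labels = [0] * 20 + [1] * 4
subsets = balanced_subsets(labels, n_subsets=3)
# Each subset holds 8 indices: the 4 minority examples plus 4 sampled majority ones.
```

Because every subset sees a different slice of the majority class, the ensemble as a whole can use more of the majority data than a single under-sampled model.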
Boosting-based
- AdaBoost [Code] (1995, 18700+ citations) - Adaptive Boosting with C4.5
- DataBoost (2004, 570+ citations) - Boosting with Data Generation for Imbalanced Data
- SMOTEBoost [Code] (2003, 1100+ citations) - Synthetic Minority Over-sampling TEchnique Boosting
  
  🉑 classic work.
- MSMOTEBoost (2011, 1300+ citations) - Modified Synthetic Minority Over-sampling TEchnique Boosting
- RAMOBoost [Code] (2010, 140+ citations) - Ranked Minority Over-sampling in Boosting
- RUSBoost [Code] (2009, 850+ citations) - Random Under-Sampling Boosting
  
  🉑 classic work.
- AdaBoostNC (2012, 350+ citations) - Adaptive Boosting with Negative Correlation Learning
- EUSBoost (2013, 210+ citations) - Evolutionary Under-sampling in Boosting
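Several of the boosting variants above add a resampling step to AdaBoost's per-round reweighting of examples. As a reference point, here is a minimal sketch of that base reweighting step for binary discrete AdaBoost. It is not any of the listed variants, and the function name is an assumption.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost example reweighting.

    Returns the learner weight alpha and the normalized example weights.
    Assumes the weighted error lies strictly between 0 and 1.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1.0 - err) / err)
    updated = [
        w * math.exp(alpha if t != p else -alpha)  # up-weight mistakes, down-weight hits
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]

# Hypothetical round: ten equally weighted examples, the last two misclassified.
y_true = [1, 1, 1, 1, -1, -1, -1, -1, 1, -1]
y_pred = [1, 1, 1, 1, -1, -1, -1, -1, -1, 1]
alpha, new_weights = adaboost_reweight([0.1] * 10, y_true, y_pred)
```

After the update, the misclassified examples together carry half of the total weight, which forces the next weak learner to pay attention to them.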
Bagging-based
- Bagging [Code] (1996, 23100+ citations) - Bagging predictors
- OverBagging & UnderOverBagging & SMOTEBagging & MSMOTEBagging [Code (SMOTEBagging)] (2009, 290+ citations) - Random Over-sampling / Random Hybrid Resampling / SMOTE / Modified SMOTE with Bagging
- UnderBagging [Code] (2003, 170+ citations) - Random Under-sampling with Bagging

Data resampling

Over-sampling
- ROS [Code] - Random Over-sampling
- SMOTE [Code] (2002, 9800+ citations) - Synthetic Minority Over-sampling TEchnique
  
  🉑 classic work.
- Borderline-SMOTE [Code] (2005, 1400+ citations) - Borderline-Synthetic Minority Over-sampling TEchnique
- ADASYN [Code] (2008, 1100+ citations) - ADAptive SYNthetic Sampling
- SPIDER [Code (Java)] (2008, 150+ citations) - Selective Preprocessing of Imbalanced Data
- Safe-Level-SMOTE [Code (Java)] (2009, 370+ citations) - Safe Level Synthetic Minority Over-sampling TEchnique
- SVM-SMOTE [Code] (2009, 120+ citations) - SMOTE based on Support Vectors of SVM
- SMOTE-IPF (2015, 180+ citations) - SMOTE with Iterative-Partitioning Filter
- 85 variants of SMOTE [Code]
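The core idea of SMOTE, interpolating between a minority example and one of its nearest minority neighbours, can be sketched in plain Python. This simplified version picks base points at random instead of generating a fixed number of synthetic points per minority example, and `smote_sketch` is a hypothetical name. For real use, see the implementations linked above.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority
    examples and their k nearest minority neighbours (requires Python 3.8+)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_i = rng.randrange(len(minority))
        base = minority[base_i]
        # Indices of the k nearest other minority points.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != base_i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D minority examples.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sketch(minority_points, n_new=6)
```

Every synthetic point lies on a line segment between two existing minority points, so the new data stays inside the region the minority class already occupies.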
Under-sampling
- RUS [Code] - Random Under-sampling
- CNN [Code] (1968, 2100+ citations) - Condensed Nearest Neighbor
- ENN [Code] (1972, 1500+ citations) - Edited Nearest Neighbor
- TomekLink [Code] (1976, 870+ citations) - Tomek's modification of Condensed Nearest Neighbor
- NCR [Code] (2001, 500+ citations) - Neighborhood Cleaning Rule
- NearMiss-1 & 2 & 3 [Code] (2003, 420+ citations) - Several kNN approaches to unbalanced data distributions.
- CNN with TomekLink [Code (Java)] (2004, 2000+ citations) - Condensed Nearest Neighbor + TomekLink
- OSS [Code] (2007, 2100+ citations) - One-Sided Selection
- EUS (2009, 290+ citations) - Evolutionary Under-sampling
- IHT [Code] (2014, 130+ citations) - Instance Hardness Threshold
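As an illustration of the cleaning-style under-samplers above, the sketch below finds Tomek links: pairs of examples from different classes that are each other's nearest neighbour. It then drops the majority member of each pair. It is a brute-force version for small data, and the function names are assumptions.

```python
import math

def nearest_neighbour(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )

def tomek_links(points, labels):
    """Pairs (i, j), i < j, with different labels that are mutual nearest neighbours."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbour(i, points)
        if i < j and labels[i] != labels[j] and nearest_neighbour(j, points) == i:
            links.append((i, j))
    return links

def remove_majority_in_links(points, labels, majority=0):
    """Under-sample by dropping the majority-class member of every Tomek link."""
    drop = {
        idx
        for link in tomek_links(points, labels)
        for idx in link
        if labels[idx] == majority
    }
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]

# Hypothetical 1-D data: the points at 2.5 (majority) and 2.7 (minority) form a link.
points = [(0.0,), (1.0,), (2.5,), (2.7,), (5.0,)]
labels = [0, 0, 0, 1, 1]
links = tomek_links(points, labels)
clean_points, clean_labels = remove_majority_in_links(points, labels)
```

Removing only the majority member clears the class boundary without discarding any minority data.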
Hybrid-sampling
- SMOTE-Tomek & SMOTE-ENN (2004, 2000+ citations) [Code (SMOTE-Tomek)] [Code (SMOTE-ENN)] - Synthetic Minority Over-sampling TEchnique + Tomek's modification of Condensed Nearest Neighbor/Edited Nearest Neighbor
  
  🉑 extensive experimental evaluation involving 10 different over/under-sampling methods.
- SMOTE-RSB (2012, 210+ citations) - Hybrid Preprocessing using SMOTE and Rough Sets Theory

Cost-sensitive Learning

CSC4.5 [Code (Java)] (2002, 420+ citations) - An instance-weighting method to induce cost-sensitive trees
CSSVM [Code (Java)] (2008, 710+ citations) - Cost-sensitive SVMs for highly imbalanced classification
CSNN [Code (Java)] (2005, 950+ citations) - Training cost-sensitive neural networks with methods addressing the class imbalance problem.
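The methods above build misclassification costs into training. A simpler, generic way to see what "cost-sensitive" means is threshold moving at decision time, which is a different technique from the listed ones. Assuming correct predictions cost nothing, predicting the positive class minimizes expected cost whenever its probability is at least cost_fp / (cost_fp + cost_fn). The function names and cost values below are illustrative assumptions.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected misclassification cost,
    assuming correct predictions cost nothing."""
    return cost_fp / (cost_fp + cost_fn)

def expected_cost(p_positive, decision, cost_fp, cost_fn):
    """Expected cost of a decision given the positive-class probability."""
    if decision == 1:
        return (1.0 - p_positive) * cost_fp  # risk of a false positive
    return p_positive * cost_fn              # risk of a false negative

def cost_sensitive_predict(p_positive, cost_fp=1.0, cost_fn=9.0):
    """Predict 1 when the probability reaches the cost-derived threshold."""
    return int(p_positive >= cost_sensitive_threshold(cost_fp, cost_fn))

# Hypothetical costs: missing a positive is nine times worse than a false alarm.
threshold = cost_sensitive_threshold(1.0, 9.0)  # 0.1 instead of the usual 0.5
decision = cost_sensitive_predict(0.2)          # 1, although 0.2 is below 0.5
```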

Anomaly Detection

Anomaly Detection Learning Resources by yzhao062 - Anomaly detection related books, papers, videos, and toolboxes.
Surveys
- Anomaly detection: A survey (2009, 7300+ citations)
- A survey of network anomaly detection techniques (2017, 210+ citations)
Classification-based
- One-class SVMs for document classification (2001, 1300+ citations)
- One-class Collaborative Filtering (2008, 830+ citations)
- Isolation Forest (2008, 1000+ citations)
- Anomaly Detection using One-Class Neural Networks (2018, 70+ citations)
- Anomaly Detection with Robust Deep Autoencoders (KDD 2017, 170+ citations)

Others

1. Imbalanced Datasets

ID	Name	Repository & Target	Ratio	#Samples	#Features
1	ecoli	UCI, target: imU	8.6:1	336	7
2	optical_digits	UCI, target: 8	9.1:1	5,620	64
3	satimage	UCI, target: 4	9.3:1	6,435	36
4	pen_digits	UCI, target: 5	9.4:1	10,992	16
5	abalone	UCI, target: 7	9.7:1	4,177	10
6	sick_euthyroid	UCI, target: sick euthyroid	9.8:1	3,163	42
7	spectrometer	UCI, target: >=44	11:1	531	93
8	car_eval_34	UCI, target: good, v good	12:1	1,728	21
9	isolet	UCI, target: A, B	12:1	7,797	617
10	us_crime	UCI, target: >0.65	12:1	1,994	100
11	yeast_ml8	LIBSVM, target: 8	13:1	2,417	103
12	scene	LIBSVM, target: >one label	13:1	2,407	294
13	libras_move	UCI, target: 1	14:1	360	90
14	thyroid_sick	UCI, target: sick	15:1	3,772	52
15	coil_2000	KDD, CoIL, target: minority	16:1	9,822	85
16	arrhythmia	UCI, target: 06	17:1	452	278
17	solar_flare_m0	UCI, target: M->0	19:1	1,389	32
18	oil	UCI, target: minority	22:1	937	49
19	car_eval_4	UCI, target: vgood	26:1	1,728	21
20	wine_quality	UCI, wine, target: <=4	26:1	4,898	11
21	letter_img	UCI, target: Z	26:1	20,000	16
22	yeast_me2	UCI, target: ME2	28:1	1,484	8
23	webpage	LIBSVM, w7a, target: minority	33:1	34,780	300
24	ozone_level	UCI, ozone, data	34:1	2,536	72
25	mammography	UCI, target: minority	42:1	11,183	6
26	protein_homo	KDD CUP 2004, minority	111:1	145,751	74
27	abalone_19	UCI, target: 19	130:1	4,177	10

Note: This collection of datasets is from imblearn.datasets.fetch_datasets.
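A minimal sketch of loading one of these datasets, assuming imbalanced-learn is installed and the data can be downloaded on first use. `load_imbalanced` and `class_counts` are hypothetical helper names. The import sits inside the function so that the counting helper also works without the library.

```python
from collections import Counter

def load_imbalanced(name="ecoli"):
    """Fetch one benchmark dataset by its name from the table above."""
    # Imported lazily: requires imbalanced-learn and, on first call, network access.
    from imblearn.datasets import fetch_datasets
    bunch = fetch_datasets(filter_data=(name,))[name]
    return bunch.data, bunch.target

def class_counts(targets):
    """Count how many examples carry each label."""
    return dict(Counter(targets))

# Example usage (not executed here because it downloads data):
#   X, y = load_imbalanced("ecoli")
#   print(X.shape, class_counts(y.tolist()))
```

Each returned target vector is binary, with the class named in the "Repository & Target" column acting as the minority class.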

2. Imbalanced Databases

https://github.com/gykovacs/mldb

This repository contains 140+ KEEL datasets:

https://github.com/gykovacs/mldb/tree/master/mldb/data/classification

Other Resources

Code
- imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data.
- imbalanced-dataset-sampler - A (PyTorch) imbalanced dataset sampler for oversampling low frequent classes and undersampling high frequent ones.
- class_imbalance - Jupyter Notebook presentation for class imbalance in binary classification.
- Multi-class-with-imbalanced-dataset-classification - Perform multi-class classification on imbalanced 20-news-group dataset.
Paper list
- Paper-list-on-Imbalanced-Time-series-Classification-with-Deep-Learning
Slides
- acm_imbalanced_learning - slides and code for the ACM Imbalanced Learning talk on 27th April 2016 in Austin, TX.
