ZhiningLiu1998 / Awesome Imbalanced Learning

Licence: cc0-1.0
A curated list of awesome imbalanced learning papers, code, frameworks, and libraries.

Class imbalance (also known as the long-tail problem) occurs when the classes in a classification problem are not represented equally, which is quite common in practice: for instance, in fraud detection, prediction of rare adverse drug reactions, and prediction of gene families. Failure to account for class imbalance often degrades the predictive performance of many classification algorithms. Imbalanced learning aims to tackle the class imbalance problem and learn an unbiased model from imbalanced data.
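
To make the performance point concrete, the sketch below scores a trivial classifier that always predicts the majority class. The 95:5 label split and the helper names are assumptions chosen for illustration; they are not taken from any paper or library listed here.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of samples of class `cls` that were predicted as `cls`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical data: 95 majority samples (0) and 5 minority samples (1).
y_true = [0] * 95 + [1] * 5
# A "classifier" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(recall(y_true, y_pred, cls=1))      # 0.0
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy looks strong even though no minority sample is ever detected, which is why imbalance-aware metrics such as per-class recall or balanced accuracy are usually reported alongside it.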

Inspired by awesome-machine-learning. Contributions are welcome!

Items marked with 🉑 are personally recommended (important/high-quality papers or libraries).

Libraries

Python

  • imbalanced-learn [Github][Documentation][Paper] - imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects.

    🉑 written in Python, easy to use.

  • smote_variants [Documentation][Github] - A collection of 85 minority over-sampling techniques for imbalanced learning with multi-class oversampling and model selection features (all written in Python; also supports R and Julia).

R

  • smote_variants [Documentation][Github] - A collection of 85 minority over-sampling techniques for imbalanced learning with multi-class oversampling and model selection features (all written in Python; also supports R and Julia).
  • caret [Documentation][Github] - Contains the implementation of Random under/over-sampling.
  • ROSE [Documentation] - Contains the implementation of ROSE (Random Over-Sampling Examples).
  • DMwR [Documentation] - Contains the implementation of SMOTE (Synthetic Minority Over-sampling TEchnique).

Java

  • KEEL [Github][Paper] - KEEL provides a simple GUI based on data flow to design experiments with different datasets and computational intelligence algorithms (paying special attention to evolutionary algorithms) in order to assess the behavior of the algorithms. This tool includes many widely used imbalanced learning techniques such as (evolutionary) over/under-resampling, cost-sensitive learning, algorithm modification, and ensemble learning methods.

    🉑 wide variety of classical classification, regression, preprocessing algorithms included.

Scala

Julia

  • smote_variants [Documentation][Github] - A collection of 85 minority over-sampling techniques for imbalanced learning with multi-class oversampling and model selection features (all written in Python; also supports R and Julia).

Papers

Surveys

  • Learning from imbalanced data (2009, 4700+ citations) - Highly cited, classic survey paper. It systematically reviewed the popular solutions, evaluation metrics, and challenging problems in future research in this area (as of 2009).

    🉑 classic work.

  • Learning from imbalanced data: open challenges and future directions (2016, 400+ citations) - This paper concentrates on the open issues and challenges in imbalanced learning, such as extreme class imbalance, dealing with imbalance in online/stream learning, multi-class imbalanced learning, and semi-supervised/unsupervised imbalanced learning.

  • Learning from class-imbalanced data: Review of methods and applications (2017, 400+ citations) - A recent exhaustive survey of imbalanced learning methods and applications, covering a total of 527 papers. It provides several detailed taxonomies of existing methods and describes recent trends in this research area.

    🉑 a systematic survey with detailed taxonomies of existing methods.

Deep Learning

Ensemble Learning

Data resampling

  • Over-sampling

    • ROS [Code] - Random Over-sampling

    • SMOTE [Code] (2002, 9800+ citations) - Synthetic Minority Over-sampling TEchnique

      🉑 classic work.

    • Borderline-SMOTE [Code] (2005, 1400+ citations) - Borderline-Synthetic Minority Over-sampling TEchnique

    • ADASYN [Code] (2008, 1100+ citations) - ADAptive SYNthetic Sampling

    • SPIDER [Code (Java)] (2008, 150+ citations) - Selective Preprocessing of Imbalanced Data

    • Safe-Level-SMOTE [Code (Java)] (2009, 370+ citations) - Safe Level Synthetic Minority Over-sampling TEchnique

    • SVM-SMOTE [Code] (2009, 120+ citations) - SMOTE based on Support Vectors of SVM

    • SMOTE-IPF (2015, 180+ citations) - SMOTE with Iterative-Partitioning Filter

    • 85 variants of SMOTE [Code]

  • Under-sampling

    • RUS [Code] - Random Under-sampling
    • CNN [Code] (1968, 2100+ citations) - Condensed Nearest Neighbor
    • ENN [Code] (1972, 1500+ citations) - Edited Nearest Neighbor
    • TomekLink [Code] (1976, 870+ citations) - Tomek's modification of Condensed Nearest Neighbor
    • NCR [Code] (2001, 500+ citations) - Neighborhood Cleaning Rule
    • NearMiss-1 & 2 & 3 [Code] (2003, 420+ citations) - Several kNN approaches to unbalanced data distributions.
    • CNN with TomekLink [Code (Java)] (2004, 2000+ citations) - Condensed Nearest Neighbor + TomekLink
    • OSS [Code] (2007, 2100+ citations) - One-Sided Selection
    • EUS (2009, 290+ citations) - Evolutionary Under-sampling
    • IHT [Code] (2014, 130+ citations) - Instance Hardness Threshold
  • Hybrid-sampling

    • SMOTE-Tomek & SMOTE-ENN (2004, 2000+ citations) [Code (SMOTE-Tomek)] [Code (SMOTE-ENN)] - Synthetic Minority Over-sampling TEchnique + Tomek's modification of Condensed Nearest Neighbor/Edited Nearest Neighbor

      🉑 extensive experimental evaluation involving 10 different over/under-sampling methods.

    • SMOTE-RSB (2012, 210+ citations) - Hybrid Preprocessing using SMOTE and Rough Sets Theory
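
As a rough illustration of how SMOTE-style over-sampling creates new minority samples, the sketch below interpolates between a minority point and one of its nearest minority neighbours. It is a simplified, pure-Python sketch and not the reference implementation from the SMOTE paper or from imbalanced-learn: the function name `smote_sketch`, its parameters, and the choice to pick base points at random are assumptions made for illustration.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic points by interpolating between minority samples.

    `minority` is a list of equal-length numeric tuples (hypothetical input format).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base` (squared Euclidean distance),
        # excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the line segment from `base` towards `neighbour`.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D minority class with four samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
for point in smote_sketch(minority, n_new=3, k=2):
    print(point)
```

Every synthetic point lies on a segment between two existing minority samples, which is what separates this family of methods from random over-sampling (ROS), where existing samples are simply duplicated.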

Cost-sensitive Learning

  • CSC4.5 [Code (Java)] (2002, 420+ citations) - An instance-weighting method to induce cost-sensitive trees
  • CSSVM [Code (Java)] (2008, 710+ citations) - Cost-sensitive SVMs for highly imbalanced classification
  • CSNN [Code (Java)] (2005, 950+ citations) - Training cost-sensitive neural networks with methods addressing the class imbalance problem.
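
The common idea behind instance-weighting approaches can be sketched with a simple heuristic: weight each class inversely to its frequency so that errors on rare classes cost more. The code below is only an illustration of that idea and is not the algorithm of any paper listed above; the function names and the 90:10 label split are assumptions.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Error rate in which each sample counts with the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical labels: 90 majority samples (0) and 10 minority samples (1).
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # always predict the majority class

weights = balanced_class_weights(y_true)
plain = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                                             # 0.1
print(round(weighted_error(y_true, y_pred, weights), 3))  # 0.5
```

With these weights both classes contribute equally to the total cost, so ignoring the minority class is no longer a cheap strategy for the learner.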

Anomaly Detection

Others

1. Imbalanced Datasets

| ID | Name | Repository & Target | Ratio | #S (samples) | #F (features) |
|----|------|---------------------|-------|--------------|---------------|
| 1 | ecoli | UCI, target: imU | 8.6:1 | 336 | 7 |
| 2 | optical_digits | UCI, target: 8 | 9.1:1 | 5,620 | 64 |
| 3 | satimage | UCI, target: 4 | 9.3:1 | 6,435 | 36 |
| 4 | pen_digits | UCI, target: 5 | 9.4:1 | 10,992 | 16 |
| 5 | abalone | UCI, target: 7 | 9.7:1 | 4,177 | 10 |
| 6 | sick_euthyroid | UCI, target: sick euthyroid | 9.8:1 | 3,163 | 42 |
| 7 | spectrometer | UCI, target: >=44 | 11:1 | 531 | 93 |
| 8 | car_eval_34 | UCI, target: good, v good | 12:1 | 1,728 | 21 |
| 9 | isolet | UCI, target: A, B | 12:1 | 7,797 | 617 |
| 10 | us_crime | UCI, target: >0.65 | 12:1 | 1,994 | 100 |
| 11 | yeast_ml8 | LIBSVM, target: 8 | 13:1 | 2,417 | 103 |
| 12 | scene | LIBSVM, target: >one label | 13:1 | 2,407 | 294 |
| 13 | libras_move | UCI, target: 1 | 14:1 | 360 | 90 |
| 14 | thyroid_sick | UCI, target: sick | 15:1 | 3,772 | 52 |
| 15 | coil_2000 | KDD, CoIL, target: minority | 16:1 | 9,822 | 85 |
| 16 | arrhythmia | UCI, target: 06 | 17:1 | 452 | 278 |
| 17 | solar_flare_m0 | UCI, target: M->0 | 19:1 | 1,389 | 32 |
| 18 | oil | UCI, target: minority | 22:1 | 937 | 49 |
| 19 | car_eval_4 | UCI, target: vgood | 26:1 | 1,728 | 21 |
| 20 | wine_quality | UCI, wine, target: <=4 | 26:1 | 4,898 | 11 |
| 21 | letter_img | UCI, target: Z | 26:1 | 20,000 | 16 |
| 22 | yeast_me2 | UCI, target: ME2 | 28:1 | 1,484 | 8 |
| 23 | webpage | LIBSVM, w7a, target: minority | 33:1 | 34,780 | 300 |
| 24 | ozone_level | UCI, ozone, data | 34:1 | 2,536 | 72 |
| 25 | mammography | UCI, target: minority | 42:1 | 11,183 | 6 |
| 26 | protein_homo | KDD CUP 2004, minority | 111:1 | 145,751 | 74 |
| 27 | abalone_19 | UCI, target: 19 | 130:1 | 4,177 | 10 |

Note: This collection of datasets is from imblearn.datasets.fetch_datasets.
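
A minimal sketch of loading one of these datasets and recomputing the Ratio, #S, and #F columns is shown below. It assumes imbalanced-learn is installed and that the machine can download the data on first use; `fetch_datasets` and its `filter_data` argument come from imbalanced-learn, while the helper names `load_imbalanced` and `summarize` are hypothetical.

```python
from collections import Counter


def load_imbalanced(name):
    """Fetch one benchmark dataset by name (downloads the collection on first call)."""
    # Imported lazily so this module can be loaded without imbalanced-learn installed.
    from imblearn.datasets import fetch_datasets

    bunch = fetch_datasets(filter_data=(name,))[name]
    return bunch.data, bunch.target


def summarize(X, y):
    """Return sample count, feature count, and majority:minority ratio."""
    counts = Counter(y)
    ratio = max(counts.values()) / min(counts.values())
    return {
        "n_samples": len(y),
        "n_features": len(X[0]),
        "ratio": round(ratio, 1),
    }
```

Calling `summarize(*load_imbalanced("ecoli"))` should give values comparable to the first row of the table, assuming the download succeeds.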

2. Imbalanced Databases

https://github.com/gykovacs/mldb

This repository contains 140+ KEEL datasets:

https://github.com/gykovacs/mldb/tree/master/mldb/data/classification

Other Resources