
damian-horna / multi-imbalance

Licence: MIT license
Python package for tackling multi-class imbalance problems. http://www.cs.put.poznan.pl/mlango/publications/multiimbalance/

Programming Languages

python

Projects that are alternatives of or similar to multi-imbalance

Machine Learning
A repository of resources for understanding the concepts of machine learning/deep learning.
Stars: ✭ 29 (-56.06%)
Mutual labels:  preprocessing, smote, bagging
Statistical-Learning-using-R
This is a Statistical Learning application which will consist of various Machine Learning algorithms and their implementation in R done by me and their in-depth interpretation. Documents and reports related to the below mentioned techniques can be found on my Rpubs profile.
Stars: ✭ 27 (-59.09%)
Mutual labels:  decision-trees, bagging
SMOTE
Synthetic Minority Over-sampling Technique
Stars: ✭ 28 (-57.58%)
Mutual labels:  smote, oversampling
Class-Imbalance
Dealing with class imbalance problem in machine learning. Synthetic oversampling(SMOTE, ADASYN).
Stars: ✭ 31 (-53.03%)
Mutual labels:  smote, oversampling
interpretable-ml
Techniques & resources for training interpretable ML models, explaining ML models, and debugging ML models.
Stars: ✭ 17 (-74.24%)
Mutual labels:  decision-trees
Voice4Rural
A complete one stop solution for all the problems of Rural area people. 👩🏻‍🌾
Stars: ✭ 12 (-81.82%)
Mutual labels:  decision-trees
Parametric-Contrastive-Learning
Parametric Contrastive Learning (ICCV2021)
Stars: ✭ 155 (+134.85%)
Mutual labels:  class-imbalance
dsp
DSP and filtering library
Stars: ✭ 36 (-45.45%)
Mutual labels:  resampling
preprocessy
Python package for Customizable Data Preprocessing Pipelines
Stars: ✭ 34 (-48.48%)
Mutual labels:  preprocessing
T-Reqs
T-Reqs is a multi-language requirements file generator which also serves the purpose of preparing a template Dockerfile for working with Docker applications.
Stars: ✭ 18 (-72.73%)
Mutual labels:  python-package
Face-Landmarking
Real time face landmarking using decision trees and NN autoencoders
Stars: ✭ 73 (+10.61%)
Mutual labels:  decision-trees
business-rules-motor-insurance
Hyperon - Motor Insurance Demo App. This is a sample application to demonstrate capabilities of Hyperon.io library (Java Business Rules Engine (BRE)/Java Pricing Engine). The application demonstrates responsive quotations for Car/Motor Insurance based on decision tables and Rhino functions (for math calculations). It shows different possible bus…
Stars: ✭ 16 (-75.76%)
Mutual labels:  decision-trees
auto-pairs
Vim plugin, insert or delete brackets, parentheses, and quotes in pairs
Stars: ✭ 109 (+65.15%)
Mutual labels:  balancing
goscore
Go Scoring API for PMML
Stars: ✭ 85 (+28.79%)
Mutual labels:  decision-trees
mesa
NeurIPS’20 | Build powerful ensemble class-imbalanced learning models via meta-knowledge-powered resampler. | Designing a meta-knowledge-driven sampler to solve the class imbalance problem
Stars: ✭ 88 (+33.33%)
Mutual labels:  class-imbalance
stackgbm
🌳 Stacked Gradient Boosting Machines
Stars: ✭ 24 (-63.64%)
Mutual labels:  decision-trees
mlr3spatiotempcv
Spatiotemporal resampling methods for mlr3
Stars: ✭ 43 (-34.85%)
Mutual labels:  resampling
ballbot
Firmware for self balancing ballbot
Stars: ✭ 11 (-83.33%)
Mutual labels:  balancing
The-Supervised-Learning-Workshop
An Interactive Approach to Understanding Supervised Learning Algorithms
Stars: ✭ 24 (-63.64%)
Mutual labels:  decision-trees
NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
Stars: ✭ 797 (+1107.58%)
Mutual labels:  preprocessing


multi-imbalance

Multi-class imbalance is a common problem in real-world supervised classification tasks. While there has already been some research on specialized methods for this challenging problem, most of them still lack a coherent Python implementation that is simple, intuitive and easy to use. multi-imbalance is a Python package that tackles multi-class imbalanced datasets in machine learning.
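A quick way to check whether a labelled dataset is multi-class imbalanced is to count the examples in each class. The sketch below uses only the standard library; the helper names and the choice of a largest-to-smallest class ratio as the imbalance measure are illustrative assumptions, not part of the multi-imbalance API.

```python
from collections import Counter


def class_distribution(y):
    """Return (label, count) pairs sorted from the largest class to the smallest."""
    return Counter(y).most_common()


def imbalance_ratio(y):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels: one majority class and two minority classes.
y = ["a"] * 80 + ["b"] * 15 + ["c"] * 5
print(class_distribution(y))  # [('a', 80), ('b', 15), ('c', 5)]
print(imbalance_ratio(y))     # 16.0
```

A ratio close to 1 means the classes are roughly balanced; with several minority classes it is also worth looking at the full distribution, since the ratio only compares the two extremes.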

Requirements

The package has been tested under Python 3.6, 3.7 and 3.8. It relies heavily on scikit-learn and the typical scientific stack (numpy, scipy, pandas, etc.). Requirements include:

  • numpy>=1.17.0,
  • scikit-learn>=0.22.0,
  • pandas>=0.25.1,
  • pytest>=5.1.2,
  • imbalanced-learn>=0.6.1,
  • IPython>=7.13.0,
  • seaborn>=0.10.1,
  • matplotlib>=3.2.1

Installation

Just type in

pip install multi-imbalance

Implemented algorithms

Our package includes implementations of the following algorithms:

  • One-vs-One (OVO) and One-vs-All (OVA) ensembles [2],
  • Error-Correcting Output Codes (ECOC) [1] with dense, sparse and complete encoding [9],
  • Global-CS [4],
  • Static-SMOTE [10],
  • Mahalanobis Distance Oversampling [3],
  • Similarity-based Oversampling and Undersampling Preprocessing (SOUP) [5],
  • SPIDER3 cost-sensitive pre-processing [8],
  • Multi-class Roughly Balanced Bagging (MRBB) [7],
  • SOUP Bagging [6].
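The OVO and OVA ensembles [2] start by decomposing a multi-class problem into binary ones. A rough, standard-library-only sketch of that decomposition step follows; the helpers ova_tasks and ovo_tasks are hypothetical and not the package's API. OVA builds one binary task per class, while OVO builds one per pair of classes, k(k-1)/2 in total for k classes. The package's ensembles additionally train a base classifier per task and combine their outputs, which this sketch leaves out.

```python
from itertools import combinations


def ova_tasks(y):
    """One-vs-All: one binary label vector per class (1 = that class, 0 = the rest)."""
    classes = sorted(set(y))
    return {c: [1 if label == c else 0 for label in y] for c in classes}


def ovo_tasks(y):
    """One-vs-One: one binary task per pair of classes.

    Each task keeps only the indices of examples from the two classes of the pair,
    labelled 1 for the first class and 0 for the second.
    """
    classes = sorted(set(y))
    tasks = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, label in enumerate(y) if label in (a, b)]
        tasks[(a, b)] = (idx, [1 if y[i] == a else 0 for i in idx])
    return tasks


y = ["a", "a", "a", "b", "b", "c"]
print(len(ova_tasks(y)))         # 3, one task per class
print(len(ovo_tasks(y)))         # 3, one task per pair of classes
print(ovo_tasks(y)[("b", "c")])  # ([3, 4, 5], [1, 1, 0])
```

Note that OVO tasks are built on subsets of the data, so each binary problem can have a very different imbalance than the original multi-class one.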

Example usage

import numpy as np
from sklearn.tree import DecisionTreeClassifier

from multi_imbalance.resampling.mdo import MDO

# Mahalanobis Distance Oversampling
mdo = MDO(k=9, k1_frac=0, seed=0)

# read the data
X_train, y_train, X_test, y_test = ...

# preprocess: oversample the minority classes of the training set only
X_train_resampled, y_train_resampled = mdo.fit_transform(np.copy(X_train), np.copy(y_train))

# train the classifier on preprocessed data
clf_tree = DecisionTreeClassifier(random_state=0)
clf_tree.fit(X_train_resampled, y_train_resampled)

# make predictions on the untouched test set
y_pred = clf_tree.predict(X_test)
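Roughly speaking, MDO [3] generates new minority examples that lie at the same Mahalanobis distance from their class mean as the examples they are created from. For intuition about that distance only, the NumPy sketch below computes it for every example of one class. mahalanobis_distances is a hypothetical helper, not part of multi-imbalance, and using a pseudo-inverse of the covariance is an assumption made so that singular covariance matrices do not raise.

```python
import numpy as np


def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X from the mean of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)  # features are columns
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariance
    diff = X - mean
    # Row-wise quadratic form: diff_i^T * inv_cov * diff_i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


# Hypothetical minority class with four examples.
X_minority = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(mahalanobis_distances(X_minority))  # every entry equals sqrt(1.5)
```

Unlike the Euclidean distance, this measure accounts for the spread and correlation of the class, so rescaling a feature leaves the distances unchanged.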

Example usage with a pipeline

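At the moment, because of limitations in scikit-learn (its Pipeline expects every intermediate step to transform only X, so a step cannot change the labels y), the only way to use our resampling methods in a pipeline is through the Pipeline implemented in imbalanced-learn. This restriction does not apply to the ensemble methods.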

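from imblearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.preprocessing import StandardScaler

from multi_imbalance.resampling.mdo import MDO
from multi_imbalance.utils.data import load_arff_dataset

X, y = load_arff_dataset('data/arff/new_ecoli.arff')
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# MDO is applied only while fitting; predict() skips the resampling step
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('mdo', MDO()),
    ('knn', KNN())
])

pipeline.fit(X_train, y_train)
y_hat = pipeline.predict(X_test)

print(classification_report(y_test, y_hat))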
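The key difference from scikit-learn's Pipeline is that imbalanced-learn's Pipeline accepts samplers exposing fit_resample(X, y) and applies them only when fitting, never when predicting. The toy classes below sketch that behaviour in plain NumPy. RandomOversampler, ToyPipeline and CountingClassifier are hypothetical illustrations, not classes of imbalanced-learn or multi-imbalance, and the oversampler simply replicates random rows rather than implementing MDO.

```python
import numpy as np


class RandomOversampler:
    """Hypothetical sampler: replicates random rows of each class until every
    class matches the largest class size. Exposes fit_resample(X, y)."""

    def __init__(self, seed=0):
        self.seed = seed

    def fit_resample(self, X, y):
        rng = np.random.default_rng(self.seed)
        classes, counts = np.unique(y, return_counts=True)
        target = counts.max()
        X_parts, y_parts = [X], [y]
        for cls, count in zip(classes, counts):
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - count, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
        return np.concatenate(X_parts), np.concatenate(y_parts)


class ToyPipeline:
    """Runs samplers only in fit(); predict() uses just the transformer steps,
    so test data is never resampled."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for step in self.steps[:-1]:
            if hasattr(step, "fit_resample"):
                X, y = step.fit_resample(X, y)  # may change both X and y
            else:
                X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            if not hasattr(step, "fit_resample"):
                X = step.transform(X)
        return self.steps[-1].predict(X)


class CountingClassifier:
    """Hypothetical final step: records training class counts and predicts
    the most frequent class."""

    def fit(self, X, y):
        self.classes_, self.counts_ = np.unique(y, return_counts=True)
        return self

    def predict(self, X):
        return np.full(len(X), self.classes_[np.argmax(self.counts_)])


X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 0, 0, 1, 2])
pipe = ToyPipeline([RandomOversampler(seed=0), CountingClassifier()]).fit(X, y)
print(pipe.steps[-1].counts_)  # [4 4 4]: the classifier saw balanced data
print(pipe.predict(X).shape)   # (6,): prediction is made on all 6 original rows
```

Skipping samplers at prediction time matters because resampling the test set would change the data the model is evaluated on.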

For more examples, please refer to https://multi-imbalance.readthedocs.io/en/latest/ or check the examples directory.

For developers:

multi-imbalance follows scikit-learn's coding guidelines: https://scikit-learn.org/stable/developers/contributing.html

We use pytest as our unit-testing framework. To use it, simply run:

pytest
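pytest discovers files named test_*.py (or *_test.py) and runs the functions in them whose names start with test, reporting any plain assert that fails. A minimal test module in that shape could look like the sketch below; class_counts is a hypothetical helper defined inline only so that the example is self-contained, not a function of the package.

```python
# test_class_counts.py: the file name lets pytest collect it automatically.
from collections import Counter


def class_counts(y):
    """Hypothetical helper under test: number of examples in each class."""
    return dict(Counter(y))


def test_counts_every_class():
    assert class_counts([0, 0, 1, 2, 2, 2]) == {0: 2, 1: 1, 2: 3}


def test_empty_labels_give_empty_counts():
    assert class_counts([]) == {}
```

In the repository itself, tests would import the code under test from the multi_imbalance package instead of defining it in the test file.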

If you would like to check the code coverage:

coverage run -m pytest
coverage report -m # or coverage html

multi-imbalance uses reStructuredText for docstrings. To build the documentation locally, run:

cd docs
make html -B

and open docs/_build/html/index.html

If you add a new algorithm, we would appreciate it if you included references and an example of use in ./examples or in the docstrings.
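As an illustration of a reStructuredText docstring that carries a reference and an example, here is a sketch using Sphinx-style field lists and a doctest. The function complete_code_matrix is hypothetical: it builds an exhaustive (complete) ECOC code matrix with 2^(k-1) - 1 columns for k classes, in the spirit of [1] and [9], but it is not the package's ECOC implementation, and the docstring layout the maintainers prefer may differ.

```python
def complete_code_matrix(n_classes):
    """Build a complete (exhaustive) ECOC code matrix.

    Rows correspond to classes and columns to binary dichotomies. Every column
    puts class 0 on the positive side, which excludes complementary and
    all-ones columns and leaves ``2 ** (n_classes - 1) - 1`` columns.

    :param n_classes: number of classes, at least 2
    :type n_classes: int
    :return: code matrix with one row (code word) per class
    :rtype: list[list[int]]
    :raises ValueError: if ``n_classes`` is smaller than 2

    References:
    Dietterich, T., and Bakiri, G. Solving multi-class learning problems via
    error-correcting output codes. Journal of Artificial Intelligence
    Research 2 (1995), 263-286.

    Example:

    >>> complete_code_matrix(3)
    [[1, 1, 1], [0, 1, 0], [0, 0, 1]]
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    n_columns = 2 ** (n_classes - 1) - 1
    # Column m: class 0 is always 1, classes 1..k-1 take the bits of m.
    columns = [
        [1] + [(m >> j) & 1 for j in range(n_classes - 1)]
        for m in range(n_columns)
    ]
    # Transpose so that each row is the code word of one class.
    return [list(row) for row in zip(*columns)]
```

The doctest in the Example section can be checked with python -m doctest on the module, so the documented output stays in sync with the code.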

About

If you use multi-imbalance in a scientific publication, please consider citing the following paper:

@InProceedings{10.1007/978-3-030-67670-4_36,
    author="Grycza, Jacek and Horna, Damian and Klimczak, Hanna and Lango, Mateusz and Pluci{\'{n}}ski, Kamil and Stefanowski, Jerzy",
    title="multi-imbalance: Open Source Python Toolbox for Multi-class Imbalanced Classification",
    booktitle="Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track",
    year="2021",
    publisher="Springer International Publishing",
    address="Cham",
    pages="546--549",
    isbn="978-3-030-67670-4"
}

References:

[1] Dietterich, T., and Bakiri, G. Solving multi-class learning problems via error-correcting output codes. Journal of Artificial Intelligence Research 2 (1995), 263–286.

[2] Fernández, A., López, V., Galar, M., del Jesus, M., and Herrera, F. Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowledge-Based Systems 42 (2013), 97–110.

[3] Abdi, L., and Hashemi, S. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Transactions on Knowledge and Data Engineering 28 (January 2016), 238–251.

[4] Zhou, Z., and Liu, X. On multi-class cost-sensitive learning. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1 (2006), AAAI’06, AAAI Press, pp. 567–572.

[5] Janicka, M., Lango, M., and Stefanowski, J. Using information on class interrelations to improve classification of multi-class imbalanced data: A new resampling algorithm. International Journal of Applied Mathematics and Computer Science 29 (December 2019).

[6] Lango, M., and Stefanowski, J. SOUP-Bagging: a new approach for multi-class imbalanced data classification. PP-RAI ’19: Polskie Porozumienie na Rzecz Sztucznej Inteligencji (2019).

[7] Lango, M., and Stefanowski, J. Multi-class and feature selection extensions of roughly balanced bagging for imbalanced data. Journal of Intelligent Information Systems 50 (2017), 97–127.

[8] Wojciechowski, S., Wilk, S., and Stefanowski, J. An algorithm for selective preprocessing of multi-class imbalanced data. In Proceedings of the 10th International Conference on Computer Recognition Systems (2017), pp. 238–247.

[9] Kuncheva, L. Combining Pattern Classifiers: Methods and Algorithms. Wiley (2004).

[10] Fernández-Navarro, F., Hervás-Martínez, C., and Gutiérrez, P. A. A dynamic over-sampling procedure based on sensitivity for multi-class problems. Pattern Recognition 44, 8 (2011), 1821–1833.
