
nickkunz / smogn

License: GPL-3.0
Synthetic Minority Over-Sampling Technique for Regression

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to smogn

imbalanced-regression
[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression
Stars: ✭ 425 (+78.57%)
Mutual labels:  regression, imbalanced-data
Machine Learning
A repository of resources for understanding the concepts of machine learning/deep learning.
Stars: ✭ 29 (-87.82%)
Mutual labels:  imbalanced-data, smote
ml-book
Source code and errata for my book "A tu per tu col Machine Learning"
Stars: ✭ 16 (-93.28%)
Mutual labels:  regression
FixedEffectjlr
R interface for Fixed Effect Models
Stars: ✭ 20 (-91.6%)
Mutual labels:  regression
SimpleGP
Simple Genetic Programming for Symbolic Regression in Python3
Stars: ✭ 20 (-91.6%)
Mutual labels:  regression
ResLT
ResLT: Residual Learning for Long-tailed Recognition (TPAMI 2022)
Stars: ✭ 40 (-83.19%)
Mutual labels:  imbalanced-data
SegSwap
(CVPRW 2022) Learning Co-segmentation by Segment Swapping for Retrieval and Discovery
Stars: ✭ 46 (-80.67%)
Mutual labels:  synthetic-data
kaggle-house-prices-advanced-regression-techniques
Repository for source code of kaggle competition: House Prices: Advanced Regression Techniques
Stars: ✭ 37 (-84.45%)
Mutual labels:  regression
class imbalance
Jupyter Notebook presentation for class imbalance in binary classification
Stars: ✭ 48 (-79.83%)
Mutual labels:  imbalanced-data
numerics
library of numerical methods using Armadillo
Stars: ✭ 17 (-92.86%)
Mutual labels:  regression
imbalanced-ensemble
Class-imbalanced / Long-tailed ensemble learning in Python. Modular, flexible, and extensible. | 模块化、灵活、易扩展的类别不平衡/长尾机器学习库
Stars: ✭ 199 (-16.39%)
Mutual labels:  imbalanced-data
php-chess
A chess library for PHP.
Stars: ✭ 42 (-82.35%)
Mutual labels:  regression
Goodreads visualization
A Jupyter notebook where I play with my Goodreads data
Stars: ✭ 51 (-78.57%)
Mutual labels:  regression
regression-python
In this repository you can find many different, small, projects which demonstrate regression techniques using python programming language
Stars: ✭ 15 (-93.7%)
Mutual labels:  regression
Microeconometrics.jl
Microeconometric estimation in Julia
Stars: ✭ 30 (-87.39%)
Mutual labels:  regression
MachineLearning
Machine learning for beginners (data science enthusiasts)
Stars: ✭ 104 (-56.3%)
Mutual labels:  regression
Machine-Learning-Algorithms
All Machine Learning Algorithms
Stars: ✭ 24 (-89.92%)
Mutual labels:  regression
RVM-MATLAB
MATLAB code for Relevance Vector Machine using SB2_Release_200.
Stars: ✭ 38 (-84.03%)
Mutual labels:  regression
tensorscript
REPO MOVED TO https://repetere.github.io/jsonstack-model - Deep Learning Classification, Clustering, LSTM Time Series and Regression with Tensorflow
Stars: ✭ 37 (-84.45%)
Mutual labels:  regression
StoreItemDemand
(117th place - Top 26%) Deep learning using Keras and Spark for the "Store Item Demand Forecasting" Kaggle competition.
Stars: ✭ 24 (-89.92%)
Mutual labels:  regression

Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise


Description

A Python implementation of the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN). It applies the Synthetic Minority Over-Sampling Technique for Regression (SMOTER), which uses traditional interpolation between an observation and one of its nearest neighbors, and a variant that instead introduces Gaussian noise (SMOTER-GN). The choice between the two over-sampling techniques depends on the k-nearest-neighbor (KNN) distances underlying a given observation: if the neighbor is close enough, SMOTER is applied; if it is too far away, SMOTER-GN is applied. SMOGN is useful for regression problems in which the values of interest in the response variable are rare or uncommon. It can also serve as an alternative to log-transforming a skewed response variable, especially when generating synthetic data is also of interest.
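The selection rule described above can be sketched in plain Python. This is a minimal illustration, not the package's implementation: the function name `synthesize`, the threshold of half the median neighbor distance, and the use of `pert` directly as a noise standard deviation (which assumes roughly standardized numeric features) are all assumptions made for the example.

```python
import math
import random
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def synthesize(seed, neighbours, pert=0.02, rng=None):
    """Create one synthetic (features, target, method) triple from a seed.

    `seed` and every entry of `neighbours` are (features, target) pairs.
    Hypothetical helper: names and thresholds are assumptions of this sketch.
    """
    rng = rng or random.Random()
    seed_x, seed_y = seed
    distances = [euclidean(seed_x, nx) for nx, _ in neighbours]

    # Assumed cut-off: half the median distance from the seed to its neighbours.
    threshold = statistics.median(distances) / 2

    # Pick one neighbour at random, as in SMOTE-style over-sampling.
    idx = rng.randrange(len(neighbours))
    nbr_x, nbr_y = neighbours[idx]

    if distances[idx] < threshold:
        # Close neighbour: interpolate between seed and neighbour (SMOTER).
        frac = rng.random()
        new_x = [s + frac * (n - s) for s, n in zip(seed_x, nbr_x)]
        d_seed = euclidean(new_x, seed_x)
        d_nbr = euclidean(new_x, nbr_x)
        total = d_seed + d_nbr
        # Target is a distance-weighted average of the two original targets.
        new_y = seed_y if total == 0 else (d_nbr * seed_y + d_seed * nbr_y) / total
        return new_x, new_y, "smoter"

    # Distant neighbour: perturb the seed with Gaussian noise (SMOTER-GN).
    scale = min(pert, threshold)
    new_x = [rng.gauss(s, scale) for s in seed_x]
    new_y = rng.gauss(seed_y, scale)
    return new_x, new_y, "smoter-gn"


if __name__ == "__main__":
    seed = ([0.0, 0.0], 1.0)
    neighbours = [([0.1, 0.0], 1.2), ([5.0, 0.0], 3.0), ([6.0, 0.0], 4.0)]
    for _ in range(3):
        print(synthesize(seed, neighbours))
```

In this toy data only the first neighbor falls under the threshold, so interpolation happens only when it is drawn; the two distant neighbors trigger the Gaussian-noise branch, which keeps the synthetic point near the seed rather than somewhere along a long line segment.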

Features

  1. The only open-source, Python-supported version of the Synthetic Minority Over-Sampling Technique for Regression.

  2. Supports Pandas DataFrame inputs containing mixed data types, automatic distance metric selection by data type, and optional automatic removal of missing values.

  3. Flexible inputs to control the areas of interest within a continuous response variable, and friendly parameters for over-sampling synthetic data.

  4. Pure Python, developed for consistency, maintainability, and future improvement, with no foreign function calls to C or Fortran as in the original R implementation.
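To make the idea of "areas of interest" in a continuous response variable concrete, the sketch below flags values that fall outside box-plot fences as rare. It is only an illustration of the concept under stated assumptions: `rare_mask`, `coef`, and `extreme` are hypothetical names for this example, and the package's own relevance function may work differently.

```python
import statistics


def rare_mask(values, coef=1.5, extreme="both"):
    """Flag values outside the box-plot fences as rare.

    Hypothetical helper for illustration only.
    coef:    multiplier applied to the interquartile range.
    extreme: "both", "high", or "low", the tail(s) treated as rare.
    """
    if extreme not in ("both", "high", "low"):
        raise ValueError("extreme must be 'both', 'high', or 'low'")

    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - coef * iqr
    high_fence = q3 + coef * iqr

    mask = []
    for v in values:
        is_low = v < low_fence and extreme in ("both", "low")
        is_high = v > high_fence and extreme in ("both", "high")
        mask.append(is_low or is_high)
    return mask


if __name__ == "__main__":
    prices = [10, 11, 12, 13, 14, 15, 16, 17, 100]
    print(rare_mask(prices))
```

A larger `coef` widens the fences and so treats fewer values as rare; restricting `extreme` to one tail is useful when only unusually high (or low) responses matter.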

Requirements

  1. Python 3
  2. NumPy
  3. Pandas

Installation

## install pypi release
pip install smogn

## install developer version
pip install git+https://github.com/nickkunz/smogn.git

Usage

## load libraries
import smogn
import pandas

## load data (data source described in http://jse.amstat.org/v19n3/decock.pdf)
housing = pandas.read_csv(
    "https://raw.githubusercontent.com/nickkunz/smogn/master/data/housing.csv"
)

## conduct smogn on the response variable
housing_smogn = smogn.smoter(
    data = housing,
    y = "SalePrice"
)
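One way to check the effect of the call above is to compare the distribution of the response variable before and after over-sampling. The sketch below uses only the standard library; `summarize` is a hypothetical helper, not part of smogn. With the data above it could be called on `list(housing["SalePrice"])` and `list(housing_smogn["SalePrice"])`.

```python
import statistics


def summarize(values):
    """Return count, mean, median, and moment-based skewness of a numeric list."""
    n = len(values)
    mean = statistics.fmean(values)
    # Second and third central moments (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = 0.0 if m2 == 0 else m3 / m2 ** 1.5
    return {
        "count": n,
        "mean": mean,
        "median": statistics.median(values),
        "skew": skew,
    }


if __name__ == "__main__":
    # Toy response values with one rare, high observation.
    before = [1, 2, 3, 4, 100]
    print(summarize(before))
```

A skewness closer to zero after over-sampling would suggest that the rare region of the response is better represented, although the summary alone does not say whether the synthetic rows are useful for a given model.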

Examples

  1. Beginner
  2. Intermediate
  3. Advanced

Applications

  1. de Santi, N. S., Rodrigues, N. V., Montero-Dorta, A. D., Abramo, L. R., Tucci, B., & Artale, M. C. (2022). Mimicking the Halo-Galaxy Connection Using Machine Learning. arXiv preprint:2201.06054. https://arxiv.org/abs/2201.06054.

  2. Gangapurwala, S., Geisert, M., Orsolino, R., Fallon, M., & Havoutis, I. (2022). RLOC: Terrain-Aware Legged Locomotion Using Reinforcement Learning and Optimal Control. arXiv preprint:2012.03094. https://arxiv.org/abs/2012.03094.

  3. Wang, B., Spessa, A., Feng, P., Hou, X., Yue, C., Luo, J.-J., Ciais, P., Waters, C., Cowie, A., Nolan, R. H., Nikonovas, T., Jin, H., Walshaw, H., Wei, J., Guo, X., Liu, D. L., & Yu, Q. (2021). Extreme Fire Weather Is The Major Driver Of Severe Bushfires In Southeast Australia. Science Bulletin, 67(6), 655-664. https://doi.org/10.1016/j.scib.2021.10.001.

  4. Agrawal, A., & Petersen, M. R. (2021). Detecting Arsenic Contamination Using Satellite Imagery and Machine Learning. Toxics, 9(12), 333. https://doi.org/10.3390/toxics9120333.

Citations

@software{smogn,
  author       = {Nicholas Kunz},
  title        = {{SMOGN}: Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise},
  year         = {2020},
  publisher    = {PyPI},
  version      = {v0.1.2},
  url          = {https://pypi.org/project/smogn/},
  copyright    = {GPL v3.0}
}

Contributions

SMOGN is open for improvements and maintenance. Your help is valued to make the package better for everyone.

License

© Nick Kunz, 2022. Licensed under the General Public License v3.0 (GPLv3).

Reference

Branco, P., Torgo, L., & Ribeiro, R. (2017). SMOGN: A Pre-Processing Approach for Imbalanced Regression. Proceedings of Machine Learning Research, 74, 36-50. http://proceedings.mlr.press/v74/branco17a/branco17a.pdf.
