
trent-b / Iterative Stratification

Licence: bsd-3-clause
scikit-learn cross validators for iterative stratification of multilabel data

Programming Languages

python

Projects that are alternatives to or similar to Iterative Stratification

Sktime Dl
sktime companion package for deep learning based on TensorFlow
Stars: ✭ 379 (-28.89%)
Mutual labels:  scikit-learn
Sklearn Bayes
Python package for Bayesian Machine Learning with scikit-learn API
Stars: ✭ 428 (-19.7%)
Mutual labels:  scikit-learn
Onnxmltools
ONNXMLTools enables conversion of models to ONNX
Stars: ✭ 476 (-10.69%)
Mutual labels:  scikit-learn
Sktime
A unified framework for machine learning with time series
Stars: ✭ 4,741 (+789.49%)
Mutual labels:  scikit-learn
Actionai
custom human activity recognition modules by pose estimation and cascaded inference using sklearn API
Stars: ✭ 404 (-24.2%)
Mutual labels:  scikit-learn
Scikit Lego
Extra blocks for scikit-learn pipelines.
Stars: ✭ 445 (-16.51%)
Mutual labels:  scikit-learn
Libfaceid
libfaceid is a research framework for prototyping of face recognition solutions. It seamlessly integrates multiple detection, recognition and liveness models w/ speech synthesis and speech recognition.
Stars: ✭ 354 (-33.58%)
Mutual labels:  scikit-learn
Skll
SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
Stars: ✭ 523 (-1.88%)
Mutual labels:  scikit-learn
Machinejs
[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
Stars: ✭ 412 (-22.7%)
Mutual labels:  scikit-learn
Best Of Ml Python
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Stars: ✭ 6,057 (+1036.4%)
Mutual labels:  scikit-learn
Giotto Tda
A high-performance topological machine learning toolbox in Python
Stars: ✭ 384 (-27.95%)
Mutual labels:  scikit-learn
Skorch
A scikit-learn compatible neural network library that wraps PyTorch
Stars: ✭ 4,241 (+695.68%)
Mutual labels:  scikit-learn
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+4036.59%)
Mutual labels:  scikit-learn
Neuraxle
A Sklearn-like Framework for Hyperparameter Tuning and AutoML in Deep Learning projects. Finally have the right abstractions and design patterns to properly do AutoML. Let your pipeline steps have hyperparameter spaces. Enable checkpoints to cut duplicate calculations. Go from research to production environment easily.
Stars: ✭ 377 (-29.27%)
Mutual labels:  scikit-learn
Palladium
Framework for setting up predictive analytics services
Stars: ✭ 481 (-9.76%)
Mutual labels:  scikit-learn
Interpret
Fit interpretable models. Explain blackbox machine learning.
Stars: ✭ 4,352 (+716.51%)
Mutual labels:  scikit-learn
Sklearn Doc Zh
📖 [Translation] scikit-learn (sklearn) Chinese documentation
Stars: ✭ 4,520 (+748.03%)
Mutual labels:  scikit-learn
Scikit Survival
Survival analysis built on top of scikit-learn
Stars: ✭ 525 (-1.5%)
Mutual labels:  scikit-learn
Scikit Multiflow
A machine learning package for streaming data in Python. The other ancestor of River.
Stars: ✭ 485 (-9.01%)
Mutual labels:  scikit-learn
Onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Stars: ✭ 5,910 (+1008.82%)
Mutual labels:  scikit-learn


iterative-stratification

iterative-stratification is a project that provides scikit-learn compatible cross validators with stratification for multilabel data.

scikit-learn currently provides several cross validators with stratification (for example StratifiedKFold, RepeatedStratifiedKFold, and StratifiedShuffleSplit). However, these cross validators cannot stratify multilabel data. The iterative-stratification project offers implementations of MultilabelStratifiedKFold, RepeatedMultilabelStratifiedKFold, and MultilabelStratifiedShuffleSplit, built on the algorithm for stratifying multilabel data described in the following paper:

Sechidis K., Tsoumakas G., Vlahavas I. (2011) On the Stratification of Multi-Label Data. In: Gunopulos D., Hofmann T., Malerba D., Vazirgiannis M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6913. Springer, Berlin, Heidelberg.
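The core loop of that algorithm can be sketched in plain Python. The function below is a simplified illustration of Algorithm 1 from the paper, assuming equal fold sizes and a binary label matrix given as nested lists; it is not the iterstrat source code, and the name iterative_stratification and its seed parameter are hypothetical. It repeatedly takes the label with the fewest unassigned positive examples and sends each of those examples to the fold that still wants the most of that label, breaking ties first by the fold that wants the most samples overall and then at random. Rows with no positive labels are spread so as to keep fold sizes even, which is an assumption of this sketch.

```python
import random


def iterative_stratification(y, n_folds, seed=0):
    """Return a fold number (0..n_folds-1) for each row of binary label matrix y.

    Simplified sketch of Sechidis et al. (2011), Algorithm 1, with equal fold
    proportions. Illustrative only; not the iterstrat implementation.
    """
    rng = random.Random(seed)
    n_samples, n_labels = len(y), len(y[0])
    share = 1.0 / n_folds
    # Number of samples each fold still wants.
    want_size = [n_samples * share for _ in range(n_folds)]
    # Number of positives of each label each fold still wants.
    totals = [sum(row[j] for row in y) for j in range(n_labels)]
    want_label = [[t * share for t in totals] for _ in range(n_folds)]

    fold_of = [None] * n_samples
    remaining = set(range(n_samples))

    def pick(candidates):
        # Break any remaining tie at random.
        return candidates[0] if len(candidates) == 1 else rng.choice(candidates)

    def assign(i, f):
        # Put sample i in fold f and update what fold f still wants.
        fold_of[i] = f
        want_size[f] -= 1
        for j in range(n_labels):
            if y[i][j]:
                want_label[f][j] -= 1
        remaining.discard(i)

    while remaining:
        counts = [sum(y[i][j] for i in remaining) for j in range(n_labels)]
        nonzero = [c for c in counts if c > 0]
        if not nonzero:
            # Only rows without positive labels are left (assumed handling):
            # give each one to the fold that still wants the most samples.
            for i in sorted(remaining):
                top = max(want_size)
                assign(i, pick([f for f in range(n_folds) if want_size[f] == top]))
            break
        # Process the label with the fewest remaining positive examples.
        rarest = min(nonzero)
        label = pick([j for j in range(n_labels) if counts[j] == rarest])
        for i in sorted(s for s in remaining if y[s][label]):
            # Fold that most wants this label; ties go to the fold that wants
            # the most samples overall, then to a random choice.
            top = max(want_label[f][label] for f in range(n_folds))
            tied = [f for f in range(n_folds) if want_label[f][label] == top]
            top_size = max(want_size[f] for f in tied)
            assign(i, pick([f for f in tied if want_size[f] == top_size]))
    return fold_of


# Usage with the label matrix from the toy examples below.
y = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 0], [1, 0]]
folds = iterative_stratification(y, n_folds=2)
for k in range(2):
    test_index = [i for i, f in enumerate(folds) if f == k]
    train_index = [i for i, f in enumerate(folds) if f != k]
    print("TRAIN:", train_index, "TEST:", test_index)
```

The returned list gives each sample's test fold; the training indices for fold k are the remaining samples. Handling the rarest label first is what keeps rare labels from piling up in a single fold.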

Requirements

iterative-stratification has been tested under Python 3.4 through 3.8 with the following dependencies:

  • scipy(>=0.13.3)
  • numpy(>=1.8.2)
  • scikit-learn(>=0.19.0)

Installation

iterative-stratification is available on PyPI and can be installed via pip:

pip install iterative-stratification


The package is also installable from the Anaconda Cloud platform:

conda install -c trent-b iterative-stratification

Toy Examples

The multilabel cross validators that this package provides may be used with the scikit-learn API in the same manner as any other cross validators. For example, they may be passed as the cv argument to cross_val_score or cross_val_predict. The toy examples below use the multilabel cross validators directly.

MultilabelStratifiedKFold

from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
import numpy as np

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

mskf = MultilabelStratifiedKFold(n_splits=2, shuffle=True, random_state=0)

for train_index, test_index in mskf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Output:

TRAIN: [0 3 4 6] TEST: [1 2 5 7]
TRAIN: [1 2 5 7] TEST: [0 3 4 6]
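A quick way to see what stratification achieves is to count the positive examples of each label in every test fold. The helper label_counts below is hypothetical (it is not part of iterstrat); the snippet rewrites the example's y as nested lists and copies the test indices from the printed output, so it runs without the package installed.

```python
def label_counts(y, index):
    """Count the positive examples of each label among the rows in index."""
    return [sum(y[i][j] for i in index) for j in range(len(y[0]))]


# The y array from the example, written as nested lists.
y = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 1], [1, 1], [1, 0], [1, 0]]

# Test indices copied from the printed output above.
test_folds = [[1, 2, 5, 7], [0, 3, 4, 6]]

print("whole dataset:", label_counts(y, range(len(y))))
for test_index in test_folds:
    print("TEST:", test_index, "label counts:", label_counts(y, test_index))
```

With this data each test fold receives two positives of each label, exactly half of the four in the whole dataset.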

RepeatedMultilabelStratifiedKFold

from iterstrat.ml_stratifiers import RepeatedMultilabelStratifiedKFold
import numpy as np

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

rmskf = RepeatedMultilabelStratifiedKFold(n_splits=2, n_repeats=2, random_state=0)

for train_index, test_index in rmskf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Output:

TRAIN: [0 3 4 6] TEST: [1 2 5 7]
TRAIN: [1 2 5 7] TEST: [0 3 4 6]
TRAIN: [0 1 4 5] TEST: [2 3 6 7]
TRAIN: [2 3 6 7] TEST: [0 1 4 5]
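In a repeated k-fold, each repetition should be a complete partition of the samples. The check below assumes, as with scikit-learn's repeated splitters, that the splits are yielded one repetition at a time, so the first n_splits lines of output belong to the first repeat. The helper is_partition is a hypothetical name used only for this illustration, and the indices are copied from the printed output.

```python
def is_partition(folds, n_samples):
    """True if the folds are disjoint and together cover range(n_samples)."""
    seen = sorted(i for fold in folds for i in fold)
    return seen == list(range(n_samples))


n_splits, n_samples = 2, 8
# Test indices copied from the printed output above, in the order printed.
test_sets = [[1, 2, 5, 7], [0, 3, 4, 6], [2, 3, 6, 7], [0, 1, 4, 5]]

# Consecutive groups of n_splits test sets form one repetition.
repeats = [test_sets[r:r + n_splits] for r in range(0, len(test_sets), n_splits)]
for number, folds in enumerate(repeats, start=1):
    print("repeat", number, "is a partition:", is_partition(folds, n_samples))
```

Both repetitions cover all eight samples exactly once, while the second repetition reshuffles which samples share a test fold.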

MultilabelStratifiedShuffleSplit

from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit
import numpy as np

X = np.array([[1,2], [3,4], [1,2], [3,4], [1,2], [3,4], [1,2], [3,4]])
y = np.array([[0,0], [0,0], [0,1], [0,1], [1,1], [1,1], [1,0], [1,0]])

msss = MultilabelStratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)

for train_index, test_index in msss.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

Output:

TRAIN: [1 2 5 7] TEST: [0 3 4 6]
TRAIN: [2 3 6 7] TEST: [0 1 4 5]
TRAIN: [1 2 5 6] TEST: [0 3 4 7]
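Unlike the k-fold validators, a shuffle split draws each train/test split independently, so its test sets need not be disjoint. Counting how often each index appears in the printed test sets makes this visible; the snippet copies the indices from the output above and needs only the standard library.

```python
from collections import Counter

# Test indices copied from the printed output above (n_splits=3, test_size=0.5).
test_sets = [[0, 3, 4, 6], [0, 1, 4, 5], [0, 3, 4, 7]]
n_samples = 8

# test_size=0.5 of 8 samples gives 4 test samples per split.
print("test sizes:", [len(t) for t in test_sets])

# Count in how many test sets each sample appears (0 if never).
appearances = Counter(i for t in test_sets for i in t)
for i in range(n_samples):
    print("sample", i, "in", appearances[i], "test set(s)")
```

Samples 0 and 4 appear in all three test sets and sample 2 in none. When every sample should be tested exactly once per pass, the k-fold validators are the better fit.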