scikit-multilearn / Scikit Multilearn

Licence: bsd-2-clause
A scikit-learn based module for multi-label et. al. classification

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Scikit Multilearn

Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+394.04%)
Mutual labels:  classification, scikit-learn, clustering
Python-Machine-Learning-Fundamentals
D-Lab's 6 hour introduction to machine learning in Python. Learn how to perform classification, regression, clustering, and do model selection using scikit-learn and TPOT.
Stars: ✭ 46 (-92.79%)
Mutual labels:  clustering, scikit-learn, classification
Practical Machine Learning With Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Stars: ✭ 1,868 (+192.79%)
Mutual labels:  classification, scikit-learn, clustering
Machine Learning With Python
Practice and tutorial-style notebooks covering a wide variety of machine learning techniques
Stars: ✭ 2,197 (+244.36%)
Mutual labels:  classification, scikit-learn, clustering
Machine-Learning-Specialization
Project work and assignments for the Machine Learning specialization course on Coursera by the University of Washington
Stars: ✭ 27 (-95.77%)
Mutual labels:  clustering, classification
pyclustertend
A python package to assess cluster tendency
Stars: ✭ 38 (-94.04%)
Mutual labels:  clustering, scikit-learn
hmm
A Hidden Markov Model implemented in Javascript
Stars: ✭ 29 (-95.45%)
Mutual labels:  clustering, classification
textlytics
Text processing library for sentiment analysis and related tasks
Stars: ✭ 25 (-96.08%)
Mutual labels:  scikit-learn, classification
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-97.34%)
Mutual labels:  clustering, scikit-learn
Alphapy
Automated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost
Stars: ✭ 564 (-11.6%)
Mutual labels:  classification, scikit-learn
R
All Algorithms implemented in R
Stars: ✭ 294 (-53.92%)
Mutual labels:  classification, clustering
topometry
A comprehensive dimensional reduction framework to recover the latent topology from high-dimensional data.
Stars: ✭ 64 (-89.97%)
Mutual labels:  clustering, scikit-learn
ML-Track
This repository is a recommended track, designed to get started with Machine Learning.
Stars: ✭ 19 (-97.02%)
Mutual labels:  clustering, scikit-learn
sklearn-audio-classification
An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP
Stars: ✭ 31 (-95.14%)
Mutual labels:  scikit-learn, classification
audio noise clustering
https://dodiku.github.io/audio_noise_clustering/results/ ==> An experiment with a variety of clustering (and clustering-like) techniques to reduce noise on an audio speech recording.
Stars: ✭ 24 (-96.24%)
Mutual labels:  clustering, scikit-learn
Python-Machine-Learning
Python Machine Learning Algorithms
Stars: ✭ 80 (-87.46%)
Mutual labels:  scikit-learn, classification
Smile
Statistical Machine Intelligence & Learning Engine
Stars: ✭ 5,412 (+748.28%)
Mutual labels:  classification, clustering
Malheur
A Tool for Automatic Analysis of Malware Behavior
Stars: ✭ 313 (-50.94%)
Mutual labels:  classification, clustering
Ml Course
Starter code of Prof. Andrew Ng's machine learning MOOC in R statistical language
Stars: ✭ 154 (-75.86%)
Mutual labels:  classification, clustering
Uci Ml Api
Simple API for UCI Machine Learning Dataset Repository (search, download, analyze)
Stars: ✭ 190 (-70.22%)
Mutual labels:  classification, clustering

scikit-multilearn

scikit-multilearn is a Python module for multi-label learning tasks. It is built on top of scientific Python packages (numpy, scipy) and follows an API similar to that of scikit-learn.

Features

  • Native Python implementation. A variety of multi-label classification algorithms are implemented natively in Python. For the list of all supported classifiers, see the project documentation.

  • Interface to Meka. A Meka wrapper class is implemented for reference purposes and integration. This provides access to all methods available in MEKA, MULAN, and WEKA — the reference standard in the field.

  • Builds upon giants! Team up with the power of numpy and scikit-learn. You can use scikit-learn's base classifiers as scikit-multilearn's classifiers. In addition, the two packages follow a similar API.

Dependencies

In most cases you will want to follow the requirements defined in the requirements/*.txt files in the package.

Base dependencies

scipy
numpy
future
scikit-learn
liac-arff # for loading ARFF files
requests # for dataset module
networkx # for networkx-based community detection clusterers
python-louvain # for networkx-based community detection clusterers
keras

GPL-incurring dependencies for two clusterers

python-igraph # for igraph-based clusterers
python-graphtool # for graphtool-based clusterers

Note: Installing graphtool is complicated; please see the graphtool install instructions.
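
Because the clusterer dependencies are optional, it can be handy to check which of them are importable before relying on them. The following is a minimal sketch using only the standard library. The mapping from package name to import name (for example `community` for python-louvain and `graph_tool` for python-graphtool) is an assumption based on how those packages are usually imported, and `find_available` is a hypothetical helper, not part of scikit-multilearn.

```python
from importlib.util import find_spec

# Assumed import names for the optional packages listed above.
OPTIONAL_MODULES = {
    "networkx": "networkx",
    "python-louvain": "community",
    "python-igraph": "igraph",
    "python-graphtool": "graph_tool",
}


def find_available(modules):
    """Return {package name: True/False} depending on whether it can be imported."""
    return {
        package: find_spec(import_name) is not None
        for package, import_name in modules.items()
    }


if __name__ == "__main__":
    for package, available in find_available(OPTIONAL_MODULES).items():
        status = "available" if available else "missing"
        print(f"{package}: {status}")
```

A missing entry only matters if you intend to use the corresponding clusterers.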

Installation

To install scikit-multilearn, simply type the following command:

$ pip install scikit-multilearn

This will install the latest release from the Python package index. If you wish to install the bleeding-edge version, then clone this repository and run setup.py:

$ git clone https://github.com/scikit-multilearn/scikit-multilearn.git
$ cd scikit-multilearn
$ python setup.py install

Basic Usage

Before proceeding to classification, this library assumes that you have a dataset with the following matrices:

  • X_train, X_test: training and test feature matrices of size (n_samples, n_features)
  • y_train, y_test: training and test binary label indicator matrices of size (n_samples, n_labels)
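
If no such matrices are at hand, one way to get them for experimentation is to generate a synthetic multi-label dataset with scikit-learn. This is only a sketch: the sample count, feature count, label count, and random seed below are arbitrary choices made for illustration, not values recommended by the project.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split

# Generate 100 samples with 20 features and 5 possible labels.
# y is a dense binary indicator matrix: y[i, j] == 1 means sample i has label j.
X, y = make_multilabel_classification(
    n_samples=100, n_features=20, n_classes=5, n_labels=2, random_state=0
)

# Hold out 25% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
```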

Suppose we want to apply Binary Relevance, a problem-transformation method that treats each label as a separate single-label classification problem, with a support-vector machine (SVM) as the base classifier. The steps are:

# Import BinaryRelevance from skmultilearn
from skmultilearn.problem_transform import BinaryRelevance

# Import SVC classifier from sklearn
from sklearn.svm import SVC

# Setup the classifier
classifier = BinaryRelevance(classifier=SVC(), require_dense=[False,True])

# Train
classifier.fit(X_train, y_train)

# Predict
y_pred = classifier.predict(X_test)
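
To make the idea behind Binary Relevance concrete, here is a minimal sketch that reproduces the transformation by hand using only scikit-learn and numpy: one independent copy of the base classifier is trained per label column, and the per-label predictions are stacked back into a label matrix. This is not the library's implementation. `fit_per_label` and `predict_per_label` are hypothetical helper names, the data is synthetic, and the sketch assumes dense arrays and that every label has both positive and negative examples in the training set.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def fit_per_label(base_classifier, X, y):
    """Fit one independent copy of the base classifier per label column."""
    return [clone(base_classifier).fit(X, y[:, j]) for j in range(y.shape[1])]


def predict_per_label(models, X):
    """Stack per-label predictions into an (n_samples, n_labels) matrix."""
    return np.column_stack([model.predict(X) for model in models])


# Synthetic data for illustration only; sizes and seed are arbitrary.
X, y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, n_labels=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = fit_per_label(SVC(), X_train, y_train)
y_pred = predict_per_label(models, X_test)

# Hamming loss: fraction of individual label assignments that are wrong.
print("Hamming loss:", hamming_loss(y_test, y_pred))
# Subset accuracy: fraction of samples whose full label set is predicted exactly.
print("Subset accuracy:", accuracy_score(y_test, y_pred))
```

The two metrics answer different questions: Hamming loss counts per-label mistakes, while subset accuracy only rewards predictions in which every label of a sample is correct.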

More examples and use cases can be found in the documentation, which also covers the MEKA wrapper.

Contributing

This project is open for contributions. Here are some of the ways for you to contribute:

  • Bug reports and fixes
  • Feature requests
  • Use-case demonstrations
  • Documentation updates

In case you want to implement your own multi-label classifier, please read our Developer's Guide to help you integrate your implementation in our API.

To make a contribution, just fork this repository, push the changes in your fork, open up an issue, and make a Pull Request!

We're also available on Slack! Just join our Slack group.

Cite

If you used scikit-multilearn in your research or project, please cite our work:

@ARTICLE{2017arXiv170201460S,
   author = {{Szyma{\'n}ski}, P. and {Kajdanowicz}, T.},
   title = "{A scikit-based Python environment for performing multi-label classification}",
   journal = {ArXiv e-prints},
   archivePrefix = "arXiv",
   eprint = {1702.01460},
   year = 2017,
   month = feb
}