mwydmuch / napkinXC

Licence: MIT license
Extremely simple and fast extreme multi-class and multi-label classifiers.

napkinXC

napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification that focuses on implementing various methods for Probabilistic Label Trees. It allows training a classifier on very large datasets in just a few lines of code with minimal resources.

Right now, napkinXC implements the following features both in Python and C++:

  • Probabilistic Label Trees (PLTs) and Hierarchical softmax (HSM),
  • different types of inference methods (top-k, above a given threshold, etc.),
  • fast prediction with label weights, e.g., propensity scores,
  • efficient online F-measure optimization (OFO) procedure,
  • different tree building methods, including hierarchical k-means clustering method,
  • training of tree node
  • support for custom tree structures, and node weights,
  • helpers to download and load data from XML Repository,
  • helpers to measure performance (precision@k, recall@k, nDCG@k, propensity-scored precision@k, and more).
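
As a rough illustration of two items from the list above, the following sketch shows top-k versus threshold-based label selection over a dictionary of label scores, and precision@k computed by hand. The function names (select_top_k, select_above_threshold, precision_at_k_manual) and the scores are hypothetical and exist only for this example; they are not part of the napkinXC API, which ships its own implementations.

```python
def select_top_k(scores, k):
    """Return the k labels with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def select_above_threshold(scores, threshold):
    """Return all labels whose score is at least `threshold`, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= threshold]


def precision_at_k_manual(true_labels, predicted_labels, k):
    """Average over examples of the fraction of top-k predictions that are correct."""
    total = 0.0
    for truth, predicted in zip(true_labels, predicted_labels):
        truth_set = set(truth)
        hits = sum(1 for label in predicted[:k] if label in truth_set)
        total += hits / k
    return total / len(true_labels)


# Toy scores for a single example (made-up numbers, for illustration only).
scores = {3: 0.9, 7: 0.6, 1: 0.2}
top2 = select_top_k(scores, 2)
confident = select_above_threshold(scores, 0.5)
p_at_2 = precision_at_k_manual([[3, 1]], [top2], k=2)
```

Top-k always returns a fixed number of labels per example, while thresholding returns a variable number depending on how confident the scores are.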

Please note that this library is still under development and also serves as a base for experiments. The API may not be compatible between releases, and some of the experimental features may not be documented. Do not hesitate to open an issue if you have a question or run into a problem!

napkinXC is distributed under the MIT license. All contributions to the project are welcome!

Python Quick Start and Documentation

Install via pip:

pip install napkinxc

We provide precompiled wheels for many Linux distros, macOS, and Windows for Python 3.7+. If there is no wheel for your OS, the package will be compiled from source during installation. Compilation from source requires a modern C++17 compiler, CMake, Git, and Python 3.7+.

The latest (master) version can be installed directly from the GitHub repository (not recommended):

pip install git+https://github.com/mwydmuch/napkinXC.git

A minimal example of usage:

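from napkinxc.datasets import load_dataset
from napkinxc.models import PLT
from napkinxc.measures import precision_at_k

# Load the train and test splits of the eurlex-4k dataset
X_train, Y_train = load_dataset("eurlex-4k", "train")
X_test, Y_test = load_dataset("eurlex-4k", "test")

# Create a PLT model; "eurlex-model" is the directory used to store it
plt = PLT("eurlex-model")
plt.fit(X_train, Y_train)

# Predict the single top-scoring label per example and report precision@1
Y_pred = plt.predict(X_test, top_k=1)
print(precision_at_k(Y_test, Y_pred, k=1))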

More examples can be found in the python/examples directory, and napkinXC's documentation is available at https://napkinxc.readthedocs.io.
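
For intuition about what a Probabilistic Label Tree estimates: each label corresponds to a leaf of a tree, and the probability of a label is the product of the conditional node probabilities along the path from the root to that leaf. The sketch below uses a hand-written tree and made-up node probabilities; in a trained model each node probability would come from a node classifier evaluated on the input. All names here (parent, node_prob, label_probability) are hypothetical and do not reflect napkinXC internals.

```python
# parent[node] gives the parent of each node; the root has parent None.
parent = {"root": None, "a": "root", "b": "root", "l1": "a", "l2": "a", "l3": "b"}

# Made-up conditional probabilities P(node is relevant | its parent is relevant, x).
node_prob = {"root": 1.0, "a": 0.8, "b": 0.3, "l1": 0.5, "l2": 0.25, "l3": 0.9}


def label_probability(leaf):
    """Multiply node probabilities along the path from the leaf up to the root."""
    prob = 1.0
    node = leaf
    while node is not None:
        prob *= node_prob[node]
        node = parent[node]
    return prob


leaves = ["l1", "l2", "l3"]
label_probs = {leaf: label_probability(leaf) for leaf in leaves}
best = max(label_probs, key=label_probs.get)
```

Because a leaf's probability can never exceed that of its ancestors, a search for the top-k labels can skip whole subtrees whose root node already scores too low, which is what makes prediction fast when the number of labels is very large.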

Executable

napkinXC can also be used as an executable to train and evaluate models using data in LIBSVM format. See the documentation for more details.
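
LIBSVM format stores one example per line: the labels first, followed by sparse index:value feature pairs. In the multi-label variant commonly used for extreme classification datasets, the labels are comma-separated. The sketch below parses one such line under that assumption; parse_libsvm_line is a hypothetical helper written for this example, and the napkinXC documentation remains the reference for the exact layout the executable expects (for example, whether a header line is present).

```python
def parse_libsvm_line(line):
    """Parse 'l1,l2 idx:val idx:val ...' into (labels, features)."""
    parts = line.strip().split()
    if not parts:
        return [], {}
    # If the first token already looks like a feature, the example has no labels.
    if ":" in parts[0]:
        label_tokens, feature_tokens = [], parts
    else:
        label_tokens, feature_tokens = parts[0].split(","), parts[1:]
    labels = [int(token) for token in label_tokens if token]
    features = {}
    for token in feature_tokens:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return labels, features


# Example line with two labels and two non-zero features (made-up values).
labels, features = parse_libsvm_line("3,17 1:0.5 42:1.25")
```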

References and acknowledgments

This library implements methods from the following papers (see the experiments directory for scripts to replicate the results):

Another implementation of the PLT model is available in the extremeText library, which implements the approach described in this NeurIPS paper.
