
closest-git / QuantumForest

Licence: other
Fast Differentiable Forest lib with the advantages of both decision trees and neural networks

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to QuantumForest

Writing-editing-Network
Code for Paper Abstract Writing through Editing Mechanism
Stars: ✭ 72 (+14.29%)
Mutual labels:  attention-mechanism, end-to-end-learning
Amazon-Fine-Food-Review
Machine learning algorithms such as KNN, Naive Bayes, Logistic Regression, SVM, Decision Trees, Random Forest, k-means and Truncated SVD on the Amazon Fine Food Review dataset
Stars: ✭ 28 (-55.56%)
Mutual labels:  decision-trees
DAF3D
Deep Attentive Features for Prostate Segmentation in 3D Transrectal Ultrasound
Stars: ✭ 60 (-4.76%)
Mutual labels:  attention-mechanism
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-69.84%)
Mutual labels:  attention-mechanism
ntua-slp-semeval2018
Deep-learning models of NTUA-SLP team submitted in SemEval 2018 tasks 1, 2 and 3.
Stars: ✭ 79 (+25.4%)
Mutual labels:  attention-mechanism
enformer-pytorch
Implementation of Enformer, Deepmind's attention network for predicting gene expression, in Pytorch
Stars: ✭ 146 (+131.75%)
Mutual labels:  attention-mechanism
OverlapPredator
[CVPR 2021, Oral] PREDATOR: Registration of 3D Point Clouds with Low Overlap.
Stars: ✭ 293 (+365.08%)
Mutual labels:  attention-mechanism
IPL-ML-2018
Predicting IPL match results. https://kuharan.github.io/IPL-ML-2018/
Stars: ✭ 14 (-77.78%)
Mutual labels:  decision-trees
SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (-49.21%)
Mutual labels:  attention-mechanism
GROOT
[ICML 2021] A fast algorithm for fitting robust decision trees. http://proceedings.mlr.press/v139/vos21a.html
Stars: ✭ 15 (-76.19%)
Mutual labels:  decision-trees
yggdrasil-decision-forests
A collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.
Stars: ✭ 156 (+147.62%)
Mutual labels:  decision-trees
AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Stars: ✭ 33 (-47.62%)
Mutual labels:  attention-mechanism
FragmentVC
Any-to-any voice conversion by end-to-end extracting and fusing fine-grained voice fragments with attention
Stars: ✭ 134 (+112.7%)
Mutual labels:  attention-mechanism
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (-31.75%)
Mutual labels:  attention-mechanism
attention-mechanism-keras
attention mechanism in keras, like Dense and RNN...
Stars: ✭ 19 (-69.84%)
Mutual labels:  attention-mechanism
IBM-final-project-Machine-Learning
Final project of IBM's course https://www.coursera.org/learn/machine-learning-with-python on coursera
Stars: ✭ 33 (-47.62%)
Mutual labels:  decision-trees
Patient2Vec
Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record
Stars: ✭ 85 (+34.92%)
Mutual labels:  attention-mechanism
nystrom-attention
Implementation of Nyström Self-attention, from the paper Nyströmformer
Stars: ✭ 83 (+31.75%)
Mutual labels:  attention-mechanism
PAM
[TPAMI 2020] Parallax Attention for Unsupervised Stereo Correspondence Learning
Stars: ✭ 62 (-1.59%)
Mutual labels:  attention-mechanism
Video-Cap
🎬 Video Captioning: ICCV '15 paper implementation
Stars: ✭ 44 (-30.16%)
Mutual labels:  attention-mechanism

QuantumForest

QuantumForest is a new library built on the model of differentiable decision trees, combining the advantages of decision trees and neural networks. Experiments on large datasets show that QuantumForest achieves higher accuracy than both deep networks and the best GBDT libraries (XGBoost, CatBoost, mGBDT, ...).

  • Keeps a simple tree structure, so it is easy to use and the decision process is easy to explain.

  • Fully differentiable, like a neural network, so it can be trained with many powerful optimization algorithms (SGD, Adam, ...), just like a deep CNN; a minimal sketch of such a differentiable split is shown below the figure.

  • Supports batch training, which greatly reduces memory usage.

  • Supports end-to-end learning, which removes much of the work of data preprocessing and feature engineering.

![Differentiable tree](Differentiable tree_1.png)
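To make the first two points concrete, here is a minimal sketch (not QuantumForest's actual implementation; all names are illustrative) of a depth-1 differentiable decision tree in PyTorch. The hard split is replaced by a sigmoid gate over a learned linear function of the features, so the routing probabilities and leaf responses can be trained with SGD or Adam like any network:

```python
import torch
import torch.nn as nn

class SoftDecisionStump(nn.Module):
    """A single soft split: a differentiable stand-in for `x[j] < threshold`."""

    def __init__(self, in_features: int):
        super().__init__()
        self.split = nn.Linear(in_features, 1)      # learned split direction + threshold
        self.leaves = nn.Parameter(torch.zeros(2))  # responses of the two leaves

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p_right = torch.sigmoid(self.split(x)).squeeze(-1)  # soft routing probability
        # Expected response: a differentiable blend of both leaf values.
        return (1.0 - p_right) * self.leaves[0] + p_right * self.leaves[1]

# Trainable end to end with a standard optimizer, e.g. on the 28 Higgs features:
model = SoftDecisionStump(in_features=28)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
```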

Performance

To verify our model and algorithm, we tested its performance on six large datasets.

Table 1: Six large tabular datasets

|            | Higgs          | Click          | YearPrediction       | Microsoft   | Yahoo               | EPSILON               |
|------------|----------------|----------------|----------------------|-------------|---------------------|-----------------------|
| Training   | 8.4M           | 800K           | 309K                 | 580K        | 473K                | 320K                  |
| Validation | 2.1M           | 100K           | 103K                 | 143K        | 71K                 | 80K                   |
| Test       | 500K           | 100K           | 103K                 | 241K        | 165K                | 100K                  |
| Features   | 28             | 11             | 90                   | 136         | 699                 | 2000                  |
| Problem    | Classification | Classification | Regression           | Regression  | Regression          | Classification        |
| Description| UCI ML Higgs   | 2012 KDD Cup   | Million Song Dataset | MSLR-WEB10K | Yahoo LETOR dataset | PASCAL Challenge 2008 |

The following table lists the test results of QuantumForest and several GBDT libraries (lower is better):

|               | Higgs  | Click  | YearPrediction | Microsoft | Yahoo  | EPSILON |
|---------------|--------|--------|----------------|-----------|--------|---------|
| CatBoost      | 0.2434 | 0.3438 | 80.68          | 0.5587    | 0.5781 | 0.1119  |
| XGBoost       | 0.2600 | 0.3461 | 81.11          | 0.5637    | 0.5756 | 0.1144  |
| LightGBM      | 0.2291 | 0.3322 | 76.25          | 0.5587    | 0.5576 | 0.1160  |
| NODE          | 0.2412 | 0.3309 | 77.43          | 0.5584    | 0.5666 | 0.1043  |
| mGBDT         | OOM    | OOM    | 80.67          | OOM       | OOM    | OOM     |
| QuantumForest | 0.2467 | 0.3309 | 74.02          | 0.5568    | 0.5656 | 0.1048  |

*Some results are copied from the test results of NODE.

All libraries were run with default parameters. LightGBM achieves the best results on the 'Higgs' and 'Yahoo' datasets, and NODE on the 'EPSILON' dataset. QuantumForest performs best on the 'YearPrediction' and 'Microsoft' datasets and ties with NODE on 'Click'. mGBDT failed with out-of-memory (OOM) errors on most of the large datasets. The differentiable forest model has only been developed for a few years and is still in its early stages; QuantumForest shows the potential of this model.

Dependencies

  • Python 3.7

  • PyTorch 1.3.1

Usage

For the six large datasets in Table 1:

  1. Create a directory `data_root` for storing the datasets.

     QuantumForest automatically downloads each dataset, saves the files under `data_root`, and splits it into training, validation, and test sets.

  2. Get the accuracy of QuantumForest.

     For example, to test accuracy on the HIGGS dataset:

     ```bash
     python main_tabular_data.py --data_root=../Datasets/ --dataset=HIGGS --learning_rate=0.002
     ```

     To test other datasets, set `--dataset` to one of `YEAR`, `YAHOO`, `CLICK`, `MICROSOFT`, `HIGGS`, or `EPSILON`.

  3. Get the accuracy of other GBDT libraries (all with default parameters).

     For example, to test the accuracy of LightGBM:

     ```bash
     python main_tabular_data.py --dataset=HIGGS --model=LightGBM
     ```

     To test other GBDT libraries, set `--model` to `LightGBM`, `XGBoost`, or `Catboost`. A convenience loop over all datasets and libraries is sketched after this list.
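For convenience, the same script can be looped over every dataset and library; this sketch simply shells out to `main_tabular_data.py` with the flags documented in steps 2 and 3 (paths and flag spellings are taken from those examples):

```python
import subprocess

DATASETS = ["YEAR", "YAHOO", "CLICK", "MICROSOFT", "HIGGS", "EPSILON"]
MODELS = ["LightGBM", "XGBoost", "Catboost"]

for dataset in DATASETS:
    # QuantumForest itself, as in step 2.
    subprocess.run(
        ["python", "main_tabular_data.py", "--data_root=../Datasets/",
         f"--dataset={dataset}", "--learning_rate=0.002"],
        check=True,
    )
    for model in MODELS:
        # The GBDT baselines, as in step 3 (default parameters).
        subprocess.run(
            ["python", "main_tabular_data.py", f"--dataset={dataset}",
             f"--model={model}"],
            check=True,
        )
```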

For other datasets:

1. Prepare the data

   Your dataset class should be a subclass of `quantum_forest.TabularDataset`. For details, see `TabularDataset.py`; a hedged sketch follows.
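A minimal sketch of such a subclass, assuming only the attributes used by the training snippet below (`X_train`, `y_train`, `X_valid`, `y_valid`); the real interface lives in `TabularDataset.py`, so treat the constructor and attribute names here as illustrative:

```python
import numpy as np
import quantum_forest

class MyDataset(quantum_forest.TabularDataset):
    # Illustrative only: the base-class constructor may expect different
    # arguments; consult TabularDataset.py for the actual interface.
    def __init__(self, X: np.ndarray, y: np.ndarray):
        n_train = int(0.8 * len(X))  # simple 80/20 train/validation split
        self.X_train = X[:n_train].astype(np.float32)
        self.y_train = y[:n_train].astype(np.float32)
        self.X_valid = X[n_train:].astype(np.float32)
        self.y_valid = y[n_train:].astype(np.float32)
```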

2. Call quantum_forest

```python
import quantum_forest

# Build the configuration; `data` is a quantum_forest.TabularDataset
# (0.002 matches the --learning_rate used in the examples above).
config = quantum_forest.QForest_config(data, 0.002)
config.device = quantum_forest.OnInitInstance(random_state=42)
config.model = "QForest"
config.in_features = data.X_train.shape[1]
config.tree_module = quantum_forest.DeTree
config, visual = quantum_forest.InitExperiment(config, 0)
config.response_dim = 3
config.feat_info = None

# Prepare fold 0 and cache it as a pickle file.
data.onFold(0, config, pkl_path="FOLD_0.pickle")

# Train, evaluating on the held-out validation set.
learner = quantum_forest.QuantumForest(config, data, feat_info=None, visual=visual) \
    .fit(data.X_train, data.y_train, eval_set=[(data.X_valid, data.y_valid)])
best_score = learner.best_score
```

Notes

More parameters

```bash
python main_tabular_data.py --dataset=HIGGS --learning_rate=0.001 --scale=large --subsample=0.3
```

  1. Set a different `learning_rate`.
  2. Set `scale` to `large`: the accuracy will be higher, but the run will take longer.
  3. Set `subsample` below 1.0: the accuracy may be higher while taking less time.

Citation

If you find this code useful, please consider citing:

[1] Chen, Yingshi. "Deep differentiable forest with sparse attention for the tabular data." arXiv preprint arXiv:2003.00223 (2020).
[2] Chen, Yingshi. "LiteMORT: A memory efficient gradient boosting tree system on adaptive compact distributions." arXiv preprint arXiv:2001.09419 (2020).

Future work

  • More large testing datasets.

    If anyone has a larger dataset, please send it to us for testing.

  • More models.

  • More papers.

Author

Yingshi Chen ([email protected])
