
LucasBoTang / Optimal_Classification_Trees

License: MIT
Three MIP models for optimal classification trees: OCT, binOCT, flowOCT

Programming Languages

Jupyter Notebook
11667 projects
Python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Optimal Classification Trees

jupyter-notebooks
Jupyter Notebooks and miscellaneous
Stars: ✭ 51 (+121.74%)
Mutual labels:  decision-tree
ml
Minimal implementations of classic machine learning algorithms
Stars: ✭ 130 (+465.22%)
Mutual labels:  decision-tree
hexo-theme-mip
Hexo MIP template
Stars: ✭ 15 (-34.78%)
Mutual labels:  mip
bonsai-dt
Programmable Decision Tree Framework
Stars: ✭ 34 (+47.83%)
Mutual labels:  decision-tree
kdd99-scikit
Solutions to kdd99 dataset with Decision tree and Neural network by scikit-learn
Stars: ✭ 50 (+117.39%)
Mutual labels:  decision-tree
Machine Learning
⚡Machine learning in action (Python 3): kNN, decision trees, naive Bayes, logistic regression, SVM, linear regression, tree regression
Stars: ✭ 5,601 (+24252.17%)
Mutual labels:  decision-tree
scoruby
Ruby Scoring API for PMML
Stars: ✭ 69 (+200%)
Mutual labels:  decision-tree
grasp
Essential NLP & ML, short & fast pure Python code
Stars: ✭ 58 (+152.17%)
Mutual labels:  decision-tree
devs
Devs is a lightweight rule engine
Stars: ✭ 15 (-34.78%)
Mutual labels:  decision-tree
Rule Extraction from Trees
A toolkit for extracting comprehensible rules from tree-based algorithms
Stars: ✭ 34 (+47.83%)
Mutual labels:  decision-tree
SentimentAnalysis
(BOW, TF-IDF, Word2Vec, BERT) Word Embeddings + (SVM, Naive Bayes, Decision Tree, Random Forest) Base Classifiers + Pre-trained BERT on Tensorflow Hub + 1-D CNN and Bi-Directional LSTM on IMDB Movie Reviews Dataset
Stars: ✭ 40 (+73.91%)
Mutual labels:  decision-tree
DecisionTrees
A python implementation of the CART algorithm for decision trees
Stars: ✭ 38 (+65.22%)
Mutual labels:  decision-tree
Awesome Decision Tree Papers
A collection of research papers on decision, classification and regression trees with implementations.
Stars: ✭ 1,908 (+8195.65%)
Mutual labels:  decision-tree
sklearn-oblique-tree
a python interface to OC1 and other oblique decision tree implementations
Stars: ✭ 33 (+43.48%)
Mutual labels:  decision-tree
production-scheduling
Solving an item production scheduling problem with the help of mathematical optimization
Stars: ✭ 35 (+52.17%)
Mutual labels:  mip
Data-Mining-and-Warehousing
Data Mining algorithms for IDMW632C course at IIIT Allahabad, 6th semester
Stars: ✭ 19 (-17.39%)
Mutual labels:  decision-tree
Decision Tree Prune
Decision tree with PEP, MEP, EBP, CVP, REP, CCP, and ECP pruning algorithms, all implemented in Python (sklearn-decision-tree-prune included).
Stars: ✭ 60 (+160.87%)
Mutual labels:  decision-tree
OCT-Converter
Tools for extracting the raw optical coherence tomography (OCT) and fundus data from proprietary file formats.
Stars: ✭ 120 (+421.74%)
Mutual labels:  oct
STOCK-RETURN-PREDICTION-USING-KNN-SVM-GUASSIAN-PROCESS-ADABOOST-TREE-REGRESSION-AND-QDA
Forecast stock prices using machine learning approach. A time series analysis. Employ the Use of Predictive Modeling in Machine Learning to Forecast Stock Return. Approach Used by Hedge Funds to Select Tradeable Stocks
Stars: ✭ 94 (+308.7%)
Mutual labels:  decision-tree
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (+165.22%)
Mutual labels:  decision-tree

Optimal Classification Tree

Introduction

This project aims to better understand the computational and prediction performance of the MIP-based classification tree learning approaches proposed in recent research papers. In particular, the OCT, binOCT, and flowOCT formulations are implemented and evaluated on publicly available classification datasets. Moreover, we propose a stable classification tree formulation that incorporates the training and validation split decision into the model fitting process. Computational results suggest that flowOCT has the most consistent performance across datasets and tree complexities. OCT is competitive on small datasets and shallow trees, while binOCT yields the best performance on large datasets and deep trees. The proposed stable classification tree achieves stronger out-of-sample performance than flowOCT under random training and validation set splits.

Dependencies

MIP Models

Stable Classification Tree

Building on ideas from Bertsimas and Paskov (2020), we also implement a stable classification tree (SOCT) formulation that incorporates the training and validation set split decision into the decision tree learning process. The model can be regarded as training the decision tree on the "hardest" subset of the training dataset. The SOCT is solved with both a robust optimization method and a cutting plane method.

  • Bertsimas, D., & Paskov, I. (2020). Stable Regression: On the Power of Optimization over Randomization in Training Regression Problems. Journal of Machine Learning Research, 21, 1-25.
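The cutting plane idea can be illustrated on a toy problem. The sketch below is hypothetical and is not the SOCT implementation (which formulates the master problem as a MIP and solves it with Gurobi): it fits a single constant to minimize the mean squared error on the hardest k-point subset, alternating between a master problem over the subsets collected so far and a separation oracle that finds the worst subset for the current solution.

```python
def worst_subset(c, xs, k):
    """Separation oracle: the k points with the largest loss under c."""
    return sorted(xs, key=lambda x: (x - c) ** 2, reverse=True)[:k]

def fit_master(subsets):
    """Master problem: minimize the worst mean loss over the collected
    subsets; solved here by a coarse grid search instead of a MIP."""
    candidates = [i / 100 for i in range(-500, 501)]
    def worst(c):
        return max(sum((x - c) ** 2 for x in S) / len(S) for S in subsets)
    return min(candidates, key=worst)

def cutting_plane(xs, k, max_iters=50):
    subsets = [xs[:k]]                  # start from an arbitrary subset
    for _ in range(max_iters):
        c = fit_master(subsets)         # optimize against known subsets
        S = worst_subset(c, xs, k)      # hardest subset for the current c
        if S in subsets:                # no violated cut: done
            break
        subsets.append(S)               # add the cut and re-solve
    return c
```

The loop terminates once the separation oracle returns a subset that is already in the master problem, i.e. no new cut is violated.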

Parameters

  • max_depth: The maximum depth of the tree.
  • min_samples_split: The minimum number of samples required to split an internal node.
  • alpha: The penalty coefficient on the number of splitting nodes.
  • warmstart: Whether to warm start the MIP with a scikit-learn decision tree.
  • timelimit: The time limit (in seconds) for the MIP solver.
  • output: Whether to display solver output.
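As a reading aid for alpha, the regularized objective these models trade off can be sketched as follows; the function name and the exact normalization are illustrative assumptions, not the repository's API:

```python
def regularized_objective(n_misclassified, n_train, n_splits, alpha):
    # normalized training misclassification plus a complexity penalty of
    # alpha per splitting node (illustrative sketch, not the actual MIP)
    return n_misclassified / n_train + alpha * n_splits
```

A larger alpha therefore pushes the solver toward trees with fewer branching nodes.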

Sample Code

import tree as miptree

# 600-second time limit, warm started from a scikit-learn decision tree
octree = miptree.optimalDecisionTreeClassifier(max_depth=3, min_samples_split=2, alpha=0.01, warmstart=True, timelimit=600, output=True)
octree.fit(x_train, y_train)          # solve the MIP on the training data
y_test_pred = octree.predict(x_test)  # predict labels for the test set

Data

Report

Details about the project report are available here.

Attention: the experimental results in the report are from an older version.

Results

All tests were conducted on an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz with 16 GB of memory. Algorithms are implemented in Python 3.7 with Gurobi 9.1 as the optimization solver. The time limit is set to 600 seconds, and the solver is run without warm starting.

Out-of-Sample Prediction Performance for OCT, binOCT, flowOCT, and CART

instance depth OCT binOCT flowOCT CART
balance-scale 2 0.6879 0.6497 0.6709 0.6285
balance-scale 3 0.6985 0.6603 0.6964 0.7197
balance-scale 4 0.7070 0.6624 0.7197 0.7707
balance-scale 5 0.6815 0.6157 0.7134 0.7707
breast-cancer 2 0.6952 0.6857 0.6810 0.7095
breast-cancer 3 0.7143 0.7143 0.6810 0.7286
breast-cancer 4 0.7238 0.7000 0.7429 0.7190
breast-cancer 5 0.7286 0.6238 0.6857 0.6952
car-evaluation 2 0.7654 0.7654 0.7654 0.7739
car-evaluation 3 0.7971 0.7863 0.7932 0.7724
car-evaluation 4 0.7361 0.8171 0.8071 0.8364
car-evaluation 5 0.6952 0.8233 0.7978 0.8341
hayes-roth 2 0.6083 0.5500 0.6083 0.5250
hayes-roth 3 0.6417 0.7000 0.7250 0.6083
hayes-roth 4 0.7500 0.6417 0.7917 0.6750
hayes-roth 5 0.7750 0.5417 0.7750 0.7833
house-votes-84 2 0.9713 0.9655 0.9713 0.9713
house-votes-84 3 0.9713 0.9713 0.9713 0.9713
house-votes-84 4 0.9713 0.9253 0.9713 0.9598
house-votes-84 5 0.9713 0.9598 0.9713 0.9713
monks-1 2 0.7506 0.7050 0.7506 0.7314
monks-1 3 0.8729 0.8393 0.8585 0.7890
monks-1 4 0.9496 0.9041 1.0000 0.7770
monks-1 5 0.8129 1.0000 1.0000 0.7506
monks-2 2 0.6623 0.6071 0.6623 0.6623
monks-2 3 0.6623 0.5850 0.6623 0.5717
monks-2 4 0.6623 0.5806 0.6623 0.6026
monks-2 5 0.6623 0.5894 0.6623 0.6689
monks-3 2 0.9664 0.9664 0.9664 0.9664
monks-3 3 0.9904 0.9904 0.9904 0.9784
monks-3 4 0.9904 0.9880 0.9904 0.9904
monks-3 5 0.9736 0.9832 0.9904 0.9904
soybean-small 2 1.0000 0.9722 1.0000 0.7500
soybean-small 3 0.9444 0.7500 0.9722 0.9722
soybean-small 4 0.9444 0.8333 0.9444 0.9722
soybean-small 5 0.9722 0.7222 0.9722 0.9722
spect 2 0.8358 0.7811 0.8358 0.7811
spect 3 0.8358 0.7562 0.8358 0.8060
spect 4 0.8358 0.7413 0.7910 0.7761
spect 5 0.7910 0.7313 0.7910 0.7811
tic-tac-toe 2 0.6889 0.6889 0.6889 0.7097
tic-tac-toe 3 0.7514 0.7458 0.7625 0.7125
tic-tac-toe 4 0.7333 0.8014 0.7486 0.8000
tic-tac-toe 5 0.6958 0.7958 0.7361 0.8097
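The figures above appear to be plain test-set accuracy, i.e. the fraction of correctly classified test samples; assuming so, each entry corresponds to:

```python
def accuracy(y_true, y_pred):
    # fraction of test samples whose predicted label matches the true label
    return sum(yt == yp for yt, yp in zip(y_true, y_pred)) / len(y_true)
```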

Optimality Gap for OCT, binOCT, flowOCT

instance depth OCT binOCT flowOCT
balance-scale 2 0.00% 0.00% 0.00%
balance-scale 3 93.82% 84.57% 0.00%
balance-scale 4 95.52% 100.00% 19.92%
balance-scale 5 97.87% 100.00% 20.64%
breast-cancer 2 0.00% 0.00% 0.00%
breast-cancer 3 89.21% 99.96% 0.00%
breast-cancer 4 92.47% 100.00% 20.40%
breast-cancer 5 93.06% 100.00% 24.14%
car-evaluation 2 99.29% 0.00% 0.00%
car-evaluation 3 99.62% 94.30% 17.55%
car-evaluation 4 100.00% 100.00% 21.76%
car-evaluation 5 100.00% 100.00% 24.51%
hayes-roth 2 0.00% 0.00% 0.00%
hayes-roth 3 69.15% 29.71% 0.00%
hayes-roth 4 92.10% 100.00% 2.71%
hayes-roth 5 100.00% 100.00% 10.40%
house-votes-84 2 0.00% 0.00% 0.00%
house-votes-84 3 0.00% 66.04% 0.00%
house-votes-84 4 0.00% 33.33% 0.00%
house-votes-84 5 0.00% 0.00% 0.00%
monks-1 2 0.00% 0.00% 0.00%
monks-1 3 100.00% 24.89% 0.00%
monks-1 4 66.67% 66.67% 0.00%
monks-1 5 94.67% 0.00% 0.00%
monks-2 2 0.00% 0.00% 0.00%
monks-2 3 21.39% 99.42% 0.00%
monks-2 4 64.39% 100.00% 0.00%
monks-2 5 97.43% 100.00% 0.00%
monks-3 2 0.00% 0.00% 0.00%
monks-3 3 33.33% 0.00% 0.00%
monks-3 4 100.00% 100.00% 0.48%
monks-3 5 79.38% 100.00% 0.00%
soybean-small 2 0.00% 0.00% 0.00%
soybean-small 3 0.00% 0.00% 0.00%
soybean-small 4 0.00% 0.00% 0.00%
soybean-small 5 0.00% 0.00% 0.00%
spect 2 0.00% 0.00% 0.00%
spect 3 0.00% 13.76% 0.00%
spect 4 0.00% 48.55% 0.75%
spect 5 86.84% 53.65% 3.62%
tic-tac-toe 2 20.71% 0.00% 0.00%
tic-tac-toe 3 94.79% 100.00% 23.22%
tic-tac-toe 4 100.00% 100.00% 26.86%
tic-tac-toe 5 100.00% 100.00% 31.55%
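Assuming the gaps above follow Gurobi's relative MIPGap definition, |ObjBound - ObjVal| / |ObjVal|, an entry of 0.00% means the solve was certified optimal within the time limit, while 100.00% means the best bound provided essentially no certificate:

```python
def relative_mip_gap(obj_val, obj_bound):
    # Gurobi-style relative gap: |ObjBound - ObjVal| / |ObjVal|
    # (an assumption about how the table entries were computed)
    if obj_val == 0.0:
        return 0.0 if obj_bound == 0.0 else float("inf")
    return abs(obj_bound - obj_val) / abs(obj_val)
```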

Training Time (seconds) for OCT, binOCT, flowOCT, and CART

instance depth OCT binOCT flowOCT CART
balance-scale 2 324.485 6.703 1.948 0.016
balance-scale 3 600.985 601.105 187.126 0.001
balance-scale 4 602.283 601.775 600.087 0.001
balance-scale 5 606.111 603.954 600.100 0.000
breast-cancer 2 76.785 12.278 1.327 0.000
breast-cancer 3 600.662 600.958 7.482 0.000
breast-cancer 4 601.735 601.599 600.201 0.001
breast-cancer 5 603.483 603.669 600.165 0.001
car-evaluation 2 601.157 21.770 37.913 0.001
car-evaluation 3 602.927 601.778 600.066 0.001
car-evaluation 4 607.996 605.761 600.285 0.001
car-evaluation 5 618.090 613.143 600.488 0.000
hayes-roth 2 25.766 2.432 0.892 0.000
hayes-roth 3 600.384 539.534 18.473 0.000
hayes-roth 4 600.896 600.857 447.373 0.000
hayes-roth 5 602.023 601.833 600.136 0.001
house-votes-84 2 6.927 1.323 0.731 0.000
house-votes-84 3 3.303 514.985 0.267 0.000
house-votes-84 4 7.094 204.534 0.237 0.000
house-votes-84 5 22.034 6.601 0.357 0.000
monks-1 2 68.489 3.624 3.230 0.001
monks-1 3 601.394 475.691 31.504 0.001
monks-1 4 423.765 404.370 9.361 0.001
monks-1 5 608.339 12.078 24.010 0.001
monks-2 2 62.953 26.734 3.040 0.000
monks-2 3 601.407 601.485 24.141 0.001
monks-2 4 604.056 602.460 37.813 0.001
monks-2 5 608.826 606.907 24.869 0.001
monks-3 2 27.279 1.292 1.432 0.000
monks-3 3 256.932 199.872 18.712 0.000
monks-3 4 603.237 602.149 600.129 0.001
monks-3 5 609.401 606.728 8.107 0.001
soybean-small 2 3.059 0.329 0.475 0.001
soybean-small 3 1.674 0.297 0.326 0.001
soybean-small 4 11.630 0.749 0.763 0.001
soybean-small 5 17.474 1.842 1.631 0.000
spect 2 3.590 8.102 0.270 0.000
spect 3 9.966 431.826 0.360 0.000
spect 4 6.822 601.869 368.500 0.001
spect 5 605.338 605.569 600.283 0.001
tic-tac-toe 2 567.332 159.264 63.977 0.001
tic-tac-toe 3 602.668 602.060 600.130 0.001
tic-tac-toe 4 606.554 605.508 600.159 0.001
tic-tac-toe 5 615.277 621.808 600.446 0.001

Average Solution for flowOCT and SOCT Solved with the Robust Optimization (SOCT_RB) and Cutting Plane (SOCT_CP) Methods

instance flowOCT SOCT_CP SOCT_RB
balance-scale 3.30 15.97 7.17
breast-cancer 6.18 16.53 11.84
car-evaluation 24.16 108.59 54.20
hayes-roth 0.67 4.10 1.23
house-votes-84 0.58 1.31 1.20
monks-1 0.47 1.18 0.81
monks-2 0.77 5.81 2.66
monks-3 0.32 0.79 0.62
soybean-small 0.32 0.62 1.28
spect 1.80 5.67 5.14
tic-tac-toe 71.50 246.28 88.11