
Yimeng-Zhang / Rule_Extraction_from_Trees

Licence: other
A toolkit for extracting comprehensible rules from tree-based algorithms

Programming Languages

Jupyter Notebook, Python

Projects that are alternatives of or similar to Rule Extraction from Trees

Data-Mining-and-Warehousing
Data Mining algorithms for IDMW632C course at IIIT Allahabad, 6th semester
Stars: ✭ 19 (-44.12%)
Mutual labels:  data-mining, decision-tree
Algorithmic-Trading
Algorithmic trading using machine learning.
Stars: ✭ 102 (+200%)
Mutual labels:  data-mining, decision-tree
Gwu data mining
Materials for GWU DNSC 6279 and DNSC 6290.
Stars: ✭ 217 (+538.24%)
Mutual labels:  data-mining
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+9170.59%)
Mutual labels:  data-mining
Datascience
Curated list of Python resources for data science.
Stars: ✭ 3,051 (+8873.53%)
Mutual labels:  data-mining
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (+541.18%)
Mutual labels:  data-mining
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+605.88%)
Mutual labels:  data-mining
Qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Stars: ✭ 206 (+505.88%)
Mutual labels:  data-mining
kenchi
A scikit-learn compatible library for anomaly detection
Stars: ✭ 36 (+5.88%)
Mutual labels:  data-mining
Lasio
Python library for reading and writing well data using Log ASCII Standard (LAS) files
Stars: ✭ 234 (+588.24%)
Mutual labels:  data-mining
Tweetfeels
Real-time sentiment analysis in Python using Twitter's streaming API
Stars: ✭ 249 (+632.35%)
Mutual labels:  data-mining
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (+582.35%)
Mutual labels:  data-mining
Statistical Learning
Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course
Stars: ✭ 223 (+555.88%)
Mutual labels:  data-mining
Suod
(MLSys '21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)
Stars: ✭ 245 (+620.59%)
Mutual labels:  data-mining
Prefixspan Py
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
Stars: ✭ 214 (+529.41%)
Mutual labels:  data-mining
Matminer
Data mining for materials science
Stars: ✭ 251 (+638.24%)
Mutual labels:  data-mining
Zhihu Analysis Python
Social Network Analysis of Zhihu with Python
Stars: ✭ 215 (+532.35%)
Mutual labels:  data-mining
Chirp
Interface to manage and centralize Google Alert information
Stars: ✭ 227 (+567.65%)
Mutual labels:  data-mining
Data Mining Conferences
Ranking, acceptance rate, deadline, and publication tips
Stars: ✭ 236 (+594.12%)
Mutual labels:  data-mining
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (+79.41%)
Mutual labels:  decision-tree

Rule_Extraction_From_Trees

A toolkit, based on Skope-rules, for extracting comprehensible rules from tree-based algorithms and selecting the best-performing rule set. Currently only binary (two-class) classification tasks are supported.

Major groups of functionalities:

  1. Visualize tree structures and output as images;
  2. Rule extraction from trained tree models;
  3. Filter rules based on recall/precision threshold on a given dataset;
  4. Make predictions by rule voting.
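The rule-voting step (item 4) can be sketched as follows. This is an illustrative stand-in, not the toolkit's actual API: `predict_by_vote`, the threshold, and the rule strings are all hypothetical, and rules are evaluated with pandas expression syntax.

```python
import pandas as pd

def predict_by_vote(rules, X, threshold=0.5):
    """Predict 1 for a row when at least `threshold` fraction
    of the rules fire on it (a simple majority-vote scheme)."""
    # Each rule is a pandas-expression condition string.
    votes = pd.DataFrame(
        {i: X.eval(rule) for i, rule in enumerate(rules)}
    )
    return (votes.mean(axis=1) >= threshold).astype(int)

# Toy example with two hypothetical rules
X = pd.DataFrame({"Fare": [30.0, 5.0], "Age": [40, 10]})
rules = ["Fare > 26.125", "Age > 13.0"]
print(predict_by_vote(rules, X).tolist())  # [1, 0]
```

The first row satisfies both rules and is predicted 1; the second satisfies neither and is predicted 0.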

Models supported:

  1. DecisionTreeClassifier/DecisionTreeRegressor
  2. BaggingClassifier/BaggingRegressor
  3. RandomForestClassifier/RandomForestRegressor
  4. ExtraTreesClassifier/ExtraTreesRegressor

Installation

This project requires:

  • Python (>= 2.7 or >= 3.3)
  • NumPy (>= 1.10.4)
  • SciPy (>= 0.17.0)
  • Pandas (>= 0.18.1)
  • Scikit-Learn (>= 0.17.1)
  • pydotplus (>=2.0.2)
  • graphviz (>=0.8.2)

Installing graphviz (for Windows users):

  1. Download and install executable from https://graphviz.gitlab.io/_pages/Download/Download_windows.html

  2. Set the PATH variable to include the Graphviz bin directory

    (screenshot: install_graphviz)

  3. Restart any currently running application that needs to pick up the new PATH

  4. pip install pydotplus

Quick Start

See Demo1 for a detailed example.

First download the code into your project folder.

  1. Train or load a tree-based model. Having the dataset it was trained on available is helpful.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import tree,ensemble,metrics

from rule import Rule
from rule_extraction import rule_extract,draw_tree

# Train the model
model = tree.DecisionTreeClassifier(criterion='gini',max_depth=3)
model.fit(X_train,y_train)
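The snippet above assumes `X_train` and `y_train` already exist (in Demo1 they come from the Titanic dataset). A self-contained stand-in with the same column names used later in this README might look like this; the values are random and purely illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import tree

# Synthetic data with Titanic-style column names
rng = np.random.RandomState(0)
n = 200
X = pd.DataFrame({
    "Sex_ordered": rng.rand(n),
    "Pclass_ordered": rng.rand(n),
    "Age": rng.randint(1, 80, n).astype(float),
    "Fare": rng.rand(n) * 100,
})
y = (X["Sex_ordered"] > 0.5).astype(int)  # synthetic binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = tree.DecisionTreeClassifier(criterion='gini', max_depth=3)
model.fit(X_train, y_train)
```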
  2. Extract all the rules from the tree (each rule is a path from the root node to a leaf)
rules, _ = rule_extract(model=model,feature_names=X_train.columns)
for i in rules:
    print(i)

# output 
Sex_ordered > 0.4722778648138046 and Pclass_ordered > 0.3504907488822937 and Fare > 26.125
Sex_ordered <= 0.4722778648138046 and Age > 13.0 and Pclass_ordered <= 0.5564569681882858
Sex_ordered <= 0.4722778648138046 and Age <= 13.0 and Pclass_ordered <= 0.3504907488822937
Sex_ordered > 0.4722778648138046 and Pclass_ordered <= 0.3504907488822937 and Fare <= 20.800000190734863
Sex_ordered <= 0.4722778648138046 and Age > 13.0 and Pclass_ordered > 0.5564569681882858
Sex_ordered <= 0.4722778648138046 and Age <= 13.0 and Pclass_ordered > 0.3504907488822937
Sex_ordered > 0.4722778648138046 and Pclass_ordered > 0.3504907488822937 and Fare <= 26.125
Sex_ordered > 0.4722778648138046 and Pclass_ordered <= 0.3504907488822937 and Fare > 20.800000190734863
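Under the hood, extraction of this kind amounts to walking every root-to-leaf path of the fitted tree. A minimal sketch using scikit-learn's public `tree_` attribute (not the toolkit's own implementation, and shown here on the Iris data so it is self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def extract_paths(model, feature_names):
    """Enumerate every root-to-leaf path as a rule string."""
    t = model.tree_
    rules = []

    def recurse(node, conds):
        if t.children_left[node] == -1:   # leaf node
            rules.append(" and ".join(conds))
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node], conds + [f"{name} <= {thr}"])
        recurse(t.children_right[node], conds + [f"{name} > {thr}"])

    recurse(0, [])
    return rules

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
for r in extract_paths(clf, iris.feature_names):
    print(r)
```

One rule is produced per leaf, each conjoining the threshold comparisons along its path, which matches the shape of the output above.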
  3. Draw the structure of the tree
# blue (class=1) denotes nodes that predict class 1
# orange (class=0) denotes nodes that predict class 0
# the darker the color, the higher the node's purity
draw_tree(model=model,
          outdir='./images/DecisionTree/',
          feature_names=X_train.columns,
          proportion=False, # show [proportion] or [number of samples] from a node
          class_names=['0','1'])
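`draw_tree` presumably wraps the usual scikit-learn/pydotplus pipeline. An equivalent sketch using `export_graphviz` alone, which produces the DOT source without needing the Graphviz binary (shown on Iris so it is self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# DOT source; pydotplus/graphviz can render it to an image, e.g.
#   pydotplus.graph_from_dot_data(dot).write_png('tree.png')
dot = export_graphviz(clf,
                      feature_names=iris.feature_names,
                      class_names=iris.target_names,
                      filled=True,       # color nodes by class and purity
                      proportion=False,  # show sample counts, not proportions
                      out_file=None)
print(dot[:60])
```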

  4. Filter rules based on recall/precision thresholds on a given dataset
rules, rule_dict = rule_extract(model=model,
                                feature_names=X_train.columns,
                                x_test=X_test,
                                y_test=y_test,
                                recall_min_c0=0.9,  # recall threshold on class 1
                                precision_min_c0=0.6)  # precision threshold on class 1

for i in rule_dict:
    print(i)
# returns: (rule, recall on class 1, precision on class 1, recall on class 0, precision on class 0, nb)
('Fare > 26.125 and Pclass_ordered > 0.3504907488822937 and Sex_ordered > 0.4722778648138046', (0.328125, 0.9130434782608695, 0.9746835443037974, 0.6416666666666667, 1))
('Fare <= 26.125 and Pclass_ordered > 0.3504907488822937 and Sex_ordered > 0.4722778648138046', (0.21875, 0.875, 0.9746835443037974, 0.6062992125984252, 1))
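What the recall/precision filter measures can be reproduced by hand: apply a rule as a boolean mask over the dataset and compare the covered rows against the labels. A hedged sketch (`rule_metrics` is illustrative, not the toolkit's code):

```python
import pandas as pd

def rule_metrics(rule, X, y):
    """Recall and precision of a single rule, treating the rows
    where the rule fires as predictions of class 1."""
    fired = X.eval(rule)   # boolean mask of rows the rule covers
    tp = int((fired & (y == 1)).sum())
    recall = tp / max(int((y == 1).sum()), 1)
    precision = tp / max(int(fired.sum()), 1)
    return recall, precision

X = pd.DataFrame({"Fare": [30.0, 27.0, 5.0, 10.0]})
y = pd.Series([1, 1, 0, 1])
print(rule_metrics("Fare > 26.125", X, y))  # recall 2/3, precision 1.0
```

Rules whose recall or precision falls below the chosen thresholds are then discarded.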

API Reference

TODO

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].