
Yimeng-Zhang / Rule_Extraction_from_Trees

Licence: other
A toolkit for extracting comprehensible rules from tree-based algorithms

Programming Languages

Jupyter Notebook, Python

Projects that are alternatives of or similar to Rule Extraction from Trees

Data-Mining-and-Warehousing
Data Mining algorithms for IDMW632C course at IIIT Allahabad, 6th semester
Stars: ✭ 19 (-44.12%)
Mutual labels:  data-mining, decision-tree
Algorithmic-Trading
Algorithmic trading using machine learning.
Stars: ✭ 102 (+200%)
Mutual labels:  data-mining, decision-tree
Gwu data mining
Materials for GWU DNSC 6279 and DNSC 6290.
Stars: ✭ 217 (+538.24%)
Mutual labels:  data-mining
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+9170.59%)
Mutual labels:  data-mining
Datascience
Curated list of Python resources for data science.
Stars: ✭ 3,051 (+8873.53%)
Mutual labels:  data-mining
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (+541.18%)
Mutual labels:  data-mining
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Stars: ✭ 240 (+605.88%)
Mutual labels:  data-mining
Qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Stars: ✭ 206 (+505.88%)
Mutual labels:  data-mining
kenchi
A scikit-learn compatible library for anomaly detection
Stars: ✭ 36 (+5.88%)
Mutual labels:  data-mining
Lasio
Python library for reading and writing well data using Log ASCII Standard (LAS) files
Stars: ✭ 234 (+588.24%)
Mutual labels:  data-mining
Tweetfeels
Real-time sentiment analysis in Python using Twitter's streaming API
Stars: ✭ 249 (+632.35%)
Mutual labels:  data-mining
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (+582.35%)
Mutual labels:  data-mining
Statistical Learning
Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course
Stars: ✭ 223 (+555.88%)
Mutual labels:  data-mining
Suod
(MLSys '21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)
Stars: ✭ 245 (+620.59%)
Mutual labels:  data-mining
Prefixspan Py
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
Stars: ✭ 214 (+529.41%)
Mutual labels:  data-mining
Matminer
Data mining for materials science
Stars: ✭ 251 (+638.24%)
Mutual labels:  data-mining
Zhihu Analysis Python
Social Network Analysis of Zhihu with Python
Stars: ✭ 215 (+532.35%)
Mutual labels:  data-mining
Chirp
Interface to manage and centralize Google Alert information
Stars: ✭ 227 (+567.65%)
Mutual labels:  data-mining
Data Mining Conferences
Ranking, acceptance rate, deadline, and publication tips
Stars: ✭ 236 (+594.12%)
Mutual labels:  data-mining
decision-trees-for-ml
Building Decision Trees From Scratch In Python
Stars: ✭ 61 (+79.41%)
Mutual labels:  decision-tree

Rule_Extraction_From_Trees

A toolkit, based on Skope-rules, for extracting comprehensible rules from tree-based algorithms and selecting the best-performing rule set. Currently only binary (two-class) classification tasks are supported.

Major groups of functionalities:

  1. Visualize tree structures and output as images;
  2. Rule extraction from trained tree models;
  3. Filter rules based on recall/precision threshold on a given dataset;
  4. Make predictions by rule voting.
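The rule-voting step (item 4) can be sketched as follows. This is an illustrative stand-in, not the toolkit's actual API: `predict_by_vote`, the threshold, and the rule strings are all hypothetical, and rules are evaluated with pandas expression syntax.

```python
import pandas as pd

def predict_by_vote(rules, X, threshold=0.5):
    """Predict 1 for a row when at least `threshold` fraction
    of the rules fire on it (a simple majority-vote scheme)."""
    # Each rule is a pandas-expression condition string.
    votes = pd.DataFrame(
        {i: X.eval(rule) for i, rule in enumerate(rules)}
    )
    return (votes.mean(axis=1) >= threshold).astype(int)

# Toy example with two hypothetical rules
X = pd.DataFrame({"Fare": [30.0, 5.0], "Age": [40, 10]})
rules = ["Fare > 26.125", "Age > 13.0"]
print(predict_by_vote(rules, X).tolist())  # [1, 0]
```

The first row satisfies both rules and is predicted 1; the second satisfies neither and is predicted 0.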

Models supported:

  1. DecisionTreeClassifier/DecisionTreeRegressor
  2. BaggingClassifier/BaggingRegressor
  3. RandomForestClassifier/RandomForestRegressor
  4. ExtraTreesClassifier/ExtraTreesRegressor

Installation

This project requires:

  • Python (>= 2.7 or >= 3.3)
  • NumPy (>= 1.10.4)
  • SciPy (>= 0.17.0)
  • Pandas (>= 0.18.1)
  • Scikit-Learn (>= 0.17.1)
  • pydotplus (>=2.0.2)
  • graphviz (>=0.8.2)

Installing graphviz (for Windows users):

  1. Download and install executable from https://graphviz.gitlab.io/_pages/Download/Download_windows.html

  2. Set the PATH variable to include the Graphviz bin directory

    (screenshot: install_graphviz)

  3. Restart any currently running application that needs to pick up the new PATH

  4. pip install pydotplus

Quick Start

See Demo1 for a detailed example.

First download the code into your project folder.

  1. Train or load a tree-based model. Having the dataset it was trained on available is helpful.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import tree,ensemble,metrics

from rule import Rule
from rule_extraction import rule_extract,draw_tree

# Train the model
model = tree.DecisionTreeClassifier(criterion='gini',max_depth=3)
model.fit(X_train,y_train)
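The snippet above assumes `X_train` and `y_train` already exist (in Demo1 they come from the Titanic dataset). A self-contained stand-in with the same column names used later in this README might look like this; the values are random and purely illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import tree

# Synthetic data with Titanic-style column names
rng = np.random.RandomState(0)
n = 200
X = pd.DataFrame({
    "Sex_ordered": rng.rand(n),
    "Pclass_ordered": rng.rand(n),
    "Age": rng.randint(1, 80, n).astype(float),
    "Fare": rng.rand(n) * 100,
})
y = (X["Sex_ordered"] > 0.5).astype(int)  # synthetic binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = tree.DecisionTreeClassifier(criterion='gini', max_depth=3)
model.fit(X_train, y_train)
```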
  2. Extract all the rules from the tree (each rule is a path from the root node to a leaf)
rules, _ = rule_extract(model=model,feature_names=X_train.columns)
for i in rules:
    print(i)

# output 
Sex_ordered > 0.4722778648138046 and Pclass_ordered > 0.3504907488822937 and Fare > 26.125
Sex_ordered <= 0.4722778648138046 and Age > 13.0 and Pclass_ordered <= 0.5564569681882858
Sex_ordered <= 0.4722778648138046 and Age <= 13.0 and Pclass_ordered <= 0.3504907488822937
Sex_ordered > 0.4722778648138046 and Pclass_ordered <= 0.3504907488822937 and Fare <= 20.800000190734863
Sex_ordered <= 0.4722778648138046 and Age > 13.0 and Pclass_ordered > 0.5564569681882858
Sex_ordered <= 0.4722778648138046 and Age <= 13.0 and Pclass_ordered > 0.3504907488822937
Sex_ordered > 0.4722778648138046 and Pclass_ordered > 0.3504907488822937 and Fare <= 26.125
Sex_ordered > 0.4722778648138046 and Pclass_ordered <= 0.3504907488822937 and Fare > 20.800000190734863
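Under the hood, extraction of this kind amounts to walking every root-to-leaf path of the fitted tree. A minimal sketch using scikit-learn's public `tree_` attribute (not the toolkit's own implementation, and shown here on the Iris data so it is self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def extract_paths(model, feature_names):
    """Enumerate every root-to-leaf path as a rule string."""
    t = model.tree_
    rules = []

    def recurse(node, conds):
        if t.children_left[node] == -1:   # leaf node
            rules.append(" and ".join(conds))
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node], conds + [f"{name} <= {thr}"])
        recurse(t.children_right[node], conds + [f"{name} > {thr}"])

    recurse(0, [])
    return rules

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
for r in extract_paths(clf, iris.feature_names):
    print(r)
```

One rule is produced per leaf, each conjoining the threshold comparisons along its path, which matches the shape of the output above.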
  3. Draw the structure of the tree
# blue (class=1) denotes nodes that predict class 1
# orange (class=0) denotes nodes that predict class 0
# the darker the color, the higher the node's purity
draw_tree(model=model,
          outdir='./images/DecisionTree/',
          feature_names=X_train.columns,
          proportion=False, # show [proportion] or [number of samples] from a node
          class_names=['0','1'])
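`draw_tree` presumably wraps the usual scikit-learn/pydotplus pipeline. An equivalent sketch using `export_graphviz` alone, which produces the DOT source without needing the Graphviz binary (shown on Iris so it is self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# DOT source; pydotplus/graphviz can render it to an image, e.g.
#   pydotplus.graph_from_dot_data(dot).write_png('tree.png')
dot = export_graphviz(clf,
                      feature_names=iris.feature_names,
                      class_names=iris.target_names,
                      filled=True,       # color nodes by class and purity
                      proportion=False,  # show sample counts, not proportions
                      out_file=None)
print(dot[:60])
```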

  4. Filter rules based on recall/precision thresholds on a given dataset
rules, rule_dict = rule_extract(model=model,
                                feature_names=X_train.columns,
                                x_test=X_test,
                                y_test=y_test,
                                recall_min_c0=0.9,  # recall threshold on class 1
                                precision_min_c0=0.6)  # precision threshold on class 1

for i in rule_dict:
    print(i)
# returns: (rule, recall on class 1, precision on class 1, recall on class 0, precision on class 0, nb)
('Fare > 26.125 and Pclass_ordered > 0.3504907488822937 and Sex_ordered > 0.4722778648138046', (0.328125, 0.9130434782608695, 0.9746835443037974, 0.6416666666666667, 1))
('Fare <= 26.125 and Pclass_ordered > 0.3504907488822937 and Sex_ordered > 0.4722778648138046', (0.21875, 0.875, 0.9746835443037974, 0.6062992125984252, 1))
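What the recall/precision filter measures can be reproduced by hand: apply a rule as a boolean mask over the dataset and compare the covered rows against the labels. A hedged sketch (`rule_metrics` is illustrative, not the toolkit's code):

```python
import pandas as pd

def rule_metrics(rule, X, y):
    """Recall and precision of a single rule, treating the rows
    where the rule fires as predictions of class 1."""
    fired = X.eval(rule)   # boolean mask of rows the rule covers
    tp = int((fired & (y == 1)).sum())
    recall = tp / max(int((y == 1).sum()), 1)
    precision = tp / max(int(fired.sum()), 1)
    return recall, precision

X = pd.DataFrame({"Fare": [30.0, 27.0, 5.0, 10.0]})
y = pd.Series([1, 1, 0, 1])
print(rule_metrics("Fare > 26.125", X, y))  # recall 2/3, precision 1.0
```

Rules whose recall or precision falls below the chosen thresholds are then discarded.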

API Reference

TODO

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].