betterenvi / Gspan

License: MIT
Python implementation of frequent subgraph mining algorithm gSpan. Directed graphs are supported.

gSpan

For the Chinese README, please see README-Chinese.

gSpan is an algorithm for mining frequent subgraphs.
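To make "frequent" concrete: a pattern's support is the number of graphs in the database that contain it, and a pattern is frequent when its support reaches the minimum support. The sketch below counts support only for single-edge patterns in undirected graphs, which is the simplest case. The tuple-based graph representation and the function names are assumptions made for illustration and are not this project's API or data format.

```python
from collections import Counter

# Toy representation (an assumption for this sketch): each graph is a pair
# (vertex_labels, edges), where edges are (u, v, edge_label) tuples.


def edge_pattern_support(graphs):
    """Count, for each labeled edge pattern, how many graphs contain it."""
    support = Counter()
    for vertex_labels, edges in graphs:
        seen = set()
        for u, v, edge_label in edges:
            # Undirected edge: order the endpoint labels canonically.
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            seen.add((a, edge_label, b))
        # A graph contributes at most 1 to the support of each pattern.
        support.update(seen)
    return support


def frequent_edge_patterns(graphs, min_support):
    """Keep only the patterns whose support is at least min_support."""
    support = edge_pattern_support(graphs)
    return {p: s for p, s in support.items() if s >= min_support}


graphs = [
    ({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "y")]),
    ({0: "B", 1: "A"}, [(0, 1, "x")]),
    ({0: "A", 1: "C"}, [(0, 1, "y")]),
]
print(frequent_edge_patterns(graphs, min_support=2))
```

gSpan itself goes further by growing such frequent edges into larger subgraphs; that extension step is not shown here.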

This program implements gSpan in Python. The repository on GitHub is https://github.com/betterenvi/gSpan. This implementation borrows some ideas from gboost.

Undirected Graphs

This program supports undirected graphs and produces the same results as gboost on the dataset graphdata/graph.data.

Directed Graphs

As of 2016-10-29, gboost does not support directed graphs. This program implements gSpan for directed graphs. More specifically, it can mine frequent directed subgraphs that have at least one node from which all other nodes in the subgraph are reachable. Correctness is not guaranteed, because the author has not tested this mode extensively. Several runs on the datasets graphdata/graph.data.directed.1 and graph.data.simple.5 produced no errors.
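The reachability condition on directed subgraphs can be checked with a plain breadth-first search. The following is a minimal sketch of that check only; the function names and the edge-list representation are hypothetical and are not taken from this project's code.

```python
from collections import deque


def reachable_from(start, adjacency):
    """Return the set of nodes reachable from start along directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_root(nodes, edges):
    """True if some node can reach every other node in the directed graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    nodes = set(nodes)
    return any(reachable_from(n, adjacency) >= nodes for n in nodes)


# Node 0 reaches 1 and 2, so this graph satisfies the condition.
print(has_root([0, 1, 2], [(0, 1), (0, 2)]))
# No single node reaches both of the others here.
print(has_root([0, 1, 2], [(0, 2), (1, 2)]))
```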

How to install

This program supports both Python 2 and Python 3.

Method 1

Install this project using pip:

pip install gspan-mining

Method 2

First, clone the project:

git clone https://github.com/betterenvi/gSpan.git
cd gSpan

Optionally, install this project as a third-party library so that you can run it from any directory.

python setup.py install

How to run

The command is:

python -m gspan_mining [-s min_support] [-n num_graph] [-l min_num_vertices] [-u max_num_vertices] [-d True/False] [-v True/False] [-p True/False] [-w True/False] [-h] database_file_name

Some examples

  • Read graph data from ./graphdata/graph.data and mine undirected subgraphs with a minimum support of 5000
python -m gspan_mining -s 5000 ./graphdata/graph.data
  • Read graph data from ./graphdata/graph.data, mine undirected subgraphs with a minimum support of 5000, and visualize the frequent subgraphs (matplotlib and networkx are required)
python -m gspan_mining -s 5000 -p True ./graphdata/graph.data
  • Read graph data from ./graphdata/graph.data and mine directed subgraphs with a minimum support of 5000
python -m gspan_mining -s 5000 -d True ./graphdata/graph.data
  • Print help info
python -m gspan_mining -h
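When the tool is driven from a script, the same command lines can be assembled programmatically. The helper below is a hypothetical sketch, not part of this project: it only builds the argument list from the flags shown in the usage line (-s, -l, -u, -d, -p) and does not run anything.

```python
import sys


def build_gspan_command(database_file, min_support, directed=False, plot=False,
                        min_num_vertices=None, max_num_vertices=None):
    """Build an argv list for `python -m gspan_mining` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "gspan_mining", "-s", str(min_support)]
    if min_num_vertices is not None:
        cmd += ["-l", str(min_num_vertices)]
    if max_num_vertices is not None:
        cmd += ["-u", str(max_num_vertices)]
    if directed:
        cmd += ["-d", "True"]
    if plot:
        cmd += ["-p", "True"]
    cmd.append(database_file)  # the database file name comes last
    return cmd


cmd = build_gspan_command("./graphdata/graph.data", 5000, directed=True)
print(" ".join(cmd[1:]))
```

Assuming the package is installed, the resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the mining run.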

The author also wrote example code in a Jupyter Notebook, which presents mining results and visualizations. For details, please refer to main.ipynb.

Running time

  • Environment

    • OS: Windows 10
    • Python version: Python 2.7.12
    • Processor: Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz
    • RAM: 8.00 GB
  • Running time: on the dataset ./graphdata/graph.data, the running times are as follows:

| Min support | Number of frequent subgraphs | Time      |
| 5000        | 26                           | 51.48 s   |
| 3000        | 52                           | 69.07 s   |
| 1000        | 455                          | 3 m 49 s  |
| 600         | 1235                         | 7 m 29 s  |
| 400         | 2710                         | 12 m 53 s |
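To compare the rows above on a common scale, the reported times can be converted to seconds and divided by the number of frequent subgraphs found. The snippet below is a small arithmetic sketch over the figures in the table; the helper name `to_seconds` is an assumption introduced here.

```python
import re


def to_seconds(text):
    """Convert a duration such as '3 m 49 s' or '51.48 s' to seconds."""
    minutes = re.search(r"(\d+(?:\.\d+)?)\s*m\b", text)
    seconds = re.search(r"(\d+(?:\.\d+)?)\s*s\b", text)
    total = 0.0
    if minutes:
        total += 60 * float(minutes.group(1))
    if seconds:
        total += float(seconds.group(1))
    return total


# (min support, number of frequent subgraphs, reported time) from the table.
rows = [
    (5000, 26, "51.48 s"),
    (3000, 52, "69.07 s"),
    (1000, 455, "3 m 49 s"),
    (600, 1235, "7 m 29 s"),
    (400, 2710, "12 m 53 s"),
]

for min_support, count, reported in rows:
    total = to_seconds(reported)
    print(f"min support {min_support}: {total:.2f} s total, "
          f"{total / count:.3f} s per frequent subgraph")
```

On these figures, total time grows as the minimum support drops, while the time per frequent subgraph found falls.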

Reference

X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. Proc. of the 2002 Int. Conf. on Data Mining (ICDM'02).

One C++ implementation of gSpan.
