benedekrozemberczki / RolX

Licence: GPL-3.0 license
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)

ReFeX and RolX

ReFeX is a structural graph feature extraction algorithm that creates binary features describing the structural properties of nodes in a large graph. First, continuous features are extracted from descriptive statistics of node neighbourhoods. These statistics are then aggregated recursively. This implementation extends the original algorithm so that more advanced descriptive statistics can be extracted during the recursion phase. In addition, the number of feature extraction recursions and the number of binarization bins are controllable parameters. Finally, strongly correlated features can be dropped based on a user-chosen correlation threshold.
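The recursive aggregation step can be illustrated with a minimal pure-Python sketch. It is not the repository's code: the function name `recursive_features`, the adjacency-dictionary input, the choice of degree as the only base feature, and the use of neighbour mean and sum as aggregators are all assumptions made for illustration.

```python
from statistics import mean


def recursive_features(adj, iterations=3):
    """Return {node: feature list}: degree first, then recursive neighbour aggregates.

    `adj` maps each node to a list of its neighbours (hypothetical input format).
    """
    if not adj:
        return {}
    # Base feature (assumed): node degree.
    latest = {v: [float(len(nbrs))] for v, nbrs in adj.items()}
    feats = {v: list(row) for v, row in latest.items()}
    for _ in range(iterations):
        width = len(next(iter(latest.values())))
        new_block = {}
        for v, nbrs in adj.items():
            row = []
            for j in range(width):
                # Aggregate column j of the previous block over the neighbours of v.
                vals = [latest[u][j] for u in nbrs] or [0.0]
                row += [mean(vals), sum(vals)]
            new_block[v] = row
        for v in adj:
            feats[v] += new_block[v]
        latest = new_block  # the next round aggregates only the newest columns
    return feats


# Path graph 0 - 1 - 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = recursive_features(adj, iterations=1)
```

With one base feature and two aggregators, each recursion doubles the width of the newest block, so three recursions give 1 + 2 + 4 + 8 = 15 columns per node in this sketch.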

RolX is an algorithm that takes the features extracted with ReFeX and factorizes the binary node-feature matrix to create low-dimensional structural node representations. Nodes with similar structural features are clustered together in the latent space. The original model uses non-negative matrix factorization; this implementation instead uses an implicit matrix factorization model trained with a variant of gradient descent. The implementation supports GPU use.
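To make the factorization idea concrete, here is a small pure-Python sketch that fits node factors and feature factors to a binary node-feature matrix. It uses plain stochastic gradient descent on squared reconstruction error with an L2 penalty, which is an assumption: the repository's actual loss, optimizer and TensorFlow model may differ. The names `factorize` and `reconstruction_cost` are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def factorize(matrix, dimensions=2, epochs=200, learning_rate=0.05, lambd=1e-3, seed=42):
    """Fit U (nodes) and V (features) so that dot(U[i], V[j]) approximates matrix[i][j]."""
    rng = random.Random(seed)
    n, m = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(dimensions)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(dimensions)] for _ in range(m)]
    cells = [(i, j) for i in range(n) for j in range(m)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            err = dot(U[i], V[j]) - matrix[i][j]
            for k in range(dimensions):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error plus L2 regularization.
                U[i][k] -= learning_rate * (err * v + lambd * u)
                V[j][k] -= learning_rate * (err * u + lambd * v)
    return U, V


def reconstruction_cost(matrix, U, V):
    """Mean squared error between the matrix and the product of the factors."""
    total = 0.0
    for i, row in enumerate(matrix):
        for j, target in enumerate(row):
            total += (dot(U[i], V[j]) - target) ** 2
    return total / (len(matrix) * len(matrix[0]))


# Toy binary node-feature matrix with two structural "roles".
B = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
U0, V0 = factorize(B, epochs=0)  # untrained factors, same seed
U, V = factorize(B)
print(reconstruction_cost(B, U0, V0), reconstruction_cost(B, U, V))
```

The rows of `U` play the part of the node representations; nodes 0 and 1 share a feature pattern, as do nodes 2 and 3, so their rows should end up close to each other.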

This repository provides a custom implementation of ReFeX and RolX as described in the papers:

It's who you know: graph mining using recursive structural features. Keith Henderson, Brian Gallagher, Lei Li, Leman Akoglu, Tina Eliassi-Rad, Hanghang Tong and Christos Faloutsos. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. [Paper]

RolX: Structural Role Extraction & Mining in Large Graphs. Keith Henderson, Brian Gallagher, Tina Eliassi-Rad, Hanghang Tong, Sugato Basu, Leman Akoglu, Danai Koutra, Christos Faloutsos and Lei Li. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. [Paper]

Another Python implementation is available [here].

Requirements

The codebase is implemented in Python 2.7. The package versions used for development are listed below.

networkx          1.11
tqdm              4.19.5
numpy             1.13.3
pandas            0.20.3
tensorflow-gpu    1.3.0
jsonschema        2.6.0
texttable         1.2.1

Datasets

The code takes an input graph stored in a CSV file. Each row describes one edge as two node IDs separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for the `Facebook TVshows` dataset is included in the `data/` directory.
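As a rough illustration of this input format, the sketch below parses such an edge list with the standard `csv` module and checks that the smallest node ID is 0. The function name `read_edges` and the header names `node_1,node_2` are hypothetical; the README only states that a header row exists.

```python
import csv
import io


def read_edges(handle):
    """Parse a CSV edge list with one header row into a list of (int, int) pairs."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    edges = [(int(row[0]), int(row[1])) for row in reader if row]
    nodes = {node for edge in edges for node in edge}
    if nodes and min(nodes) != 0:
        raise ValueError("node IDs should be indexed starting with 0")
    return edges


# In-memory stand-in for a file such as data/tvshow_edges.csv.
sample = io.StringIO("node_1,node_2\n0,1\n1,2\n2,0\n")
edges = read_edges(sample)
```

For a file on disk, the same function can be called as `read_edges(open(path, newline=""))`.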

Logging

The models are defined so that parameter settings, extracted features and factorization loss are logged in every epoch. Specifically, we log the following:

1. Hyperparameter settings.                  We save each hyperparameter used in the experiment.
2. Number of extracted features per epoch.   We record the number of features before and after pruning.
3. Cost per epoch.                           Reconstruction cost is stored in every iteration.
4. Runtime.                                  We measure the time needed for feature extraction and optimization, in seconds.

Options

The feature extraction and factorization are handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --input                        STR   Input graph path.           Default is `data/tvshow_edges.csv`.
  --embedding-output             STR   Embeddings path.            Default is `output/embeddings/tvhsow_embedding.csv`.
  --recursive-features-output    STR   Recursive features path.    Default is `output/features/tvhsow_features.csv`.
  --log-output                   STR   Log path.                   Default is `output/logs/tvhsow.log`.

ReFeX options

  --recursive-iterations  INT      Number of recursions.                                Default is 3.
  --bins                  INT      Number of binarization bins.                         Default is 4.
  --aggregator            STR      Aggregation strategy (simple/complex).               Default is `simple`.
  --pruning-cutoff        FLOAT    Absolute correlation for feature dropping.           Default is 0.9.
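
The `--bins` and `--pruning-cutoff` options can be pictured with the following pure-Python sketch. It assumes equal-width bins and a greedy pruning order, neither of which is confirmed by this README; `pearson`, `prune` and `binarize` are hypothetical helper names, not functions from the repository.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def prune(columns, cutoff=0.9):
    """Greedily keep a column only if its |correlation| with every kept column is <= cutoff."""
    kept = []
    for col in columns:
        if all(abs(pearson(col, other)) <= cutoff for other in kept):
            kept.append(col)
    return kept


def binarize(column, bins=4):
    """One-hot encode a column into `bins` equal-width bins (assumed binning rule)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant columns
    rows = []
    for value in column:
        index = min(int((value - lo) / width), bins - 1)
        rows.append([1 if b == index else 0 for b in range(bins)])
    return rows


columns = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
kept = prune(columns, cutoff=0.9)       # the second column duplicates the first
binary = binarize([1, 2, 3, 4], bins=2)
```

A lower cutoff prunes more aggressively, and a larger number of bins produces a wider binary matrix for the factorization step.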

RolX options

  --epochs                  INT       Number of epochs.                           Default is 10.
  --batch-size              INT       Number of edges in batch.                   Default is 32.
  --dimensions              INT       Number of dimensions.                       Default is 16.
  --initial-learning-rate   FLOAT     Initial learning rate.                      Default is 0.01.
  --minimal-learning-rate   FLOAT     Final learning rate.                        Default is 0.001.
  --annealing-factor        FLOAT     Annealing factor for learning rate.         Default is 1.0.
  --lambd                   FLOAT     Weight regularization penalty.              Default is 10**-3.
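
For readers who want to see how the options listed above fit together, here is an `argparse` sketch that mirrors the documented flags and defaults. It is a reconstruction from this README, not a copy of `src/main.py`; the function name `build_parser` and the `choices` restriction on `--aggregator` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options documented in this README (sketch only)."""
    parser = argparse.ArgumentParser(description="Run ReFeX and RolX (sketch).")
    # Input and output options; default paths copied verbatim from the README.
    parser.add_argument("--input", type=str, default="data/tvshow_edges.csv")
    parser.add_argument("--embedding-output", type=str,
                        default="output/embeddings/tvhsow_embedding.csv")
    parser.add_argument("--recursive-features-output", type=str,
                        default="output/features/tvhsow_features.csv")
    parser.add_argument("--log-output", type=str, default="output/logs/tvhsow.log")
    # ReFeX options.
    parser.add_argument("--recursive-iterations", type=int, default=3)
    parser.add_argument("--bins", type=int, default=4)
    parser.add_argument("--aggregator", type=str, default="simple",
                        choices=["simple", "complex"])  # assumed restriction
    parser.add_argument("--pruning-cutoff", type=float, default=0.9)
    # RolX options.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--dimensions", type=int, default=16)
    parser.add_argument("--initial-learning-rate", type=float, default=0.01)
    parser.add_argument("--minimal-learning-rate", type=float, default=0.001)
    parser.add_argument("--annealing-factor", type=float, default=1.0)
    parser.add_argument("--lambd", type=float, default=10**-3)
    return parser


# Dashes in flag names become underscores in attribute names.
args = build_parser().parse_args(["--dimensions", "128", "--bins", "8"])
print(args.dimensions, args.bins, args.embedding_output)
```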

Examples

The following commands create structural features, learn a graph embedding and write both to disk. The node representations are ordered by node ID.

Creating a RolX embedding of the default dataset with the default hyperparameter settings. Saving the ReFeX features, RolX embedding and the log file at the default path.

python src/main.py

Creating an embedding of another dataset, Facebook Companies. Saving the output and the log in a custom location.

python src/main.py --input data/company_edges.csv  --embedding-output output/embeddings/company_embedding.csv --recursive-features-output output/features/company_features.csv --log-output output/logs/company_log.json

Creating an embedding of the default dataset in 128 dimensions with 8 binary feature bins.

python src/main.py --dimensions 128 --bins 8

License

