benedekrozemberczki / Graphwavemachine

Licence: gpl-3.0
A scalable implementation of "Learning Structural Node Embeddings Via Diffusion Wavelets (KDD 2018)".

Projects that are alternatives of or similar to Graphwavemachine

Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-71.52%)
Mutual labels:  unsupervised-learning, word2vec
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+39.07%)
Mutual labels:  unsupervised-learning, word2vec
Danmf
A sparsity aware implementation of "Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection" (CIKM 2018).
Stars: ✭ 161 (+6.62%)
Mutual labels:  unsupervised-learning, word2vec
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (-74.17%)
Mutual labels:  word2vec, unsupervised-learning
Graph2vec
A parallel implementation of "graph2vec: Learning Distributed Representations of Graphs" (MLGWorkshop 2017).
Stars: ✭ 605 (+300.66%)
Mutual labels:  unsupervised-learning, word2vec
altair
Assessing Source Code Semantic Similarity with Unsupervised Learning
Stars: ✭ 42 (-72.19%)
Mutual labels:  word2vec, unsupervised-learning
RolX
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (-65.56%)
Mutual labels:  word2vec, unsupervised-learning
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-88.08%)
Mutual labels:  unsupervised-learning, word2vec
Text Summarizer
Python Framework for Extractive Text Summarization
Stars: ✭ 96 (-36.42%)
Mutual labels:  unsupervised-learning, word2vec
Isolation Forest
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-7.95%)
Mutual labels:  unsupervised-learning
Wordembeddings Elmo Fasttext Word2vec
Using pre trained word embeddings (Fasttext, Word2Vec)
Stars: ✭ 146 (-3.31%)
Mutual labels:  word2vec
Autoregressive Predictive Coding
Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning
Stars: ✭ 138 (-8.61%)
Mutual labels:  unsupervised-learning
Complete Life Cycle Of A Data Science Project
Complete-Life-Cycle-of-a-Data-Science-Project
Stars: ✭ 140 (-7.28%)
Mutual labels:  unsupervised-learning
Lr Gan.pytorch
Pytorch code for our ICLR 2017 paper "Layered-Recursive GAN for image generation"
Stars: ✭ 145 (-3.97%)
Mutual labels:  unsupervised-learning
Splitbrainauto
Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction. In CVPR, 2017.
Stars: ✭ 137 (-9.27%)
Mutual labels:  unsupervised-learning
Textfeatures
👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️
Stars: ✭ 148 (-1.99%)
Mutual labels:  word2vec
Word2vec
A further wrapper around the Word2VEC_java written by ansj, which also implements common word-similarity and sentence-similarity calculations.
Stars: ✭ 136 (-9.93%)
Mutual labels:  word2vec
Ml
sourced.ml is a library and command line tools to build and apply machine learning models on top of Universal Abstract Syntax Trees
Stars: ✭ 136 (-9.93%)
Mutual labels:  word2vec
Awesome Sentence Embedding
A curated list of pretrained sentence and word embedding models
Stars: ✭ 1,973 (+1206.62%)
Mutual labels:  unsupervised-learning
Fasttext4j
Implementing Facebook's FastText with java
Stars: ✭ 148 (-1.99%)
Mutual labels:  word2vec

GraphWave

Abstract

Nodes residing in different parts of a graph can have similar structural roles within their local network topology. The identification of such roles provides key insight into the organization of networks and can be used for a variety of machine learning tasks. However, learning structural representations of nodes is a challenging problem, and it has typically involved manually specifying and tailoring topological features for each node. In this paper, we develop GraphWave, a method that represents each node's network neighborhood via a low-dimensional embedding by leveraging heat wavelet diffusion patterns. Instead of training on hand-selected features, GraphWave learns these embeddings in an unsupervised way. We mathematically prove that nodes with similar network neighborhoods will have similar GraphWave embeddings even though these nodes may reside in very different parts of the network, and our method scales linearly with the number of edges. Experiments in a variety of different settings demonstrate GraphWave's real-world potential for capturing structural roles in networks, and our approach outperforms existing state-of-the-art baselines in every experiment, by as much as 137%.
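The embedding step described in the abstract can be illustrated with a minimal sketch. It assumes the paper's formulation, in which a node's embedding is built by sampling the empirical characteristic function of its wavelet coefficients and taking the real and imaginary parts. The function name, the toy coefficients, and the sample points below are hypothetical and not taken from this repository; the ordering of the output (all real parts, then all imaginary parts) is a choice made for this illustration.

```python
import cmath


def characteristic_embedding(coefficients, sample_points):
    """Sample the empirical characteristic function of one node's
    wavelet coefficients and return real parts followed by imaginary parts.

    phi(t) = (1 / N) * sum_m exp(i * t * coefficient_m)
    """
    n = len(coefficients)
    real_parts, imag_parts = [], []
    for t in sample_points:
        phi = sum(cmath.exp(1j * t * c) for c in coefficients) / n
        real_parts.append(phi.real)
        imag_parts.append(phi.imag)
    return real_parts + imag_parts


# Hypothetical wavelet coefficients for two nodes (not real data).
# node_b holds the same values as node_a in a different order, which mimics
# two nodes whose neighborhoods look alike but sit in different graph regions.
node_a = [0.5, 0.3, 0.2, 0.0]
node_b = [0.0, 0.2, 0.5, 0.3]
sample_points = [0.0, 1.0, 2.0]  # assumed sample points, for illustration only

emb_a = characteristic_embedding(node_a, sample_points)
emb_b = characteristic_embedding(node_b, sample_points)
```

Because the characteristic function depends only on the distribution of the coefficients and not on which node carries which value, `emb_a` and `emb_b` agree up to floating-point rounding.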

The model is now also available in the package Karate Club.

This repository provides an implementation for GraphWave as it is described in:

Learning Structural Node Embeddings Via Diffusion Wavelets. Claire Donnat, Marinka Zitnik, David Hallac and Jure Leskovec. Proceedings of the 24th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-18).

The dense reference implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2 | Anaconda 4.2.0 (64-bit). The package versions used for development are listed below.

networkx          1.11
tqdm              4.19.5
numpy             1.15.4
pandas            0.23.4
pygsp             0.5.1
texttable         1.5.0

Datasets

The code takes an input graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for the `Facebook Restaurants` dataset is included in the `data/` directory.
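As a rough illustration of the input format, the sketch below parses an edge list with a header row and checks that node indexing starts at 0. It uses only the standard library and is not the loader used by this repository; the header names `node_1,node_2` and the helper names are hypothetical.

```python
import csv
import io


def read_edge_list(handle):
    """Read comma-separated edges from a file-like object, skipping the header."""
    reader = csv.reader(handle)
    next(reader)  # the first row is a header, not an edge
    return [(int(row[0]), int(row[1])) for row in reader if row]


def starts_at_zero(edges):
    """Return True if the smallest node index in the edge list is 0."""
    nodes = {node for edge in edges for node in edge}
    return min(nodes) == 0


# Hypothetical three-edge graph standing in for a real file on disk.
sample = io.StringIO("node_1,node_2\n0,1\n0,2\n1,2\n")
edges = read_edge_list(sample)
valid = starts_at_zero(edges)
```

For a real dataset, the same function can be called on a file opened with `open(path, newline="")`.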

Options

Learning the embedding is handled by the `src/main.py` script, which provides the following command line arguments.

Input and output options

  --input     STR    Input dataset.    Default is `data/food_edges.csv`.
  --output    STR    Output dataset.   Default is `output/embedding.csv`.

Model options

  --mechanism           STR          Wavelet generation method.                                    Default is `exact`.
  --heat-coefficient    FLOAT        Heat kernel coefficient.                                      Default is 1000.0.
  --sample-number       INT          Number of characteristic function samples.                    Default is 50.
  --approximation       INT          Order of Chebyshev polynomial.                                Default is 100.
  --step-size           INT          Sampling step size.                                           Default is 20.
  --switch              INT          Graph size at which the procedure switches to approximation.  Default is 100.
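The options above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented flags, types, and defaults; it assumes `argparse` is used and is not a copy of the parser in this repository. The function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="GraphWave options (illustrative sketch).")
    # Input and output options.
    parser.add_argument("--input", type=str, default="data/food_edges.csv",
                        help="Input dataset.")
    parser.add_argument("--output", type=str, default="output/embedding.csv",
                        help="Output dataset.")
    # Model options.
    parser.add_argument("--mechanism", type=str, default="exact",
                        help="Wavelet generation method.")
    parser.add_argument("--heat-coefficient", type=float, default=1000.0,
                        help="Heat kernel coefficient.")
    parser.add_argument("--sample-number", type=int, default=50,
                        help="Number of characteristic function samples.")
    parser.add_argument("--approximation", type=int, default=100,
                        help="Order of Chebyshev polynomial.")
    parser.add_argument("--step-size", type=int, default=20,
                        help="Sampling step size.")
    parser.add_argument("--switch", type=int, default=100,
                        help="Graph size at which the procedure switches to approximation.")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--sample-number", "128"])
```

Note that `argparse` exposes hyphenated flags with underscores, so `--sample-number` is read as `args.sample_number`.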

Examples

The following commands learn a graph embedding and write it to disk. The node representations are ordered by node ID.

Creating a GraphWave embedding of the default dataset with the default hyperparameter settings and saving it at the default path.

$ python src/main.py

Creating an embedding of another dataset, Facebook Companies, and saving the output in a custom location.

$ python src/main.py --input data/company_edges.csv  --output output/company_embedding.csv

Creating an embedding of the default dataset in 128 dimensions.

$ python src/main.py --sample-number 128

License
