
benedekrozemberczki / Diff2vec

License: GPL-3.0
Reference implementation of Diffusion2Vec (CompleNet 2018) built on Gensim and NetworkX.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Diff2vec

word-embeddings-from-scratch
Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (-79.63%)
Mutual labels:  embeddings, gensim
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+1190.74%)
Mutual labels:  embeddings, gensim
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+94.44%)
Mutual labels:  unsupervised-learning, gensim
RolX
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (-51.85%)
Mutual labels:  gensim, unsupervised-learning
Fast sentence embeddings
Compute Sentence Embeddings Fast!
Stars: ✭ 384 (+255.56%)
Mutual labels:  embeddings, gensim
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-60.19%)
Mutual labels:  unsupervised-learning, gensim
Lmdb Embeddings
Fast word vectors with little memory usage in Python
Stars: ✭ 404 (+274.07%)
Mutual labels:  embeddings, gensim
Sine
A PyTorch Implementation of "SINE: Scalable Incomplete Network Embedding" (ICDM 2018).
Stars: ✭ 67 (-37.96%)
Mutual labels:  unsupervised-learning, gensim
Nlp Journey
Documents, papers and codes related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All codes are implemented in TensorFlow 2.0.
Stars: ✭ 1,290 (+1094.44%)
Mutual labels:  gensim
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-7.41%)
Mutual labels:  unsupervised-learning
Cesi
WWW 2018: CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Stars: ✭ 85 (-21.3%)
Mutual labels:  embeddings
Self Supervised Relational Reasoning
Official PyTorch implementation of the paper "Self-Supervised Relational Reasoning for Representation Learning", NeurIPS 2020 Spotlight.
Stars: ✭ 89 (-17.59%)
Mutual labels:  unsupervised-learning
Ddflow
DDFlow: Learning Optical Flow with Unlabeled Data Distillation
Stars: ✭ 101 (-6.48%)
Mutual labels:  unsupervised-learning
Pysad
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)
Stars: ✭ 87 (-19.44%)
Mutual labels:  unsupervised-learning
Grounder
Implementation of Grounding of Textual Phrases in Images by Reconstruction in Tensorflow
Stars: ✭ 83 (-23.15%)
Mutual labels:  unsupervised-learning
Pointglr
Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds (CVPR 2020)
Stars: ✭ 86 (-20.37%)
Mutual labels:  unsupervised-learning
Self Supervised Speech Recognition
speech to text with self-supervised learning based on wav2vec 2.0 framework
Stars: ✭ 106 (-1.85%)
Mutual labels:  unsupervised-learning
Scikit Fusion
scikit-fusion: Data fusion via collective latent factor models
Stars: ✭ 103 (-4.63%)
Mutual labels:  embeddings
Verse
Reference implementation of the paper VERSE: Versatile Graph Embeddings from Similarity Measures
Stars: ✭ 98 (-9.26%)
Mutual labels:  embeddings
Awesome Transfer Learning
Best transfer learning and domain adaptation resources (papers, tutorials, datasets, etc.)
Stars: ✭ 1,349 (+1149.07%)
Mutual labels:  unsupervised-learning

Diff2Vec


A graph embedding is a representation of graph vertices in a low-dimensional space which approximately preserves properties such as distances between nodes. Vertex sequence-based embedding procedures use features extracted from linear sequences of nodes to create embeddings with a neural network. In this paper, we propose diffusion graphs as a method to rapidly generate vertex sequences for network embedding. The method's computational efficiency is superior to previous approaches due to simpler sequence generation, and it produces more accurate results. In experiments, we found that its performance relative to other methods improves with increasing edge density in the graph. In a community detection task, clustering nodes in the embedding space produces better results than other sequence-based embedding methods.
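To make the pipeline concrete, here is a minimal sketch of diffusion-based sequence generation with NetworkX. It illustrates the idea rather than the reference implementation: the function name `diffusion_sequence` and its `cardinality` parameter are ours, and the node-selection rule is simplified.

```python
import random

import networkx as nx


def diffusion_sequence(graph, source, cardinality=40):
    """Grow a diffusion subgraph from `source` and return its Euler tour."""
    # Simplified sketch: repeatedly pick an already-covered node that still
    # has uncovered neighbors and attach one of them, until `cardinality`
    # nodes are covered (or the component is exhausted).
    tree = nx.Graph()
    tree.add_node(source)
    while tree.number_of_nodes() < cardinality:
        expandable = [v for v in tree
                      if any(n not in tree for n in graph.neighbors(v))]
        if not expandable:
            break
        node = random.choice(expandable)
        new_node = random.choice([n for n in graph.neighbors(node)
                                  if n not in tree])
        tree.add_edge(node, new_node)
    # Doubling every edge makes the subgraph Eulerian; the Euler circuit
    # traverses each edge twice and yields the vertex sequence that is
    # fed to Word2Vec in place of a random walk.
    doubled = nx.MultiGraph()
    doubled.add_node(source)
    doubled.add_edges_from(list(tree.edges()) + list(tree.edges()))
    return [u for u, _ in nx.eulerian_circuit(doubled, source=source)]
```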

The model is now also available in the package Karate Club.
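For a quick start, Karate Club wraps the same model behind a scikit-learn style interface. A minimal sketch, assuming a recent karateclub release (its `Diff2Vec` expects a NetworkX graph with nodes indexed 0 to n-1; `diffusion_cover` bounds the nodes per diffusion and `diffusion_number` the diffusions per node):

```python
import networkx as nx
from karateclub import Diff2Vec

# Any connected NetworkX graph with nodes indexed 0..n-1 works here.
graph = nx.newman_watts_strogatz_graph(100, 10, 0.05)

model = Diff2Vec(diffusion_number=10, diffusion_cover=40, dimensions=32)
model.fit(graph)
embedding = model.get_embedding()  # array with one row per node
```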

This repository provides a reference implementation for Diff2Vec as described in the paper:

Fast Sequence Based Embedding with Diffusion Graphs. Benedek Rozemberczki and Rik Sarkar. International Conference on Complex Networks (CompleNet), 2018.

Citing

If you find Diff2Vec useful in your research, please consider citing the following paper:

@inproceedings{rozemberczki2018fastsequence,
  title={{Fast Sequence Based Embedding with Diffusion Graphs}},
  author={Rozemberczki, Benedek and Sarkar, Rik},
  booktitle={International Conference on Complex Networks},
  year={2018},
  pages={99--107}
}

Requirements

The codebase is implemented in Python 3.5.2 | Anaconda 4.2.0 (64-bit).

tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
gensim            3.6.0
networkx          2.4
joblib            0.13.0
logging           0.4.9.6  
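Assuming a Python 3 environment, the pinned packages can be installed with pip, for example (the `logging` entry is omitted below because the module ships with the Python 3 standard library):

pip install tqdm==4.28.1 numpy==1.15.4 pandas==0.23.4 texttable==1.5.0 gensim==3.6.0 networkx==2.4 joblib==0.13.0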

Datasets

The code takes an input graph in a CSV file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. A sample graph for the `Facebook Restaurants` dataset is included in the `data/` directory.
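For illustration, a valid edge list therefore looks like the following, where the header names are placeholders and each subsequent row is one edge:

node_1,node_2
0,1
0,2
1,3
2,3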

Options

Learning of the embedding is handled by the `src/diffusion_2_vec.py` script which provides the following command line arguments.

Input and output options

  --input    STR     Path to the edge list csv.            Default is `data/restaurant_edges.csv`
  --output   STR     Path to the embedding features.       Default is `emb/restaurant.csv`

Model options

  --model                    STR      Embedding procedure.                      Default is `non-pooled`
  --dimensions               INT      Number of embedding dimensions.           Default is 128.
  --vertex-set-cardinality   INT      Number of nodes per diffusion tree.       Default is 80.
  --num-diffusions           INT      Number of diffusions per source node.     Default is 10.
  --window-size              INT      Context size for optimization.            Default is 10.
  --iter                     INT      Number of ASGD iterations.                Default is 1.
  --workers                  INT      Number of cores.                          Default is 4.
  --alpha                    FLOAT    Initial learning rate.                    Default is 0.025.

Examples

The following commands learn a graph embedding and write it to disk. The first column in the embedding file is the node ID.

Creating an embedding of the default dataset with the default hyperparameter settings.
python src/diffusion_2_vec.py

Creating an embedding of another dataset, the Facebook Politicians graph.

python src/diffusion_2_vec.py --input data/politician_edges.csv --output output/politician.csv

Creating an embedding of the default dataset with 32 dimensions, 5 diffusions per source node, and a maximal vertex set cardinality of 40.

python src/diffusion_2_vec.py --dimensions 32 --num-diffusions 5 --vertex-set-cardinality 40
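Because the first column holds the node ID, the resulting embedding can be read back with pandas, for example (the file layout is assumed from the description above):

```python
import pandas as pd

embedding = pd.read_csv("emb/restaurant.csv")
node_ids = embedding.iloc[:, 0]   # first column: node IDs
vectors = embedding.iloc[:, 1:]   # remaining columns: embedding dimensions
print(vectors.shape)              # (number of nodes, dimensions)
```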
