
benedekrozemberczki / Diff2vec

License: GPL-3.0
Reference implementation of Diffusion2Vec (CompleNet 2018) built on Gensim and NetworkX.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to Diff2vec

word-embeddings-from-scratch
Creating word embeddings from scratch and visualize them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (-79.63%)
Mutual labels:  embeddings, gensim
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+1190.74%)
Mutual labels:  embeddings, gensim
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+94.44%)
Mutual labels:  unsupervised-learning, gensim
RolX
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (-51.85%)
Mutual labels:  gensim, unsupervised-learning
Fast sentence embeddings
Compute Sentence Embeddings Fast!
Stars: ✭ 384 (+255.56%)
Mutual labels:  embeddings, gensim
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-60.19%)
Mutual labels:  unsupervised-learning, gensim
Lmdb Embeddings
Fast word vectors with little memory usage in Python
Stars: ✭ 404 (+274.07%)
Mutual labels:  embeddings, gensim
Sine
A PyTorch Implementation of "SINE: Scalable Incomplete Network Embedding" (ICDM 2018).
Stars: ✭ 67 (-37.96%)
Mutual labels:  unsupervised-learning, gensim
Nlp Journey
Documents, papers and codes related to Natural Language Processing, including Topic Model, Word Embedding, Named Entity Recognition, Text Classification, Text Generation, Text Similarity, Machine Translation, etc. All codes are implemented in TensorFlow 2.0.
Stars: ✭ 1,290 (+1094.44%)
Mutual labels:  gensim
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-7.41%)
Mutual labels:  unsupervised-learning
Cesi
WWW 2018: CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Stars: ✭ 85 (-21.3%)
Mutual labels:  embeddings
Self Supervised Relational Reasoning
Official PyTorch implementation of the paper "Self-Supervised Relational Reasoning for Representation Learning", NeurIPS 2020 Spotlight.
Stars: ✭ 89 (-17.59%)
Mutual labels:  unsupervised-learning
Ddflow
DDFlow: Learning Optical Flow with Unlabeled Data Distillation
Stars: ✭ 101 (-6.48%)
Mutual labels:  unsupervised-learning
Pysad
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)
Stars: ✭ 87 (-19.44%)
Mutual labels:  unsupervised-learning
Grounder
Implementation of Grounding of Textual Phrases in Images by Reconstruction in Tensorflow
Stars: ✭ 83 (-23.15%)
Mutual labels:  unsupervised-learning
Pointglr
Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds (CVPR 2020)
Stars: ✭ 86 (-20.37%)
Mutual labels:  unsupervised-learning
Self Supervised Speech Recognition
speech to text with self-supervised learning based on wav2vec 2.0 framework
Stars: ✭ 106 (-1.85%)
Mutual labels:  unsupervised-learning
Scikit Fusion
scikit-fusion: Data fusion via collective latent factor models
Stars: ✭ 103 (-4.63%)
Mutual labels:  embeddings
Verse
Reference implementation of the paper VERSE: Versatile Graph Embeddings from Similarity Measures
Stars: ✭ 98 (-9.26%)
Mutual labels:  embeddings
Awesome Transfer Learning
Best transfer learning and domain adaptation resources (papers, tutorials, datasets, etc.)
Stars: ✭ 1,349 (+1149.07%)
Mutual labels:  unsupervised-learning

Diff2Vec


A graph embedding is a representation of graph vertices in a low-dimensional space which approximately preserves properties such as distances between nodes. Vertex sequence-based embedding procedures use features extracted from linear sequences of nodes to create embeddings with a neural network. In this paper, we propose diffusion graphs as a method to rapidly generate vertex sequences for network embedding. The method's computational efficiency is superior to previous approaches due to simpler sequence generation, and it produces more accurate results. In experiments, we found that its performance relative to other methods improves with increasing edge density in the graph. In a community detection task, clustering nodes in the embedding space produces better results than other sequence-based embedding methods.
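To make the pipeline concrete, here is a minimal sketch of diffusion-based sequence generation with NetworkX. It illustrates the idea rather than the reference implementation: the function name `diffusion_sequence` and its `cardinality` parameter are ours, and the node-selection rule is simplified.

```python
import random

import networkx as nx


def diffusion_sequence(graph, source, cardinality=40):
    """Grow a diffusion subgraph from `source` and return its Euler tour."""
    # Simplified sketch: repeatedly pick an already-covered node that still
    # has uncovered neighbors and attach one of them, until `cardinality`
    # nodes are covered (or the component is exhausted).
    tree = nx.Graph()
    tree.add_node(source)
    while tree.number_of_nodes() < cardinality:
        expandable = [v for v in tree
                      if any(n not in tree for n in graph.neighbors(v))]
        if not expandable:
            break
        node = random.choice(expandable)
        new_node = random.choice([n for n in graph.neighbors(node)
                                  if n not in tree])
        tree.add_edge(node, new_node)
    # Doubling every edge makes the subgraph Eulerian; the Euler circuit
    # traverses each edge twice and yields the vertex sequence that is
    # fed to Word2Vec in place of a random walk.
    doubled = nx.MultiGraph()
    doubled.add_node(source)
    doubled.add_edges_from(list(tree.edges()) + list(tree.edges()))
    return [u for u, _ in nx.eulerian_circuit(doubled, source=source)]
```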

The model is now also available in the package Karate Club.
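For a quick start, Karate Club wraps the same model behind a scikit-learn style interface. A minimal sketch, assuming a recent karateclub release (its `Diff2Vec` expects a NetworkX graph with nodes indexed 0 to n-1; `diffusion_cover` bounds the nodes per diffusion and `diffusion_number` the diffusions per node):

```python
import networkx as nx
from karateclub import Diff2Vec

# Any connected NetworkX graph with nodes indexed 0..n-1 works here.
graph = nx.newman_watts_strogatz_graph(100, 10, 0.05)

model = Diff2Vec(diffusion_number=10, diffusion_cover=40, dimensions=32)
model.fit(graph)
embedding = model.get_embedding()  # array with one row per node
```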

This repository provides a reference implementation for Diff2Vec as described in the paper:

Fast Sequence Based Embedding with Diffusion Graphs. Benedek Rozemberczki and Rik Sarkar. International Conference on Complex Networks (CompleNet), 2018.

Citing

If you find Diff2Vec useful in your research, please consider citing the following paper:

@inproceedings{rozemberczki2018fastsequence,
  title={{Fast Sequence Based Embedding with Diffusion Graphs}},
  author={Rozemberczki, Benedek and Sarkar, Rik},
  booktitle={International Conference on Complex Networks},
  year={2018},
  pages={99--107}
}

Requirements

The codebase is implemented in Python 3.5.2 | Anaconda 4.2.0 (64-bit).

tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
gensim            3.6.0
networkx          2.4
joblib            0.13.0
logging           0.4.9.6  
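Assuming a Python 3 environment, the pinned packages can be installed with pip, for example (the `logging` entry is omitted below because the module ships with the Python 3 standard library):

pip install tqdm==4.28.1 numpy==1.15.4 pandas==0.23.4 texttable==1.5.0 gensim==3.6.0 networkx==2.4 joblib==0.13.0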

Datasets

The code takes an input graph in a CSV file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. A sample graph for the `Facebook Restaurants` dataset is included in the `data/` directory.
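For illustration, a valid edge list therefore looks like the following, where the header names are placeholders and each subsequent row is one edge:

node_1,node_2
0,1
0,2
1,3
2,3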

Options

Learning of the embedding is handled by the `src/diffusion_2_vec.py` script which provides the following command line arguments.

Input and output options

  --input    STR     Path to the edge list csv.            Default is `data/restaurant_edges.csv`
  --output   STR     Path to the embedding features.       Default is `emb/restaurant.csv`

Model options

  --model                    STR      Embedding procedure.                      Default is `non-pooled`
  --dimensions               INT      Number of embedding dimensions.           Default is 128.
  --vertex-set-cardinality   INT      Number of nodes per diffusion tree.       Default is 80.
  --num-diffusions           INT      Number of diffusions per source node.     Default is 10.
  --window-size              INT      Context size for optimization.            Default is 10.
  --iter                     INT      Number of ASGD iterations.                Default is 1.
  --workers                  INT      Number of cores.                          Default is 4.
  --alpha                    FLOAT    Initial learning rate.                    Default is 0.025.

Examples

The following commands learn a graph embedding and write it to disk. The first column in the embedding file is the node ID.

Creating an embedding of the default dataset with the default hyperparameter settings.
python src/diffusion_2_vec.py

Creating an embedding of another dataset, the Facebook Politicians graph.

python src/diffusion_2_vec.py --input data/politician_edges.csv --output output/politician.csv

Creating an embedding of the default dataset with 32 dimensions, 5 diffusions per source node, and a maximal vertex set cardinality of 40.

python src/diffusion_2_vec.py --dimensions 32 --num-diffusions 5 --vertex-set-cardinality 40
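Because the first column holds the node ID, the resulting embedding can be read back with pandas, for example (the file layout is assumed from the description above):

```python
import pandas as pd

embedding = pd.read_csv("emb/restaurant.csv")
node_ids = embedding.iloc[:, 0]   # first column: node IDs
vectors = embedding.iloc[:, 1:]   # remaining columns: embedding dimensions
print(vectors.shape)              # (number of nodes, dimensions)
```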
