
benedekrozemberczki / Splitter

License: GPL-3.0
A PyTorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).

Projects that are alternatives of or similar to Splitter

Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+18.64%)
Mutual labels:  word2vec, gensim, clustering
Sense2vec
🦆 Contextually-keyed word vectors
Stars: ✭ 1,184 (+568.93%)
Mutual labels:  word2vec, gensim
Webvectors
Web-ify your word2vec: framework to serve distributional semantic models online
Stars: ✭ 154 (-12.99%)
Mutual labels:  word2vec, gensim
Text Summarizer
Python Framework for Extractive Text Summarization
Stars: ✭ 96 (-45.76%)
Mutual labels:  word2vec, clustering
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-75.71%)
Mutual labels:  word2vec, gensim
Word2vec
Train Chinese word vectors with Word2vec. Word2vec was created by a team of researchers led by Tomas Mikolov at Google.
Stars: ✭ 48 (-72.88%)
Mutual labels:  word2vec, gensim
Nlp Journey
Documents, papers, and code related to Natural Language Processing, including topic models, word embeddings, named entity recognition, text classification, text generation, text similarity, and machine translation. All code is implemented in TensorFlow 2.0.
Stars: ✭ 1,290 (+628.81%)
Mutual labels:  word2vec, gensim
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+346.33%)
Mutual labels:  word2vec, gensim
Ml Projects
ML-based projects such as spam classification, time series analysis, and text classification using random forests, deep learning, Bayesian methods, and XGBoost in Python.
Stars: ✭ 127 (-28.25%)
Mutual labels:  word2vec, gensim
Role2vec
A scalable Gensim implementation of "Learning Role-based Graph Embeddings" (IJCAI 2018).
Stars: ✭ 134 (-24.29%)
Mutual labels:  word2vec, gensim
Danmf
A sparsity-aware implementation of "Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection" (CIKM 2018).
Stars: ✭ 161 (-9.04%)
Mutual labels:  word2vec, clustering
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+7110.73%)
Mutual labels:  word2vec, gensim
Twitter sentiment analysis word2vec convnet
Twitter Sentiment Analysis with Gensim Word2Vec and Keras Convolutional Network
Stars: ✭ 24 (-86.44%)
Mutual labels:  word2vec, gensim
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+539.55%)
Mutual labels:  gensim, clustering
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-89.83%)
Mutual labels:  word2vec, clustering
Musae
The reference implementation of "Multi-scale Attributed Node Embedding".
Stars: ✭ 75 (-57.63%)
Mutual labels:  word2vec, gensim
Wordembeddings Elmo Fasttext Word2vec
Using pre-trained word embeddings (fastText, Word2Vec)
Stars: ✭ 146 (-17.51%)
Mutual labels:  word2vec, gensim
Lmdb Embeddings
Fast word vectors with little memory usage in Python
Stars: ✭ 404 (+128.25%)
Mutual labels:  word2vec, gensim
Word2vec Tutorial
A tutorial on training Chinese word vectors.
Stars: ✭ 426 (+140.68%)
Mutual labels:  word2vec, gensim
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+687.57%)
Mutual labels:  word2vec, gensim

Splitter

A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019).

Abstract

Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the node participates. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.

This repository provides a PyTorch implementation of Splitter as described in the paper:

Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]

The original TensorFlow implementation is available [here].
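
Splitter first decomposes the graph into personas and then embeds the resulting persona graph. As a rough, hypothetical sketch of the ego-net splitting idea (not the repository's code; all names below are ours), each node can be replaced by one persona per connected component of its ego network:

```python
import networkx as nx

def persona_graph(G):
    """Ego-net splitting sketch: replace each node u by one persona
    per connected component of u's ego network (u itself removed).
    An original edge (u, v) then connects the persona of u that
    covers v with the persona of v that covers u."""
    persona_of = {}  # (node, neighbor) -> persona id covering that neighbor
    next_id = 0
    for u in G.nodes():
        ego = G.subgraph(G.neighbors(u))  # u's neighborhood without u
        for component in nx.connected_components(ego):
            for v in component:
                persona_of[(u, v)] = next_id
            next_id += 1
    P = nx.Graph()
    P.add_edges_from((persona_of[(u, v)], persona_of[(v, u)])
                     for u, v in G.edges())
    return P

P = persona_graph(nx.karate_club_graph())
print(P.number_of_nodes(), P.number_of_edges())  # at least as many personas as nodes
```

The persona graph keeps every original edge but splits hub nodes into several personas, which is what allows multiple context-specific embeddings per node.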

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0
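
Assuming a standard pip setup, the pinned versions above can be installed in one step; note that `argparse` ships with the Python 3 standard library, so it usually needs no separate install.

pip install networkx==1.11 tqdm==4.28.1 numpy==1.15.4 pandas==0.23.4 texttable==1.5.0 scipy==1.1.0 torch==1.1.0 gensim==3.6.0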

Datasets

The code takes the **edge list** of the graph in a CSV file. Every row indicates an edge between two nodes, separated by a comma. The first row is a header. Nodes should be indexed starting from 0. A sample graph for `Cora` is included in the `input/` directory. An illustrative way to produce such a file is shown below.
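
For instance, a toy edge list in this format can be generated with networkx and pandas; the file name and header names below are placeholders, not values the code requires:

```python
import networkx as nx
import pandas as pd

# Write a small graph as a comma-separated edge list with a header
# row and nodes indexed from 0.
G = nx.karate_club_graph()
edges = pd.DataFrame(list(G.edges()), columns=["node_1", "node_2"])
edges.to_csv("input/toy_edges.csv", index=False)
```

The resulting file can then be passed to the trainer via `--edge-path input/toy_edges.csv`.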

Outputs

The embeddings are saved in the `output/` directory (see the default paths below). Each embedding has a header and a column with the node IDs, and the rows are sorted by the node ID column.
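
A minimal sketch of reading an embedding back with pandas, assuming the default output path from the options section (the exact column names depend on the run):

```python
import pandas as pd

emb = pd.read_csv("output/chameleon_embedding.csv")
ids = emb.iloc[:, 0].values      # first column: node IDs
vectors = emb.iloc[:, 1:].values # remaining columns: embedding coordinates
print(vectors.shape)             # (number of personas, dimensions)
```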

Options

The training of a Splitter embedding is handled by the `src/main.py` script, which provides the following command-line arguments.

Input and output options

  --edge-path               STR    Edge list csv.           Default is `input/chameleon_edges.csv`.
  --embedding-output-path   STR    Embedding output csv.    Default is `output/chameleon_embedding.csv`.
  --persona-output-path     STR    Persona mapping JSON.    Default is `output/chameleon_personas.json`.

Model options

  --seed               INT     Random seed.                       Default is 42.
  --number-of-walks    INT     Number of random walks per node.   Default is 10.
  --window-size        INT     Skip-gram window size.             Default is 5.
  --negative-samples   INT     Number of negative samples.        Default is 5.
  --walk-length        INT     Random walk length.                Default is 40.
  --lambd              FLOAT   Regularization parameter.          Default is 0.1.
  --dimensions         INT     Number of embedding dimensions.    Default is 128.
  --workers            INT     Number of cores for pre-training.  Default is 4.
  --learning-rate      FLOAT   SGD learning rate.                 Default is 0.025.

Examples

The following commands learn an embedding and save it with the persona map. Training a model on the default dataset.

python src/main.py

Training a Splitter model with 32 dimensions.

python src/main.py --dimensions 32

Increasing the number of walks and the walk length.

python src/main.py --number-of-walks 20 --walk-length 80
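
After training, the persona map can be used to see which nodes were split into multiple personas. A sketch, assuming the JSON maps persona IDs to original node IDs (verify against your own output file):

```python
import json
from collections import defaultdict

with open("output/chameleon_personas.json") as f:
    persona_map = json.load(f)

# Group personas by the original node they belong to.
node_to_personas = defaultdict(list)
for persona, node in persona_map.items():
    node_to_personas[node].append(persona)

split_nodes = {n: ps for n, ps in node_to_personas.items() if len(ps) > 1}
print(len(split_nodes), "nodes were split into multiple personas")
```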

License

The project is licensed under the GPL-3.0 license.