benedekrozemberczki / MUSAE

License: GPL-3.0
The reference implementation of "Multi-scale Attributed Node Embedding".

Programming Languages

python

Projects that are alternatives of or similar to MUSAE

Sense2vec
🦆 Contextually-keyed word vectors
Stars: ✭ 1,184 (+1478.67%)
Mutual labels:  word2vec, gensim
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-60%)
Mutual labels:  word2vec, gensim
walklets
A lightweight implementation of Walklets from "Don't Walk, Skip! Online Learning of Multi-scale Network Embeddings" (ASONAM 2017).
Stars: ✭ 94 (+25.33%)
Mutual labels:  word2vec, gensim
Word2VecAndTsne
Scripts demo-ing how to train a Word2Vec model and reduce its vector space
Stars: ✭ 45 (-40%)
Mutual labels:  word2vec, gensim
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+953.33%)
Mutual labels:  word2vec, gensim
doc2vec-api
Document embedding and machine learning scripts for beginners
Stars: ✭ 92 (+22.67%)
Mutual labels:  word2vec, gensim
word2vec-pt-br
Implementation and a model generated by training (trigram) on the Portuguese (pt-br) Wikipedia
Stars: ✭ 34 (-54.67%)
Mutual labels:  word2vec, gensim
Shallowlearn
An experiment about re-implementing supervised learning models based on shallow neural network approaches (e.g. fastText) with some additional exclusive features and nice API. Written in Python and fully compatible with Scikit-learn.
Stars: ✭ 196 (+161.33%)
Mutual labels:  word2vec, gensim
Word2vec Tutorial
A tutorial on training Chinese word vectors
Stars: ✭ 426 (+468%)
Mutual labels:  word2vec, gensim
Lmdb Embeddings
Fast word vectors with little memory usage in Python
Stars: ✭ 404 (+438.67%)
Mutual labels:  word2vec, gensim
word-embeddings-from-scratch
Creating word embeddings from scratch and visualizing them on TensorBoard. Using trained embeddings in Keras.
Stars: ✭ 22 (-70.67%)
Mutual labels:  word2vec, gensim
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-42.67%)
Mutual labels:  word2vec, gensim
Aravec
AraVec is a pre-trained distributed word representation (word embedding) open source project which aims to provide the Arabic NLP research community with free to use and powerful word embedding models.
Stars: ✭ 239 (+218.67%)
Mutual labels:  word2vec, gensim
biovec
ProtVec can be used in protein interaction predictions, structure prediction, and protein data visualization.
Stars: ✭ 23 (-69.33%)
Mutual labels:  word2vec, gensim
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+180%)
Mutual labels:  word2vec, gensim
RolX
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (-30.67%)
Mutual labels:  word2vec, gensim
Splitter
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).
Stars: ✭ 177 (+136%)
Mutual labels:  word2vec, gensim
Germanwordembeddings
Toolkit to obtain and preprocess German corpora, train models using word2vec (gensim) and evaluate them with generated test sets
Stars: ✭ 189 (+152%)
Mutual labels:  word2vec, gensim
wordfish-python
Extract relationships from standardized terms in a corpus of interest with deep learning 🐟
Stars: ✭ 19 (-74.67%)
Mutual labels:  word2vec, gensim
Twitter sentiment analysis word2vec convnet
Twitter Sentiment Analysis with Gensim Word2Vec and Keras Convolutional Network
Stars: ✭ 24 (-68%)
Mutual labels:  word2vec, gensim

MUSAE

The reference implementation of "Multi-scale Attributed Node Embedding".

Abstract

We present network embedding algorithms that capture information about a node from the local distribution over node attributes around it, as observed over random walks following an approach similar to Skip-gram. Observations from neighborhoods of different sizes are either pooled (AE) or encoded distinctly in a multi-scale approach (MUSAE). Capturing attribute-neighborhood relationships over multiple scales is useful for a diverse range of applications, including latent feature identification across disconnected networks with similar attributes. We prove theoretically that matrices of node-feature pointwise mutual information are implicitly factorized by the embeddings. Experiments show that our algorithms are robust, computationally efficient and outperform comparable models on social, web and citation network datasets.
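
To make the pooled versus multi-scale distinction concrete, the sketch below collects (node, feature) training pairs from a single walk. It is an illustration only, not the repository's code; walk_feature_pairs and the toy data are hypothetical.

from collections import defaultdict

def walk_feature_pairs(walk, features, window=3, multi_scale=True):
    """Collect (source node, feature) pairs from one random walk.

    Pooled (AE): neighbours at all distances up to `window` share one stream.
    Multi-scale (MUSAE): each distance r gets its own stream of pairs.
    """
    pairs = defaultdict(list)  # scale -> list of (source node, feature) pairs
    for i, node in enumerate(walk):
        for r in range(1, window + 1):
            if i + r < len(walk):
                scale = r if multi_scale else 0
                for feature in features[walk[i + r]]:
                    pairs[scale].append((node, feature))
    return dict(pairs)

toy_features = {0: [10], 1: [11], 2: [12], 3: [13]}  # node -> feature ids
print(walk_feature_pairs([0, 1, 2, 3], toy_features, window=2))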

The second-order random walk sampling method was taken from the reference implementation of Node2Vec.

The datasets are also available on SNAP.

The model is now also available in the package Karate Club.

This repository provides the reference implementations for MUSAE and AE as described in the paper:

Multi-scale Attributed Node Embedding. Benedek Rozemberczki, Carl Allen, and Rik Sarkar. arXiv, 2019. https://arxiv.org/abs/1909.13021

Table of Contents

  1. Citing
  2. Requirements
  3. Datasets
  4. Logging
  5. Options
  6. Examples

Citing

If you find MUSAE useful in your research, please consider citing the following paper:

@misc{rozemberczki2019multiscale,
      title = {{Multi-scale Attributed Node Embedding}},
      author = {Benedek Rozemberczki and Carl Allen and Rik Sarkar},
      year = {2019},
      eprint = {1909.13021},
      archivePrefix = {arXiv},
      primaryClass = {cs.LG}
}

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
gensim            3.6.0
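
Assuming pip is available, the pinned versions above can be installed in one step (a convenience command, not taken from the repository; argparse ships with Python 3's standard library, so it is omitted):

$ pip install networkx==2.4 tqdm==4.28.1 numpy==1.15.4 pandas==0.23.4 texttable==1.5.0 scipy==1.1.0 gensim==3.6.0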

Datasets
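
The repository's own dataset description is not included in this extract. As an illustration only (inferred from the default paths under Options below, not quoted from the repository), the two inputs are an edge list CSV and a node-feature JSON. Hypothetical miniature versions:

  id_1,id_2
  0,1
  0,2
  1,3

  {"0": [4, 17, 23], "1": [4, 8], "2": [23], "3": [17]}

Here each CSV row names a pair of node identifiers, JSON keys are node identifiers, and JSON values list the feature identifiers present at each node; the header names and exact schema are assumptions.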

Logging

The models are defined so that parameter settings and runtimes are logged. Specifically, we log the following:

1. Hyperparameter settings.     We save each hyperparameter used in the experiment.
2. Optimization runtime.        We measure the time needed for optimization, in seconds.
3. Sampling runtime.            We measure the time needed for sampling, in seconds.
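
A minimal sketch of this style of logging; the helper and field names below are hypothetical, not the repository's schema:

import json
import time

def timed(fn):
    """Run fn() and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = fn()
    return result, time.time() - start

# Hypothetical stand-ins for the sampling and optimization phases.
walks, sampling_time = timed(lambda: [[0, 1, 2, 3]] * 5)
_, optimization_time = timed(lambda: sum(len(w) for w in walks))

log = {
    "hyperparameters": {"walk-number": 5, "walk-length": 80},
    "sampling_time_seconds": sampling_time,
    "optimization_time_seconds": optimization_time,
}
with open("example_log.json", "w") as handle:
    json.dump(log, handle, indent=2)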

Options

Learning the embedding is handled by the src/main.py script which provides the following command line arguments.

Input and output options

  --graph-input      STR   Input edge list csv.     Default is `input/edges/chameleon_edges.csv`.
  --features-input   STR   Input features json.     Default is `input/features/chameleon_features.json`.
  --output           STR   Embedding output path.   Default is `output/chameleon_embedding.csv`.
  --log              STR   Log output path.         Default is `logs/chameleon.json`.

Random walk options

  --sampling      STR       Random walker order (first/second).              Default is `first`.
  --P             FLOAT     Return hyperparameter for second-order walk.     Default is 1.0.
  --Q             FLOAT     In-out hyperparameter for second-order walk.     Default is 1.0.
  --walk-number   INT       Walks per source node.                           Default is 5.
  --walk-length   INT       Truncated random walk length.                    Default is 80.
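
The --P and --Q flags behave like Node2Vec's return and in-out hyperparameters. Below is a minimal, self-contained sketch of one second-order step, assuming an undirected NetworkX graph; it is illustrative, not the repository's sampler.

import networkx as nx
import numpy as np

def second_order_step(graph, prev, curr, p=1.0, q=1.0):
    """Sample the next node of a Node2Vec-style second-order walk."""
    neighbours = list(graph.neighbors(curr))
    weights = np.array([
        1.0 / p if x == prev                 # return to the previous node
        else 1.0 if graph.has_edge(x, prev)  # stay near the previous node
        else 1.0 / q                         # move outward
        for x in neighbours
    ])
    probs = weights / weights.sum()
    return neighbours[np.random.choice(len(neighbours), p=probs)]

graph = nx.karate_club_graph()
walk = [0, list(graph.neighbors(0))[0]]
while len(walk) < 10:  # a tiny stand-in for --walk-length
    walk.append(second_order_step(graph, walk[-2], walk[-1], p=1.0, q=0.5))
print(walk)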

Model options

  --model                 STR        Pooled or multi-scale model (AE/MUSAE).      Default is `musae`.
  --base-model            STR        Use of Doc2Vec base model.                   Default is `null`.
  --approximation-order   INT        Matrix powers approximated.                  Default is 3.
  --dimensions            INT        Number of dimensions.                        Default is 32.
  --down-sampling         FLOAT      Down-sampling rate for frequent features.    Default is 0.001.
  --exponent              FLOAT      Downsampling exponent of frequency.          Default is 0.75.
  --alpha                 FLOAT      Initial learning rate.                       Default is 0.05.
  --min-alpha             FLOAT      Final learning rate.                         Default is 0.025.
  --min-count             INT        Minimal occurrence of features.              Default is 1.
  --negative-samples      INT        Number of negative samples per node.         Default is 5.
  --workers               INT        Number of cores used for optimization.       Default is 4.
  --epochs                INT        Gradient descent epochs.                     Default is 5.
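
Several of these flags map naturally onto gensim 3.6 Word2Vec parameters. The call below is an informed sketch under that assumption, with toy walks; it is not a quote of the repository's training code.

from gensim.models import Word2Vec

# Toy "sentences": walks whose tokens are node and feature identifiers.
walks = [["0", "feat_4", "1", "feat_8", "2", "feat_4"]] * 20

model = Word2Vec(
    walks,
    size=32,           # --dimensions
    window=3,          # --approximation-order (context size along the walk)
    alpha=0.05,        # --alpha
    min_alpha=0.025,   # --min-alpha
    min_count=1,       # --min-count
    sample=0.001,      # --down-sampling
    ns_exponent=0.75,  # --exponent
    negative=5,        # --negative-samples
    workers=4,         # --workers
    iter=5,            # --epochs (gensim 3.x calls this `iter`)
    sg=1,              # skip-gram, matching the abstract
)
print(model.wv["feat_4"][:5])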

Examples

Training a MUSAE model for 10 epochs.

$ python src/main.py --epochs 10

Setting the embedding dimensionality explicitly (32 is also the default).

$ python src/main.py --dimensions 32
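
The flags documented above compose. For example, second-order sampling with a stronger outward bias (values chosen purely for illustration):

$ python src/main.py --sampling second --P 1.0 --Q 0.5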

License

MUSAE is released under the GNU General Public License v3.0.