benedekrozemberczki / Asne

Licence: gpl-3.0
A sparsity aware and memory efficient implementation of "Attributed Social Network Embedding" (TKDE 2018).



ASNE


An implementation of "Attributed Social Network Embedding". ASNE is a graph embedding algorithm that learns node representations and fuses them with node attributes. The procedure places nodes in an abstract feature space where first-order proximity is preserved and the attributes of each node also contribute to its representation. ASNE learns this joint feature-proximal representation with a probabilistic factorization model. Our implementation assumes that the proximity matrix used in the approximation is sparse, so the solution runtime can be linear in the number of edges; it also assumes that the node-feature matrix is sparse. Compared to other implementations, this version has two main advantages:

  1. Stores the feature matrix as a sparse dictionary.
  2. Uses sparse matrix multiplication to speed up computations.
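The two points above can be sketched with SciPy. This is an illustrative toy, not the repository's actual code; the variable names and dimensions are assumptions:

```python
import numpy as np
from scipy import sparse

# A node-feature mapping stored as a sparse dictionary: each node maps to
# the column indices of its nonzero (binary) features.
features = {0: [0, 3], 1: [1], 2: []}
feature_count = 4

# Build a CSR matrix from the dictionary; only nonzero entries are stored.
rows = [node for node, cols in features.items() for _ in cols]
cols = [col for cols in features.values() for col in cols]
data = np.ones(len(rows))
matrix = sparse.csr_matrix((data, (rows, cols)),
                           shape=(len(features), feature_count))

# Sparse matrix multiplication avoids materializing dense intermediates.
weights = np.random.rand(feature_count, 8)
projected = matrix.dot(weights)  # shape: (3, 8)
```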

This repository provides an implementation for ASNE as described in the paper:

Attributed Social Network Embedding. Lizi Liao, Xiangnan He, Hanwang Zhang, Tat-Seng Chua. IEEE Transactions on Knowledge and Data Engineering, 2018. https://arxiv.org/abs/1705.04969

A dense TensorFlow implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2 | Anaconda 4.2.0 (64-bit). Package versions used for development are listed below.

networkx          2.4
tensorflow-gpu    1.12.0
tqdm              4.19.5
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0

Datasets

The code takes an input graph from a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for the Wikipedia Chameleons and Wikipedia Giraffes datasets are included in the `input/` directory.
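A minimal sketch of this edge-list layout, read into a NetworkX graph with pandas. The file name and header names used here are illustrative assumptions, not the repository's actual files:

```python
import pandas as pd
import networkx as nx

# A toy edge list: header row, then one comma-separated edge per line,
# with nodes indexed from 0.
csv_text = "id_1,id_2\n0,1\n1,2\n2,0\n"
with open("toy_edges.csv", "w") as f:
    f.write(csv_text)

# Read the edges (the header row is consumed automatically) and build a graph.
edges = pd.read_csv("toy_edges.csv")
graph = nx.from_edgelist(edges.values.tolist())
```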

The feature matrix is **sparse and binary**, and it is stored as a JSON file. The keys of the JSON are node IDs, and the values are lists of the feature column IDs that are active for that node. The feature matrix is structured as:

{ "0": [0, 1, 38, 1968, 2000, 52727],
  "1": [10000, 20, 3],
  "2": [],
  ...
  "n": [2018, 10000]}
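A sketch of loading such a feature file into a SciPy sparse matrix. The file contents and variable names below are toy assumptions; note that JSON object keys are strings, so node IDs must be cast back to integers on load:

```python
import json
from scipy import sparse

# Toy feature file in the described layout.
json_text = '{"0": [0, 2], "1": [1], "2": []}'
features = {int(node): cols for node, cols in json.loads(json_text).items()}

# Infer the matrix shape from the data.
node_count = len(features)
feature_count = max((c for cols in features.values() for c in cols), default=0) + 1

# Fill a LIL matrix row by row, then convert to CSR for fast arithmetic.
matrix = sparse.lil_matrix((node_count, feature_count))
for node, cols in features.items():
    if cols:
        matrix[node, cols] = 1.0
matrix = matrix.tocsr()
```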

Options

Learning of the embedding is handled by the `asne_src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path     STR       Input graph path.           Default is `input/edges/chameleon_edges.csv`.
  --features-path STR       Input Features path.        Default is `input/features/chameleon_features.json`.
  --output-path   STR       Embedding path.             Default is `output/chameleon_asne.csv`.

Model options

  --node-embedding-dimensions      INT        Number of node embedding dimensions.          Default is 16.
  --feature-embedding-dimensions   INT        Number of feature embedding dimensions.       Default is 16.
  --batch_size                     INT        Batch size for gradient descent.              Default is 64.
  --epochs                         INT        Number of training epochs.                    Default is 10.
  --alpha                          FLOAT      Matrix mixing parameter for embedding.        Default is 1.0.
  --negative_samples               INT        Number of negative samples.                   Default is 10.

Examples

The following commands learn a graph embedding and write the embedding to disk. The node representations are ordered by node ID.

Creating an ASNE embedding of the default dataset with the default hyperparameter settings. Saving the embedding at the default path.

python asne_src/main.py

Creating an ASNE embedding of the default dataset with 2x128 dimensions.

python asne_src/main.py --node-embedding-dimensions 128  --feature-embedding-dimensions 128

Creating an ASNE embedding of the default dataset with a batch size of 512.

python asne_src/main.py --batch_size 512

Creating an embedding of another dataset, the Wikipedia Giraffes, and saving the output at a custom path.

python asne_src/main.py --edge-path input/edges/giraffe_edges.csv --features-path input/features/giraffe_features.json --output-path output/giraffe_asne.csv
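A hypothetical post-processing step: loading an embedding csv of this shape with pandas. The column names below are assumptions for illustration, since the exact header written by `main.py` is not documented here:

```python
import pandas as pd

# Toy stand-in for the embedding csv written to the output path; the first
# column is taken to be the node ID, the rest the embedding coordinates.
toy = pd.DataFrame({"id": [0, 1, 2],
                    "x_0": [0.1, 0.2, 0.3],
                    "x_1": [0.4, 0.5, 0.6]})
toy.to_csv("toy_asne.csv", index=False)

# Sort by node ID and split off the coordinate matrix.
embedding = pd.read_csv("toy_asne.csv").sort_values("id")
vectors = embedding.drop(columns=["id"]).to_numpy()  # shape: (3, 2)
```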
