benedekrozemberczki / Tene

Licence: gpl-3.0
A sparsity-aware implementation of "Enhanced Network Embedding with Text Information" (ICPR 2018).

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Tene

NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (-43.48%)
Mutual labels:  matrix-factorization, feature-extraction
Seg Mentor
TFslim based semantic segmentation models, modular&extensible boutique design
Stars: ✭ 43 (-37.68%)
Mutual labels:  feature-extraction
Suitesparse
SuiteSparse: a suite of sparse matrix packages by T. A. Davis et al. (This repository contains copies of the official versions.)
Stars: ✭ 19 (-72.46%)
Mutual labels:  matrix-factorization
Orange3 Recommendation
🍊 👎 Add-on for Orange3 to support recommender systems.
Stars: ✭ 21 (-69.57%)
Mutual labels:  matrix-factorization
Neanderthal
Fast Clojure Matrix Library
Stars: ✭ 927 (+1243.48%)
Mutual labels:  matrix-factorization
Protr
Comprehensive toolkit for generating various numerical features of protein sequences
Stars: ✭ 30 (-56.52%)
Mutual labels:  feature-extraction
Tfidf
Simple TF IDF Library
Stars: ✭ 6 (-91.3%)
Mutual labels:  feature-extraction
Php Ml
PHP-ML - Machine Learning library for PHP
Stars: ✭ 7,900 (+11349.28%)
Mutual labels:  feature-extraction
Tadw
An implementation of "Network Representation Learning with Rich Text Information" (IJCAI '15).
Stars: ✭ 43 (-37.68%)
Mutual labels:  matrix-factorization
Mrsr
MRSR - Matlab Recommender Systems Research is a software framework for evaluating collaborative filtering recommender systems in Matlab.
Stars: ✭ 13 (-81.16%)
Mutual labels:  matrix-factorization
Tuna
🐟 A streaming ETL for fish
Stars: ✭ 11 (-84.06%)
Mutual labels:  feature-extraction
Robust Nmf
Python PyTorch (GPU) and NumPy (CPU)-based port of Févotte and Dobigeon's robust-NMF algorithm appearing in "Nonlinear hyperspectral unmixing with robust nonnegative matrix factorization."
Stars: ✭ 25 (-63.77%)
Mutual labels:  matrix-factorization
Graphrole
Automatic feature extraction and node role assignment for transfer learning on graphs (ReFeX & RolX)
Stars: ✭ 38 (-44.93%)
Mutual labels:  feature-extraction
Fastfm
fastFM: A Library for Factorization Machines
Stars: ✭ 908 (+1215.94%)
Mutual labels:  matrix-factorization
Recoder
Large scale training of factorization models for Collaborative Filtering with PyTorch
Stars: ✭ 46 (-33.33%)
Mutual labels:  matrix-factorization
Speechpy
💬 SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/
Stars: ✭ 833 (+1107.25%)
Mutual labels:  feature-extraction
Cbir System
Content-Based Image Retrieval system (KTH DD2476 Project)
Stars: ✭ 9 (-86.96%)
Mutual labels:  feature-extraction
Deeprec
An Open-source Toolkit for Deep Learning based Recommendation with Tensorflow.
Stars: ✭ 954 (+1282.61%)
Mutual labels:  matrix-factorization
Edge extraction
Fast and robust algorithm to extract edges in unorganized point clouds
Stars: ✭ 68 (-1.45%)
Mutual labels:  feature-extraction
Elliot
Comprehensive and Rigorous Framework for Reproducible Recommender Systems Evaluation
Stars: ✭ 49 (-28.99%)
Mutual labels:  matrix-factorization

TENE


Abstract

A sparsity-aware implementation of **Enhanced Network Embedding with Text Information**. Network embedding aims to learn a low-dimensional, continuous vector representation for each node in a network, which is useful in many real applications. While most existing network embedding methods focus only on the network structure, the rich text information associated with nodes, which is often closely related to the network structure, is widely neglected. Thus, how to effectively incorporate text information into network embedding is a problem worth studying. To solve this problem, we propose a Text Enhanced Network Embedding (TENE) method under the framework of non-negative matrix factorization to integrate network structure and text information together. We explore the consistent relationship between node representations and text cluster structure to make the network embedding more informative and discriminative. TENE learns the representations of nodes under the guidance of both the proximity matrix, which captures the network structure, and the text cluster membership matrix derived from clustering the text information. We evaluate the quality of the network embedding on the task of multi-class node classification. Experimental results on three real-world datasets show the superior performance of TENE compared with baselines.

The model is now also available in the Karate Club package.

This repository provides an implementation of TENE as described in the paper:

Enhanced Network Embedding with Text Information. Shuang Yang and Bo Yang. ICPR, 2018. https://ieeexplore.ieee.org/abstract/document/8545577

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0

Datasets

The code takes an input graph in a CSV file. Every row indicates an edge between two nodes, separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for the `Wikipedia Chameleons` and `Wikipedia Giraffes` datasets are included in the `input/` directory.
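As a sketch of the expected edge-list format, the snippet below parses such a CSV with the standard library. The helper name `read_edge_list` and the header column names are illustrative, not taken from the repository:

```python
import csv
import io

def read_edge_list(handle):
    """Parse a header-prefixed edge-list CSV into (source, target) tuples."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(int(row[0]), int(row[1])) for row in reader]

# A toy edge list in the documented format (header names assumed).
sample = io.StringIO("id_1,id_2\n0,1\n1,2\n2,0\n")
edges = read_edge_list(sample)
print(edges)  # [(0, 1), (1, 2), (2, 0)]
```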

The feature matrix can be stored in two ways:

If the feature matrix is a **sparse binary** one, it is stored as a JSON file. Node identifiers are the keys and the lists of active feature column ids are the values (note that JSON keys are strings). The feature matrix is structured as:

{ "0": [0, 1, 38, 1968, 2000, 52727],
  "1": [10000, 20, 3],
  "2": [],
  ...
  "n": [2018, 10000]}
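To illustrate the sparse encoding, this pure-Python sketch (the variable names are hypothetical) expands such a mapping into a dense binary feature matrix:

```python
import json

# A toy feature file in the documented format (JSON keys are strings).
raw = '{"0": [0, 2], "1": [1], "2": []}'
features = json.loads(raw)

n_nodes = len(features)
n_features = 1 + max((i for ids in features.values() for i in ids), default=0)

# Build an n_nodes x n_features binary matrix, rows ordered by node id.
matrix = [[0] * n_features for _ in range(n_nodes)]
for node, column_ids in features.items():
    for column in column_ids:
        matrix[int(node)][column] = 1

print(matrix)  # [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```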

If the feature matrix is **dense**, it is assumed to be stored as a CSV with comma separators. It has a header, the first column contains the node identifiers, and the rows are sorted by these identifiers. It should look like this:

| NODE ID | Feature 1 | Feature 2 | Feature 3 | Feature 4 |
|---------|-----------|-----------|-----------|-----------|
| 0       | 3         | 0         | 1.37      | 1         |
| 1       | 1         | 1         | 2.54      | -11       |
| 2       | 2         | 0         | 1.08      | -12       |
| 3       | 1         | 1         | 1.22      | -4        |
| ...     | ...       | ...       | ...       | ...       |
| n       | 5         | 0         | 2.47      | 21        |
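A minimal standard-library sketch of reading such a dense feature file (the file is expected to be sorted already; the snippet sorts defensively, and the variable names are illustrative):

```python
import csv
import io

# A toy dense feature file in the documented format.
sample = io.StringIO(
    "NODE ID,Feature 1,Feature 2\n"
    "1,1,2.54\n"
    "0,3,1.37\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row
rows = sorted(([float(x) for x in row] for row in reader), key=lambda r: r[0])

# Drop the identifier column to obtain the feature matrix.
feature_matrix = [row[1:] for row in rows]
print(feature_matrix)  # [[3.0, 1.37], [1.0, 2.54]]
```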

Options

Learning of the embedding is handled by the `src/main.py` script, which provides the following command-line arguments.

Input and output options

  --edge-path    STR        Input graph path.           Default is `input/chameleon_edges.csv`.
  --feature-path STR        Input features path.        Default is `input/chameleon_features.json`.
  --output-path  STR        Embedding path.             Default is `output/chameleon_tene.csv`.

Model options

  --features       STR         Structure of the feature matrix.                   Default is `sparse`.
  --dimensions     INT         Number of embedding dimensions.                    Default is 32.
  --order          INT         Order of adjacency matrix powers.                  Default is 3.
  --iterations     INT         Number of power iterations.                        Default is 500.
  --alpha          FLOAT       Alignment parameter for feature matrix.            Default is 1.0.
  --beta           FLOAT       Alignment parameter for feature-node embeddings.   Default is 1.0.
  --lower-control  FLOAT       Overflow control parameter.                        Default is 10**-15.
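The `--order` option controls how many adjacency matrix powers contribute to the proximity matrix that guides the factorization. One common construction (an assumption for illustration, not taken from this repository's code) averages the first k powers of the row-normalized adjacency matrix:

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def proximity(adj, order):
    """Average the first `order` powers of the row-normalized adjacency matrix."""
    n = len(adj)
    norm = [[adj[i][j] / sum(adj[i]) for j in range(n)] for i in range(n)]
    power = norm
    total = [[0.0] * n for _ in range(n)]
    for _ in range(order):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, norm)
    return [[total[i][j] / order for j in range(n)] for i in range(n)]

# Triangle graph: every node linked to the other two.
adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P = proximity(adj, order=3)
print(P[0])  # [0.25, 0.375, 0.375]
```

Each row of the result still sums to one, so the proximity matrix stays a valid transition matrix while mixing in multi-hop neighborhood information.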

Examples

The following commands learn a graph embedding and write it to disk. The node representations are ordered by node ID.

Creating a TENE embedding of the default dataset with the default hyperparameter settings, saving the embedding at the default path:

$ python src/main.py

Creating a TENE embedding of the default dataset with 128 dimensions and approximation order 1:

$ python src/main.py --dimensions 128 --order 1

Creating an embedding of another dataset with dense features, the Wikipedia Giraffes, and saving the output to a custom path:

$ python src/main.py --edge-path input/giraffe_edges.csv --feature-path input/giraffe_features.csv --output-path output/giraffe_tene.csv --features dense

License

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].