benedekrozemberczki / M-NMF

License: GPL-3.0 license
An implementation of "Community Preserving Network Embedding" (AAAI 2017)

Projects that are alternatives of or similar to M-NMF

RolX
An alternative implementation of Recursive Feature and Role Extraction (KDD11 & KDD12)
Stars: ✭ 52 (-56.3%)
Mutual labels:  deepwalk, matrix-factorization, factorization, unsupervised-learning, node2vec, struc2vec, implicit-factorization, semisupervised-learning
Awesome Community Detection
A curated list of community detection research papers with implementations.
Stars: ✭ 1,874 (+1474.79%)
Mutual labels:  clustering, community-detection, deepwalk, matrix-factorization, factorization, unsupervised-learning, node2vec
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (-67.23%)
Mutual labels:  deepwalk, matrix-factorization, factorization, unsupervised-learning, nmf, node2vec
Self Supervised Learning Overview
📜 Self-Supervised Learning from Images: Up-to-date reading list.
Stars: ✭ 73 (-38.66%)
Mutual labels:  clustering, representation-learning, unsupervised-learning
LabelPropagation
A NetworkX implementation of Label Propagation from a "Near Linear Time Algorithm to Detect Community Structures in Large-Scale Networks" (Physical Review E 2008).
Stars: ✭ 101 (-15.13%)
Mutual labels:  clustering, community-detection, unsupervised-learning
Bagofconcepts
Python implementation of bag-of-concepts
Stars: ✭ 18 (-84.87%)
Mutual labels:  clustering, representation-learning, unsupervised-learning
FEATHER
The reference implementation of FEATHER from the CIKM '20 paper "Characteristic Functions on Graphs: Birds of a Feather, from Statistical Descriptors to Parametric Models".
Stars: ✭ 34 (-71.43%)
Mutual labels:  deepwalk, representation-learning, node2vec
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+76.47%)
Mutual labels:  clustering, matrix-factorization, unsupervised-learning
FSCNMF
An implementation of "Fusing Structure and Content via Non-negative Matrix Factorization for Embedding Information Networks".
Stars: ✭ 16 (-86.55%)
Mutual labels:  deepwalk, nmf, node2vec
RcppML
Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
Stars: ✭ 52 (-56.3%)
Mutual labels:  clustering, matrix-factorization, nmf
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (-31.93%)
Mutual labels:  clustering, representation-learning, unsupervised-learning
Graphembedding
Implementation and experiments of graph embedding algorithms.
Stars: ✭ 2,461 (+1968.07%)
Mutual labels:  deepwalk, node2vec, struc2vec
Unsupervised Classification
SCAN: Learning to Classify Images without Labels (ECCV 2020), incl. SimCLR.
Stars: ✭ 605 (+408.4%)
Mutual labels:  clustering, representation-learning, unsupervised-learning
Compress
Compressing Representations for Self-Supervised Learning
Stars: ✭ 43 (-63.87%)
Mutual labels:  clustering, representation-learning
Text Summarizer
Python Framework for Extractive Text Summarization
Stars: ✭ 96 (-19.33%)
Mutual labels:  clustering, unsupervised-learning
Danmf
A sparsity aware implementation of "Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection" (CIKM 2018).
Stars: ✭ 161 (+35.29%)
Mutual labels:  clustering, unsupervised-learning
Spectralcluster
Python re-implementation of the spectral clustering algorithm in the paper "Speaker Diarization with LSTM"
Stars: ✭ 220 (+84.87%)
Mutual labels:  clustering, unsupervised-learning
Minisom
🔴 MiniSom is a minimalistic implementation of the Self Organizing Maps
Stars: ✭ 801 (+573.11%)
Mutual labels:  clustering, unsupervised-learning
Keras deep clustering
How to do Unsupervised Clustering with Keras
Stars: ✭ 202 (+69.75%)
Mutual labels:  clustering, unsupervised-learning
DESOM
🌐 Deep Embedded Self-Organizing Map: Joint Representation Learning and Self-Organization
Stars: ✭ 76 (-36.13%)
Mutual labels:  clustering, representation-learning

M-NMF

Abstract

Network embedding, aiming to learn the low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and the community structure, and then jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned representations of nodes to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over state-of-the-art methods.
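
For orientation, the paper's joint objective (paraphrased here, with the constraint handling simplified) couples an NMF term with a modularity term:

\min_{M,U,H,C \ge 0} \; \lVert S - MU^{\top} \rVert_F^2 + \alpha \lVert H - UC^{\top} \rVert_F^2 - \beta \operatorname{tr}(H^{\top} B H) \quad \text{s.t.} \; \operatorname{tr}(H^{\top} H) = n

Here S = S^{(1)} + \eta S^{(2)} mixes the first- and second-order proximities (the `--eta` option below), U is the node embedding, C holds the community representations, H is the community indicator, and B is the modularity matrix; \alpha and \beta correspond to the `--alpha` and `--beta` options.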

The model is now also available in the Karate Club package.

This repository provides a TensorFlow implementation of M-NMF as described in:

Community Preserving Network Embedding. Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, Shiqiang Yang. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).

A reference MATLAB implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.19.5
numpy             1.13.3
pandas            0.20.3
tensorflow-gpu    1.12.0
jsonschema        2.6.0
texttable         1.2.1
python-louvain    0.11
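
These pinned versions can be installed with pip in a suitable Python 3.5 environment, for example:

$ pip install networkx==2.4 tqdm==4.19.5 numpy==1.13.3 pandas==0.20.3 tensorflow-gpu==1.12.0 jsonschema==2.6.0 texttable==1.2.1 python-louvain==0.11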

Datasets

The code takes an input graph as a CSV file. Every row after the header encodes an edge as a comma-separated pair of node IDs. Nodes should be indexed starting with 0. A sample graph for the `Facebook Politicians` dataset is included in the `data/` directory.
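
For illustration, an input file has the following shape (the header names and the edges below are made up, not taken from the bundled dataset):

node_1,node_2
0,1
0,2
1,2
1,3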

Logging

The models are defined so that the parameter settings and the cluster quality are logged in every epoch. Specifically, we log the following (a sketch of the modularity computation follows the list):

1. Hyperparameter settings.     We save each hyperparameter used in the experiment.
2. Cluster quality.             Measured by modularity, which we calculate in every epoch.
3. Runtime.                     We measure the time needed for optimization, in seconds.
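
The sketch below shows one way to compute such a modularity score with the python-louvain package listed in the requirements; the graph and the assignment are toy placeholders standing in for the epoch's actual node-to-cluster assignment, not the repository's exact code.

import networkx as nx
import community  # the python-louvain package

# Toy graph and a made-up node-to-cluster assignment; in a real run the
# assignment comes from the current state of the optimizer.
graph = nx.karate_club_graph()
assignment = {node: node % 2 for node in graph.nodes()}

# Newman modularity of the assignment -- the cluster quality value logged
# in every epoch.
print(community.modularity(assignment, graph))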

Options

Learning of the embedding is handled by the `src/main.py` script, which provides the following command line arguments.

Input and output options

  --input                STR         Input graph path.                                 Default is `data/food_edges.csv`.
  --embedding-output     STR         Embeddings path.                                  Default is `output/embeddings/food_embedding.csv`.
  --cluster-mean-output  STR         Cluster centers path.                             Default is `output/cluster_means/food_means.csv`.
  --log-output           STR         Log path.                                         Default is `output/logs/food.log`.
  --assignment-output    STR         Node-cluster assignment dictionary path.          Default is `output/assignments/food.json`.
  --dump-matrices        BOOL        Whether the trained model should be saved.        Default is `True`.

Model options

  --dimensions        INT         Number of dimensions.                             Default is 16.
  --clusters          INT         Number of clusters.                               Default is 20.
  --lambd             FLOAT       KKT penalty.                                      Default is 0.2.
  --alpha             FLOAT       Clustering penalty.                               Default is 0.05.
  --beta              FLOAT       Modularity regularization penalty.                Default is 0.05.
  --eta               FLOAT       Similarity mixing parameter.                      Default is 5.0.
  --lower-control     FLOAT       Floating point underflow control.                 Default is 10**-15.
  --iteration-number  INT         Number of power iterations.                       Default is 200.
  --early-stopping    INT         Early stopping round number based on modularity.  Default is 3.
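
A minimal sketch of the early stopping behaviour controlled by `--early-stopping` above (the per-step score is a random placeholder here, not the actual optimizer):

import random

def optimize(iteration_number=200, early_stopping=3):
    best_modularity, stale_rounds = float("-inf"), 0
    for step in range(iteration_number):
        # Placeholder: a real run performs one update pass and then scores
        # the current node-to-cluster assignment by modularity.
        modularity = random.random()
        if modularity > best_modularity:
            best_modularity, stale_rounds = modularity, 0
        else:
            stale_rounds += 1
        if stale_rounds >= early_stopping:
            break
    return best_modularity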

Examples

The following commands learn a graph embedding and cluster centers and write them to disk. The node representations are ordered by node ID.

Creating an M-NMF embedding of the default dataset with the default hyperparameter settings. Saving the embedding, cluster centers and the log file at the default paths.

$ python src/main.py

Turning off the model saving.

$ python src/main.py --dump-matrices False

Creating an embedding of another dataset, the `Facebook Companies` graph. Saving the output and the log in a custom location.

$ python src/main.py --input data/company_edges.csv  --embedding-output output/embeddings/company_embedding.csv --cluster-mean-output output/cluster_means/company_means.csv

Creating a clustered embedding of the default dataset with 128 dimensions and 10 cluster centers.

$ python src/main.py --dimensions 128 --clusters 10
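
Once a run finishes, the outputs are plain CSV and JSON files. Assuming the default paths, they can be loaded back as follows (a sketch, using pandas from the requirements):

import json
import pandas as pd

embedding = pd.read_csv("output/embeddings/food_embedding.csv")   # rows ordered by node ID
means = pd.read_csv("output/cluster_means/food_means.csv")        # cluster centers
with open("output/assignments/food.json") as handle:
    assignment = json.load(handle)                                # node-cluster dictionary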

License

M-NMF is released under the GPL-3.0 license.