benedekrozemberczki / Danmf

Licence: gpl-3.0

A sparsity-aware implementation of "Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection" (CIKM 2018).

Programming Languages

python

Labels

deep-learning machine-learning data-science clustering unsupervised-learning word2vec autoencoder sklearn

DANMF

An implementation of Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection (CIKM 2018).

Abstract

Community structure is ubiquitous in real-world complex networks. The task of community detection over these networks is of paramount importance in a variety of applications. Recently, nonnegative matrix factorization (NMF) has been widely adopted for community detection due to its great interpretability and its natural fitness for capturing the community membership of nodes. However, the existing NMF-based community detection approaches are shallow methods. They learn the community assignment by mapping the original network to the community membership space directly. Considering the complicated and diversified topology structures of real-world networks, it is highly possible that the mapping between the original network and the community membership space contains rather complex hierarchical information, which cannot be interpreted by classic shallow NMF-based approaches. Inspired by the unique feature representation learning capability of deep autoencoder, we propose a novel model, named Deep Autoencoder-like NMF (DANMF), for community detection. Similar to deep autoencoder, DANMF consists of an encoder component and a decoder component. This architecture empowers DANMF to learn the hierarchical mappings between the original network and the final community assignment with implicit low-to-high level hidden attributes of the original network learnt in the intermediate layers. Thus, DANMF should be better suited to the community detection task. Extensive experiments on benchmark datasets demonstrate that DANMF can achieve better performance than the state-of-the-art NMF-based community detection approaches.
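The encoder and decoder components described in the abstract can be summarized as a single objective. The block below is a sketch written from the cited paper's formulation, not taken from this repository's code. The notation is assumed here: $A$ is the adjacency matrix, $U_1, \dots, U_p$ are the nonnegative factor matrices of the $p$ layers, $V_p$ is the final community membership matrix, $L$ is the graph Laplacian, and $\lambda$ is the regularization weight (presumably what `--lamb` controls).

$$
\min_{U_i \ge 0,\; V_p \ge 0} \;
\lVert A - U_1 U_2 \cdots U_p V_p \rVert_F^2
+ \lVert V_p - U_p^\top \cdots U_2^\top U_1^\top A \rVert_F^2
+ \lambda \, \operatorname{tr}\!\left( V_p L V_p^\top \right)
$$

The first term is the decoder reconstruction error, the second is the encoder error, and the third is a graph regularizer. Consult the paper for the authoritative statement and the update rules.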

The model is now also available in the package Karate Club.

This repository provides an implementation of DANMF as described in the paper:

Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection. Fanghua Ye, Chuan Chen, and Zibin Zheng. CIKM, 2018. [Paper]

A MATLAB reference implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
sklearn           0.20.0

Datasets

The code takes an input graph in a CSV file. Every row indicates an edge between two nodes, separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for the `Twitch Brasilians`, `Wikipedia Chameleons`, and `Wikipedia Giraffes` datasets are included in the `input/` directory.
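As a rough illustration of the input format described above, the following sketch parses an edge list with the standard library and checks that node indices start at 0. The helper names (`read_edge_list`, `count_nodes`) and the header `node_1,node_2` are hypothetical; they are not taken from the repository, and the real sample files may use different column names.

```python
import csv
import io


def read_edge_list(handle):
    """Parse a two-column CSV edge list whose first row is a header."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    return [(int(row[0]), int(row[1])) for row in reader if row]


def count_nodes(edges):
    """Return the number of distinct nodes, requiring that indexing starts at 0."""
    nodes = {node for edge in edges for node in edge}
    if min(nodes) != 0:
        raise ValueError("node indices should start at 0")
    return len(nodes)


# Hypothetical sample data standing in for a file such as input/ptbr_edges.csv.
sample = "node_1,node_2\n0,1\n0,2\n1,2\n2,3\n"
edges = read_edge_list(io.StringIO(sample))
node_count = count_nodes(edges)
print(edges, node_count)
```

To read a real file, pass `open(path, newline="")` instead of the `io.StringIO` object.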

Options

Learning of the embedding is handled by the `src/main.py` script, which provides the following command line arguments.

Input and output options

  --edge-path         STR    Input graph path.       Default is `input/ptbr_edges.csv`.
  --membership-path   STR    Membership path.        Default is `output/ptbr_membership.json`.
  --output-path       STR    Embedding path.         Default is `output/ptbr_danmf.csv`.

Model options

  --iterations            INT         Number of epochs.                     Default is 100.
  --pre-iterations        INT         Layer-wise epochs.                    Default is 100.
  --seed                  INT         Random seed value.                    Default is 42.
  --lamb                  FLOAT       Regularization parameter.             Default is 0.01.
  --layers                LST         Layer sizes in autoencoder model.     Default is [32, 8].
  --calculate-loss        BOOL        Loss calculation for the model.       Default is False.
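
The options above map naturally onto `argparse`. The following is a minimal sketch of a parser that accepts the documented flags and defaults; `build_parser` is a hypothetical name and this is not the repository's own parser, which may differ in details.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (illustration only)."""
    parser = argparse.ArgumentParser(description="Run DANMF (illustrative sketch).")
    # Input and output options.
    parser.add_argument("--edge-path", default="input/ptbr_edges.csv")
    parser.add_argument("--membership-path", default="output/ptbr_membership.json")
    parser.add_argument("--output-path", default="output/ptbr_danmf.csv")
    # Model options.
    parser.add_argument("--iterations", type=int, default=100)
    parser.add_argument("--pre-iterations", type=int, default=100)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--lamb", type=float, default=0.01)
    # One or more integers, e.g. "--layers 128 64 7".
    parser.add_argument("--layers", nargs="+", type=int, default=[32, 8])
    # Boolean switch: absent means False, present means True.
    parser.add_argument("--calculate-loss", action="store_true")
    return parser


args = build_parser().parse_args(["--layers", "128", "64", "7", "--calculate-loss"])
defaults = build_parser().parse_args([])
print(args.layers, args.calculate_loss, defaults.layers)
```

Using `nargs="+"` is what lets `--layers` take a space-separated list, and `store_true` matches the way `--calculate-loss` is passed without a value in the examples.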

Examples

The following commands learn a graph embedding and write this embedding to disk. The node representations are ordered by node identifiers. Layer sizes are always set manually.

Creating a DANMF embedding of the default dataset with a 128-64-32-16 architecture. Saving the embedding at the default path.

python src/main.py --layers 128 64 32 16

Creating a DANMF embedding of the default dataset with a 96-8 architecture and calculating the loss.

python src/main.py --layers 96 8 --calculate-loss

Creating a single layer DANMF embedding with 32 factors.

python src/main.py --layers 32

Creating an embedding with some custom cluster number in the bottleneck layer.

python src/main.py --layers 128 64 7

Creating an embedding of another dataset, the Wikipedia Chameleons. Saving the output in a custom folder.

python src/main.py --layers 32 8 --edge-path input/chameleon_edges.csv --output-path output/chameleon_danmf.csv --membership-path output/chameleon_membership.json
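
The size of the last (bottleneck) layer sets the number of communities. One common way to turn a nonnegative embedding into hard cluster labels, sketched below, is to assign each node to the factor with the largest value in its row. This is an assumption about how membership can be derived, not a description of the repository's code, and the JSON layout shown (node identifier mapped to community index) is likewise assumed. `assign_communities` and the sample rows are hypothetical.

```python
import json


def assign_communities(embedding_rows):
    """Map each node index to the column holding its largest embedding value."""
    return {
        node: max(range(len(row)), key=row.__getitem__)
        for node, row in enumerate(embedding_rows)
    }


# Hypothetical 3-factor bottleneck output for four nodes, ordered by node identifier.
rows = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.8, 0.0, 0.2],
    [0.0, 0.1, 0.6],
]
membership = assign_communities(rows)
print(json.dumps(membership))
```

When two factors tie, `max` keeps the first one, so ties resolve to the lower community index.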

License

GNU General Public License v3.0 (GPL-3.0)
