benedekrozemberczki / EgoSplitting

Licence: GPL-3.0 license
A NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).


Ego-Splitting Framework


A NetworkX implementation of Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters (KDD 2017).

Abstract

We propose a new framework called Ego-Splitting for detecting clusters in complex networks, which leverages the local structures known as ego-nets (i.e. the subgraph induced by the neighborhood of each node) to de-couple overlapping clusters. Ego-Splitting is a highly scalable and flexible framework, with provable theoretical guarantees, that reduces the complex overlapping clustering problem to a simpler and more amenable non-overlapping (partitioning) problem. We can solve community detection in graphs with tens of billions of edges and outperform previous solutions based on ego-nets analysis.

More precisely, our framework works in two steps: a local ego-net analysis phase, and a global graph partitioning phase. In the local step, we first partition the nodes’ ego-nets using a partitioning algorithm. We then use the computed clusters to split each node into its persona nodes that represent the instantiations of the node in its communities. Then, in the global step, we partition the newly created graph to obtain an overlapping clustering of the original graph.
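The two steps described above can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the code in this repository: the function names `connected_components` and `ego_split` are hypothetical, and connected components stand in for the partitioning algorithm in both steps purely to keep the example deterministic. The requirements list python-louvain, which suggests a modularity-based partitioner is used in practice, and any function with the same signature could be passed in instead.

```python
from collections import defaultdict


def connected_components(nodes, adjacency):
    """Partition `nodes` into components, following only edges inside `nodes`."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbor in adjacency[current]:
                if neighbor in nodes and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def ego_split(edges, local_partition=connected_components,
              global_partition=connected_components):
    """Return {node: sorted cluster ids}; a node may belong to several clusters."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # assumption: self-loops are ignored
            adjacency[u].add(v)
            adjacency[v].add(u)

    # Local step: partition each ego-net (the neighbors of a node, without the
    # node itself) and create one persona per cluster found.
    persona_of = {}  # (node, neighbor) -> persona of `node` that faces `neighbor`
    owner = {}       # persona id -> original node
    for node in sorted(adjacency):
        for cluster in local_partition(adjacency[node], adjacency):
            persona = len(owner)
            owner[persona] = node
            for neighbor in cluster:
                persona_of[(node, neighbor)] = persona

    # Persona graph: every original edge connects the two matching personas.
    persona_adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        pu, pv = persona_of[(u, v)], persona_of[(v, u)]
        persona_adjacency[pu].add(pv)
        persona_adjacency[pv].add(pu)

    # Global step: a non-overlapping partition of the persona graph becomes an
    # overlapping clustering once personas are mapped back to their owners.
    memberships = defaultdict(set)
    clusters = global_partition(owner.keys(), persona_adjacency)
    for cluster_id, cluster in enumerate(clusters):
        for persona in cluster:
            memberships[owner[persona]].add(cluster_id)
    return {node: sorted(ids) for node, ids in memberships.items()}


# Two triangles sharing node 2: node 2 is split into two personas.
bowtie = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
result = ego_split(bowtie)
print(result)  # {0: [0], 1: [0], 2: [0, 1], 3: [1], 4: [1]}
```

On real graphs the persona graph is usually one large component, so connected components would be a poor choice for the global step; the sketch only shows where each partitioner plugs in.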

This repository provides a lightweight NetworkX implementation of Ego-splitting as described in the paper:

Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters. Alessandro Epasto, Silvio Lattanzi, and Renato Paes Leme. KDD, 2017. [Paper]

A reference implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
pandas            0.23.4
texttable         1.5.0
argparse          1.1.0
python-louvain    0.13.0

Datasets

The code takes the **edge list** of the graph in a csv file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for `Facebook Politicians` and `Facebook TV Shows` are included in the `input/` directory.
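As an illustration of the format described above, the following sketch parses such a file with the standard library only. The helper name `read_edge_list` and the header names `node_1,node_2` are assumptions made for the example; the actual column names in the sample files may differ, which is why the header row is skipped rather than interpreted.

```python
import csv
import io


def read_edge_list(handle):
    """Read a comma-separated edge list with one header row into integer pairs."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    edges = []
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        edges.append((int(row[0]), int(row[1])))
    return edges


# In-memory stand-in for a file such as the ones under input/.
sample = io.StringIO("node_1,node_2\n0,1\n0,2\n1,2\n")
edges = read_edge_list(sample)
print(edges)  # [(0, 1), (0, 2), (1, 2)]
```

To read a file on disk, pass `open(path, newline="")` as the handle.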

Options

Training an Ego-splitter model is handled by the src/main.py script, which provides the following command-line arguments.

Input and output options

  --edge-path       STR     Edge list csv.            Default is `input/tvshow_edges.csv`.
  --output-path     STR     Membership json.          Default is `output/tvshow_cluster_memberships.json`.
  --resolution      FLOAT   Louvain resolution.       Default is 1.0.
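For readers who want to see how options like these are typically declared, here is a hedged `argparse` sketch. It is not copied from src/main.py: the function name `build_parser` and the help strings are assumptions, while the flag names and defaults are taken from this README's option table and example commands.

```python
import argparse


def build_parser():
    """Declare the three command-line options listed in this README (sketch)."""
    parser = argparse.ArgumentParser(description="Run Ego-splitting.")
    parser.add_argument("--edge-path", type=str,
                        default="input/tvshow_edges.csv",
                        help="Edge list csv.")
    parser.add_argument("--output-path", type=str,
                        default="output/tvshow_cluster_memberships.json",
                        help="Membership json.")
    # Assumption: the value is forwarded to the partitioning algorithm.
    parser.add_argument("--resolution", type=float, default=1.0,
                        help="Resolution of the partitioning step.")
    return parser


# Equivalent to: python src/main.py --resolution 2.5
args = build_parser().parse_args(["--resolution", "2.5"])
print(args.edge_path, args.output_path, args.resolution)
```

`argparse` converts the dashes in flag names to underscores, so `--edge-path` is read back as `args.edge_path`.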

Examples

The following command creates an overlapping community assignment by ego-net splitting, training a model on the default dataset.

python src/main.py

Training a model with a higher resolution.

python src/main.py --resolution 2.5

Training a model with a lower resolution.

python src/main.py --resolution 0.5

Training a model on the Facebook TV shows dataset.

python src/main.py --edge-path input/tvshow_edges.csv --output-path output/tvshow_cluster_memberships.json

License

GPL-3.0

