Alternatives to graphgrove and detailed project information

nmonath / graphgrove

License: Apache-2.0

A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search

Programming Languages

C++

Projects that are alternatives of or similar to graphgrove

Clustering-in-Python

Clustering methods in machine learning, with both the theory and Python code for each algorithm. Algorithms include K-Means, K-Modes, Hierarchical, DBSCAN, and Gaussian Mixture Model (GMM). Interview questions on clustering are included at the end.

Stars: ✭ 27 (-6.9%)

Mutual labels: clustering, hierarchical-clustering

genieclust

Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R

Stars: ✭ 34 (+17.24%)

Mutual labels: clustering, hierarchical-clustering

adventures-with-ann

All the code for a series of Medium articles on Approximate Nearest Neighbors

Stars: ✭ 40 (+37.93%)

Mutual labels: nearest-neighbor-search, nearest-neighbors

Lopq

Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.

Stars: ✭ 530 (+1727.59%)

Mutual labels: clustering, nearest-neighbor-search

Smile

Statistical Machine Intelligence & Learning Engine

Stars: ✭ 5,412 (+18562.07%)

Mutual labels: clustering, nearest-neighbor-search

pynanoflann

Unofficial python wrapper to the nanoflann k-d tree

Stars: ✭ 24 (-17.24%)

Mutual labels: nearest-neighbor-search, nearest-neighbors

R-stats-machine-learning

Misc Statistics and Machine Learning codes in R

Stars: ✭ 33 (+13.79%)

Mutual labels: clustering, nearest-neighbors

hierarchical-clustering

A Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.

Stars: ✭ 62 (+113.79%)

Mutual labels: clustering, hierarchical-clustering

LIUM

Scripts for LIUM SpkDiarization tools

Stars: ✭ 28 (-3.45%)

Mutual labels: clustering

rabbitmq-peer-discovery-aws

AWS-based peer discovery backend for RabbitMQ 3.7.0+

Stars: ✭ 23 (-20.69%)

Mutual labels: clustering

dropClust

Version 2.1.0 released

Stars: ✭ 19 (-34.48%)

Mutual labels: clustering

lannister

A lightweight MQTT broker with full spec support, clustering, WebSocket, and SSL, written in Java

Stars: ✭ 20 (-31.03%)

Mutual labels: clustering

treecut

Find nodes in hierarchical clustering that are statistically significant

Stars: ✭ 26 (-10.34%)

Mutual labels: clustering

BPRMeth

Modelling DNA methylation profiles

Stars: ✭ 18 (-37.93%)

Mutual labels: clustering

dbscan-python

[New Version] Theoretically Efficient and Practical Parallel DBSCAN

Stars: ✭ 18 (-37.93%)

Mutual labels: clustering

kdtree

A k-d tree implementation in Go.

Stars: ✭ 98 (+237.93%)

Mutual labels: nearest-neighbor-search

Unsupervised-Learning-in-R

Workshop (6 hours): Clustering (Hdbscan, LCA, Hopach), dimension reduction (UMAP, GLRM), and anomaly detection (isolation forests).

Stars: ✭ 34 (+17.24%)

Mutual labels: clustering

sentences-similarity-cluster

Calculate similarity of sentences & Cluster the result.

Stars: ✭ 14 (-51.72%)

Mutual labels: hierarchical-clustering

realtimemap-dotnet

A showcase for Proto.Actor - an ultra-fast distributed actors solution for Go, C#, and Java/Kotlin.

Stars: ✭ 47 (+62.07%)

Mutual labels: clustering

ex united

Easily spawn Elixir nodes (supervising, Mix configured, easy asserted / refuted) within ExUnit tests

Stars: ✭ 40 (+37.93%)

Mutual labels: clustering

Install

Linux wheels are available on PyPI (Python >= 3.6):

pip install graphgrove

Building from source:

conda create -n gg python=3.8
conda activate gg
pip install numpy
make

To build your own wheel:

conda create -n gg python=3.8
conda activate gg
pip install numpy
make
pip install build
python -m build --wheel
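# The resulting wheel can then be installed. The exact filename depends on
# your Python version and platform, for example:
# pip install --force-reinstall dist/graphgrove-0.0.1-cp37-cp37m-linux_x86_64.whl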

Examples

Toy examples of clustering, DAG-structured clustering, and nearest neighbor search are available.

At a high level, incremental clustering can be done as:

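import numpy as np
import graphgrove as gg
k = 5
cores = 4  # not set in the original snippet; mirrors the nearest neighbor example below
num_rounds = 50
# one threshold per round, spaced geometrically from 1.0 down to 0.001
thresholds = np.geomspace(1.0, 0.001, num_rounds).astype(np.float32)
scc = gg.vec_scc.Cosine_SCC(k=k, num_rounds=num_rounds, thresholds=thresholds, index_name='cosine_sgtree', cores=cores, verbosity=0)
# data_batches - generator of numpy matrices, each of shape (mini-batch size, dim)
for batch in data_batches:
    scc.partial_fit(batch)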
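
The `data_batches` generator in the example above is left undefined. Below is a minimal sketch of one way to produce it with NumPy, assuming the full dataset fits in memory. The helper `make_batches` and the random data are hypothetical and not part of graphgrove; whether graphgrove expects pre-normalized input is not stated here, so the normalization step is only illustrative.

```python
import numpy as np


def make_batches(data, batch_size):
    """Yield consecutive row slices of `data`, each with at most `batch_size` rows."""
    for start in range(0, data.shape[0], batch_size):
        yield data[start:start + batch_size]


# Illustrative random data (an assumption): 1000 points in 16 dimensions,
# L2-normalized so that dot products equal cosine similarities.
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 16)).astype(np.float32)
data /= np.linalg.norm(data, axis=1, keepdims=True)

# A generator of (mini-batch size, dim) matrices, as the example expects.
data_batches = make_batches(data, batch_size=128)

# The threshold schedule from the example: num_rounds values spaced
# geometrically from 1.0 down to 0.001.
num_rounds = 50
thresholds = np.geomspace(1.0, 0.001, num_rounds).astype(np.float32)
```

Because `make_batches` is a generator, it can be iterated only once; call it again to make a second pass over the data.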

Incremental nearest neighbor search can be done as:

import graphgrove as gg
k = 5
cores = 4
tree = gg.graph_builder.Cosine_SGTree(k=k, cores=cores)
# data_batches - generator of numpy matrices, each of shape (mini-batch size, dim)
for batch in data_batches:
    tree.insert(batch)  # or tree.insert_and_knn(batch)
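
When experimenting with an incremental index like this, it can help to have an exact reference to compare against on small data. The sketch below is a plain NumPy brute-force cosine k-nearest-neighbor search. It does not use graphgrove, and the function name `brute_force_cosine_knn` is hypothetical.

```python
import numpy as np


def brute_force_cosine_knn(index_vectors, queries, k):
    """Return (indices, similarities) of the k most cosine-similar index rows per query."""
    a = index_vectors / np.linalg.norm(index_vectors, axis=1, keepdims=True)
    b = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = b @ a.T                           # shape: (num_queries, num_index)
    top = np.argsort(-sims, axis=1)[:, :k]   # highest similarity first
    return top, np.take_along_axis(sims, top, axis=1)


# Illustrative usage on random data (an assumption, not from the project).
rng = np.random.default_rng(1)
index_vectors = rng.standard_normal((50, 8))
neighbors, similarities = brute_force_cosine_knn(index_vectors, index_vectors[:5], k=3)
```

This scans every indexed vector for every query, so it is only practical as a correctness check on small inputs.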

Algorithms Implemented

Clustering:

Sub-Cluster Component Algorithm (SCC) and its minibatch variant, from the paper: Scalable Hierarchical Agglomerative Clustering. Nicholas Monath, Kumar Avinava Dubey, Guru Guruganesh, Manzil Zaheer, Amr Ahmed, Andrew McCallum, Gokhan Mergen, Marc Najork, Mert Terzihan, Bryon Tjanaka, Yuan Wang, Yuchen Wu. KDD 2021.
DAG-structured clustering (LLama), from the paper: DAG-Structured Clustering by Nearest Neighbors. Nicholas Monath, Manzil Zaheer, Kumar Avinava Dubey, Amr Ahmed, Andrew McCallum. AISTATS 2021.

Nearest Neighbor Search:

CoverTree: Alina Beygelzimer, Sham Kakade, and John Langford. "Cover trees for nearest neighbor." ICML. 2006.
SGTree: SG-Tree is a new data structure for exact nearest neighbor search, inspired by Cover Tree and its improvement, which has been used in the TerraPattern project. At a high level, SG-Tree tries to create a hierarchical tree where each node performs a "coarse" clustering. The centers of these "clusters" become the children, and subsequent insertions are performed recursively on these children. When performing the NN query, we prune out solutions based on a subset of the dimensions that are being queried. This is particularly useful when trying to find the nearest neighbor in a highly clustered subset of the data, e.g. when the data comes from a recursive mixture of Gaussians or, more generally, a time-marginalized coalescent process. The effect of these two optimizations is that our data structure is extremely simple, highly parallelizable, and comparable in performance to existing NN implementations on many datasets. Manzil Zaheer, Guru Guruganesh, Golan Levin, Alexander Smola. TerraPattern: A Nearest Neighbor Search Service. 2019.
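
The SG-Tree itself is not reproduced here. The sketch below shows only the dimension-subset pruning idea in isolation, under the assumption that it resembles the standard partial-distance bound: the squared Euclidean distance over a subset of coordinates never exceeds the full squared distance, so a candidate can be abandoned as soon as that partial sum reaches the best distance found so far. The function name, the `chunk` parameter, and the linear scan are hypothetical and are not graphgrove's implementation.

```python
import numpy as np


def nn_with_partial_distance_pruning(points, query, chunk=4):
    """Exact Euclidean nearest neighbor by linear scan with early abandoning.

    A candidate is dropped as soon as its squared distance over the
    dimensions seen so far is at least the best full squared distance.
    """
    best_idx, best_sq = -1, np.inf
    dim = points.shape[1]
    for i, p in enumerate(points):
        partial = 0.0
        for start in range(0, dim, chunk):
            diff = p[start:start + chunk] - query[start:start + chunk]
            partial += float(diff @ diff)
            if partial >= best_sq:   # lower bound already too large: prune
                break
        else:
            # No pruning happened, so `partial` is the full squared distance
            # and it is strictly smaller than the previous best.
            best_idx, best_sq = i, partial
    return best_idx, best_sq ** 0.5


# Illustrative usage on random data (an assumption, not from the project).
rng = np.random.default_rng(2)
points = rng.standard_normal((200, 16))
query = rng.standard_normal(16)
nearest_idx, nearest_dist = nn_with_partial_distance_pruning(points, query)
```

The result is still exact, because a candidate is skipped only when a lower bound on its distance already rules it out. A tree-based index would combine a bound like this with the hierarchy so that whole subtrees can be skipped rather than single points.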
