mpraski / clusters

Licence: MIT license

Cluster analysis library for Golang

Projects that are alternatives of or similar to clusters

genieclust

Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R

Stars: ✭ 34 (-50%)

Mutual labels: clustering, cluster-analysis, clustering-algorithm

clustering-python

Different clustering approaches applied on different problemsets

Stars: ✭ 36 (-47.06%)

Mutual labels: clustering, cluster-analysis, clustering-algorithm

Hdbscan

A high performance implementation of HDBSCAN clustering.

Stars: ✭ 2,032 (+2888.24%)

Mutual labels: clustering, cluster-analysis, clustering-algorithm

clope

Elixir implementation of CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data

Stars: ✭ 18 (-73.53%)

Mutual labels: clustering, clustering-algorithm

genie

Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)

Stars: ✭ 21 (-69.12%)

Mutual labels: clustering, cluster-analysis

Clustering

Implements "Clustering a Million Faces by Identity"

Stars: ✭ 128 (+88.24%)

Mutual labels: clustering, clustering-algorithm

dropClust

Version 2.1.0 released

Stars: ✭ 19 (-72.06%)

Mutual labels: clustering, cluster-analysis

Clustering-in-Python

Clustering methods in Machine Learning includes both theory and python code of each algorithm. Algorithms include K Mean, K Mode, Hierarchical, DB Scan and Gaussian Mixture Model GMM. Interview questions on clustering are also added in the end.

Stars: ✭ 27 (-60.29%)

Mutual labels: clustering, clustering-algorithm

clueminer

interactive clustering platform

Stars: ✭ 13 (-80.88%)

Mutual labels: clustering, clustering-algorithm

Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

Stars: ✭ 126 (+85.29%)

Mutual labels: clustering, clustering-algorithm

CoronaDash

COVID-19 spread shiny dashboard with a forecasting model, countries' trajectories graphs, and cluster analysis tools

Stars: ✭ 20 (-70.59%)

Mutual labels: clustering, cluster-analysis

pyclustertend

A python package to assess cluster tendency

Stars: ✭ 38 (-44.12%)

Mutual labels: clustering, cluster-analysis

Clustering-Datasets

This repository contains the collection of UCI (real-life) datasets and Synthetic (artificial) datasets (with cluster labels and MATLAB files) ready to use with clustering algorithms.

Stars: ✭ 189 (+177.94%)

Mutual labels: clustering

G-SimCLR

This is the code base for paper "G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling" by Souradip Chakraborty, Aritra Roy Gosthipaty and Sayak Paul.

Stars: ✭ 69 (+1.47%)

Mutual labels: clustering

faythe

An experimental cluster brings Prometheus and OpenStack together

Stars: ✭ 18 (-73.53%)

Mutual labels: clustering

Zeitline

A polylinear timeline with clustering, centred on interactions. — Doc and demo https://octree-gva.github.io/Zeitline/

Stars: ✭ 15 (-77.94%)

Mutual labels: clustering

SpectralClustering.jl

Spectral clustering algorithms written in Julia

Stars: ✭ 46 (-32.35%)

Mutual labels: clustering

protoactor-go

Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

Stars: ✭ 4,138 (+5985.29%)

Mutual labels: clustering

minicore

Fast and memory-efficient clustering + coreset construction, including fast distance kernels for Bregman and f-divergences.

Stars: ✭ 28 (-58.82%)

Mutual labels: clustering

portfolio allocation js

A JavaScript library to allocate and optimize financial portfolios.

Stars: ✭ 145 (+113.24%)

Mutual labels: clustering

Clusters

Go implementations of several clustering algorithms (k-means++, DBSCAN, OPTICS), as well as utilities for importing data and estimating the optimal number of clusters.

The reason

This library was built out of the need for a collection of performant cluster analysis utilities for Go. Thanks to its numerous advantages (single-binary distribution, relative performance, growing community), Go seems to be becoming an attractive alternative to the languages commonly used for statistical computing and machine learning, yet it still lacks crucial tools and libraries. I use the floats package from the robust Gonum library to perform optimized vector calculations in tight loops.

Install

If you have Go 1.7 or later:

go get github.com/mpraski/clusters

Usage

The currently supported hard clustering algorithms are represented by the HardClusterer interface, which defines several common operations. To show an example we create, train and use a KMeans++ clusterer:

var data [][]float64
var observation []float64

// Create a new KMeans++ clusterer with 1000 iterations,
// 8 clusters and a distance function of type func([]float64, []float64) float64.
// Pass nil to use clusters.EuclideanDistance
c, e := clusters.KMeans(1000, 8, clusters.EuclideanDistance)
if e != nil {
	panic(e)
}

// Use the data to train the clusterer
if e = c.Learn(data); e != nil {
	panic(e)
}

fmt.Printf("Cluster sizes: %v\n", c.Sizes())

fmt.Printf("Assigned observation %v to cluster %d\n", observation, c.Predict(observation))

for index, number := range c.Guesses() {
	fmt.Printf("Assigned data point %v to cluster %d\n", data[index], number)
}
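
Because the distance argument is described above as a plain function of type func([]float64, []float64) float64, a custom metric could in principle be supplied instead of clusters.EuclideanDistance. The following is a minimal sketch of such a function, assuming Manhattan (L1) distance is wanted. The name manhattanDistance is hypothetical and not part of the library.

```go
package main

import (
	"fmt"
	"math"
)

// manhattanDistance is a hypothetical custom metric (not part of the
// clusters library) with the signature func([]float64, []float64) float64.
// It assumes both slices have the same length.
func manhattanDistance(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum
}

func main() {
	fmt.Println(manhattanDistance([]float64{0, 0}, []float64{3, 4})) // prints 7
}
```

Under that assumption it would be passed as the third argument, as in clusters.KMeans(1000, 8, manhattanDistance). Whether a non-Euclidean metric gives sensible clusters depends on the algorithm and the data.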

Algorithms currently supported are KMeans++, DBSCAN and OPTICS.

Algorithms that support online learning can be trained incrementally using the Online() function, which relies on channel communication to coordinate the process:

c, e := clusters.KMeans(1000, 8, clusters.EuclideanDistance)
if e != nil {
	panic(e)
}

c = c.WithOnline(clusters.Online{
	Alpha:     0.5,
	Dimension: 4,
})

var (
	send   = make(chan []float64)
	finish = make(chan struct{})
)

events := c.Online(send, finish)

// Print every classification event emitted by the online clusterer
go func() {
	for event := range events {
		fmt.Printf("Classified observation %v into cluster: %d\n", event.Observation, event.Cluster)
	}
}()

for i := 0; i < 10000; i++ {
	point := make([]float64, 4)
	for j := 0; j < 4; j++ {
		point[j] = 10 * (rand.Float64() - 0.5)
	}
	send <- point
}

finish <- struct{}{}

fmt.Printf("Cluster sizes: %v\n", c.Sizes())
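
To illustrate what an online update with a rate such as Alpha might do, here is a self-contained sketch of a common online k-means step: find the nearest centroid and move it a fraction alpha of the way toward the new observation. This is an assumption about the general technique, not the library's actual implementation. The functions nearest and update are hypothetical names.

```go
package main

import "fmt"

// nearest returns the index of the centroid closest to x,
// measured by squared Euclidean distance.
func nearest(centroids [][]float64, x []float64) int {
	best, bestDist := 0, -1.0
	for i, c := range centroids {
		var d float64
		for j := range x {
			diff := x[j] - c[j]
			d += diff * diff
		}
		if bestDist < 0 || d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

// update moves the winning centroid a fraction alpha of the way toward x
// and returns the index of that centroid. Hypothetical helper: the
// library's own update rule may differ.
func update(centroids [][]float64, x []float64, alpha float64) int {
	k := nearest(centroids, x)
	for j := range x {
		centroids[k][j] += alpha * (x[j] - centroids[k][j])
	}
	return k
}

func main() {
	centroids := [][]float64{{0, 0}, {10, 10}}
	k := update(centroids, []float64{2, 2}, 0.5)
	fmt.Println(k, centroids[k]) // prints 0 [1 1]
}
```

With alpha at 0.5 each observation pulls its centroid halfway toward itself. Smaller values make the centroids more stable but slower to adapt.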

The Estimator interface defines an operation for estimating the optimal number of clusters in a dataset. As of now, KMeansEstimator is implemented using the gap statistic with k-means++ as the clustering algorithm (see https://web.stanford.edu/~hastie/Papers/gap.pdf):

var data [][]float64

// Create a new KMeans++ estimator with 1000 iterations, 
// a maximum of 8 clusters and default (EuclideanDistance) distance measurement
c, e := clusters.KMeansEstimator(1000, 8, clusters.EuclideanDistance)
if e != nil {
	panic(e)
}

r, e := c.Estimate(data)
if e != nil {
	panic(e)
}

fmt.Printf("Estimated number of clusters: %d\n", r)
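
For readers unfamiliar with the gap statistic, the sketch below shows its two core quantities for one value of k: the within-cluster dispersion W_k, and the gap, which is the mean of log(W_k) over reference data sets minus log(W_k) for the observed data. It assumes squared Euclidean distance and takes the cluster labels and reference dispersions as given. The functions dispersion and gap are hypothetical and do not reflect how KMeansEstimator is written.

```go
package main

import (
	"fmt"
	"math"
)

// dispersion returns W_k, the pooled within-cluster sum of squared
// Euclidean distances from each point to its cluster mean.
// labels[i] is the cluster index of data[i], in the range [0, k).
func dispersion(data [][]float64, labels []int, k int) float64 {
	dim := len(data[0])
	means := make([][]float64, k)
	counts := make([]int, k)
	for c := range means {
		means[c] = make([]float64, dim)
	}
	for i, p := range data {
		counts[labels[i]]++
		for j, v := range p {
			means[labels[i]][j] += v
		}
	}
	for c := range means {
		if counts[c] == 0 {
			continue // empty cluster contributes nothing
		}
		for j := range means[c] {
			means[c][j] /= float64(counts[c])
		}
	}
	var w float64
	for i, p := range data {
		for j, v := range p {
			d := v - means[labels[i]][j]
			w += d * d
		}
	}
	return w
}

// gap returns the mean of log(W_k) over the reference data sets
// minus log(W_k) of the observed data.
func gap(refW []float64, w float64) float64 {
	var sum float64
	for _, r := range refW {
		sum += math.Log(r)
	}
	return sum/float64(len(refW)) - math.Log(w)
}

func main() {
	data := [][]float64{{0, 0}, {0, 2}, {10, 0}, {10, 2}}
	labels := []int{0, 0, 1, 1}
	fmt.Println(dispersion(data, labels, 2)) // prints 4
}
```

In the referenced paper the reference dispersions come from clustering data sets drawn from a null distribution, and the estimate is the smallest k whose gap is at least the gap at k+1 minus a standard-error term.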

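The library also provides an Importer to load data from a file (as of now the CSV importer is implemented):

// i is a value implementing the Importer interface (here, the CSV importer).
// Import the first three columns (indices 0 through 2) from data.csv
d, e := i.Import("data.csv", 0, 2)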
if e != nil {
	panic(e)
}

Development

The project goals include:

Implement commonly used hard clustering algorithms
Implement commonly used soft clustering algorithms
Devise reliable tests of performance and quality of each algorithm

Benchmarks

Soon to come.

Licence

MIT
