gbuesing / kmeans-clusterer

Licence: MIT license
k-means clustering in Ruby

Projects that are alternatives of or similar to kmeans-clusterer

machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-80.68%)
Mutual labels:  clustering, kmeans-clustering
ParallelKMeans.jl
Parallel & lightning fast implementation of available classic and contemporary variants of the KMeans clustering algorithm
Stars: ✭ 45 (-48.86%)
Mutual labels:  clustering, kmeans-clustering
Clustering-in-Python
Clustering methods in Machine Learning includes both theory and python code of each algorithm. Algorithms include K Mean, K Mode, Hierarchical, DB Scan and Gaussian Mixture Model GMM. Interview questions on clustering are also added in the end.
Stars: ✭ 27 (-69.32%)
Mutual labels:  clustering, kmeans-clustering
k-means-quantization-js
🎨 Apply color quantization to images using k-means clustering.
Stars: ✭ 27 (-69.32%)
Mutual labels:  clustering, kmeans-clustering
tsp-essay
A fun study of some heuristics for the Travelling Salesman Problem.
Stars: ✭ 15 (-82.95%)
Mutual labels:  clustering, kmeans-clustering
swanager
A high-level Docker Services management tool built on top of Swarm
Stars: ✭ 12 (-86.36%)
Mutual labels:  clustering
WatsonCluster
A simple C# class using Watson TCP to enable a one-to-one high availability cluster.
Stars: ✭ 18 (-79.55%)
Mutual labels:  clustering
scSeqR
This package has migrated to https://github.com/rezakj/iCellR please use iCellR instead of scSeqR for more functionalities and updates.
Stars: ✭ 16 (-81.82%)
Mutual labels:  clustering
kohonen-maps
Implementation of SOM and GSOM
Stars: ✭ 62 (-29.55%)
Mutual labels:  clustering
mathematics-statistics-for-data-science
Mathematical & Statistical topics to perform statistical analysis and tests; Linear Regression, Probability Theory, Monte Carlo Simulation, Statistical Sampling, Bootstrapping, Dimensionality reduction techniques (PCA, FA, CCA), Imputation techniques, Statistical Tests (Kolmogorov Smirnov), Robust Estimators (FastMCD) and more in Python and R.
Stars: ✭ 56 (-36.36%)
Mutual labels:  clustering
clustering-python
Different clustering approaches applied on different problemsets
Stars: ✭ 36 (-59.09%)
Mutual labels:  clustering
RcppML
Rcpp Machine Learning: Fast robust NMF, divisive clustering, and more
Stars: ✭ 52 (-40.91%)
Mutual labels:  clustering
snATAC
<<------ Use SnapATAC!!
Stars: ✭ 23 (-73.86%)
Mutual labels:  clustering
Heart disease prediction
Heart Disease prediction using 5 algorithms
Stars: ✭ 43 (-51.14%)
Mutual labels:  clustering
consul role
Ansible role to install Consul (cluster of) server/agent
Stars: ✭ 14 (-84.09%)
Mutual labels:  clustering
syncflux
SyncFlux is an Open Source InfluxDB Data synchronization and replication tool for migration purposes or HA clusters
Stars: ✭ 145 (+64.77%)
Mutual labels:  clustering
EgoSplitting
A NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).
Stars: ✭ 78 (-11.36%)
Mutual labels:  clustering
clusterix
Visual exploration of clustered data.
Stars: ✭ 44 (-50%)
Mutual labels:  clustering
A-quantum-inspired-genetic-algorithm-for-k-means-clustering
Implementation of a Quantum inspired genetic algorithm proposed by A quantum-inspired genetic algorithm for k-means clustering paper.
Stars: ✭ 28 (-68.18%)
Mutual labels:  clustering
Revisiting-Contrastive-SSL
Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations. [NeurIPS 2021]
Stars: ✭ 81 (-7.95%)
Mutual labels:  clustering

KMeansClusterer

k-means clustering in Ruby. Uses NArray under the hood for fast calculations.

Jump to the examples directory to see this in action.

Features

  • Runs multiple clustering attempts and keeps the best result (a single run can get stuck in a non-optimal local minimum)
  • Initializes centroids via the k-means++ algorithm, for faster convergence
  • Calculates silhouette score for evaluation
  • Option to scale data before clustering, so that output isn't biased by different feature scales
  • Works with high-dimensional data
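
To make the k-means++ initialization mentioned above more concrete, here is a minimal plain-Ruby sketch of the seeding step. It is not the gem's own code (the gem uses NArray internally), and the names `squared_distance` and `kmpp_init` are hypothetical helpers introduced only for illustration. The idea is to pick the first centroid uniformly at random, then pick each further centroid with probability proportional to its squared distance from the nearest centroid chosen so far.

```ruby
# Squared Euclidean distance between two points (arrays of numbers).
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Illustrative k-means++ seeding (hypothetical helper, not the gem's API).
# `rng` is injectable so that a run can be reproduced with a fixed seed.
def kmpp_init(data, k, rng: Random.new)
  # First centroid: a uniformly random data point.
  centroids = [data[rng.rand(data.length)]]

  while centroids.length < k
    # Weight of each point: squared distance to its nearest chosen centroid.
    weights = data.map { |p| centroids.map { |c| squared_distance(p, c) }.min }
    total = weights.sum

    # Every point coincides with a centroid: fall back to a uniform choice.
    if total.zero?
      centroids << data[rng.rand(data.length)]
      next
    end

    # Sample an index with probability proportional to its weight.
    target = rng.rand * total
    cumulative = 0.0
    index = weights.each_index.find { |i| (cumulative += weights[i]) > target }
    centroids << data[index || data.length - 1]
  end

  centroids
end

data = [[40.71, -74.01], [34.05, -118.24], [39.29, -76.61],
        [45.52, -122.68], [38.9, -77.04], [36.11, -115.17]]
centroids = kmpp_init(data, 2, rng: Random.new(42))
```

Because points that are far from every existing centroid get a larger weight, the initial centroids tend to be spread out, which is why this seeding usually converges in fewer iterations than purely random seeding.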

Install

gem install kmeans-clusterer

Usage

Simple example:

require 'kmeans-clusterer'

data = [[40.71,-74.01],[34.05,-118.24],[39.29,-76.61],
        [45.52,-122.68],[38.9,-77.04],[36.11,-115.17]]

labels = ['New York', 'Los Angeles', 'Baltimore', 
          'Portland', 'Washington DC', 'Las Vegas']

k = 2 # find 2 clusters in data

kmeans = KMeansClusterer.run k, data, labels: labels, runs: 5

kmeans.clusters.each do |cluster|
  puts  cluster.id.to_s + '. ' + 
        cluster.points.map(&:label).join(", ") + "\t" +
        cluster.centroid.to_s
end

# Use existing clusters for prediction with new data:
predicted = kmeans.predict [[41.85,-87.65]] # Chicago
puts "\nClosest cluster to Chicago: #{predicted[0]}"

# Clustering quality score. Value between -1.0..1.0 (1.0 is best)
puts "\nSilhouette score: #{kmeans.silhouette.round(2)}"

Output of simple example:

0. New York, Baltimore, Washington DC [39.63, -75.89]
1. Los Angeles, Portland, Las Vegas [38.56, -118.7]

Closest cluster to Chicago: 0

Silhouette score: 0.91
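
For readers unfamiliar with the silhouette score, the following plain-Ruby sketch shows the standard definition: for each point, `a` is the mean distance to the other points in its own cluster, `b` is the lowest mean distance to the points of any other cluster, and the point's score is `(b - a) / max(a, b)`. The overall score is the mean over all points. The helper names `euclidean` and `mean_silhouette` are hypothetical, and this sketch assumes Euclidean distance, at least two clusters, and no empty clusters; the gem's own implementation may differ in its details.

```ruby
# Euclidean distance between two points (arrays of numbers).
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Mean silhouette score (hypothetical helper, not the gem's API).
# `clusters` is an array of clusters, each a non-empty array of points.
def mean_silhouette(clusters)
  scores = []
  clusters.each_with_index do |cluster, i|
    cluster.each do |point|
      # By convention, a point alone in its cluster scores 0.
      if cluster.length == 1
        scores << 0.0
        next
      end

      # a: mean distance to the other members of the same cluster
      # (the point's distance to itself is 0, so it adds nothing to the sum).
      a = cluster.sum { |q| euclidean(point, q) } / (cluster.length - 1)

      # b: smallest mean distance to the members of any other cluster.
      b = clusters.each_with_index
                  .reject { |_, idx| idx == i }
                  .map { |other, _| other.sum { |q| euclidean(point, q) } / other.length }
                  .min

      denom = [a, b].max
      scores << (denom.zero? ? 0.0 : (b - a) / denom)
    end
  end
  scores.sum / scores.length
end

good = mean_silhouette([[[0, 0], [0, 1]], [[10, 0], [10, 1]]])
bad  = mean_silhouette([[[0, 0], [10, 0]], [[0, 1], [10, 1]]])
```

In this toy data, `good` groups nearby points together and scores close to 1.0, while `bad` splits nearby points across clusters and scores below zero.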

Options

The following options can be passed in to KMeansClusterer.run:

| option | default | description |
| --- | --- | --- |
| :labels | nil | optional array of Ruby objects to collate with data array |
| :runs | 10 | number of times to run kmeans |
| :log | false | print stats after each run |
| :init | :kmpp | algorithm for picking initial cluster centroids. Accepts :kmpp, :random, or an array of k centroids |
| :scale_data | false | scales features before clustering using formula (data - mean) / std |
| :float_precision | :double | float precision to use. :double or :single |
| :max_iter | 300 | max iterations per run |
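
As an illustration of the `(data - mean) / std` formula used by `:scale_data`, here is a small plain-Ruby sketch that standardizes each feature (column) independently. The helper name `standardize` is hypothetical. The sketch assumes the population standard deviation (dividing by n); the source does not say whether the gem divides by n or n - 1, so treat that detail as an assumption.

```ruby
# Standardize each column of `data` to zero mean and unit variance.
# Hypothetical helper for illustration; not part of the gem's API.
def standardize(data)
  n = data.length.to_f
  columns = data.transpose

  # Per-feature mean and (population) standard deviation.
  means = columns.map { |col| col.sum / n }
  stds = columns.each_with_index.map do |col, j|
    Math.sqrt(col.sum { |x| (x - means[j])**2 } / n)
  end

  data.map do |row|
    row.each_with_index.map do |x, j|
      # A constant feature has std 0; map it to 0.0 instead of dividing by zero.
      stds[j].zero? ? 0.0 : (x - means[j]) / stds[j]
    end
  end
end

raw = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize(raw)
```

Here the second feature is 100 times larger than the first, so it would dominate Euclidean distances if left unscaled. After standardization both columns contain the same values, and each feature contributes equally to the clustering.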