k-means clustering algorithm with k-means++ initialization.

kmpp

When dealing with many data points, clustering algorithms can be used to group them. The k-means algorithm partitions n data points into k clusters and finds the centroids of these clusters iteratively.

In each iteration, the algorithm assigns every data point to the cluster with the closest centroid and then recalculates the centroid of each cluster. These two steps are repeated until the centroids no longer change.
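The two alternating steps can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the code used inside kmpp: the helper names (squaredDistance, assignPoints, updateCentroids, lloyd) are hypothetical, points must be a non-empty array of equal-length numeric arrays, and the loop stops when the assignments stop changing, which implies that the centroids no longer change either.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Assignment step: index of the closest centroid for every point.
function assignPoints(points, centroids) {
  return points.map((p) => {
    let best = 0;
    let bestDist = Infinity;
    centroids.forEach((c, j) => {
      const d = squaredDistance(p, c);
      if (d < bestDist) {
        bestDist = d;
        best = j;
      }
    });
    return best;
  });
}

// Update step: each centroid becomes the mean of its assigned points.
// A centroid without any assigned point keeps its previous position.
function updateCentroids(points, assignments, centroids) {
  const dim = points[0].length;
  const sums = centroids.map(() => new Array(dim).fill(0));
  const counts = centroids.map(() => 0);
  points.forEach((p, i) => {
    const j = assignments[i];
    counts[j]++;
    for (let d = 0; d < dim; d++) sums[j][d] += p[d];
  });
  return sums.map((s, j) =>
    counts[j] === 0 ? centroids[j].slice() : s.map((v) => v / counts[j])
  );
}

// Repeat both steps until the assignments are stable or maxIterations is reached.
function lloyd(points, initialCentroids, maxIterations = 100) {
  let centroids = initialCentroids.map((c) => c.slice());
  let assignments = [];
  for (let iter = 1; iter <= maxIterations; iter++) {
    const next = assignPoints(points, centroids);
    const unchanged =
      next.length === assignments.length &&
      next.every((a, i) => a === assignments[i]);
    assignments = next;
    if (unchanged) {
      return { converged: true, centroids, assignments, iterations: iter };
    }
    centroids = updateCentroids(points, assignments, centroids);
  }
  return { converged: false, centroids, assignments, iterations: maxIterations };
}
```

The returned object mirrors the field names documented in the API section, but only for readability; the exact values kmpp reports (for example how it counts iterations) may differ.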

The basic k-means algorithm is initialized with k centroids at random positions. This implementation uses the k-means++ algorithm to address some disadvantages of that arbitrary initialization (see "Further reading" at the end).
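For illustration, the standard k-means++ seeding rule can be sketched as follows: the first centroid is picked uniformly at random from the points, and each further centroid is picked with probability proportional to its squared distance from the nearest centroid chosen so far. This is a sketch of the textbook procedure, not necessarily what kmpp does internally. The function name seedCentroids and the injectable random parameter are assumptions made here so the example can be run deterministically.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// k-means++ seeding. `random` must return numbers in [0, 1).
function seedCentroids(points, k, random = Math.random) {
  // First centroid: uniformly random data point.
  const centroids = [points[Math.floor(random() * points.length)].slice()];
  // minDist[i]: squared distance from point i to its nearest chosen centroid.
  const minDist = points.map((p) => squaredDistance(p, centroids[0]));

  while (centroids.length < k) {
    const total = minDist.reduce((s, d) => s + d, 0);
    let index = 0;
    if (total === 0) {
      // Every point coincides with a chosen centroid: fall back to uniform choice.
      index = Math.floor(random() * points.length);
    } else {
      // Weighted choice: walk the cumulative weights until the target is passed.
      let target = random() * total;
      for (index = 0; index < points.length - 1; index++) {
        target -= minDist[index];
        if (target < 0) break;
      }
    }
    const chosen = points[index].slice();
    centroids.push(chosen);
    // Refresh the nearest-centroid distances with the new centroid.
    points.forEach((p, i) => {
      minDist[i] = Math.min(minDist[i], squaredDistance(p, chosen));
    });
  }
  return centroids;
}
```

Because distant points carry more weight, the initial centroids tend to be spread out, which is the property that makes this seeding less sensitive than purely random placement.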

Installation

Installing via npm

Install kmpp as a Node.js module via npm:

$ npm install kmpp

Example

var kmpp = require('kmpp');

kmpp([
  [x1, y1, ...],
  [x2, y2, ...],
  [x3, y3, ...],
  ...
], {
  k: 3
});

// =>
// { converged: true,
//   centroids: [[xm1, ym1, ...], [xm2, ym2, ...], [xm3, ym3, ...]],
//   counts: [ 7, 6, 7 ],
//   assignments: [ 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 ]
// }

API

kmpp(points[, opts])

Executes the k-means++ algorithm on points.

Arguments:

  • points (Array): An array-of-arrays containing the points in format [[x1, y1, ...], [x2, y2, ...], [x3, y3, ...], ...]
  • opts (Object): An object containing configuration parameters. Parameters are:
    • distance (function): Optional function that takes two points and returns the distance between them.
    • initialize (Boolean): Perform initialization. If false, uses the initial state provided in centroids and assignments. Otherwise discards any initial state and performs initialization.
    • k (Number): number of centroids. If not provided, sqrt(n / 2) is used, where n is the number of points.
    • kmpp (Boolean, default: true): If true, uses k-means++ initialization. Otherwise uses naive random assignment.
    • maxIterations (Number, default: 100): Maximum allowed number of iterations.
    • norm (Number, default: 2): L-norm used for distance computation. 1 is Manhattan norm, 2 is Euclidean norm. Ignored if distance function is provided.
    • centroids (Array): An array of centroids. If initialize is false, used as initialization for the algorithm, otherwise overwritten in-place if of the correct size.
    • assignments (Array): An array of assignments. If initialize is false, used as initialization for the algorithm, otherwise overwritten.
    • counts (Array): An output array used to avoid extra allocation. Values are discarded and overwritten.
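Two of the options above rest on small formulas that are easy to check by hand. The sketch below shows the sqrt(n / 2) rule for k and an L-norm distance for the norm values 1 and 2. Both helper names (defaultK, lpDistance) are hypothetical, and rounding k to an integer of at least 1 is an assumption; the option list does not say how kmpp rounds.

```javascript
// Default number of centroids from the option list: sqrt(n / 2).
// Rounding to the nearest integer (minimum 1) is an assumption of this sketch.
function defaultK(n) {
  return Math.max(1, Math.round(Math.sqrt(n / 2)));
}

// L-p distance between two points of equal dimension.
// p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
function lpDistance(a, b, p = 2) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += Math.pow(Math.abs(a[i] - b[i]), p);
  }
  return Math.pow(sum, 1 / p);
}
```

For 200 points the rule yields k = 10. A function like lpDistance could presumably be supplied through the distance option, for example { k: 3, distance: (a, b) => lpDistance(a, b, 1) }, although this README does not state whether kmpp expects a plain or a squared distance there.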

Returns an object containing information about the centroids and point assignments. Values are:

  • converged: true if the algorithm converged successfully
  • centroids: a list of centroids
  • counts: the number of points assigned to each respective centroid
  • assignments: a list of integer assignments of each point to the respective centroid
  • iterations: number of iterations used
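Since assignments holds one centroid index per input point, the points of each cluster can be collected with a short helper. The following is a sketch that relies only on the documented centroids and assignments fields; the function name groupByCluster is hypothetical and not part of kmpp.

```javascript
// Group the input points by the cluster index stored in result.assignments.
// `result` is expected to have the documented shape { centroids, assignments, ... }.
function groupByCluster(points, result) {
  const clusters = result.centroids.map(() => []);
  result.assignments.forEach((clusterIndex, pointIndex) => {
    clusters[clusterIndex].push(points[pointIndex]);
  });
  return clusters;
}
```

The length of each returned group should match the corresponding entry in counts, which gives a simple consistency check on a result.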

Credits

  • Jared Harkins improved performance by reducing the number of function calls and reverting to Manhattan distance for measurements, and improved the random initialization by choosing from the points

  • Ricky Reusser refactored the API

Further reading

License

© 2017-2019. MIT License.