All Projects → gittar → breathing-k-means

gittar / breathing-k-means

Licence: MIT license
The "breathing k-means" algorithm with datasets and example notebooks

Programming Languages

Jupyter Notebook
11667 projects
HTML
75241 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to breathing-k-means

online-course-recommendation-system
Built on data from Pluralsight's course API fetched results. Works with model trained with K-means unsupervised clustering algorithm.
Stars: ✭ 31 (-58.11%)
Mutual labels:  k-means
ClusterAnalysis.jl
Cluster Algorithms from Scratch with Julia Lang. (K-Means and DBSCAN)
Stars: ✭ 22 (-70.27%)
Mutual labels:  k-means
MoTIS
Mobile(iOS) Text-to-Image search powered by multimodal semantic representation models(e.g., OpenAI's CLIP). Accepted at NAACL 2022.
Stars: ✭ 60 (-18.92%)
Mutual labels:  k-means
Genetic-Algorithm-on-K-Means-Clustering
Implementing Genetic Algorithm on K-Means and compare with K-Means++
Stars: ✭ 37 (-50%)
Mutual labels:  k-means
skmeans
Super fast simple k-means implementation for unidimiensional and multidimensional data.
Stars: ✭ 59 (-20.27%)
Mutual labels:  k-means
topic modelling financial news
Topic modelling on financial news with Natural Language Processing
Stars: ✭ 51 (-31.08%)
Mutual labels:  k-means
R-stats-machine-learning
Misc Statistics and Machine Learning codes in R
Stars: ✭ 33 (-55.41%)
Mutual labels:  k-means
A-quantum-inspired-genetic-algorithm-for-k-means-clustering
Implementation of a Quantum inspired genetic algorithm proposed by A quantum-inspired genetic algorithm for k-means clustering paper.
Stars: ✭ 28 (-62.16%)
Mutual labels:  k-means
MachineLearningSeries
Vídeos e códigos do Universo Discreto ensinando o fundamental de Machine Learning em Python. Para mais detalhes, acompanhar a playlist listada.
Stars: ✭ 20 (-72.97%)
Mutual labels:  k-means
k-means
Code accompanying my blog post on k-means in Python, C++ and CUDA
Stars: ✭ 56 (-24.32%)
Mutual labels:  k-means
pytorch kmeans
Implementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-48.65%)
Mutual labels:  k-means
tf-example-models
TensorFlow-based implementation of (Gaussian) Mixture Model and some other examples.
Stars: ✭ 42 (-43.24%)
Mutual labels:  k-means
text-cluster
🍡 文本聚类 k-means算法及实战
Stars: ✭ 40 (-45.95%)
Mutual labels:  k-means
SparsifiedKMeans
KMeans for big data using preconditioning and sparsification, Matlab implementation. Aka k-means
Stars: ✭ 50 (-32.43%)
Mutual labels:  k-means

The Breathing K-Means Algorithm (with examples)

The Breathing K-Means is an approximation algorithm for the k-means problem that (on average) is better (higher solution quality) and faster (lower CPU time usage) than k-means++.

Preprint: https://arxiv.org/abs/2006.15666 (submitted for publication)

Upon request comparative experiments with the "Hartigan-Wong" algorithm (the default k-means method in R) were made (post-submission and confirming the choice of k-means++ as point of reference).

Typical results for the "Birch1" data set (100000 points drawn from a mixture of 100 circular Gaussians). k=100 Birch1 data set

Can you spot the mistakes? :-)

Installation from pypi

pip install bkmeans

Local installation to run the examples

Clone the repository

git clone https://github.com/gittar/breathing-k-means

Enter the top directory.

cd breathing-k-means

Create the conda environment 'bkm' (or any other name) via

conda env create -n bkm -f environment.yml

Activate the created environment via

conda activate bkm

To run a jupyter notebook with examples, type, e.g.:

jupyter lab notebooks/2D.ipynb

Content

The top level folder contains the following subfolders

  • data/ - data sets used in the notebooks

  • notebooks/ - jupyter notebooks with all examples from the preprint

  • src/

    • bkmeans.py - reference implementation of breathing k-means
  • misc/

    • aux.py - auxiliary functions
    • dataset.py - general class to administer and plot data sets
    • runfunctions.py - wrapper functions used in the notebook

API

The included class BKMeans is subclassed from scikit-learn's KMeans class and has, therefore, the same API. It can be used as a plug-in replacement for scikit-learn's KMeans.

There is one new parameters which can be ignored (left at default) for normal usage:

  • m (breathing depth), default: 5

The parameter m can also be used, however, to generate faster ( 1 < m < 5) or better (m>5) solutions. For details see the preprint.

Example 1: running on simple random data set

Code:

import numpy as np
from bkmeans import BKMeans

# generate random data set
X=np.random.rand(1000,2)

# create BKMeans instance
bkm = BKMeans(n_clusters=100)

# run the algorithm
bkm.fit(X)

# print SSE (inertia in scikit-learn terms)
print(bkm.inertia_)

Output:

1.1775040547902602

Example 2: comparison with k-means++ (multiple runs)

Code:

import numpy as np
from sklearn.cluster import KMeans
from bkmeans import BKMeans

# random 2D data set
X=np.random.rand(1000,2)

# number of centroids
k=100

for i in range(5):
    # kmeans++
    km = KMeans(n_clusters=k)
    km.fit(X)

    # breathing k-means
    bkm = BKMeans(n_clusters=k)
    bkm.fit(X)

    # relative SSE improvement of bkm over km++
    imp = 1 - bkm.inertia_/km.inertia_
    print(f"SSE improvement over k-means++: {imp:.2%}")

Output:

SSE improvement over k-means++: 3.38%
SSE improvement over k-means++: 4.16%
SSE improvement over k-means++: 6.14%
SSE improvement over k-means++: 6.79%
SSE improvement over k-means++: 4.76%

Acknowledgements

Kudos go the scikit-learn team for their excellent sklearn.cluster.KMeans class, also to the developers and maintainers of the other packages used: numpy, scipy, matplotlib, jupyterlab

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].