Behrouz-Babaki / Cop Kmeans

Licence: MIT
A Python implementation of the COP-KMEANS algorithm

COP-Kmeans

This is an implementation of the constrained K-means (COP-Kmeans) algorithm introduced by Wagstaff et al. It follows the description of the algorithm presented in [1].

The COP-Kmeans algorithm

The algorithm is implemented as described in [1].

[Figure: pseudocode of the COP-Kmeans algorithm]
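
As a rough illustration of the idea in [1], the sketch below assigns each point to the nearest cluster centre that does not violate a must-link or cannot-link constraint, fails if no such cluster exists, and then recomputes the centres. It is a minimal sketch and not this package's code: the names `violates_constraints`, `assign_points`, `update_centers`, and `cop_kmeans_sketch` are hypothetical, initial centres are supplied by the caller, and refinements that a full implementation may add (such as taking the transitive closure of the constraints) are left out.

```python
import math


def violates_constraints(point, cluster, assignment, must_link, cannot_link):
    """Return True if putting `point` into `cluster` breaks a constraint.

    `assignment` maps already-assigned point indices to cluster indices.
    """
    for a, b in must_link:
        if point in (a, b):
            other = b if point == a else a
            # A must-link partner already sits in a different cluster.
            if other in assignment and assignment[other] != cluster:
                return True
    for a, b in cannot_link:
        if point in (a, b):
            other = b if point == a else a
            # A cannot-link partner already sits in this cluster.
            if other in assignment and assignment[other] == cluster:
                return True
    return False


def assign_points(points, centers, must_link, cannot_link):
    """One constrained assignment pass; None if some point has no feasible cluster."""
    assignment = {}
    for i, p in enumerate(points):
        # Try clusters from the nearest centre to the farthest.
        order = sorted(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for j in order:
            if not violates_constraints(i, j, assignment, must_link, cannot_link):
                assignment[i] = j
                break
        else:
            return None
    return [assignment[i] for i in range(len(points))]


def update_centers(points, labels, centers):
    """Move each centre to the mean of its members; empty clusters keep their centre."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(old)
    return new_centers


def cop_kmeans_sketch(points, centers, must_link, cannot_link, max_iter=100):
    """Alternate constrained assignment and centre updates until labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign_points(points, centers, must_link, cannot_link)
        if new_labels is None:
            return None, None  # no feasible assignment under the constraints
        if new_labels == labels:
            break
        labels = new_labels
        centers = update_centers(points, labels, centers)
    return labels, centers
```

Because points are processed in order, a constraint only takes effect once one of its two points has been assigned, which is why the result can depend on the order of the data.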

Usage

usage: run_ckm.py [-h] [--ofile OFILE] [--n_rep N_REP] [--m_iter M_ITER] [--tol TOL] dfile cfile k

Run COP-Kmeans algorithm

positional arguments:
  dfile            data file
  cfile            constraint file
  k                number of clusters

optional arguments:
  -h, --help       show this help message and exit
  --ofile OFILE    file to store the output
  --n_rep N_REP    number of times to repeat the algorithm
  --m_iter M_ITER  maximum number of iterations of the main loop
  --tol TOL        tolerance for deciding on convergence

To see a run of the algorithm on example data and constraints, run the script runner.sh in the examples directory.
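
For readers who want to wrap the same interface in their own script, the following is a minimal sketch of an `argparse` parser that accepts the arguments listed in the help text above. It is a reconstruction and not the contents of run_ckm.py: the function name `build_parser` is hypothetical, the argument types (`int` for `k`, `n_rep`, and `m_iter`, `float` for `tol`) are assumptions, and defaults are left unset because the help text does not state them.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command line (types are assumed)."""
    parser = argparse.ArgumentParser(
        prog="run_ckm.py", description="Run COP-Kmeans algorithm"
    )
    # Positional arguments, in the documented order.
    parser.add_argument("dfile", help="data file")
    parser.add_argument("cfile", help="constraint file")
    parser.add_argument("k", type=int, help="number of clusters")
    # Optional arguments; None means "not given on the command line".
    parser.add_argument("--ofile", help="file to store the output")
    parser.add_argument("--n_rep", type=int,
                        help="number of times to repeat the algorithm")
    parser.add_argument("--m_iter", type=int,
                        help="maximum number of iterations of the main loop")
    parser.add_argument("--tol", type=float,
                        help="tolerance for deciding on convergence")
    return parser


# The file names below are placeholders, not files shipped with the package.
args = build_parser().parse_args(["data.txt", "constraints.txt", "3", "--n_rep", "5"])
```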

Package install

Run this command to install this package.

% python setup.py install

Here is a simple example of calling the module:

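import numpy
from copkmeans.cop_kmeans import cop_kmeans

# 100 random data points with 500 features each
input_matrix = numpy.random.rand(100, 500)
# pairs of row indices that must end up in the same cluster
must_link = [(0, 10), (0, 20), (0, 30)]
# pairs of row indices that must end up in different clusters
cannot_link = [(1, 10), (2, 10), (3, 10)]
clusters, centers = cop_kmeans(dataset=input_matrix, k=5, ml=must_link, cl=cannot_link)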

The returned variable clusters is a list of integers: clusters[i] is the number of the cluster assigned to the i-th row of the input data.
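
To make that label list easier to work with, the sketch below groups row indices by cluster and checks whether a labelling respects the given constraints. The helper names `members_by_cluster` and `constraints_satisfied` are hypothetical and not part of the package, and the `labels` list is made-up data standing in for the `clusters` value returned above.

```python
from collections import defaultdict


def members_by_cluster(clusters):
    """Map each cluster number to the list of row indices assigned to it."""
    groups = defaultdict(list)
    for index, label in enumerate(clusters):
        groups[label].append(index)
    return dict(groups)


def constraints_satisfied(clusters, must_link, cannot_link):
    """Check that every must-link pair shares a label and no cannot-link pair does."""
    ml_ok = all(clusters[a] == clusters[b] for a, b in must_link)
    cl_ok = all(clusters[a] != clusters[b] for a, b in cannot_link)
    return ml_ok and cl_ok


# Made-up labels for five data points, for illustration only.
labels = [0, 1, 0, 2, 1]
groups = members_by_cluster(labels)
ok = constraints_satisfied(labels, must_link=[(0, 2)], cannot_link=[(0, 1)])
```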

Citing

If you want to cite this implementation, you can use the following bibtex entry (other formats are also available):

@misc{behrouz_babaki_2017_831850,
  author       = {Behrouz Babaki},
  title        = {COP-Kmeans version 1.5},
  month        = jul,
  year         = 2017,
  doi          = {10.5281/zenodo.831850},
  url          = {https://doi.org/10.5281/zenodo.831850}
}

There's more ...

Other implementations

Other types of constraints

There is another version of constrained K-means that handles size constraints [2]. A Python implementation of the algorithm (and its extensions) is available here.

Exact algorithms for constrained clustering

In 2013-14, I was developing an integer linear programming formulation for an instance of the constrained clustering problem. The approach I chose was branch-and-price (also referred to as column generation). In the initialization step of my algorithm, I needed another algorithm that could produce solutions of reasonably good quality very quickly. COP-Kmeans turned out to be exactly what I was looking for. Interested in knowing more about my own work? Go to my homepage, where you can access my paper [3] and the corresponding code.

There is also a body of work on using constraint programming for exact constrained clustering. In particular, [4] is the state of the art in exact constrained clustering.

References

  1. Wagstaff, K., Cardie, C., Rogers, S., & Schrödl, S. (2001, June). Constrained k-means clustering with background knowledge. In ICML (Vol. 1, pp. 577-584).

  2. Bradley, P. S., Bennett, K. P., & Demiriz, A. (2000). Constrained k-means clustering. Microsoft Research, Redmond, 1-8.

  3. Babaki, B., Guns, T., & Nijssen, S. (2014). Constrained clustering using column generation. In Integration of AI and OR Techniques in Constraint Programming (pp. 438-454). Springer International Publishing.

  4. Guns, T., Vrain, C., & Duong, K.-C. (2016). Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering. In 22nd European Conference on Artificial Intelligence.
