varun-suresh / Clustering

Licence: MIT License
Implements "Clustering Millions of Faces by Identity"

Programming languages: Python, HTML

Approximate Rank Order Clustering

This repository contains an implementation of the paper "Clustering Millions of Faces by Identity" by Otto, Wang, and Jain (2016).
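A minimal sketch of the approximate rank-order distance and the merging step, based on our reading of the paper rather than on the code in clustering.py. The function names (`approx_rank_order_distance`, `cluster`), the convention that each neighbour list starts with the face itself, and treating a face that is absent from a top-k list as having rank k are all assumptions. For faces a and b, d(a, b) counts how many of a's neighbours, up to b's position in a's list, are missing from b's top-k list; the symmetric distance adds d(a, b) and d(b, a) and divides by the smaller of the two ranks. Each face is then linked to those top-k neighbours whose distance falls below a threshold, and linked faces are merged transitively (here with a small union-find).

```python
from typing import Dict, List

# Hypothetical input format: face id -> neighbour ids, nearest first, face itself at index 0.
Neighbors = Dict[str, List[str]]


def _rank(neighbor_list: List[str], x: str, k: int) -> int:
    """Position of x among the first k entries, or k if x is absent (an assumption)."""
    top = neighbor_list[:k]
    return top.index(x) if x in top else k


def approx_rank_order_distance(a: str, b: str, neighbors: Neighbors, k: int) -> float:
    """Symmetric approximate rank-order distance D(a, b) for two distinct faces."""
    top_a, top_b = neighbors[a][:k], neighbors[b][:k]
    set_a, set_b = set(top_a), set(top_b)
    rank_ab = _rank(neighbors[a], b, k)  # O_a(b): where b sits in a's list
    rank_ba = _rank(neighbors[b], a, k)  # O_b(a): where a sits in b's list
    # d(a, b): a's neighbours up to and including b's position that are NOT in b's top-k.
    d_ab = sum(1 for x in top_a[: rank_ab + 1] if x not in set_b)
    d_ba = sum(1 for x in top_b[: rank_ba + 1] if x not in set_a)
    # For a != b both ranks are >= 1, because each list starts with its own face.
    return (d_ab + d_ba) / min(rank_ab, rank_ba)


def cluster(neighbors: Neighbors, k: int, threshold: float) -> List[List[str]]:
    """Link faces to top-k neighbours closer than `threshold`, then merge transitively."""
    parent = {x: x for x in neighbors}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, neighbor_list in neighbors.items():
        for b in neighbor_list[1:k]:  # index 0 is a itself
            if approx_rank_order_distance(a, b, neighbors, k) < threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for x in neighbors:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())
```

The distance only checks membership in each face's top-k list instead of using full rankings, which is what keeps the approximation cheap; the union-find makes the transitive merge roughly linear in the number of linked pairs.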

What's in this repository

clustering.py - Contains the implementation of the clustering algorithm.

demo.py - An example that demonstrates usage. To run it, you need to download the LFW (Labeled Faces in the Wild) dataset. For the face vectors, I used the results from Alfred Xiang Wu's Face Verification Experiment. The script also evaluates the clustering on the LFW dataset using evaluation.py.

evaluation.py - Script to calculate pairwise precision and recall, as explained in the paper. (TODO)
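A possible sketch of pairwise precision and recall, assuming the paper's definitions: precision is the fraction of same-cluster pairs that share an identity label, and recall is the fraction of same-identity pairs that end up in the same cluster. The function name and the choice to return 1.0 when there are no pairs to score are hypothetical and not taken from evaluation.py. Counting pairs per group with n(n-1)/2 avoids enumerating every pair of faces.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def _pairs(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def pairwise_precision_recall(
    cluster_ids: Sequence[Hashable], labels: Sequence[Hashable]
) -> Tuple[float, float]:
    """Pairwise precision and recall of a clustering against identity labels."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    # Pairs placed in the same cluster, pairs sharing a label, and pairs satisfying both.
    same_cluster = sum(_pairs(n) for n in Counter(cluster_ids).values())
    same_label = sum(_pairs(n) for n in Counter(labels).values())
    same_both = sum(_pairs(n) for n in Counter(zip(cluster_ids, labels)).values())
    # Hypothetical convention: with nothing to score, report a perfect value.
    precision = same_both / same_cluster if same_cluster else 1.0
    recall = same_both / same_label if same_label else 1.0
    return precision, recall
```

For example, putting four faces in one cluster when only three of them share an identity gives precision 3/6 = 0.5, while recall stays higher because most same-identity pairs are still together.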

server.py - Script to visualize the results.

Setup

You will need cmake for this installation.

Step 1:

Create a new virtual environment (mkvirtualenv and workon are virtualenvwrapper commands) and clone the repository.

mkvirtualenv env-name
workon env-name
git clone https://github.com/varun-suresh/Clustering.git

Step 2:

Install pyflann, the Python bindings for FLANN (Fast Library for Approximate Nearest Neighbors), following its installation instructions.
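pyflann provides approximate nearest-neighbour search, which the clustering presumably relies on to build each face's top-k neighbour list. To show what that step produces without depending on pyflann's API, here is an exact brute-force stand-in: a swapped-in technique that scales quadratically and is only meant for small examples. The function name `top_k_neighbors`, the dictionary input format, and the use of Euclidean distance are assumptions.

```python
import math
from typing import Dict, List, Sequence


def top_k_neighbors(vectors: Dict[str, Sequence[float]], k: int) -> Dict[str, List[str]]:
    """Exact k-nearest-neighbour lists; each list starts with the face itself.

    Brute force (O(n^2 log n)), standing in for FLANN's approximate search.
    """
    names = list(vectors)
    result: Dict[str, List[str]] = {}
    for a in names:
        # Sort by Euclidean distance; on ties put the face itself first, then sort by name.
        ranked = sorted(names, key=lambda b: (math.dist(vectors[a], vectors[b]), b != a, b))
        result[a] = ranked[:k]
    return result
```

An approximate index trades a few misplaced neighbours for much faster queries, which matters once the number of faces grows well beyond LFW's size.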

Step 3:

For the demo, download the LFW data and the face vectors as mentioned above, then run:

cd Clustering
python demo.py --lfw_path path_to_lfw_dir -v vector_file

Results

Visualization

There is a very basic visualization script for examining the clusters. To use it, download the LFW images and store them in the yourpath/Clustering/ directory.

Before you can run the visualization script, you must run the demo script to save the clusters. Alternatively, download the clusters file I have uploaded and visualize those clusters.

python visualize.py --lfw_path lfw/

In your browser, open this link and you should see the clusters.

Screenshots: Clusters Page; Single Cluster

F1 score:

We get an F1 score of 0.88 to 0.90 on the LFW dataset.
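Assuming the reported F1 score is the harmonic mean of the pairwise precision and recall computed by evaluation.py, the three quantities relate as follows, where S_C is the set of unordered face pairs placed in the same cluster and S_L is the set of pairs that share an identity label:

```latex
P = \frac{|S_C \cap S_L|}{|S_C|}, \qquad
R = \frac{|S_C \cap S_L|}{|S_L|}, \qquad
F_1 = \frac{2 P R}{P + R}
```

Because F1 is a harmonic mean, it is pulled toward the smaller of P and R, so a score near 0.9 requires both to be high.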

Contributions

Thanks to Mengyue for looking closely at the drop in precision and correcting the error.

Timing:

Using Python's multiprocessing module, clustering the LFW faces took about 40 seconds on an 8-core machine with 4 processes. (Using all 8 cores does not help much, because some cores are needed for background processes.) The same experiment took 7 seconds on a 20-core machine.

Citations

You should cite the following paper if you use the algorithm.

@ARTICLE{2016arXiv160400989O,
  author        = {{Otto}, C. and {Wang}, D. and {Jain}, A.~K.},
  title         = "{Clustering Millions of Faces by Identity}",
  journal       = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint        = {1604.00989},
  year          = {2016}
}

Face verification experiment

@article{wulight,
  title={A Light CNN for Deep Face Representation with Noisy Labels},
  author={Wu, Xiang and He, Ran and Sun, Zhenan and Tan, Tieniu},
  journal={arXiv preprint arXiv:1511.02683},
  year={2015}
}

If you use this implementation, please consider citing this implementation and code repository.