

DwangoMediaVillage / Pqkmeans

Licence: MIT

Fast and memory-efficient clustering

Labels

jupyter-notebook computer-vision scikit-learn clustering

Projects that are alternatives to, or similar to, Pqkmeans

Machine Learning With Python

Practice and tutorial-style notebooks covering wide variety of machine learning techniques

Stars: ✭ 2,197 (+1062.43%)

Mutual labels: jupyter-notebook, scikit-learn, clustering

Python Clustering Exercises

Jupyter Notebook exercises for k-means clustering with Python 3 and scikit-learn

Stars: ✭ 153 (-19.05%)

Mutual labels: jupyter-notebook, scikit-learn, clustering

Text Analytics With Python

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.

Stars: ✭ 1,132 (+498.94%)

Mutual labels: jupyter-notebook, scikit-learn, clustering

Practical Machine Learning With Python

Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.

Stars: ✭ 1,868 (+888.36%)

Mutual labels: jupyter-notebook, scikit-learn, clustering

Dat8

General Assembly's 2015 Data Science course in Washington, DC

Stars: ✭ 1,516 (+702.12%)

Mutual labels: jupyter-notebook, scikit-learn, clustering

Ml Forex Prediction

Predicting Forex Future Price with Machine Learning

Stars: ✭ 142 (-24.87%)

Mutual labels: jupyter-notebook, scikit-learn

Python Machine Learning Book

The "Python Machine Learning (1st edition)" book code repository and info resource

Stars: ✭ 11,428 (+5946.56%)

Mutual labels: jupyter-notebook, scikit-learn

Hands On Machine Learning With Scikit Learn Keras And Tensorflow

Notes & exercise solutions of Part I from the book: "Hands-On ML with Scikit-Learn, Keras & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" by Aurelien Geron

Stars: ✭ 151 (-20.11%)

Mutual labels: jupyter-notebook, scikit-learn

Qlik Py Tools

Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).

Stars: ✭ 135 (-28.57%)

Mutual labels: scikit-learn, clustering

Hdbscan

A high performance implementation of HDBSCAN clustering.

Stars: ✭ 2,032 (+975.13%)

Mutual labels: jupyter-notebook, clustering

Cheatsheets.pdf

📚 Various cheatsheets in PDF

Stars: ✭ 159 (-15.87%)

Mutual labels: jupyter-notebook, scikit-learn

Py4chemoinformatics

Python for chemoinformatics

Stars: ✭ 140 (-25.93%)

Mutual labels: jupyter-notebook, scikit-learn

Python Machine Learning Book 3rd Edition

The "Python Machine Learning (3rd edition)" book code repository

Stars: ✭ 2,883 (+1425.4%)

Mutual labels: jupyter-notebook, scikit-learn

Ml Workspace

🛠 All-in-one web-based IDE specialized for machine learning and data science.

Stars: ✭ 2,337 (+1136.51%)

Mutual labels: jupyter-notebook, scikit-learn

Interactive machine learning

IPython widgets, interactive plots, interactive machine learning

Stars: ✭ 140 (-25.93%)

Mutual labels: jupyter-notebook, scikit-learn

Machine Learning And Reinforcement Learning In Finance

Machine Learning and Reinforcement Learning in Finance New York University Tandon School of Engineering

Stars: ✭ 173 (-8.47%)

Mutual labels: jupyter-notebook, scikit-learn

Bert Sklearn

a sklearn wrapper for Google's BERT model

Stars: ✭ 182 (-3.7%)

Mutual labels: jupyter-notebook, scikit-learn

Clustergrammer

An interactive heatmap visualization built using D3.js

Stars: ✭ 188 (-0.53%)

Mutual labels: jupyter-notebook, clustering

Hep ml

Machine Learning for High Energy Physics.

Stars: ✭ 133 (-29.63%)

Mutual labels: jupyter-notebook, scikit-learn

Machine Learning Projects

This repository consists of all my Machine Learning Projects.

Stars: ✭ 135 (-28.57%)

Mutual labels: jupyter-notebook, clustering


PQk-means

Project | Paper | Tutorial

Figures: "A 2D example using both k-means and PQk-means" and "Large-scale evaluation".

PQk-means [Matsui, Ogaki, Yamasaki, and Aizawa, ACMMM 17] is a Python library for efficient clustering of large-scale data. By first compressing input vectors into short product-quantized (PQ) codes, PQk-means achieves fast and memory-efficient clustering, even for high-dimensional vectors. Similar to k-means, PQk-means repeats the assignment and update steps, both of which can be performed in the PQ-code domain.
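The idea of working in the PQ-code domain can be sketched in a few lines of plain Python. This is a toy illustration only, not the pqkmeans implementation: the helper names (`sq_dist`, `pq_encode`, `pq_distance`) and the tiny hand-written codebooks are hypothetical, and a real encoder learns its codebooks from training data. Each vector is split into sub-vectors, each sub-vector is replaced by the index of its nearest codeword, and the distance between two vectors is then approximated from their codes alone.

```python
# Toy product quantization (PQ). Illustrative only; not the pqkmeans internals.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vec, codebooks):
    """Encode vec as one codeword index per subspace."""
    sub_len = len(vec) // len(codebooks)
    code = []
    for i, codebook in enumerate(codebooks):
        sub = vec[i * sub_len:(i + 1) * sub_len]
        # Pick the index of the nearest codeword in this subspace.
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(sub, codebook[k]))
        code.append(nearest)
    return code

def pq_distance(code_a, code_b, codebooks):
    """Approximate squared distance between two vectors using only their codes."""
    return sum(sq_dist(cb[a], cb[b]) for a, b, cb in zip(code_a, code_b, codebooks))

# Hypothetical codebooks: 2 subspaces, 2 codewords each, 2 dimensions per subspace.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

code_x = pq_encode([0.1, 0.2, 0.9, 0.1], codebooks)
code_y = pq_encode([0.9, 0.8, 0.1, 0.9], codebooks)
print(code_x, code_y, pq_distance(code_x, code_y, codebooks))
```

Because the distance depends only on pairs of codeword indices, the per-subspace codeword distances can be precomputed once and looked up, which is what makes clustering on codes cheap.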

For comparison, we also provide ITQ encoding for binary conversion and Binary k-means [Gong+, CVPR 15] for clustering binary codes.
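As a rough picture of what clustering binary codes involves, the sketch below runs one assignment step and one update step on 4-bit codes stored as Python integers. It is a generic illustration under stated assumptions, not the library's Bk-means: the function names are hypothetical, distances are Hamming distances, and the per-bit majority-vote update is an assumption chosen for simplicity.

```python
# Generic k-means-style iteration on binary codes. Illustrative only.

def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")

def assign(codes, centers):
    """Assign each code to the index of its nearest center (Hamming distance)."""
    return [min(range(len(centers)), key=lambda j: hamming(c, centers[j]))
            for c in codes]

def update(codes, labels, k, num_bit):
    """Recompute each center by a per-bit majority vote over its members."""
    centers = []
    for j in range(k):
        members = [c for c, label in zip(codes, labels) if label == j]
        center = 0
        for bit in range(num_bit):
            ones = sum((c >> bit) & 1 for c in members)
            if 2 * ones > len(members):  # strict majority sets the bit
                center |= 1 << bit
        centers.append(center)
    return centers

codes = [0b0000, 0b0001, 0b0010, 0b1111, 0b1110, 0b1101]
centers = [0b0000, 0b1111]

labels = assign(codes, centers)
new_centers = update(codes, labels, k=2, num_bit=4)
print(labels, [format(c, "04b") for c in new_centers])
```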

The main algorithm is written in C++, with wrappers for Python. All encoding and clustering classes are compatible with scikit-learn.

Summary of features

Approximation of k-means
Tens to hundreds of times faster than k-means
Tens to hundreds of times more memory efficient than k-means
Compatible with scikit-learn
Portable; one-line installation

Installation

Requisites

CMake
- brew install cmake for OS X
- sudo apt install cmake for Ubuntu
OpenMP (Optional)
- If OpenMP is installed, it is used automatically to parallelize the algorithm for faster computation.

Build & install

You can install the library from PyPI:

pip install pqkmeans

Or, to use the current master version, build and install the library manually:

git clone --recursive https://github.com/DwangoMediaVillage/pqkmeans.git
cd pqkmeans
python setup.py install

Run samples

# with artificial data
python bin/run_experiment.py --dataset artificial --algorithm bkmeans pqkmeans --k 100
# with texmex dataset (http://corpus-texmex.irisa.fr/)
python bin/run_experiment.py --dataset siftsmall --algorithm bkmeans pqkmeans --k 100

Test

python setup.py test

Usage

For PQk-means

import pqkmeans
import numpy as np
X = np.random.random((100000, 128)) # 128 dimensional 100,000 samples

# Train a PQ encoder.
# Each vector is divided into 4 parts and each part is
# encoded with log2(256) = 8 bits, resulting in a 32-bit PQ code.
encoder = pqkmeans.encoder.PQEncoder(num_subdim=4, Ks=256)
encoder.fit(X[:1000])  # Use a subset of X for training

# Convert input vectors to 32-bit PQ codes, where each PQ code consists of four uint8.
# You can train the encoder and convert the input vectors to PQ codes in advance.
X_pqcode = encoder.transform(X)

# Run clustering with k=5 clusters.
kmeans = pqkmeans.clustering.PQKMeans(encoder=encoder, k=5)
clustered = kmeans.fit_predict(X_pqcode)

# Then, clustered[0] is the id of assigned center for the first input PQ code (X_pqcode[0]).
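To put the memory saving in concrete terms, the arithmetic for the example above can be worked through directly. This is a back-of-the-envelope sketch, assuming X is stored as float64 (the dtype NumPy's `np.random.random` returns) and that the PQ codebooks hold 256 codewords for each of the 4 sub-spaces, also as float64; the variable names are illustrative.

```python
# Storage estimate for the usage example: 100,000 vectors of 128 dimensions.
n_samples, dim = 100_000, 128
bytes_per_float = 8        # float64
num_subdim, ks = 4, 256    # PQEncoder settings from the example
bytes_per_subcode = 1      # Ks=256 fits in one uint8

raw_bytes = n_samples * dim * bytes_per_float
code_bytes = n_samples * num_subdim * bytes_per_subcode

# Codebooks: num_subdim tables of ks codewords, each of dim / num_subdim floats.
codebook_bytes = num_subdim * ks * (dim // num_subdim) * bytes_per_float

ratio = raw_bytes / code_bytes
print(f"raw vectors : {raw_bytes / 1e6:.1f} MB")
print(f"PQ codes    : {code_bytes / 1e6:.1f} MB (+ {codebook_bytes / 1e6:.2f} MB codebooks)")
print(f"compression : {ratio:.0f}x (codes only)")
```

The codebook cost is fixed, so it becomes negligible as the number of samples grows.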

Note that an instance of PQ-encoder (encoder) and an instance of clustering (kmeans) can be pickled and reused later.

import pickle

# Save and reload the trained PQ encoder.
with open('encoder.pkl', 'wb') as f:
    pickle.dump(encoder, f)
with open('encoder.pkl', 'rb') as f:
    encoder_dumped = pickle.load(f)

# Save and reload the clustering instance. It can be reused as a vector quantizer later.
with open('kmeans.pkl', 'wb') as f:
    pickle.dump(kmeans, f)
with open('kmeans.pkl', 'rb') as f:
    kmeans_dumped = pickle.load(f)

For Bk-means

Bk-means is used in almost the same manner as PQk-means:

import pqkmeans
import numpy as np
X = np.random.random((100000, 128)) # 128 dimensional 100,000 samples

# Train an ITQ binary encoder
encoder = pqkmeans.encoder.ITQEncoder(num_bit=32)
encoder.fit(X[:1000])  # Use a subset of X for training

# Convert input vectors to binary codes
X_itq = encoder.transform(X)

# Run clustering
kmeans = pqkmeans.clustering.BKMeans(k=5, input_dim=32)
clustered = kmeans.fit_predict(X_itq)

See the tutorial for more examples.

Note

This repository contains a re-implementation of PQk-means with a Python interface. Its behavior may differ from the pure C++ implementation used in the paper.
We tested this library with Python 3 on OS X and Ubuntu 16.04.

Authors

Keisuke Ogaki designed the whole structure of the library, and implemented most of the Bk-means clustering
Yusuke Matsui implemented most of the PQk-means clustering

Reference

@inproceedings{pqkmeans,
    author = {Yusuke Matsui and Keisuke Ogaki and Toshihiko Yamasaki and Kiyoharu Aizawa},
    title = {PQk-means: Billion-scale Clustering for Product-quantized Codes},
    booktitle = {ACM International Conference on Multimedia (ACMMM)},
    year = {2017},
}

Todo

Evaluation script for billion-scale data
Nearest neighbor search with PQTable
Documentation


Stars: ✭ 189
