
matsui528 / pqtable

License: MIT
Fast search algorithm for product-quantized codes via hash-tables

Programming Languages

C++
Python
CMake
Shell
C

Projects that are alternatives of or similar to pqtable

quick-adc
Quick ADC
Stars: ✭ 20 (-58.33%)
Mutual labels:  nearest-neighbor-search, product-quantization
Scanns
A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+295.83%)
Mutual labels:  nearest-neighbor-search
Fast Near Duplicate Image Search
Fast Near-Duplicate Image Search and Delete using pHash, t-SNE and KDTree.
Stars: ✭ 54 (+12.5%)
Mutual labels:  nearest-neighbor-search
Elastiknn
Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
Stars: ✭ 139 (+189.58%)
Mutual labels:  nearest-neighbor-search
Ggnn
GGNN: State of the Art Graph-based GPU Nearest Neighbor Search
Stars: ✭ 63 (+31.25%)
Mutual labels:  nearest-neighbor-search
Pgann
Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.
Stars: ✭ 156 (+225%)
Mutual labels:  nearest-neighbor-search
Falconn
FAst Lookups of Cosine and Other Nearest Neighbors (based on fast locality-sensitive hashing)
Stars: ✭ 919 (+1814.58%)
Mutual labels:  nearest-neighbor-search
pqlite
⚡ A fast embedded library for approximate nearest neighbor search
Stars: ✭ 141 (+193.75%)
Mutual labels:  product-quantization
Tarsoslsh
A Java library implementing practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. It implements Locality-sensitive Hashing (LSH) and multi index hashing for hamming space.
Stars: ✭ 179 (+272.92%)
Mutual labels:  nearest-neighbor-search
Knn Matting
Source Code for KNN Matting, CVPR 2012 / TPAMI 2013. MATLAB code ready to run. Simple and robust implementation under 40 lines.
Stars: ✭ 130 (+170.83%)
Mutual labels:  nearest-neighbor-search
Neighbor
Nearest neighbor search for Rails and Postgres
Stars: ✭ 114 (+137.5%)
Mutual labels:  nearest-neighbor-search
Annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Stars: ✭ 9,262 (+19195.83%)
Mutual labels:  nearest-neighbor-search
Vald
Vald. A Highly Scalable Distributed Vector Search Engine
Stars: ✭ 158 (+229.17%)
Mutual labels:  nearest-neighbor-search
Awesome Cbir Papers
📝Awesome and classical image retrieval papers
Stars: ✭ 1,114 (+2220.83%)
Mutual labels:  nearest-neighbor-search
Mrpt
Fast and lightweight header-only C++ library (with Python bindings) for approximate nearest neighbor search
Stars: ✭ 210 (+337.5%)
Mutual labels:  nearest-neighbor-search
Deep Mihash
Code for papers "Hashing with Mutual Information" (TPAMI 2019) and "Hashing with Binary Matrix Pursuit" (ECCV 2018)
Stars: ✭ 13 (-72.92%)
Mutual labels:  nearest-neighbor-search
Nanopq
Pure python implementation of product quantization for nearest neighbor search
Stars: ✭ 145 (+202.08%)
Mutual labels:  nearest-neighbor-search
scikit-hubness
A Python package for hubness analysis and high-dimensional data mining
Stars: ✭ 41 (-14.58%)
Mutual labels:  nearest-neighbor-search
Rii
Fast and memory-efficient ANN with a subset-search functionality
Stars: ✭ 96 (+100%)
Mutual labels:  nearest-neighbor-search
Faiss tips
Some useful tips for faiss
Stars: ✭ 170 (+254.17%)
Mutual labels:  nearest-neighbor-search

PQTable

Product Quantization Table (PQTable) is a fast nearest neighbor search method for product-quantized codes that uses hash tables. PQTable achieves one of the fastest search speeds on a single CPU to date (2017) with highly efficient memory usage: 0.059 ms per query over 10^9 data points using only 5.5 GB of memory.

References:

  • Project page: http://yusukematsui.me/project/pqtable/pqtable.html

  • Y. Matsui, T. Yamasaki, and K. Aizawa, "PQTable: Nonexhaustive Fast Search for Product-Quantized Codes Using Hash Tables", IEEE Transactions on Multimedia 2018. [paper] (extended version of ICCV paper)

  • Y. Matsui, T. Yamasaki, and K. Aizawa, "PQTable: Fast Exact Asymmetric Distance Neighbor Search for Product Quantization using Hash Tables", ICCV 2015. [paper][supplementary]

Building

Requisites:

  • C++11
  • CMake
  • OpenCV 3.X
  • TCMalloc (optional, but strongly recommended). On Ubuntu, you can install TCMalloc with sudo apt-get install libgoogle-perftools-dev, then link it via the "-ltcmalloc" linker flag. This makes the search roughly 2x faster.

Build as usual with CMake:

$ git clone https://github.com/matsui528/pqtable.git
$ cd pqtable
$ mkdir build && cd build && cmake ..
$ make 

Testing

Demo using the siftsmall dataset

You can try a small demo using the siftsmall data. It does not take long. First, cd to the top directory of this project, then download the siftsmall vectors into data/.

$ bash scripts/download_siftsmall.sh 

Go to the bin directory and run the demo:

$ cd build/bin
$ ./demo_siftsmall

The program trains a product quantizer, encodes the vectors, builds a PQTable, and runs the search (a conceptual sketch of this pipeline follows the output below). The final results will be something like:

...
93th query: nearest_id=6950, dist=49826.5
94th query: nearest_id=4053, dist=59649.2
95th query: nearest_id=7271, dist=76137.4
96th query: nearest_id=7512, dist=71723.1
97th query: nearest_id=7927, dist=68105
98th query: nearest_id=6289, dist=15346.3
99th query: nearest_id=8082, dist=54443.8
0.113859 [msec/query]

Note that these results (the nearest_ids and the distances) might be slightly different from yours because the training step includes a random process.
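
For intuition, here is a self-contained sketch of the two core operations in this pipeline: PQ encoding and asymmetric distance computation (ADC). All names, sizes, and the zero-initialized codewords are assumptions made for illustration; this is not PQTable's actual API.

// Conceptual sketch of product quantization (PQ) encoding and asymmetric
// distance computation (ADC), the two operations the demo pipeline is built
// on. All names, sizes, and the zero-initialized codewords are illustrative;
// this is not PQTable's actual API.
#include <array>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <limits>
#include <vector>

constexpr int D  = 128;     // vector dimensionality (e.g., SIFT)
constexpr int M  = 4;       // number of subspaces -> 4-byte codes
constexpr int Ds = D / M;   // dimensions per subspace
constexpr int Ks = 256;     // centroids per subspace (one byte per sub-code)

// codewords[m][k] is the k-th centroid of the m-th subspace (length Ds).
using Codewords = std::vector<std::array<std::array<float, Ds>, Ks>>;

// Encode a D-dimensional vector into M bytes: nearest centroid per subspace.
std::array<uint8_t, M> encode(const Codewords &cw, const std::vector<float> &x) {
    std::array<uint8_t, M> code{};
    for (int m = 0; m < M; ++m) {
        float best = std::numeric_limits<float>::max();
        for (int k = 0; k < Ks; ++k) {
            float d = 0.0f;
            for (int j = 0; j < Ds; ++j) {
                const float diff = x[m * Ds + j] - cw[m][k][j];
                d += diff * diff;
            }
            if (d < best) { best = d; code[m] = static_cast<uint8_t>(k); }
        }
    }
    return code;
}

// Asymmetric distance between a raw query and an encoded vector. In practice
// one precomputes an M x Ks lookup table of query-to-centroid distances so
// that the distance to any code costs only M table lookups.
float adc(const Codewords &cw, const std::vector<float> &q,
          const std::array<uint8_t, M> &code) {
    float dist = 0.0f;
    for (int m = 0; m < M; ++m) {
        const auto &c = cw[m][code[m]];
        for (int j = 0; j < Ds; ++j) {
            const float diff = q[m * Ds + j] - c[j];
            dist += diff * diff;
        }
    }
    return dist;
}

int main() {
    Codewords cw(M);  // all-zero centroids, just for illustration
    std::vector<float> x(D), q(D);
    for (int i = 0; i < D; ++i) { x[i] = float(std::rand() % 100); q[i] = float(std::rand() % 100); }
    const auto code = encode(cw, x);
    std::cout << "ADC distance to query: " << adc(cw, q, code) << std::endl;
    return 0;
}

In the real pipeline the codewords are learned by k-means on training vectors (whose random initialization is why your nearest_ids and distances may differ), and the search is accelerated by the hash-table mechanism described in the papers instead of scanning every code.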

Demo using the sift1b dataset

You can try a large-scale demo using the sift1b data. This demo reproduces the results of Table 3 in the paper. First, cd to the top directory of this project, then download the sift1b vectors into data/.

$ bash scripts/download_sift1b.sh 

This would take several hours, and since the data is large, about 250 GB of disk space is required. Next, go to the bin directory and run the program that trains a product quantizer:

$ cd build/bin
$ ./demo_sift1b_train

This also takes several hours. If you do not care about the quality of the product quantizer and just want to try the demo, change top_n in demo_sift1b_train.cpp to a small number such as 10000 and compile it again. After the training finishes, the trained file codewords.txt is created in build/bin. Next, encode the input vectors into PQ codes.

$ ./demo_sift1b_encode

This would take one or two hours. It produces codes.bin, the set of PQ codes for the base vectors of the sift1b data. With M=4 each code is 4 bytes, so the 10^9 codes occupy 4 GB. Then, build a PQTable using codes.bin.

$ ./demo_sift1b_build_table

A directory named pqtable will be created, containing the codewords and the table itself.
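
Roughly speaking, the table is a hash table keyed by the PQ codes themselves: database vectors with identical codes share a bucket of IDs, and a query is answered by probing the buckets of candidate codes close to the query. The sketch below illustrates only this idea; the type names, the packed-key layout, and the build function are assumptions made for this sketch and do not reflect the library's actual classes or on-disk format.

// Illustrative sketch of a PQ-code hash table (not PQTable's actual classes
// or file format). Each database vector is represented by an M-byte PQ code;
// vectors with identical codes are grouped into one bucket keyed by the
// packed code.
#include <array>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr int M = 4;  // 4-byte codes, as in the sift1b demo

using Code = std::array<uint8_t, M>;

// Pack an M-byte code into a single 32-bit key.
uint32_t pack(const Code &c) {
    uint32_t key = 0;
    for (int m = 0; m < M; ++m) key = (key << 8) | c[m];
    return key;
}

// key -> IDs of all database vectors that share that PQ code.
using Table = std::unordered_map<uint32_t, std::vector<uint64_t>>;

Table build_table(const std::vector<Code> &codes) {
    Table table;
    for (uint64_t id = 0; id < codes.size(); ++id) {
        table[pack(codes[id])].push_back(id);
    }
    return table;
}

At query time, candidate codes are generated in increasing distance from the query (the multi-sequence step described in the papers) and their buckets are probed until enough neighbors are collected. Using the files in the pqtable directory, you can run the search: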

$ ./demo_sift1b_search

You will have a result like this:

top_k: 1
0.0579275 [msec/query] 

This is the runtime per query for the sift1b data. You will also get score.txt, which contains the retrieved IDs. You can check the recall rate using the evaluation script:

$ cd ../..
$ python scripts/eval.py build/bin/score.txt data/gnd/idx_1000M.ivecs

This script evaluates the search result against the ground-truth annotation. The result will be:

Recall@1: 0.002

Note that you can obtain top-k results by passing an argument to the search program, e.g.,

$ ./demo_sift1b_search 100   # This creates top-100 results on score.txt
$ cd ../..
$ python scripts/eval.py build/bin/score.txt data/gnd/idx_1000M.ivecs

Then you will see Recall@1, 2, 5, ..., 100.
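
For reference, Recall@k here is the usual metric for this benchmark: the fraction of queries whose true nearest neighbor (taken from the ground-truth file) appears among the top-k retrieved IDs. The snippet below is a minimal, stand-alone illustration of that definition; it is not the implementation used by scripts/eval.py.

// Minimal illustration of Recall@k: the fraction of queries whose true
// nearest neighbor appears among the top-k retrieved IDs. This is only a
// conceptual stand-in, not the logic of scripts/eval.py.
#include <cstddef>
#include <iostream>
#include <vector>

double recall_at_k(const std::vector<std::vector<int>> &retrieved,   // per query: ranked IDs
                   const std::vector<int> &true_neighbor,            // per query: true NN ID
                   std::size_t k) {
    std::size_t hits = 0;
    for (std::size_t q = 0; q < retrieved.size(); ++q) {
        for (std::size_t i = 0; i < retrieved[q].size() && i < k; ++i) {
            if (retrieved[q][i] == true_neighbor[q]) { ++hits; break; }
        }
    }
    return static_cast<double>(hits) / static_cast<double>(retrieved.size());
}

int main() {
    // Two toy queries: the first finds its true neighbor (3) at rank 2,
    // the second never finds its true neighbor (1).
    std::vector<std::vector<int>> retrieved = {{7, 3, 9}, {4, 8, 2}};
    std::vector<int> true_neighbor = {3, 1};
    std::cout << "Recall@2: " << recall_at_k(retrieved, true_neighbor, 2) << std::endl;  // 0.5
    return 0;
}

With this definition, a larger k can only increase the recall, which is why the script reports a non-decreasing sequence Recall@1, 2, 5, ..., 100.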
