All Projects → una-dinosauria → Rayuela.jl

una-dinosauria / Rayuela.jl

Licence: MIT license
Code for my PhD thesis. Library of quantization-based methods for fast similarity search in high dimensions. Presented at ECCV 18.

Programming Languages

julia
2034 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rayuela.jl

Rii
Fast and memory-efficient ANN with a subset-search functionality
Stars: ✭ 96 (+77.78%)
Mutual labels:  nearest-neighbor-search
Tarsoslsh
A Java library implementing practical nearest neighbour search algorithm for multidimensional vectors that operates in sublinear time. It implements Locality-sensitive Hashing (LSH) and multi index hashing for hamming space.
Stars: ✭ 179 (+231.48%)
Mutual labels:  nearest-neighbor-search
wordvector be
Web服务:使用腾讯 800 万词向量模型和 spotify annoy 引擎得到相似关键词
Stars: ✭ 92 (+70.37%)
Mutual labels:  nearest-neighbor-search
Knn Matting
Source Code for KNN Matting, CVPR 2012 / TPAMI 2013. MATLAB code ready to run. Simple and robust implementation under 40 lines.
Stars: ✭ 130 (+140.74%)
Mutual labels:  nearest-neighbor-search
Vald
Vald. A Highly Scalable Distributed Vector Search Engine
Stars: ✭ 158 (+192.59%)
Mutual labels:  nearest-neighbor-search
Mrpt
Fast and lightweight header-only C++ library (with Python bindings) for approximate nearest neighbor search
Stars: ✭ 210 (+288.89%)
Mutual labels:  nearest-neighbor-search
Annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Stars: ✭ 9,262 (+17051.85%)
Mutual labels:  nearest-neighbor-search
lshensemble
LSH index for approximate set containment search
Stars: ✭ 48 (-11.11%)
Mutual labels:  nearest-neighbor-search
Faiss tips
Some useful tips for faiss
Stars: ✭ 170 (+214.81%)
Mutual labels:  nearest-neighbor-search
pqtable
Fast search algorithm for product-quantized codes via hash-tables
Stars: ✭ 48 (-11.11%)
Mutual labels:  nearest-neighbor-search
Elastiknn
Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
Stars: ✭ 139 (+157.41%)
Mutual labels:  nearest-neighbor-search
Pgann
Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.
Stars: ✭ 156 (+188.89%)
Mutual labels:  nearest-neighbor-search
quick-adc
Quick ADC
Stars: ✭ 20 (-62.96%)
Mutual labels:  nearest-neighbor-search
Neighbor
Nearest neighbor search for Rails and Postgres
Stars: ✭ 114 (+111.11%)
Mutual labels:  nearest-neighbor-search
pynanoflann
Unofficial python wrapper to the nanoflann k-d tree
Stars: ✭ 24 (-55.56%)
Mutual labels:  nearest-neighbor-search
Gann
gann(go-approximate-nearest-neighbor) is a library for Approximate Nearest Neighbor Search written in Go
Stars: ✭ 75 (+38.89%)
Mutual labels:  nearest-neighbor-search
Scanns
A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+251.85%)
Mutual labels:  nearest-neighbor-search
kdtree-rs
K-dimensional tree in Rust for fast geospatial indexing and lookup
Stars: ✭ 137 (+153.7%)
Mutual labels:  nearest-neighbor-search
awesome-vector-search
Collections of vector search related libraries, service and research papers
Stars: ✭ 460 (+751.85%)
Mutual labels:  nearest-neighbor-search
scikit-hubness
A Python package for hubness analysis and high-dimensional data mining
Stars: ✭ 41 (-24.07%)
Mutual labels:  nearest-neighbor-search

Rayuela.jl

This is the code for the paper

Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. LSQ++: Lower running time and higher recall in multi-codebook quantization. In ECCV 2018. [CVF].

Rayuela.jl is a package that implements non-orthogonal multi-codebook quantization methods (MCQ). These methods are useful for fast search of high-dimensional (d=100s-1000s) dense vectors. Rayuela is written in Julia.

This is not a production-ready library -- if you are looking for something like that, maybe look at faiss. Do note that the methods implemented by faiss and Rayuela.jl are almost entire orthogonal, and that the libraries are distributed under different licenses as of May 27 2019, FAISS is MIT licensed.

Rayuela implements the main contributions that I made to this problem during my PhD at UBC, as well as multiple baselines for MCQ. The package is my attempt to make my research reproducible and accessible, and to make it easier for other people, specially newcomers, to contribute to this field, where lack of reproducibility is a major barrier of entry IMO.

I originally intended to incorporate these contributions on top of faiss (see faiss/#185), but I soon realized that doing so would considerably delay the completion of my PhD. Julia is also more accessible (albeit a bit less performant) to quickly try and test new research ideas. In the future, savvier C++ programmers may port the most useful methods to faiss.

Authors

The code in this package was written by Julieta Martinez, with help from Joris Clement and Shobhit Zakhmi.

Requirements

This package is written in Julia 1.0, with some extension in C++ and CUDA. You also need a CUDA-ready GPU. We have tested this code on an Nvidia Titan Xp GPU.

Installing

Before all else, make sure that you have the g++ compiler available from the command line, and the nvcc compiler availible at path /usr/local/cuda/bin/nvcc.

Then, open julia and type ] to enter Package mode:

julia>
(v1.0) pkg>

Now you can clone our repo:

(v1.0) pkg> develop https://github.com/una-dinosauria/Rayuela.jl.git

This should put our code under ~/.julia/dev/Rayuela.

Due to an open bug with the package manager, you have to manually pull the latest changes:

cd ~/.julia/dev/Rayuela
git pull

Demo and data

You may explore the library with SIFT1M, a classical retrieval dataset:

mkdir data
cd data
wget ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
tar -xvzf sift.tar.gz
rm sift.tar.gz
cd ..

Also make a directory for the results

mkdir -p results/sift1m

Finally, run the demo:

julia> include("~/.julia/dev/Rayuela/demos/demos_train_query_base.jl")

For query/base/protocol (example by default runs on SIFT1M), or

julia> include("~/.julia/dev/Rayuela/demos/demos_query_base.jl")

For query/base protocol (example by default runs on LabelMe22K)

This will showcase PQ, OPQ, RVQ, ERVQ, ChainQ and LSQ/LSQ++ (SR-C and SR-D).

The rest of the datasets used in our ECCV'18 publication can be found on gdrive.

Roadmap

Implemented

TODO

Things I'd like to get around implementing / porting / wrapping some day (PRs are welcome!)

TODO (no code, low priority)

I would like to implement these methods. Some of them report really good results but, to the best of my knowledge, the authors have never released code. Also, my time is not infinite so ¯\_(ツ)_/¯

  • Sparse Composite Quantization -- CVPR'15
  • Tree Quantization with Gurobi optimization -- CVPR'15
  • Joint K-means quantization -- ICPR'16 (pay-walled)
  • Pyramid encoding quantization -- EUSIPCO'17
  • Arborescence coding -- ICCV'17

Citing

If you find this code useful, consider citing our work:

Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little. "LSQ++:
Lower running time and higher recall in multi-codebook quantization",
ECCV 2018.

or

Julieta Martinez. "Algorithms for Large-Scale Multi-Codebook Quantization".
PhD thesis, 2018. (Coming soon)

As well as the corresponding paper of the method that you are using (see above).

Notes

1: The original implementation of Tree Quantization requires a mixed-integer programming solver such as Gurobi for updating the codebooks. We implement a special version of TQ that always create a chain (not a general tree); thus encoding can be done with the Viterbi algorithm, and codebook update is simpler and faster. This method should have been a baseline in the TQ paper IMO.

License

MIT

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].