All Projects → lmcinnes → Pynndescent

lmcinnes / Pynndescent

Licence: bsd-2-clause
A Python nearest neighbor descent for approximate nearest neighbors

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pynndescent

scikit-hubness
A Python package for hubness analysis and high-dimensional data mining
Stars: ✭ 41 (-89.12%)
Mutual labels:  nearest-neighbor-search
annoy.rb
annoy-rb provides Ruby bindings for the Annoy (Approximate Nearest Neighbors Oh Yeah).
Stars: ✭ 23 (-93.9%)
Mutual labels:  nearest-neighbor-search
docarray
The data structure for unstructured data
Stars: ✭ 561 (+48.81%)
Mutual labels:  nearest-neighbor-search
wordvector be
Web服务:使用腾讯 800 万词向量模型和 spotify annoy 引擎得到相似关键词
Stars: ✭ 92 (-75.6%)
Mutual labels:  nearest-neighbor-search
kdtree-rs
K-dimensional tree in Rust for fast geospatial indexing and lookup
Stars: ✭ 137 (-63.66%)
Mutual labels:  nearest-neighbor-search
knn-cpp
A header-only C++ library for k nearest neighbor search with Eigen3.
Stars: ✭ 25 (-93.37%)
Mutual labels:  nearest-neighbor-search
Mrpt
Fast and lightweight header-only C++ library (with Python bindings) for approximate nearest neighbor search
Stars: ✭ 210 (-44.3%)
Mutual labels:  nearest-neighbor-search
lbvh
an implementation of parallel linear BVH (LBVH) on GPU
Stars: ✭ 67 (-82.23%)
Mutual labels:  nearest-neighbor-search
Rayuela.jl
Code for my PhD thesis. Library of quantization-based methods for fast similarity search in high dimensions. Presented at ECCV 18.
Stars: ✭ 54 (-85.68%)
Mutual labels:  nearest-neighbor-search
graphgrove
A framework for building (and incrementally growing) graph-based data structures used in hierarchical or DAG-structured clustering and nearest neighbor search
Stars: ✭ 29 (-92.31%)
Mutual labels:  nearest-neighbor-search
pynanoflann
Unofficial python wrapper to the nanoflann k-d tree
Stars: ✭ 24 (-93.63%)
Mutual labels:  nearest-neighbor-search
lshensemble
LSH index for approximate set containment search
Stars: ✭ 48 (-87.27%)
Mutual labels:  nearest-neighbor-search
ann-benchmarks
Benchmarking approximate nearest neighbors. Note: This is an archived version from our SISAP 2017 paper, see below.
Stars: ✭ 30 (-92.04%)
Mutual labels:  nearest-neighbor-search
pqtable
Fast search algorithm for product-quantized codes via hash-tables
Stars: ✭ 48 (-87.27%)
Mutual labels:  nearest-neighbor-search
adventures-with-ann
All the code for a series of Medium articles on Approximate Nearest Neighbors
Stars: ✭ 40 (-89.39%)
Mutual labels:  nearest-neighbor-search
quick-adc
Quick ADC
Stars: ✭ 20 (-94.69%)
Mutual labels:  nearest-neighbor-search
Dolphinn
High Dimensional Approximate Near(est) Neighbor
Stars: ✭ 32 (-91.51%)
Mutual labels:  nearest-neighbor-search
Mlpack
mlpack: a scalable C++ machine learning library --
Stars: ✭ 3,859 (+923.61%)
Mutual labels:  nearest-neighbor-search
pgvector
Open-source vector similarity search for Postgres
Stars: ✭ 482 (+27.85%)
Mutual labels:  nearest-neighbor-search
kdtree
A k-d tree implementation in Go.
Stars: ✭ 98 (-74.01%)
Mutual labels:  nearest-neighbor-search

.. image:: https://travis-ci.org/lmcinnes/pynndescent.svg :target: https://travis-ci.org/lmcinnes/pynndescent :alt: Travis Build Status .. image:: https://ci.appveyor.com/api/projects/status/github/lmcinnes/pynndescent?branch=master&svg=true :target: https://ci.appveyor.com/project/lmcinnes/pynndescent :alt: AppVeyor Build Status .. image:: https://img.shields.io/lgtm/alerts/g/lmcinnes/pynndescent.svg :target: https://lgtm.com/projects/g/lmcinnes/pynndescent/alerts :alt: LGTM Alerts .. image:: https://img.shields.io/lgtm/grade/python/g/lmcinnes/pynndescent.svg :target: https://lgtm.com/projects/g/lmcinnes/pynndescent/context:python :alt: LGTM Grade .. image:: https://readthedocs.org/projects/pynndescent/badge/?version=latest :target: https://pynndescent.readthedocs.io/en/latest/?badge=latest :alt: Documentation Status

=========== PyNNDescent

PyNNDescent is a Python nearest neighbor descent for approximate nearest neighbors. It provides a python implementation of Nearest Neighbor Descent for k-neighbor-graph construction and approximate nearest neighbor search, as per the paper:

Dong, Wei, Charikar Moses, and Kai Li. "Efficient k-nearest neighbor graph construction for generic similarity measures." Proceedings of the 20th international conference on World wide web. ACM, 2011.

This library supplements that approach with the use of random projection trees for initialisation. This can be particularly useful for the metrics that are amenable to such approaches (euclidean, minkowski, angular, cosine, etc.). Graph diversification is also performed, pruning the longest edges of any triangles in the graph.

Currently this library targets relatively high accuracy (80%-100% accuracy rate) approximate nearest neighbor searches.


Why use PyNNDescent?

PyNNDescent provides fast approximate nearest neighbor queries. The ann-benchmarks <https://github.com/erikbern/ann-benchmarks>_ system puts it solidly in the mix of top performing ANN libraries:

GIST-960 Euclidean

.. image:: https://camo.githubusercontent.com/142a48c992ba689b8ea9e62636b5281a97322f74/68747470733a2f2f7261772e6769746875622e636f6d2f6572696b6265726e2f616e6e2d62656e63686d61726b732f6d61737465722f726573756c74732f676973742d3936302d6575636c696465616e2e706e67 :alt: ANN benchmark performance for GIST 960 dataset

NYTimes-256 Angular

.. image:: https://camo.githubusercontent.com/6120a35a9db64104eaa1c95cb4803c2fc4cd2679/68747470733a2f2f7261772e6769746875622e636f6d2f6572696b6265726e2f616e6e2d62656e63686d61726b732f6d61737465722f726573756c74732f6e7974696d65732d3235362d616e67756c61722e706e67 :alt: ANN benchmark performance for NYTimes 256 dataset

While PyNNDescent is among fastest ANN library, it is also both easy to install (pip and conda installable) with no platform or compilation issues, and is very flexible, supporting a wide variety of distance metrics by default:

Minkowski style metrics

  • euclidean
  • manhattan
  • chebyshev
  • minkowski

Miscellaneous spatial metrics

  • canberra
  • braycurtis
  • haversine

Normalized spatial metrics

  • mahalanobis
  • wminkowski
  • seuclidean

Angular and correlation metrics

  • cosine
  • dot
  • correlation
  • spearmanr
  • tsss
  • true_angular

Probability metrics

  • hellinger
  • wasserstein

Metrics for binary data

  • hamming
  • jaccard
  • dice
  • russelrao
  • kulsinski
  • rogerstanimoto
  • sokalmichener
  • sokalsneath
  • yule

and also custom user defined distance metrics while still retaining performance.

PyNNDescent also integrates well with Scikit-learn, including providing support for the KNeighborTransformer as a drop in replacement for algorithms that make use of nearest neighbor computations.


How to use PyNNDescent

PyNNDescent aims to have a very simple interface. It is similar to (but more limited than) KDTrees and BallTrees in sklearn. In practice there are only two operations -- index construction, and querying an index for nearest neighbors.

To build a new search index on some training data data you can do something like

.. code:: python

from pynndescent import NNDescent
index = NNDescent(data)

You can then use the index for searching (and can pickle it to disk if you wish). To search a pynndescent index for the 15 nearest neighbors of a test data set query_data you can do something like

.. code:: python

index.query(query_data, k=15)

and that is pretty much all there is to it. You can find more details in the documentation <https://pynndescent.readthedocs.org>_.


Installing

PyNNDescent is designed to be easy to install being a pure python module with relatively light requirements:

  • numpy
  • scipy
  • scikit-learn >= 0.22
  • numba >= 0.51

all of which should be pip or conda installable. The easiest way to install should be via conda:

.. code:: bash

conda install -c conda-forge pynndescent

or via pip:

.. code:: bash

pip install pynndescent

To manually install this package:

.. code:: bash

wget https://github.com/lmcinnes/pynndescent/archive/master.zip
unzip master.zip
rm master.zip
cd pynndescent-master
python setup.py install

Help and Support

This project is still young. The documentation is still growing. In the meantime please open an issue <https://github.com/lmcinnes/pynndescent/issues/new>_ and I will try to provide any help and guidance that I can. Please also check the docstrings on the code, which provide some descriptions of the parameters.


License

The pynndescent package is 2-clause BSD licensed. Enjoy.


Contributing

Contributions are more than welcome! There are lots of opportunities for potential projects, so please get in touch if you would like to help out. Everything from code to notebooks to examples and documentation are all equally valuable so please don't feel you can't contribute. To contribute please fork the project <https://github.com/lmcinnes/pynndescent/issues#fork-destination-box>_ make your changes and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].