All Projects → benhoyt → Pybktree

benhoyt / Pybktree

Licence: mit
Python BK-tree data structure to allow fast querying of "close" matches

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pybktree

Data Structures
Go datastructures.
Stars: ✭ 336 (+187.18%)
Mutual labels:  data-structures, tree
Algorithms and data structures
180+ Algorithm & Data Structure Problems using C++
Stars: ✭ 4,667 (+3888.89%)
Mutual labels:  data-structures, tree
Java Algorithms Implementation
Algorithms and Data Structures implemented in Java
Stars: ✭ 3,927 (+3256.41%)
Mutual labels:  data-structures, tree
Leetcode
LeetCode Solutions: A Record of My Problem Solving Journey.( leetcode题解,记录自己的leetcode解题之路。)
Stars: ✭ 45,650 (+38917.09%)
Mutual labels:  data-structures, tree
Ki
Go language (golang) full strength tree structures (ki in Japanese)
Stars: ✭ 61 (-47.86%)
Mutual labels:  data-structures, tree
Data Structures Algorithms
My implementation of 85+ popular data structures and algorithms and interview questions in Python 3 and C++
Stars: ✭ 273 (+133.33%)
Mutual labels:  data-structures, tree
C Sharp Algorithms
📚 📈 Plug-and-play class-library project of standard Data Structures and Algorithms in C#
Stars: ✭ 4,684 (+3903.42%)
Mutual labels:  data-structures, tree
Interview Questions
List of all the Interview questions practiced from online resources and books
Stars: ✭ 187 (+59.83%)
Mutual labels:  data-structures, tree
Geeksforgeeks Dsa 2
This repository contains all the assignments and practice questions solved during the Data Structures and Algorithms course in C++ taught by the Geeks For Geeks team.
Stars: ✭ 53 (-54.7%)
Mutual labels:  data-structures, tree
Algorithms
Study cases for Algorithms and Data Structures.
Stars: ✭ 32 (-72.65%)
Mutual labels:  data-structures, tree
Javascript Datastructures Algorithms
📚 collection of JavaScript and TypeScript data structures and algorithms for education purposes. Source code bundle of JavaScript algorithms and data structures book
Stars: ✭ 3,221 (+2652.99%)
Mutual labels:  data-structures, tree
Splay Tree
Fast splay-tree data structure
Stars: ✭ 80 (-31.62%)
Mutual labels:  data-structures, tree
Computer Science In Javascript
Computer science reimplemented in JavaScript
Stars: ✭ 2,590 (+2113.68%)
Mutual labels:  data-structures, tree
Rubytree
A General Purpose Tree Data Structure for Ruby
Stars: ✭ 300 (+156.41%)
Mutual labels:  data-structures, tree
Libdict
C library of key-value data structures.
Stars: ✭ 234 (+100%)
Mutual labels:  data-structures, tree
Algodeck
An Open-Source Collection of 200+ Algorithmic Flash Cards to Help you Preparing your Algorithm & Data Structure Interview 💯
Stars: ✭ 4,441 (+3695.73%)
Mutual labels:  data-structures, tree
Merkle Tree
Merkle Trees and Merkle Inclusion Proofs
Stars: ✭ 130 (+11.11%)
Mutual labels:  data-structures, tree
Data Structures
Common data structures and algorithms implemented in JavaScript
Stars: ✭ 139 (+18.8%)
Mutual labels:  data-structures, tree
Dsa.js Data Structures Algorithms Javascript
🥞Data Structures and Algorithms explained and implemented in JavaScript + eBook
Stars: ✭ 6,251 (+5242.74%)
Mutual labels:  data-structures, tree
Buckets Js
A complete, fully tested and documented data structure library written in pure JavaScript.
Stars: ✭ 1,128 (+864.1%)
Mutual labels:  data-structures, tree

pybktree

pybktree is a generic, pure Python implementation of a BK-tree_ data structure, which allows fast querying of "close" matches (for example, matches with small hamming distance or Levenshtein distance). This module is based on the algorithm by Nick Johnson in his blog article on BK-trees_.

The library is on the Python Package Index (PyPI)_ and works on both Python 3 and Python 2.7. To install it, fire up a command prompt, activate your virtual environment if you're using one, and type:

::

pip install pybktree

Example usage:

.. code:: python

>>> tree = pybktree.BKTree(pybktree.hamming_distance, [0, 4, 5, 14])
>>> tree.add(15)              # add element 15
>>> sorted(tree)              # BKTree instances are iterable
[0, 4, 5, 14, 15]
>>> sorted(tree.find(13, 1))  # find elements at most 1 bit away from element 13
[(1, 5), (1, 15)]

If you need to track the ID, key, or filename of the original item, use a tuple or namedtuple. Repeating the above example with an Item namedtuple:

.. code:: python

>>> import collections
>>> Item = collections.namedtuple('Item', 'bits id')
>>> def item_distance(x, y):
...     return pybktree.hamming_distance(x.bits, y.bits)
>>> tree = pybktree.BKTree(item_distance, [Item(0, 'a'), Item(4, 'b'),
                                           Item(5, 'c'), Item(14, 'd')])
>>> tree.add(Item(15, 'e'))
>>> sorted(tree.find(Item(13, 'x'), 1))
[(1, Item(bits=5, id='c')), (1, Item(bits=15, id='e'))]

For large trees and fairly small N when calling find(), using a BKTree is much faster than doing a linear search. This is especially good when you're de-duping a few hundred thousand photos -- with a linear search that would become a very slow, O(N) for every photo, so O(N²) in total.

A lookup in a BKTree is much faster than linear for small distance thresholds -- though it goes up to O(N) far large distance thresholds, so won't be valuable in those cases. See Maximilian Knespel's detailed analysis_.

Read the code in pybktree.py_ for more details – it's pretty small!

Other BK-tree modules I found on GitHub while writing this one:

  • ahupp/bktree_: this one is pretty good, but it's not on PyPI, and it's recursive
  • ryanfox/bktree_: this one is hard to customize, search() doesn't return distances, it's slower, and was buggy (though I think he fixed it recently)

pybktree was written by Ben Hoyt_ and is licensed with a permissive MIT license (see LICENSE.txt_).

.. _BK-tree: https://en.wikipedia.org/wiki/BK-tree .. _blog article on BK-trees: http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees .. _on the Python Package Index (PyPI): https://pypi.python.org/pypi/pybktree .. _pybktree.py: https://github.com/benhoyt/pybktree/blob/master/pybktree.py .. _ahupp/bktree: https://github.com/ahupp/bktree .. _ryanfox/bktree: https://github.com/ryanfox/bktree .. _Ben Hoyt: http://benhoyt.com/ .. _LICENSE.txt: https://github.com/benhoyt/pybktree/blob/master/LICENSE.txt .. _detailed analysis: https://github.com/benhoyt/pybktree/issues/5

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].