All Projects → KristofferC → Nearestneighbors.jl

KristofferC / Nearestneighbors.jl

Licence: other
High performance nearest neighbor data structures and algorithms for Julia.

Programming Languages

julia
2034 projects

Projects that are alternatives of or similar to Nearestneighbors.jl

Pyrsistent
Persistent/Immutable/Functional data structures for Python
Stars: ✭ 1,621 (+664.62%)
Mutual labels:  datastructures
Golang Set
A simple set type for the Go language. Trusted by Docker, 1Password, Ethereum and Hashicorp.
Stars: ✭ 2,168 (+922.64%)
Mutual labels:  datastructures
Interview Questions
List of all the Interview questions practiced from online resources and books
Stars: ✭ 187 (-11.79%)
Mutual labels:  datastructures
Containers
This library provides various containers. Each container has utility functions to manipulate the data it holds. This is an abstraction as to not have to manually manage and reallocate memory.
Stars: ✭ 125 (-41.04%)
Mutual labels:  datastructures
Umbrella
"A collection of functional programming libraries that can be composed together. Unlike a framework, thi.ng is a suite of instruments and you (the user) must be the composer of. Geared towards versatility, not any specific type of music." — @loganpowell via Twitter
Stars: ✭ 2,186 (+931.13%)
Mutual labels:  datastructures
Algorithm
The repository algorithms implemented on the Go
Stars: ✭ 163 (-23.11%)
Mutual labels:  datastructures
Ngenerics
Data structures and algorithms for .NET
Stars: ✭ 115 (-45.75%)
Mutual labels:  datastructures
C Macro Collections
Easy to use, header only, macro generated, generic and type-safe Data Structures in C
Stars: ✭ 192 (-9.43%)
Mutual labels:  datastructures
Javainterview
最全的Java技术知识点,以及Java源码分析。为开源贡献自己的一份力。
Stars: ✭ 154 (-27.36%)
Mutual labels:  datastructures
Cosmos
Hacktoberfest 2021 | World's largest Contributor driven code dataset | Algorithms that run our universe | Your personal library of every algorithm and data structure code that you will ever encounter |
Stars: ✭ 12,936 (+6001.89%)
Mutual labels:  datastructures
Iter
Go implementation of C++ STL iterators and algorithms.
Stars: ✭ 132 (-37.74%)
Mutual labels:  datastructures
Competitive Programming
VastoLorde95's solutions to 2000+ competitive programming problems from various online judges
Stars: ✭ 147 (-30.66%)
Mutual labels:  datastructures
Coding Ninjas Competitive
This will have all the solutions to the competitive programming course's problems by Coding ninjas. Star the repo if you like it.
Stars: ✭ 168 (-20.75%)
Mutual labels:  datastructures
Android Interview Questions
A repository containing interview questions on DS, Java & Android based on my experiences.
Stars: ✭ 125 (-41.04%)
Mutual labels:  datastructures
Algocasts Js
DSA in JavaScript ✅
Stars: ✭ 189 (-10.85%)
Mutual labels:  datastructures
Aardvark.base
Aardvark is an open-source platform for visual computing, real-time graphics and visualization. This repository is the basis for most platform libraries and provides basic functionality such as data-structures, math and much more.
Stars: ✭ 117 (-44.81%)
Mutual labels:  datastructures
Python data structures and algorithms
Python 中文数据结构和算法教程
Stars: ✭ 2,194 (+934.91%)
Mutual labels:  datastructures
Competitive Programming Resources
This repository consists of data helpful for ACM ICPC programming contest, in general competitive programming.
Stars: ✭ 199 (-6.13%)
Mutual labels:  datastructures
Data Structures And Algorithms
Data Structures and Algorithms implementation in Go
Stars: ✭ 2,272 (+971.7%)
Mutual labels:  datastructures
Matlab Octave
This repository contains algorithms written in MATLAB/Octave. Developing algorithms in the MATLAB environment empowers you to explore and refine ideas, and enables you test and verify your algorithm.
Stars: ✭ 180 (-15.09%)
Mutual labels:  datastructures

NearestNeighbors.jl

Build Status codecov.io DOI

NearestNeighbors.jl is a package written in Julia to perform high performance nearest neighbor searches in arbitrarily high dimensions.


Creating a tree

The abstract tree type that the trees in this package are a subtype of is called a NNTree. A NNTree is created by the function:

NNTree(data, metric; leafsize, reorder)
  • data: The data, i.e., the points to build up the tree from. It can either be
    • a matrix of size nd × np with the points to insert in the tree where nd is the dimensionality of the points and np is the number of points
    • a vector of vectors with fixed dimensionality, nd, which must be part of the type. Specifically, data should be a Vector{V}, where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) are defined. (For example, with 3D points, V = SVector{3, Float64} works because eltype(V) = Float64 and length(V) = 3 are defined in V.)
  • metric: The metric to use, defaults to Euclidean. This is one of the Metric types defined in the Distances.jl packages. It is possible to define your own metrics by simply creating new types that are subtypes of Metric.
  • leafsize (keyword argument): Determines at what number of points to stop splitting the tree further. There is a trade-off between traversing the tree and having to evaluate the metric function for increasing number of points.
  • reorder (keyword argument): While building the tree this will put points close in distance close in memory since this helps with cache locality. In this case, a copy of the original data will be made so that the original data is left unmodified. This can have a significant impact on performance and is by default set to true.

There are currently three types of trees available:

  • BruteTree: Not actually a tree. It linearly searches all points in a brute force fashion. Works with any Metric.
  • KDTree: In a kd tree the points are recursively split into groups using hyper-planes. Therefore a KDTree only works with axis aligned metrics which are: Euclidean, Chebyshev, Minkowski and Cityblock.
  • BallTree: Points are recursively split into groups bounded by hyper-spheres. Works with any Metric.

All trees in NearestNeighbors.jl are static which means that points can not be added or removed from an already created tree.

Here are a few examples of creating trees:

using NearestNeighbors
data = rand(3, 10^4)

# Create trees
kdtree = KDTree(data; leafsize = 10)
balltree = BallTree(data, Minkowski(3.5); reorder = false)
brutetree = BruteTree(data)

k Nearest Neighbor (kNN) searches

A kNN search finds the k nearest neighbors to given point(s). This is done with the method:

knn(tree, points, k, sortres = false, skip = always_false) -> idxs, dists
  • tree: The tree instance
  • points: A vector or matrix of points to find the k nearest neighbors to. If points is a vector of numbers then this represents a single point, if points is a matrix then the k nearest neighbors to each point (column) will be computed. points can also be a vector of other vectors where each element in the outer vector is considered a point.
  • sortres (optional): Determines if the results should be sorted before returning. In this case the results will be sorted in order of increasing distance to the point.
  • skip (optional): A predicate to determine if a given point should be skipped, for example if iterating over points and a point has already been visited.

It is generally better for performance to query once with a large number of points than to query multiple times with one point per query.

As a convenience, if you only want the closest nearest neighbor, you can call nn instead for a cleaner result:

nn(tree, points, skip = always_false) -> idxs, dists

Some examples:

using NearestNeighbors
data = rand(3, 10^4)
k = 3
point = rand(3)

kdtree = KDTree(data)
idxs, dists = knn(kdtree, point, k, true)

idxs
# 3-element Array{Int64,1}:
#  4683
#  6119
#  3278

dists
# 3-element Array{Float64,1}:
#  0.039032201026256215
#  0.04134193711411951
#  0.042974090446474184

# Multiple points
points = rand(3, 4);

idxs, dists = knn(kdtree, points, k, true);

idxs
# 4-element Array{Array{Int64,1},1}:
#  [3330, 4072, 2696]
#  [1825, 7799, 8358]
#  [3497, 2169, 3737]
#  [1845, 9796, 2908]

# dists
# 4-element Array{Array{Float64,1},1}:
#  [0.0298932, 0.0327349, 0.0365979]
#  [0.0348751, 0.0498355, 0.0506802]
#  [0.0318547, 0.037291, 0.0421208]
#  [0.03321, 0.0360935, 0.0411951]

# Static vectors
v = @SVector[0.5, 0.3, 0.2];

idxs, dists = knn(kdtree, v, k, true);

idxs
# 3-element Array{Int64,1}:
#   842
#  3075
#  3046

dists
# 3-element Array{Float64,1}:
#  0.04178677766255837
#  0.04556078331418939
#  0.049967238112417205

Range searches

A range search finds all neighbors within the range r of given point(s). This is done with the method:

inrange(tree, points, r, sortres = false) -> idxs

Note that for performance reasons the distances are not returned. The arguments to inrange are the same as for knn except that sortres just sorts the returned index vector.

An example:

using NearestNeighbors
data = rand(3, 10^4)
r = 0.05
point = rand(3)

balltree = BallTree(data)
idxs = inrange(balltree, point, r, true)

# 4-element Array{Int64,1}:
#  317
#  983
# 4577
# 8675

Using on-disk data sets

By default, the trees store a copy of the data provided during construction. This is problematic in case you want to work on data sets which are larger than available memory, thus wanting to mmap the data or want to store the data in a different place, outside the tree.

DataFreeTree can be used to strip a constructed tree of its data field and re-link it with that data at a later stage. An example of using a large on-disk data set looks like this:

using Mmap
ndim = 2; ndata = 10_000_000_000
data = Mmap.mmap(datafilename, Matrix{Float32}, (ndim, ndata))
data[:] = rand(Float32, ndim, ndata)  # create example data
dftree = DataFreeTree(KDTree, data)

dftree now only stores the indexing data structures. It can be passed around, saved and reloaded independently of data.

To perform look-ups, dftree is re-linked to the underlying data:

tree = injectdata(dftree, data)  # yields a KDTree
knn(tree, data[:,1], 3)  # perform operations as usual

Author

Kristoffer Carlsson - @KristofferC - [email protected]

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].