All Projects → ritchie46 → lsh-rs

ritchie46 / lsh-rs

Licence: MIT license
Locality Sensitive Hashing in Rust with Python bindings

Programming Languages

rust
11053 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to lsh-rs

awesome-face-recognition
this repo include paper review, code in face recognition
Stars: ✭ 16 (-75%)
Mutual labels:  cosine-similarity
H2 ALSH
Accurate and Fast ALSH for Maximum Inner Product Search (KDD 2018)
Stars: ✭ 18 (-71.87%)
Mutual labels:  lsh
bns-short-text-similarity
📖 Use Bi-normal Separation to find document vectors which is used to compute similarity for shorter sentences.
Stars: ✭ 24 (-62.5%)
Mutual labels:  cosine-similarity
keras-knn
Code for the blog post Nearest Neighbors with Keras and CoreML
Stars: ✭ 25 (-60.94%)
Mutual labels:  cosine-similarity
image-ndd-lsh
Near-duplicate image detection using Locality Sensitive Hashing
Stars: ✭ 42 (-34.37%)
Mutual labels:  lsh
minhash-lsh
Minhash LSH in Golang
Stars: ✭ 20 (-68.75%)
Mutual labels:  lsh
set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Stars: ✭ 23 (-64.06%)
Mutual labels:  cosine-similarity
cisip-FIRe
Fast Image Retrieval (FIRe) is an open source project to promote image retrieval research. It implements most of the major binary hashing methods to date, together with different popular backbone networks and public datasets.
Stars: ✭ 40 (-37.5%)
Mutual labels:  cosine-similarity
product-quantization
🙃Implementation of vector quantization algorithms, codes for Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search.
Stars: ✭ 40 (-37.5%)
Mutual labels:  lsh
lsh
Locality Sensitive Hashing for Go (Multi-probe LSH, LSH Forest, basic LSH)
Stars: ✭ 92 (+43.75%)
Mutual labels:  lsh
Java String Similarity
Implementation of various string similarity and distance algorithms: Levenshtein, Jaro-winkler, n-Gram, Q-Gram, Jaccard index, Longest Common Subsequence edit distance, cosine similarity ...
Stars: ✭ 2,403 (+3654.69%)
Mutual labels:  cosine-similarity
MoTIS
Mobile(iOS) Text-to-Image search powered by multimodal semantic representation models(e.g., OpenAI's CLIP). Accepted at NAACL 2022.
Stars: ✭ 60 (-6.25%)
Mutual labels:  lsh
Datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Stars: ✭ 1,635 (+2454.69%)
Mutual labels:  lsh
Simple-Plagiarism-Checker
Web Application for checking the similarity between query and document using the concept of Cosine Similarity.
Stars: ✭ 47 (-26.56%)
Mutual labels:  cosine-similarity
Content-based-Recommender-System
It is a content based recommender system that uses tf-idf and cosine similarity for N Most SImilar Items from a dataset
Stars: ✭ 64 (+0%)
Mutual labels:  cosine-similarity
AI-for-Trading
📈This repo contains detailed notes and multiple projects implemented in Python related to AI and Finance. Follow the blog here: https://purvasingh.medium.com
Stars: ✭ 59 (-7.81%)
Mutual labels:  cosine-similarity
Dolphinn
High Dimensional Approximate Near(est) Neighbor
Stars: ✭ 32 (-50%)
Mutual labels:  lsh
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-71.87%)
Mutual labels:  lsh
lsh-semantic-similarity
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Stars: ✭ 16 (-75%)
Mutual labels:  lsh
koolsla
Food recommendation tool with Machine learning.
Stars: ✭ 21 (-67.19%)
Mutual labels:  cosine-similarity

lsh-rs (Locality Sensitive Hashing)

rust docs Build Status

Locality sensitive hashing can help retrieving Approximate Nearest Neighbors in sub-linear time.

For more information on the subject see:

Implementations

  • Base LSH
    • Signed Random Projections (Cosine similarity)
    • L2 distance
    • MIPS (Dot products/ Maximum Inner Product Search)
    • MinHash (Jaccard Similarity)
  • Multi Probe LSH
    • Step wise probing
      • SRP (only bit shifts)
    • Query directed probing
      • L2
      • MIPS
  • Generic numeric types

Getting started

use lsh_rs::LshMem;
// 2 rows w/ dimension 3.
let p = &[vec![1., 1.5, 2.],
        vec![2., 1.1, -0.3]];

// Do one time expensive preprocessing.
let n_projections = 9;
let n_hash_tables = 30;
let dim = 10;
let dim = 3;
let mut lsh = LshMem::new(n_projections, n_hash_tables, dim)
    .srp()
    .unwrap();
lsh.store_vecs(p);

// Query in sublinear time.
let query = &[1.1, 1.2, 1.2];
lsh.query_bucket(query);

Documentation

Python

At the moment, the Python bindings are only compiled for Linux x86_64 systems.

$ pip install floky

from floky import SRP
import numpy as np

N = 10000
n = 100
dim = 10

# Generate some random data points
data_points = np.random.randn(N, dim)

# Do a one time (expensive) fit.
lsh = SRP(n_projections=19, n_hash_tables=10)
lsh.fit(data_points)

# Query approximated nearest neigbors in sub-linear time
query = np.random.randn(n, dim)
results = lsh.predict(query)
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].