
ekzhu / Datasketch

License: MIT
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Datasketch

set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Stars: ✭ 23 (-98.59%)
Mutual labels:  minhash, locality-sensitive-hashing, jaccard-similarity, hyperloglog
lshensemble
LSH index for approximate set containment search
Stars: ✭ 48 (-97.06%)
Mutual labels:  lsh, lsh-forest, lsh-ensemble
bagminhash
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
Stars: ✭ 24 (-98.53%)
Mutual labels:  minhash, locality-sensitive-hashing, jaccard-similarity
lsh-semantic-similarity
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Stars: ✭ 16 (-99.02%)
Mutual labels:  lsh, jaccard-similarity
lsh
Locality Sensitive Hashing for Go (Multi-probe LSH, LSH Forest, basic LSH)
Stars: ✭ 92 (-94.37%)
Mutual labels:  lsh, lsh-forest
HyperMinHash-java
Union, intersection, and set cardinality in loglog space
Stars: ✭ 48 (-97.06%)
Mutual labels:  minhash, hyperloglog
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-98.9%)
Mutual labels:  lsh, minhash
mkmh
Generate kmers/minimizers/hashes/MinHash signatures, including with multiple kmer sizes.
Stars: ✭ 21 (-98.72%)
Mutual labels:  minhash, locality-sensitive-hashing
hyperloglog-sketch-estimation-paper
Paper about the estimation of cardinalities from HyperLogLog sketches
Stars: ✭ 48 (-97.06%)
Mutual labels:  hyperloglog, data-sketches
image-ndd-lsh
Near-duplicate image detection using Locality Sensitive Hashing
Stars: ✭ 42 (-97.43%)
Mutual labels:  lsh, locality-sensitive-hashing
minhash-lsh
Minhash LSH in Golang
Stars: ✭ 20 (-98.78%)
Mutual labels:  lsh, minhash
Search
PHP search-systems made possible
Stars: ✭ 101 (-93.82%)
Mutual labels:  search
Rxsuggestions
⌨️ RxJava library to fetch suggestions for keywords using Google Suggest API
Stars: ✭ 93 (-94.31%)
Mutual labels:  search
Txtai.js
AI-powered search engine for JavaScript
Stars: ✭ 93 (-94.31%)
Mutual labels:  search
Algoliasearch Client Android
Algolia Search API Client for Android
Stars: ✭ 92 (-94.37%)
Mutual labels:  search
Ds2i
A library of inverted index data structures
Stars: ✭ 104 (-93.64%)
Mutual labels:  search
Node Sonic Channel
🦉 Sonic Channel integration for Node. Used in pair with Sonic, the fast, lightweight and schema-less search backend.
Stars: ✭ 101 (-93.82%)
Mutual labels:  search
Monster
The Art of Template MetaProgramming (TMP) in Modern C++♦️
Stars: ✭ 90 (-94.5%)
Mutual labels:  search
Searchwp Live Ajax Search
[WordPress Plugin] Enhance your search forms with live search (utilizes SearchWP if installed)
Stars: ✭ 91 (-94.43%)
Mutual labels:  search
Npmarket
🛒 More efficient search for node packages.
Stars: ✭ 91 (-94.43%)
Mutual labels:  search

datasketch: Big Data Looks Small

datasketch gives you probabilistic data structures that can process and search very large amounts of data very quickly, with little loss of accuracy.

This package contains the following data sketches:

Data Sketch        Usage
MinHash            estimate Jaccard similarity and cardinality
Weighted MinHash   estimate weighted Jaccard similarity
HyperLogLog        estimate cardinality
HyperLogLog++      estimate cardinality
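
To make the "estimate Jaccard similarity" entry concrete, here is a minimal standard-library sketch of the MinHash idea. It is an illustration only, not datasketch's implementation: the helper names `minhash_signature` and `estimate_jaccard`, the use of salted BLAKE2b as a family of hash functions, and the sample word sets are all assumptions made for this example.

```python
import hashlib


def minhash_signature(items, num_perm=128):
    """Return num_perm minimum hash values for a non-empty set of strings."""
    signature = []
    for i in range(num_perm):
        # A different salt per index stands in for an independent hash function.
        salt = i.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical sample data: two small word sets with some overlap.
set_a = {"minhash", "is", "a", "probabilistic", "data", "structure"}
set_b = {"minhash", "is", "a", "probability", "data", "sketch"}

exact = len(set_a & set_b) / len(set_a | set_b)
estimate = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
print("exact Jaccard:", exact)
print("MinHash estimate:", estimate)
```

The signature has a fixed length regardless of how large the input set is, which is why a sketch like this stays small on big data. More hash functions (a larger `num_perm`) give a tighter estimate at the cost of more computation.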

The following indexes for data sketches are provided to support sub-linear query time:

Index                  For Data Sketch             Supported Query Type
MinHash LSH            MinHash, Weighted MinHash   Jaccard Threshold
MinHash LSH Forest     MinHash, Weighted MinHash   Jaccard Top-K
MinHash LSH Ensemble   MinHash                     Containment Threshold

datasketch requires Python 2.7 or above, NumPy 1.11 or above, and SciPy.

Note that MinHash LSH and MinHash LSH Ensemble also support Redis and Cassandra storage layers (see MinHash LSH at Scale).

Install

To install datasketch using pip:

pip install datasketch

This will also install NumPy as a dependency.

To install with the Redis dependency:

pip install datasketch[redis]

To install with the Cassandra dependency:

pip install datasketch[cassandra]