
oertl / bagminhash

Licence: other
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
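Weighted minwise hashing estimates the weighted Jaccard similarity J(A, B) = Σ min(a_k, b_k) / Σ max(a_k, b_k), where a_k and b_k are the non-negative weights of element k in each set (0 if absent). As far as we understand, this is the quantity BagMinHash targets. The sketch below is NOT BagMinHash. It computes that similarity exactly and, for integer weights only, shows the simplest weighted MinHash baseline: expand each element into w copies and apply ordinary MinHash to the expanded sets. BagMinHash exists to avoid that expansion cost and to support real-valued weights. All function names and the BLAKE2b-based hashing are hypothetical illustrative choices.

```python
import hashlib
from typing import List, Mapping


def weighted_jaccard(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Exact weighted Jaccard similarity; missing elements have weight 0."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    # Hypothetical convention: two empty sets are treated as identical.
    return num / den if den > 0 else 1.0


def _hash(seed: int, item: str, copy: int) -> int:
    """64-bit hash of (seed, item, copy index); stands in for a random hash family."""
    data = f"{seed}|{item}|{copy}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def expanded_minhash(weights: Mapping[str, int], num_hashes: int) -> List[int]:
    """Baseline (not BagMinHash): MinHash over the multiset where element k
    appears weights[k] times. Requires integer weights and a non-empty set."""
    return [
        min(_hash(seed, k, i) for k, w in weights.items() for i in range(w))
        for seed in range(num_hashes)
    ]


def estimate_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature components; estimates weighted Jaccard."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = {"x": 2, "y": 1}
    b = {"x": 1, "z": 1}
    print(weighted_jaccard(a, b))  # exact: min-sum 1 / max-sum 4 = 0.25
    sa = expanded_minhash(a, 2000)
    sb = expanded_minhash(b, 2000)
    print(estimate_similarity(sa, sb))  # close to 0.25
```

The expansion works because copies shared by both expanded sets number Σ min(a_k, b_k), while the union has Σ max(a_k, b_k) copies, so ordinary MinHash collision probability equals the weighted Jaccard similarity. Its cost grows with the weights, which is the limitation dedicated algorithms address.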

Programming Languages

C++
python

Projects that are alternatives to or similar to bagminhash (the percentage after each star count compares it with bagminhash's own star count)

set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Stars: ✭ 23 (-4.17%)
Mutual labels:  minhash, locality-sensitive-hashing, jaccard-similarity, jaccard-similarity-estimation, minwise-hashing, minwise-hashing-algorithm
Datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Stars: ✭ 1,635 (+6712.5%)
Mutual labels:  minhash, locality-sensitive-hashing, jaccard-similarity
mkmh
Generate kmers/minimizers/hashes/MinHash signatures, including with multiple kmer sizes.
Stars: ✭ 21 (-12.5%)
Mutual labels:  minhash, locality-sensitive-hashing
lsh-semantic-similarity
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Stars: ✭ 16 (-33.33%)
Mutual labels:  jaccard-similarity
intertext
Detect and visualize text reuse
Stars: ✭ 97 (+304.17%)
Mutual labels:  minhash
image-ndd-lsh
Near-duplicate image detection using Locality Sensitive Hashing
Stars: ✭ 42 (+75%)
Mutual labels:  locality-sensitive-hashing
spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+112.5%)
Mutual labels:  jaccard-similarity
ExpressionMatrix2
Software for exploration of gene expression data from single-cell RNA sequencing.
Stars: ✭ 29 (+20.83%)
Mutual labels:  locality-sensitive-hashing
rkmh
Classify sequencing reads using MinHash.
Stars: ✭ 42 (+75%)
Mutual labels:  minhash
recommendation-retrieval
A tutorial on scalable retrieval of matrix factorization recommendations
Stars: ✭ 27 (+12.5%)
Mutual labels:  locality-sensitive-hashing
tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Stars: ✭ 92 (+283.33%)
Mutual labels:  jaccard-similarity
tlsh
TLSH lib in Golang
Stars: ✭ 110 (+358.33%)
Mutual labels:  locality-sensitive-hashing
Text-Similarity
A text similarity computation using minhashing and Jaccard distance on the Reuters dataset
Stars: ✭ 15 (-37.5%)
Mutual labels:  jaccard-similarity
learning2hash.github.io
Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io
Stars: ✭ 14 (-41.67%)
Mutual labels:  locality-sensitive-hashing
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-25%)
Mutual labels:  minhash
text-shingles
k-shingling for text to help compare similarity
Stars: ✭ 15 (-37.5%)
Mutual labels:  minhash
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+375%)
Mutual labels:  jaccard-similarity
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+150%)
Mutual labels:  jaccard-similarity
catch-me-if-you-can
plagiarism detector
Stars: ✭ 16 (-33.33%)
Mutual labels:  minhash
minhash-lsh
Minhash LSH in Golang
Stars: ✭ 20 (-16.67%)
Mutual labels:  minhash