set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Stars: ✭ 23 (-4.17%)
Mutual labels: minhash, locality-sensitive-hashing, jaccard-similarity, jaccard-similarity-estimation, minwise-hashing, minwise-hashing-algorithm
Datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Stars: ✭ 1,635 (+6712.5%)
Mutual labels: minhash, locality-sensitive-hashing, jaccard-similarity
mkmh
Generate kmers/minimizers/hashes/MinHash signatures, including with multiple kmer sizes.
Stars: ✭ 21 (-12.5%)
Mutual labels: minhash, locality-sensitive-hashing
lsh-semantic-similarity
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Stars: ✭ 16 (-33.33%)
Mutual labels: jaccard-similarity
intertext
Detect and visualize text reuse
Stars: ✭ 97 (+304.17%)
Mutual labels: minhash
image-ndd-lsh
Near-duplicate image detection using Locality Sensitive Hashing
Stars: ✭ 42 (+75%)
Mutual labels: locality-sensitive-hashing
spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+112.5%)
Mutual labels: jaccard-similarity
ExpressionMatrix2
Software for exploration of gene expression data from single-cell RNA sequencing.
Stars: ✭ 29 (+20.83%)
Mutual labels: locality-sensitive-hashing
rkmh
Classify sequencing reads using MinHash.
Stars: ✭ 42 (+75%)
Mutual labels: minhash
recommendation-retrieval
A tutorial on scalable retrieval of matrix factorization recommendations
Stars: ✭ 27 (+12.5%)
Mutual labels: locality-sensitive-hashing
tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Stars: ✭ 92 (+283.33%)
Mutual labels: jaccard-similarity
tlsh
TLSH lib in Golang
Stars: ✭ 110 (+358.33%)
Mutual labels: locality-sensitive-hashing
Text-Similarity
A text similarity computation using minhashing and Jaccard distance on the Reuters dataset
Stars: ✭ 15 (-37.5%)
Mutual labels: jaccard-similarity
learning2hash.github.io
Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io
Stars: ✭ 14 (-41.67%)
Mutual labels: locality-sensitive-hashing
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-25%)
Mutual labels: minhash
text-shingles
k-shingling for text to help compare similarity
Stars: ✭ 15 (-37.5%)
Mutual labels: minhash
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+375%)
Mutual labels: jaccard-similarity
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+150%)
Mutual labels: jaccard-similarity
minhash-lsh
Minhash LSH in Golang
Stars: ✭ 20 (-16.67%)
Mutual labels: minhash