23 open source projects that are alternatives to or similar to bagminhash

set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Stars: ✭ 23 (-4.17%)
Datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Stars: ✭ 1,635 (+6712.5%)
mkmh
Generate kmers/minimizers/hashes/MinHash signatures, including with multiple kmer sizes.
Stars: ✭ 21 (-12.5%)
catch-me-if-you-can
Plagiarism detector
Stars: ✭ 16 (-33.33%)
Mutual labels:  minhash
spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+112.5%)
Mutual labels:  jaccard-similarity
minhash-lsh
Minhash LSH in Golang
Stars: ✭ 20 (-16.67%)
Mutual labels:  minhash
text-shingles
k-shingling for text to help compare similarity
Stars: ✭ 15 (-37.5%)
Mutual labels:  minhash
rkmh
Classify sequencing reads using MinHash.
Stars: ✭ 42 (+75%)
Mutual labels:  minhash
tika-similarity
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Stars: ✭ 92 (+283.33%)
Mutual labels:  jaccard-similarity
Sampled-MinHashing
A method to mine beyond-pairwise relationships using Min-Hashing for large-scale pattern discovery
Stars: ✭ 24 (+0%)
Mutual labels:  minhash
learning2hash.github.io
Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io
Stars: ✭ 14 (-41.67%)
image-ndd-lsh
Near-duplicate image detection using Locality Sensitive Hashing
Stars: ✭ 42 (+75%)
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+375%)
Mutual labels:  jaccard-similarity
recommendation-retrieval
A tutorial on scalable retrieval of matrix factorization recommendations
Stars: ✭ 27 (+12.5%)
tlsh
TLSH lib in Golang
Stars: ✭ 110 (+358.33%)
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
Stars: ✭ 60 (+150%)
Mutual labels:  jaccard-similarity
HyperMinHash-java
Union, intersection, and set cardinality in loglog space
Stars: ✭ 48 (+100%)
Mutual labels:  minhash
Text-Similarity
Text similarity computation using MinHash and Jaccard distance on the Reuters dataset
Stars: ✭ 15 (-37.5%)
Mutual labels:  jaccard-similarity
intertext
Detect and visualize text reuse
Stars: ✭ 97 (+304.17%)
Mutual labels:  minhash
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-25%)
Mutual labels:  minhash
lsh-semantic-similarity
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Stars: ✭ 16 (-33.33%)
Mutual labels:  jaccard-similarity
ExpressionMatrix2
Software for exploration of gene expression data from single-cell RNA sequencing.
Stars: ✭ 29 (+20.83%)
Annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Stars: ✭ 9,262 (+38491.67%)