Datasketch
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
Stars: ✭ 1,635 (+7685.71%)
set-sketch-paper
SetSketch: Filling the Gap between MinHash and HyperLogLog
Stars: ✭ 23 (+9.52%)
bagminhash
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
Stars: ✭ 24 (+14.29%)
rkmh
Classify sequencing reads using MinHash.
Stars: ✭ 42 (+100%)
tlsh
TLSH lib in Golang
Stars: ✭ 110 (+423.81%)
stringMLST
Fast k-mer based tool for multi locus sequence typing (MLST)
Stars: ✭ 33 (+57.14%)
STing
Ultrafast sequence typing and gene detection from NGS raw reads
Stars: ✭ 15 (-28.57%)
HyperMinHash-java
Union, intersection, and set cardinality in loglog space
Stars: ✭ 48 (+128.57%)
intertext
Detect and visualize text reuse
Stars: ✭ 97 (+361.9%)
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
Stars: ✭ 18 (-14.29%)
ExpressionMatrix2
Software for exploration of gene expression data from single-cell RNA sequencing.
Stars: ✭ 29 (+38.1%)
text-shingles
k-shingling for text to help compare similarity
Stars: ✭ 15 (-28.57%)
Sampled-MinHashing
A method to mine beyond-pairwise relationships using Min-Hashing for large-scale pattern discovery
Stars: ✭ 24 (+14.29%)
Annoy
Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Stars: ✭ 9,262 (+44004.76%)
learning2hash.github.io
Website for "A survey of learning to hash for Computer Vision" https://learning2hash.github.io
Stars: ✭ 14 (-33.33%)
image-ndd-lsh
Near-duplicate image detection using Locality Sensitive Hashing
Stars: ✭ 42 (+100%)
mccortex
De novo genome assembly and multisample variant calling
Stars: ✭ 105 (+400%)
tiptoft
Predict plasmids from uncorrected long read data
Stars: ✭ 27 (+28.57%)
unikmer
Toolkit for k-mer with taxonomic information
Stars: ✭ 46 (+119.05%)
freqgen
🎯 Generate DNA sequences with specified amino acid, codon, and k-mer frequencies
Stars: ✭ 16 (-23.81%)