Datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble
bagminhash: BagMinHash, a minwise hashing algorithm for weighted sets
tika-similarity: Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
strutil: Golang metrics for calculating string similarity, plus other string utility functions
stringdistance: A fuzzy-matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-gram similarity, cosine similarity, Jaccard similarity, longest common subsequence, Hamming distance, and more.
Text-Similarity: Text similarity computation using MinHashing and Jaccard distance on the Reuters dataset