vkandy / simhash-js

Licence: MIT License
Simhash implementation in JavaScript

simhash-js

A JavaScript implementation of Charikar's hash for identifying similar documents.

What is Simhash?

Consider two documents A and B that differ in just a single byte.

Hash functions such as SHA-2 or MD5 hash the contents of these two documents to two completely different, unrelated values, so the Hamming distance between md5(A) and md5(B) would be large. This is by design: cryptographic hash functions aim for an avalanche effect, in which a small change to the input flips roughly half of the output bits, and they aim to make collisions hard to find.
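
To make the point concrete, here is a minimal sketch that measures the Hamming distance between the SHA-256 digests of two strings differing in a single byte. It uses Node's built-in crypto module, not simhash-js; the helper name bufferHammingDistance and the sample strings are assumptions chosen for illustration.

```javascript
const { createHash } = require('crypto');

// Hypothetical helper: count the differing bits between two equal-length Buffers.
function bufferHammingDistance(a, b) {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];   // bits that differ in this byte
    while (x) {
      x &= x - 1;          // clear the lowest set bit
      distance++;
    }
  }
  return distance;
}

// Two documents that differ in exactly one byte ("System" vs "Systen").
const docA = 'This is a test of the Emergency Blogcast System';
const docB = 'This is a test of the Emergency Blogcast Systen';

const hashA = createHash('sha256').update(docA).digest();
const hashB = createHash('sha256').update(docB).digest();

// Number of differing bits out of 256.
console.log(bufferHammingDistance(hashA, hashB));
```

For a well-behaved cryptographic hash the printed value should typically land somewhere near half of the 256 output bits, regardless of how small the input change was.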

By contrast, Simhash hashes the contents of A and B to similar values, so the Hamming distance between simhash(A) and simhash(B) would be small.
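
The idea can be sketched in a few lines. The code below is not the simhash-js implementation; it is a simplified illustration of a Charikar-style fingerprint, assuming whitespace-separated word tokens, equal token weights, a 32-bit fingerprint, and FNV-1a as the per-token hash. The names fnv1a32, simhashSketch, and hammingDistance are hypothetical.

```javascript
// Hypothetical helper: 32-bit FNV-1a over UTF-16 code units (illustration only).
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply, kept unsigned
  }
  return h >>> 0;
}

// Sketch of a simhash: every token votes +1 or -1 on each bit position,
// and the fingerprint keeps the bits whose vote total is positive.
function simhashSketch(text) {
  const votes = new Array(32).fill(0);
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  for (const token of tokens) {
    const h = fnv1a32(token);
    for (let bit = 0; bit < 32; bit++) {
      votes[bit] += ((h >>> bit) & 1) ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (votes[bit] > 0) fingerprint |= (1 << bit);
  }
  return fingerprint >>> 0;
}

// Number of bit positions in which two 32-bit fingerprints differ.
function hammingDistance(a, b) {
  let x = (a ^ b) >>> 0;
  let distance = 0;
  while (x !== 0) {
    x = (x & (x - 1)) >>> 0; // clear the lowest set bit
    distance++;
  }
  return distance;
}

const a = simhashSketch('This is a test of the Emergency Blogcast System');
const b = simhashSketch('This is a second test of the Emergency Blogcast System');
console.log(hammingDistance(a, b)); // differing bits out of 32
```

Because most tokens are shared between the two inputs, most per-bit vote totals keep their sign, which is why the fingerprints tend to differ in only a few positions.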

Usage

// Load the library and create a hasher
var sjs = require('simhash-js');
var simhash = new sjs.SimHash();

// Hash two nearly identical strings
var x = simhash.hash("This is a test of the Emergency Blogcast System");
var y = simhash.hash("This is a second test of the Emergency Blogcast System");

// Compare the two hashes
var s = sjs.Comparator.similarity(x, y);

To Do

  • Implement an efficient priority queue
  • Accept a list of stop words to be removed from input prior to calculating hash

References

  • Charikar: Similarity Estimation Techniques from Rounding Algorithms, in Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, ACM Press, 2002
  • Manku, Jain, Sarma: Detecting Near-Duplicates for Web Crawling, in Proceedings of the 16th International Conference on World Wide Web, ACM Press, 2007

Contributors

Sincere thanks to:
