thaumant / eddie

Licence: other

Eddie

Fast and well-tested implementations of edit distance/string similarity metrics:

  • Levenshtein,
  • Damerau-Levenshtein,
  • Hamming,
  • Jaro,
  • Jaro-Winkler.

Documentation

See API reference.

Installation

Add this to your Cargo.toml:

[dependencies]
eddie = "0.4"

Basic usage

Levenshtein:

use eddie::Levenshtein;
let lev = Levenshtein::new();
let dist = lev.distance("martha", "marhta");
assert_eq!(dist, 2);

Damerau-Levenshtein:

use eddie::DamerauLevenshtein;
let damlev = DamerauLevenshtein::new();
let dist = damlev.distance("martha", "marhta");
assert_eq!(dist, 1);

Hamming:

use eddie::Hamming;
let hamming = Hamming::new();
let dist = hamming.distance("martha", "marhta");
assert_eq!(dist, Some(2));
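
The `Some(2)` above shows that the Hamming result is optional. A minimal sketch of why that can be useful, assuming (this is not confirmed by the text above) that no value is returned when the two strings differ in length. The function `hamming` here is a hypothetical stand-in written with the standard library only, not the crate's implementation:

```rust
/// Hamming distance over Unicode scalar values: the number of positions
/// at which the two strings hold different characters.
/// Assumption: strings with different character counts yield `None`.
fn hamming(a: &str, b: &str) -> Option<usize> {
    if a.chars().count() != b.chars().count() {
        return None;
    }
    // Walk both strings in lockstep and count mismatching positions.
    Some(a.chars().zip(b.chars()).filter(|(x, y)| x != y).count())
}
```

With this sketch, "martha" and "marhta" differ at the fourth and fifth positions, giving `Some(2)`, while "martha" and "mart" give `None`.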

Jaro:

use eddie::Jaro;
let jaro = Jaro::new();
let sim = jaro.similarity("martha", "marhta");
assert!((sim - 0.94).abs() < 0.01);

Jaro-Winkler:

use eddie::JaroWinkler;
let jarwin = JaroWinkler::new();
let sim = jarwin.similarity("martha", "marhta");
assert!((sim - 0.96).abs() < 0.01);
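
The values 0.94 and 0.96 above can be reproduced by hand from the textbook Jaro and Jaro-Winkler formulas. The sketch below assumes the conventional Winkler scaling factor of 0.1 and a prefix cap of 4; whether these are the crate's defaults is an assumption. The helper names are hypothetical, and the match and transposition counts are supplied by the caller rather than computed:

```rust
/// Jaro similarity from precomputed counts:
/// (m/|a| + m/|b| + (m - t)/m) / 3, where m is the number of matching
/// characters and t is the number of transpositions.
fn jaro_from_counts(len_a: usize, len_b: usize, matches: usize, transpositions: usize) -> f64 {
    if matches == 0 {
        return 0.0;
    }
    let m = matches as f64;
    (m / len_a as f64 + m / len_b as f64 + (m - transpositions as f64) / m) / 3.0
}

/// Jaro-Winkler adjustment: boosts the Jaro score by the length of the
/// common prefix (capped at 4, an assumed convention) times `scaling`.
fn jaro_winkler_from_jaro(jaro: f64, prefix_len: usize, scaling: f64) -> f64 {
    let l = prefix_len.min(4) as f64;
    jaro + l * scaling * (1.0 - jaro)
}
```

For "martha" and "marhta", all 6 characters match and the swapped "th"/"ht" pair counts as 1 transposition, so `jaro_from_counts(6, 6, 6, 1)` is about 0.944. The common prefix "mar" has length 3, so `jaro_winkler_from_jaro(0.944, 3, 0.1)` is about 0.961.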

Strings vs slices

The crate exposes two modules containing two sets of implementations:

  • eddie::str for comparing UTF-8 encoded &str and &String values. These implementations are re-exported in the root module.
  • eddie::slice for comparing generic slices &[T]. Implementations in this module are significantly faster than those from eddie::str, but they compare element by element, so they produce incorrect results when the slices hold the raw bytes of UTF-8 or any other variable-width character encoding.

Usage example:

use eddie::slice::Levenshtein;

let lev = Levenshtein::new();
let dist = lev.distance(&[1, 2, 3], &[1, 3]);
assert_eq!(dist, 1);
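
To illustrate why raw UTF-8 bytes give misleading results, here is a minimal, self-contained sketch of Levenshtein distance over generic slices. It is a plain dynamic-programming version written for illustration and is not the crate's implementation; the function name `levenshtein` is hypothetical:

```rust
/// Levenshtein distance over generic slices, keeping a single rolling row.
fn levenshtein<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    // row[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for (i, x) in a.iter().enumerate() {
        // `diag` is the previous row's value one column to the left.
        let mut diag = row[0];
        row[0] = i + 1;
        for (j, y) in b.iter().enumerate() {
            let cost = if x == y { 0 } else { 1 };
            let next = (diag + cost)  // substitution or match
                .min(row[j] + 1)      // insertion
                .min(row[j + 1] + 1); // deletion
            diag = row[j + 1];
            row[j + 1] = next;
        }
    }
    row[b.len()]
}
```

Comparing "é" with "e" as slices of `char` gives a distance of 1 (one substitution). Comparing the same strings via `as_bytes()` gives 2, because "é" occupies two bytes in UTF-8 and neither byte equals the single byte of "e".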

Complementary metrics

The main metric methods are complemented with inverted and/or relative versions. The naming convention across the crate is as follows:

  • distance: the number of edits required to transform one string into the other;
  • rel_dist: the distance between two strings relative to string length (the inverse of similarity);
  • similarity: the similarity between two strings (the inverse of relative distance).
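
The relationship between the three metrics can be sketched as follows. This assumes that the relative distance is normalized by the length of the longer string and that similarity is one minus the relative distance; the exact normalization is not stated above, so treat both as assumptions. The free functions here are hypothetical helpers, not the crate's methods:

```rust
/// Relative distance: the edit distance scaled into [0, 1].
/// Assumption: normalized by the longer input's length
/// (with a floor of 1 to avoid dividing by zero for two empty inputs).
fn rel_dist(distance: usize, len_a: usize, len_b: usize) -> f64 {
    let longest = len_a.max(len_b).max(1);
    distance as f64 / longest as f64
}

/// Similarity as the complement of the relative distance.
fn similarity(distance: usize, len_a: usize, len_b: usize) -> f64 {
    1.0 - rel_dist(distance, len_a, len_b)
}
```

Using the Levenshtein distance of 2 between "martha" and "marhta" (both of length 6), this sketch gives a relative distance of about 0.33 and a similarity of about 0.67.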

Performance

At the moment, Eddie has the fastest implementations among the alternatives on crates.io that support Unicode.

For example, when comparing common English words, you can expect a speedup of at least 1.5-2x for any given algorithm except Hamming.

For detailed measurement tables, see the Benchmarks page.
