tcrouch / edits.cr

Licence: MIT license
Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment

edits

A collection of edit distance algorithms in Crystal.

Includes Levenshtein, Restricted Edit (Optimal Alignment) and Damerau-Levenshtein distances, and Jaro and Jaro-Winkler similarity.

Installation

Add this to your application's shard.yml:

dependencies:
  edits:
    github: tcrouch/edits.cr

Then run shards install to fetch the dependency.

Usage

require "edits"

Levenshtein variants

Calculate the edit distance between two sequences with variants of the Levenshtein distance algorithm.

Edits::Levenshtein.distance "raked", "bakers"
# => 3
Edits::RestrictedEdit.distance "iota", "atom"
# => 3
Edits::DamerauLevenshtein.distance "acer", "earn"
# => 3

  • Levenshtein edit distance, counting insertion, deletion and substitution.
  • Restricted Damerau-Levenshtein edit distance (aka Optimal Alignment), counting insertion, deletion, substitution and transposition (adjacent symbols swapped). Restricted by the condition that no substring is edited more than once.
  • Damerau-Levenshtein edit distance, counting insertion, deletion, substitution and transposition (adjacent symbols swapped).

| Sequences            | Levenshtein | Restricted Damerau-Levenshtein | Damerau-Levenshtein |
|----------------------|-------------|--------------------------------|---------------------|
| "raked" vs. "bakers" | 3           | 3                              | 3                   |
| "iota" vs. "atom"    | 4           | 3                              | 3                   |
| "acer" vs. "earn"    | 4           | 4                              | 3                   |
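
To make the difference between the first two variants concrete, here is a minimal standalone sketch of the textbook dynamic-programming recurrence. It is not the shard's implementation: edit_distance and its transpositions flag are hypothetical names used only for illustration. The unrestricted Damerau-Levenshtein distance needs extra bookkeeping and is not covered here.

```crystal
# Hypothetical helper, not part of the edits shard.
# With transpositions = false this is plain Levenshtein distance;
# with transpositions = true it is the restricted (Optimal Alignment) variant.
def edit_distance(a, b, transpositions = false)
  s = a.chars
  t = b.chars

  # d[i][j] = distance between the first i chars of s and the first j chars of t
  d = Array.new(s.size + 1) { Array.new(t.size + 1, 0) }
  (0..s.size).each { |i| d[i][0] = i }
  (0..t.size).each { |j| d[0][j] = j }

  (1..s.size).each do |i|
    (1..t.size).each do |j|
      cost = s[i - 1] == t[j - 1] ? 0 : 1
      best = [
        d[i - 1][j] + 1,        # deletion
        d[i][j - 1] + 1,        # insertion
        d[i - 1][j - 1] + cost, # substitution (or match)
      ].min

      # Adjacent symbols swapped: count the pair as a single edit.
      if transpositions && i > 1 && j > 1 &&
         s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1]
        best = [best, d[i - 2][j - 2] + 1].min
      end

      d[i][j] = best
    end
  end

  d[s.size][t.size]
end

puts edit_distance("iota", "atom")       # => 4
puts edit_distance("iota", "atom", true) # => 3
```

The full matrix is kept for readability; a production implementation would usually retain only the last two or three rows.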

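Levenshtein and Restricted Edit distances accept an optional maximum bound. If the true distance exceeds the bound, the bound is returned: in the example below the unbounded distance is 6.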

Edits::Levenshtein.distance "fghijk", "abcde", 3
# => 3

The convenience method most_similar searches for the best match to a given sequence from a collection. It is similar to using min_by, but leverages a maximum bound.

Edits::RestrictedEdit.most_similar "atom", ["iota", "tome", "mown", "tame"]
# => "tome"
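
One way a maximum bound can pay off in a search is sketched below. This is an assumption about the general technique, not a copy of the shard's code: bounded_levenshtein and closest are hypothetical names, and plain Levenshtein is used instead of Restricted Edit to keep the example short. The distance computation stops as soon as every entry in a row has reached the bound, and the search tightens the bound each time it finds a closer candidate.

```crystal
# Hypothetical helpers, not part of the edits shard.

# Levenshtein distance capped at max, abandoning the work early once the
# result can no longer fall below the cap.
def bounded_levenshtein(a, b, max)
  s = a.chars
  t = b.chars

  # The length difference is a lower bound on the distance.
  return max if (s.size - t.size).abs >= max

  previous = (0..t.size).to_a
  s.each_with_index do |sc, i|
    current = [i + 1]
    t.each_with_index do |tc, j|
      cost = sc == tc ? 0 : 1
      current << [previous[j + 1] + 1, current[j] + 1, previous[j] + cost].min
    end
    # The final distance is never smaller than the minimum of any row.
    return max if current.min >= max
    previous = current
  end

  [previous[t.size], max].min
end

# Linear search that passes the best distance so far as the bound.
def closest(prototype, candidates)
  best = candidates.first # raises on an empty collection
  bound = Int32::MAX
  candidates.each do |candidate|
    distance = bounded_levenshtein(prototype, candidate, bound)
    next unless distance < bound
    best = candidate
    bound = distance
  end
  best
end

puts bounded_levenshtein("fghijk", "abcde", 3)          # => 3
puts closest("atom", ["iota", "tome", "mown", "tame"]) # => tome
```

Compared with a plain min_by, candidates that cannot beat the current best are rejected after a few rows instead of being scored in full.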

Jaro & Jaro-Winkler

Calculate the Jaro and Jaro-Winkler similarity/distance of two sequences.

Edits::Jaro.similarity "information", "informant"
# => 0.90235690235690236
Edits::Jaro.distance "information", "informant"
# => 0.097643097643097643

Edits::JaroWinkler.similarity "information", "informant"
# => 0.94141414141414137
Edits::JaroWinkler.distance "information", "informant"
# => 0.05858585858585863
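
For readers unfamiliar with these measures, the following standalone sketch shows the usual textbook definitions. It is not the shard's implementation: jaro_similarity and jaro_winkler are hypothetical names, and the prefix scale of 0.1 and prefix cap of 4 are the conventional defaults, assumed here because they reproduce the figures above. In both cases the distance is 1 minus the similarity.

```crystal
# Hypothetical helpers, not part of the edits shard.
def jaro_similarity(a, b)
  s = a.chars
  t = b.chars
  return 1.0 if s.empty? && t.empty?
  return 0.0 if s.empty? || t.empty?

  # Equal characters match only if their positions differ by at most this much.
  window = Math.max(Math.max(s.size, t.size) // 2 - 1, 0)
  s_matched = Array.new(s.size, false)
  t_matched = Array.new(t.size, false)
  matches = 0

  s.each_with_index do |ch, i|
    lo = Math.max(i - window, 0)
    hi = Math.min(i + window, t.size - 1)
    (lo..hi).each do |j|
      next if t_matched[j] || t[j] != ch
      s_matched[i] = true
      t_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches == 0

  # Walk the matched characters of both strings in order and count disagreements.
  out_of_order = 0
  k = 0
  s.each_with_index do |ch, i|
    next unless s_matched[i]
    while !t_matched[k]
      k += 1
    end
    out_of_order += 1 if ch != t[k]
    k += 1
  end

  # Each transposition accounts for two out-of-order characters.
  m = matches.to_f
  (m / s.size + m / t.size + (m - out_of_order // 2) / m) / 3.0
end

# Winkler's adjustment rewards a shared prefix of up to max_prefix characters.
def jaro_winkler(a, b, scale = 0.1, max_prefix = 4)
  jaro = jaro_similarity(a, b)
  s = a.chars
  t = b.chars
  prefix = 0
  limit = Math.min(Math.min(s.size, t.size), max_prefix)
  while prefix < limit && s[prefix] == t[prefix]
    prefix += 1
  end
  jaro + prefix * scale * (1.0 - jaro)
end

puts jaro_similarity("information", "informant") # approximately 0.9024
puts jaro_winkler("information", "informant")    # approximately 0.9414
```

For "information" and "informant" there are 9 matching characters and 1 transposition, so the Jaro similarity is (9/11 + 9/9 + 8/9) / 3; the shared prefix "info" then lifts the Jaro-Winkler value.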

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request
