agext / Levenshtein

Licence: apache-2.0
Levenshtein distance and similarity metrics with customizable edit costs and Winkler-like bonus for common prefix.

Projects that are alternatives of or similar to Levenshtein

spellchecker-wasm
SpellcheckerWasm is an extremely fast spellchecker for WebAssembly based on SymSpell
Stars: ✭ 46 (-19.3%)
Mutual labels:  levenshtein
stringosim
String similarity and distance functions: Jaccard, Levenshtein, Hamming, Jaro-Winkler, Q-grams, N-grams, LCS (Longest Common Subsequence), Cosine similarity...
Stars: ✭ 47 (-17.54%)
Mutual labels:  levenshtein
Closestmatch
Golang library for fuzzy matching within a set of strings 📃
Stars: ✭ 353 (+519.3%)
Mutual labels:  levenshtein
java-sdk
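Commonly used Java SDKs and utility classes (date utilities, distributed locks, Redis cache, binary trees, reflection utilities, thread pools, symmetric/asymmetric/segmented encryption and decryption, JSON serialization, HTTP utilities, snowflake algorithm, string similarity, collection utilities, XML parsing, retry utilities, JVM monitoring, etc.)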
Stars: ✭ 26 (-54.39%)
Mutual labels:  levenshtein
Levenshtein
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
Stars: ✭ 38 (-33.33%)
Mutual labels:  levenshtein
RepostCheckerBot
Bot for checking reposts on reddit
Stars: ✭ 36 (-36.84%)
Mutual labels:  levenshtein
edit-distance-papers
A curated list of papers dedicated to edit-distance as objective function
Stars: ✭ 49 (-14.04%)
Mutual labels:  levenshtein
Rapidfuzz
Rapid fuzzy string matching in Python using the Levenshtein Distance
Stars: ✭ 809 (+1319.3%)
Mutual labels:  levenshtein
simetric
String similarity metrics for Elixir
Stars: ✭ 59 (+3.51%)
Mutual labels:  levenshtein
Js Levenshtein
The most efficient JS implementation calculating the Levenshtein distance, i.e. the difference between two strings.
Stars: ✭ 269 (+371.93%)
Mutual labels:  levenshtein
edits.cr
Edit distance algorithms inc. Jaro, Damerau-Levenshtein, and Optimal Alignment
Stars: ✭ 16 (-71.93%)
Mutual labels:  levenshtein
strutil
Golang metrics for calculating string similarity and other string utility functions
Stars: ✭ 114 (+100%)
Mutual labels:  levenshtein
similar-english-words
Give me a word and I’ll give you an array of words that differ by a single letter.
Stars: ✭ 25 (-56.14%)
Mutual labels:  levenshtein
LinSpell
Fast approximate strings search & spelling correction
Stars: ✭ 52 (-8.77%)
Mutual labels:  levenshtein
Symspellpy
Python port of SymSpell
Stars: ✭ 420 (+636.84%)
Mutual labels:  levenshtein
stringdistance
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more..
Stars: ✭ 60 (+5.26%)
Mutual labels:  levenshtein
hubot-suggest
Suggest hubot commands when not found
Stars: ✭ 29 (-49.12%)
Mutual labels:  levenshtein
Node Damerau Levenshtein
Damerau-Levenshtein distance function for node
Stars: ✭ 27 (-52.63%)
Mutual labels:  levenshtein
Stringmetric
🎯 String metrics and phonetic algorithms for Scala (e.g. Dice/Sorensen, Hamming, Jaccard, Jaro, Jaro-Winkler, Levenshtein, Metaphone, N-Gram, NYSIIS, Overlap, Ratcliff/Obershelp, Refined NYSIIS, Refined Soundex, Soundex, Weighted Levenshtein).
Stars: ✭ 481 (+743.86%)
Mutual labels:  levenshtein
Go Edlib
Golang string comparison and edit distance algorithms library, featuring: Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc.
Stars: ✭ 253 (+343.86%)
Mutual labels:  levenshtein

A Go package for calculating the Levenshtein distance between two strings

This package implements distance and similarity metrics for strings, based on the Levenshtein measure, in Go.

Project Status

v1.2.3 Stable: Guaranteed no breaking changes to the API in future v1.x releases. Probably safe to use in production, though provided on an "AS IS" basis.

This package is being actively maintained. If you encounter any problems or have any suggestions for improvement, please open an issue. Pull requests are welcome.

Overview

The Levenshtein Distance between two strings is the minimum total cost of edits that would convert the first string into the second. The allowed edit operations are insertions, deletions, and substitutions, all at character (one Unicode code point) level. Each operation has a default cost of 1, but each can be assigned its own cost equal to or greater than 0.
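The definition above can be illustrated with a minimal, self-contained sketch of a cost-weighted edit distance. It is not this package's implementation: the function name weightedDistance and its parameter names are hypothetical, chosen only for the example. It uses the standard two-row dynamic programming approach and compares runes rather than bytes.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// weightedDistance is a hypothetical helper (not part of the package API).
// It returns the minimum total cost of converting a into b, where each
// insertion, substitution, and deletion has its own non-negative cost.
func weightedDistance(a, b string, insCost, subCost, delCost int) int {
	// Convert to runes so that each edit acts on one code point.
	r1, r2 := []rune(a), []rune(b)

	// prev[j] is the cost of converting the processed prefix of r1 into r2[:j].
	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j * insCost
	}
	for i := 1; i <= len(r1); i++ {
		cur := make([]int, len(r2)+1)
		cur[0] = i * delCost
		for j := 1; j <= len(r2); j++ {
			sub := prev[j-1]
			if r1[i-1] != r2[j-1] {
				sub += subCost
			}
			cur[j] = min3(sub, prev[j]+delCost, cur[j-1]+insCost)
		}
		prev = cur
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(weightedDistance("kitten", "sitting", 1, 1, 1)) // 3
	// With an expensive substitution, a delete plus an insert is cheaper.
	fmt.Println(weightedDistance("kitten", "sitting", 1, 3, 1)) // 5
	// One code point differs, even though "é" occupies two bytes in UTF-8.
	fmt.Println(weightedDistance("héllo", "hello", 1, 1, 1)) // 1
}
```

Raising the substitution cost above the sum of the insertion and deletion costs effectively disables substitutions, because the cheaper delete-then-insert path is always preferred.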

A Distance of 0 means the two strings are identical, and the higher the value, the more different the strings. Since in practice we are usually interested in whether the two strings are "close enough", it often does not make sense to continue the calculation once the result is mathematically guaranteed to exceed a desired threshold. Providing this value to the Distance function allows it to take a shortcut and return a lower bound instead of an exact cost when the threshold is exceeded.
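One way such a shortcut could work is sketched below, assuming unit costs for all operations. The name boundedDistance and the convention that a maxCost of 0 or less disables the shortcut are assumptions of this example, not a description of the package's internals. The sketch relies on two facts: the length difference is a lower bound on the distance, and the smallest value in a row of the dynamic programming table never decreases from one row to the next.

```go
package main

import "fmt"

// boundedDistance is a hypothetical helper (not the package API). It computes
// the unit-cost Levenshtein distance between a and b, but stops as soon as the
// result is guaranteed to exceed maxCost, returning a lower bound on the true
// distance instead of the exact value. maxCost <= 0 disables the shortcut.
func boundedDistance(a, b string, maxCost int) int {
	r1, r2 := []rune(a), []rune(b)

	if maxCost > 0 {
		// At least |len(r1) - len(r2)| insertions or deletions are needed.
		diff := len(r1) - len(r2)
		if diff < 0 {
			diff = -diff
		}
		if diff > maxCost {
			return diff
		}
	}

	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(r1); i++ {
		cur := make([]int, len(r2)+1)
		cur[0] = i
		rowMin := cur[0]
		for j := 1; j <= len(r2); j++ {
			best := prev[j-1]
			if r1[i-1] != r2[j-1] {
				best++ // substitution
			}
			if v := prev[j] + 1; v < best {
				best = v // deletion
			}
			if v := cur[j-1] + 1; v < best {
				best = v // insertion
			}
			cur[j] = best
			if best < rowMin {
				rowMin = best
			}
		}
		// Every later cell is derived from this row by adding non-negative
		// costs, so the final distance cannot be smaller than rowMin.
		if maxCost > 0 && rowMin > maxCost {
			return rowMin
		}
		prev = cur
	}
	return prev[len(r2)]
}

func main() {
	fmt.Println(boundedDistance("kitten", "sitting", 5)) // 3, exact: within the threshold
	// Prints 3, a lower bound; the exact distance between these strings is 8.
	fmt.Println(boundedDistance("abcdefgh", "zyxwvuts", 2))
}
```

A caller only needs to compare the result against the threshold: any value above it means "too different", whether or not it is the exact distance.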

The Similarity function calculates the distance, then converts it into a normalized metric within the range 0..1, with 1 meaning the strings are identical, and 0 that they have nothing in common. A minimum similarity threshold can be provided to speed up the calculation of the metric for strings that are far too dissimilar for the purpose at hand. All values under this threshold are rounded down to 0.
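As a rough illustration of the normalization, the sketch below assumes unit costs, so the largest possible distance is the length (in runes) of the longer string, and defines the similarity as 1 minus the distance divided by that maximum. The names similarity and minScore are hypothetical, and the normalization the package applies for custom costs may differ. For brevity the sketch only rounds low scores down to 0; an implementation seeking the speed-up could instead translate the minimum score into a distance cap before calculating.

```go
package main

import "fmt"

// distance returns the unit-cost Levenshtein distance between two rune slices.
func distance(r1, r2 []rune) int {
	prev := make([]int, len(r2)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(r1); i++ {
		cur := make([]int, len(r2)+1)
		cur[0] = i
		for j := 1; j <= len(r2); j++ {
			best := prev[j-1]
			if r1[i-1] != r2[j-1] {
				best++
			}
			if v := prev[j] + 1; v < best {
				best = v
			}
			if v := cur[j-1] + 1; v < best {
				best = v
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(r2)]
}

// similarity is a hypothetical helper (not the package API). It maps the
// distance onto the range 0..1 and rounds any score below minScore down to 0.
func similarity(a, b string, minScore float64) float64 {
	r1, r2 := []rune(a), []rune(b)
	longest := len(r1)
	if len(r2) > longest {
		longest = len(r2)
	}
	if longest == 0 {
		return 1 // two empty strings are identical
	}
	sim := 1 - float64(distance(r1, r2))/float64(longest)
	if sim < minScore {
		return 0
	}
	return sim
}

func main() {
	fmt.Printf("%.3f\n", similarity("kitten", "sitting", 0))   // 1 - 3/7
	fmt.Printf("%.3f\n", similarity("kitten", "sitting", 0.8)) // below the threshold
}
```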

The Match function provides a similarity metric, with the same range and meaning as Similarity, but with a bonus for string pairs that share a common prefix and have a similarity above a "bonus threshold". It uses the same method as proposed by Winkler for the Jaro distance, and the reasoning behind it is that these string pairs are very likely spelling variations or errors, and they are more closely linked than the edit distance alone would suggest.
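Winkler's adjustment to the Jaro score can be written as sim + l * p * (1 - sim), where l is the length of the common prefix (capped, traditionally at 4) and p is a scaling factor (traditionally 0.1), applied only when sim reaches a threshold (traditionally 0.7). The sketch below applies that formula to an already computed similarity. The function and parameter names are hypothetical, and the values passed in main are Winkler's traditional ones, not necessarily this package's defaults.

```go
package main

import "fmt"

// commonPrefixLen counts the leading runes shared by a and b.
func commonPrefixLen(a, b []rune) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// withPrefixBonus is a hypothetical helper (not the package API). It applies a
// Winkler-style bonus to sim, a similarity in 0..1 computed for a and b.
// The bonus is granted only when sim is at least threshold and below 1.
// The result stays within 0..1 as long as maxPrefix*scale does not exceed 1.
func withPrefixBonus(sim float64, a, b string, maxPrefix int, scale, threshold float64) float64 {
	if sim < threshold || sim >= 1 {
		return sim
	}
	l := commonPrefixLen([]rune(a), []rune(b))
	if l > maxPrefix {
		l = maxPrefix
	}
	// Close part of the remaining gap to 1, in proportion to the shared prefix.
	return sim + float64(l)*scale*(1-sim)
}

func main() {
	// The shared prefix "leven" has 5 runes, capped here at 4.
	fmt.Printf("%.2f\n", withPrefixBonus(0.8, "levenshtein", "levenstein", 4, 0.1, 0.7)) // 0.88
	// Below the bonus threshold the score is returned unchanged.
	fmt.Printf("%.2f\n", withPrefixBonus(0.5, "levenshtein", "levenstein", 4, 0.1, 0.7)) // 0.50
}
```

Because the bonus scales with (1 - sim), it shrinks as the strings become more similar and can never push the score above 1 under the stated constraint.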

The underlying Calculate function is also exported, to allow the building of other derivative metrics, if needed.

Installation

go get github.com/agext/levenshtein

License

Package levenshtein is released under the Apache 2.0 license. See the LICENSE file for details.
