All Projects → hbollon → Go Edlib

hbollon / Go Edlib

Licence: mit
Golang string comparison and edit distance algorithms library, featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...

Programming Languages

go
31211 projects - #10 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Go Edlib

Textdistance
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
Stars: ✭ 2,575 (+917.79%)
Mutual labels:  algorithms, levenshtein
Phobos
The standard library of the D programming language
Stars: ✭ 1,038 (+310.28%)
Mutual labels:  algorithms, unicode
UniObfuscator
Java obfuscator that hides code in comment tags and Unicode garbage by making use of Java's Unicode escapes.
Stars: ✭ 40 (-84.19%)
Mutual labels:  unicode
quran-data
Unicode-encoded Quran data
Stars: ✭ 67 (-73.52%)
Mutual labels:  unicode
uax29
A tokenizer based on Unicode text segmentation (UAX 29), for Go
Stars: ✭ 26 (-89.72%)
Mutual labels:  unicode
unidecode
Elixir package to transliterate Unicode to ASCII
Stars: ✭ 18 (-92.89%)
Mutual labels:  unicode
widestring-rs
A wide string Rust library for converting to and from wide Unicode strings.
Stars: ✭ 48 (-81.03%)
Mutual labels:  unicode
prettype
An easy to use text stylizer for your desktop!
Stars: ✭ 14 (-94.47%)
Mutual labels:  unicode
similar-english-words
Give me a word and I’ll give you an array of words that differ by a single letter.
Stars: ✭ 25 (-90.12%)
Mutual labels:  levenshtein
StringConvert
A simple C++11 based helper for converting string between a various charset
Stars: ✭ 16 (-93.68%)
Mutual labels:  unicode
b2a
btoa and atob support for node.js or old browsers, with the Unicode Problems fixed
Stars: ✭ 21 (-91.7%)
Mutual labels:  unicode
ngx-emoj
A simple, theme-able emoji mart/picker for angular 4+
Stars: ✭ 18 (-92.89%)
Mutual labels:  unicode
contour
Modern C++ Terminal Emulator
Stars: ✭ 761 (+200.79%)
Mutual labels:  unicode
libWinTF8
The library handling things related to UTF-8 and Unicode when you want to port your program to Windows
Stars: ✭ 18 (-92.89%)
Mutual labels:  unicode
Wordle2Townscaper
Wordle2Townscaper is meant to convert Wordle tweets into Townscaper houses using yellow and green building blocks.
Stars: ✭ 64 (-74.7%)
Mutual labels:  unicode
core
The XP Framework is an all-purpose, object oriented PHP framework.
Stars: ✭ 13 (-94.86%)
Mutual labels:  unicode
unicode-emoji-json
Emoji data from unicode.org as easily consumable JSON files.
Stars: ✭ 149 (-41.11%)
Mutual labels:  unicode
umoji
😄 A lib convert emoji unicode to Surrogate pairs
Stars: ✭ 68 (-73.12%)
Mutual labels:  unicode
unicode-c
A C library for handling Unicode, UTF-8, surrogate pairs, etc.
Stars: ✭ 32 (-87.35%)
Mutual labels:  unicode
no-facebook-emoji
Get rid of those ugly emojis now! [stopped working 😢]
Stars: ✭ 15 (-94.07%)
Mutual labels:  unicode

Go-edlib : Edit distance and string comparison library

Travis CI Test coverage Go Report Card License: MIT Documentation link PkgGoDev

Golang string comparison and edit distance algorithms library featuring : Levenshtein, LCS, Hamming, Damerau levenshtein (OSA and Adjacent transpositions algorithms), Jaro-Winkler, Cosine, etc...


Table of Contents


Requirements

  • Go (v1.13+)

Introduction

Golang open-source library which includes most (and soon all) edit-distance and string comparision algorithms with some extra!
Designed to be fully compatible with Unicode characters!
This library is 100% test covered 😁

Features

  • Levenshtein

  • LCS (Longest common subsequence) with edit distance, backtrack and diff functions ✨

  • Hamming

  • Damerau-Levenshtein, with following variants :

    • OSA (Optimal string alignment) ✨
    • Adjacent transpositions ✨
  • Jaro & Jaro-Winkler similarity algorithms ✨

  • Cosine Similarity algorithm to compare strings ✨

  • Computed similarity percentage functions based on all available edit distance algorithms in this lib ✨

  • Fuzzy search functions based on edit distance with unique or multiples strings output ✨

  • Unicode compatibility ! 🥳

  • And many more to come !

Benchmarks

You can check an interactive Google chart with few benchmark cases for all similarity algorithms in this library through StringSimilarity function here

However, if you want or need more details, you can also viewing benchmark raw output here, which also includes memory allocations and test cases output (similarity result and errors).

If you are on Linux and want to run them on your setup, you can run ./tests/benchmark.sh script.

Installation

Open bash into your project folder and run:

go get github.com/hbollon/go-edlib

And import it into your project:

import (
	"github.com/hbollon/go-edlib"
)

Run tests

If you are on Linux and want to run all unit tests just run ./tests/tests.sh script.

For Windows users you can run:

go test ./... # Add desired parameters to this command if you want

Documentation

You can find all the documentation here : Documentation

Examples

Calculate string similarity index between two string

You can use StringSimilarity(str1, str2, algorithm) function. algorithm parameter must one of the following constants:

// Algorithm identifiers
const (
	Levenshtein Algorithm = iota
	DamerauLevenshtein
	OSADamerauLevenshtein
	Lcs
	Hamming
	Jaro
	JaroWinkler
	Cosine
)

Example with levenshtein:

res, err := edlib.StringsSimilarity("string1", "string2", edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Similarity: %f", res)
}

Execute fuzzy search based on string similarity algorithm

1. Most matching unique result without threshold

You can use FuzzySearch(str, strList, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearch("testnig", strList, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result: %s", res)
}

Result: testing 

2. Most matching unique result with threshold

You can use FuzzySearchThreshold(str, strList, minSimilarity, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchThreshold("testnig", strList, 0.7, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'testnig': %s", res)
}

res, err = edlib.FuzzySearchThreshold("hello", strList, 0.7, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'hello': %s", res)
}

Result for 'testnig': testing
Result for 'hello':

3. Most matching result set without threshold

You can use FuzzySearchSet(str, strList, resultQuantity, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSet("testnig", strList, 3, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Results: %s", strings.Join(res, ", "))
}

Results: testing, test, tester 

4. Most matching result set with threshold

You can use FuzzySearchSetThreshold(str, strList, resultQuantity, minSimilarity, algorithm) function.

strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.5, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'testnig' with '0.5' threshold: %s", strings.Join(res, " "))
}

res, err = edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.7, edlib.Levenshtein)
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("Result for 'testnig' with '0.7' threshold: %s", strings.Join(res, " "))
}

Result for 'testnig' with '0.5' threshold: testing test tester
Result for 'testnig' with '0.7' threshold: testing

Get raw edit distance (Levenshtein, LCS, Damerau–Levenshtein, Hamming)

You can use one of the following function to get an edit distance between two strings :

Example with Levenshtein distance:

res := edlib.LevenshteinDistance("kitten", "sitting")
fmt.Printf("Result: %d", res)
Result: 3

LCS, LCS Backtrack and LCS Diff

1. Compute LCS(Longuest Common Subsequence) between two strings

You can use LCS(str1, str2) function.

lcs := edlib.LCS("ABCD", "ACBAD")
fmt.Printf("Length of their LCS: %d", lcs)
Length of their LCS: 3

2. Backtrack their LCS

You can use LCSBacktrack(str1, str2) function.

res, err := edlib.LCSBacktrack("ABCD", "ACBAD")
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("LCS: %s", res)
}
LCS: ABD

3. Backtrack all their LCS

You can use LCSBacktrackAll(str1, str2) function.

res, err := edlib.LCSBacktrackAll("ABCD", "ACBAD")
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("LCS: %s", strings.Join(res, ", "))
}
LCS: ABD, ACD

4. Get LCS Diff between two strings

You can use LCSDiff(str1, str2) function.

res, err := edlib.LCSDiff("computer", "houseboat")
if err != nil {
  fmt.Println(err)
} else {
  fmt.Printf("LCS: \n%s\n%s", res[0], res[1])
}
LCS Diff: 
 h c o m p u s e b o a t e r
 + -   - -   + + + + +   - -

Author

👤 Hugo Bollon

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check issues page.

Show your support

Give a ⭐️ if this project helped you!

📝 License

Copyright © 2020 Hugo Bollon.
This project is MIT License licensed.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].